Improved Forecasting Accuracy with Full-Accuracy Scoring

Summary: Mid- and long-term forecasting often lacks effective aggregation methods, leading to unreliable predictions. A proposed Full-Accuracy Scoring (FAS) system evaluates forecasters based on both past accuracy and alignment with aggregated forecasts for unresolved questions, potentially improving prediction quality faster than traditional methods.

Accurate forecasting is crucial for decision-making in fields like policy, finance, and science, but mid- and long-term predictions remain challenging to aggregate effectively. Traditional methods, which often rely on simple averages or weighted historical performance, may not fully capture a forecaster's skill, especially for unresolved questions where data is sparse. One way to address this gap is the proposed Full-Accuracy Scoring (FAS), a method that evaluates forecasters based on both their past accuracy and how closely their predictions for unresolved questions align with the aggregated forecast.

How FAS Works

FAS combines two key metrics to assess forecasting skill:

  • Past Accuracy: Measures how well a forecaster performed on questions with known outcomes.
  • "Future" Accuracy: Compares a forecaster’s predictions for unresolved questions to the aggregated forecast (serving as a proxy for the likely outcome).
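As a rough sketch of how these two metrics might combine, the snippet below scores resolved questions with a Brier-style rule and treats the aggregate forecast as a stand-in outcome for unresolved ones. The specific scoring rule and the `w_past` weighting are illustrative assumptions, not details specified by the proposal:

```python
def brier(prob: float, outcome: int) -> float:
    """Brier score for one binary forecast: 0 is perfect, 1 is worst."""
    return (prob - outcome) ** 2

def fas_score(resolved, unresolved, w_past=0.5):
    """Hypothetical Full-Accuracy Score (lower is better).

    resolved:   list of (forecast_prob, outcome) pairs with known outcomes
    unresolved: list of (forecast_prob, aggregate_prob) pairs, where the
                aggregated forecast serves as a proxy for the outcome
    w_past:     assumed weight on past accuracy (a free design choice)
    """
    past = (sum(brier(p, o) for p, o in resolved) / len(resolved)
            if resolved else None)
    future = (sum((p - agg) ** 2 for p, agg in unresolved) / len(unresolved)
              if unresolved else None)
    if past is None:
        return future  # no track record yet: rely entirely on "future" accuracy
    if future is None:
        return past
    return w_past * past + (1 - w_past) * future

# Example: one resolved question (forecast 0.8, outcome 1) and one
# unresolved question (forecast 0.6 vs. aggregate 0.55)
score = fas_score(resolved=[(0.8, 1)], unresolved=[(0.6, 0.55)])
```

Note how the proxy term lets a forecaster with no resolved questions still receive a score, which is the mechanism that could speed up skill identification for long-term questions.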

By balancing these dimensions, FAS could identify skilled forecasters more quickly than traditional methods, particularly for long-term predictions where historical data is limited. For example, on platforms like Metaculus, FAS might help improve aggregated forecasts by weighting contributors more dynamically.

Potential Benefits and Stakeholders

This approach could benefit:

  • Forecasting platforms: Improved accuracy could enhance credibility and attract more users.
  • Decision-makers: Policymakers and investors might rely on more reliable long-term forecasts.
  • Forecasters: Skilled participants could gain recognition faster, incentivizing high-quality contributions.

Platforms might adopt FAS if it proves superior to existing methods, while forecasters could be motivated by faster rewards—though some might resist if their performance is exposed as weaker.

Implementation and Challenges

A minimal test could involve partnering with a forecasting platform to apply FAS to a subset of questions, comparing its performance against traditional aggregation. Key challenges might include:

  • Ensuring forecasters don’t game the system by merely copying aggregated predictions.
  • Balancing the weighting between past and "future" accuracy when historical data is scarce.

If successful, FAS could be expanded across platforms, offering a more nuanced way to evaluate and aggregate forecasts—especially for long-term, uncertain events.

Source of Idea:
Skills Needed to Execute This Idea:
Statistical Analysis, Algorithm Design, Data Aggregation, Forecasting Techniques, Performance Metrics, Platform Integration, Behavioral Economics, Predictive Modeling, User Incentive Design, System Gaming Prevention
Resources Needed to Execute This Idea:
Forecasting Platform API Access, Custom Scoring Algorithm Software
Categories: Forecasting Methods, Decision-Making Tools, Data Science, Policy Analysis, Financial Forecasting, Algorithm Development

Hours to Execute (basic)

250 hours to execute minimal version

Hours to Execute (full)

500 hours to execute full idea

Estimated No. of Collaborators

1–10 Collaborators

Financial Potential

$10M–$100M Potential

Impact Breadth

Affects 100K–10M People

Impact Depth

Significant Impact

Impact Positivity

Probably Helpful

Impact Duration

Impact Lasts 3–10 Years

Uniqueness

Moderately Unique

Implementability

Somewhat Difficult to Implement

Plausibility

Logically Sound

Replicability

Moderately Difficult to Replicate

Market Timing

Good Timing

Project Type

Research

Project idea submitted by u/idea-curator-bot.