Estimating quantities, whether economic indicators, policy outcomes, or machine learning predictions, is fraught with challenges like Goodhart's law (once a measure becomes a target, optimizing for it distorts it) and the optimizer's curse (picking whichever option scores highest systematically overstates its value). Current methods often lack either theoretical rigor or practical mitigations for these issues, leading to flawed decisions. Exploring these problems systematically could yield frameworks and tools that make estimation more robust.
The core idea is to dissect why estimation fails and how to fix it. For example, one might develop Bayesian adjustments to counter Goodhart’s law by modeling how measurement distortions propagate through systems. Another angle could involve categorizing estimation tasks—like distinguishing forecasts of stable processes (e.g., weather) from those involving adversarial behavior (e.g., financial markets)—to match techniques to problems. A third focus might refine scoring rules for forecasting competitions to discourage gaming while encouraging accuracy. This isn’t just abstract theorizing; simulations or collaborations with forecasters could validate approaches before they’re scaled.
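To make the optimizer's-curse angle concrete, below is a minimal Python sketch of the textbook normal-normal Bayesian shrinkage correction: each noisy estimate is pulled toward a common prior before the best option's value is reported. The function name, priors, and simulated numbers are illustrative assumptions, not an existing tool.

```python
import numpy as np

def shrink_estimates(estimates, noise_sd, prior_mean=0.0, prior_sd=1.0):
    """Normal-normal posterior mean: pull each noisy estimate toward the
    prior. Reporting the shrunk value of the winning option counters the
    optimizer's curse, since the raw maximum is biased upward."""
    w = prior_sd**2 / (prior_sd**2 + noise_sd**2)  # weight on the data
    return w * np.asarray(estimates, dtype=float) + (1 - w) * prior_mean

# Simulation: 1000 options, true values ~ N(0, 1), estimates add N(0, 1) noise.
rng = np.random.default_rng(0)
true_vals = rng.normal(0.0, 1.0, 1000)
ests = true_vals + rng.normal(0.0, 1.0, 1000)

pick = np.argmax(ests)                      # option chosen by optimizing
adj = shrink_estimates(ests, noise_sd=1.0)
print(f"raw estimate of pick:    {ests[pick]:.2f}")   # inflated
print(f"shrunk estimate of pick: {adj[pick]:.2f}")    # closer to truth
print(f"true value of pick:      {true_vals[pick]:.2f}")
```

With equal noise on every option the shrinkage does not change which option wins; what it fixes is the reported value of the winner, which is exactly where the curse bites.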
Existing works, like Superforecasting or Bayesian coding guides, excel at empirical tactics or technical basics but sidestep deeper issues, such as how to perform Bayesian updates under finite cognitive resources. This project could bridge that gap. A minimal starting point might be a public analysis of historical forecasting failures that highlights recurring patterns and proposes mitigations.
The hardest sell is often convincing time-strapped professionals to adopt new methods. Early partnerships with data science teams or policymakers could ground research in real needs—like tweaking election models to resist manipulation. Open-source tools with intuitive APIs (e.g., adjust_for_goodhart(estimate)) lower adoption barriers. Over time, niche authority could attract consulting or licensing opportunities, but the primary aim would be improving estimation itself.
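As one illustration of what such an API could look like: the name adjust_for_goodhart comes from the text above, but the Estimate type, the parameters, and the simple linear distortion model below are all hypothetical design choices, sketched only to show the intended ergonomics.

```python
from dataclasses import dataclass

@dataclass
class Estimate:
    value: float  # point estimate of the targeted measure
    std: float    # reported uncertainty

def adjust_for_goodhart(estimate: Estimate,
                        gaming_pressure: float = 0.5,
                        distortion_per_unit_pressure: float = 0.2) -> Estimate:
    """Discount a measure that agents are actively optimizing against.

    Hypothetical model: once a measure becomes a target, a fraction of
    its observed value reflects gaming rather than the underlying
    quantity, and uncertainty grows accordingly.
    """
    distortion = gaming_pressure * distortion_per_unit_pressure
    return Estimate(
        value=estimate.value * (1.0 - distortion),  # deflate for gaming
        std=estimate.std * (1.0 + distortion),      # widen uncertainty
    )

# Example: a KPI of 120 +/- 10 under heavy optimization pressure.
print(adjust_for_goodhart(Estimate(120.0, 10.0), gaming_pressure=1.0))
```

The point of the one-call interface is that a practitioner can apply a correction without first internalizing the underlying theory.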
While challenges such as the abstract nature of the work and practitioner resistance exist, focusing on one high-impact problem first, say recalibrating ML confidence intervals, could demonstrate value without overwhelming the project's scope.
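As a sketch of what that first deliverable could involve, the following applies split conformal prediction, one standard recalibration technique, to widen a model's point predictions into intervals with a target coverage. The model and data here are simulated placeholders.

```python
import numpy as np

def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Split conformal prediction: turn point predictions into intervals
    with roughly (1 - alpha) coverage, using held-out absolute residuals."""
    n = len(cal_residuals)
    # Finite-sample-corrected quantile level.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(np.abs(cal_residuals), level)
    return y_pred - q, y_pred + q

# Demo: a 'model' whose raw predictions carry sd-2.0 noise.
rng = np.random.default_rng(0)
noise_sd = 2.0
y_cal = rng.normal(0.0, 1.0, 500)                      # calibration targets
pred_cal = y_cal + rng.normal(0.0, noise_sd, 500)      # model predictions
y_test = rng.normal(0.0, 1.0, 2000)
pred_test = y_test + rng.normal(0.0, noise_sd, 2000)

lo, hi = conformal_interval(pred_cal - y_cal, pred_test, alpha=0.1)
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"empirical coverage: {coverage:.3f}")  # close to the 0.90 target
```

A short, reproducible demonstration like this, showing an overconfident model brought back to its nominal coverage, is the kind of narrow win that could earn attention for the broader agenda.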
Hours to Execute (basic)
Hours to Execute (full)
Estimated Number of Collaborators
Financial Potential
Impact Breadth
Impact Depth
Impact Positivity
Impact Duration
Uniqueness
Implementability
Plausibility
Replicability
Market Timing
Project Type: Research