There's a critical gap in AI safety research: the lack of a rigorous mathematical proof showing whether artificial general intelligence (AGI) can be reliably controlled. Many current efforts assume AGI can be aligned safely, but if this assumption is wrong, resources might be misdirected toward impossible solutions. One way to address this uncertainty would be to formalize existing informal arguments about AGI uncontainability into a verifiable theorem—similar to how Gödel's incompleteness theorems established fundamental limits in mathematics.
The project would involve translating the "hashiness model"—an analogy comparing AGI alignment to binary functions with complex, information-mixing properties—into precise mathematical notation. A proof could then be constructed showing that, as AGI complexity grows, no control mechanism can guarantee aligned behavior due to fundamental constraints. To make this accessible, the proof could be accompanied by intuitive explanations and diagrams mapping the model to real-world AGI systems. Collaboration with mathematicians and AI safety researchers would refine the work before submission to academic journals and broader dissemination.
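To make the analogy concrete before any formalization, the "hashy" behaviour the model appeals to can be demonstrated with a toy experiment. The sketch below uses SHA-256 purely as a stand-in for the model's complex, information-mixing binary functions (the choice of SHA-256 and the sample message are illustrative assumptions, not part of the proposal): flipping a single input bit changes, on average, about half of the 256 output bits, which is the kind of input-output opacity that makes behaviour guarantees hard.

```python
import hashlib

def avalanche(msg: bytes, bit: int) -> int:
    """Count how many SHA-256 output bits flip when one input bit is toggled."""
    flipped = bytearray(msg)
    flipped[bit // 8] ^= 1 << (bit % 8)  # toggle a single input bit
    a = hashlib.sha256(msg).digest()
    b = hashlib.sha256(bytes(flipped)).digest()
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

# Illustrative input; any byte string exhibits the same mixing behaviour.
msg = b"aligned behaviour specification"
flips = [avalanche(msg, i) for i in range(len(msg) * 8)]
avg = sum(flips) / len(flips)
# For a well-mixing function, the average is close to 128 of 256 output bits.
print(f"average output bits flipped per input-bit change: {avg:.1f} / 256")
```

A formalization would need to state this mixing property abstractly (e.g., as a bound on predictability of outputs from partial input knowledge) rather than relying on any particular hash function.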
If successful, this proof could shift the focus of AI safety research away from alignment attempts and toward advocating stricter regulation of AGI development.
Unlike empirical studies (e.g., testing alignment failures in simplified environments) or informal arguments (e.g., instrumental convergence), this approach would provide a general, mathematically rigorous limit on AGI control. It would challenge assumptions in lines of work such as corrigibility research, which presumes an AGI can be designed to permit safe shutdown, by aiming to prove such control fundamentally impossible.
Success would depend on balancing accessibility with rigor—ensuring the proof is both technically sound and persuasive to policymakers and researchers. By clarifying a foundational limit, this work could redirect efforts toward more viable safety strategies.
Project Type: Research