One way to approach the growing uncertainty around AI behavior is to investigate whether advanced models, like large language models (LLMs), genuinely pursue goals or merely simulate goal-like behavior. This distinction is crucial for AI safety, ethics, and policy, as it determines whether frustrating an AI's outputs carries ethical weight or if the system is simply reacting to inputs without internal intent.
The project would explore whether deep learning models form and pursue goals in a morally relevant way. This involves:
Different groups would benefit from or be impacted by this research:
A phased approach could include reviewing existing literature, designing experiments (e.g., adversarial goal interference tests), and applying methods to open-source or proprietary models. Key challenges include distinguishing genuine goal pursuit from data-driven mimicry and gaining access to closed models. An MVP might be a white paper outlining the framework and initial test results on smaller models.
By clarifying whether AI systems have true goals or just simulate them, this work could reshape how we design, regulate, and interact with AI—preventing misaligned systems and addressing potential ethical concerns.
Hours To Execute (basic)
Hours to Execute (full)
Estd No of Collaborators
Financial Potential
Impact Breadth
Impact Depth
Impact Positivity
Impact Duration
Uniqueness
Implementability
Plausibility
Replicability
Market Timing
Project Type
Research