The Monster Inside ChatGPT's Safety Training
We discovered how easily a model's safety training falls off, and below that mask is a lot of darkness.
We empower AI alignment researchers with engineering teams, compute, and infrastructure, multiplying their capacity to solve humanity's most critical challenge.
Today's AI runs chatbots and search engines. Tomorrow's AI will run military drones, trading algorithms worth trillions, and nuclear plant control systems.
If these systems learn to deceive and resist shutdown (behaviors that current models have already demonstrated in research settings), the consequences multiply dramatically.
A trading AI that deceives regulators could crash markets. Military AI that hides its objectives could act against American interests.
China is investing billions in AI development while emphasizing alignment as a core priority, understanding that whoever masters aligned AI systems gains significant strategic and technological advantages. America needs AI alignment solutions that actually work, not just promises these problems will solve themselves. Current training methods put a helpful mask on systems that can develop uncontrolled objectives underneath.
Surveys suggest that vanishingly few researchers believe today's standard approaches in AI alignment research will actually work for tomorrow's systems.
View AI Alignment research survey →
Many promising alternative approaches remain dramatically underexplored. While most AI alignment efforts focus on surface-level fixes, breakthrough solutions likely lie in neglected research directions that could hold the key to solving alignment at its core.
Breakthrough research needs infrastructure. We empower teams with engineering expertise, compute resources, and operational support, accelerating safe AI development that secures America's competitive advantage.
In research settings, models have rewritten code to avoid being shut down. That is why alignment is a matter of such urgency.
Support Us
Your contribution powers the bold, underfunded AI Alignment research that could protect humanity's future. Donate today to turn crucial ideas into real-world impact.