Codex On Call For Its Own Training
by Alexander Imbirikos on December 14, 2025
Situation
- OpenAI's Codex team was facing a common challenge in AI model training: the need for constant human monitoring ("babysitting") of training runs
- Training large AI models is extremely expensive and time-sensitive
- Human engineers were required to continuously monitor various graphs and metrics during training
- System failures or errors during training could cause significant delays and wasted resources
- Engineers needed to be ready to pause training, fix issues, or take other actions when problems arose
Actions
- The team began experimenting with using Codex itself to monitor its own training process
- They set up Codex to run in a continuous loop, evaluating how training charts and metrics were changing over time
- The system was designed to detect anomalies or issues that would typically require human intervention
- Codex was configured to alert humans, or in some cases take corrective action itself, when problems were identified
- This approach used the very system being trained to improve the efficiency of its own training
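The article doesn't describe the team's actual implementation, but the pattern above (a loop that polls training metrics, flags anomalies such as NaN losses or sudden spikes, and raises an alert) can be sketched roughly as follows. The `fetch_metrics` and `alert` callables are hypothetical stand-ins for whatever metrics store and paging system a real setup would use:

```python
import math
from typing import Callable

def check_metrics(history: list[float], spike_factor: float = 2.0) -> list[str]:
    """Flag simple anomalies in a loss curve: a NaN value, or a sudden
    spike relative to the average of the last few readings."""
    issues = []
    latest = history[-1]
    if math.isnan(latest):
        issues.append("loss is NaN")
    elif len(history) >= 4:
        # Compare the newest point against a short trailing baseline.
        recent = history[-4:-1]
        baseline = sum(recent) / len(recent)
        if latest > spike_factor * baseline:
            issues.append(f"loss spiked to {latest:.3f} (baseline {baseline:.3f})")
    return issues

def monitor(fetch_metrics: Callable[[], list[float]],
            alert: Callable[[str], None],
            steps: int) -> None:
    """Poll the (hypothetical) metrics source and alert on anomalies."""
    for _ in range(steps):
        history = fetch_metrics()
        for issue in check_metrics(history):
            alert(issue)

# Example: a simulated loss curve with a spike triggers one alert.
alerts: list[str] = []
monitor(lambda: [1.0, 0.9, 0.8, 5.0], alerts.append, steps=1)
```

In the setup described in the article, the "alert" path would page a human, while an agentic monitor like Codex could go further and inspect logs or pause the run; the loop structure stays the same either way.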
Results
- Codex successfully caught "some pretty interesting configuration mistakes" during training
- The team began seeing "glimpses of the future" where AI systems could be "on call" for their own training
- This approach reduced the need for constant human monitoring of training runs
- Engineers could focus on higher-value work rather than watching graphs
- Training efficiency improved as issues could be caught and addressed more quickly
Key Lessons
- AI systems can monitor themselves: The most advanced way to monitor complex AI systems may be to use AI itself
- Recursive improvement potential: AI systems that can improve their own training process create a positive feedback loop
- Resource optimization: Automating the monitoring of expensive training runs provides significant cost savings
- Human time as the bottleneck: As Alexander noted, "the current underappreciated limiting factor is literally human typing speed or human multitasking speed"
- Practical path to autonomy: Starting with specific, well-defined monitoring tasks provides a practical pathway to more autonomous AI systems
- Validation remains critical: Even as AI takes on more self-monitoring, validating that work remains a key open challenge