Codex On Call For Its Own Training

by Alexander Imbirikos on December 14, 2025

Situation

  • OpenAI's Codex team was facing a common challenge in AI model training: the need for constant human monitoring ("babysitting") of training runs
  • Training large AI models is extremely expensive and time-sensitive
  • Human engineers were required to continuously monitor various graphs and metrics during training
  • System failures or errors during training could cause significant delays and wasted resources
  • Engineers needed to be ready to pause training, fix issues, or take other actions when problems arose

Actions

  • The team began experimenting with using Codex itself to monitor its own training process
  • They set up Codex to run in a continuous loop, evaluating how training charts and metrics were changing over time
  • The system was designed to detect anomalies or issues that would typically require human intervention
  • When problems were identified, Codex was configured to alert humans or, potentially, take corrective actions itself
  • This approach used the same AI system being trained to improve the efficiency of its own training run
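The article doesn't show how this loop was built. A minimal sketch of one plausible shape, where `fetch_latest_loss` and `alert` are hypothetical callbacks and a simple z-score check stands in for whatever anomaly evaluation Codex actually performs:

```python
import statistics
import time

def loss_spiked(recent, latest, z_threshold=4.0):
    # Flag a loss reading that sits far outside the recent trend.
    if len(recent) < 20:
        return False  # not enough history to judge
    mean = statistics.fmean(recent)
    # Floor the deviation so tiny jitter on a flat curve isn't flagged.
    stdev = max(statistics.pstdev(recent), 0.05 * abs(mean))
    return (latest - mean) / stdev > z_threshold

def monitor(fetch_latest_loss, alert, poll_seconds=60, max_polls=None):
    # Continuous loop: poll a metric, check for anomalies, notify a human.
    # fetch_latest_loss and alert are hypothetical hooks into the
    # training run and the on-call channel, respectively.
    history = []
    polls = 0
    while max_polls is None or polls < max_polls:
        latest = fetch_latest_loss()
        if loss_spiked(history[-100:], latest):
            alert(f"loss spiked to {latest:.3f}; recent mean "
                  f"{statistics.fmean(history[-100:]):.3f}")
        history.append(latest)
        polls += 1
        if poll_seconds:
            time.sleep(poll_seconds)
```

A stubbed run illustrates the idea: feeding thirty flat loss values followed by a spike produces a single alert, while normal noise passes silently.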

Results

  • Codex successfully caught "some pretty interesting configuration mistakes" during training
  • The team began seeing "glimpses of the future" where AI systems could be "on call" for their own training
  • This approach reduced the need for constant human monitoring of training runs
  • Engineers could focus on higher-value work rather than watching graphs
  • Training efficiency improved as issues could be caught and addressed more quickly

Key Lessons

  • AI systems can monitor themselves: The most effective way to monitor complex AI systems may be to use AI itself
  • Recursive improvement potential: AI systems that can improve their own training process create a positive feedback loop
  • Resource optimization: Automating the monitoring of expensive training runs provides significant cost savings
  • Human time as the bottleneck: As Alexander noted, "the current underappreciated limiting factor is literally human typing speed or human multitasking speed"
  • Practical path to autonomy: Starting with specific, well-defined monitoring tasks provides a practical pathway to more autonomous AI systems
  • Validation remains critical: Even as AI takes on more self-monitoring, the validation of this work remains a key challenge to solve