Claude Opus 4 Unlocked Writing Quality Judgment

by Dan Shipper on July 17, 2025

Situation

  • Every, a 15-person AI-first company, was developing Spiral, a content automation product that turns source documents into written pieces such as tweets
  • The team had spent three months trying to build a complex system that could judge the quality of AI-generated writing
  • Previous AI models (including earlier Claude versions) consistently gave inflated ratings to mediocre content (a B+ on the first pass, an A- after revisions)
  • This limitation blocked the product's development, as effective self-evaluation was critical to the workflow

Actions

  • The team initially attempted to solve the problem through prompt engineering, creating templates and other workarounds
  • They invested significant engineering resources (three months) trying to build a custom evaluation system
  • When Anthropic released Claude Opus 4, they immediately tested its ability to judge writing quality
  • They integrated Claude Opus 4 into Spiral's workflow (sketched in code after this list), allowing it to:
    • Create a to-do list for itself
    • Generate multiple content options (e.g., three tweet drafts)
    • Self-evaluate the quality of each draft
    • Improve drafts before presenting them to users
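The workflow above amounts to a generate-then-judge loop: draft several options, have the model grade its own work, and polish the best draft before the user sees it. The sketch below shows one minimal way such a loop could look. It is not Every's actual Spiral code; the model identifier, prompts, 1-10 grading scale, and helper functions (`ask`, `draft_and_judge`) are illustrative assumptions layered on the Anthropic Python SDK.

```python
# Minimal sketch of a draft-then-judge loop (not Every's actual Spiral code).
# Assumes the Anthropic Python SDK is installed and ANTHROPIC_API_KEY is set;
# the model name, prompts, and grading rubric are placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-opus-4-20250514"  # assumed model identifier

def ask(prompt: str) -> str:
    """Send a single prompt and return the text of the reply."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def draft_and_judge(source_document: str, n_drafts: int = 3) -> str:
    """Generate several tweet drafts, have the model grade each one,
    then return an improved version of the highest-graded draft."""
    drafts = [
        ask(f"Write a tweet summarizing this document:\n\n{source_document}")
        for _ in range(n_drafts)
    ]
    graded = []
    for draft in drafts:
        # Ask for a bare number so the reply can be parsed directly;
        # a production system would validate this more defensively.
        grade = ask(
            "Grade this tweet's writing quality from 1 (weak) to 10 (excellent). "
            f"Reply with only the number.\n\nTweet:\n{draft}"
        )
        graded.append((int(grade.strip()), draft))
    best_score, best_draft = max(graded, key=lambda pair: pair[0])
    # Have the model improve its own best draft before presenting it to the user.
    return ask(f"Improve this tweet without changing its meaning:\n\n{best_draft}")
```

The load-bearing step is the self-grade: the whole loop only works if the judge can tell a strong draft from a mediocre one, which is exactly the capability earlier models lacked when they handed nearly everything a B+ or A-.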

Results

  • Claude Opus 4 demonstrated a previously unavailable "gut sense" for judging writing quality
  • The product immediately became viable, eliminating the need for the complex custom evaluation system
  • The team could shift from solving the evaluation problem to shipping the product
  • This capability opened up new use cases where language models could serve as effective judges

Key Lessons

  • Recognize when to wait for model improvements: Sometimes the most efficient solution is to wait for model capabilities to catch up rather than building complex workarounds.

  • Identify critical capabilities for your use case: Understanding exactly what capability was missing (genuine quality assessment) helped the team recognize when a new model solved their problem.

  • Design workflows that leverage self-improvement: Building systems where AI can evaluate and improve its own work creates more autonomous and effective products.

  • Look for "gut sense" capabilities in models: The ability to make subjective quality judgments represents a significant advancement that enables new applications.

  • Be ready to pivot quickly when new capabilities emerge: Teams that closely monitor model advancements can rapidly integrate new capabilities that solve previously intractable problems.