Claude Opus 4 Unlocked Writing Quality Judgment
by Dan Shipper on July 17, 2025
Situation
- Every, a 15-person AI-first company, was developing Spiral, a content automation product that generates written content (e.g., tweets) from source documents
- The team had spent three months trying to build a complex system that could judge the quality of AI-generated writing
- Previous AI models (including earlier Claude versions) consistently inflated their ratings of mediocre content, grading it B+ at first and A- after revisions
- This limitation blocked the product's development, as effective self-evaluation was critical to the workflow
Actions
- The team initially attempted to solve the problem through prompt engineering, creating templates and other workarounds
- They invested significant engineering resources (three months) trying to build a custom evaluation system
- When Anthropic released Claude Opus 4, they immediately tested its ability to judge writing quality
- They integrated Claude Opus 4 into Spiral's workflow (see the sketch after this list), allowing it to:
  - Create a to-do list for itself
  - Generate multiple content options (e.g., three tweet drafts)
  - Self-evaluate the quality of each draft
  - Improve drafts before presenting them to users
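The article does not show Spiral's implementation, but a minimal sketch of the generate / self-judge / revise portion of such a loop could look like the following. It assumes the Anthropic Python SDK; the model ID, prompts, and the draft_and_refine helper are illustrative assumptions, not Spiral's actual code, and the to-do-list step is omitted.

```python
# Hypothetical sketch of a generate -> self-judge -> revise loop.
# Assumptions: the model ID, prompts, and helper names below are
# illustrative only and are not Spiral's implementation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-opus-4-20250514"  # assumed model ID

def ask(prompt: str) -> str:
    """Send a single-turn prompt and return the text of the reply."""
    response = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text

def draft_and_refine(source_document: str, n_drafts: int = 3) -> str:
    """Generate several drafts, have the model grade each one,
    then revise the best-graded draft before returning it."""
    # 1. Generate multiple candidate drafts (e.g., tweet drafts).
    drafts = [
        ask(f"Write a tweet summarizing this document:\n\n{source_document}")
        for _ in range(n_drafts)
    ]

    # 2. Self-evaluate: ask the model for a letter grade per draft.
    grades = [
        ask("Grade this tweet's writing quality from A to F. "
            f"Reply with the letter only.\n\n{draft}")
        for draft in drafts
    ]

    # 3. Pick the best-graded draft (single letters sort A before B, etc.).
    best = drafts[grades.index(min(grades))]

    # 4. Improve the winning draft before presenting it to the user.
    return ask(f"Revise this tweet to tighten the writing, keeping its meaning:\n\n{best}")
```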
Results
- Claude Opus 4 demonstrated a previously unavailable "gut sense" for judging writing quality
- The product immediately became viable, eliminating the need for the complex custom evaluation system
- The team could shift from solving the evaluation problem to shipping the product
- This capability opened up new use cases where language models could serve as effective judges
Key Lessons
- Recognize when to wait for model improvements: Sometimes the most efficient solution is to wait for model capabilities to catch up rather than building complex workarounds.
- Identify critical capabilities for your use case: Understanding exactly what capability was missing (genuine quality assessment) helped the team recognize when a new model solved their problem.
- Design workflows that leverage self-improvement: Building systems where AI can evaluate and improve its own work creates more autonomous and effective products.
- Look for "gut sense" capabilities in models: The ability to make subjective quality judgments represents a significant advancement that enables new applications.
- Be ready to pivot quickly when new capabilities emerge: Teams that closely monitor model advancements can rapidly integrate new capabilities that solve previously intractable problems.