Key Facts
- Launch Date: October 29, 2025
- Model Name: Composer
- Speed Claim: 4x faster than comparable models (250 tokens/second)
- Architecture: Mixture-of-experts (MoE) with reinforcement learning
- Public Benchmarks: ZERO
Yesterday, Cursor announced Composer, their first proprietary coding model, marking a major strategic shift away from dependence on OpenAI and Anthropic. With claims of a 4x speed improvement and "frontier-level coding intelligence," this should be major news. But there's a problem: Cursor hasn't published a single public benchmark score.
What Cursor Claims About Composer
According to Cursor's blog post, Composer delivers:
- 4x faster generation than comparably powerful models (250 tokens per second)
- Frontier-level coding intelligence through reinforcement learning
- Mixture-of-experts (MoE) architecture for efficient specialization
- Long-context generation capabilities
These are bold claims for a brand-new model from a company best known for their IDE, not for model development.
The Benchmark Transparency Problem
⚠️ Zero Public Benchmark Scores
Cursor has never published SWE-bench scores for Composer (or any previous model). Instead, they rely entirely on a proprietary internal evaluation called "Cursor Bench," a suite built from real agent requests submitted by their own engineers.
This makes independent verification impossible.
Every other major coding model publishes performance on standard benchmarks:
| Model | SWE-bench Verified Score | Public Data |
|---|---|---|
| Claude Sonnet 4.5 | 70.6% | ✅ Yes |
| GPT-5 | 65.0% | ✅ Yes |
| Gemini 2.5 Pro | 68.5% | ✅ Yes |
| Cursor Composer | Not published | ❌ No |
SWE-bench scores as of October 2025 - data collected from swebench.com
Why This Matters
Benchmark transparency isn't just about bragging rights - it's about trust and verification:
1. Independent Verification
Public benchmarks like SWE-bench allow researchers, developers, and competitors to verify performance claims. Internal benchmarks can be (intentionally or unintentionally) biased toward the model's strengths.
2. Apples-to-Apples Comparison
Without standard benchmarks, developers can't compare Composer to Claude, GPT, or Gemini models. "4x faster" is meaningless without knowing what it's being compared against.
3. Accountability
When companies only publish internal benchmarks, there's no accountability if real-world performance doesn't match claims. Public benchmarks create accountability through community scrutiny.
Cursor's Adoption Data
Despite the benchmark transparency concerns, Cursor has significant adoption in the AI coding space:
GitHub Repository Stats
- 31,526 stars on GitHub
- Ranking: 3rd among AI coding tools
- Data as of: October 30, 2025
Comparison to Competitors
- Cline: 51,744 stars
- Aider: 38,093 stars
- Cursor: 31,526 stars
- Continue.dev: 29,518 stars
- GitHub Copilot Docs: 23,266 stars
Cursor's GitHub stars reflect real developer adoption, but popularity doesn't substitute for transparent performance data.
Strategic Implications
Composer represents a major strategic bet for Cursor:
Independence from Big AI Labs
By building their own model, Cursor reduces dependency on OpenAI and Anthropic. This gives them more control over features, pricing, and roadmap.
Cost Optimization
Running their own model could significantly reduce API costs at Cursor's scale, improving unit economics.
Differentiation
A proprietary model allows Cursor to offer capabilities that competitors using third-party APIs can't match.
But all of these benefits depend on Composer actually delivering on its performance claims - which we can't verify without public benchmarks.
What We're Tracking
At Vibe Data, we're monitoring several metrics around Cursor and Composer:
- GitHub Stars: Tracking cursor/cursor repository growth daily (a minimal collection sketch follows this list)
- Twitter Mentions: Sentiment and discussion volume (rotating coverage)
- Research Papers: Watching for third-party evaluations of Composer
- Benchmark Publications: Monitoring if Cursor eventually publishes public scores
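For readers who want to collect this kind of star data themselves, here is a minimal sketch against GitHub's public REST API (the `GET /repos/{owner}/{repo}` endpoint, whose response includes a `stargazers_count` field). It's an illustration, not Vibe Data's actual pipeline: the repository path is taken from the list above (verify it before use) and the CSV output format is an assumption.

```python
"""Minimal daily GitHub star snapshot using the public REST API.

Assumptions: the repository path and CSV output below are illustrative,
not Vibe Data's actual tracking pipeline.
"""
import csv
import datetime as dt
import json
import urllib.request

REPO = "cursor/cursor"  # repository path as cited in this post; adjust if needed


def fetch_star_count(repo: str) -> int:
    """Return the current stargazer count for a public GitHub repository."""
    req = urllib.request.Request(
        f"https://api.github.com/repos/{repo}",
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["stargazers_count"]


def append_daily_snapshot(repo: str, path: str = "stars.csv") -> None:
    """Append today's date, repo, and star count as one CSV row per day."""
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(
            [dt.date.today().isoformat(), repo, fetch_star_count(repo)]
        )


if __name__ == "__main__":
    append_daily_snapshot(REPO)
```

Scheduled once a day (via cron or a CI workflow), this builds the kind of time series behind the adoption numbers quoted above; unauthenticated requests are rate-limited to 60 per hour, which is more than enough for a daily snapshot.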
The Bottom Line
"Cursor's Composer launch is strategically significant - major AI coding tool going independent from OpenAI/Anthropic. But without public benchmark scores, developers have to take performance claims on faith. In an industry where verification is standard, Cursor's opacity stands out."
We'll continue tracking Cursor's adoption metrics and update this post if they publish public benchmark data.
Track AI Development Tool Metrics
Get access to adoption data, benchmark scores, and growth metrics for 50+ AI coding tools
View Live Dashboard