Cursor Launches Composer: First In-House Model But No Public Benchmarks

Published October 30, 2025 by Vibe Data
Tags: Cursor, Composer, Benchmarks, AI Coding

Key Facts

Yesterday, Cursor announced Composer, their first proprietary coding model, marking a major strategic shift away from dependency on OpenAI and Anthropic. With impressive claims of 4x speed improvements and "frontier-level coding intelligence," this should be major news. But there's a problem: they haven't published a single public benchmark score.

What Cursor Claims About Composer

According to Cursor's blog post, Composer delivers:

- A 4x improvement in generation speed
- "Frontier-level coding intelligence"

These are bold claims for a brand-new model from a company best known for their IDE, not model development.

The Benchmark Transparency Problem

⚠️ Zero Public Benchmark Scores

Cursor has never published SWE-bench scores for Composer (or any previous model). Instead, they rely entirely on a proprietary internal evaluation called "Cursor Bench," built from real agent requests made by their own engineers.

This makes independent verification impossible.

Every other major coding model publishes performance on standard benchmarks:

| Model | SWE-bench Verified Score | Public Data |
|---|---|---|
| Claude 4.5 Sonnet | 70.6% | ✅ Yes |
| GPT-5 | 65.0% | ✅ Yes |
| Gemini 2.5 Pro | 68.5% | ✅ Yes |
| Cursor Composer | Not published | ❌ No |

SWE-bench scores as of October 2025 - data collected from swebench.com

Why This Matters

Benchmark transparency isn't just about bragging rights - it's about trust and verification:

1. Independent Verification

Public benchmarks like SWE-bench allow researchers, developers, and competitors to verify performance claims. Internal benchmarks can be (intentionally or unintentionally) biased toward the model's strengths.
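
For a concrete sense of what independent verification looks like, here is a minimal sketch of pulling the public SWE-bench Verified task set so anyone can inspect exactly what models are scored on. It assumes the Hugging Face `datasets` library; `princeton-nlp/SWE-bench_Verified` is the publicly listed dataset name on the Hugging Face Hub.

```python
# Minimal sketch: download the public SWE-bench Verified task set and inspect it.
# Assumes the Hugging Face `datasets` library is installed (pip install datasets).
from datasets import load_dataset

# The verified split of human-reviewed GitHub issues that published scores refer to.
tasks = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

print(len(tasks))                            # number of tasks in the benchmark
print(tasks[0]["repo"])                      # which repository the first issue comes from
print(tasks[0]["problem_statement"][:200])   # the issue text a model must resolve
```

Because the tasks and evaluation harness are public, any third party can reproduce a reported score; an internal suite like Cursor Bench offers no equivalent check.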

2. Apples-to-Apples Comparison

Without standard benchmarks, developers can't compare Composer to Claude, GPT, or Gemini models. "4x faster" is meaningless without context about what it's being compared against.
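
To make that concrete, here is a toy calculation with entirely hypothetical throughput numbers (none of these figures come from Cursor or any published measurement): a 4x speed-up over an unstated baseline can still leave a model slower than a competitor in absolute terms.

```python
# Hypothetical, illustrative numbers only; not measurements of any real model.
internal_baseline_tps = 50                # assumed tokens/sec of an unnamed internal baseline
claimed_tps = internal_baseline_tps * 4   # what a "4x faster" claim would mean against that baseline
competitor_tps = 250                      # assumed tokens/sec of some third-party model

print(claimed_tps / internal_baseline_tps)  # 4.0 -> the headline multiplier
print(claimed_tps > competitor_tps)         # False -> still slower in absolute terms
```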

3. Accountability

When companies only publish internal benchmarks, there's no accountability if real-world performance doesn't match claims. Public benchmarks create accountability through community scrutiny.

Cursor's Adoption Data

Despite the benchmark transparency concerns, Cursor has significant adoption in the AI coding space:

GitHub Repository Stats

- 31,526 stars on GitHub
- Ranking: 3rd among AI coding tools
- Data as of October 30, 2025
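
Numbers like these are easy for anyone to re-check. Below is a minimal sketch using the GitHub REST API; it assumes the `requests` library and that `getcursor/cursor` is the repository the star count refers to.

```python
# Minimal sketch: fetch the current star count for a repository via the GitHub REST API.
# Assumes `requests` is installed and that "getcursor/cursor" is the relevant repo.
import requests

resp = requests.get(
    "https://api.github.com/repos/getcursor/cursor",
    headers={"Accept": "application/vnd.github+json"},
    timeout=10,
)
resp.raise_for_status()
repo = resp.json()

print(repo["stargazers_count"])   # current star count
print(repo["open_issues_count"])  # another rough adoption/health signal
```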

Comparison to Competitors

Cursor's GitHub stars reflect real developer adoption, but popularity doesn't substitute for transparent performance data.

Strategic Implications

Composer represents a major strategic bet for Cursor:

Independence from Big AI Labs

By building their own model, Cursor reduces dependency on OpenAI and Anthropic. This gives them more control over features, pricing, and roadmap.

Cost Optimization

Running their own model could significantly reduce API costs at Cursor's scale, improving unit economics.

Differentiation

A proprietary model allows Cursor to offer capabilities that competitors using third-party APIs can't match.

But all of these benefits depend on Composer actually delivering on its performance claims - which we can't verify without public benchmarks.

What We're Tracking

At Vibe Data, we're monitoring several metrics around Cursor and Composer:

- GitHub star growth and Cursor's ranking among AI coding tools
- Whether Cursor publishes public benchmark scores (e.g., SWE-bench Verified) for Composer
- Broader adoption and growth data across the AI coding tool landscape

The Bottom Line

"Cursor's Composer launch is strategically significant - major AI coding tool going independent from OpenAI/Anthropic. But without public benchmark scores, developers have to take performance claims on faith. In an industry where verification is standard, Cursor's opacity stands out."

We'll continue tracking Cursor's adoption metrics and update this post if they publish public benchmark data.

Track AI Development Tool Metrics

Get access to adoption data, benchmark scores, and growth metrics for 50+ AI coding tools
