Cursor Launches Composer: First In-House Model But No Public Benchmarks

Published October 30, 2025 by Vibe Data
Tags: Cursor, Composer, Benchmarks, AI Coding

Key Facts

Yesterday, Cursor announced Composer, their first proprietary coding model, marking a major strategic shift away from dependency on OpenAI and Anthropic. With impressive claims of 4x speed improvements and "frontier-level coding intelligence," this should be major news. But there's a problem: they haven't published a single public benchmark score.

What Cursor Claims About Composer

According to Cursor's blog post, Composer delivers:

- A 4x improvement in generation speed
- "Frontier-level coding intelligence"

These are bold claims for a brand-new model from a company best known for their IDE, not model development.

The Benchmark Transparency Problem

⚠️ Zero Public Benchmark Scores

Cursor has never published SWE-bench scores for Composer (or any previous model). Instead, they rely entirely on a proprietary internal evaluation called "Cursor Bench," built from real agent requests made by their own engineers.

This makes independent verification impossible.

Every other major coding model publishes performance on standard benchmarks:

| Model | SWE-bench Verified Score | Public Data |
|---|---|---|
| Claude 4.5 Sonnet | 70.6% | ✅ Yes |
| GPT-5 | 65.0% | ✅ Yes |
| Gemini 2.5 Pro | 68.5% | ✅ Yes |
| Cursor Composer | Not published | ❌ No |

SWE-bench scores as of October 2025 - data collected from swebench.com

Why This Matters

Benchmark transparency isn't just about bragging rights - it's about trust and verification:

1. Independent Verification

Public benchmarks like SWE-bench allow researchers, developers, and competitors to verify performance claims. Internal benchmarks can be (intentionally or unintentionally) biased toward the model's strengths.
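
For a concrete sense of what independent verification looks like, here is a minimal sketch of pulling the public SWE-bench Verified task set so anyone can inspect exactly what models are scored on. It assumes the Hugging Face `datasets` library; `princeton-nlp/SWE-bench_Verified` is the publicly listed dataset name on the Hugging Face Hub.

```python
# Minimal sketch: download the public SWE-bench Verified task set and inspect it.
# Assumes the Hugging Face `datasets` library is installed (pip install datasets).
from datasets import load_dataset

# The verified split of human-reviewed GitHub issues that published scores refer to.
tasks = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

print(len(tasks))                            # number of tasks in the benchmark
print(tasks[0]["repo"])                      # which repository the first issue comes from
print(tasks[0]["problem_statement"][:200])   # the issue text a model must resolve
```

Because the tasks and evaluation harness are public, any third party can reproduce a reported score; an internal suite like Cursor Bench offers no equivalent check.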

2. Apples-to-Apples Comparison

Without standard benchmarks, developers can't compare Composer to Claude, GPT, or Gemini models. "4x faster" is meaningless without context about what it's being compared against.
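
To make that concrete, here is a toy calculation with entirely hypothetical throughput numbers (none of these figures come from Cursor or any published measurement): a 4x speed-up over an unstated baseline can still leave a model slower than a competitor in absolute terms.

```python
# Hypothetical, illustrative numbers only; not measurements of any real model.
internal_baseline_tps = 50                # assumed tokens/sec of an unnamed internal baseline
claimed_tps = internal_baseline_tps * 4   # what a "4x faster" claim would mean against that baseline
competitor_tps = 250                      # assumed tokens/sec of some third-party model

print(claimed_tps / internal_baseline_tps)  # 4.0 -> the headline multiplier
print(claimed_tps > competitor_tps)         # False -> still slower in absolute terms
```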

3. Accountability

When companies only publish internal benchmarks, there's no accountability if real-world performance doesn't match claims. Public benchmarks create accountability through community scrutiny.

Cursor's Adoption Data

Despite the benchmark transparency concerns, Cursor has significant adoption in the AI coding space:

GitHub Repository Stats

- 31,526 stars on GitHub
- Ranking: 3rd among AI coding tools
- Data as of October 30, 2025
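
Numbers like these are easy for anyone to re-check. Below is a minimal sketch using the GitHub REST API; it assumes the `requests` library and that `getcursor/cursor` is the repository the star count refers to.

```python
# Minimal sketch: fetch the current star count for a repository via the GitHub REST API.
# Assumes `requests` is installed and that "getcursor/cursor" is the relevant repo.
import requests

resp = requests.get(
    "https://api.github.com/repos/getcursor/cursor",
    headers={"Accept": "application/vnd.github+json"},
    timeout=10,
)
resp.raise_for_status()
repo = resp.json()

print(repo["stargazers_count"])   # current star count
print(repo["open_issues_count"])  # another rough adoption/health signal
```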

Comparison to Competitors

Cursor's GitHub stars reflect real developer adoption, but popularity doesn't substitute for transparent performance data.

Strategic Implications

Composer represents a major strategic bet for Cursor:

Independence from Big AI Labs

By building their own model, Cursor reduces dependency on OpenAI and Anthropic. This gives them more control over features, pricing, and roadmap.

Cost Optimization

Running their own model could significantly reduce API costs at Cursor's scale, improving unit economics.

Differentiation

A proprietary model allows Cursor to offer capabilities that competitors using third-party APIs can't match.

But all of these benefits depend on Composer actually delivering on its performance claims - which we can't verify without public benchmarks.

What We're Tracking

At Vibe Data, we're monitoring several metrics around Cursor and Composer:

- GitHub star growth and Cursor's ranking among AI coding tools
- Whether Cursor publishes public benchmark scores (e.g., SWE-bench Verified) for Composer
- Broader adoption and growth data across the AI coding tool landscape

The Bottom Line

"Cursor's Composer launch is strategically significant - major AI coding tool going independent from OpenAI/Anthropic. But without public benchmark scores, developers have to take performance claims on faith. In an industry where verification is standard, Cursor's opacity stands out."

We'll continue tracking Cursor's adoption metrics and update this post if they publish public benchmark data.

Track AI Development Tool Metrics

Get access to adoption data, benchmark scores, and growth metrics for 50+ AI coding tools
