
#ai-tooling

#engineering-effectiveness

#governance

Claude Code vs Cursor vs Copilot: a field report from a team running all three

Length: 9 min

Published: June 5, 2026


Same model, different rooms

In late May 2026, GitHub shipped Claude Opus 4.8 inside Copilot, so the same backbone now powers both Anthropic's native CLI and Microsoft's IDE wrapper. A week earlier, GitHub had published a four-phase AI-adoption cohort framework. Together, the two shift the procurement question from "which AI tool?" to "which adoption phase is this team in?"

For us, that timing matters. DX Heroes ships production code with all three tools, across very different stacks and clients. We watched the same model behave like a different product depending on the wrapper. We watched mid-sized teams burn six figures on a Copilot rollout that did not match their inner loop. We watched a senior engineer hand-roll a workflow that stitched Claude Design, Cursor, and a real type system into something faster than any single-vendor demo.

This is what we learned. Not a feature matrix; those age in weeks. A failure-mode map: where each tool wins, where it breaks, and which team it actually fits today.

Three tools, three philosophies

Claude Code is terminal-native and agent-first. Anthropic-only model lineup. Designed for long-running tasks, large refactors, custom skills, and multi-step agent flows that humans check at boundaries. It treats your shell, your repo, and your filesystem as first-class context.

Cursor is IDE-native and multi-model. Built around inner-loop speed: the editor controls context, the diff is the unit of work, and the model is whoever you pick today. Best for prototype-to-production loops and the design-to-code path, where structured context is already in the editor.

GitHub Copilot is IDE-native, multi-model (now including Claude Opus 4.8), and tightly bound to the GitHub estate: issues, pull requests, Actions, and Copilot Memory. It wins on integration depth with the platform most enterprises are already on.

The reality on a single engineer's machine is often all three at once. As Prokop on our team put it bluntly: "I use Claude Code and Codex CLI a lot, and I typically run them inside Cursor. Cursor itself I barely use anymore. For me it is the editor surface." The wrappers are not in competition for the same minute of work; each fills a different role in a workflow.

How they fail differently

Claude Code: the model loops when the task is loose

Claude Code's failure mode is rarely subtle. With autonomous tool calls, agent mode, and a long context window, it will happily spend hours on an under-scoped prompt and return very little.

The cleanest example we have right now is from Jakub on our team, on Claude Code 1.9659.3 with Opus 4.8:

"Claude 4.8 cycling in Claude Code 1.9659.3: at higher reasoning-effort settings the model loops inside reasoning and fails to call the connected MCPs. It happens even on relatively simple tasks that Sonnet completes cleanly."

A specific version, a specific failure mode, a known-good baseline. That is the failure profile to plan around, not "Claude Code is bad". It looks like this: bigger model, higher reasoning effort, more tool surface, less discipline on inputs → expensive nothing.

The fix is task shape: smaller plans, explicit boundaries, and human checkpoints at every meaningful state change. Several engineers on the team who use Claude Code only for narrow, well-defined work never see the looping at all. Miloš put it plainly: he sticks to smaller tasks, where looping cannot get a foothold. The failure mode is real, but it is a task-scope problem, not a tool problem.
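A minimal sketch of that discipline, assuming a hypothetical `agent_turn` callable that wraps one turn of whatever agent you run and reports how many tool calls it made. This is not a Claude Code API and not how our engineers wire it up; it only shows the shape: a turn budget per small step, "reasoning without calling tools" treated as a stall, and a human checkpoint between steps.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TurnResult:
    tool_calls: int  # tool/MCP calls the agent made during this turn
    done: bool       # whether the agent reported the step as finished


@dataclass
class BoundedStep:
    description: str
    max_turns: int = 5       # hypothetical budget per step; tune to the task
    max_idle_turns: int = 2  # consecutive turns with zero tool calls before we stop


def run_step(step: BoundedStep, agent_turn: Callable[[str], TurnResult]) -> str:
    """Run one small, explicitly scoped step and return its status."""
    idle = 0
    for _ in range(step.max_turns):
        result = agent_turn(step.description)
        if result.done:
            return "done"
        # Reasoning without acting is the looping pattern: count it, reset on any tool call.
        idle = idle + 1 if result.tool_calls == 0 else 0
        if idle >= step.max_idle_turns:
            return "stalled"
    return "over_budget"


def run_plan(
    steps: list[BoundedStep],
    agent_turn: Callable[[str], TurnResult],
    checkpoint: Callable[[BoundedStep, str], bool],
) -> list[tuple[str, str]]:
    """Run steps in order; a human checkpoint decides whether to continue after each one."""
    log: list[tuple[str, str]] = []
    for step in steps:
        status = run_step(step, agent_turn)
        log.append((step.description, status))
        if not checkpoint(step, status):
            break
    return log
```

With an agent that never calls a tool, `run_step` gives up with "stalled" after two turns instead of burning the whole budget, and the checkpoint hands control back to a person before the next step starts.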

A secondary friction, worth naming: portable skills. Prokop again, on the gap between Claude Code's CLAUDE.md and the cross-tool AGENTS.md convention other tools are coalescing on:

"I still do not understand why they only support CLAUDE.md and not AGENTS.md."

It is a small thing, until you try to use the same agent definition across Claude Code, Cursor, and Copilot. Then it is a tax.
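One way to shrink that tax, sketched under an assumption you should check for your version: that Claude Code's CLAUDE.md supports `@path` imports, so a one-line `@AGENTS.md` lets AGENTS.md stay the single source of truth. A symlink is the other common workaround. The helper below is hypothetical, not part of any tool.

```python
from pathlib import Path

# Assumed Claude Code memory-file import syntax; verify against the docs for your version.
IMPORT_LINE = "@AGENTS.md"


def ensure_claude_md(repo: Path) -> bool:
    """Make CLAUDE.md defer to AGENTS.md. Returns True if CLAUDE.md was changed."""
    agents = repo / "AGENTS.md"
    claude = repo / "CLAUDE.md"
    if not agents.exists():
        raise FileNotFoundError(f"{agents} is the source of truth and must exist")
    existing = claude.read_text(encoding="utf-8") if claude.exists() else ""
    if IMPORT_LINE in existing.splitlines():
        return False  # already imports AGENTS.md; leave Claude-specific notes alone
    # Prepend the import and keep whatever Claude-only instructions were already there.
    body = IMPORT_LINE + "\n" + ("\n" + existing if existing else "")
    claude.write_text(body, encoding="utf-8")
    return True
```

Running it in CI or a pre-commit hook keeps the two files from drifting; Cursor and Copilot keep reading AGENTS.md directly.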

Cursor: the IDE controls the context, and the magic stops when you leave it

Cursor works because the IDE controls context. The diff is the unit of work, the editor knows your open files, the LSP feeds types into prompts, and the model never has to guess what "the codebase" means. That is exactly where Cursor breaks: when work leaves the IDE.

Terminal-heavy infra work, CLI-driven agents, CI pipelines, long-running multi-file orchestration: Cursor is not in the room. The right move there is to switch wrappers, not to fight Cursor. Most of our senior people do exactly that: Cursor for the inner loop, Claude Code for the agent loop.

Cursor's strongest current play, in our experience, is the design-to-code path. Matyáš walked us through his workflow:

"Claude Design is great mostly because there is real UX expertise prompted into the background. I can export to HTML/JSX and paste it into the IDE, where I have my real tech stack set up. Then I tell the agent to translate the design into that stack. That is a much richer context than describing it in words."

That is Cursor at its best: structured upstream context (a designed UI), a real codebase the editor already understands, and a model translating between the two. Matyáš ships from wireframe to clickable prototype to first working version faster than any monolithic generator we have tried.

The lesson generalises. Cursor is not a magic IDE; it is a context-engineering surface. The more pre-structured the input, the better the output. Getting your tech stack and design intent into the editor is the workflow; asking the model to invent them is not.
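To make "pre-structured input" concrete, here is a hypothetical prompt builder: stack facts first, then the exported design verbatim, so the model never has to guess either. The field names, section headings, and example values are illustrative assumptions, not Matyáš's actual prompt or any Cursor feature.

```python
from dataclasses import dataclass, field


@dataclass
class StackProfile:
    framework: str
    styling: str
    components: str
    conventions: list[str] = field(default_factory=list)


def build_translation_prompt(stack: StackProfile, design_jsx: str) -> str:
    """Assemble a structured prompt: stack facts first, then the exported design verbatim."""
    rules = "\n".join(f"- {rule}" for rule in stack.conventions) or "- (none recorded)"
    return (
        "Translate the exported design into our stack. "
        "Keep layout, copy and spacing; swap raw markup for our components.\n\n"
        "## Stack\n"
        f"Framework: {stack.framework}\n"
        f"Styling: {stack.styling}\n"
        f"Components: {stack.components}\n\n"
        "## Conventions\n"
        f"{rules}\n\n"
        "## Exported design (JSX)\n"
        "<design>\n"
        f"{design_jsx.strip()}\n"
        "</design>\n"
    )
```

In practice the stack profile lives in the repo and changes rarely; only the design export changes per task.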

GitHub Copilot: the pricing story is the procurement story

Copilot has closed an enormous feature gap over the last twelve months. Agent mode, Memory, repository-scoped persistence, MCP tooling, AI-adoption cohorts in the API. The wrapper is a real product now.

The honest enterprise failure mode is commercial, not technical. The discounts that made Copilot attractive at 50+ seats have been eroding for premium models. As of June 2026, GitHub has changed how Opus-class models consume usage units inside Copilot: the multipliers for the heaviest models jumped meaningfully, and early adopters were hit hardest. As Prokop framed it:

"GitHub Copilot was pricing those models very cheaply, and now they are raising prices because they have already locked enterprises into their ecosystem."

We are not chasing exact multipliers here; GitHub iterates pricing fast. The pattern matters: per-seat headline pricing is no longer the full story. Memory, agent features, and Opus-class models push usage into different units, and the procurement math needs redoing whenever any of those ship.
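The redo is simple arithmetic once usage is expressed in units. A minimal sketch, assuming a plan with a flat seat price, a pooled allowance of premium units per seat, and a per-unit overage price; every number below is made up for illustration and is not GitHub's actual pricing or multipliers.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    seat_price: float              # monthly price per seat
    included_units_per_seat: int   # premium units bundled with each seat, pooled across the org
    overage_price_per_unit: float  # price of each unit beyond the pooled allowance


def monthly_cost(
    plan: Plan,
    seats: int,
    requests_by_model: dict[str, int],
    multipliers: dict[str, float],
) -> float:
    """Seat cost plus overage, where each request consumes `multiplier` units for its model."""
    units = sum(count * multipliers[model] for model, count in requests_by_model.items())
    allowance = seats * plan.included_units_per_seat
    overage = max(0.0, units - allowance)
    return seats * plan.seat_price + overage * plan.overage_price_per_unit


# Illustrative inputs only: same seats, same request mix, one multiplier changes.
plan = Plan(seat_price=20.0, included_units_per_seat=300, overage_price_per_unit=0.04)
usage = {"standard": 10_000, "opus-class": 2_000}
before = monthly_cost(plan, 50, usage, {"standard": 1, "opus-class": 3})
after = monthly_cost(plan, 50, usage, {"standard": 1, "opus-class": 10})
```

With those made-up inputs the monthly bill goes from 1,040 to 1,600 while the headline per-seat price never moves, which is why seat count alone is no longer enough to budget on.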

Where Copilot genuinely wins is governance trust inside the GitHub estate. Prokop, who has sat in those rooms:

"From an enterprise standpoint, GitHub Copilot was the obvious choice because Microsoft already has the enterprise relationship. Buyers automatically trust governance from Microsoft more than from another proprietary tool from Anthropic or OpenAI."

That is not a technical comparison. It is a procurement reality. For a 200-developer organisation already standardised on GitHub Enterprise, "another AI vendor" is a different conversation from "expand Copilot".

Same model, different rooms: what actually changes

With Claude Opus 4.8 now inside both Claude Code and Copilot, you can directly compare how the wrapper shapes the same backbone. Two observations from our team have been hard to unsee.

First, memory is a wrapper-level feature, not a model-level one. Jakub:

"Copilot is the only runtime where memory actually works out of the box. It works so well that I now improve my Copilot agents, defined via .agent.md, based on memory analysis: I look at what problems it stored in memory and try to address them directly in the agent definition."

The same model, on the other side of a different wrapper, has no comparable loop. That is not a model property; it is a product property.
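Jakub's loop (read memory, fix the agent definition) can be partly mechanised. We are not aware of a documented export format for Copilot Memory, so this sketch assumes you have copied entries out as plain "topic: detail" strings; that format and both helper names are hypothetical. It counts recurring topics and turns them into review items for the `.agent.md` file.

```python
from collections import Counter


def recurring_topics(entries: list[str], min_count: int = 2) -> list[tuple[str, int]]:
    """Count 'topic: detail' memory notes by topic and keep the ones that keep coming back."""
    topics = Counter(
        entry.split(":", 1)[0].strip().lower()
        for entry in entries
        if ":" in entry  # skip notes that do not follow the assumed format
    )
    return [(topic, n) for topic, n in topics.most_common() if n >= min_count]


def agent_md_todos(entries: list[str], min_count: int = 2) -> list[str]:
    """Turn recurring topics into checklist items to address in the agent definition."""
    return [
        f"- [ ] Address '{topic}' in the agent definition (stored {n} times)"
        for topic, n in recurring_topics(entries, min_count)
    ]
```

A topic that memory keeps re-learning is usually an instruction missing from the agent definition; fixing it there means the agent stops rediscovering it.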

Second, context surface area shapes which work is even tractable. Claude Code gives you a full skill and agent surface; you can hand it your shell, your filesystem, custom MCPs, and reusable skills. Copilot binds you tightly to a repository scope, Copilot Memory, and the IDE's view of the world. For a refactor across five repositories, Claude Code is the wrapper that matches the work. For a focused change inside a repository where the GitHub conversation history is part of the context, Copilot is.

Pick the wrapper that matches how your team actually works, not the model that sounds best in the release notes.

Pick by adoption phase, not by tool

GitHub's four-phase cohort framing from late May tracks with what we see in client engagements:

Phase 1, code completion. Copilot is fine, but there is no real difference between wrappers at this layer; anything works.
Phase 2, assistive agents. Cursor wins the inner loop, especially with structured context. Claude Code wins multi-file refactors and shell-bound work.
Phase 3, autonomous agents. Claude Code leads today; Copilot Agent mode is closing fast and has memory on its side.
Phase 4, orchestration and multi-agent. All three are immature. This is where governance outranks tool choice, as discussed below.
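The phase map above fits in a few lines if you want it next to a team-assessment script. This is a sketch of our own June 2026 reading, not GitHub's framework encoded; the `work` labels are our hypothetical shorthand, and the answers will age as the wrappers change.

```python
from enum import IntEnum


class Phase(IntEnum):
    COMPLETION = 1     # code completion
    ASSISTIVE = 2      # assistive agents
    AUTONOMOUS = 3     # autonomous agents
    ORCHESTRATION = 4  # orchestration and multi-agent


def recommend(phase: Phase, work: str = "inner_loop") -> str:
    """Map a phase (and, in Phase 2, the kind of work) to the wrapper we would reach for."""
    if phase == Phase.COMPLETION:
        return "any"  # no real difference between wrappers at this layer
    if phase == Phase.ASSISTIVE:
        # 'inner_loop' is editor-bound work; 'refactor' and 'shell' are multi-file or terminal work.
        return "Cursor" if work == "inner_loop" else "Claude Code"
    if phase == Phase.AUTONOMOUS:
        return "Claude Code"  # Copilot agent mode is closing fast, so revisit this often
    return "governance first"  # all three are immature; sort out the MCP layer before picking
```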

Mapping a team to a phase is more useful than mapping it to a tool. A team stuck at Phase 1 will not get Phase 3 value from Claude Code, no matter how senior the seats. A team already running Phase 3 workflows informally on Cursor will pay a tax if it moves them sideways to Copilot without a clear governance reason.

Governance is the silent decider

Everything we just said about wrappers goes blurry the moment you ask the question every CIO eventually asks: are people using these tools safely?

Jakub spent the last few months trying to answer it from inside both Copilot and Claude Code, and the verdict was uncomfortable:

"Neither platform exposes enough data around MCP and tool usage to decide whether users are using AI safely and efficiently."

Prokop, looking at the same problem from a product-design angle, arrived independently at the same diagnosis:

"The problem is keeping the same set of MCP servers and skills across every tool. Synchronisation across tools is harder than it looks, and it has to be solved. Our MCP Gateway consolidates every MCP server and tool into one profile that we can manage through a single interface."

Two senior engineers, two different vantage points, the same conclusion: the wrapper layer is not where governance gets solved. The protocol layer is. Memory, persistence, audit, tool inventory, and identity all live below the IDE, and any serious multi-tool team eventually needs an answer at that layer. We have written more about this in Building MCP governance for enterprise and the MCP governance landscape. The short version: picking Claude Code, Cursor, or Copilot is not a governance decision. It is a workflow decision. The governance decision sits underneath all three.
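To see why synchronisation is harder than it looks, here is the naive alternative to a gateway: one shared profile rendered into each wrapper's project config. This is not how MCP Gateway works; the server names and commands are hypothetical, and the file locations and top-level keys reflect our understanding of each tool's project-level MCP config at the time of writing, so check them against current documentation.

```python
import json
from pathlib import Path

# One shared profile: server name -> stdio launch spec. Names and commands are hypothetical.
PROFILE: dict[str, dict] = {
    "internal-docs": {"command": "node", "args": ["tools/mcp/docs-server.js"]},
    "tickets": {"command": "python", "args": ["-m", "tickets_mcp"]},
}

# Project-level config file per wrapper and the top-level key each expects
# (our understanding at time of writing; verify per tool).
TARGETS = {
    ".mcp.json": "mcpServers",         # Claude Code
    ".cursor/mcp.json": "mcpServers",  # Cursor
    ".vscode/mcp.json": "servers",     # VS Code with Copilot
}


def render_profile(profile: dict[str, dict], repo: Path) -> list[Path]:
    """Write the same server set into every wrapper's config, overwriting what was there."""
    written = []
    for rel_path, key in TARGETS.items():
        path = repo / rel_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps({key: profile}, indent=2) + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Three files per repository that must stay in lockstep, with no audit trail of what any tool actually called, is the problem in miniature. Putting a gateway in front shrinks each wrapper's config to one entry and moves inventory, identity, and logging to a single place.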

What we would actually do on Monday

If you are a 5–10-person product team shipping React/TypeScript and you want to ship faster this quarter: Cursor for the inner loop, Claude Code for refactors and agent work, skip Copilot. The integration overhead is not worth it at that size.

If you are a 50+ engineer organisation already inside the GitHub estate: Copilot Business as the floor, plus targeted Claude Code seats for the senior 10% doing refactor and agent work. As Jakub put it: "For truly large enterprise teams, GitHub Copilot is furthest along, mostly thanks to existing GitHub infrastructure." The reverse rollout (Cursor first, Copilot second) is almost always slower and more expensive.

If you are a regulated or governance-first organisation: do the MCP governance work before the tool rollout. Standardise on Copilot as the wrapper, defer Claude Code until your skills and agent-identity story is internally clear, and put MCP Gateway (or equivalent) between every wrapper and your real systems. The cost of ungoverned tool sprawl in regulated environments dwarfs any productivity delta between wrappers.

And whatever you pick, treat tool selection as a workflow decision, not a model purchase. Prokop again:

"It is more about what a person learns. It does not matter which IDE or CLI they use; what matters is what they learn to use and what works for them."

That sounds simple, and it is. It is also the line the loudest comparison posts in this space refuse to write.

How DX Heroes works with this stack

We ship production code with Claude Code, Cursor, and Copilot. We run workshops that meet teams at their current adoption phase. We build MCP governance for enterprises that lets you say yes to all three wrappers without losing audit trails. And we write field reports like this one, because the comparison content out there is mostly missing the part that matters: which failure mode you can live with.

If your team is mid-rollout and the failure modes here look familiar, talk to us. We have been through the same questions with engineering leaders from 10-person product teams up to 500-developer enterprises, and the answer almost always starts with the workflow, not the wrapper.
