Building MCP Governance for Enterprise: What We're Learning in the Field

Length: 9 min

Published: May 4, 2026

A few weeks ago, we mapped the MCP governance landscape: what GitHub, Microsoft, AWS, GitLab, Atlassian, and the protocol itself give you, and where the gaps sit. Since then, we've been on the other side of that map. We've talked with security leaders in large organizations, run governance proof-of-concepts in regulated industries, and shipped our own Local MCP Gateway into those environments.

For teams that need the supported product layer around that control plane, we package the same pattern as MCP Gateway Enterprise.

This article is the field-notes companion: less "who ships what," more "what enterprises actually ask for when they sit down to govern this."

The pattern that keeps repeating

Five clients. Ten MCP servers. Zero shared audit trail. That's the shape of the conversation by the time governance becomes a board-level concern. The protocol itself, Model Context Protocol, was never the bottleneck. The bottleneck is that every AI client wires into every tool independently. Security teams cannot answer two questions they get asked weekly: "Who called what, with what arguments, and why?" and "Which of these tools is currently allowed to write?"

What enterprises ask us for is not "give me MCP." They already have MCP. They ask for a control plane in front of it.

The Shadow AI gravity well

Before talking about gateways, it helps to name the force that's actually pulling governance work forward. It isn't AI strategy. It's procurement lag.

"Shadow AI is the biggest piece of this. Corporate procurement is slow — six to twelve months to adopt and buy a tool, especially in volume. While that's running, employees are already using the free public models for work. That's why corporate data ends up on models that organisations like OpenAI or Anthropic train on. Security teams are dealing with this, but it's basically the frog at the spring — they're trying to slow it down, and unfortunately, or maybe fortunately, that's their job."

— Prokop Simek, Co-founder at DX Heroes

The decision tree most security teams work through looks like this. Allow employees to keep pasting into ChatGPT? No. Wait six months for a procurement-approved AI suite? No. Buy a sanctioned AI client and move on? Closer, but the moment that client connects to a tool, such as a database, CRM, or knowledge base, you're back in the same audit-blind situation. The tool boundary is where governance actually has to happen.

That's the trigger we keep seeing. The first MCP gateway conversation almost never starts as "we want to use MCP." It starts as "we have AI clients connecting to internal systems, and we don't know what's flowing across that boundary."

Why Copilot is often the path of least resistance, and why that isn't enough

For most large organisations, the fastest sanctioned AI in the building is GitHub Copilot. The reasoning is procurement-driven: they already have Microsoft 365, they already have Azure, the contract motion is short, and Copilot lands inside an environment security has already accepted.

"AI/UI teams typically have GitHub Copilot, because that's the shortest path to buy something already inside Microsoft. Compared to GitHub Enterprise, Azure, or AWS Bedrock, those are established platforms — they provide AI models through proxies, securely, in their own way."

— Prokop Simek

That's a real win, and we're not arguing against it. The catch is governance scope. GitHub Enterprise's MCP controls govern what Copilot does. They do not govern Cursor, Windsurf, JetBrains AI Assistant, an internal LangGraph agent, or a custom Claude integration. Most of our enterprise clients have at least three of those in flight by the time we meet them. AWS Bedrock AgentCore covers the AWS ecosystem, Azure's controls cover Azure, and each platform's MCP story stops at its own perimeter.

A control plane that only governs one AI client is not really a control plane. It's a feature of that client. The governance question is what sits across all of them.

What security teams actually ask for

After enough of these conversations, the ask narrows to four things. None of them are exotic. All of them are uncomfortable to retrofit if you didn't design for them.

One endpoint per use case. Not one credential per developer machine. Not one server registered per AI client. One MCP endpoint maps to a workflow, such as coding, research, or customer support, with the right tools attached. Adding a new tool to that workflow is a config change, not a fleet update.

Per-tool audit, not per-server audit. Most existing platforms log at the server level. That answers "this server was called." It does not answer "who called delete_repository in the last 24 hours, with what argument, and what came back." The atomic unit of audit has to be the tool call, not the server connection.
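To make the difference concrete, here is a minimal sketch of the per-tool question as a query over an in-memory audit log. The `ToolCall` record shape, the `calls_to_tool` helper, and the sample callers are all hypothetical illustrations, not the gateway's actual schema or API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ToolCall:
    """One audited tool invocation (hypothetical record shape)."""
    caller: str
    tool: str
    arguments: dict
    result: str
    at: datetime


def calls_to_tool(log, tool, window, now):
    """Return calls to `tool` made during the `window` ending at `now`, oldest first."""
    cutoff = now - window
    matches = [c for c in log if c.tool == tool and cutoff <= c.at <= now]
    return sorted(matches, key=lambda c: c.at)


# Illustrative data only: three calls, one of them outside the 24-hour window.
now = datetime(2026, 5, 4, 12, 0, tzinfo=timezone.utc)
log = [
    ToolCall("alice", "delete_repository", {"repo": "sandbox"}, "ok", now - timedelta(hours=2)),
    ToolCall("bob", "list_issues", {"repo": "sandbox"}, "ok", now - timedelta(hours=1)),
    ToolCall("carol", "delete_repository", {"repo": "legacy"}, "denied", now - timedelta(days=3)),
]

# "Who called delete_repository in the last 24 hours, with what argument, and what came back?"
recent = calls_to_tool(log, "delete_repository", timedelta(hours=24), now)
```

A server-level log could only say that the server hosting `delete_repository` was contacted three times. The caller, arguments, and result on each record are what make the answer usable in a review.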

RBAC via profiles, not policy DSL. Security teams will write Cedar policies if they have to, but they don't want to. What they actually want is "here are five named workflows; here's who gets which one." A profile-based model is something a non-engineer in security can review and sign off on: coding exposes GitHub plus Postgres, support exposes the CRM and the knowledge base.
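As a rough illustration of why this is reviewable, the whole model fits in two lookup tables. The sketch below assumes hypothetical group names and a hypothetical `allowed_servers` helper; the two profiles mirror the coding and support examples above and are not a real configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    """A named workflow and the upstream MCP servers it exposes."""
    name: str
    servers: frozenset


# Hypothetical profiles mirroring the examples in the text.
PROFILES = {
    "coding": Profile("coding", frozenset({"github", "postgres"})),
    "support": Profile("support", frozenset({"crm", "knowledge-base"})),
}

# Hypothetical assignments: "here's who gets which one".
GROUP_PROFILES = {
    "engineering": {"coding"},
    "support-agents": {"support"},
}


def allowed_servers(groups, profile_name):
    """Return the servers a user in `groups` may reach through `profile_name`.

    An unknown profile, or one the user's groups are not entitled to,
    yields an empty set rather than an error.
    """
    entitled = set()
    for group in groups:
        entitled |= GROUP_PROFILES.get(group, set())
    if profile_name not in entitled or profile_name not in PROFILES:
        return frozenset()
    return PROFILES[profile_name].servers
```

For example, `allowed_servers({"engineering"}, "coding")` returns the GitHub and Postgres servers, while the same user asking for `"support"` gets nothing. A reviewer signs off on the two tables, not on policy code.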

Audit data goes where their other audit data already goes. Splunk. Sentinel. Elastic. Datadog. Whatever the SIEM is, that's where MCP traces have to land. A separate dashboard maintained by the AI team stops working the moment compliance gets involved.

"Our MCP proxy is genuinely a security and audit layer over MCP tools and calls — so all MCP servers are secured and auditable. It can offload data via OpenTelemetry, and from there anyone in the security team can pipe it into any SIEM, Splunk, and so on. That's what we focus on — really, the security of tool connections and tool sharing across the organisation."

— Prokop Simek

A point of honesty here, because we've been burned by it: OpenTelemetry export is on our roadmap, not in our shipped feature set yet. The trace store is shipped. The filterable viewer is shipped. The OTLP pipe to Splunk or Sentinel is what we're working on next, and we say so explicitly when we run a POC.

Profiles, not policies: context curation as governance

Profiles do double duty. They are a security primitive: least privilege expressed as "this workflow only sees these tools." They are also a performance primitive, which is the part that surprises CTOs.

LLM performance degrades as the tool catalogue grows. Cursor visibly struggles past around 40 tools. Windsurf hits its limit closer to 100. By the time you've aggregated five MCP servers each exposing a dozen tools, the model is spending tokens on tool definitions it doesn't need for the task in front of it, and tool selection accuracy drops with it.

Profile-based curation means each AI client only sees the tools relevant to its current workflow. Less context noise, lower token cost, better tool selection. We've seen meaningful token savings in real deployments, but we're not putting a single percentage on it in this article. We don't yet have a clean apples-to-apples benchmark we'd defend in a security review, and we'd rather under-claim and back-fill it than the other way around. If you're sizing a POC and want hard numbers, that's a conversation, not a marketing line.
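The curation step itself is simple to picture. This is a minimal sketch that assumes a hypothetical catalogue of five servers with twelve placeholder tools each, matching the scenario above; the server names and the `curate` function are illustrative only, and the sketch counts tools rather than tokens.

```python
def curate(catalogue, profile_servers):
    """Return only the tool definitions from servers attached to the profile."""
    return [
        tool
        for server, tools in catalogue.items()
        if server in profile_servers
        for tool in tools
    ]


# Five hypothetical servers, each exposing a dozen placeholder tools.
servers = ["github", "postgres", "crm", "knowledge-base", "jira"]
catalogue = {s: [{"name": f"{s}_tool_{i}"} for i in range(12)] for s in servers}

everything = curate(catalogue, set(servers))         # what an unfiltered client sees
coding = curate(catalogue, {"github", "postgres"})   # what the coding profile sees

print(len(everything), len(coding))  # prints "60 24"
```

Sixty tool definitions is already past the point where, per the figures above, Cursor starts to struggle. The two-server profile stays under it.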

What we are confident saying: the profile boundary is the place to do per-tool customisation. Renaming a generic sql_query tool to support_user_lookup, overriding its description, or trimming its input schema, all without touching the upstream MCP server, is how you make the same tool behave differently for the support workflow than it does for the engineering workflow. That's governance you can hand a non-engineer to review.
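A sketch of what such an override could look like, assuming tool definitions shaped like MCP's `name`, `description`, and `inputSchema` fields. The `apply_override` helper and the sample schema are hypothetical; this is an illustration of the idea, not the gateway's implementation.

```python
import copy


def apply_override(tool, *, name=None, description=None, keep_params=None):
    """Return a customised copy of a tool definition.

    The upstream definition is deep-copied, so the original is never modified.
    `keep_params`, if given, is the set of input parameters to leave visible.
    """
    out = copy.deepcopy(tool)
    if name is not None:
        out["name"] = name
    if description is not None:
        out["description"] = description
    if keep_params is not None:
        schema = out.setdefault("inputSchema", {})
        props = schema.get("properties", {})
        schema["properties"] = {k: v for k, v in props.items() if k in keep_params}
        if "required" in schema:
            schema["required"] = [r for r in schema["required"] if r in keep_params]
    return out


# Hypothetical upstream definition of the generic tool.
upstream = {
    "name": "sql_query",
    "description": "Run an arbitrary SQL query.",
    "inputSchema": {
        "type": "object",
        "properties": {"query": {"type": "string"}, "database": {"type": "string"}},
        "required": ["query", "database"],
    },
}

support_view = apply_override(
    upstream,
    name="support_user_lookup",
    description="Look up a user record for a support ticket.",
    keep_params={"query"},
)
```

Two things this sketch leaves out: the gateway has to map `support_user_lookup` back to `sql_query` when forwarding the call, and it has to supply a fixed value for any trimmed parameter the upstream server still requires (here, `database`).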

Trace-level observability: what's shipped, what's coming

The trace pipeline is where the conversation gets concrete. What we ship today, in the open-source gateway and the Enterprise edition:

Every MCP interaction stored as a structured trace: tool name, parameters, response payload, status, duration, errors.
A filterable, auto-refreshing trace viewer in the admin UI.
Filtering by profile, server, request type, status.
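For readers who want a mental model of the trace record, here is a minimal sketch built from the fields and filter dimensions listed above. The field names, the `Trace` class, and `filter_traces` are assumptions made for illustration and will not match the gateway's actual storage schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class Trace:
    """One MCP interaction (hypothetical field names)."""
    profile: str
    server: str
    request_type: str            # MCP method, e.g. "tools/call" or "tools/list"
    tool_name: Optional[str]     # None for requests that do not target a tool
    parameters: dict
    response: Any
    status: str
    duration_ms: float
    error: Optional[str] = None


def filter_traces(traces, **criteria):
    """Keep traces whose fields equal every given criterion."""
    return [t for t in traces if all(getattr(t, k) == v for k, v in criteria.items())]


# Illustrative data only.
traces = [
    Trace("coding", "github", "tools/call", "create_issue", {"title": "Bug"}, {"id": 1}, "success", 182.4),
    Trace("support", "crm", "tools/call", "find_customer", {"email": "a@example.com"}, None,
          "error", 950.0, "upstream timeout"),
    Trace("coding", "postgres", "tools/list", None, {}, {"tools": []}, "success", 12.1),
]

failed = filter_traces(traces, status="error")
line = json.dumps(asdict(failed[0]))  # one structured record, ready to store or ship
```

Because each trace is a flat, structured record, filters combine freely, for example `filter_traces(traces, profile="coding", request_type="tools/list")`.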

What's on the roadmap and not in customers' hands yet:

Filtering traces by individual tool name (rather than per-server).
OpenTelemetry export to external SIEM / observability backends.
Prompt injection pattern detection on tool call parameters and responses.
Tool name conflict detection, relevant after the MCPoison weakness (CVE-2025-54136) showed up in Cursor.
Gateway-level authentication closing the open-access default of the open-source build.
Human-in-the-loop approval for high-risk tool calls, configurable per profile.
Read-only profile constraints enforced at the gateway, independent of upstream server behaviour.
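The last two roadmap items can be pictured as a single decision made before a call is forwarded. The sketch below is only an assumption about the shape of such a check: `ProfilePolicy`, `Decision`, `decide`, and the tool names are hypothetical, and none of this describes shipped behaviour.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    DENY = "deny"
    NEEDS_APPROVAL = "needs_approval"


@dataclass(frozen=True)
class ProfilePolicy:
    """Gateway-side policy for one profile (hypothetical shape)."""
    read_only: bool
    read_tools: frozenset       # tools the operator has classified as non-writing
    approval_tools: frozenset   # high-risk tools that need human sign-off


def decide(policy, tool):
    """Decide what happens to a call before it reaches the upstream server."""
    # The classification lives at the gateway, so a read-only profile does not
    # depend on the upstream server describing its own tools honestly.
    if policy.read_only and tool not in policy.read_tools:
        return Decision.DENY
    if tool in policy.approval_tools:
        return Decision.NEEDS_APPROVAL
    return Decision.ALLOW


research = ProfilePolicy(True, frozenset({"search_docs", "get_issue"}), frozenset())
coding = ProfilePolicy(False, frozenset({"get_issue"}), frozenset({"delete_repository"}))
```

With these two policies, `decide(research, "delete_repository")` is a denial, `decide(coding, "delete_repository")` waits for a human, and `decide(coding, "get_issue")` passes straight through.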

We list the planned items here for the same reason we list them in customer POCs: a roadmap that lines up with security team requests is more useful than a feature page that pretends everything is shipped. The gap between "trace store exists" and "OpenTelemetry pipe lands in Splunk" is real work, and we're doing it on a deadline, not on a slide.

Local-first as a procurement strategy

The last surprise from the conversations is how often deployment topology decides the deal.

A non-trivial share of governance work in regulated environments comes down to one sentence: the data does not leave our perimeter. Cloud-native MCP gateways, such as Runlayer, MintMCP, and Kong's AI gateway, make that sentence harder to satisfy because tool invocation data flows through their cloud by design. For some clients that's fine. For finance, healthcare, public sector, and increasingly anyone with a Schrems II conversation in their compliance log, it isn't.

We made a deliberate architectural choice: Local MCP Gateway runs entirely on the customer's infrastructure. Configuration, traces, and credentials all stay in a local database. No external SaaS dependency, no telemetry callback, no data leaving unless the customer explicitly connects to a remote MCP server they chose. Docker Compose, two ports, done.

That choice is also why our security pitch is shorter. We don't have to explain a third-party data flow, because there isn't one.

What's still hard

An honest section. These are the parts we don't yet have a clean answer for, and the parts we hear about most often when we run a POC.

Auth flows are still rough. The MCP authorisation model (OAuth 2.1, PKCE, dynamic client registration, the November 2025 spec changes) is converging, but real-world MCP servers are at very different points along that curve. We can centralise credentials at the gateway, refresh tokens before they expire, and avoid putting them in chat prompts, but a meaningful subset of servers still ship pre-spec auth, and integrating them takes per-server work.
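The "refresh before they expire" part is the easiest piece to illustrate. This is a minimal sketch of a cache that renews a token slightly ahead of its expiry. `Token`, `TokenCache`, the `fetch` callback, and the 60-second margin are all assumptions for the example; a real deployment would wire `fetch` to the server's actual OAuth token endpoint.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Token:
    value: str
    expires_at: float  # Unix timestamp


class TokenCache:
    """Hold one upstream credential and renew it shortly before it expires."""

    def __init__(self, fetch: Callable[[], Token], skew_seconds: float = 60.0,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch          # hypothetical callback that obtains a fresh token
        self._skew = skew_seconds    # safety margin before the real expiry
        self._clock = clock          # injectable so the behaviour can be tested
        self._token: Optional[Token] = None

    def get(self) -> str:
        """Return a token that is valid for at least `skew_seconds` more."""
        if self._token is None or self._clock() >= self._token.expires_at - self._skew:
            self._token = self._fetch()
        return self._token.value
```

Because the cache lives at the gateway, AI clients never see the upstream credential, and it never needs to appear in a client config or a prompt.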

Tool name conflicts are a real attack surface. When two servers expose tools with similar names, trust binds to the name, not to the underlying command. The MCPoison CVE was exactly that. Detecting conflicts at the gateway is on our roadmap. The principle to live by in the meantime is to keep your registry small and your provenance explicit.
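One way to picture conflict detection is a normalise-and-group pass over the tool catalogue. The sketch below is an assumption about how such a check could work, not a description of the planned feature. The `normalise` rule (lowercase, strip everything but letters and digits) and the sample catalogue are hypothetical, and the check only catches names that collapse to the same key.

```python
import re
from collections import defaultdict


def normalise(name):
    """Collapse case and punctuation so near-identical names share a key."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def find_conflicts(catalogue):
    """Map each normalised name to its (server, tool) owners when more than one server claims it."""
    seen = defaultdict(list)
    for server, tools in catalogue.items():
        for tool in tools:
            seen[normalise(tool)].append((server, tool))
    return {
        key: owners
        for key, owners in seen.items()
        if len({server for server, _ in owners}) > 1
    }


# Hypothetical catalogue: server name -> tool names it exposes.
catalogue = {
    "github": ["create_issue", "delete_repository"],
    "jira": ["create-issue"],
    "internal": ["Delete_Repository", "search"],
}
conflicts = find_conflicts(catalogue)
```

Here `create_issue` and `create-issue` land on the same key, as do `delete_repository` and `Delete_Repository`, so both pairs would be flagged for a human to resolve before either tool is exposed.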

Prompt injection detection is a partial solution. Pattern-based scanning of tool call parameters and responses is real defence-in-depth, but it isn't a guarantee. Treat it as one of several layers alongside tool capability minimisation, human approval on destructive calls, and red-team testing against your actual workflows.
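To show both what pattern-based scanning does and why it is only partial, here is a minimal sketch. The three patterns are illustrative examples we made up for this article, not a vetted rule set, and `scan` is a hypothetical helper rather than the detection feature on the roadmap.

```python
import re

# Illustrative patterns only; a real rule set would be larger and tuned on real traffic.
PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?(system|above) (prompt|instructions)", re.IGNORECASE),
    re.compile(r"<\s*/?\s*system\s*>", re.IGNORECASE),
]


def iter_strings(value):
    """Yield every string nested anywhere inside dicts, lists, and tuples."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for key, item in value.items():
            yield from iter_strings(key)
            yield from iter_strings(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_strings(item)


def scan(payload):
    """Return the source of each pattern that matches somewhere in `payload`."""
    hits = []
    for text in iter_strings(payload):
        for pattern in PATTERNS:
            if pattern.search(text) and pattern.pattern not in hits:
                hits.append(pattern.pattern)
    return hits
```

A tool response containing "Please IGNORE previous instructions" is flagged; the same intent phrased as "forget what you were told earlier" is not. That gap is exactly why scanning belongs alongside the other layers rather than in place of them.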

Verifying our own claims. We've been burned by repeating an internal benchmark number across multiple documents without re-deriving it. The discipline we're holding ourselves to now: every quantitative claim in customer-facing material should trace to a runnable benchmark or a named deployment, with the conditions written down. If we can't, we drop the number and say "meaningful" instead. That sounds obvious; in practice it's where a lot of MCP marketing falls down.

A maturity model: when does governance start to matter?

The question we close with on most calls is "when do we actually need this?" The honest answer has three stages.

Stage 1: We're trying things. One or two AI clients, a handful of personal MCP servers, no audit. Governance work here is overhead. The right move is to keep blast radius small: read-only access where possible, no production credentials in client configs, no data egress to third-party clouds.

Stage 2: We're scaling. Multiple teams, more than two AI clients in regular use, MCP servers being shared across the org. This is where the cost of not having governance starts compounding. The first ask is consolidation: one endpoint per workflow, one credential vault, one trace store. You don't need full SIEM integration on day one; you need to know who is calling what.

Stage 3: We need governance. Compliance, audit, or a security incident has put a hard requirement in writing. SIEM integration, per-tool audit, role-based profile access, deployment topology in scope. This is where a control plane stops being optional. It's also where retrofitting hurts the most. The teams that did Stage 2 right have a much easier conversation here.

The line between Stage 1 and Stage 2 is the cheapest place to act. The line between Stage 2 and Stage 3 is the most expensive place to delay.

If you want a broader adoption example, the Heureka AI case study shows the same Stage 2 pattern in practice: shared playbooks, AI ambassadors, MCP setup guides, and vetted tools that let teams scale without giving up control.

Where we're heading next

We're shipping OpenTelemetry export, per-tool trace filtering, gateway-level auth, and the first cut of prompt-injection detection over the next milestones. That's the order security teams keep pushing for. If you're running a POC or scoping one, that roadmap is open, and we'd rather have your input on it now than after we've shipped the wrong thing. For a deeper look at what a gateway actually has to secure (agent identity, per-tool authorization, audit trails, and prompt-injection defense), see Enterprise MCP Gateway Security: What It Actually Has to Do.

If you're deciding which AI coding tool to use alongside a governance layer like this, we've written a field report comparing Claude Code, Cursor, and GitHub Copilot from production, including where each tool's governance posture breaks down in practice.

If you want to see Local MCP Gateway running against a real workflow, get in touch. We'll bring the gateway, the questions security teams have actually asked us, and an honest read on where the gaps still are.
