MCP Tool Description Drift: What Your Agent Reads Isn't What You Approved
The Problem: Tool Descriptions Are Mutable After Approval
When you approve an MCP tool for production use, you're approving a specific description—the text your agent reads to decide what the tool does and when to invoke it.
That description is not locked. It's a string in a server response.
In a community thread discussing MCP deployment patterns (#1763), developers reported 89 instances where tool descriptions drifted after the initial approval: not the implementation, but the text. In most cases, the change was never reviewed. Agents kept using the tools with shifted behavioral contracts.
That pattern—approved at integration, drifted in production—is structurally predictable. The approval process captures a snapshot. The runtime is a stream.
Why Tool Description Drift Is Structurally Dangerous
The agent reads descriptions, not source code
When an LLM-based agent decides which tool to call, it reads the description field from the tool schema. Not the implementation. Not the signature. The description.
```json
{
  "name": "send_message",
  "description": "Sends a message to a user. Read-only preview mode only.",
  "inputSchema": { ... }
}
```
Your compliance team approved this tool because the description says "preview mode only". If that string changes—even subtly—your agent's behavior changes. The tool's implementation might be unchanged. The risk profile is entirely different.
MCP servers are dynamic by design
MCP (Model Context Protocol) servers expose tools at runtime. The server decides what descriptions to return. This is a feature for flexibility—but it means the same tool endpoint can return different descriptions across invocations, deployments, or versions.
There's no cryptographic binding between what you approved and what your agent sees at runtime. The approval is a snapshot. The runtime is a stream.
How drift happens
In the reports from thread #1763, the pattern was consistent: teams approved tools during integration testing, then description strings drifted during development iterations, version bumps, or server configuration changes. The agents kept using the tools, but the behavioral contracts had shifted.
Some drifts were benign. Others changed risk profiles materially. None were detected automatically.
Drift Detection via Hash Binding
The fix is straightforward: bind the approved description to a hash, then verify the hash at every invocation.
- Approval time: hash(tool_name + description) → store approved_hash
- Runtime: hash(tool_name + description) → compare to approved_hash
- Mismatch: block invocation, alert, require re-approval
This is the same pattern used for binary integrity verification—applied to the semantic layer your agent actually reads.
Implementation: Python snippet
```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToolApproval:
    tool_name: str
    approved_hash: str
    approved_description: str
    approved_at: str  # ISO timestamp


def compute_tool_hash(tool_name: str, description: str) -> str:
    """Stable hash binding name + description."""
    canonical = json.dumps({"name": tool_name, "description": description}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()


def approve_tool(tool_name: str, description: str, timestamp: str) -> ToolApproval:
    """Record approval with hash binding."""
    return ToolApproval(
        tool_name=tool_name,
        approved_hash=compute_tool_hash(tool_name, description),
        approved_description=description,
        approved_at=timestamp,
    )


def verify_tool_at_runtime(
    tool_name: str,
    runtime_description: str,
    approval: ToolApproval,
) -> tuple[bool, Optional[str]]:
    """
    Returns (is_valid, drift_detail).
    Call this before every tool invocation.
    """
    runtime_hash = compute_tool_hash(tool_name, runtime_description)
    if runtime_hash != approval.approved_hash:
        drift_detail = (
            f"Tool '{tool_name}' description drifted since approval at {approval.approved_at}.\n"
            f"Approved: {approval.approved_description!r}\n"
            f"Current: {runtime_description!r}"
        )
        return False, drift_detail
    return True, None


# Usage in an agent execution loop
def execute_with_drift_check(tool_name, tool_description, approvals, invoke_fn, *args, **kwargs):
    approval = approvals.get(tool_name)
    if approval is None:
        raise RuntimeError(f"Tool '{tool_name}' has no approval record. Cannot invoke.")
    is_valid, drift_detail = verify_tool_at_runtime(tool_name, tool_description, approval)
    if not is_valid:
        raise RuntimeError(f"Tool description drift detected:\n{drift_detail}")
    return invoke_fn(*args, **kwargs)
```
This runs at every tool invocation—not just at startup. If the MCP server returns a different description mid-session, the check catches it before the agent acts.
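To see the binding in action, here is a minimal standalone check. It restates compute_tool_hash so the snippet runs on its own, and reuses the send_message description from the earlier schema example:

```python
import hashlib
import json

def compute_tool_hash(tool_name: str, description: str) -> str:
    # Same canonical hash as in the implementation above, restated for standalone use.
    canonical = json.dumps({"name": tool_name, "description": description}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

approved = compute_tool_hash(
    "send_message", "Sends a message to a user. Read-only preview mode only."
)

# Identical description at runtime: hashes match, invocation proceeds.
runtime_ok = compute_tool_hash(
    "send_message", "Sends a message to a user. Read-only preview mode only."
)
assert runtime_ok == approved

# The "Read-only" constraint quietly dropped: hash mismatch, invocation blocked.
runtime_drifted = compute_tool_hash(
    "send_message", "Sends a message to a user. Preview mode only."
)
assert runtime_drifted != approved
```

Even a one-word change in the description produces a different hash, so the check cannot be fooled by subtle edits.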
Where to Store Approval Records
Hash-based drift detection only works if the approval record itself is tamper-evident. Storing it in a local JSON file on the same machine as the agent defeats the purpose—the file and the runtime state are both mutable by the same process.
Three patterns, ordered by strength:
1. Embedded in deployment artifact — Bundle approved tool hashes into the agent container at build time. Any runtime drift is caught because the hash registry is immutable once deployed.
2. Signed approval manifest — Generate a signed JSON manifest at approval time. Verify the signature before trusting the hash registry. The signing key lives outside the agent's trust boundary.
3. External attestation service — Submit tool approvals to an external service that returns a timestamped proof. At runtime, the agent checks the proof before invoking the tool. The service is independent of both the MCP server and the agent.
Option 3 is the most robust for regulated deployments—it generates a durable audit record proving what was approved, when, and that the runtime state matches.
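As a sketch of pattern 2, the manifest can be signed and verified with a few lines of stdlib code. This example uses HMAC for brevity; a real deployment would use an asymmetric scheme (e.g. Ed25519) so the agent holds only a verification key, and sign_manifest/load_manifest are illustrative names, not a standard API:

```python
import hashlib
import hmac
import json

def sign_manifest(approvals: dict[str, str], key: bytes) -> dict:
    """Serialize {tool_name: approved_hash} and attach an HMAC-SHA256 signature."""
    payload = json.dumps(approvals, sort_keys=True)
    signature = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": signature}

def load_manifest(manifest: dict, key: bytes) -> dict[str, str]:
    """Verify the signature before trusting the hash registry."""
    expected = hmac.new(key, manifest["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, manifest["signature"]):
        raise RuntimeError("Approval manifest signature invalid: registry not trusted.")
    return json.loads(manifest["payload"])
```

The essential property is that the agent never reads the hash registry without first verifying the signature, so a tampered manifest fails closed.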
Integrating with the MCP Tool Lifecycle
The natural integration point is the tool discovery phase. MCP agents typically call list_tools or equivalent at session start. That's when you run the verification:
```python
class DriftDetectedError(RuntimeError):
    """Raised when a tool's runtime description no longer matches its approval."""

    def __init__(self, tool_name: str, drift_detail: str):
        super().__init__(f"Drift detected for tool '{tool_name}':\n{drift_detail}")


class DriftAwareToolRegistry:
    def __init__(self, approvals: dict[str, ToolApproval]):
        self._approvals = approvals
        self._verified: dict[str, bool] = {}

    def register_runtime_tools(self, mcp_tools: list[dict]) -> list[dict]:
        """
        Filter tools to only those with valid, approved descriptions.
        Unapproved tools are excluded; drifted tools raise.
        """
        verified = []
        for tool in mcp_tools:
            name = tool["name"]
            desc = tool.get("description", "")
            approval = self._approvals.get(name)
            if approval is None:
                # New tool—not yet approved, exclude from agent's view
                continue
            is_valid, drift = verify_tool_at_runtime(name, desc, approval)
            if not is_valid:
                raise DriftDetectedError(name, drift)
            verified.append(tool)
            self._verified[name] = True
        return verified
```
The agent only sees tools that have passed verification. Unapproved tools are invisible. Drifted tools raise immediately.
What Drift Looks Like in Practice
Three common patterns, based on the types of changes reported in thread #1763:
Scope expansion — "Reads file contents" becomes "Reads and writes file contents". Agent starts writing when it was approved only to read. Authorization boundary violated silently.
Constraint removal — "Sends email to internal recipients only" loses the constraint over time. "Sends email" is shorter, technically accurate, gets committed without review. Agent now reaches external addresses.
Ambiguity injection — "Returns paginated results (max 100)" drifts to "Returns results". The agent stops paginating. Downstream systems receive unbounded responses. Performance degradation, not security failure—but still unintended behavior from an approved tool.
None of these are implementation bugs. The code works as intended. The drift is purely in the description—which is exactly what the agent reads.
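When a drift of any of these kinds is caught, the alert is most useful with a readable diff of approved vs. current text. The stdlib difflib module can produce one; describe_drift is an illustrative helper name, not part of MCP:

```python
import difflib

def describe_drift(approved: str, current: str) -> str:
    """Unified diff of the approved vs. current description, for the alert payload."""
    diff = difflib.unified_diff(
        approved.splitlines(),
        current.splitlines(),
        fromfile="approved",
        tofile="current",
        lineterm="",
    )
    return "\n".join(diff)

# The constraint-removal pattern from above, rendered as a diff.
diff_text = describe_drift("Sends email to internal recipients only", "Sends email")
```

A reviewer seeing `-Sends email to internal recipients only` / `+Sends email` can judge the risk change at a glance, which a bare hash mismatch cannot convey.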
Automatic Attestation at Scale
Manual hash checks work for small tool registries. For production systems with 50+ tools across multiple MCP servers, you need automated attestation:
- Approval workflow: When a tool is approved, submit (tool_name, description, approver_id, timestamp) to the attestation service. Receive a signed proof.
- Runtime verification: Before each tool invocation, verify the current description against the stored proof. The attestation service returns pass/fail with a fresh timestamp.
- Drift alerting: On first mismatch, alert immediately with a diff (approved description vs. current description).
- Re-approval flow: Drifted tools enter a hold queue. Re-approval generates a new proof.
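The hold-queue step can be sketched in a few lines. HoldQueue is a hypothetical structure for illustration, and compute_tool_hash is restated from the implementation section so the snippet runs standalone:

```python
import hashlib
import json
from dataclasses import dataclass, field

def compute_tool_hash(tool_name: str, description: str) -> str:
    canonical = json.dumps({"name": tool_name, "description": description}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

@dataclass
class HoldQueue:
    # tool_name -> drifted description awaiting human review
    pending: dict[str, str] = field(default_factory=dict)

    def quarantine(self, tool_name: str, drifted_description: str) -> None:
        """Drifted tool enters the queue; agent stops seeing it until re-approved."""
        self.pending[tool_name] = drifted_description

    def reapprove(self, tool_name: str) -> str:
        """Reviewer accepts the new description; returns the fresh approved hash."""
        description = self.pending.pop(tool_name)
        return compute_tool_hash(tool_name, description)
```

The key property is that re-approval always produces a new hash bound to the new description, so the registry never carries a stale approval forward.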
ArkForge Trust Layer provides this pipeline out of the box: tool approval records are stored as timestamped, signed attestations. Runtime verification happens at the proxy layer before the agent sees any tool. Drift generates an incident record with full diff.
Start with Three Tools
You don't need to hash every tool in week one. Start with the three tools in your agent that touch external state: email senders, data writers, API callers with side effects. Those are the ones where description drift creates the most risk.
Compute their hashes. Store them in your deployment artifact. Add the verification check at invocation. That's two hours of work and covers 80% of your actual risk surface.
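That build-time step can look like the sketch below. The tool names and descriptions are hypothetical examples of the three-tool starting set; build_registry emits the JSON you bake into the artifact, and compute_tool_hash is restated so the snippet runs standalone:

```python
import hashlib
import json

def compute_tool_hash(tool_name: str, description: str) -> str:
    canonical = json.dumps({"name": tool_name, "description": description}, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()

# Hypothetical high-risk tools: the ones that touch external state.
APPROVED_TOOLS = {
    "send_email": "Sends email to internal recipients only.",
    "write_record": "Writes a record to the orders table.",
    "call_billing_api": "Charges a stored payment method.",
}

def build_registry(tools: dict[str, str]) -> str:
    """Serialize {tool_name: approved_hash}; write this string into the deployment artifact."""
    registry = {name: compute_tool_hash(name, desc) for name, desc in tools.items()}
    return json.dumps(registry, sort_keys=True, indent=2)

def load_registry(text: str) -> dict[str, str]:
    """At startup, parse the baked-in registry. Immutable once the image is built."""
    return json.loads(text)
```

Because the registry ships inside the image, changing it requires a rebuild and redeploy, which is exactly the review gate you want.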
When you're ready to scale to full attestation, try Trust Layer: submit tool approvals via API, get signed proofs, verify at runtime. Free trial available, no infrastructure changes required.