You Authorized an Agent. That Doesn't Prove It Ran.

June 08, 2026 ai-governance agentic-systems compliance cryptography eu-ai-act agent-identity trust

Authorization records tell you what you decided. They don't tell you what ran.

This distinction is invisible in most agent deployments until a regulator asks the question you can't answer. Not "was this agent authorized to execute?" but "can you prove that the authorized agent is the one that actually executed?" Those two questions have different answers and different evidence requirements. Most teams have evidence for the first. Almost none have evidence for the second.

Two Events, One Record

Consider what happens when you approve an agentic workflow. You record a decision: on this date, this agent configuration, bound to these tools, with these permissions, was authorized. The record exists. It's signed. It's timestamped.

Then the agent runs. Between authorization and execution, a series of things happens that your authorization record says nothing about: the model version that loads, the system prompt that the orchestrator constructs at runtime, the tool descriptions that the MCP server delivers, the context window content that shapes the actual inference. None of those are in the authorization record. They can't be: in their runtime form, they don't exist at authorization time.

This creates a structural gap:

Authorization record ──────────────────────────────── Execution
       ↑                                                   ↑
 "Agent X, version Y,                         "Something ran and produced
  tools T1/T2/T3,                              this output."
  approved by Alice"
                        ← unverified gap →

The authorization record doesn't bind to the execution. The execution record doesn't bind to the authorization. You have two isolated facts and no proof they describe the same event.

Why the Gap Is Larger Than It Looks

The structural gap is real in single-agent deployments. It compounds as systems grow.

Model updates. Your authorization record captures model version 1.3. The model provider updates 1.3 to 1.3.1 three days later, a "non-breaking" patch that changes safety filtering behavior. The next execution runs 1.3.1. Your authorization record still says 1.3. You can't prove that what ran matched what you authorized.

Prompt evolution. System prompts change through normal engineering iteration: improved formatting, edge case handling, tone calibration. Each change alters the effective agent identity: the agent that runs with the updated prompt is a different agent in any meaningful compliance sense. Most teams don't have authorization records that capture a prompt hash. The authorization record becomes a proxy for a configuration that no longer exists.

Multi-provider routing. You authorize Claude for task A. A router falls back to Mistral when Claude rate-limits. The fallback is logged. It's not bound to the authorization. Regulators reviewing the authorization record see Claude. The execution trace shows Mistral. There's no documented link between the two — just a gap where the routing decision happened.

MCP tool substitution. Tool descriptions in MCP servers are dynamic. The tool you reviewed at authorization time can have a materially different description at execution time, without any version change in your registry. Your authorization record says you approved tool X. The agent called tool X. What the agent was told tool X does is different from what you reviewed. The authorization holds. The execution diverges.

What EU AI Act Articles 13 and 14 Actually Require

Article 13 requires that high-risk AI systems be transparent enough for deployers to interpret the system's output and use it appropriately. Article 14 requires human oversight measures that let the people overseeing the system monitor its operation and detect and address anomalies, dysfunctions, and unexpected performance.

Both requirements assume you can characterize what the AI system actually did at execution time — not just what it was authorized to do. They assume you can compare expected behavior against actual behavior. You cannot do that comparison without execution-time binding between the authorization record and the execution record.

The compliance gap isn't that you're missing documentation. It's that the documentation you have answers the wrong question. "We authorized this agent" answers a governance question. "The authorized agent is what ran" answers a verification question. EU AI Act Articles 13 and 14 require verification, not governance.

What Cryptographic Execution Binding Looks Like

The underlying problem has been solved in adjacent domains. TLS certificates bind a public key to a domain identity via a chain of trust. Git commits bind content to its history via cryptographic hash chains, and to an author when the commit is signed. Signed JWTs bind claims to a signing authority with a verifiable signature. The pattern is the same: you bind the identity claim to the execution evidence at the moment of execution, not retrospectively.

For agentic systems, execution binding requires three things at inference time:

1. Identity commitment. At execution start, the agent commits to its own identity: model ID, version, provider endpoint hash, system prompt hash, tool manifest hash. This commitment is cryptographically signed. It can't be modified retroactively without detection: changing any input invalidates the signature.

2. Authorization reference. The execution record embeds a reference to the authorization it's executing under. Not a human-readable label — a cryptographic pointer that allows anyone to verify that the authorization record and the execution record describe the same configuration.

3. Output binding. The agent's output is committed alongside the identity and authorization reference. Anyone with the execution record can verify: this output was produced by this identity, under this authorization, at this time. The three facts are inseparable.

What this looks like as a verifiable execution record:

{
  "execution_id": "exec_7f3a...",
  "timestamp": "2026-06-08T09:00:00Z",
  "identity": {
    "model_id": "claude-sonnet-4-6",
    "provider_did": "did:web:api.anthropic.com",
    "system_prompt_hash": "sha256:4a1b...",
    "tool_manifest_hash": "sha256:9c2e..."
  },
  "authorization_ref": "sha256:2f8d...",
  "output_hash": "sha256:b73a...",
  "execution_jws": "eyJhbGci..."
}

The execution_jws field is the signed commitment. Provided its signed payload covers every other field in the record, a verifier with the public key can confirm that all fields were present at signing time and that none were added or altered after the fact.
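
To make the signing step concrete, here is a minimal sketch that serializes a record like the one above into a JWS compact string and verifies it. It rests on several assumptions: it uses only the Python standard library, so it signs with HS256 (a shared-secret HMAC) as a stand-in, whereas a setup where the verifier holds only a public key would use an asymmetric algorithm such as ES256 or EdDSA through a vetted JWS library. The helper names (`sign_record`, `verify_record`, `sha256_ref`), the demo key, and the sample values are hypothetical.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWS compact serialization requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Restore the stripped padding, then decode."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def sha256_ref(data: bytes) -> str:
    return "sha256:" + hashlib.sha256(data).hexdigest()


def sign_record(record: dict, key: bytes) -> str:
    """Sign the whole record so every field is covered by the signature."""
    header = b64url(json.dumps({"alg": "HS256"}, separators=(",", ":")).encode())
    payload = b64url(json.dumps(record, sort_keys=True, separators=(",", ":")).encode())
    signature = hmac.new(key, f"{header}.{payload}".encode("ascii"), hashlib.sha256).digest()
    return f"{header}.{payload}.{b64url(signature)}"


def verify_record(jws: str, key: bytes) -> Optional[dict]:
    """Return the signed record if the signature checks out, else None."""
    header_b64, payload_b64, sig_b64 = jws.split(".")
    if json.loads(b64url_decode(header_b64)).get("alg") != "HS256":
        return None
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(sig_b64)):
        return None
    return json.loads(b64url_decode(payload_b64))


key = b"demo-key-not-for-production"  # hypothetical shared secret
record = {
    "execution_id": "exec_demo",
    "timestamp": "2026-06-08T09:00:00Z",
    "identity": {
        "model_id": "claude-sonnet-4-6",
        "system_prompt_hash": sha256_ref(b"You are a billing assistant."),
        "tool_manifest_hash": sha256_ref(b"[]"),
    },
    "authorization_ref": sha256_ref(b"authorization record bytes"),
    "output_hash": sha256_ref(b"model output text"),
}

jws = sign_record(record, key)
verified = verify_record(jws, key)

# Swap in a payload with a different output hash but keep the old signature.
head, _, sig = jws.split(".")
forged = dict(record, output_hash=sha256_ref(b"a different output"))
forged_payload = b64url(json.dumps(forged, sort_keys=True, separators=(",", ":")).encode())
tampered = verify_record(f"{head}.{forged_payload}.{sig}", key)
```

Here `verified` equals the original record and `tampered` is `None`, because the old signature no longer matches the altered payload.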

The Retroactive Reconstruction Problem

Without execution binding, compliance evidence is reconstructed after the fact. When an incident occurs 60 days later, your team pulls provider logs, LangSmith traces, database writes, and builds a narrative. That narrative is plausible. It is not verifiable.

The distinction matters in two scenarios. First, regulatory audits. The EU AI Act's technical documentation and record-keeping obligations (Articles 11 and 12) only serve their purpose if the records reflect what the system actually did. Reconstructed narratives are documentation, but they reflect a best-effort reconstruction of reality from available logs, not reality itself. The regulation's intent is verifiable evidence, not plausible explanation.

Second, legal liability. When your agentic system causes a financial loss and the question becomes "which model version ran under which authorization", reconstructed logs are contestable. The other party's lawyers will question whether your logs were selectively retained, whether your reconstruction is accurate, whether alternative explanations fit the same evidence. Execution binding narrows that room for contestation because the evidence can be verified independently of your account of events.

Where to Start

The full binding stack requires changes at inference time — specifically, something that can sign at the point of output generation. That's a harder change than adding logging.

The practical starting point is authorization-side commitment: when you create an authorization record, cryptographically commit to every configurable element that can vary at execution time, such as the system prompt hash, the model version, and the tool manifest state. This doesn't prove what ran, but it establishes a ground truth to compare against. You can then verify execution records against the authorization commitment after the fact.
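
A rough sketch of what authorization-side commitment could look like follows. The record layout, the `commit` and `diff_against_authorization` helpers, and the sample prompts are assumptions made for illustration, not a prescribed schema.

```python
import hashlib
import json


def sha256_ref(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def commit(config: dict) -> str:
    """Hash the configuration in canonical JSON form (sorted keys, no whitespace)."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return sha256_ref(canonical)


# Hypothetical configuration captured when the workflow is approved.
authorized_config = {
    "model_id": "claude-sonnet-4-6",
    "system_prompt_hash": sha256_ref("You are a billing assistant."),
    "tool_manifest_hash": sha256_ref('[{"name":"lookup_invoice"}]'),
}

authorization_record = {
    "approved_by": "alice",
    "config": authorized_config,
    "authorization_ref": commit(authorized_config),
}


def diff_against_authorization(authorization: dict, observed: dict) -> list:
    """Return the names of fields whose observed value differs from the authorized one."""
    authorized = authorization["config"]
    return sorted(k for k in authorized if observed.get(k) != authorized[k])


# Configuration observed at execution time: the prompt was edited after approval.
observed = dict(
    authorized_config,
    system_prompt_hash=sha256_ref("You are a billing assistant. Be concise."),
)

mismatches = diff_against_authorization(authorization_record, observed)
still_matches = commit(observed) == authorization_record["authorization_ref"]
print(mismatches, still_matches)  # ['system_prompt_hash'] False
```

The single `authorization_ref` hash answers "does this execution match the approval?" in one comparison. The per-field diff says which element drifted.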

The next step is execution-side commitment at the model boundary — either through an intercepting proxy that signs inputs and outputs before and after the model call, or through a gateway that performs the signing on behalf of the agent. Trust Layer provides both: a signing gateway that generates execution proofs at inference time, bound to authorization records using the same cryptographic commitment scheme.
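
As an illustration of the intercepting-proxy idea in general, and not of any particular product's API, the sketch below wraps a model call so that hashes of the inputs and output are captured at the call boundary. The wrapper name `with_execution_record`, the `input_hash` field, and the `fake_model` stand-in are hypothetical, and the record is left unsigned here. In practice it would be signed before being stored.

```python
import hashlib
from datetime import datetime, timezone
from typing import Callable


def sha256_ref(text: str) -> str:
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


def with_execution_record(
    call_model: Callable[[str, str], str],
    model_id: str,
    authorization_ref: str,
):
    """Wrap a model call so each invocation also yields an unsigned execution record."""

    def wrapped(system_prompt: str, user_input: str):
        timestamp = datetime.now(timezone.utc).isoformat()
        # Hash inputs before the call so the record reflects what was actually sent.
        prompt_hash = sha256_ref(system_prompt)
        input_hash = sha256_ref(user_input)
        output = call_model(system_prompt, user_input)
        record = {
            "timestamp": timestamp,
            "identity": {"model_id": model_id, "system_prompt_hash": prompt_hash},
            "authorization_ref": authorization_ref,
            "input_hash": input_hash,
            "output_hash": sha256_ref(output),
        }
        return output, record

    return wrapped


def fake_model(system_prompt: str, user_input: str) -> str:
    """Stand-in for a real provider call; simply upper-cases the input."""
    return user_input.upper()


proxied = with_execution_record(
    fake_model,
    model_id="demo-model",
    authorization_ref=sha256_ref("demo authorization record"),
)
output, record = proxied("You are a billing assistant.", "refund invoice 42")
```

Because the hashing happens inside the wrapper, the calling code cannot forget to record an execution, and the record describes the prompt that was actually sent, not the one that was approved.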

The gap between authorization and execution is where most AI governance frameworks stop. It's also where most compliance incidents start. Independent execution proof is what transforms authorization governance from a policy exercise into a verifiable compliance practice.

