AI Liability Is Here: Why Internal Logs Won't Protect You in Court
UnitedHealth is facing multiple class action lawsuits over algorithmic insurance claim denials. The core allegation: an AI system called nH Predict generated coverage refusals at scale, overriding physician recommendations, with a documented error rate high enough to be cited in the complaint. Patients were denied care. Some died. Shareholders are now suing too, on the theory that the company knew about the error rates and disclosed nothing.
This is not a hypothetical. It is documented litigation, moving through federal courts now.
On the other side of the Atlantic, the EU's revised Product Liability Directive was adopted in 2024. Transposition deadline: December 2026. It explicitly includes software and AI systems in the definition of "product." Strict liability -- no proof of negligence required, only proof of defect and damage -- applies if a system is found defective.
These two developments have the same core implication for anyone building or deploying AI agents: when something goes wrong, you will need to prove what your system actually did. Not what your logs say it did. What it did.
The distinction matters more than most engineers realize.
The Self-Attestation Problem
When you ask a defendant to produce evidence in litigation, the first question opposing counsel asks is: who created this evidence, and who controls it?
Your application logs answer both questions the wrong way. Your agent wrote them. Your infrastructure stores them. Your engineers can access them. This is not inherently dishonest -- your logs may be perfectly accurate -- but it makes them self-attestation: claims you are making about yourself, with no independent witness.
Courts and regulators treat self-attestation differently from third-party evidence. A notarized document is harder to challenge than a handwritten note you produced yourself. Not because the handwritten note is necessarily false, but because the structure of its creation gives the other side no independent verification point.
In the UnitedHealth context, when the plaintiffs ask "show us what the AI actually decided for each denied claim, with the data it used, and when exactly that decision was made," the company faces a structural problem. Its systems generated the denials. Its systems logged the denials. The same infrastructure that made the decisions also certifies what those decisions were.
This is not a compliance failure. It is an architectural one.
What Litigation Actually Requires
Evidentiary standards in civil litigation are not the same as your SIEM checklist. When an expert witness is retained to analyze your AI system's behavior, they will look for:
- Timestamp integrity: was this record created when you say it was, certified by a party with no stake in the outcome?
- Payload completeness: does the record capture what the system received as input, not just what it claims to have received?
- Chain of custody: has this record been accessible to parties who could modify it since the relevant event?
- Independence of attestation: who is vouching for this evidence, and what is their relationship to you?
Your internal logging infrastructure fails on at least three of these four counts. The timestamps are self-reported. The payload completeness is whatever your logging code captured. The chain of custody runs through your own infrastructure. And you are the sole attestor.
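To make those four criteria concrete, here is a minimal sketch of the fields an independently attested record would need to carry. The names are illustrative only, not any particular vendor's schema.

from dataclasses import dataclass

@dataclass
class AttestedRecord:
    # Hypothetical field names, mapped to the four evidentiary criteria above.
    request_sha256: str           # payload completeness: hash of the exact input received
    response_sha256: str          # payload completeness: hash of the exact output produced
    tsa_timestamp_token: bytes    # timestamp integrity: RFC 3161 token from an external TSA
    transparency_log_index: int   # chain of custody: entry in a public append-only log
    attestor_signature: bytes     # independence: signed by a party other than the operator
    attestor_identity: str        # independence: who is vouching, verifiably not you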
This is not a theoretical gap. The UnitedHealth plaintiffs are asking for exactly this evidence. Discovery will reveal whether it exists.
The EU Adds Strict Liability
The US cases require proving negligence -- some standard of fault. The EU Product Liability Directive changes that calculus significantly for European deployments.
Under strict liability, the plaintiff does not need to prove you were negligent. They prove: (1) the product was defective, (2) they suffered damage, (3) the defect caused the damage. That is it.
"Defective" for an AI system means the system did not provide the level of safety the public is entitled to expect. An AI that denies insurance claims at a rate dramatically higher than human reviewers, without producing independently verifiable records of its reasoning, is a plausible candidate for "defective" under this standard.
The burden then shifts to the producer. Article 10 of the Directive provides a defense if the producer can prove the defect did not exist when the product was placed on the market. Proving that requires evidence. The same evidence problem applies.
What changes under the PLD compared to US tort litigation: European companies cannot rely on the difficulty of proving negligence as a firewall against liability. The structural audit trail gap becomes a direct exposure.
Why the Architecture Needs to Change, Not Just the Policy
The typical response to this problem is a logging policy. "We will log all AI decisions with the decision inputs, retain for seven years." This is better than nothing. It does not solve the self-attestation problem.
A policy does not change who writes the logs, who stores them, or who can access them. A policy documents your intention to log accurately. It does not create independent attestation.
What closes the gap is a structural separation: the evidence of what your AI did is created by a party who is not your AI, stored in a system your AI cannot write to, and verifiable by anyone without contacting you.
RFC 3161 timestamps (the same standard used for code signing) allow an external Time Stamping Authority to certify, cryptographically, that a specific hash existed at a specific time. Sigstore Rekor is a public, append-only transparency log maintained by the Linux Foundation -- entries cannot be modified or deleted after the fact, and the public log can be audited by anyone. Ed25519 signatures bind request and response together, preventing retroactive modification of either.
Combined, these three mechanisms produce what a legal expert witness recognizes as independent third-party attestation: someone other than you, with no stake in the outcome, certified what your system did and when.
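As a rough sketch of what the certifying side does on each call -- illustrative code only, not ArkForge's implementation; it assumes the cryptography library and a timestamp already obtained from a TSA:

import hashlib
import json
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Key held by the certifier, never by the agent or its operator.
certifier_key = Ed25519PrivateKey.generate()

def attest(request_body: dict, response_body: bytes, tsa_time: str) -> dict:
    # Bind request, response, and the externally certified time into one signed statement.
    req_hash = hashlib.sha256(json.dumps(request_body, sort_keys=True).encode()).hexdigest()
    res_hash = hashlib.sha256(response_body).hexdigest()
    statement = f"{req_hash}|{res_hash}|{tsa_time}".encode()
    return {
        "request_sha256": req_hash,
        "response_sha256": res_hash,
        "rfc3161_time": tsa_time,  # in a real system, the full RFC 3161 token
        "signature": certifier_key.sign(statement).hex(),
    }
# A real certifier would also append the statement to a public transparency
# log such as Sigstore Rekor so it cannot later be quietly removed.

From the agent's point of view, none of that machinery is visible. The only change is where the call is sent: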
import httpx

# Before: agent calls the API directly and logs internally
async with httpx.AsyncClient() as client:
    response = await client.post("https://api.example.com/decision", json=payload)
logger.info(f"Decision: {response.json()}")  # self-attestation: you wrote it, you store it

# After: agent calls through a certifying proxy
async with httpx.AsyncClient() as client:
    response = await client.post(
        "https://trust.arkforge.tech/v1/proxy",
        headers={"X-Api-Key": API_KEY},
        json={"target": "https://api.example.com/decision", "payload": payload},
    )
proof = response.json()["proof"]
# proof contains: SHA-256 hashes of request + response,
# RFC 3161 timestamp, Ed25519 signature, Sigstore Rekor log entry
# All verifiable without contacting ArkForge
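Verification then needs nothing from the issuer. A minimal sketch of offline verification, using the hypothetical statement format from the earlier sketch rather than ArkForge's actual proof schema:

import hashlib
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

def verify(proof: dict, request_bytes: bytes, response_bytes: bytes,
           certifier_public_key: Ed25519PublicKey) -> bool:
    # Recompute the hashes from the raw payloads, then check the signature over them.
    if hashlib.sha256(request_bytes).hexdigest() != proof["request_sha256"]:
        return False
    if hashlib.sha256(response_bytes).hexdigest() != proof["response_sha256"]:
        return False
    statement = f'{proof["request_sha256"]}|{proof["response_sha256"]}|{proof["rfc3161_time"]}'.encode()
    try:
        certifier_public_key.verify(bytes.fromhex(proof["signature"]), statement)
    except InvalidSignature:
        return False
    return True
# The RFC 3161 token is validated against the TSA's certificate, and the Rekor
# entry can be looked up directly on the public log at rekor.sigstore.dev --
# neither step requires contacting the party that issued the proof.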
What This Does Not Do
It is worth being explicit about what cryptographic receipts do not provide.
They do not make your AI make better decisions. If your model has a high error rate, receipts do not change that. They document the error rate accurately, which may help or hurt you depending on what the underlying decisions actually were.
They do not prevent litigation. If your system caused harm, evidence of what it did will not make the harm disappear.
What they provide is a clean separation between "what did your system do" (which you can now prove independently) and "was what it did correct" (which remains a separate, contested question). That separation is the difference between a dispute about facts and a dispute about liability. The latter is the one worth having.
In the UnitedHealth situation, the question of whether the AI's error rate was acceptable is contested. The question of what the AI actually decided, for which patients, based on which inputs, at which timestamps -- that question should not be contested. It should be independently verifiable. If it isn't, the credibility of the entire defense is at risk.
The Timing
The Product Liability Directive transposition deadline is December 2026. EU AI Act high-risk provisions come into force August 2026. US class action litigation against AI-driven decisions is active now.
The pattern across all three: organizations that cannot produce independently verifiable evidence of what their AI systems did are structurally exposed. The exposure is not theoretical. It is in motion.
Building that evidence layer into AI deployments is significantly easier before you need it in discovery than after.
The ArkForge Trust Layer generates cryptographic receipts for AI agent actions -- RFC 3161 timestamps, Ed25519 signatures, Sigstore Rekor anchoring. Free tier: 500 proofs/month, no card. Proof spec is open source. Pricing.
Have you thought about how your organization would respond to discovery requests about AI-driven decisions? I am curious what the evidentiary standards look like from your side.