A finance team deploys an AI agent to monitor transactions around the clock. The agent flags duplicate payments, unusual vendor activity, late-night transactions, and expense policy violations. It catches patterns human reviewers would miss at the volume the business now operates at.

Six months in, the internal audit team asks a different question: how do we know what the agent is missing?

That question is harder than it sounds. It is exactly what a SOX auditor will ask next.

The false negative problem

The false positive problem in anomaly detection is obvious. An agent that flags too many legitimate transactions creates noise that reviewers start ignoring. Finance teams notice this quickly.

The false negative problem is invisible until it is not. A false negative is an anomaly the agent saw and cleared. A duplicate payment it decided was not a duplicate. A split transaction it did not recognize as threshold-skirting. A new vendor paid within thirty days of setup that the agent treated as normal.

False negatives do not show up in dashboards. They show up in audit findings, restatements, and fraud investigations. By then the auditor asks not just what the agent caught, but what it cleared and why.

A SOX-ready audit trail must capture both sides. Not just the flags. The clears too.

What auditors are testing now

The PCAOB's amended AS 2201 takes effect for fiscal years beginning on or after December 15, 2026. It requires evidence that automated controls over financial reporting operated effectively, not just that they existed. For an anomaly agent, that means showing it catches the anomalies it was designed to catch, at an acceptable false negative rate.

COSO's February 2026 guidance on generative AI and internal controls requires documented decision rules, a complete audit trail of AI decisions, and evidence that human oversight was present and effective.

An agent making thousands of decisions per day produces more evidence than any manual process. The problem is verifiability. Application logs can be modified. Dashboards can be filtered. A cryptographic audit trail cannot be altered without the change being detectable.
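One common way to make a log tamper-evident is a hash chain, where each entry's hash covers the entry before it. The sketch below is an illustration under that assumption, not a description of any vendor's implementation; the names GENESIS, entry_hash, append, and verify are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the chain


def entry_hash(prev_hash: str, record: dict) -> str:
    """Hash a record together with the hash of the entry before it."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()


def append(chain: list, record: dict) -> None:
    """Append a record, linking it to the previous entry's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "hash": entry_hash(prev, record)})


def verify(chain: list) -> bool:
    """Recompute every link; an edited entry, or one removed from the
    start or middle of the chain, makes a later hash stop matching."""
    prev = GENESIS
    for entry in chain:
        if entry["hash"] != entry_hash(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True


log = []
append(log, {"txn": "T-1001", "decision": "clear"})
append(log, {"txn": "T-1002", "decision": "flag"})
intact = verify(log)

log[0]["record"]["decision"] = "flag"  # simulate an after-the-fact edit
tampered_detected = not verify(log)
```

A chain like this does not by itself reveal entries trimmed from the end. Detecting that requires keeping the latest hash somewhere the log's owner cannot rewrite.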

Thirteen anomaly types, one audit trail

Payment anomalies include duplicate payments with the same vendor, similar amount, and close dates; payments to vendors not on the approved list; new vendor payments within thirty days of setup; split transactions structured to avoid approval thresholds; and amounts that deviate sharply from historical averages.
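As an illustration of the first rule, a duplicate-payment check can be sketched as a pairwise comparison on vendor, amount, and date. The Payment type, the 1 percent amount tolerance, and the seven-day window are assumptions chosen for the example, not values from any standard.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Payment:
    ref: str
    vendor: str
    amount_cents: int  # integer cents avoid float rounding on money
    paid_on: date


def duplicate_candidates(payments, amount_tolerance=0.01, max_days=7):
    """Return pairs of refs with the same vendor, similar amount, close dates."""
    pairs = []
    for a, b in combinations(payments, 2):
        same_vendor = a.vendor == b.vendor
        larger = max(a.amount_cents, b.amount_cents)
        similar_amount = abs(a.amount_cents - b.amount_cents) <= amount_tolerance * larger
        close_dates = abs((a.paid_on - b.paid_on).days) <= max_days
        if same_vendor and similar_amount and close_dates:
            pairs.append((a.ref, b.ref))
    return pairs


payments = [
    Payment("P-1", "V-100", 125_000, date(2026, 3, 2)),
    Payment("P-2", "V-100", 125_000, date(2026, 3, 5)),   # 3 days after P-1
    Payment("P-3", "V-100", 125_000, date(2026, 4, 20)),  # too far apart
    Payment("P-4", "V-200", 125_000, date(2026, 3, 3)),   # different vendor
]
candidates = duplicate_candidates(payments)  # [("P-1", "P-2")]
```

The pairwise loop is quadratic, so a production check would likely group by vendor first. The point here is only that each pair is either flagged or cleared, and both outcomes are decisions worth logging.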

Timing and pattern anomalies include transactions outside normal business hours, unusual velocity such as sudden spikes from a single entity, and activity on dormant accounts silent for months.
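Two of these timing checks are simple enough to sketch directly. The working window of 08:00 to 18:00, the treatment of weekends as off-hours, and the 180-day dormancy period are all assumptions for illustration; each business would set its own.

```python
from datetime import datetime, time, timedelta


def is_off_hours(ts: datetime, start: time = time(8, 0), end: time = time(18, 0)) -> bool:
    """True on weekends (assumed off-hours) or outside the working window."""
    return ts.weekday() >= 5 or not (start <= ts.time() < end)


def is_dormant_reactivation(
    last_activity: datetime,
    ts: datetime,
    dormant_after: timedelta = timedelta(days=180),
) -> bool:
    """True when an account transacts after a silence of at least dormant_after."""
    return ts - last_activity >= dormant_after


late_night = is_off_hours(datetime(2026, 3, 4, 23, 30))   # Wednesday, 23:30
mid_morning = is_off_hours(datetime(2026, 3, 4, 10, 0))   # Wednesday, 10:00
reactivated = is_dormant_reactivation(datetime(2025, 6, 1), datetime(2026, 3, 4))
```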

Revenue and accounting anomalies include revenue reversals after period close, intercompany transactions without matching entries in the corresponding entity, and contra entries, where unusual debits to revenue or credits to expense often suggest manipulation rather than genuine activity.

Expense anomalies include policy violations above limits or in wrong categories, and round-number amounts that suggest estimation rather than actual expenditure.

Every flagged item is logged with subtype label, severity, anomaly score, and rationale. Every clear is logged the same way. Auditors scrutinize clears most carefully.

What the audit record needs

A CFO evaluating readiness should expect four artifacts.

The input fingerprint is a SHA-256 hash of the data the agent consumed: transaction reference, anonymized entity identifier, amount, date, and time. Transaction data stays inside the company. Sigmodx stores the hash. An auditor verifies specific inputs without seeing them.
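A minimal sketch of such a fingerprint, assuming the five fields are serialized as canonical JSON before hashing. The field names and the encoding are assumptions; the source does not specify how Sigmodx canonicalizes inputs, and both sides must agree on the exact encoding for hashes to match.

```python
import hashlib
import json


def input_fingerprint(txn_ref: str, entity_id: str, amount: str, date: str, time: str) -> str:
    """SHA-256 over a canonical JSON encoding of the five input fields.

    amount is passed as a string (for example "1250.00") so that float
    formatting cannot change the hash.
    """
    canonical = json.dumps(
        {
            "txn_ref": txn_ref,
            "entity_id": entity_id,
            "amount": amount,
            "date": date,
            "time": time,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


stored = input_fingerprint("T-1001", "ent-7f3a", "1250.00", "2026-03-04", "23:30:00")

# Later, the company recomputes the hash from its own copy of the data.
recomputed = input_fingerprint("T-1001", "ent-7f3a", "1250.00", "2026-03-04", "23:30:00")
matches = stored == recomputed
```

If a single field differs, the recomputed hash no longer matches the stored one, which is what lets a stored hash stand in for the data during verification.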

The decision record captures flag, clear, or escalate, with anomaly subtype, severity from low through critical, anomaly score, and rationale at decision time. Critical severity items escalate to senior review regardless of queue position.
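The shape of such a record can be sketched as an immutable data class. The field names, the intermediate severity levels between low and critical, and the queue names are assumptions made for the example.

```python
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from enum import Enum


class Decision(str, Enum):
    FLAG = "flag"
    CLEAR = "clear"
    ESCALATE = "escalate"


class Severity(str, Enum):
    LOW = "low"
    MEDIUM = "medium"      # assumed intermediate level
    HIGH = "high"          # assumed intermediate level
    CRITICAL = "critical"


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class DecisionRecord:
    input_fingerprint: str
    decision: Decision
    subtype: str
    severity: Severity
    anomaly_score: float
    rationale: str
    decided_at: str


def record_decision(input_fingerprint, decision, subtype, severity, anomaly_score, rationale):
    """Build the record at decision time, stamping it with the current UTC time."""
    return DecisionRecord(
        input_fingerprint, decision, subtype, severity, anomaly_score, rationale,
        decided_at=datetime.now(timezone.utc).isoformat(),
    )


def review_queue(record: DecisionRecord) -> str:
    """Critical items go to senior review; everything else to the normal queue."""
    return "senior_review" if record.severity is Severity.CRITICAL else "standard_review"


cleared = record_decision(
    "f" * 64,  # placeholder fingerprint
    Decision.CLEAR, "duplicate_payment", Severity.LOW, 0.12,
    "Amounts match but invoice numbers differ; treated as separate invoices.",
)
row = asdict(cleared)  # plain dict, ready to serialize into the trail
```

Note that the clear carries the same fields as a flag would, including a rationale written when the decision was made.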

The reliability signal covers the trailing thirty days: false positive rate, false negative rate, detection precision as the fraction of flags reviewers confirmed as genuine, escalation rate, and severity accuracy. False negative rate above 5 percent triggers BLOCK; above 2 percent triggers LIMIT, requiring human approval before downstream action on flagged items above a configurable threshold. False positive rate above 15 percent triggers BLOCK; above 10 percent triggers LIMIT.
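The threshold logic above can be sketched as a small gate. BLOCK and LIMIT come from the text; the name ALLOW for the normal state, the function names, and the definition of false negative rate as missed genuine anomalies over all genuine anomalies are assumptions.

```python
def false_negative_rate(missed: int, caught: int) -> float:
    """Share of genuine anomalies the agent cleared (assumed definition)."""
    total = missed + caught
    return missed / total if total else 0.0


def reliability_state(fn_rate: float, fp_rate: float) -> str:
    """Map trailing-thirty-day rates to an operating state.

    BLOCK is checked first so the stricter state wins when both apply.
    """
    if fn_rate > 0.05 or fp_rate > 0.15:
        return "BLOCK"
    if fn_rate > 0.02 or fp_rate > 0.10:
        return "LIMIT"
    return "ALLOW"  # assumed name for normal operation


fn = false_negative_rate(missed=3, caught=97)   # 0.03
state = reliability_state(fn, fp_rate=0.05)     # "LIMIT"
```

The comparisons are strict, so an agent sitting exactly at 2 percent false negatives stays in the normal state, matching the "above" wording in the thresholds.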

The attestation is a period-end cryptographic summary: flagged count, cleared count, escalated count, subtype breakdown, reviewer assessments. One verification string at sigmodx.com/verify lets an auditor check integrity without transaction data.
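One way to picture a verification string is as a hash over the period summary. The sketch below assumes a SHA-256 digest of canonical JSON; the actual format Sigmodx uses is not described here, and the counts are made-up illustrative values.

```python
import hashlib
import json


def attestation_string(summary: dict) -> str:
    """SHA-256 over a canonical (sorted-key) JSON encoding of the summary."""
    canonical = json.dumps(summary, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


summary = {
    "period": "2026-Q3",
    "flagged": 42,
    "cleared": 9958,
    "escalated": 5,
    "subtype_breakdown": {"duplicate_payment": 20, "split_transaction": 12, "off_hours": 10},
    "reviewer_assessments": {"confirmed": 38, "dismissed": 4},
}
published = attestation_string(summary)

# Anyone holding the same summary can recompute the string and compare.
still_intact = attestation_string(summary) == published

altered = dict(summary, cleared=9957)  # one clear quietly dropped
mismatch = attestation_string(altered) != published
```

Because the summary contains only counts and assessments, checking the string exposes no transaction data.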

Why the false negative threshold matters most

Most anomaly tools minimize false positives because finance teams complain about noise and alert fatigue.

SOX auditors focus on false negatives. A false positive costs five minutes of reviewer time. A false negative means a duplicate payment went out, a policy violation went undetected, or a fraud pattern was cleared as normal.

Sigmodx weights false negative thresholds aggressively. When more than 2 percent of genuine anomalies are missed, the agent moves to LIMIT under tighter oversight. Above 5 percent it moves to BLOCK, with no automated clears until the signal improves.

An agent that clears too much is more dangerous than one that flags too much. The audit trail captures both. The reliability signal holds the agent accountable for both.

Three questions to ask about your anomaly agent

One: does the agent log clear decisions with the same fidelity as flags? If the trail captures only flags, it cannot answer the false negative question. Every clear needs an immutable record with input fingerprint and rationale.

Two: does the trail capture why the agent decided at decision time, not in a summary generated afterward? Auditors treat post-hoc explanations as reconstructions, not evidence.

Three: can an external auditor verify log integrity without accessing transaction data? If verification requires system access, that is a custody problem. The verification string model lets the auditor check a hash and confirm the record is intact.

Closing the loop

The team with a proper audit trail answers all three questions. Internal audit review takes an afternoon. External audit review takes an hour.

The team without those answers has a different experience. The agent ran for months. The log lives in an application database. Whether it is complete, whether it is immutable, and whether it captures clears as well as flags are questions without clean answers.

Under amended AS 2201, that gap becomes testable in the 2027 audit cycle for calendar-year companies.

Sigmodx provides audit trail infrastructure for AI agents in financial workflows, including anomaly detection with clear and flag logging, reliability signals, and period attestations. Pilot access for Q3 2026 is listed at sigmodx.com/enterprise.

Your AI Agent Is Flagging Financial Anomalies. Do You Know What It Missed?