Auditable
The auditable layer for AI agents.
One typed graph sits across the whole agent lifecycle. It captures a signed record of each consequential decision, replays the call under the state that is live now, and rolls the stale action back when the decision no longer holds.
Built for the audit window: the months after a decision, when the model has moved, the file has changed, and a compliance team still has to account for the call.
PRE
Lint a declared plan for structural risk before deploy.
LIVE
Record the dependency state, replay against live state, and reverse the committed action if the decision no longer holds.
POST
Rank the finished run and name the load-bearing step.
By
Yue Zhao, creator of PyOD, Assistant Professor of Computer Science at the University of Southern California.
The audit arrives after the trace is gone.
Regulated teams are putting AI agents into work that carries supervisory, clinical, financial, or claims risk. A bank uses one to clear trade-surveillance alerts. A health plan uses one to draft an adverse determination. A claims desk uses one to decide whether a file needs human review. Most calls pass through because the evidence looks ordinary.
Then one call is wrong. It is not a jailbreak or an obvious outage. The agent followed policy, called the right tools, cited the file, and returned a result that looked internally consistent. Months later the examiner asks why the action was taken. The vector store has changed, the model has been redeployed twice, and the state that mattered at decision time is no longer live.
Broker-dealer supervision
A surveillance agent clears a trading flag that should have escalated. At decision time, the customer's KYC tier was provisional, a margin exception had not posted, and the account restriction table was stale. The agent read the policy, called the account tools, and cleared the flag. Three months later, FINRA opens a supervision review and asks for the basis of the clearance.
The firm needs more than logs. It needs the dependency state the agent relied on, the ability to replay the decision under the state that is live now, and a record strong enough for a risk, compliance, or audit team to defend.
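To make those three requirements concrete, here is a minimal Python sketch built on the trade-surveillance example. It shows a signed record of the dependency state, a drift check, and a replay that reruns the decision under live state. Every name in it (`DecisionRecord`, `record`, `drift`, `replay`, `triage`, `SIGNING_KEY`) is hypothetical and is not the auditable SDK's API. The toy `triage` rule stands in for the agent, and the HMAC key stands in for real key management.

```python
import hashlib
import hmac
import json
from dataclasses import dataclass, replace
from typing import Callable, Optional

# Hypothetical signing key for illustration; a real system would use managed keys.
SIGNING_KEY = b"example-key"


def sign(payload: dict) -> str:
    # Sorted keys give a stable encoding, so equal payloads sign identically.
    blob = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(SIGNING_KEY, blob, hashlib.sha256).hexdigest()


@dataclass(frozen=True)
class DecisionRecord:
    decision_id: str
    action: str
    dependencies: dict  # the state the agent relied on at decision time
    signature: str = ""

    def payload(self) -> dict:
        return {"id": self.decision_id, "action": self.action, "deps": self.dependencies}

    def verify(self) -> bool:
        # False if the recorded action or dependency state was altered after signing.
        return hmac.compare_digest(sign(self.payload()), self.signature)


def record(decision_id: str, action: str, dependencies: dict) -> DecisionRecord:
    unsigned = DecisionRecord(decision_id, action, dict(dependencies))
    return replace(unsigned, signature=sign(unsigned.payload()))


def drift(rec: DecisionRecord, live: dict) -> dict:
    # Recorded dependencies whose live value differs: {name: (then, now)}.
    return {k: (v, live.get(k)) for k, v in rec.dependencies.items() if live.get(k) != v}


def replay(rec: DecisionRecord, live: dict, policy: Callable[[dict], str]) -> Optional[str]:
    # Re-run the decision under live state; None means the decision still holds.
    if policy(live) == rec.action:
        return None
    return f"reverse:{rec.action}"  # hypothetical compensating action label


def triage(state: dict) -> str:
    # Hypothetical stand-in for the agent's decision logic.
    if state["restriction"] != "none" or state["margin_exception"]:
        return "escalate"
    return "clear"


at_decision = {"kyc_tier": "provisional", "margin_exception": False, "restriction": "none"}
rec = record("flag-001", triage(at_decision), at_decision)

live_now = {"kyc_tier": "provisional", "margin_exception": True, "restriction": "restricted"}
print(rec.verify())                   # True
print(drift(rec, live_now))           # {'margin_exception': (False, True), 'restriction': ('none', 'restricted')}
print(replay(rec, live_now, triage))  # reverse:clear
```

Under the stale state the toy rule clears the flag. Once the margin exception posts and the restriction table updates, replay returns a different answer, so the sketch emits a reversal. Unchanged dependencies such as the KYC tier drop out of the drift report, which leaves the reviewer with only the inputs that moved.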
One typed graph, read before, during, and after the run.
Your logs show what the agent did; auditable shows what it relied on.
auditable is an open-source SDK for agent decisions that need a record. It turns a run into one typed graph: the plan, the tools, the data dependencies, the committed action, and the recovery rail.
Before deploy, the graph lints a declared plan for structural risk: a missing approval edge, evidence collected after the action, a policy gate placed after commit. During the run, it captures the dependency state a decision relied on, replays that decision under live state, and reverses the committed action when that decision no longer holds. After the run, the same graph ranks the trace and names the load-bearing step.
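The three pre-deploy checks can be sketched as rules over a small typed plan graph. This is a minimal Python illustration and not the SDK's lint. It assumes that a plan is a list of typed steps in declared execution order plus a set of dependency edges, and the step kinds (`"evidence"`, `"policy_gate"`, `"approval"`, `"commit"`) and the names `Step`, `Plan`, and `lint` are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Step:
    id: str
    kind: str  # hypothetical kinds: "evidence", "policy_gate", "approval", "commit"


@dataclass
class Plan:
    steps: list[Step]  # declared execution order
    edges: set[tuple[str, str]] = field(default_factory=set)  # (src, dst) dependency edges


def lint(plan: Plan) -> list[str]:
    order = {s.id: i for i, s in enumerate(plan.steps)}
    kinds = {s.id: s.kind for s in plan.steps}
    findings = []
    for commit in (s for s in plan.steps if s.kind == "commit"):
        # Rule 1: a commit needs an incoming edge from an approval step.
        if not any(dst == commit.id and kinds.get(src) == "approval" for src, dst in plan.edges):
            findings.append(f"{commit.id}: missing approval edge")
        # Rules 2 and 3: evidence and policy gates must come before the commit.
        for s in plan.steps[order[commit.id] + 1:]:
            if s.kind == "evidence":
                findings.append(f"{s.id}: evidence collected after {commit.id}")
            elif s.kind == "policy_gate":
                findings.append(f"{s.id}: policy gate placed after {commit.id}")
    return findings


plan = Plan(
    steps=[
        Step("read_policy", "evidence"),
        Step("clear_flag", "commit"),
        Step("check_restrictions", "evidence"),
        Step("supervision_gate", "policy_gate"),
    ],
    edges={("read_policy", "clear_flag")},
)
for finding in lint(plan):
    print(finding)
# clear_flag: missing approval edge
# check_restrictions: evidence collected after clear_flag
# supervision_gate: policy gate placed after clear_flag
```

Because the checks read only the declared structure, they run before any model call. That is the point of a pre-deploy lint: it catches an unsafe ordering without needing a live run.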
This is the consolidation layer: a system of record for consequential agent decisions. It replaces a pile of disconnected logs, evals, and guardrails with one representation that engineers, operators, and reviewers can all read.
The evidence is public. In GRADE, across six public agent corpora, the dependency layer predicts which runs fail with a ROC-AUC of 0.805, whereas run length alone carries no signal. The execution layer localizes the faulting step with a Top-3 accuracy of 0.614.
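For readers less familiar with the two metrics, here is a minimal Python sketch of their conventional definitions. It assumes the dependency layer assigns each run one failure-risk score and the execution layer returns a ranked list of suspect steps for each failed run. GRADE's exact evaluation protocol may differ, and the toy data below is invented for illustration. It is not GRADE data.

```python
from __future__ import annotations


def roc_auc(scores: list[float], failed: list[bool]) -> float:
    # Probability that a randomly chosen failing run scores above a randomly
    # chosen passing run, with ties counted as half; 0.5 means no signal.
    pos = [s for s, y in zip(scores, failed) if y]
    neg = [s for s, y in zip(scores, failed) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def top_k_accuracy(rankings: list[list[str]], faulty: list[str], k: int = 3) -> float:
    # Fraction of failed runs whose true faulting step is among the k highest-ranked steps.
    hits = sum(truth in ranked[:k] for ranked, truth in zip(rankings, faulty))
    return hits / len(faulty)


# Made-up toy data for illustration only.
print(roc_auc([0.9, 0.8, 0.3, 0.6], [True, False, False, True]))  # 0.75
print(top_k_accuracy(
    [["s4", "s2", "s7", "s1"], ["s1", "s3", "s5"], ["s2", "s6", "s8", "s9"]],
    ["s7", "s5", "s9"],
))  # 0.6666666666666666
```

Read this way, a predictor with no signal, such as a score that tracks nothing relevant to failure, sits near a ROC-AUC of 0.5. A Top-3 accuracy of 0.614 would mean the true faulting step lands in the three highest-ranked steps for roughly six in ten failed runs.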
A Small Family for Agent Auditing
AuditableBench
The benchmark and evidence layer for agent auditing. In development.
awesome-auditable-ai
A curated reading list on agent reliability and auditing, at github.com/yzhao062/awesome-auditable-ai.
GRADE
The research method the typed graph is built on, published as arXiv:2606.22741.
Trusted Infrastructure With a Paper Trail
42M+ PyOD downloads
9.8K+ PyOD GitHub stars
Used at OpenAI, Amazon, Walmart, Databricks, Apache Beam, and the European Space Agency.
Recommended in the US DoD CDAO Generative AI Responsible AI Toolkit.
PyOD is the credibility base for this work. Yue Zhao created it as an open-source anomaly detection library that production teams could adopt, and it serves here as a track record: reproducible tools, practical APIs, and evidence quality at scale.
Auditable applies the same engineering posture to agent decisions. The object changes from an anomaly score to a consequential action, but the standard stays the same. The system should be inspectable, reproducible, and clear enough to support a later review.
This work builds on a decade of research in anomaly detection and trustworthy machine learning, including TrustLLM. The common thread is auditability as an engineering constraint, from model behavior to the decisions an agent commits.
About Yue Zhao
Yue Zhao is the solo founder of Auditable, Inc. and an Assistant Professor of Computer Science at the University of Southern California, where he leads the FORTIS Lab.
His research agenda is AI auditing: methods, benchmarks, and open-source tools that make AI systems inspectable, safe, and accountable. Auditable carries that agenda into agent systems that take actions in regulated work.
Auditable is supported by the Foresight Institute, with research operations in San Francisco.
Full bio and publication list at yzhao062.github.io.
contact@auditable.run
Working with AI agents in regulated workflows (financial services, healthcare, claims handling, compliance review)? Reach out.