When to Use AI in Audit. And When Not To.

AI adoption in audit is accelerating. Most conversations focus on what AI can do. Fewer focus on what AI should do.

That distinction matters. It matters for accuracy. It matters for audit trail defensibility. And it matters for the professional putting their name on the file.

In a recent session hosted by Validis and Trullion, Michael Turner (CEO, Validis) and Isaac Heller (Founder and President, Trullion) spent an hour working through the practical questions. Not the theory. The practice.

The generalist versus specialist question

Every profession navigates the same debate. You could do the work yourself. You could use a generalist. Or you could bring in a specialist built for that exact task.

In audit, the stakes attached to that choice are significant. A generalist AI tool handles a wide range of tasks. A purpose-built tool handles one thing consistently, at a level no generalist can match.

The question is not which is better in the abstract. The question is which is right for each step in your workflow.

Why probabilistic matters

LLMs are probabilistic by nature. Every time you ask an LLM a question, it generates the answer it judges most likely based on its training, and the same question can produce a different answer on the next run. Not a guaranteed answer. A probable one.

In isolation, a high probability of accuracy feels acceptable. But audit workflows are multi-step. Each LLM step introduces its own probability of error. Across a chain of steps, those errors compound. The final output carries compounded uncertainty, and you cannot easily identify which step introduced the error.

For a workflow where accuracy must be verifiable and the stakes are regulatory, that is a real problem.
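
To make the compounding concrete, here is a minimal arithmetic sketch. It assumes every step fails independently, and the 95% per-step accuracy is an illustrative number chosen for the example, not a measured rate for any model or product. The function name chain_accuracy is hypothetical.

```python
def chain_accuracy(step_accuracies):
    """Probability that every step in a chain is correct,
    assuming the steps fail independently of one another."""
    result = 1.0
    for p in step_accuracies:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each accuracy must be between 0 and 1")
        result *= p
    return result


# Illustrative assumption only: 95% accuracy at every step.
for steps in (1, 5, 10, 20):
    end_to_end = chain_accuracy([0.95] * steps)
    print(f"{steps:>2} steps at 95% each: {end_to_end:.1%} end to end")
```

Under those assumptions, ten chained steps come out right end to end only about 60% of the time. Real workflows will differ, because errors are rarely fully independent, but the direction holds: every probabilistic step you add lowers the ceiling.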

Where deterministic tools hold the line

Not every step in an audit workflow requires interpretation. Some steps require translation. Taking data from a source system and delivering it in a consistent, structured format is translation, not interpretation.

For GL extraction, the goal is to get the same trial balance, the same data, every time. Repeatable to the cent. That outcome requires a deterministic approach. An LLM running natural language queries against an accounting system will not give you that. It may miss transactions. It may interpret platform-specific rules differently across engagements. And it will not always tell you when it has missed something.

Deterministic extraction removes that variable before AI is applied to any part of the workflow that benefits from it.
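
What "repeatable to the cent" means in practice can be sketched in a few lines. This is an illustration of the general idea, not a description of how Validis or any other product works: the ledger rows, field names, and the trial_balance and fingerprint helpers are all hypothetical. The sketch uses exact decimal arithmetic and a canonical ordering, so the same ledger always yields the same trial balance and the same hash, whatever order the rows arrive in.

```python
import hashlib
import json
from collections import defaultdict
from decimal import Decimal


def trial_balance(entries):
    """Sum debits minus credits per account using exact decimal arithmetic."""
    balances = defaultdict(Decimal)
    for entry in entries:
        balances[entry["account"]] += Decimal(entry["debit"]) - Decimal(entry["credit"])
    # Sort by account code so the output never depends on input order.
    return dict(sorted(balances.items()))


def fingerprint(balances):
    """SHA-256 of a canonical JSON rendering of the balances."""
    canonical = json.dumps({k: str(v) for k, v in balances.items()}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical ledger lines; amounts are strings so no float rounding creeps in.
ledger = [
    {"account": "1000", "debit": "1500.00", "credit": "0.00"},
    {"account": "4000", "debit": "0.00", "credit": "1500.00"},
    {"account": "1000", "debit": "0.00", "credit": "250.10"},
    {"account": "6000", "debit": "250.10", "credit": "0.00"},
]

first = trial_balance(ledger)
second = trial_balance(list(reversed(ledger)))

print(fingerprint(first) == fingerprint(second))  # same data, same fingerprint
print(sum(first.values()) == Decimal("0"))        # debits and credits net to zero
```

A matching fingerprint across runs is a checkpoint a reviewer can verify mechanically. An LLM answering natural language queries offers no equivalent guarantee.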

What a trusted audit workflow looks like

Validis connects directly to accounting systems and delivers standardized, structured financial data. The extraction is deterministic. The output does not vary from run to run. It can always be traced back to its source.

That data then flows into Trullion, where AI-powered audit testing can run against a trusted foundation. The audit trail has checkpoints. The accuracy is consistent. The human reviewer knows what they are reviewing.

This is the framework Michael and Isaac walked through in detail: a workflow where each tool does the job it was built for, and LLMs are applied at the steps where interpretation and judgment add value.

The cost and auditability tradeoffs

The session also covered two questions that do not get enough attention: what does LLM-based processing actually cost per test, and what does the audit trail look like when an LLM has made a decision?

Both have implications for how firms approach AI adoption. Isaac shared specific numbers from Trullion’s own model development. Michael addressed the regulatory dimension of data extraction and what defensibility requires.

What the next 12 months look like

The session closed with audience questions on human-in-the-loop review, on where MCP (Model Context Protocol) helps and where it does not, and on whether LLMs will eventually close the accuracy gap or whether the answer is smarter integration of specialist tools.

Michael and Isaac’s answer on the last point is worth watching in full.