AI should be a dumb renderer
The default pattern for AI insights dumps data into a model and asks it to count, compare, and conclude. That's the wrong order. Precompute the numbers deterministically; let the model render and narrate them.
A language model cannot be trusted to add up a column of numbers. It can be trusted to explain, fluently and correctly, a sum that something else already computed. The whole design of a defensible AI insights feature follows from holding those two facts at once.
The default pattern does the opposite. It takes a table of structured data, pastes it into the context window, and asks the model to count the rows, compare the segments, and tell you what changed. This is a CSV in ChatGPT dressed up as a product. It demos beautifully and degrades the moment a stakeholder checks one of the numbers against the source, because the model was never counting. It was predicting what a count usually looks like.
The boundary is the product
Draw a line through the system. On one side sits everything that has a single correct answer: the aggregations, the group-bys, the deltas, the percentages, the ranking, the “this is up 12% on last quarter.” On the other side sits everything that is a judgement call about language: which of those facts matters, how to phrase it, what to lead with, how to connect three numbers into a sentence a human will act on.
The first side is arithmetic. You write it in SQL or pandas, you test it, it returns the same answer every time, and when someone disputes a figure you can point at the query. The second side is rendering. You hand the model the precomputed facts and ask it to narrate them. It is good at this in a way that no deterministic system is; it is also incapable of inventing a number it was never given, because the numbers arrive already settled.
That line is the entire architecture. Everything to the left is auditable. Everything to the right is generated. The failure mode of most AI features is putting arithmetic on the right.
What this looks like in practice
Take an analytics feature built on this split. Every claim it makes is clickable: click “revenue rose 18% last quarter” and you land on the 340 orders the figure was computed from. That guarantee is only possible because the claim was computed, not authored. The model receives a structured object that already says revenue rose 18% across these 340 orders, and its job is to turn that into a sentence and a paragraph of context. It cannot quietly say 19%, because it never held the 340 orders; it held a fact with a fixed shape. That is verification over generation: the claim is checkable because it was computed, not composed.
I saw the same boundary from the other direction at Lyssna, working on AI analysis of user-research studies. The temptation with a pile of interview transcripts is to throw the lot at a model and ask “what did we learn?” You get something that reads like a synthesis and is impossible to defend, because you cannot trace any sentence back to the moment a participant actually said it. The work that mattered was upstream: coding responses, tagging themes, counting how many participants hit a given friction point. Once those counts exist, the model writing “seven of twelve participants stalled at the pricing step” is rendering a fact, and a researcher can click through to all seven. Skip the counting and the same sentence is a plausible fabrication.
Why this is the order that scales
The deterministic side scales because it is ordinary software. It is cacheable, testable, cheap to run, and it does not drift when the model version changes underneath you. The generative side scales because its surface area is small and forgiving: phrasing errors are recoverable in a way that arithmetic errors are not. A clumsy sentence loses you nothing; a wrong total loses you the account.
Reverse the split and you inherit the worst of both. You pay model latency and token cost to do work a database does for free, you get answers that vary between runs, and you cannot prove any of them. Worse, the errors are confident and well-written, which is precisely what makes them dangerous. A wrong number in a clumsy email gets caught. A wrong number in a beautifully argued executive summary gets forwarded.
The reason this matters beyond any one product is that “AI insights” is becoming a default feature request, and the default implementation is the broken one. The fix is not a better model or a longer prompt. It is moving the boundary: compute every fact before the model sees it, and let the model do the one thing it is genuinely better at than your codebase, which is putting those facts into language a person will trust.
Trust is not a property of the model. It is a property of where you drew the line.