Chain of thought, and where it breaks

Asking a model to reason step by step reliably improves its answers. The catch most advice skips: in a production app, that visible reasoning is often the last thing you want.

12 November 2024 2 min read

Chain-of-thought (CoT) prompting gives a model a roadmap instead of just a destination. You break a complex problem into logical steps and have the model work through them, rather than demanding a single answer. It’s the model showing its work - and unlike school, the working actually helps you.

How it works

Ask a model to predict a stock movement and a naive prompt looks like: “Will Acme Corp go up or down tomorrow?” A CoT prompt guides the reasoning instead:

Analyse Acme Corp’s latest financial reports.
Consider current market trends affecting its industry.
Evaluate recent news and announcements.
Based on the above, predict the likely direction tomorrow.

You’re guiding the process, not extracting a verdict.

Why it helps

Better accuracy. Decomposing the problem makes illogical leaps and hallucinations less likely.
Explainability. The step-by-step trace makes the reasoning legible. No black box.
Harder problems. The model can tackle more nuanced tasks when it works through them.

Where it breaks

This is the part most advice skips. CoT is straightforward in a back-and-forth chat. Inside a deployed application it’s harder, because the reasoning becomes part of the output - and often you don’t want it there.

I made this point in simple prompting tips:

If you are generating something that will be shared with an audience, you don’t want the step-by-step thinking in there.

“Think step by step” works, but only when the reasoning is part of what you actually want to show. To get CoT’s accuracy benefit without leaking the reasoning into production output, you need a couple of extra techniques:

Ask for the reasoning inside <thinking> tags, then strip those out programmatically before showing the result.
Use multi-shot prompting - pass an initial output back to the model so it can refine, then surface only the final pass.

The principle underneath all of it is the same one that governs prompting generally: be clear about what you want the model to produce, and make sure the technique you reach for matches the format you actually need to ship.