Context engineering is the design surface

Prompt engineering treats the words as the lever. Context engineering treats the whole context window as the design surface, with a stage and an artifact for each step. It's the better abstraction for production work.

22 September 2025 5 min read

Prompt engineering treats the wording as the lever you pull. Phrase the request better, add the magic words, and the output improves. That model works for a single turn against a chatbot. It stops working the moment the task is large enough to matter, because the wording was never the whole input. The whole input is the context window: the instructions, the examples, the retrieved material, the prior turns, the structure you imposed on all of it. Context engineering treats that entire window as a design surface you compose on purpose.

The companion to this argument is that you should build context systems instead of crafting one-off prompts. That piece makes the case for the shift. This one goes further into how: what stages the work moves through, and what each stage leaves behind.

A stage produces an artifact

The trap with “context engineering” is that it can stay a vibe. You nod, you agree the context matters, and then you go back to typing paragraphs into a box. The fix is to give the work stages, and to make each stage produce something you can point at. An artifact you can name is an artifact you can reuse, review, and hand to someone else.

I teach this as four stages at Lyssna, in workshops on how teams actually get leverage out of models. The progression is Define, Discover, Design, Develop. Each one owns a question and produces an artifact.

Stage	The question it answers	The artifact it leaves
Define	What is the task, and what does done look like?	A problem statement and explicit success criteria
Discover	What domain material and examples does the model need?	A curated context pack: vocabulary, references, good and bad examples
Design	What structure and framework holds the work?	A scaffold: the framework named, the output shape fixed
Develop	How does it improve across turns?	A working transcript you can rerun and refine

The artifacts are the point. If a stage produces nothing you can save, you skipped it, and the model is now guessing at the thing you didn’t write down.

Define: write the success criteria before the prompt

Most bad output traces back to a task that was never specified. “Make this better” has no done state, so the model invents one, and it rarely matches yours. The Define stage forces the criteria out of your head and onto the page: what the output is for, who reads it, what would make you reject it.

The artifact is small and it does a lot of work. A few lines of success criteria let you judge the output instead of reacting to it. They also become the thing you paste into every later turn, so the model is scored against your bar rather than a generic one.

Discover: vocabulary activates the right regions

This is the stage people skip, and it’s the one that moves quality most. A model has read an enormous amount. The problem is reaching for the right part of it. Domain vocabulary is how you do that. Precise terms act as a key: say “rebase” and “bisect” and the model lands in the part of its training where careful git work lives; say “make the history clean” and you get a vaguer neighbourhood. The words you choose decide which regions of the model’s training light up.

So Discover is deliberate retrieval, done by you before any automated retrieval runs. You assemble the context pack: the terms of art for this domain, the references that define the standard, and examples of both good and bad output. Examples carry more than instructions do; one strong example of the thing you want teaches faster than three paragraphs describing it. The pack is reusable. Build it once for a recurring task and every future run starts from the right place.

Design: a framework is a scaffold the model already knows

Frameworks earn their keep here, and not because they’re clever. A named framework is a structure the model has seen thousands of times in training. Invoke it and you get its shape for free: the steps, the order, the headings, the way the parts relate. You’re not teaching the model the framework. You’re pointing at one it already holds and asking it to pour this task into that mould.

That’s the lever. “Analyse this” is unstructured and the output will be too. “Run a SWOT, then a risk register, then a recommendation” hands the model three scaffolds and an order to fill them in. The artifact of this stage is the scaffold made explicit: the framework named, the output shape fixed, the sections decided before the first word is generated.

Develop: build across turns, don’t shoot once

Single-shot prompting asks for the finished thing in one call. It works for small tasks and fails quietly on large ones, because everything that goes wrong goes wrong at once and you can’t see where. Building across turns separates the failures. You generate a draft, inspect it against the success criteria from Define, correct the one thing that’s off, and continue. Each turn carries the accumulated context forward, so the window gets richer as you go rather than starting cold.

The artifact is the transcript itself. A good multi-turn session is a reusable thing: the sequence of moves that produced a result you trusted. Save it and you’ve captured a workflow, not just an answer.

CLAUDE.md is team infrastructure, not a personal trick

All four stages produce artifacts, and artifacts are exactly what a context file is for. CLAUDE.md (or its equivalent in whatever tool you use) is where the durable parts live: the vocabulary, the success criteria, the frameworks you reach for, the examples of good output. Written down once, loaded every session.

The shift worth naming is from personal trick to shared infrastructure. When the context file lives in your head, the leverage leaves when you do. When it lives in the repo, it’s something a team builds together and improves over time. A new person inherits the accumulated context on day one instead of rediscovering it over months. That is the difference between a clever individual and a team that compounds; the file is where the compounding happens.

Prompt engineering optimises a sentence. Context engineering builds the surface that every sentence lands on, and leaves an artifact at each stage so the next run starts ahead of the last one.