{
  "version": "https://jsonfeed.org/version/1.1",
  "title": "Callum van den Enden",
  "home_page_url": "https://cvde.xyz/",
  "feed_url": "https://cvde.xyz/feed.json",
  "description": "Callum van den Enden: product leader who ships production AI systems, and founder of hey anna. Essays and case studies on AI products, the business of building them, and solving problems worth solving.",
  "language": "en",
  "authors": [
    {
      "name": "Callum van den Enden",
      "url": "https://cvde.xyz/about/"
    }
  ],
  "items": [
    {
      "id": "https://cvde.xyz/writing/context-pruning-is-a-bet/",
      "url": "https://cvde.xyz/writing/context-pruning-is-a-bet/",
      "title": "Context pruning is a bet on the future",
      "summary": "When an agent's window fills, the obvious move is to drop the oldest, biggest tool results. That's a cache-eviction bet you can't make optimally without seeing the future, and the right one depends entirely on your workload.",
      "content_html": "<p>When an agent’s context window fills up, the obvious move is to prune: drop the oldest, biggest tool results and keep going. It reads as good hygiene. It is usually a bet, and on the wrong workload a losing one.</p>\n<p>The reframe that makes the trade-offs visible: the window is a cache, pruning is <span class=\"term\" data-term=\"eviction\" data-def=\"Removing an item from a cache to free space, ideally one that will not be needed again.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"Removing an item from a cache to free space, ideally one that will not be needed again.\" data-note=\"Context pruning is eviction by another name, and it inherits eviction's hard limit: you cannot do it optimally blind.\">eviction</span>, and eviction is one of the few problems in computing we can prove you cannot do optimally without seeing the future. Once you hold it that way, “just trim the old stuff” stops looking like obvious hygiene and starts looking like what it is - a wager about what you’ll need again. This is the management half of the problem that <a href=\"/writing/context-engineering/\">context engineering</a> only sets up; composing the window is one job, keeping it useful as it grows is another.</p>\n<h2 id=\"the-cost-axis-is-flatter-than-you-think\"><a class=\"heading-anchor\" href=\"#the-cost-axis-is-flatter-than-you-think\">The cost axis is flatter than you think</a></h2>\n<p>The instinct to prune is mostly about money, and money is the wrong axis. Anthropic’s prompt cache charges a cache read at roughly 10% of the input price. A warm window is already a tenth of full freight, so carrying it is cheap. Aggressively shrinking a warm window trades that cheap read for a full-price rewrite of everything after the cut. Pruning to save money can cost money. 
<span class=\"term\" data-term=\"inference\" data-def=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" tabindex=\"0\" aria-describedby=\"term-def-2\" title=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" data-note=\"The variable cost that makes AI unit economics unlike old SaaS: you rent it by the token, on every turn, forever.\">Inference</span> is <a href=\"/writing/unit-economics-of-a-one-person-ai-product/\">a real marginal cost on every turn</a>, but the cache is what flattens it, and pruning fights the cache.</p>\n<p>If cost is not the axis, two things are. The first is attention quality: a window stuffed with stale output buries the signal the model needs, and a model reasoning over its own noise gets worse in ways no invoice shows you. The second is the ceiling: every model has a hard token limit, and a long session walks toward it whether you like the economics or not. Those are the honest reasons to prune. 
Naming them changes what a good policy looks like, because a policy tuned for cost and a policy tuned for attention are not the same policy.</p>\n<h2 id=\"every-edit-to-the-prefix-has-to-earn-its-rewrite\"><a class=\"heading-anchor\" href=\"#every-edit-to-the-prefix-has-to-earn-its-rewrite\">Every edit to the prefix has to earn its rewrite</a></h2>\n<p>The cache has one rule: the <span class=\"term\" data-term=\"prefix\" data-def=\"The leading run of tokens in a request that a cache matches byte-for-byte before reusing earlier work.\" tabindex=\"0\" aria-describedby=\"term-def-3\" title=\"The leading run of tokens in a request that a cache matches byte-for-byte before reusing earlier work.\" data-note=\"Editing anything inside the cached prefix invalidates everything after it, so each prune pays a full-price rewrite of the suffix.\">prefix</span> you send must match the prefix already in the cache byte-for-byte, up to the point you mark. Appending a new message keeps the prefix intact and stays cheap. Editing history does not. Change a byte and everything after it is invalidated and re-billed at full price.</p>\n<p>A prune is an edit. It deletes bytes in the middle of the cached region, which forces a rewrite of the entire suffix that follows. Sometimes that is worth paying. The point is that it is never free, so any pruning policy is spending real money each time it fires, and a policy that fires on a bad guess is spending it for nothing.</p>\n<h2 id=\"why-blind-pruning-thrashes\"><a class=\"heading-anchor\" href=\"#why-blind-pruning-thrashes\">Why blind pruning thrashes</a></h2>\n<p>The standard policy is age plus size: evict any result older than N turns and bigger than M tokens. That rule is a bet that old and big implies never needed again. When the bet is wrong - a large result referenced again just outside the retention window - you get a loop. You evict it, the model asks for it, you re-inflate it, it ages back out, you evict it again. 
Evict, recall, re-inflate, evict. Each cycle pays a rewrite and leaves dead stubs behind, for negative value.</p>\n<p>This is not a tuning problem you can dial away. It is <span class=\"term\" data-term=\"Belady's result\" data-def=\"The result from cache theory that no eviction policy can be optimal without knowing the future sequence of accesses.\" tabindex=\"0\" aria-describedby=\"term-def-4\" title=\"The result from cache theory that no eviction policy can be optimal without knowing the future sequence of accesses.\" data-note=\"Why blind context pruning is a bet: you are guessing which past results you will need again, and you cannot prove the guess right.\">Belady</span>’s result from the page-replacement literature: the provably optimal eviction policy requires knowing the future sequence of accesses, which you do not have. Every real policy approximates the future from the past, and age-and-size is a crude proxy for the future. On the wrong workload that proxy is worse than doing nothing.</p>\n<p>So a pruning policy has two honest forms. Drop only what is provably dead, or learn from what gets recalled. Everything in between is a guess.</p>\n<h2 id=\"the-two-safe-moves-and-the-one-to-refuse\"><a class=\"heading-anchor\" href=\"#the-two-safe-moves-and-the-one-to-refuse\">The two safe moves, and the one to refuse</a></h2>\n<p>The first honest form is <strong>provably-dead hygiene</strong>. Some results are dead with certainty, not by age guess. A file read whose contents were overwritten by a later write in the same session. The same query run twice with identical parameters. Those bytes can never be correct or useful again, so dropping them cannot thrash - there is nothing to recall, because the future access does not exist by construction. This is the only kind of prune that is unconditionally safe, and it is the one to reach for first.</p>\n<p>The second is <strong>offload, don’t delete</strong>. 
Instead of removing a large ageing result, replace it in the live window with a one-line stub that points to the verbatim original in your own storage. If the model needs it, it pulls the exact bytes back from your store, never by re-running the tool against the external source. And the rule that makes this correct rather than fragile: once a result has been recalled, exempt it from eviction. A recall is proof the result is in the <span class=\"term\" data-term=\"working set\" data-def=\"The set of items actively in use over a window of time, as opposed to merely present.\" tabindex=\"0\" aria-describedby=\"term-def-5\" title=\"The set of items actively in use over a window of time, as opposed to merely present.\" data-note=\"A result that gets recalled is proof it is in the working set, so exempt it from eviction rather than dropping it again.\">working set</span>, so the policy learns from its own mistakes instead of re-evicting the same bytes every few turns. That single exemption is the whole difference between an offload tier that helps and one that thrashes.</p>\n<aside class=\"callout callout--warning\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:triangle-alert\">   <symbol id=\"ai:lucide:triangle-alert\" viewBox=\"0 0 24 24\"><path fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\" d=\"m21.73 18l-8-14a2 2 0 0 0-3.48 0l-8 14A2 2 0 0 0 4 21h16a2 2 0 0 0 1.73-3M12 9v4m0 4h.01\"/></symbol><use href=\"#ai:lucide:triangle-alert\"></use>  </svg> <span data-astro-cid-pyumqe5w>Warning</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>The move to refuse is blind age-based <span class=\"term\" data-term=\"elision\" data-def=\"Dropping or omitting content to leave something shorter.\" tabindex=\"0\" aria-describedby=\"term-def-6\" 
title=\"Dropping or omitting content to leave something shorter.\" data-note=\"Here: trimming old turns from the context window to make room. Cheap to do, easy to get wrong, because what you drop you may need again.\">elision</span> with no recall signal and no test for provable death. It bets on the future with nothing to inform the bet. On a forgiving workload it is merely wasteful; on an unforgiving one it degrades the session while presenting as good housekeeping.</p> </div> </aside>\n<h2 id=\"fit-for-purpose-the-workload-decides\"><a class=\"heading-anchor\" href=\"#fit-for-purpose-the-workload-decides\">Fit for purpose: the workload decides</a></h2>\n<p>None of this has a universal answer, which is the actual point. The right policy is a function of your workload and your cost structure, and those vary widely; the policy has to follow them rather than a default.</p>\n<p>A coding agent lives on read-edit-read churn. It reads a file, edits it, reads it again, and that first read is now genuinely dead. This workload manufactures provably-dead results by the dozen, so hygiene alone reclaims a lot of window at zero risk, and an offload tier earns its keep on top.</p>\n<p>A data-analysis agent looks nothing like that. A schema it pulled an hour ago, a statistical result from turn three - those stay live, and they get referenced again when the agent writes its conclusion. The same age-and-size rule that is roughly safe on the coding agent misfires here, because on this workload “old” simply does not predict “dead.” The results age without dying, and a policy that confuses the two throws away the working set.</p>\n<p>And when the problem is just that the session got long, pruning is often the wrong instrument entirely. 
<span class=\"term\" data-term=\"compaction\" data-def=\"Summarising a transcript into working memory and dropping the raw turns, keeping the conclusions and losing the chatter.\" tabindex=\"0\" aria-describedby=\"term-def-7\" title=\"Summarising a transcript into working memory and dropping the raw turns, keeping the conclusions and losing the chatter.\" data-note=\"The right tool when a session got long, where pruning individual results is the wrong one.\">Compaction</span> - summarising the transcript into working memory and dropping the raw turns - preserves the hard-won conclusions better than any rule guessing which raw bytes to discard. It is lossy on purpose, and the skill is choosing what to lose; a summary that keeps the chatter and drops the conclusions is worse than no summary at all. Different problem, different tool.</p>\n<h2 id=\"measure-before-you-wire\"><a class=\"heading-anchor\" href=\"#measure-before-you-wire\">Measure before you wire</a></h2>\n<p>The order that keeps you honest is instrument first, read the distributions, then decide which policy, if any, earns its complexity. How long do sessions actually get? How often does a pruned result get recalled? How much of your window is provably dead on a real trace rather than a hypothetical one? Those numbers tell you whether you need hygiene, offload, compaction, or nothing at all, and they routinely say “less than you assumed.”</p>\n<p>The instinct is to reach for the cleverest policy; the discipline is to use the one your workload justifies. That’s usually the simplest, and sometimes it’s none. Pruning is a bet on data you don’t have yet, so make the smallest bet that works and check the numbers.</p>",
      "date_published": "2026-06-02T00:00:00.000Z",
      "date_modified": "2026-06-02T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai-product",
        "engineering",
        "context-engineering"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/google-wins-consumer-ai/",
      "url": "https://cvde.xyz/writing/google-wins-consumer-ai/",
      "title": "Google wins consumer AI on distribution",
      "summary": "Intelligence is commoditising, so the model stops being the moat. What's left is distribution and a business that profits from giving intelligence away, and Google is the only company with both - fighting Nvidia, Apple, Amazon, Microsoft and Meta each on one front while it works all five.",
      "content_html": "<p>Two billion people a month use a frontier AI model they never chose. No download, no sign-up, no new habit; Google put a Gemini-written answer at the top of the search results they were already going to read, and the choice got made for them with a server-side change. That is worth more than it looks, and seeing why means starting somewhere counterintuitive: the model itself is the part of this that matters least.</p>\n<h2 id=\"intelligence-is-the-part-that-commoditises\"><a class=\"heading-anchor\" href=\"#intelligence-is-the-part-that-commoditises\">Intelligence is the part that commoditises</a></h2>\n<p>The frontier labs are converging, and the lead is now measured in months. China is the clearest tell. DeepSeek’s V4, released in April 2026, lands within three to six months of GPT-5.4 and Gemini 3.1 Pro, beats every open model on maths and coding, and trails only a closed model on world knowledge - built in large part by <span class=\"term\" data-term=\"distillation\" data-def=\"Training a smaller, cheaper model on the outputs of a larger one to copy most of its capability.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"Training a smaller, cheaper model on the outputs of a larger one to copy most of its capability.\" data-note=\"How a follower catches the frontier a quarter or two after it is set, for a fraction of the cost.\">distilling</span> the very frontier it’s chasing, training a cheap model on the expensive ones’ outputs. The accusation that this is industrial-scale copying is probably true and nearly beside the point; distillation works, the floor keeps rising, and the ceiling gets matched a quarter or two after it’s set.</p>\n<p>Intelligence is going the way every digital capability goes once several funded teams chase the same target: towards good-enough and cheap. That doesn’t make the models worthless. It makes them a poor place to build a moat. 
If the plan is to win consumer AI by holding the smartest model, the position resets every few months, against competitors who can rent or distil most of the advantage away. The question worth asking is what doesn’t commoditise, because that is where the market actually gets decided.</p>\n<h2 id=\"distribution-is-the-part-that-doesnt\"><a class=\"heading-anchor\" href=\"#distribution-is-the-part-that-doesnt\">Distribution is the part that doesn’t</a></h2>\n<p>What doesn’t commoditise is the thing standing between a model and a person: the surface it lives on, and the habit of reaching for it. That was always the hard part of consumer software - not building something good, but getting a human to form the habit of opening it. OpenAI did exactly that from nothing, which is a real achievement; ChatGPT is one of the fastest habits the consumer internet has formed, 900 million people opening it every week. But it had to build that distribution a download at a time. Google built none. It already owns the surfaces where billions of people start the day - the search bar, the browser, the phone, the inbox, the map, the video - and it can put a model on all of them with a configuration change rather than a marketing budget.</p>\n<p>This is where <a href=\"/writing/ai-is-an-interface/\">AI as the interface</a> stops being an abstraction. If the durable job of AI is to be the layer you speak to instead of the system you learn, the company that already owns the surfaces people speak into has the shortest path from a new model to a billion users of it. Apple holds the matching distribution, the other half of the world’s phones, and no frontier model of its own - so little of one that it is now paying Google around a billion dollars a year for a custom Gemini to run the rebuilt Siri, white-labelled so the user only ever sees Siri, shipping to roughly 1.5 billion devices. 
Google’s model is about to power the assistant on both of the platforms people actually carry, its own and its only rival’s. The distance from “we have a new model” to “it is in front of everyone” is, for Google, a deploy; on the iPhone, it’s Apple’s.</p>\n<h2 id=\"context-is-the-other-thing-that-doesnt\"><a class=\"heading-anchor\" href=\"#context-is-the-other-thing-that-doesnt\">Context is the other thing that doesn’t</a></h2>\n<p>There is a second moat hiding behind the first. Once intelligence is a commodity, usefulness stops being a question of how smart the model is and becomes a question of how much it knows about you. A brilliant model with no context gives you a brilliant generic answer; a fair model that knows your calendar, your inbox, your last ten searches, where you drove this morning and what you watched last night gives you the answer you actually wanted. Usefulness is intelligence times context, and the second term is where the contest moves once the first one flattens.</p>\n<p>Nobody has more context on more people than Google. Gmail, Calendar, Maps with your location history, Search with everything you have ever asked, YouTube with everything you have watched, Photos with your life in it, Drive and Docs with your work, Android in your pocket, Home on your kitchen bench. That is the most complete picture of a person any company has assembled, and it is precisely the raw material a commodity model needs to stop being generic. The assistant-everywhere position feeds it further: powering Siri and Gemini across both platforms is a firehose of real-world use and training signal, and good data and real use are themselves among the strongest determinants of where a model ends up. So even if Apple builds its own model and pulls the rug in a year, Google spends that year compounding its lead in the one input that matters most. The model can be swapped out. 
The years of context it was tuned against cannot be handed back.</p>\n<h2 id=\"consumers-wont-pay-and-google-doesnt-need-them-to\"><a class=\"heading-anchor\" href=\"#consumers-wont-pay-and-google-doesnt-need-them-to\">Consumers won’t pay, and Google doesn’t need them to</a></h2>\n<p>Distribution and context would matter less if consumer AI were a good business to be in directly. It isn’t. Consumers are famously bad at paying for software; the <a href=\"/writing/stated-versus-revealed-preference/\">revealed preference</a> of the median user is free, with a hard ceiling on what they’ll convert to a subscription, while <a href=\"/writing/unit-economics-of-a-one-person-ai-product/\">every query still costs real money to serve</a>. That is the quiet bind under the subscription AI companies: <span class=\"term\" data-term=\"inference\" data-def=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" tabindex=\"0\" aria-describedby=\"term-def-2\" title=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" data-note=\"The variable cost that makes AI unit economics unlike old SaaS: you rent it by the token, on every turn, forever.\">inference</span> is a marginal cost on every use, most users never pay, and the product they’re selling - intelligence - is the exact thing commoditising underneath them. Selling a melting asset to people who don’t like paying is a hard place to build something durable.</p>\n<p>Google is not in that business. It never sold intelligence; it sells attention, and monetises that attention through ads. So Gemini does not have to make money. Its job is to keep the customer inside Google’s surfaces long enough for the machine that does make money to run. AI as the interface to all of Google’s other products means the model isn’t the thing being monetised - it’s the funnel into the things that are. 
Giving intelligence away free is not a concession Google is forced into; it is the rational move for the one company that profits from the attention rather than the tokens.</p>\n<p>Which is why commoditised intelligence, the thing that threatens the subscription players, is a tailwind for Google specifically. As the price of intelligence falls towards zero, the company hurt most is the one whose product was the intelligence. The company helped most is the one that wanted to give it away anyway, because the give-away is what holds the attention it actually sells. The same trend is a headwind for one business model and a subsidy for the other.</p>\n<h2 id=\"the-same-ai-pays-for-itself-twice\"><a class=\"heading-anchor\" href=\"#the-same-ai-pays-for-itself-twice\">The same AI pays for itself twice</a></h2>\n<p>It gets better for Google than “the AI keeps the attention,” because the same AI also monetises that attention harder. The fear for years was that an AI answer kills the search results page: resolve the question and you’ve removed the ten blue links and the ads stacked above them. Google’s answer is to rebuild the ad unit inside the answer. It is now testing conversational ad formats and AI-powered shopping ads directly in AI Mode, with Gemini writing the ad creative to fit the specific question, on a surface Google says has crossed a billion monthly users. The AI answer becomes new ad inventory rather than the end of the old one. Whether the yield per query matches the old page is the open question, and I’ll come back to it; the direction is to monetise the AI surface, not retreat from it.</p>\n<p>The other half is ranking, and here Google is following Meta’s lead. Meta has spent the last couple of years pointing models with the scale and complexity of frontier AI at a narrower problem than chat: which ad to show, to whom, right now. 
Its Adaptive Ranking Model, live across Instagram in 2026, reads far more signals - including what people do with Meta’s own AI - to match an ad to the person most likely to act on it, and the lift is real. Google is doing the same across its surfaces: AI Max in Search, and on YouTube the kind of personalisation that used to be close to impossible, where a model now reads both the viewer and the video they’re watching to place the ad that fits both. So the AI spend returns at both ends of the same pipe: as the interface that holds the attention coming in, and as the ranking engine that wrings more out of it going out. One investment, two returns, on what was already a highly profitable advertising machine.</p>\n<h2 id=\"five-fronts-one-valuation\"><a class=\"heading-anchor\" href=\"#five-fronts-one-valuation\">Five fronts, one valuation</a></h2>\n<p>All of this rests on something rarer than any single product: Google is the only company holding every layer of the stack at once. The frontier lab is its own, DeepMind. The silicon is its own, the TPU, so it trains and serves without paying the Nvidia tax or waiting in the Nvidia queue. The cloud is its own. The models are its own. The surfaces that carry them to people are its own. That vertical integration is exactly what lets it give intelligence away and still profit; it captures the value downstream, on infrastructure it owns end to end, instead of renting a layer from a competitor.</p>\n<p>The breadth is easy to under-feel because it’s smeared across so many products. Search, the most-used browser, the most-used mobile operating system, the most-used video platform, billion-user mail and maps and photos, the documents and drives where people keep their work, the speaker on the kitchen bench, the third-largest cloud, the leading robotaxi business, its own AI silicon, a frontier lab. 
Name a layer of the AI stack or a consumer surface that matters, and Google is first, second, or third in it.</p>\n<p>That sets up the genuinely strange part. Each company Google competes with fights on one front. Nvidia sells the silicon. Apple sells the devices. Amazon sells the cloud. Microsoft sells the enterprise seat. Meta sells the same thing Google does - attention, monetised by ads - and is the one rival on exactly the front AI touches hardest, building frontier-scale models both to win the consumer’s time and to place the ad against it. Google competes with all five at once, on each of their home grounds, and is a top-three player in every one of those markets. And it trades in the same band as any single one of them: Alphabet crossed four trillion dollars in January 2026 and passed Apple to sit second only to Nvidia, around 4.6 trillion against Nvidia’s 5.2, Apple’s 4.5 and Microsoft’s 3.1. The market spent years pricing it at a discount on the fear that AI would eat search. The re-rating since is the slow recognition that the company best placed to own the AI replacing search is the one that already owns search. It also owns the distribution, the model, the chips, and the ad machine that pays for all of it.</p>\n<h2 id=\"the-case-against\"><a class=\"heading-anchor\" href=\"#the-case-against\">The case against</a></h2>\n<p>The honest version has to hold the strongest form of the other side, and there is a real one.</p>\n<p>Start with the cannibalisation that an earlier section waved past. Even with ads rebuilt inside the AI answer, a single resolved answer exposes far less monetisable surface than a page of ten links with four ads stacked on top. Google is betting it can reconstruct equivalent yield on a sparser surface, and that bet isn’t won; the search results page is the most profitable real estate ever built, and Google has to rebuild it in flight, doing to itself the thing a competitor would need years to do to it. 
Get the new yield wrong and the most reliable profit engine in technology degrades on purpose.</p>\n<p>Reach is also not preference. The two billion are largely passive, served a summary they didn’t ask for, while the deliberate, high-intent relationship - the assistant you open on purpose to do real work - is where ChatGPT leads, along with revenue per user. If the valuable AI habit turns out to be the deliberate one, impressions matter less than engagement and Google’s headline number flatters its position. OpenAI is building its own distribution too, through devices and the app habit, so the gap that looks decisive today may not stay this wide.</p>\n<p>Commoditised intelligence cuts both ways, as well. If the model edge erodes for everyone, it erodes for Google, and the protection that remains - distribution plus an ad machine - is the one thing Meta also has, with three billion users and an ad system of its own. The attention war may be a two-incumbent grind rather than a Google walkover. And the lever that makes Google’s distribution unbeatable is the lever regulators most want to pull: the default-search deal, the Android bundle, and the ad-tech stack are all live antitrust exposure, and a forced unwind of any of them would blunt the sharpest edge in the whole argument.</p>\n<p>The most immediate problem, though, is Google’s own. Gemini still trips on things it shouldn’t, and the company’s famously fragmented product structure shows in experiences that don’t talk to each other; the context it holds is scattered across teams and apps that were never built to share it, which is half of why the assistant doesn’t yet feel like it knows you as well as it plainly could. But notice the shape of that problem. It is execution, not structure - the assets are all there and pointed slightly the wrong way, which is the tractable kind of problem, the kind you fix with org will and a few quarters, not the kind that needs you to go and acquire something you don’t own. 
The structural advantages are the ones that are hard to build, and those Google already has.</p>\n<p>Weigh all of it, and the case still lands, because the bet was never that Google has the best model. It is that the best model won’t be the thing that matters. Intelligence is commoditising; distribution and context aren’t; and the company that wins consumer AI is the one that already has the attention, knows the most about the people it’s serving, profits from giving the intelligence away, and turns the same AI into a sharper way to monetise the attention it holds. Google is the only one with all of it, sitting on the only stack that owns every layer from the sand to the search box. The model will be matched. The distribution, the context, the business model, and the machine underneath them are the parts that won’t.</p>\n<p>Once the model is smarter than us at almost everything, owning the smartest one stops mattering. Knowing the user does.</p>",
      "date_published": "2026-05-25T00:00:00.000Z",
      "date_modified": "2026-05-25T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "gtm",
        "economics",
        "ai"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/mechanistic-interpretability-as-art/",
      "url": "https://cvde.xyz/writing/mechanistic-interpretability-as-art/",
      "title": "Mechanistic interpretability as generative art",
      "summary": "If a network has learned a concept, that concept is a location in its embedding space. Steer a generator toward that location and you don't get a diagram of what the model knows - you get a picture of it.",
      "content_html": "<p>A neural network that has learned the concept “ocean” has not learned a definition. It has learned a <em>location</em> - a point, or more honestly a region, in a high-dimensional space where the things it associates with oceans cluster together. Interpretability research usually treats that fact as something to diagram: probe the space, label the axes, write the paper. I became more interested in a different question. If the concept is a place, what does it look like to <em>go</em> there?</p>\n<p>That question turned into a project. It optimises generative outputs - images, audio, video - toward target coordinates in <a href=\"https://github.com/PKU-YuanGroup/LanguageBind\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">LanguageBind<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>’s 768-dimensional multimodal <span class=\"term\" data-term=\"embedding space\" data-def=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" data-note=\"A learned concept is a location here; steer a generator towards that point and you get a picture of how the model holds it.\">embedding space</span>, doing concept algebra across modalities to surface the platonic ideals the network has learned. 
The honest description is that it sits exactly on the seam between research and aesthetics, and that seam is more interesting than either side of it alone.</p>\n<h2 id=\"one-space-for-everything\"><a class=\"heading-anchor\" href=\"#one-space-for-everything\">One space for everything</a></h2>\n<p>What makes LanguageBind useful here is that it embeds image, text, audio, and video into a <em>single</em> shared space, anchored on language. A photo of a beach, the word “beach,” and the sound of waves all land near each other, because the model was trained to align them. The space doesn’t care which door a concept came in through. “Ocean” is a region whether you arrived by image, by word, or by sound. Anchoring on language earns one more thing: any point in the space can be projected back toward the text vocabulary, so even a non-text coordinate has a running readout in words. You can ask what the current image embeds as and get back something like <code>{storm, crackling, electric}</code>, which turns the optimisation from a black box into something you can follow as it runs.</p>\n<p>That shared geometry is what lets you do something that shouldn’t intuitively work: algebra across senses. Take the embedding of a sound, subtract the embedding of a word, add the embedding of an image, and you land somewhere new - a coordinate that no single input could have named. 
The classic <code>king − man + woman ≈ queen</code> move from word vectors, except the operands can be a photograph, a field recording, and a sentence, mixed freely.</p>\n<h2 id=\"generation-as-a-search-for-a-place\"><a class=\"heading-anchor\" href=\"#generation-as-a-search-for-a-place\">Generation as a search for a place</a></h2>\n<p>Here’s the move that turns the geometry into pictures. Pick a target coordinate - a concept, or a piece of concept algebra. Then run a generator (a diffusion model for images, the equivalents for audio and video) and optimise its output so that <em>its</em> embedding lands as close as possible to that target. You’re not prompting “draw an ocean.” You’re telling the system: produce something - anything - that this network would file in the same place it files oceans, and let the network be the judge.</p>\n<p>The output is not an illustration of the concept. It’s the generator’s best attempt to <em>occupy the coordinate</em>. Sometimes that’s a recognisable beach. More often it’s stranger and more revealing: the features the network most strongly associates with the region, rendered without the constraint of looking like any real photograph. You are seeing the concept the way the model holds it, not the way the world presents it.</p>\n<h2 id=\"why-this-is-interpretability-not-just-a-filter\"><a class=\"heading-anchor\" href=\"#why-this-is-interpretability-not-just-a-filter\">Why this is interpretability, not just a filter</a></h2>\n<p>It would be easy to dismiss this as a stylised image generator. The reason it’s interpretability is that the output is <em>diagnostic</em>. When you steer toward a concept and the result is surprising - when “trust” renders as something you wouldn’t have predicted, or when an audio target produces an image whose logic only makes sense once you hear the sound - you’ve learned something concrete about how the network organises that region of its space. 
The art is the readout.</p>\n<p>Concept algebra is where this gets sharpest. If <code>summer − heat</code> lands somewhere coherent, the model has factored those two ideas apart cleanly. If it lands in noise, it hasn’t - the concepts are entangled in a way the geometry won’t let you separate. The generated output makes that legible in a way a cosine-similarity table never does. You can <em>see</em> whether the model’s internal world is well-organised, and where it isn’t.</p>\n<p>This is the same instinct that drives the better interpretability work in the field - the feature-visualisation lineage, the “what is this neuron looking for” question - pointed at the multimodal case and pushed until the answer is an image.</p>\n<h2 id=\"why-the-aesthetics-arent-decoration\"><a class=\"heading-anchor\" href=\"#why-the-aesthetics-arent-decoration\">Why the aesthetics aren’t decoration</a></h2>\n<p>I want to defend the art half directly, because the reflex is to treat it as a decorative wrapper on the research. It isn’t decorative. It’s the most faithful available rendering of an object - a learned concept - that has no native visual form.</p>\n<p>A concept in embedding space is genuinely 768-dimensional. Any honest depiction of it has to throw most of those dimensions away. A scatter plot throws away all but two and asks you to trust the projection. A generated image throws away fewer, because the generator is trying to satisfy the full target coordinate at once, across every dimension simultaneously. The image is a lossy compression of the concept, but it’s a <em>richer</em> lossy compression than the chart - and it carries information the chart structurally cannot. 
The aesthetics are doing epistemic work.</p>\n<p>There’s also a claim hiding in the word “platonic.” If many different networks, trained on different data, converge on similar internal structure - and there’s a growing body of evidence they do - then steering toward a concept and rendering it isn’t just visualising one model’s quirks. It’s getting at something closer to the shape the concept takes whenever a system this size learns it from the world. That’s a stronger claim than “pretty pictures from a model,” and it’s the one the work is actually making.</p>\n<h2 id=\"the-seam-is-the-point\"><a class=\"heading-anchor\" href=\"#the-seam-is-the-point\">The seam is the point</a></h2>\n<p>The cleanest framing I have: most interpretability tells you <em>that</em> a model represents a concept and roughly where. This tries to show you <em>what it’s like</em> for the model to represent it. That’s a question with a research answer and an aesthetic answer at the same time, and refusing to separate them is deliberate. The picture is the finding.</p>\n<p>The code is on <a href=\"https://github.com/cal-van/multimodal-embedding-art\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">GitHub<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a> if you want to steer toward your own coordinates and see what the network thinks lives there.</p>",
      "date_published": "2026-05-23T00:00:00.000Z",
      "date_modified": "2026-05-28T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "interpretability",
        "generative-art"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/ai-is-an-interface/",
      "url": "https://cvde.xyz/writing/ai-is-an-interface/",
      "title": "AI is an interface",
      "summary": "The interface was always the tax you paid to use a system: its menus, its query language, its API. AI's most durable job is to take that tax off, by turning plain language into the task you meant and the systems that carry it out.",
      "content_html": "<p>For most of computing history, the interface was the tax you paid to use a system. You learned its menus, its fields, its query language, its API. The software held a model of the world, and using it meant translating what you wanted into that model by hand. The people fluent in that translation were a profession.</p>\n<p>AI’s most durable job is to take that tax off, and the job is narrower than the hype around it: it is the interface. You say what you want in your own words, and the system does the translation that used to be yours to do. It maps your intent onto the task you meant, and onto the operations across whatever tools carry it out. Plain language in; the right actions in the right systems out.</p>\n<p>It is worth being precise, because “AI” gets stretched to cover everything. This is the narrow claim, and it is the one that holds: AI as the layer that takes “which segment is churning, and why” and turns it into the joins, the queries, the significance tests, and the calls to the four systems that actually hold the answer. The intelligence can sit wherever it sits. The interface is the part that just got rewritten, and the rewrite is where most of the value is.</p>\n<p>You can see why by looking at what the old interface gated. Analytics lived behind SQL and a BI tool, so the people who could ask the warehouse a question were the people who had learned its dialect; the analyst was, in effect, a human interface between a manager’s question and the database. Every system worth using carried a toll like that: a language, a console, a certification. When the interface becomes plain language, the toll comes off, and the person with the question and the person who can phrase it in the tool’s grammar become the same person.</p>\n<p>For anyone building, that moves the hard part. 
If AI is the interface, the product is the mapping: intent to task to systems, done reliably, against real data and real tools that are messy in all the ordinary ways. The model is the commoditising half. The half that earns its keep is the binding between a sentence and the operations it should set off, done so the result can be trusted. That becomes its own discipline the moment a wrong mapping is an action taken rather than a paragraph produced.</p>\n<p>It hits product design hardest, because design was the interface. The job used to be drawing the screens: every state, every control, every path a user might take, rendered as something to click. When language carries the range instead, that surface shrinks; you don’t draw a screen for each of the thousand things someone might ask, because the model takes the input you never enumerated. The pixel-level core of the role is exactly the part that flattens, into the same conversation the PM and the engineer are now having, because <a href=\"/writing/the-four-mode-pm/\">the four modes were always one feedback loop</a> and the interface was the seam holding them apart.</p>\n<p>What’s left for design is the part that was always the harder half. Deciding where language is the right interface and where a <a href=\"/writing/not-every-ai-feature-is-a-chat/\">direct control still wins</a>. Shaping the tools the model composes so the pieces stay legible. Designing how the system shows its confidence, shows its working, and <a href=\"/writing/the-trust-calibration-tax/\">recovers when it’s wrong</a>, which is where trust is won or lost. And the judgment under all of it: what good output looks like, what the defaults should be, which few surfaces still deserve to be crafted by hand. That is not less design. It is design measured in whether the thing can be trusted and understood rather than in how many screens got drawn. 
The new role looks more like designing a system’s behaviour than decorating its surface.</p>\n<p>The shift was never that the machine got clever. It is that you stopped having to think like the machine in order to use it. That was always the tax. Taking it off is most of what AI is for.</p>",
      "date_published": "2026-05-22T00:00:00.000Z",
      "date_modified": "2026-05-22T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "ai-product",
        "product"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/coverage-synthesis-intent/",
      "url": "https://cvde.xyz/writing/coverage-synthesis-intent/",
      "title": "Three gaps: coverage, synthesis, intent",
      "summary": "Most AI insights requests get treated as a synthesis problem. That's the wrong reframe. There are three stacked gaps - coverage, synthesis, intent - and you can't skip a layer without trust collapsing underneath you.",
      "content_html": "<p>When someone asks an AI product for insights, they are usually handed a synthesis. “Here are the three themes from your data.” This is the wrong shape of answer, and it fails in a way that is hard to see at first because the synthesis reads beautifully. The problem is that synthesis is the middle of three stacked gaps, and a clean middle layer built on a broken bottom or pointed at the wrong top produces something confident and useless.</p>\n<p>I learned to name the three gaps building AI study-analysis at Lyssna, where the data was raw user research and the readers were people whose decisions depended on getting it right. The gaps stack. You cannot skip one, and the layer where you fail determines how the failure looks.</p>\n<h2 id=\"coverage-getting-the-data-analysable-at-all\"><a class=\"heading-anchor\" href=\"#coverage-getting-the-data-analysable-at-all\">Coverage: getting the data analysable at all</a></h2>\n<p>The bottom gap is the least glamorous and the most fatal. Coverage is the problem of getting raw, messy input into a form you can actually analyse. Research data is open-text answers, interview transcripts, half-finished sessions, the participant who misread the question, the response in a second language, the recording that cut out. Before you can find a theme you have to have read everything, parsed it, and represented it in a structure that supports counting.</p>\n<p>The old truism still holds: garbage in, garbage out. The best synthesis in the world can’t rescue data that was never read properly to begin with. What has changed is the tool you reach for. Most coverage work is unstructured text - open-text answers, comments, reviews, support tickets - and for years the only way to process it at scale was to write code against natural-language-processing packages that were mediocre at language. The large language model is the better instrument, which shouldn’t surprise anyone: language is the thing it does best. 
You can have it read each response and pull out the real category, the sentiment, the detail you need, or translate it before any analysis runs, in plain language rather than through a brittle parser.</p>\n<p>This is the first problem <a href=\"/work/hey-anna-founding/\">hey anna</a> was built to solve. An AI formula run down every row of a spreadsheet turns a column of messy free text into something analysable: the true category, the sentiment, a clean translation across languages, before any of the actual analysis begins. That is the wrangling stage, finally done with the right tool for language.</p>\n<p>Failure here is silent, which is what makes it dangerous. Suppose your pipeline quietly drops the responses it could not parse, and those happen to be the longest and angriest ones, because long angry text breaks parsers. Your synthesis now says “users are broadly satisfied” and it is wrong at the root. Nobody can see the error by reading the output, because the missing data left no hole on the page. The synthesis is internally flawless. It is just describing the data that survived, not the data you collected.</p>\n<p>Most teams underinvest here because coverage work is unrewarding and invisible when it succeeds. It is also the only layer where being 90% complete can be worse than useless, because the missing 10% is rarely random.</p>\n<h2 id=\"synthesis-rolling-findings-up\"><a class=\"heading-anchor\" href=\"#synthesis-rolling-findings-up\">Synthesis: rolling findings up</a></h2>\n<p>The middle gap is the one everyone means when they say “insights.” Synthesis takes the analysable data and rolls it into something higher-order: the themes, the patterns, the “seven of twelve participants stalled at pricing.” This is real work and models are genuinely good at the language of it.</p>\n<p>Failure here looks like over-confident pattern-finding. 
The model sees three responses that rhyme and declares a theme; it weights a vivid quote over a common one; it smooths twelve messy answers into a clean narrative that none of the twelve people would recognise. The output is plausible and slightly invented, and the only defence is to keep every synthesised claim traceable to the underlying responses, so a reader can click a theme and see the answers that produced it. Synthesis you cannot trace back to coverage is just well-phrased guessing.</p>\n<p>But here is the trap: a team that attacks synthesis first, before coverage is solid, gets a layer that works perfectly on the data it can see and lies about the data it cannot. The synthesis layer cannot detect that the floor beneath it has holes. It will confidently summarise a biased sample forever.</p>\n<h2 id=\"intent-what-the-person-was-trying-to-learn\"><a class=\"heading-anchor\" href=\"#intent-what-the-person-was-trying-to-learn\">Intent: what the person was trying to learn</a></h2>\n<p>The top gap is the one that gets ignored entirely, and it sits above synthesis. Intent is knowing what the person was actually trying to learn when they asked. The same dataset answers different questions, and a synthesis aimed at the wrong question is wasted no matter how good the two layers below it are.</p>\n<p>A product manager asking “what did we learn from this study” might mean “is the new checkout flow safe to ship,” or “which of my two designs won,” or “what objection do I take to my VP on Friday.” A generic three-theme summary serves none of these. It is correct and irrelevant, which is its own kind of failure; the reader skims it, finds nothing addressed to their actual decision, and quietly stops trusting the tool. Intent failure does not look like a wrong answer. It looks like a right answer to a question nobody asked.</p>\n<p>There is a second half to intent. 
The user knows their problem better than you do, which is why you ask; but knowing the problem isn’t the same as knowing what’s findable in the data, and the most valuable insight is often the one they didn’t know to ask for. I built hey anna to work that seam: to answer the question someone came with, and to act as the analyst who points them at the signal they would have walked past. The catch is that the guiding still has to serve them. A surprising finding with nothing to do with what they care about isn’t an insight, it’s a distraction. Reading intent well means widening what they thought to look for without leaving what they actually value.</p>\n<h2 id=\"why-the-order-matters\"><a class=\"heading-anchor\" href=\"#why-the-order-matters\">Why the order matters</a></h2>\n<p>Put the three together and the diagnostic falls out cleanly:</p>\n<div class=\"table-scroll\" tabindex=\"0\" role=\"group\" aria-label=\"Table, scroll horizontally to see more\"><table><thead><tr><th>Gap</th><th>The job</th><th>What failure looks like</th></tr></thead><tbody><tr><td>Coverage</td><td>get all the raw data analysable</td><td>a confident summary of the data that survived</td></tr><tr><td>Synthesis</td><td>roll findings into higher-order patterns</td><td>plausible themes that smooth over or invent</td></tr><tr><td>Intent</td><td>answer what the user meant to ask</td><td>a correct answer to the wrong question</td></tr></tbody></table></div>\n<p>Trust collapses from whichever layer you neglected, and it collapses in a characteristic way. Neglect coverage and the answer is confidently wrong. Neglect synthesis and the answer is shapeless. Neglect intent and the answer is irrelevant. 
Crucially, you cannot patch a lower failure from a higher layer: no amount of synthesis brilliance rescues missing coverage, and no amount of intent-reading rescues a synthesis built on a biased sample.</p>\n<p>The reason most AI insights products feel untrustworthy is that they ship the middle layer alone. Synthesis demos well, so it gets built first and shown first, while coverage is half-finished underneath and intent is assumed rather than asked. The fix is to build bottom-up and check top-down: secure the data, make every rolled-up claim traceable to it, and then aim the whole thing at the question the person actually came with. Skip a layer and the trust does not erode gradually. It falls through the gap you left open.</p>",
      "date_published": "2026-05-19T00:00:00.000Z",
      "date_modified": "2026-05-19T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "ai-product",
        "engineering"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/live-billing-as-a-forcing-function/",
      "url": "https://cvde.xyz/writing/live-billing-as-a-forcing-function/",
      "title": "Live billing as a forcing function",
      "summary": "I turned billing on at hey anna later than the principle deserved, and learned something about my own discipline in the process. 'I'll charge later' is almost always the wrong call, and here's why.",
      "content_html": "<p>I turned billing on at hey anna later than I should have, and I knew the principle the whole time. The moment money changes hands you find out three things you cannot learn any other way: whether the value is real, whether the price is right, and who the customer actually is. I could have written that sentence months before I acted on it. The gap between knowing it and doing it is the subject of this note.</p>\n<p>The reasoning I gave myself sounded responsible. Get the product right first. Don’t charge for something half-finished. Build trust, then ask for money. Each of those is defensible in isolation, which is exactly what made the bundle dangerous. “I’ll charge later” felt like discipline. It was avoidance.</p>\n<h2 id=\"free-usage-hides-the-one-number-you-need\"><a class=\"heading-anchor\" href=\"#free-usage-hides-the-one-number-you-need\">Free usage hides the one number you need</a></h2>\n<p>Here is what free usage tells you: people will use a thing that costs them nothing. That is almost no information. Usage under a price of zero measures curiosity, politeness, and the absence of friction. It does not measure value, because value is what someone will give up something for, and at zero they give up nothing.</p>\n<p>A charge is the cheapest instrument you own for measuring value, and it returns a clean signal. When the card gets entered, you learn the value cleared the price. When it does not, you learn something more useful: either the value is not there yet, or the price is wrong, or the person you built for is not the person who showed up. Free usage smears all three of those into a single flattering number - sign-ups, sessions, time-in-app - that goes up and to the right and tells you nothing about whether you have a business.</p>\n<p>The thing I was protecting myself from was not a half-finished product. It was the answer. 
As long as billing was off, the question “will anyone pay for this” stayed theoretically alive, and a live question is more comfortable to sit with than a real answer. That is the trap, and recognising it in myself was less flattering than I would like.</p>\n<h2 id=\"the-charge-tells-you-who-the-customer-is\"><a class=\"heading-anchor\" href=\"#the-charge-tells-you-who-the-customer-is\">The charge tells you who the customer is</a></h2>\n<p>The part I underestimated was the third thing: a price does not just test value, it sorts people. The accounts that converted the day billing went live were not a random sample of the free users. They skewed toward a particular kind of operator with a particular problem, and they were not always the cohort I had been designing for in my head. Free usage had been quietly averaging together the people who would pay and the people who never would, and presenting the blend to me as “users.”</p>\n<p>You cannot see that blend until a price splits it. The moment one did, the roadmap got an opinion it had been missing, because for the first time I could tell which feedback came from someone with money on the table and which came from someone with nothing at stake. Both are worth hearing; they are not worth the same weight, and free usage gives you no way to tell them apart.</p>\n<h2 id=\"the-honest-version-of-the-lesson\"><a class=\"heading-anchor\" href=\"#the-honest-version-of-the-lesson\">The honest version of the lesson</a></h2>\n<p>I want to state this without self-flagellation and without the founder bravado that turns every mistake into a humblebrag. The plain version: I delayed a reversible, low-cost experiment because the result felt high-stakes, and the delay cost me months of clarity I could have had for the price of a Stripe integration and a slightly uncomfortable email.</p>\n<p>Turning billing on is a two-way door. If the price is wrong you change it. If the timing is early you refund and apologise. 
Almost nothing about charging is irreversible, which is precisely why “I’ll charge later” is so rarely the right call - you are deferring a cheap, recoverable test to protect yourself from information, and information is the thing you are short of, not the thing you can afford to ration.</p>\n<p>The discipline I thought I was practising by waiting was the opposite of discipline. Discipline would have been turning it on early, watching who flinched and who reached for a card, and letting the answer reorganise the product around the people who were actually there.</p>\n<p>If you are sitting on a product with billing switched off and a good reason ready, check whether the reason is really about the product or really about you. Then turn it on. The number you get back will be worth more than the comfort you are protecting.</p>",
      "date_published": "2026-05-12T00:00:00.000Z",
      "date_modified": "2026-05-12T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "founder",
        "gtm"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/unit-economics-of-a-one-person-ai-product/",
      "url": "https://cvde.xyz/writing/unit-economics-of-a-one-person-ai-product/",
      "title": "The unit economics of a one-person AI product",
      "summary": "hey anna's variable cost is about 40% Claude API. That single fact rewrites the pricing, the acquisition maths, and whether viral growth is load-bearing. The SaaS playbook assumed COGS was a rounding error; it isn't anymore.",
      "content_html": "<p>hey anna’s variable cost is about 40% Claude API. For every dollar of revenue, roughly forty cents leaves to pay for the model <span class=\"term\" data-term=\"inference\" data-def=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" data-note=\"The variable cost that makes AI unit economics unlike old SaaS: you rent it by the token, on every turn, forever.\">inference</span> that does the actual work. That single number rewrites the pricing, the acquisition maths, and the question of whether growth needs to be viral. I want to be precise about why, because the standard SaaS reasoning gives the wrong answer at every step.</p>\n<p>The classic playbook assumes cost of goods sold is a rounding error and gross margin sits near 90%. Servers are cheap, the marginal cost of one more user rounds to zero, so the strategy writes itself: acquire aggressively, land cheap, expand later, worry about cost never. That assumption was true for software whose marginal unit was a database row. It is false for software whose marginal unit is a model call you rent by the token.</p>\n<h2 id=\"the-margin-sets-the-ceiling-on-everything-else\"><a class=\"heading-anchor\" href=\"#the-margin-sets-the-ceiling-on-everything-else\">The margin sets the ceiling on everything else</a></h2>\n<p>Start with the gross margin, because every other number depends on it. At 40% variable cost, the contribution margin is 60 cents on the dollar before you have paid for anything fixed. That is not a catastrophe; it is a constraint, and constraints are the useful part. 
It means the question “how much can I spend to acquire a customer” has a hard answer, where the SaaS founder gets to wave their hands.</p>\n<p>Customer acquisition cost is bounded by lifetime contribution margin, not lifetime revenue. The two are the same thing when COGS is zero, which is why nobody used to draw the distinction. Here they diverge by 40%. If a customer pays sub-$1/day and stays for a year, that is roughly $300 of revenue but only about $180 of contribution. The CAC the business can sustain is set against the $180, not the $300. Spend against the revenue line and you are buying customers at a loss the spreadsheet hides until the cohort matures and the cash does not arrive.</p>\n<p>The same arithmetic kills the casual free tier. In old SaaS a free user is a marketing cost that rounds to zero; you carry them indefinitely because the storage is free and one of them might convert. Here a free user who actually uses the product is a real cash outflow every day, priced in tokens, with no revenue against it. A free tier is not a tier; it is a budget line. It has to be sized like one - capped, time-boxed, or deliberately thin - because an unbounded free tier on a usage-priced product burns runway on non-customers.</p>\n<h2 id=\"what-60-cents-actually-has-to-cover\"><a class=\"heading-anchor\" href=\"#what-60-cents-actually-has-to-cover\">What 60 cents actually has to cover</a></h2>\n<p>The 60 cents of contribution is not profit. It is the only money there is to cover everything fixed, and on a one-person product the largest fixed cost is the most invisible one: my time. There is no team to amortise across thousands of accounts. The model is a variable cost; I am the fixed cost. That framing decides what I build and, more often, what I refuse to.</p>\n<p>It changes how the price gets set, too. hey anna is positioned at sub-$1/day against analyst alternatives that run $400+/month. The headline gap looks like a generous discount. 
The economist’s read is different: the price has to clear the floor that 40% sets, every day, on every account, or volume makes the loss bigger rather than smaller. A pricing mistake on a 90%-margin product trims your margin once. A pricing mistake here is a per-transaction leak that scales with success. You cannot grow your way out of a negative contribution margin; you can only grow the hole.</p>\n<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>The tell that you have mispriced a usage-driven product: revenue and costs both rise with adoption, and the gap between them does not widen in your favour. On a fixed-cost product, scale is the cure. 
On a variable-cost product, scale is the amplifier - of whichever sign your <span class=\"term\" data-term=\"unit economics\" data-def=\"The revenue and cost of a single unit of a business, such as one customer or one transaction, used to test whether the model works at the margin.\" tabindex=\"0\" aria-describedby=\"term-def-2\" title=\"The revenue and cost of a single unit of a business, such as one customer or one transaction, used to test whether the model works at the margin.\" data-note=\"When the marginal unit is a model call you rent by the token, the old assumption that cost rounds to zero stops holding.\">unit economics</span> already have.</p> </div> </aside>\n<h2 id=\"whether-viral-growth-is-load-bearing\"><a class=\"heading-anchor\" href=\"#whether-viral-growth-is-load-bearing\">Whether viral growth is load-bearing</a></h2>\n<p>This is where the 40% changes the strategy rather than just the spreadsheet. When acquisition has to be paid and CAC is bounded by a 60-cent contribution, every dollar of paid acquisition is a dollar the margin has to earn back before the customer churns. That is survivable, but it is a grind, and on a solo product there is no sales team to make the grind go faster.</p>\n<p>Referral and word-of-mouth are the one acquisition channel whose marginal cost is genuinely near zero. On a 90%-margin product that is a nice-to-have; paid acquisition works fine because the margin absorbs the CAC. On a 40%-cost product it is closer to load-bearing, because it is the only channel that does not have to be funded out of the same thin contribution margin that is also paying for the inference and for me. The question shifts from “can I buy growth” to “does the product produce growth as a by-product of being used.” If it does not, the maths says go slower, not louder.</p>\n<p>That is the real divergence from the land-and-expand orthodoxy. 
Land-and-expand assumes you can absorb cheap, badly-fitting customers now and sort the economics out at scale, because scale is free. Here scale is rented. So the discipline runs the other way: the product has to be good enough to spread on its own, the free tier has to be sized like the cash line it is, and the price has to clear the inference floor on day one rather than someday.</p>\n<p>None of this is a counsel of despair. A 60% contribution margin is a real business; it is just a business that has to respect its own arithmetic from the first dollar instead of the hundredth. The founders who get hurt are the ones who imported the COGS-is-zero assumption from the last decade of SaaS and only discover it was load-bearing when a successful month produces a bigger bill instead of a bigger bank balance.</p>\n<p>The model is not a fixed cost you provision once. It is a variable cost you pay per use, forever, and it is 40% of the revenue. Price like it, acquire like it, and size the free tier like it. The maths is not hard. It is just different from the maths everyone learned, and the difference is the whole business.</p>",
      "date_published": "2026-04-21T00:00:00.000Z",
      "date_modified": "2026-04-21T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "founder",
        "ai-product",
        "economics",
        "gtm"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/make-every-ai-claim-clickable/",
      "url": "https://cvde.xyz/writing/make-every-ai-claim-clickable/",
      "title": "Make every AI claim clickable",
      "summary": "Customers don't want AI that hands them the answer. They want to verify it. Deep-research-style citations are the line between an AI feature people adopt and one they paste into a doc and never open again.",
      "content_html": "<p>Customers told me the same thing in workshop after workshop: they don’t want the AI to just give them the answer. They want to check it. I heard it running AI study-analysis sessions at Lyssna, where the room was full of researchers whose whole job is to not get fooled by a confident summary. The output that impressed them in the demo was the same output they didn’t trust enough to put their name on. The gap between those two reactions is where most AI features quietly die.</p>\n<p>The thing that closes the gap is small and unglamorous. Make every claim clickable. When hey anna says a change is statistically significant, that claim is a link: click it and you land on the test behind it, the two groups, the sample sizes, the p-value, the rows it ran on. Not a restatement that it held; the working itself. When it says revenue rose 18%, the number resolves to the 340 orders it was computed from, not a summary of them but the orders. It is the same move deep-research tools made when they started footnoting every sentence back to a source you can open, and at hey anna the whole product is built on it: verification over generation, which is what makes the brand line “analyst, not chatbot” true.</p>\n<p>There is a deeper point under the citations, and it is closer to what hey anna actually is. Because every claim resolves to your own rows, the agent stops being a box you query and becomes <a href=\"/writing/ai-is-an-interface/\">an interface to your data</a>. It works in your dataset and updates it as you go; you can stay fully hands-off and let it run the analysis, or drop into the sheet and work the numbers yourself, and either way it is the same data in the same place. 
A collaborative workspace, where plain language is just the fastest way in.</p>\n<h2 id=\"what-clickable-actually-means\"><a class=\"heading-anchor\" href=\"#what-clickable-actually-means\">What clickable actually means</a></h2>\n<p>Clickable is not a citation icon that opens a vague “sources” panel. The bar is higher and it is specific.</p>\n<p>A claim is clickable when three things are true. The link points at the exact evidence, not the general neighbourhood: the rows, the source document, the calculation, not “the sales table.” The path is short enough that a sceptical user will actually take it, which in practice means one click, not a five-step drill-down. And the evidence is legible when they arrive, so a person can look at it and agree the claim is fair, rather than landing in a wall of raw data they now have to re-analyse themselves.</p>\n<p>Miss any of the three and you have a decoration. A citation that resolves to “rows 1 through 9,000” is honest and useless. A claim that takes four clicks to verify gets verified once, by you, in the demo, and never again by a customer. The work is in making verification cheaper than doubt.</p>\n<h2 id=\"why-it-changes-adoption-not-just-trust\"><a class=\"heading-anchor\" href=\"#why-it-changes-adoption-not-just-trust\">Why it changes adoption, not just trust</a></h2>\n<p>The intuition is that citations make people trust the AI more. That is half of it, and the less interesting half. The bigger effect is on what the user does next.</p>\n<p>An AI feature is adopted when it becomes a step in someone’s actual workflow. A summary that lands in their inbox is not a step; it’s a thing they read, nod at, and route around when the real decision gets made, because they can’t defend it to the person who asks “where did this come from?” A clickable claim survives that question. The user can forward it, cite it in a deck, drop it into a board pack, because the evidence travels with the assertion. 
That is the difference between a feature people try and a feature people build on.</p>\n<p>The pattern shows up as a fork in the usage data. The uncited version gets opened, admired, and abandoned; sessions are short and they don’t repeat. The cited version gets opened, clicked into, and returned to, because the user has learned that the claims hold when they pull on them. Verification is not friction you tolerate. It is the thing that lets someone stake their own credibility on your output, and people only stake their credibility on tools they can check.</p>\n<h2 id=\"what-it-costs-to-build\"><a class=\"heading-anchor\" href=\"#what-it-costs-to-build\">What it costs to build</a></h2>\n<p>This is not free, and pretending it is would set you up to cut it under deadline. Clickable claims force an architecture where every assertion is traceable back to its evidence by construction. The number has to be computed deterministically, carry a reference to its inputs, and survive the trip into the sentence the user reads.</p>\n<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>The honest test for any AI feature: point at a number on the screen and ask “what produced this exact value?” If the answer is “the model decided,” the model guessed; it didn’t measure. 
If the answer is “this query, over these rows, here they are” then the claim is yours to stand behind.</p> </div> </aside>\n<p>In practice that means the model is never the thing doing the counting; it renders facts that were settled before it saw them. It means a UI layer that keeps the link between a phrase and its source intact instead of flattening everything into prose. And it means resisting the demo-friendly shortcut of letting the model freestyle a narrative, because a narrative with no anchors is exactly the thing your customer has already learned not to trust.</p>\n<h2 id=\"why-its-worth-it\"><a class=\"heading-anchor\" href=\"#why-its-worth-it\">Why it’s worth it</a></h2>\n<p>The cost buys two things that are hard to get any other way. It buys adoption, because a checkable claim is one a professional can act on without putting their own judgement at risk. And it buys a moat, because traceability is slow to copy and gets harder to fake as your product touches more of a customer’s real data. A competitor can match your model in a weekend; matching a system where every claim resolves to its evidence takes them as long as it took you.</p>\n<p>There’s also the economics, which happen to point the same way. hey anna runs at under a dollar a day against analyst alternatives that start north of $400 a month, and the thing that makes the cheap version trustworthy enough to replace the expensive one is not a smarter model. It’s that you can click on what it tells you. Cheap and checkable beats expensive and opaque.</p>\n<p>Build the feature so a sceptic can audit it in one click. The sceptic is your most valuable user, because the sceptic is the one who decides whether anyone else gets to depend on it.</p>",
      "date_published": "2026-03-30T00:00:00.000Z",
      "date_modified": "2026-03-30T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai-product",
        "ux",
        "trust"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/measuring-the-platonic-representation/",
      "url": "https://cvde.xyz/writing/measuring-the-platonic-representation/",
      "title": "Measuring the platonic representation",
      "summary": "The platonic representation hypothesis says capable models converge toward one shared picture of reality. A language-anchored multimodal encoder lets you test a sharp version of it: encode one concept through four senses and measure what they agree on.",
      "content_html": "<p>The <span class=\"term\" data-term=\"platonic representation hypothesis\" data-def=\"The claim that capable models, trained on different data and senses, converge towards the same internal picture of reality.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"The claim that capable models, trained on different data and senses, converge towards the same internal picture of reality.\" data-note=\"If it holds, a model's learned structure is a property of the world the data came from, not an artefact of one training set.\">platonic representation hypothesis</span> is the claim that sufficiently capable models, trained on different data and in different modalities, converge toward the same internal picture of reality. If it holds, the structure a network learns is less an artefact of its training set and more a property of the world the data came from. It’s a big claim, and most of the evidence for it is correlational: line up two models, measure how similar their representations are, note that the number keeps rising as models get more capable.</p>\n<p>A language-anchored multimodal encoder lets you test a sharper, more local version of the same idea. Take one concept - “thunder” - and encode it four ways: the word, an image of a storm, an audio clip, a short video. Each is a different sense arriving at the same shared space. The hypothesis predicts they should land in roughly the same place. 
The question is what “roughly” means, and where the disagreement lives.</p>\n<h2 id=\"decompose-then-compare\"><a class=\"heading-anchor\" href=\"#decompose-then-compare\">Decompose, then compare</a></h2>\n<p>A dense <span class=\"term\" data-term=\"cosine similarity\" data-def=\"A measure of how aligned two vectors are by the angle between them, used to score how close two embeddings sit in meaning.\" tabindex=\"0\" aria-describedby=\"term-def-2\" title=\"A measure of how aligned two vectors are by the angle between them, used to score how close two embeddings sit in meaning.\" data-note=\"One blurry number for agreement; a sparse feature comparison usually tells you more than the cosine does.\">cosine similarity</span> between the four <span class=\"term\" data-term=\"embedding space\" data-def=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" tabindex=\"0\" aria-describedby=\"term-def-3\" title=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" data-note=\"A learned concept is a location here; steer a generator towards that point and you get a picture of how the model holds it.\">embeddings</span> gives you a single blurry number. Factor each embedding into a sparse decomposition first - a handful of named features from a sparse autoencoder - and the comparison gets legible. Now you can ask not just <em>how much</em> the four senses agree, but <em>which features</em> they share and which each one carries alone.</p>\n<p>The intersection is the modality-agnostic concept: the features that fire whether thunder arrives as a word, a picture, or a sound. That shared set is the closest thing the system has to a platonic core - the part of “thunder” that survives the change of sense. 
The per-modality remainder is what each sense contributes on top: the audio decomposition carries low-frequency and temporal features the word never touches; the image carries visual-storm features the sound can’t express.</p>\n<h2 id=\"the-delta-is-the-carve-out\"><a class=\"heading-anchor\" href=\"#the-delta-is-the-carve-out\">The delta is the carve-out</a></h2>\n<p>That remainder is not noise to be averaged away. It is the measurement. The features audio adds and text lacks are a direct readout of what hearing thunder encodes that naming it does not. Cross-modal agreement on the feature sets - a Jaccard overlap, concept by concept - turns the platonic hypothesis from a vibe into a number you can compute per concept and compare across them.</p>\n<p>And it cuts both ways, which is what makes it honest. High overlap across modalities is evidence for a convergent core: the senses really are arriving at one representation. Low overlap is evidence against it for that concept - a sign the shared space is more <em>bound together by training</em> than <em>unified by reality</em>, with each modality keeping its own private structure under a thin coat of alignment. Either result tells you something. The experiment is worth running precisely because it can come back negative.</p>\n<p>The <a href=\"https://github.com/cal-van/multimodal-embedding-art\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">code is on GitHub<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>; the <code>anchor-compare</code> command encodes one concept through each modality and reports the agreement, so the carve-out is something you can read off a single run.</p>",
      "date_published": "2026-03-22T00:00:00.000Z",
      "date_modified": "2026-03-22T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "interpretability",
        "generative-art"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/the-wayback-machine-as-an-investigative-tool/",
      "url": "https://cvde.xyz/writing/the-wayback-machine-as-an-investigative-tool/",
      "title": "The Wayback Machine is an investigative tool",
      "summary": "The sharpest cold open isn't you versus a competitor. It's their own public X against their own public Y - pricing page against blog, hero copy against last year's hero copy. The Wayback Machine and a diff are the cheapest research nobody runs.",
      "content_html": "<p>The sharpest cold open is a company’s own public statement set against its own public statement. Their pricing page against their engineering blog. Their case-study claims against the logos in their footer. Their hero copy today against the same company’s hero copy a year ago. The tension between two things a prospect has already said in public is specific, credible, and impossible to wave off as a pitch, because every word in it is theirs.</p>\n<p>The default outbound move is to open on yourself: here is what we do, here is why it’s good, here is a slot on my calendar. The slightly better version opens on them and a rival: here’s what your competitor just shipped. Both ask the reader to take your framing on trust. The “you versus a competitor” angle also invites the easiest reply there is, which is “we’re not them,” and now you’re arguing.</p>\n<p>An internal contradiction skips that. You are not introducing a frame; you are pointing at one that already exists on their own site. There is nothing to dispute, because you didn’t write any of it. The reader’s first reaction is not “who are you” but “wait, is that actually still up.” That second of recognition is the entire opening, and it came from a quote that is already theirs.</p>\n<h2 id=\"the-shape-that-works\"><a class=\"heading-anchor\" href=\"#the-shape-that-works\">The shape that works</a></h2>\n<p>The pattern is: their public X, their public Y, and the gap that opens when you read the two together. The gap is the message. You are not selling against it; you are noticing it out loud, the way a sharp colleague would.</p>\n<p>A few of these come up often enough to keep on hand.</p>\n<ul>\n<li><strong>Pricing page against blog.</strong> The pricing page says “built for teams of any size.” A post from the same quarter explains why they pulled out of the SMB segment to focus upmarket. The page hasn’t caught up to the strategy. 
That is a real operational seam, and you found it by reading two of their own pages back to back.</li>\n<li><strong>Marketing claim against the careers page.</strong> The homepage leads on “AI-native, autonomous, no humans in the loop.” The jobs board is hiring twelve support agents and three “AI quality reviewers.” Not a gotcha; a tell. It tells you where the actual work is, which is exactly the conversation worth opening.</li>\n<li><strong>Case study against the footer.</strong> The case study names a flagship customer and a stat. The footer logo wall quietly dropped that logo two refreshes ago. You don’t accuse; you ask. The asking is the open.</li>\n<li><strong>Hero copy now against hero copy then.</strong> This is where the Wayback Machine earns its keep. A company that rewrote its homepage headline from “the simplest way to X” to “the enterprise platform for X” has told you, in its own words across twelve months, that it moved upmarket. Everything downstream of that move - new buyer, new objections, new budget holder - is your opening, and they narrated it for you.</li>\n</ul>\n<p>That last one is the highest-leverage because almost nobody looks. The live site is one snapshot. The history shows what changed, and the changes are where the strategy shows.</p>\n<h2 id=\"how-to-actually-run-it\"><a class=\"heading-anchor\" href=\"#how-to-actually-run-it\">How to actually run it</a></h2>\n<p>The tooling is free and takes minutes. The reason it’s a moat is that “free and takes minutes” still loses to “nobody opened the tab.”</p>\n<p>Pull up <code>web.archive.org</code>, paste the prospect’s homepage or pricing URL, and open a capture from roughly a year ago next to the live page. Read them side by side. You are hunting for one thing: a sentence that changed, or a claim that stayed while the business behind it moved. 
Headlines, pricing tiers, the customer logo strip, and the nav labels are the highest-signal fields, because those are the ones a company rewrites when its strategy turns.</p>\n<p>When you find the seam, the message writes itself, and it stays short. Name the then, name the now, ask the one question the gap implies. “Your homepage led on self-serve simplicity last March; today it leads on enterprise security and SSO. Usually that shift means the buyer changed from the IC to a VP. Did the way you handle inbound change with it?” No pitch in that. Just a precise observation and a question.</p>\n<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>The test for a good internal-contradiction open: could the prospect forward it to a colleague with the note “is this still true?” If yes, you’ve found a real seam and they’ll feel it too. If the only person who finds it interesting is you, it’s a gotcha, and gotchas get deleted.</p> </div> </aside>\n<p>The reason this beats research that costs money is that paid intent data tells you a company is “in market”; their own archived pages tell you why, in their own language, with the timeline attached. One is a signal you rent. The other you read for free, and nobody else bothered to.</p>\n<p>Most outbound research stops at the live homepage because the live homepage is what loads first. 
The version from a year ago is one paste away, and the distance between the two is where the whole message lives.</p>",
      "date_published": "2026-02-26T00:00:00.000Z",
      "date_modified": "2026-02-26T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "gtm",
        "outbound"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/cancel-the-meeting/",
      "url": "https://cvde.xyz/writing/cancel-the-meeting/",
      "title": "Cancel the meeting",
      "summary": "When the data isn't ready ten minutes before a stakeholder readout, the move is to cancel and send it async later. Forcing the numbers into a calendar slot anchors everyone on figures you'll want back tomorrow.",
      "content_html": "<p>Ten minutes before a stakeholder readout, the numbers are not right. The query is still running, or it finished and the figure looks wrong in a way you cannot yet explain, or you found a join that double-counts and you are not sure how far the damage spreads. The room is booked. People are walking over.</p>\n<p>Cancel it. Send the analysis async once it is solid.</p>\n<p>This feels like the weak move and it is the strong one, for a reason that has nothing to do with looking diligent and everything to do with how the people in that room form beliefs.</p>\n<h2 id=\"anchoring-is-the-whole-argument\"><a class=\"heading-anchor\" href=\"#anchoring-is-the-whole-argument\">Anchoring is the whole argument</a></h2>\n<p>The first number a stakeholder hears becomes the number they reason from. Not consciously; that is the point. <span class=\"term\" data-term=\"anchoring\" data-def=\"The bias where the first number you hear becomes the reference point you reason from, even after you learn it was wrong.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"The bias where the first number you hear becomes the reference point you reason from, even after you learn it was wrong.\" data-note=\"Why a preliminary figure in a readout is a trap: the caveat is processed as language, the number as fact, and the fact wins.\">Anchoring</span> is one of the most robust findings in decision research, and it does not switch off because everyone in the room is senior and busy. You say “looks like roughly 40% retention” with a verbal asterisk that it is preliminary, and what lodges is <em>40%</em>. Within the hour the caveat is gone and the figure remains.</p>\n<p>Now the number was wrong, because of course it was, you said as much. Tomorrow the real figure is 28%. You send the correction. Here is what does not happen: the 40% does not get cleanly overwritten by the 28%. What happens instead is that the 28% gets received as a <em>retreat from</em> 40%. 
The stakeholder’s mental model is now anchored on the high number. So the true number reads as bad news, a disappointment, a walk-back, rather than what it actually is, which is the answer. You spend your correction’s energy fighting the impression you created instead of delivering the finding.</p>\n<p>The correction never fully catches up to the first impression. That is the mechanism, and it is why the calendar slot is a trap. The slot does not care whether your analysis is ready. It only cares that it is 2pm. If you let it set the timing, an arbitrary calendar slot plants a number in five people’s heads that you’ll spend a week dislodging.</p>\n<h2 id=\"the-hidden-cost-is-not-the-wasted-half-hour\"><a class=\"heading-anchor\" href=\"#the-hidden-cost-is-not-the-wasted-half-hour\">The hidden cost is not the wasted half hour</a></h2>\n<p>The instinct that keeps the meeting on is that cancelling wastes the slot and makes you look unprepared. But the slot is the cheap thing. Thirty minutes of calendar is recoverable. A stakeholder anchored on a wrong number is not, at least not for free. You pay later, in credibility, instead of now, in schedule.</p>\n<p>There is a second-order cost too. The next time you present a real, solid number, people remember the figure that moved overnight. They discount you a little, hedge against your figures, ask for the workings they never used to ask for. A reputation for numbers that hold is worth more than a reputation for never missing a meeting, and the two trade against each other exactly in moments like this one.</p>\n<h2 id=\"the-objection-and-the-answer\"><a class=\"heading-anchor\" href=\"#the-objection-and-the-answer\">The objection, and the answer</a></h2>\n<p>The obvious objection: cancelling looks like you are not delivering. You had one job, the readout, and you bailed.</p>\n<p>Answer it directly, in the cancellation itself. 
The message is not “sorry, not ready.” It is: “Holding this until the numbers are verified; I found something in the data that I am not willing to put in front of you until I trust it. Full analysis to you by tomorrow morning.” That sentence does not read as a failure to deliver. It reads as someone who refuses to put a figure in front of decision-makers before it is true. Which is, in fact, the job. The job was never to fill the slot. The job was to give people numbers they can bet on.</p>\n<p>You will feel the pull to present anyway, caveat heavily, and fix it later. The caveat is the part that does not work. You cannot caveat your way out of anchoring. The caveat is processed as language and the number is processed as fact, and the fact wins. Saying “this is rough” does not stop the rough number from becoming the anchor. The only move that actually prevents the anchor is not saying the number.</p>\n<p>So do not say it. Cancel the meeting, verify the figure, and send the one you would still stand behind tomorrow.</p>",
      "date_published": "2026-02-10T00:00:00.000Z",
      "date_modified": "2026-02-10T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "product",
        "leadership"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/stated-versus-revealed-preference/",
      "url": "https://cvde.xyz/writing/stated-versus-revealed-preference/",
      "title": "Stated versus revealed preference, in booking data",
      "summary": "Travel marketing assumes people book what they say they want. The booking data says otherwise, especially on luxury tier and trip length. The gap between stated and revealed preference is where the useful product decisions live.",
      "content_html": "<p>Travel marketing assumes people book the holiday they describe. Ask someone what they want and they will tell you, sincerely, about the two-week luxury escape: the suite, the long unhurried itinerary, the trip they have earned. Then watch the booking data, and a different person shows up. They book the shorter stay, a tier down from the one they clicked “interested” on, and they do it consistently enough that the gap stops looking like noise and starts looking like a fact about people.</p>\n<p>I spent a stretch at Luxury Escapes with that data in front of me, and the gap between what travellers said and what they paid for was wide, stable, and most pronounced on exactly the two dimensions the marketing leaned hardest into: luxury tier and trip length. Self-description reached for the aspirational version. The credit card reached for something more constrained. This is the oldest distinction in the discipline I was trained in, and it is worth naming plainly, because once you see it you cannot unsee it in any product where users describe themselves.</p>\n<h2 id=\"two-preferences-and-only-one-of-them-pays\"><a class=\"heading-anchor\" href=\"#two-preferences-and-only-one-of-them-pays\">Two preferences, and only one of them pays</a></h2>\n<p>Economists separate stated preference from <span class=\"term\" data-term=\"revealed preference\" data-def=\"What someone's choices show they actually want, read from what they gave up to get it, rather than what they say they want.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"What someone's choices show they actually want, read from what they gave up to get it, rather than what they say they want.\" data-note=\"Here: the booking, not the survey. When the two disagree, trust the booking, and build for the customer who completes the purchase.\">revealed preference</span> for a reason. 
Stated preference is what you say you want, gathered from surveys, sign-up questionnaires, the things you favourite, the box you tick. Revealed preference is what your choices show you actually wanted, inferred from what you gave up to get it. The two are different measurements of different things, and when they disagree, the money is on revealed.</p>\n<p>The reason they diverge is not that people lie. They do not. Stated preference is sincere; it is just answering a different question. When you ask someone what holiday they want, they answer with their aspirations, their self-image, the version of themselves they would like the survey to record. No budget constraint, no calendar, no school-holiday Tetris, no quiet sense that fourteen days is a lot of days to be away from home. Revealed preference is the same person’s answer after all of those constraints have done their work. The survey captures the wish. The booking captures the trade-off. A product that listens only to the survey is building for a customer who does not exist at the moment of purchase.</p>\n<h2 id=\"the-duration-gap-concretely\"><a class=\"heading-anchor\" href=\"#the-duration-gap-concretely\">The duration gap, concretely</a></h2>\n<p>Trip length is the cleanest example, because the gap is so legible. People describe long trips and book shorter ones. The stated preference points long and the revealed preference, aggregated across enough bookings to drown out the noise, points reliably shorter. The same divergence shows up in tier: aspiration drifts up the luxury ladder, the booking settles a rung below.</p>\n<p>If you take the stated signal at face value, you merchandise long luxury trips, lead with them, optimise the page for them, and quietly underperform - because you have aimed the whole funnel at the holiday people enjoy imagining rather than the one they complete. The fix is not to stop offering the aspirational trip. 
It is to stop treating the survey answer as a forecast of behaviour and start treating it as what it is: a weak signal about a strong feeling.</p>\n<p>The most durable lesson from that period was not even about preference; it was about sequencing. An earlier recommendation and email project taught it: most of the revenue lift came from getting the order right - which offer to surface next - rather than from personalising harder on who someone said they were. Behaviour responded to the sequence of what it was shown far more than to a richer model of stated identity. Revealed preference was telling us how to order the page. Stated preference, mostly, was telling us a flattering story.</p>\n<h2 id=\"where-to-trust-behaviour-and-where-not-to\"><a class=\"heading-anchor\" href=\"#where-to-trust-behaviour-and-where-not-to\">Where to trust behaviour, and where not to</a></h2>\n<p>The product rule that falls out of this is simple to state and surprisingly hard to hold: build for what people do, instrument the gap, and treat stated preference as the weak signal it is. Weight the behaviour. When the survey and the booking disagree, the booking wins, and your job is to design for the customer who actually completes the purchase, not the one who fills in the form.</p>\n<p>But weak is not worthless, and there is one place stated preference is the only signal you have. Revealed preference can only reveal a choice that was on the menu. For anything genuinely new - a destination, a format, a price point a customer has never had the chance to choose - behavioural data is silent, because the behaviour has never had the opportunity to happen. There is nothing to reveal yet. There the survey, the interview, the “would you want this” is all you have, and you read it for direction while knowing it will over-promise on magnitude. 
The discipline is matching the signal to the question: trust behaviour where a choice already exists, fall back to what people tell you only where it does not, and never confuse the second case for the first.</p>\n<p>I saw the front of this before I ever saw the data, selling travel across the counter. A revealed preference is only as honest as the menu you put in front of someone, and editing that menu is most of an agent’s craft. Nobody walked in asking to be upgraded; left alone, every traveller revealed a tidy preference for economy. But ask the question, and frame the fare against the twelve-hour flight rather than against the economy seat, and a real share said yes. The upgrade booked at roughly four times the fare, usually at better margin. That economy “preference” was never a preference; it was the only option anyone had offered.</p>\n<p>The same move ran in reverse, pulling a booking back up toward what someone had said they wanted. Put two hotels side by side, the mid-tier one they were about to book and the five-star they had earlier called the whole point of the trip, and the contrast does some of the work: Cialdini’s contrast principle, the reason you show the dear room first. The rest is consistency, another of his levers. You name the gap out loud - “you told me this was the trip you’d earned, and you’re booking the room you’d pick for a work conference” - and people, wanting their choices to line up with their words, often close it themselves. The aspiration was real; it had simply lost to the budget in the moment, and the job was to give it a fair hearing. The booking data I’d later stare at had the same blind spot from the other side: it could only reveal the preferences my menu had allowed.</p>\n<p>The gap between what people say and what they do is not a measurement error to be cleaned out of the data. 
It is the most useful thing the data has to tell you, because it is the difference between the customer in the survey and the customer at the checkout. Build for the one who completes the purchase.</p>",
      "date_published": "2026-01-20T00:00:00.000Z",
      "date_modified": "2026-01-20T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "economics",
        "product",
        "behavioural-economics"
      ]
    },
    {
      "id": "https://cvde.xyz/work/hey-anna-founding/",
      "url": "https://cvde.xyz/work/hey-anna-founding/",
      "title": "hey anna: founding an analyst, not a chatbot",
      "summary": "Zero to a paid, billing-enabled product in under six months, solo and bootstrapped. The analyst that tells you why your numbers moved.",
      "content_html": "<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>This is a founding-in-public case study. It’s written from the inside of a company I’m still building, so it gets updated as hey anna does. Treat the dates and numbers as a snapshot, not a finished account.</p> </div> </aside>\n<h2 id=\"context\"><a class=\"heading-anchor\" href=\"#context\">Context</a></h2>\n<p>Most businesses sit on the data that would answer their questions and can’t get to the answer.</p>\n<p>The numbers live in Stripe, HubSpot, GA4, Mailchimp, and a long tail of social and finance tools. The questions are ordinary: why did revenue dip last week, which segment is actually growing, is this campaign working or just noisy, what are customers complaining about. Answering them properly means joining sources, running real statistics, and writing it up so a decision-maker can act. That is an analyst’s job, and a good analyst costs upwards of $400 a month if you can find one - well out of reach for most of the businesses that need one most.</p>\n<p>The market’s answer in 2026 is the chatbot: ask a question, get a confident paragraph. The problem is that a chatbot guesses. It will tell you revenue is up because of a campaign without running a significance test, segmenting the cohort, or checking whether the move is even outside normal variance. It sounds like an analyst and does none of the work. 
For anything you’d make a decision on, that gap is the whole game.</p>\n<p>I started hey anna in January 2026 to close it.</p>\n<h2 id=\"decision\"><a class=\"heading-anchor\" href=\"#decision\">Decision</a></h2>\n<p>The positioning decision came first, and everything else followed from it: <strong>analyst, not chatbot.</strong></p>\n<p>That is a constraint, not a tagline. A chatbot’s job is to produce a plausible answer. An analyst’s job is to produce a defensible one: connect to the real data, run actual statistical analysis, show the working, and only then state a conclusion. Committing to “analyst” meant hey anna had to do the unglamorous parts a chatbot skips: forecasts with confidence intervals, significance tests before claiming a change is real, segmentation that holds up, and themed analysis of verbatim text rather than vibes.</p>\n<p>The pricing decision came second and reinforced the first. hey anna is priced under $1 a day, against analyst alternatives that start north of $400 a month. The point isn’t to be cheap; it’s to put a real analyst in reach of a business that could never justify hiring one. The price defines the customer: the operator who has the data and the questions but not the headcount.</p>\n<p>The third decision was how to build it: solo and bootstrapped, with billing live as early as possible. Not as a constraint to apologise for, but as a forcing function. Charging early means the product has to be worth paying for early, which kills the temptation to build features nobody has validated. Live billing is the honest gate: either someone pays for the analysis or they don’t.</p>\n<h2 id=\"what-i-built\"><a class=\"heading-anchor\" href=\"#what-i-built\">What I built</a></h2>\n<p>hey anna connects to Stripe, HubSpot, GA4, Mailchimp, and 25+ social, marketing, and finance tools, runs real statistical analysis on what it pulls, and writes it up as an exec-ready report. 
The work splits three ways: get the data, do the analysis properly, deliver it in a form someone will read.</p>\n<p>On the analysis: forecasts, significance tests, and segmentation - the actual statistical work that separates an answer from a guess. When hey anna says a number moved, it has tested whether the move is real before saying so.</p>\n<p>On unstructured data: it themes verbatim text from reviews, surveys, and support tickets, so “what are customers unhappy about” gets a grounded answer drawn from what they actually wrote, not a summary of a summary.</p>\n<p>Across the product, 13 core features have shipped, including:</p>\n<ul>\n<li><strong>AI Formulas</strong> - an <code>=AI()</code> function that runs analysis inline, in the place people already think in rows and columns.</li>\n<li><strong>Smart Columns</strong> - columns that compute themselves from a description of what you want.</li>\n<li><strong>AI Memory</strong> - context that persists across analyses, so hey anna learns your business rather than starting cold every session.</li>\n<li><strong>Public Reports</strong> - shareable, exec-ready outputs that go to a stakeholder without a login.</li>\n</ul>\n<p>The stack is chosen to let one person ship and operate a real product: React on the front end; Cloudflare for the platform (Workers for compute, D1 for data, R2 for storage); Anthropic’s Claude as the primary model, with Google and OpenAI available; Paddle for billing; PostHog for analytics. The constraint of building solo pushed every choice toward managed, edge-native infrastructure that doesn’t need a team to keep alive.</p>\n<h2 id=\"outcome\"><a class=\"heading-anchor\" href=\"#outcome\">Outcome</a></h2>\n<p>hey anna went from zero to a paid, billing-enabled product in under six months, solo and bootstrapped. It is live, with billing enabled and customers paying.</p>\n<p>Thirteen core features shipped in that window, by one person, on infrastructure one person can run. 
The positioning held up under contact: framing it as an analyst rather than a chatbot is what makes the statistical rigour a feature people will pay for instead of a detail they ignore. The sub-$1/day price puts it in front of the operators it was built for: businesses with the data and the questions but no analyst.</p>\n<p>The number that matters most isn’t a metric yet; it’s that the gate is real. The product charges money, and people pay it.</p>\n<h2 id=\"pricing-and-unit-economics\"><a class=\"heading-anchor\" href=\"#pricing-and-unit-economics\">Pricing and unit economics</a></h2>\n<p>Pricing is the decision I think about hardest, and for a one-person AI product the usual playbook doesn’t quite fit. The model is seats plus usage, with overage built in. That’s the direction Microsoft has taken with Copilot, and for the same reason: a tool used by both people and agents can’t be priced as if only people touch it. An agent can do a day’s work in a minute, so a flat per-seat price either over-charges the light user or hands the heavy one a loss. Usage tracks the thing that actually costs money.</p>\n<p>Underneath the price is a cost I own rather than pass on. The biggest variable cost is model <span class=\"term\" data-term=\"inference\" data-def=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"Running a trained model to produce an output, as opposed to training it; the compute you pay for on every call.\" data-note=\"The variable cost that makes AI unit economics unlike old SaaS: you rent it by the token, on every turn, forever.\">inference</span>, and the lever that moves it most is cache hit rate: Anthropic charges a premium to write to the prompt cache and a fraction of that to read from it, so the gap between a well-structured request and a naive one is most of the margin. 
Cache hit rate is one of the numbers I watch most closely, and it’s deliberately an internal one. A customer shouldn’t pay more because I haven’t done my engineering; the cost of my own inefficiency is mine to fix, not theirs to carry.</p>\n<p>Usage-based pricing is good because it ties what a customer pays to what they use. The ideal goes one step further and ties price to the outcome. The clearest examples are in support, where an agent’s action maps onto a result the business already prices: if a ticket costs a company $30 to handle and an agent resolves or deflects it for $9, the trade is obvious, the risk is low, and you can charge for the result rather than the tokens. That is the cleanest version of cost aligned to value there is. hey anna’s outcomes are harder to count than a deflected ticket, so I’m not there yet, but it’s where the value-alignment argument points and where I want the pricing to head as the signal sharpens.</p>\n<p>All of this matters more because I’m bootstrapped. There’s no venture money behind hey anna to buy market share or prop up a price that doesn’t work, and I’m not running paid ads yet; growth is meant to be sustainable, which is a polite way of saying slower, with fewer eyeballs on the product. A funded competitor can underprice for years and make it up later. I can’t, so the price has to earn its keep from the first customer, and getting it roughly right matters more for me than it would for someone growing on someone else’s money.</p>\n<p>And the honest part: hey anna sits at under $1 a day, which is roughly the $29 consumer default I’ve argued against in writing, and the exact default I <a href=\"/work/brand-ninja-repositioning/\">killed at Brand Ninja</a>. I’m aware of the irony, and I think the product is worth well more than that. I’m charging it anyway, on purpose. 
Right now the opportunity cost of an empty room is higher than the revenue I’m leaving on the table: I’d rather have customers teach me what hey anna is worth than a higher number protect a value nobody has paid yet. I’m optimising for learning. The floor is a decision I’ve parked deliberately, and I’ll raise it once the customers have shown me where it belongs.</p>\n<h2 id=\"what-id-do-differently\"><a class=\"heading-anchor\" href=\"#what-id-do-differently\">What I’d do differently</a></h2>\n<p>I scoped early features before I’d watched enough people use the product, and a few of the 13 were built ahead of the demand for them. Shipping fast solo is an advantage, but it’s also how you accumulate surface area that has to be maintained whether or not it earns its keep. I’d hold more features at the idea stage until a paying customer pulled them out of me.</p>\n<p>I was also slower to turn billing on than the “live billing as a forcing function” principle deserved. I believed it in the abstract before I acted on it; there’s a stretch where I was building toward a paid product rather than running one. The lesson is that the forcing function only works once it’s switched on; until then it’s a plan, and a plan doesn’t tell you what people will actually pay for.</p>\n<p>And being both the entire product team and the entire go-to-market team means the two compete for the same hours. I’ve defaulted to building, because building is what I’m fastest at, which means distribution has lagged the product. For the next stretch the harder, more valuable work is getting hey anna in front of the operators it was made for, not adding to what it can already do.</p>\n<p>Live billing was one forcing function; shipping before it feels ready is the other, and it’s the one I find hardest to apply to my own work. When I build for a business or a team, I ship rough on purpose: the work serves a goal, shipping fast is how you learn fast, and my ego isn’t in the file. 
My own product represents me more directly than anything I’ve delivered inside a company, so I feel the pull to polish past the point that earns its keep and to hold it to a personal standard instead of a commercial one. The discipline I’m practising on myself is the one I’d apply without hesitation to someone else’s roadmap: ship before it feels ready, because ready is a feeling, and only the customer can tell you the truth.</p>",
      "date_published": "2026-01-01T00:00:00.000Z",
      "date_modified": "2026-01-01T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ]
    },
    {
      "id": "https://cvde.xyz/work/multimodal-embedding-art/",
      "url": "https://cvde.xyz/work/multimodal-embedding-art/",
      "title": "Multimodal embedding art: concept algebra in a shared representation space",
      "summary": "Optimising generative models toward coordinates in a 768-dimension multimodal embedding space: surfacing the platonic ideals a neural net has learned, then doing arithmetic on them across text, image, audio, and video.",
      "content_html": "<h2 id=\"context\"><a class=\"heading-anchor\" href=\"#context\">Context</a></h2>\n<p>A text-to-image model can draw you a goldfish. That’s a useful trick and a dull question. The more interesting one sits underneath it: what does <em>goldfish</em> look like to the model itself - not a goldfish, but the direction in representation space that means goldfish, cranked to maximum?</p>\n<p>Neural networks learn internal representations of concepts. A multimodal encoder learns them in a space shared across modalities, so the <span class=\"term\" data-term=\"embedding space\" data-def=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" data-note=\"A learned concept is a location here; steer a generator towards that point and you get a picture of how the model holds it.\">embedding</span> for the word “thunder,” an audio clip of thunder, and an image of a storm all land near each other. That shared space is a map of what the network thinks concepts <em>are</em>. This project is an attempt to render points on that map directly, by treating generation as an optimisation problem in embedding space rather than a prompt.</p>\n<p>It sits where mechanistic interpretability meets generative art. The interpretability question (what has the network actually learned, and can we see it) is the same question whether you frame the output as a diagnostic or as an image worth looking at. 
The aesthetic and the analytic are the same act here: maximally activating an internal representation and looking at what comes out.</p>\n<figure class=\"ef\" data-embedding-field data-astro-cid-xxjpzuzg> <div class=\"ef__stage\" data-astro-cid-xxjpzuzg> <canvas class=\"ef__canvas\" aria-hidden=\"true\" data-astro-cid-xxjpzuzg></canvas> <div class=\"ef__hud\" aria-hidden=\"true\" data-astro-cid-xxjpzuzg> <span class=\"ef__anchor\" data-ef-anchor data-astro-cid-xxjpzuzg>≈ { … }</span> <span class=\"ef__metric\" data-ef-metric data-astro-cid-xxjpzuzg>agreement ·· 0.00</span> </div> <span class=\"ef__track-label ef__track-label--honest\" aria-hidden=\"true\" data-astro-cid-xxjpzuzg>honest</span> <span class=\"ef__track-label ef__track-label--natural\" aria-hidden=\"true\" data-astro-cid-xxjpzuzg>natural</span> </div> <div class=\"ef__controls\" data-astro-cid-xxjpzuzg> <div class=\"ef__concepts\" role=\"group\" aria-label=\"Choose a concept to render\" data-astro-cid-xxjpzuzg> <button type=\"button\" class=\"ef__concept\" data-ef-concept=\"thunder\" aria-pressed=\"true\" data-astro-cid-xxjpzuzg> thunder </button><button type=\"button\" class=\"ef__concept\" data-ef-concept=\"goldfish\" aria-pressed=\"false\" data-astro-cid-xxjpzuzg> goldfish </button><button type=\"button\" class=\"ef__concept\" data-ef-concept=\"ocean\" aria-pressed=\"false\" data-astro-cid-xxjpzuzg> ocean </button><button type=\"button\" class=\"ef__concept\" data-ef-concept=\"firewater\" aria-pressed=\"false\" data-astro-cid-xxjpzuzg> fire + water </button> </div> <label class=\"ef__scrub\" data-astro-cid-xxjpzuzg> <span class=\"ef__scrub-end\" data-astro-cid-xxjpzuzg>honest</span> <input type=\"range\" min=\"0\" max=\"1000\" value=\"320\" step=\"1\" data-ef-slider aria-label=\"Render weight, from honest representation to natural image\" data-astro-cid-xxjpzuzg> <span class=\"ef__scrub-end\" data-astro-cid-xxjpzuzg>natural</span> </label> </div> <figcaption class=\"ef__caption\" 
data-astro-cid-xxjpzuzg>\nA hand-built evocation, not a system output. Drag from honest to natural: the honest side holds\n\t\tthe representation the model optimises toward; the natural side resolves it under a diffusion\n\t\tprior toward a recognisable image. The gap between them is the subject.\n</figcaption> </figure>\n<h2 id=\"decision\"><a class=\"heading-anchor\" href=\"#decision\">Decision</a></h2>\n<p>The framing decision was to optimise toward embedding coordinates rather than condition on a prompt.</p>\n<p>A normal generative pipeline takes a prompt, runs it through the model, and returns a sample. The concept lives in the prompt; you never touch the representation. The decision here was to invert that: pick a target point in the embedding space, then optimise a generator’s latent to maximise the similarity between its output’s embedding and that target. The concept becomes a coordinate you can manipulate, not a sentence you have to phrase.</p>\n<p>That choice is what makes concept algebra possible. Once a concept is a vector, you can do arithmetic on it: add two concepts, subtract an attribute, interpolate between two points. You can’t cleanly add “fire” and “water” in a prompt; you can add their embeddings. The output isn’t training data retrieved and blended; it’s the model’s own answer to “what lies at this coordinate,” reconstructed from scratch by optimisation.</p>\n<p>The encoder is <strong>LanguageBind</strong>, which binds image, audio, video, and text into one 768-dimension space anchored on language. A target can be assembled from any mix of those modalities, and a generator for one modality can be optimised toward a target defined in another. That cross-modal property is the whole reason to use a shared space rather than a per-modality one. 
Anchoring on language earns a second thing for free: any point in the space can be projected back toward the text encoder’s vocabulary, so a non-text embedding has a running human-readable readout. Mid-optimisation you can ask “what does the current image embed as,” and get back <code>{storm, crackling, electric}</code>. That single property is what turns the optimisation loop from a black box into something you can watch think. (The original build used ImageBind in a 1024-dimension space; LanguageBind is its direct successor, covers the same modalities, and exposes ViT patch tokens natively, which the loss now leans on.)</p>\n<p>The deeper representation decision is that a concept is not really the dense vector; it’s a sparse feature decomposition, and the vector is derived from it. A Matryoshka sparse autoencoder factors each embedding into a handful of named, active features. That matters because arithmetic in dense space is blunt: adding two normalised vectors averages every dimension, and subtraction is a vague nudge. Arithmetic over decompositions is sharp: addition unions the <em>named</em> features, subtraction removes a <em>specific</em> one, interpolation morphs the feature mixture. So <code>Concept</code> carries its decomposition as primary state and composes in feature space whenever both operands have one. 
“Fire plus water” stops being a blended average and becomes a union of what the model knows about each.</p>\n<h2 id=\"what-i-built\"><a class=\"heading-anchor\" href=\"#what-i-built\">What I built</a></h2>\n<p>A system that takes any combination of text, image, audio, and video as input concepts, combines them with feature-space arithmetic, and optimises a generative model’s latent toward the target. One command, <code>showcase</code>, renders a single target across all four modalities at once and emits an interpretation bundle and an evaluation card alongside the artefacts.</p>\n<p>Generation runs through modality-specific decoders: the <strong>Stable Diffusion 3.5 Medium</strong> latent for images, <strong>Stable Audio Open</strong> for audio, and <strong>LTX-Video</strong> for video. Optimisation is gradient descent on the generator’s latent, with a default of 2000 steps at a learning rate of 0.1. (The original build paired ImageBind with the SDXL VAE, AudioLDM 2, and Stable Video Diffusion; those still exist as legacy ablation backbones, but the flow-matching SD3.5 latent and the newer audio and video models are sharper and the licences are cleaner.)</p>\n<p>The concept algebra is the interface:</p>\n<div class=\"expressive-code\"><link rel=\"stylesheet\" href=\"/_astro/ec.yl275.css\"/><script type=\"module\" src=\"/_astro/ec.0vx5m.js\"></script><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"python\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#8B8B8BFC;--1:#616972\"># Addition: combine two concepts</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">fire_water </span><span style=\"--0:#A0A0A0;--1:#BF3441\">=</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_text(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;fire&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, encoder) </span><span style=\"--0:#A0A0A0;--1:#BF3441\">+</span><span 
style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_text(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;water&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, encoder)</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#8B8B8BFC;--1:#616972\"># Weighted combination</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">sunset_ocean </span><span style=\"--0:#A0A0A0;--1:#BF3441\">=</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#FFC799;--1:#005CC5\">0.3</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#A0A0A0;--1:#BF3441\">*</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_text(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;sunset&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, encoder) </span><span style=\"--0:#A0A0A0;--1:#BF3441\">+</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#FFC799;--1:#005CC5\">0.7</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#A0A0A0;--1:#BF3441\">*</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_text(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;ocean&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, encoder)</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#8B8B8BFC;--1:#616972\"># Subtraction: remove an attribute</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">hairless </span><span style=\"--0:#A0A0A0;--1:#BF3441\">=</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_text(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;dog&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, encoder) </span><span style=\"--0:#A0A0A0;--1:#BF3441\">-</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span 
style=\"--0:#FFC799;--1:#005CC5\">0.3</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#A0A0A0;--1:#BF3441\">*</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_text(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;fur&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, encoder)</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#8B8B8BFC;--1:#616972\"># Spherical interpolation between two points</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">midpoint </span><span style=\"--0:#A0A0A0;--1:#BF3441\">=</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.slerp(concept_a, concept_b, </span><span style=\"--0:#FFFFFF;--1:#AE4B07\">t</span><span style=\"--0:#A0A0A0;--1:#BF3441\">=</span><span style=\"--0:#FFC799;--1:#005CC5\">0.5</span><span style=\"--0:#FFFFFF;--1:#24292E\">)</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#8B8B8BFC;--1:#616972\"># Cross-modal: an audio concept plus a text concept, rendered as an image</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">thunder_purple </span><span style=\"--0:#A0A0A0;--1:#BF3441\">=</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_audio(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;thunder.wav&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, encoder) </span><span style=\"--0:#A0A0A0;--1:#BF3441\">+</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#FFC799;--1:#005CC5\">0.3</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#A0A0A0;--1:#BF3441\">*</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Concept.from_text(</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;purple&quot;</span><span style=\"--0:#FFFFFF;--1:#24292E\">, 
encoder)</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"# Addition: combine two conceptsfire_water = Concept.from_text(&quot;fire&quot;, encoder) + Concept.from_text(&quot;water&quot;, encoder)# Weighted combinationsunset_ocean = 0.3 * Concept.from_text(&quot;sunset&quot;, encoder) + 0.7 * Concept.from_text(&quot;ocean&quot;, encoder)# Subtraction: remove an attributehairless = Concept.from_text(&quot;dog&quot;, encoder) - 0.3 * Concept.from_text(&quot;fur&quot;, encoder)# Spherical interpolation between two pointsmidpoint = Concept.slerp(concept_a, concept_b, t=0.5)# Cross-modal: an audio concept plus a text concept, rendered as an imagethunder_purple = Concept.from_audio(&quot;thunder.wav&quot;, encoder) + 0.3 * Concept.from_text(&quot;purple&quot;, encoder)\"><div></div></button></div></figure></div>\n<h3 id=\"what-the-optimiser-actually-matches\"><a class=\"heading-anchor\" href=\"#what-the-optimiser-actually-matches\">What the optimiser actually matches</a></h3>\n<p>The naive loss is <span class=\"term\" data-term=\"cosine similarity\" data-def=\"A measure of how aligned two vectors are by the angle between them, used to score how close two embeddings sit in meaning.\" tabindex=\"0\" aria-describedby=\"term-def-3\" title=\"A measure of how aligned two vectors are by the angle between them, used to score how close two embeddings sit in meaning.\" data-note=\"One blurry number for agreement; a sparse feature comparison usually tells you more than the cosine does.\">cosine similarity</span> on the pooled embedding, and it has a known failure mode: it matches the model’s final answer the way an adversarial image matches a classifier’s top label, which is to say with high-frequency noise that scores well and looks like nothing. Pooled cosine is now a metric I log at every step, not the signal I optimise. The signal is two tighter terms. 
<strong>Patch-token alignment</strong> matches LanguageBind’s ViT patch features rather than the pooled summary, which preserves spatial structure: a goldfish-shaped goldfish instead of goldfish texture smeared across the frame. <strong>Feature-direction alignment</strong> matches the SAE decomposition, so the optimiser is pushed toward the <em>named</em> features the target carries, not just its dense direction.</p>\n<p>Holding the output inside the distribution the decoder expects used to be a hand-tuned job: total variation for smoothness, a spectral penalty on high frequencies, a latent-norm term. Those still exist on the legacy path, but the load-bearing version now is a learned one. A <strong>variational score-<span class=\"term\" data-term=\"distillation\" data-def=\"Training a smaller, cheaper model on the outputs of a larger one to copy most of its capability.\" tabindex=\"0\" aria-describedby=\"term-def-4\" title=\"Training a smaller, cheaper model on the outputs of a larger one to copy most of its capability.\" data-note=\"How a follower catches the frontier a quarter or two after it is set, for a fraction of the cost.\">distillation</span> prior</strong> pulls the latent toward the natural-image manifold a frozen diffusion model already knows, trained as a small per-concept LoRA so it regularises without collapsing every concept to the same generic mean. The regulariser weights I used to pick by feel are replaced by a prior the model derived.</p>\n<h3 id=\"what-comes-out\"><a class=\"heading-anchor\" href=\"#what-comes-out\">What comes out</a></h3>\n<p>Four renders, four concepts: <strong>water</strong>, <strong>thunder</strong>, <strong>fire</strong> and <strong>goldfish</strong>, listed here in a different order than they appear below. 
See if you can match each to a tile before you turn it over.</p>\n<figure class=\"rg\" data-render-gallery data-astro-cid-lub6o4hp> <ul class=\"rg__grid\" data-count=\"4\" data-astro-cid-lub6o4hp> <li class=\"rg__item\" data-astro-cid-lub6o4hp> <button class=\"rg__card\" type=\"button\" aria-pressed=\"false\" aria-label=\"Reveal the concept\" data-astro-cid-lub6o4hp> <span class=\"rg__inner\" data-astro-cid-lub6o4hp> <span class=\"rg__face rg__face--front\" data-astro-cid-lub6o4hp> <img src=\"/_astro/goldfish.DWmZhRnI_1zNqtc.webp\" srcset=\"/_astro/goldfish.DWmZhRnI_Z1ManzL.webp 320w, /_astro/goldfish.DWmZhRnI_34ryO.webp 512w, /_astro/goldfish.DWmZhRnI_1y7itL.webp 768w\" alt=\"Dense multicoloured speckle with clusters of orange, fan-shaped forms scattered across it.\" sizes=\"(min-width: 48rem) 30ch, 45vw\" loading=\"lazy\" decoding=\"async\" data-astro-cid-lub6o4hp=\"true\" width=\"1024\" height=\"1024\"> <span class=\"rg__hint\" aria-hidden=\"true\" data-astro-cid-lub6o4hp>\n?\n</span> </span> <span class=\"rg__face rg__face--back\" data-astro-cid-lub6o4hp> <span class=\"rg__answer\" data-astro-cid-lub6o4hp>goldfish</span> <span class=\"rg__again\" aria-hidden=\"true\" data-astro-cid-lub6o4hp>\ntap to flip back\n</span> </span> </span> </button> </li><li class=\"rg__item\" data-astro-cid-lub6o4hp> <button class=\"rg__card\" type=\"button\" aria-pressed=\"false\" aria-label=\"Reveal the concept\" data-astro-cid-lub6o4hp> <span class=\"rg__inner\" data-astro-cid-lub6o4hp> <span class=\"rg__face rg__face--front\" data-astro-cid-lub6o4hp> <img src=\"/_astro/fire.fo5-DUqD_uNk0b.webp\" srcset=\"/_astro/fire.fo5-DUqD_Z1Mb2Wa.webp 320w, /_astro/fire.fo5-DUqD_KLRNB.webp 512w, /_astro/fire.fo5-DUqD_Z1Km1Bf.webp 768w\" alt=\"Multicoloured speckle with branching orange tongues rising through the centre.\" sizes=\"(min-width: 48rem) 30ch, 45vw\" loading=\"lazy\" decoding=\"async\" data-astro-cid-lub6o4hp=\"true\" width=\"1024\" height=\"1024\"> <span class=\"rg__hint\" 
aria-hidden=\"true\" data-astro-cid-lub6o4hp>\n?\n</span> </span> <span class=\"rg__face rg__face--back\" data-astro-cid-lub6o4hp> <span class=\"rg__answer\" data-astro-cid-lub6o4hp>fire</span> <span class=\"rg__again\" aria-hidden=\"true\" data-astro-cid-lub6o4hp>\ntap to flip back\n</span> </span> </span> </button> </li><li class=\"rg__item\" data-astro-cid-lub6o4hp> <button class=\"rg__card\" type=\"button\" aria-pressed=\"false\" aria-label=\"Reveal the concept\" data-astro-cid-lub6o4hp> <span class=\"rg__inner\" data-astro-cid-lub6o4hp> <span class=\"rg__face rg__face--front\" data-astro-cid-lub6o4hp> <img src=\"/_astro/water.JKFDuu9C_XlzdT.webp\" srcset=\"/_astro/water.JKFDuu9C_zWci6.webp 320w, /_astro/water.JKFDuu9C_2uohLR.webp 512w, /_astro/water.JKFDuu9C_Z1LDOsi.webp 768w\" alt=\"Rippling, ridged texture over multicoloured speckle, with faint letter-like shapes near the centre.\" sizes=\"(min-width: 48rem) 30ch, 45vw\" loading=\"lazy\" decoding=\"async\" data-astro-cid-lub6o4hp=\"true\" width=\"1024\" height=\"1024\"> <span class=\"rg__hint\" aria-hidden=\"true\" data-astro-cid-lub6o4hp>\n?\n</span> </span> <span class=\"rg__face rg__face--back\" data-astro-cid-lub6o4hp> <span class=\"rg__answer\" data-astro-cid-lub6o4hp>water</span> <span class=\"rg__again\" aria-hidden=\"true\" data-astro-cid-lub6o4hp>\ntap to flip back\n</span> </span> </span> </button> </li><li class=\"rg__item\" data-astro-cid-lub6o4hp> <button class=\"rg__card\" type=\"button\" aria-pressed=\"false\" aria-label=\"Reveal the concept\" data-astro-cid-lub6o4hp> <span class=\"rg__inner\" data-astro-cid-lub6o4hp> <span class=\"rg__face rg__face--front\" data-astro-cid-lub6o4hp> <img src=\"/_astro/thunder.x4DD9hDA_1q8kQC.webp\" srcset=\"/_astro/thunder.x4DD9hDA_fJ6AV.webp 320w, /_astro/thunder.x4DD9hDA_bsuLr.webp 512w, /_astro/thunder.x4DD9hDA_1BQp7N.webp 768w\" alt=\"Warm-toned speckle with jagged dark branching forks running down through it.\" sizes=\"(min-width: 48rem) 30ch, 45vw\" 
loading=\"lazy\" decoding=\"async\" data-astro-cid-lub6o4hp=\"true\" width=\"1024\" height=\"1024\"> <span class=\"rg__hint\" aria-hidden=\"true\" data-astro-cid-lub6o4hp>\n?\n</span> </span> <span class=\"rg__face rg__face--back\" data-astro-cid-lub6o4hp> <span class=\"rg__answer\" data-astro-cid-lub6o4hp>thunder</span> <span class=\"rg__again\" aria-hidden=\"true\" data-astro-cid-lub6o4hp>\ntap to flip back\n</span> </span> </span> </button> </li> </ul> <figcaption class=\"rg__caption\" data-astro-cid-lub6o4hp>Four concepts, each optimised from noise toward its target embedding; nothing retouched. Tap a tile to reveal which - see how many you can name from the noise alone.</figcaption> </figure>\n<p>These are not meant to be beautiful, and it’s worth being plain about what they are. Each one is the literal output for a concept: the optimiser starts from random noise and nudges every pixel until the whole image embeds close to the target point. Nobody ever shows it the subject or tells it what one looks like. So where matching the concept doesn’t pull on a region, the pixels stay as the high-frequency speckle they started as. Most of the frame is static because, to the encoder, most of the frame doesn’t matter.</p>\n<p>What matters is that the subject still surfaces. Look past the speckle and forms start to organise out of it - edges, repeating shapes, the rough silhouette of whatever the model is reaching for. In places it reaches for the written word rather than the thing, which is its own tell that the encoder is anchored on language. None of it resolves into a clean picture, but the concept is unmistakably there once you spot it; that is the game above, four renders with nothing labelled until you turn them over. 
That partial, patchy legibility is the honest result, and it forces a choice: leave the representation raw, or resolve it toward something that reads as a real image.</p>\n<h3 id=\"honest-and-natural\"><a class=\"heading-anchor\" href=\"#honest-and-natural\">Honest and natural</a></h3>\n<p>The most interesting decision was to render two tracks of the same concept side by side. A VAE-style render answers “what is the nearest natural image whose embedding matches this concept” - it leans on the decoder’s training distribution to produce something photo-real and recognisable. That’s the <strong>natural</strong> track, VSD-prior conditioned. The <strong>honest</strong> track dials the prior down and lets the optimisation sit closer to the raw representation, so the output is what the model <em>thinks</em> the concept is rather than the closest real photograph of it. Neither is the true one. The contrast between them is the artefact: here is the model’s idea of “thunder,” and here is the nearest thing in the world to it.</p>\n<h3 id=\"what-every-render-ships-with\"><a class=\"heading-anchor\" href=\"#what-every-render-ships-with\">What every render ships with</a></h3>\n<p>Each run emits an interpretation bundle and an evaluation card, so an artefact is never just an image. The bundle carries the text-anchor readout (what the output embeds as, in words), the named SAE features that fired hardest, attribution maps from attention rollout and integrated gradients that show <em>where</em> in the frame each feature lives, and linear-probe activations for coarse questions like “is this animal” or “is this danger-coded.” The card carries the numbers: cross-modal agreement (do the image, audio, and video for one target re-encode to neighbours), cross-encoder probes that re-score the output with SigLIP2 and CLAP rather than the encoder it was optimised against, and seed-stability across repeated runs. 
Evaluation is built into the pipeline rather than bolted on after, which is the part of this build I’d most defend.</p>\n<p>The whole thing runs on Apple Silicon: <strong>PyTorch on the MPS backend</strong>, tuned on an M1 Max with 64GB of unified memory. A full four-modality dual-track run holds the LanguageBind encoders, three decoders, and optimisation overhead in roughly 19GB; a single-modality image render fits comfortably on 16GB. The Apple Silicon path is deliberate, and it took real work to make fast: bf16 autocast on the forward pass, <code>torch.compile</code> on the optimisation hot loop, and flash-attention routing through the ViT together buy a 3-5x wall-time improvement over naive fp32. This is exploratory work where the loop is run-look-adjust, and a fast local iteration cycle on hardware I already own beats renting a faster card.</p>\n<h2 id=\"outcome\"><a class=\"heading-anchor\" href=\"#outcome\">Outcome</a></h2>\n<p>The system does concept algebra across four modalities in a single shared embedding space: addition, subtraction, and spherical interpolation over text, image, audio, and video, with cross-modal targets that let an audio concept shape an image or a text concept colour a sound.</p>\n<p>What it surfaces is the more interesting result than any single picture: the platonic ideals a network has learned. The outputs are what the model <em>thinks</em> a concept looks like: maximally activated representations reconstructed by optimisation, not training examples retrieved and recombined. As an interpretability artefact, that’s the point. You’re looking at the geometry of a learned representation directly, and the interpretation bundle lets you read it rather than guess at it: which features fired, where they live in the frame, what the output embeds as in words.</p>\n<p>The evaluation card turns “did this work” from a judgement made by eye into something with numbers behind it. 
Cross-modal agreement says whether the four renderings of a target really landed near each other, cross-encoder probes say whether a second model agrees about what’s in the output, and seed-stability says whether the result is a property of the concept or an accident of initialisation. It still makes no benchmark claim and it isn’t a paper; the numbers exist to keep the framing honest, not to top a leaderboard.</p>\n<p>I’m treating this as an essay-grade study. Its value is in the framing - generation as optimisation in a shared space, concepts as feature decompositions you can do arithmetic on, the honest and natural tracks shown together - and in the artefacts that framing produces. The natural next surface is the interactive one: an embedding-space view where you move through the concept manifold and watch the output change, which is where the lab takes it.</p>\n<p>The code is on <a href=\"https://github.com/cal-van/multimodal-embedding-art\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">GitHub<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>.</p>",
      "date_published": "2026-01-01T00:00:00.000Z",
      "date_modified": "2026-01-01T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ]
    },
    {
      "id": "https://cvde.xyz/writing/metacognition-is-the-unlock/",
      "url": "https://cvde.xyz/writing/metacognition-is-the-unlock/",
      "title": "Metacognition is the unlock",
      "summary": "Model progress has moved through paradigms: reactive, then reasoning, then agentic. The next is metacognition - thinking about its own thinking - and it's what separates a model that repeats mistakes from one that compounds on them.",
      "content_html": "<blockquote>\n<p>The big challenge for traditional LLMs is that they are path-dependent; while they can consider the puzzle as a whole, as soon as they commit to a particular guess they are locked in, and doomed to failure. This is a fundamental weakness of what are known as “auto-regressive large language models”, which to date, is all of them.</p>\n</blockquote>\n<p>That’s <a href=\"https://stratechery.com/2026/agents-over-bubbles/\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">Ben Thompson<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>. Escaping that trap is what each paradigm of model progress has chipped at, and the paradigm you’re on sets the ceiling on what the model can do without you standing over it.</p>\n<p>The first paradigm was reactive. Early chat models answered in one shot: prompt in, text out, no pause between. They were fluent and often right, but they couldn’t catch themselves mid-thought because there was no middle to the thought. The whole answer arrived at once.</p>\n<p>The second was reasoning. Models learned to think step by step before answering; to lay out the working, not just the conclusion. This is a real jump. A model that reasons through a problem catches errors a reactive model commits on the spot, because the steps are now visible to the model as it produces them. Reasoning buys reliability on anything that has a chain to it.</p>\n<p>The third is agentic. Harnesses and agents decompose a goal into steps and sequence the work: do this, then that, check the result, move on. This is what turns a model that answers questions into a system that completes tasks. 
It can hold a goal across many calls and make progress against it without a human threading each step.</p>\n<h2 id=\"the-next-paradigm\"><a class=\"heading-anchor\" href=\"#the-next-paradigm\">The next paradigm</a></h2>\n<p>That paradigm is metacognition: thinking about its own thinking. Not reasoning through the task, but reasoning about the approach to the task. Noticing that a line of attack is failing and changing it. Noticing it’s hit the same blocker three times and trying a different approach. Treating a failed attempt as information about strategy rather than a thing to retry verbatim.</p>\n<p>The distinction is sharp once you see it. An agentic loop that hits an error retries the step, maybe with a tweak, and retries again. It’s working hard inside a frame it never questions. A metacognitive agent asks a different question: is the frame wrong? It can question the step itself, judge the strategy that produced it, and pick a new one. The first can loop forever. The second can notice it’s looping.</p>\n<p>In humans this is the whole game. Two people make the same mistake. One files it as bad luck and makes it again next week. The other extracts the rule, updates how they work, and never makes that class of mistake the same way twice. The second person isn’t smarter in the moment; they’re better at learning from the moment. Metacognition is what separates someone who repeats mistakes from someone who compounds on them. The mistakes are the same. What you do with them isn’t.</p>\n<h2 id=\"where-you-trigger-it\"><a class=\"heading-anchor\" href=\"#where-you-trigger-it\">Where you trigger it</a></h2>\n<p>An agent loop usually has an observe step, a decide step, an act step. Metacognition adds a fourth: learn. That’s the seam where metacognition lives, and the useful part is that you can trigger it deliberately rather than wait for it to emerge.</p>\n<p>After an attempt, before the next one, you make the agent answer a different kind of question. 
Not “what’s the next action” but “what did that attempt tell me, and should I change my approach?” Cheap to add: it’s a prompt and a place in the loop to run it. The effect is out of proportion to the cost. The agent stops treating every failure as a reason to retry and starts treating some failures as a reason to rethink. You’re not making the model smarter. You’re giving it a moment to be honest about whether the current plan is working, and a licence to abandon it.</p>\n<p>Current models don’t fully have this. They’ll cheerfully retry a doomed approach, declare success on work that failed, miss the pattern in their own errors. The paradigm isn’t here yet; it’s arriving. But you can see the shape of it, and you can scaffold toward it in the loops you build today. The learn step is where you reach for the next paradigm before the models get there on their own.</p>\n<p>The reactive model answers. The reasoning model works through. The agentic one sequences. The metacognitive one notices it’s about to make the same mistake again, and doesn’t.</p>",
      "date_published": "2025-11-25T00:00:00.000Z",
      "date_modified": "2025-11-25T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "agents",
        "metacognition"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/the-trust-calibration-tax/",
      "url": "https://cvde.xyz/writing/the-trust-calibration-tax/",
      "title": "The trust-calibration tax",
      "summary": "The hard part of AI UX isn't generating answers. It's teaching users when to trust the system, when not to, and how to recover when it's wrong. That work is the real cost, and most teams never budget for it.",
      "content_html": "<p>Generating the answer is the cheap part now. Any team can wire a model to a prompt and ship something that produces a plausible result. The expensive part is teaching the user when to believe it, when to check it, and what to do in the moment it’s wrong. I call that work the trust-calibration tax, and almost no roadmap has a line for it.</p>\n<p>It is a tax because you pay it whether you budget for it or not. Skip it and the cost still arrives; it just arrives as churn instead of a story point. The user trusts the system once, gets burned on something they couldn’t see coming, and quietly stops opening it. They rarely file a bug, because nothing broke. The output was wrong in a way that looked exactly like being right. That is the failure AI produces most often. You lose the account and never learn why.</p>\n<h2 id=\"calibration-is-the-product-not-the-model\"><a class=\"heading-anchor\" href=\"#calibration-is-the-product-not-the-model\">Calibration is the product, not the model</a></h2>\n<p>A user’s trust in an AI feature is a dial, and your job is to keep it pointed at the truth. Too low and they ignore a system that would have helped; you built it for nothing. Too high and they ship its mistakes under their own name; you built them a liability. The output being good on average does not fix either failure, because the user can’t see the average. They see one result at a time and have to decide, each time, how much to lean on it.</p>\n<p>This is why “we improved accuracy to 94%” doesn’t move adoption the way teams expect. A system that’s right 94% of the time but gives you no way to spot the wrong 6% is harder to use well than one that’s right 80% of the time and flags the answers it’s unsure of. The first asks the user to trust everything equally and punishes them for it. The second teaches them where to look. 
Calibration beats raw accuracy for the same reason a fuel gauge beats a bigger tank; what the user needs is to know where they stand, not just more of the thing.</p>\n<h2 id=\"the-four-jobs-the-tax-pays-for\"><a class=\"heading-anchor\" href=\"#the-four-jobs-the-tax-pays-for\">The four jobs the tax pays for</a></h2>\n<p>The tax is not one feature. It’s a set of jobs the interface has to do, and each one is the kind of work that gets cut first when a deadline tightens.</p>\n<ul>\n<li><strong>Confidence signalling.</strong> The system has to tell the user how much to lean on each answer, and it has to be honest. A model that sounds equally certain about a settled fact and a wild guess leaves the user guessing. Surfacing “this is solid” versus “this is a guess, check it” is more valuable than closing the gap between them, because it puts the user in control of their own risk.</li>\n<li><strong>Showing the working.</strong> Trust calibrates fastest when the user can see how the answer was reached. A claim that links to its evidence, a number that resolves to the rows it came from, a step you can expand; each one lets a sceptic spend thirty seconds confirming the thing holds. <a href=\"/writing/make-every-ai-claim-clickable/\">“Make every AI claim clickable”</a> is one tactic under this heading, and it’s a strong one, but it’s a single instrument in a larger kit.</li>\n<li><strong>Graceful failure.</strong> The question is never whether the system will be wrong; it’s what the wrongness feels like. A confident, undifferentiated error is the expensive kind. A system that says “I’m not sure about this part” before it’s wrong has pre-paid most of the trust cost, because the user was warned exactly where to look.</li>\n<li><strong>Undo.</strong> Trust is cheap to extend when mistakes are cheap to reverse. If a wrong AI action can be undone in one click, the user will try the feature freely and forgive its errors. 
If a wrong action is permanent, they’ll either avoid the feature or use it so cautiously it saves them nothing. Reversibility is what makes it safe to trust at all.</li>\n</ul>\n<h2 id=\"the-pattern-across-products\"><a class=\"heading-anchor\" href=\"#the-pattern-across-products\">The pattern across products</a></h2>\n<p>The same tax shows up wherever the trust gets calibrated, regardless of domain. At Lyssna, researchers analysing AI study output didn’t want a cleaner summary; they wanted to see which transcript a finding came from, because their trust was a function of traceability, not polish. At hey anna, every claim is clickable for the same reason: the calibration work is built into the surface, so a user learns within minutes which numbers to lean on and which to open. In both, the model was the easy half. The interface that let a professional calibrate their reliance on it was the half that took the time and the half that decided whether anyone kept using it.</p>\n<p>You can see the tax dodged in the wild too. The AI feature that demos beautifully and dies in production is almost always one that aced generation and skipped calibration. It gave great answers and no way to tell the great ones from the dangerous ones, so the first burn taught the user to stop trusting all of it at once. The feature didn’t fail because the model was bad. It failed because nobody built the part that teaches a person how much to believe.</p>\n<h2 id=\"budget-for-it-on-purpose\"><a class=\"heading-anchor\" href=\"#budget-for-it-on-purpose\">Budget for it on purpose</a></h2>\n<p>The practical move is to treat calibration as a first-class part of the spec, not a polish pass. When you scope an AI feature, scope the four jobs alongside the generation: how does this signal its confidence, how does a user see the working, what does it feel like when it’s wrong, and how does someone undo a bad result. If those four don’t have owners, you have not built an AI product. 
You’ve built a demo that happens to run in production.</p>\n<p>This is also where shipping an AI product grows up past prompt engineering. The first instinct, and still the common one, is to push the work back onto the user: tell the model not to make things up, or lean on a “check your answer” and a second agent to review the first. That honestly gets an individual a long way, and if you’re using AI for your own work, use it. Shipping a product is the opposite move. Product is solving a user’s problem systematically and then building the solution in, so they don’t have to re-solve it every session; you do the thinking ahead of them and bake it into the system. That is what SaaS has always been, and AI is no different, except that it is unusually easy to ship something that looks like thinking and has none underneath.</p>\n<p>The teams that win the next few years won’t be the ones with the best model. Raw capability is commoditising on a schedule nobody controls. They’ll be the ones who paid the trust-calibration tax in design time instead of in churn, and who understood that the answer was never the hard part.</p>",
      "date_published": "2025-11-06T00:00:00.000Z",
      "date_modified": "2025-11-06T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai-product",
        "ux",
        "trust"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/not-every-ai-feature-is-a-chat/",
      "url": "https://cvde.xyz/writing/not-every-ai-feature-is-a-chat/",
      "title": "Not every AI feature should be a chat",
      "summary": "Enterprises trust AI for invisible categorisation and distrust it for reversible work behind a chat box. 'Chat, move this five pixels' is all-or-nothing and risky. Sometimes the right surface for an AI feature is a button.",
      "content_html": "<p>The same enterprise that lets AI silently categorise a million support tickets will refuse to let an AI chat assistant touch the layout of one slide. From the outside that looks like inconsistency. It isn’t. It’s a precise read on where AI is safe to depend on and where it isn’t, and the chat box is on the wrong side of that line more often than chat-first design assumes.</p>\n<p>Watch what enterprises actually approve. They hand AI the invisible, high-volume work: routing tickets, tagging transactions, flagging anomalies, deduplicating records. Millions of decisions a day, no human in the loop, complete trust. Then watch what they hold back. The hands-on, reversible work a person was doing themselves a minute ago: editing the document, adjusting the design, changing the number in the cell. Here they want their hands on the controls. The pattern isn’t fear of AI. It’s a sober judgement about which surface fits which task.</p>\n<h2 id=\"two-questions-decide-the-surface\"><a class=\"heading-anchor\" href=\"#two-questions-decide-the-surface\">Two questions decide the surface</a></h2>\n<p>The surface for an AI feature should fall out of two properties of the task, not out of what’s fashionable to ship.</p>\n<p>The first is how much trust the task requires. Categorisation is forgiving at scale; one misrouted ticket out of a million is noise, and the aggregate is what matters. Moving an element on a customer’s slide is unforgiving and singular; there is no average to hide in, and the one wrong move is the whole experience. High-trust, low-tolerance tasks want the user’s hand on the wheel. Low-trust, high-volume tasks are exactly where you let the AI run unattended.</p>\n<p>The second is how reversible the action is. A categorisation that runs in the background is reversible by definition; you re-run it, you correct the tag, nothing was staked. 
A direct manipulation of something the user is actively working on carries immediate, visible consequences they have to live with. The more reversible and invisible the work, the more autonomy the AI can safely have. The more direct and consequential, the more the user wants a control they understand.</p>\n<h2 id=\"where-chat-earns-its-place-and-where-it-doesnt\"><a class=\"heading-anchor\" href=\"#where-chat-earns-its-place-and-where-it-doesnt\">Where chat earns its place, and where it doesn’t</a></h2>\n<p>Chat is a genuinely good surface for a specific shape of task: open-ended, exploratory, where the user doesn’t yet know what they want and the cost of a wrong turn is just another message. “Help me think through this,” “what’s in this dataset,” “draft me three options.” The ambiguity is the point, and a conversation is the right tool for resolving ambiguity.</p>\n<p>The fashionable AI design tools show the failure of forcing everything else into that mould. “Chat, move this box five pixels left” is slower, vaguer, and riskier than the direct control it replaced. You type a sentence, wait for a generation, and discover whether the model understood; the old way was to grab the box and move it, with your eyes closing the loop in real time. Worse, the interaction is all-or-nothing. You get the model’s whole interpretation back at once, and if it’s 90% right you’re now editing its guess instead of expressing your intent. A precise, reversible, hands-on task got wrapped in an imprecise, high-latency, all-or-nothing interface. The chat didn’t add power. It added distance between the user and the thing they were trying to do.</p>\n<h2 id=\"how-wide-is-the-space\"><a class=\"heading-anchor\" href=\"#how-wide-is-the-space\">How wide is the space?</a></h2>\n<p>Those two questions are about how much autonomy a task can bear. 
There’s a third, on a different axis, and it decides how much an LLM is actually buying you: how wide is the space of inputs and outputs the feature has to cover?</p>\n<p>A traditional control is something you build by anticipating that space in advance. Every option, every edge case, every state has to be foreseen and given a component, and what you didn’t build, the user can’t do. That foresight is most of the cost of software, and it’s why narrow, well-understood tasks get clean controls and sprawling, open-ended ones get a thin UI or none at all. An LLM <a href=\"/writing/ai-is-an-interface/\">broadens the interface</a> without you building any of it: it takes an input you never enumerated and produces an output you never designed a screen for. That is the real trade. When the space is too wide to enumerate, you couldn’t have built a control for every way a person might ask for the chart they want, so you let language carry the range. When the space is narrow, the model is selling you breadth you don’t need, and charging latency, a stochastic result, and an all-or-nothing round-trip for it.</p>\n<p>And the answer isn’t fixed for the life of a feature; it moves through a single task. Generation is wide: you don’t know exactly what you’ll get, so the open interface fits. Editing is narrow: you know the precise change, and running it back through the stochastic round-trip to move one line or delete one word is slower and less certain than simply doing it yourself. 
The same feature wants a broad interface to create and a direct one to refine, and the good ones switch at that seam rather than trapping the user in a conversation for work their hands could finish in a second.</p>\n<h2 id=\"a-decision-rule\"><a class=\"heading-anchor\" href=\"#a-decision-rule\">A decision rule</a></h2>\n<p>Before defaulting an AI feature to chat, run it through four questions.</p>\n<div class=\"table-scroll\" tabindex=\"0\" role=\"group\" aria-label=\"Table, scroll horizontally to see more\"><table><thead><tr><th>Question</th><th>Lean to chat</th><th>Lean to a control</th></tr></thead><tbody><tr><td>Does the user know what they want?</td><td>No, they’re exploring</td><td>Yes, they have a specific intent</td></tr><tr><td>How direct is the action?</td><td>Indirect, the AI does work offstage</td><td>Direct, the user is manipulating the thing</td></tr><tr><td>What’s the cost of a wrong interpretation?</td><td>Low, just send another message</td><td>High, you’re now editing a bad guess</td></tr><tr><td>How wide is the input and output space?</td><td>Wide, you can’t enumerate it</td><td>Narrow, you can build for it</td></tr></tbody></table></div>\n<p>When the answers point to the right-hand column, the better surface is a button, a toggle, a slider, or an inline suggestion the user accepts or rejects in place. A button has no ambiguity to misread. A toggle is instantly reversible. An inline suggestion lets the user see the proposed change against the real thing and take it or leave it without a round-trip through a sentence. 
These surfaces give the AI exactly as much autonomy as the task can bear and no more, which is the whole game.</p>\n<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>A quick check before you build the chat: could the user accomplish this with one click on a control they can see? If yes, the conversation is overhead. Chat is for the moments where the user genuinely can’t point at what they want, because pointing is faster than describing every single time pointing is possible.</p> </div> </aside>\n<h2 id=\"match-the-surface-to-the-task\"><a class=\"heading-anchor\" href=\"#match-the-surface-to-the-task\">Match the surface to the task</a></h2>\n<p>The chat-first default treats conversation as the universal interface for AI, and for a real slice of tasks it is the right one. The mistake is reaching for it everywhere, including the places where a direct control would be faster to use, safer to trust, and easier to reverse. The surface is not a branding decision. It’s where the trust the task demands meets the reversibility the action allows, and that meeting point is sometimes a conversation and sometimes a button.</p>\n<p>Pick the surface the task is asking for. Often it’s quietly asking for a button.</p>",
      "date_published": "2025-10-15T00:00:00.000Z",
      "date_modified": "2025-10-15T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai-product",
        "ux",
        "product"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/context-engineering/",
      "url": "https://cvde.xyz/writing/context-engineering/",
      "title": "Context engineering is the design surface",
      "summary": "Prompt engineering treats the words as the lever. Context engineering treats the whole context window as the design surface, with a stage and an artifact for each step. It's the better abstraction for production work.",
      "content_html": "<p>Prompt engineering treats the wording as the lever you pull. Phrase the request better, add the magic words, and the output improves. That model works for a single turn against a chatbot. It stops working the moment the task is large enough to matter, because the wording was never the whole input. The whole input is the context window: the instructions, the examples, the retrieved material, the prior turns, the structure you imposed on all of it. Context engineering treats that entire window as a design surface you compose on purpose.</p>\n<p>The companion to this argument is that you should build context systems instead of crafting one-off prompts. That piece makes the case for the shift. This one goes further into how: what stages the work moves through, and what each stage leaves behind.</p>\n<h2 id=\"a-stage-produces-an-artifact\"><a class=\"heading-anchor\" href=\"#a-stage-produces-an-artifact\">A stage produces an artifact</a></h2>\n<p>The trap with “context engineering” is that it can stay a vibe. You nod, you agree the context matters, and then you go back to typing paragraphs into a box. The fix is to give the work stages, and to make each stage produce something you can point at. An artifact you can name is an artifact you can reuse, review, and hand to someone else.</p>\n<p>I teach this as four stages at Lyssna, in workshops on how teams actually get leverage out of models. The progression is Define, Discover, Design, Develop. 
Each one owns a question and produces an artifact.</p>\n<div class=\"table-scroll\" tabindex=\"0\" role=\"group\" aria-label=\"Table, scroll horizontally to see more\"><table><thead><tr><th>Stage</th><th>The question it answers</th><th>The artifact it leaves</th></tr></thead><tbody><tr><td>Define</td><td>What is the task, and what does done look like?</td><td>A problem statement and explicit success criteria</td></tr><tr><td>Discover</td><td>What domain material and examples does the model need?</td><td>A curated context pack: vocabulary, references, good and bad examples</td></tr><tr><td>Design</td><td>What structure and framework holds the work?</td><td>A scaffold: the framework named, the output shape fixed</td></tr><tr><td>Develop</td><td>How does it improve across turns?</td><td>A working transcript you can rerun and refine</td></tr></tbody></table></div>\n<p>The artifacts are the point. If a stage produces nothing you can save, you skipped it, and the model is now guessing at the thing you didn’t write down.</p>\n<h2 id=\"define-write-the-success-criteria-before-the-prompt\"><a class=\"heading-anchor\" href=\"#define-write-the-success-criteria-before-the-prompt\">Define: write the success criteria before the prompt</a></h2>\n<p>Most bad output traces back to a task that was never specified. “Make this better” has no done state, so the model invents one, and it rarely matches yours. The Define stage forces the criteria out of your head and onto the page: what the output is for, who reads it, what would make you reject it.</p>\n<p>The artifact is small and it does a lot of work. A few lines of success criteria let you judge the output instead of reacting to it. 
They also become the thing you paste into every later turn, so the model is scored against your bar rather than a generic one.</p>\n<h2 id=\"discover-vocabulary-activates-the-right-regions\"><a class=\"heading-anchor\" href=\"#discover-vocabulary-activates-the-right-regions\">Discover: vocabulary activates the right regions</a></h2>\n<p>This is the stage people skip, and it’s the one that moves quality most. A model has read an enormous amount. The problem is reaching for the right part of it. Domain vocabulary is how you do that. Precise terms act as a key: say “rebase” and “bisect” and the model lands in the part of its training where careful git work lives; say “make the history clean” and you get a vaguer neighbourhood. The words you choose decide which regions of the model’s training light up.</p>\n<p>So Discover is deliberate retrieval, done by you before any automated retrieval runs. You assemble the context pack: the terms of art for this domain, the references that define the standard, and examples of both good and bad output. Examples carry more than instructions do; one strong example of the thing you want teaches faster than three paragraphs describing it. The pack is reusable. Build it once for a recurring task and every future run starts from the right place.</p>\n<h2 id=\"design-a-framework-is-a-scaffold-the-model-already-knows\"><a class=\"heading-anchor\" href=\"#design-a-framework-is-a-scaffold-the-model-already-knows\">Design: a framework is a scaffold the model already knows</a></h2>\n<p>Frameworks earn their keep here, and not because they’re clever. A named framework is a structure the model has seen thousands of times in training. Invoke it and you get its shape for free: the steps, the order, the headings, the way the parts relate. You’re not teaching the model the framework. You’re pointing at one it already holds and asking it to pour this task into that mould.</p>\n<p>That’s the lever. 
“Analyse this” is unstructured and the output will be too. “Run a SWOT, then a risk register, then a recommendation” hands the model three scaffolds and an order to fill them in. The artifact of this stage is the scaffold made explicit: the framework named, the output shape fixed, the sections decided before the first word is generated.</p>\n<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>A framework you invent on the spot is worse than a standard one the model has seen ten thousand times. Reach for the common scaffold first; the model already knows its shape, so you spend none of the window teaching it.</p> </div> </aside>\n<h2 id=\"develop-build-across-turns-dont-shoot-once\"><a class=\"heading-anchor\" href=\"#develop-build-across-turns-dont-shoot-once\">Develop: build across turns, don’t shoot once</a></h2>\n<p>Single-shot prompting asks for the finished thing in one call. It works for small tasks and fails quietly on large ones, because everything that goes wrong goes wrong at once and you can’t see where. Building across turns separates the failures. You generate a draft, inspect it against the success criteria from Define, correct the one thing that’s off, and continue. Each turn carries the accumulated context forward, so the window gets richer as you go rather than starting cold.</p>\n<p>The artifact is the transcript itself. 
A good multi-turn session is a reusable thing: the sequence of moves that produced a result you trusted. Save it and you’ve captured a workflow, not just an answer.</p>\n<h2 id=\"claudemd-is-team-infrastructure-not-a-personal-trick\"><a class=\"heading-anchor\" href=\"#claudemd-is-team-infrastructure-not-a-personal-trick\">CLAUDE.md is team infrastructure, not a personal trick</a></h2>\n<p>All four stages produce artifacts, and artifacts are exactly what a context file is for. CLAUDE.md (or its equivalent in whatever tool you use) is where the durable parts live: the vocabulary, the success criteria, the frameworks you reach for, the examples of good output. Written down once, loaded every session.</p>\n<p>The shift worth naming is from personal trick to shared infrastructure. When the context file lives in your head, the leverage leaves when you do. When it lives in the repo, it’s something a team builds together and improves over time. A new person inherits the accumulated context on day one instead of rediscovering it over months. That is the difference between a clever individual and a team that compounds; the file is where the compounding happens.</p>\n<p>Prompt engineering optimises a sentence. Context engineering builds the surface that every sentence lands on, and leaves an artifact at each stage so the next run starts ahead of the last one.</p>",
      "date_published": "2025-09-22T00:00:00.000Z",
      "date_modified": "2025-09-22T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "prompt-engineering",
        "context-engineering"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/ai-should-be-a-dumb-renderer/",
      "url": "https://cvde.xyz/writing/ai-should-be-a-dumb-renderer/",
      "title": "AI should be a dumb renderer",
      "summary": "The default pattern for AI insights dumps data into a model and asks it to count, compare, and conclude. That's the wrong order. Precompute the numbers deterministically; let the model render and narrate them.",
      "content_html": "<p>A language model cannot be trusted to add up a column of numbers. It can be trusted to explain, fluently and correctly, a sum that something else already computed. The whole design of a defensible AI insights feature follows from holding those two facts at once.</p>\n<p>The default pattern does the opposite. It takes a table of structured data, pastes it into the context window, and asks the model to count the rows, compare the segments, and tell you what changed. This is a CSV in ChatGPT dressed up as a product. It demos beautifully and degrades the moment a stakeholder checks one of the numbers against the source, because the model was never counting. It was predicting what a count usually looks like.</p>\n<h2 id=\"the-boundary-is-the-product\"><a class=\"heading-anchor\" href=\"#the-boundary-is-the-product\">The boundary is the product</a></h2>\n<p>Draw a line through the system. On one side sits everything that has a single correct answer: the aggregations, the group-bys, the deltas, the percentages, the ranking, the “this is up 12% on last quarter.” On the other side sits everything that is a judgement call about language: which of those facts matters, how to phrase it, what to lead with, how to connect three numbers into a sentence a human will act on.</p>\n<p>The first side is arithmetic. You write it in SQL or pandas, you test it, it returns the same answer every time, and when someone disputes a figure you can point at the query. The second side is rendering. You hand the model the precomputed facts and ask it to narrate them. It is good at this in a way that no deterministic system is; it is also incapable of inventing a number it was never given, because the numbers arrive already settled.</p>\n<p>That line is the entire architecture. Everything to the left is auditable. Everything to the right is generated. 
The failure mode of most AI features is putting arithmetic on the right.</p>\n<h2 id=\"what-this-looks-like-in-practice\"><a class=\"heading-anchor\" href=\"#what-this-looks-like-in-practice\">What this looks like in practice</a></h2>\n<p>Take an analytics feature built on this split. Every claim it makes is clickable: click “revenue rose 18% last quarter” and you land on the 340 orders the figure was computed from. That guarantee is only possible because the claim was computed, not authored. The model receives a structured object that already says revenue rose 18% across these 340 orders, and its job is to turn that into a sentence and a paragraph of context. It cannot quietly say 19%, because it never held the 340 orders; it held a fact with a fixed shape. That is verification over generation: the claim is checkable because it was computed, not composed.</p>\n<p>I saw the same boundary from the other direction at Lyssna, working on AI analysis of user-research studies. The temptation with a pile of interview transcripts is to throw the lot at a model and ask “what did we learn?” You get something that reads like a synthesis and is impossible to defend, because you cannot trace any sentence back to the moment a participant actually said it. The work that mattered was upstream: coding responses, tagging themes, counting how many participants hit a given friction point. Once those counts exist, the model writing “seven of twelve participants stalled at the pricing step” is rendering a fact, and a researcher can click through to all seven. 
Skip the counting and the same sentence is a plausible fabrication.</p>\n<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>A useful test for any AI feature: pick a number on the screen and ask “what produced this exact value?” If the honest answer is “the model decided,” you have a rendering problem dressed as an insight. If the answer is “this query, over these rows,” the model is doing the job it is actually good at.</p> </div> </aside>\n<h2 id=\"why-this-is-the-order-that-scales\"><a class=\"heading-anchor\" href=\"#why-this-is-the-order-that-scales\">Why this is the order that scales</a></h2>\n<p>The deterministic side scales because it is ordinary software. It is cacheable, testable, cheap to run, and it does not drift when the model version changes underneath you. The generative side scales because its surface area is small and forgiving: phrasing errors are recoverable in a way that arithmetic errors are not. A clumsy sentence loses you nothing; a wrong total loses you the account.</p>\n<p>Reverse the split and you inherit the worst of both. You pay model latency and token cost to do work a database does for free, you get answers that vary between runs, and you cannot prove any of them. Worse, the errors are confident and well-written, which is precisely what makes them dangerous. A wrong number in a clumsy email gets caught. 
A wrong number in a beautifully argued executive summary gets forwarded.</p>\n<p>The reason this matters beyond any one product is that “AI insights” is becoming a default feature request, and the default implementation is the broken one. The fix is not a better model or a longer prompt. It is moving the boundary: compute every fact before the model sees it, and let the model do the one thing it is genuinely better at than your codebase, which is putting those facts into language a person will trust.</p>\n<p>Trust is not a property of the model. It is a property of where you drew the line.</p>",
      "date_published": "2025-08-14T00:00:00.000Z",
      "date_modified": "2025-08-14T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai-product",
        "engineering",
        "ai"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/why-i-still-write-code/",
      "url": "https://cvde.xyz/writing/why-i-still-write-code/",
      "title": "Why I still write code as a product leader",
      "summary": "Writing production code as a product leader compresses decision latency: faster iteration, feasibility you can check yourself, AI behaviour you can debug directly, and no organisational telephone.",
      "content_html": "<p>Writing production code as a product leader compresses decision latency: it shortens the time between a question and a trustworthy answer. Most of what makes product decisions slow is not thinking time. It is the relay between the person who has the question and the person who can answer it.</p>\n<p>I want to be precise about what the practice actually changes, because the honest version is narrower and more useful than “leaders should code.”</p>\n<h2 id=\"faster-iteration-loops\"><a class=\"heading-anchor\" href=\"#faster-iteration-loops\">Faster iteration loops</a></h2>\n<p>The loop that matters is question to answer to next question. Every step you cannot take yourself adds a queue.</p>\n<p>“Would it feel better if the result streamed in instead of landing all at once?” is a five-minute question if you can change it and look. It is a two-day question if it has to become a ticket, get prioritised, get built, and come back for you to react to, by which point you have lost the thread that made you ask. The cost is not the engineering time. It is the loss of momentum, and the quiet way a slow loop trains you to ask fewer questions.</p>\n<p>When you can close the loop yourself, you ask more of them, and you ask sharper ones, because the cost of being curious dropped.</p>\n<h2 id=\"feasibility-you-can-check-before-it-hits-a-roadmap\"><a class=\"heading-anchor\" href=\"#feasibility-you-can-check-before-it-hits-a-roadmap\">Feasibility you can check before it hits a roadmap</a></h2>\n<p>A roadmap is a set of promises. The expensive mistake is promising something whose hard part you have not seen.</p>\n<p>Plenty of features look like a week and turn out to be a quarter, and the difference is usually invisible from the outside: the third-party API that rate-limits in a way that breaks the whole flow, the data that does not exist in the shape the design assumes, the latency you only notice once real volume hits it. 
You find these by building a thin version and watching where it strains. Doing that yourself means feasibility is something you have checked, not something you have been told. The roadmap gets more honest because the person making the promises has touched the part that would break it.</p>\n<p>The highest-leverage version of this happens live, in the room. A client or a non-technical stakeholder describes what they want, and because I can see roughly how it would be built, I can think through a couple of implementations on the spot, settle on the leanest one that would actually test the idea, and put a rough timeline on it before the meeting is over. That turns a one-way feature request into a decision we make together: here is what it would take, here is how valuable you actually think it is, and given both, is it worth doing? Far better to have that exchange with the person who knows the value, while the cost is still a cheap estimate, than to carry a vague ask back to a team and return weeks later with a number that reopens the whole question. Value for effort is the call that matters most, and it is the one technical depth lets you make in real time.</p>\n<h2 id=\"ai-behaviour-you-can-debug-directly\"><a class=\"heading-anchor\" href=\"#ai-behaviour-you-can-debug-directly\">AI behaviour you can debug directly</a></h2>\n<p>This is the one that has changed most, and it is where a relay does the most damage.</p>\n<p>When an LLM does something wrong, the bug is rarely in the code in the ordinary sense. The model invents a field that is not in the schema, or ignores half the context, or behaves perfectly in the demo and falls apart on the input a real user gives it. Diagnosing that means reading the actual prompt that went over the wire, the actual context window, the actual tokens that came back. 
It is empirical work, and it does not survive translation well.</p>\n<p>Through a relay it goes: “it’s giving weird answers” becomes a ticket, becomes an engineer’s best guess at reproducing a vague complaint, becomes a fix for a problem that may not be the real one, comes back, still wrong. Each hop strips detail from a problem that is <em>made of</em> detail. Debugging it yourself collapses that to one person looking at the trace. On the VicRoads number-plate pricing work, the front end was an LLM and the engine behind it was deterministic; the entire art was in where the model was allowed to reason and where it had to defer to the engine. That boundary is not something you can specify from a distance. You find it by watching the model fail and moving the line.</p>\n<h2 id=\"skipping-the-organisational-telephone\"><a class=\"heading-anchor\" href=\"#skipping-the-organisational-telephone\">Skipping the organisational telephone</a></h2>\n<p>Intent degrades at every handoff, the same way the playground game works. What you meant becomes what you wrote, becomes what they read, becomes what they built. Nobody is incompetent; meaning just leaks at each crossing, and the leak is largest exactly where the nuance matters most.</p>\n<p>Code skips the relay. The intent and the artefact are the same object. There is no version of “what I meant” that is separate from “what shipped,” because you wrote the thing that shipped. For the decisions where the nuance is the whole point (the pricing edge case, the model boundary, the interaction that lives or dies on fifty milliseconds) removing the telephone is not a nicety. 
It is the difference between the decision arriving intact and arriving distorted.</p>\n<h2 id=\"understand-the-layer-beneath-you\"><a class=\"heading-anchor\" href=\"#understand-the-layer-beneath-you\">Understand the layer beneath you</a></h2>\n<p>Underneath those four reasons sits a more general principle, and it’s the one that makes the habit more than nostalgia. To lead a domain, you have to understand, in detail, the layer directly beneath the one you operate at. Not the bottom of the stack; the next layer down. It holds going up a hierarchy and going down a stack of abstractions, and it’s the same rule both times.</p>\n<p>Take the abstractions. A product manager’s decisions are ultimately expressed in code, so code is the layer beneath you, and you have to be able to read it. A programmer doesn’t need to know the binary, but they do need mechanical sympathy: a feel for how their code compiles and runs against the machine, enough to write something performant rather than merely correct. Go one layer below the one beneath you and it’s noise; nobody needs the binary. The hierarchy works the same way. A CEO doesn’t need to know what each individual is doing, but does need to understand, in detail, what each department is doing. A department head who tracks exactly how each person does their work is micromanaging, but one who can’t say which tasks are being done, by whom, and how they ladder up to the goal isn’t leading. One layer down, in detail. Lower than that is either noise or control.</p>\n<p>This is the case for coding that actually convinces me. It lets me step into a technical conversation with people far more technical than I am and hold my end of it. Engineers have a depth I don’t; what they often lack is the business context and the strategy that decide which trade-off at their layer is the right one. Understanding their layer is what lets the conversation ladder up to mine. 
I’ve worked hard to be able to pivot mid-conversation between design, technology, strategy, and the market side (pricing and economics), because each of those rests on the one beside it and none of them is the whole picture alone.</p>\n<p>I’m watching the same rule play out from the other side now, as non-technical people start shipping real code with a model’s help. They can describe the business behaviour they want, which gets them further than anyone expected. But they’re missing the layer beneath: branches and rebasing, linting, the tests that tell you something broke, the mental model of how it all fits. So when something goes wrong they’re stuck, at the mercy of the model that wrote it. It is still remarkably powerful, and it is also capped: there’s a ceiling on the complexity, and therefore the value, they can deliver until they learn the domain, because the domain’s own language is exactly what unlocks the next level of performance from these models. The layer beneath still has to be understood, even when the machine is the one typing.</p>\n<h2 id=\"the-limits-honestly\"><a class=\"heading-anchor\" href=\"#the-limits-honestly\">The limits, honestly</a></h2>\n<p>This is not always the right use of a leader’s time, and pretending otherwise is its own failure.</p>\n<p>The job is leverage, and your hands in the codebase are a narrow lever. Write the code when the decision quality genuinely depends on you having touched the thing: the risky feasibility question, the AI behaviour nobody else can characterise, the prototype that settles an argument faster than a meeting would. Do not write it when an engineer would do it better and faster and the only thing your involvement adds is your ego in the commit history. A leader who is heads-down in a feature that someone else should own is not compressing decision latency; they are creating a bottleneck with their own name on it. 
The skill is knowing which is which, and the tell is simple: am I the lowest-cost path to a trustworthy answer here, or just the most comfortable one?</p>\n<aside class=\"callout callout--aside\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:corner-down-right\">   <symbol id=\"ai:lucide:corner-down-right\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><path d=\"m15 10l5 5l-5 5\"/><path d=\"M4 4v7a4 4 0 0 0 4 4h12\"/></g></symbol><use href=\"#ai:lucide:corner-down-right\"></use>  </svg> <span data-astro-cid-pyumqe5w>Aside</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>If you do build, the review worth asking for is the adversarial one. “Looks good” teaches you nothing. Ask the strongest engineer you have to try to break it: where does this fall over under load, what input did I not handle, where am I wrong? The point of writing the code as a leader is faster, truer feedback, and flattering feedback is just a slower lie.</p> </div> </aside>\n<p>The frame that holds all of this together: code is not a claim on territory and it is not a hobby smuggled into the job. It is the shortest path between a question and an answer you can trust, for the specific class of questions where the relay would have wrecked the answer on the way back. Use it there. Everywhere else, your job is still to make other people’s loops faster, not to live inside your own.</p>",
      "date_published": "2025-07-09T00:00:00.000Z",
      "date_modified": "2025-07-09T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "product",
        "engineering",
        "career"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/the-four-mode-pm/",
      "url": "https://cvde.xyz/writing/the-four-mode-pm/",
      "title": "The four-mode product manager",
      "summary": "Strategy, market analysis, solution architecture, implementation. The old role split these across people and handoffs. The job now is to move between all four in a single conversation.",
      "content_html": "<p>A good week now looks like this: SQL against the warehouse before lunch to find out which accounts are actually expanding, a repositioning and pricing argument in the afternoon, a working prototype in TypeScript that night, and somewhere in the middle a session debugging why an LLM keeps inventing a field that isn’t in the schema. Four kinds of work that the org chart treats as four different people.</p>\n<p>That spread used to be a coordination problem. It is now a single conversation.</p>\n<h2 id=\"the-role-was-a-coordination-structure-not-a-skill-set\"><a class=\"heading-anchor\" href=\"#the-role-was-a-coordination-structure-not-a-skill-set\">The role was a coordination structure, not a skill set</a></h2>\n<p>The product manager job was assembled around a constraint: the work of figuring out what to build was split across people who each held one piece. A market analyst held demand and segmentation. A product strategist held the bet. A solution architect held what was technically possible and at what cost. Engineering held the build. The PM existed largely to carry meaning between them.</p>\n<p>Each boundary was a handoff, and each handoff was a tax. Intent gets written down, read, re-interpreted, and rebuilt at every crossing. The analyst’s nuance about which segment was churning becomes a line in a brief. The strategist’s bet becomes a ticket. The architect’s “this is feasible but only if we denormalise here” becomes a constraint nobody remembers by sprint planning. Nothing is lost at once; it leaks. By the time an idea has crossed four boundaries it is a worse version of itself, and the round trip to correct it is measured in weeks.</p>\n<p>The split made sense because the alternative was worse. One person could not hold demand modelling, pricing, architecture, and a production codebase at a useful depth. 
So you specialised, and you paid the coordination cost, because paying it beat the alternative of everyone doing everything badly.</p>\n<h2 id=\"what-changed-is-the-cost-of-holding-all-four-at-once\"><a class=\"heading-anchor\" href=\"#what-changed-is-the-cost-of-holding-all-four-at-once\">What changed is the cost of holding all four at once</a></h2>\n<p>AI tooling did not make people smarter. It collapsed the cost of operating outside your home discipline well enough to make a real decision.</p>\n<p>The analyst’s morning query no longer needs a data team in the loop; you write the SQL, and when you do not know the window function you need, you find out in the same minute rather than filing a request. The architecture sketch no longer needs a week of an engineer’s time to pressure-test; you can stand up a thin version and watch it break. The prototype that used to be a Figma file someone else would have to interpret is now running code that answers the feasibility question directly. The four modes did not merge because the person got more capable in the abstract. They merged because the friction between “I have a question in this domain” and “I have a usable answer” dropped close to zero.</p>\n<p>When that friction drops, the handoff stops being mandatory. And once a handoff is optional, keeping it has to justify itself, because every boundary you preserve still costs the same translation tax it always did.</p>\n<h2 id=\"the-four-modes-held-by-one-operator\"><a class=\"heading-anchor\" href=\"#the-four-modes-held-by-one-operator\">The four modes, held by one operator</a></h2>\n<p>The leverage now sits with whoever can move between these without a relay:</p>\n<ul>\n<li><strong>Product strategy.</strong> What bet are we making, and why this one. The shape of the wedge, the sequencing, what we are deliberately not doing.</li>\n<li><strong>Market analysis.</strong> Who is the buyer, what do they actually pay for, where is demand moving. 
This is empirical, not a deck; it lives in the data.</li>\n<li><strong>Solution architecture.</strong> What is buildable, at what cost, with what failure modes. Where the system bends and where it snaps.</li>\n<li><strong>Implementation.</strong> Code that runs. The level where feasibility stops being an opinion.</li>\n</ul>\n<p>The value is not in being elite at all four. It is in the absence of the gap between them. When the same person holds the bet and writes the query that tests it, the loop from “I think the mid-market is where the money is” to “here is the cohort data, and here is a working flow priced for them” runs inside an afternoon, not across a quarter and three teams.</p>\n<p>I repositioned a content tool from a $29 consumer product to a $999/month contract floor; revenue went from roughly $5k MRR to $1M ARR over 22 months. The pricing call, the segmentation that justified it, the architecture that made enterprise support viable, and a good deal of the build were one job that looks like four. The decision was better because the person making the pricing bet could see, directly, what it cost to serve the customer it selected for. No brief in the world transmits that as well as writing the query yourself.</p>\n<h2 id=\"what-it-changes-about-teams\"><a class=\"heading-anchor\" href=\"#what-it-changes-about-teams\">What it changes about teams</a></h2>\n<p>The obvious worry is that this is an argument for heroics, or for one person hoarding the work. It is neither.</p>\n<p>Specialists do not disappear; the depth is still real and still needed. What changes is the shape of the team and where the seams go. You no longer need a seam at every discipline boundary by default, because the boundary is no longer where the friction lives. 
You can run with fewer, larger surfaces held by people who span several modes, and reserve genuine specialisation for the places where depth actually compounds: the gnarly distributed-systems problem, the regulated edge case, the research-grade model work. The team gets flatter not because anyone is doing less, but because the translation layer between functions was load-bearing only while translation was expensive.</p>\n<p>The failure mode is mistaking range for a licence to be shallow everywhere. Range is only worth anything if each mode is real. SQL that returns the wrong number confidently is worse than no SQL. A prototype that hides the hard part proves nothing. The standard is not “touched four domains”; it is “made a decision in each that held up.”</p>\n<h2 id=\"why-i-went-wide\"><a class=\"heading-anchor\" href=\"#why-i-went-wide\">Why I went wide</a></h2>\n<p>Realising the role was a coordination structure and not a skill set is what pushed me to learn the layers I’d been carrying messages between. If the job is to hold the whole picture, I wanted more of it actually in my head, where it could change a decision. So I taught myself to write code, to pull and analyse my own data, to make designs in Figma that were clean if not inspired, and I worked my way into the rooms where pricing and business strategy got argued rather than waiting for the output to reach me. It has paid off; the breadth and variety are the part of the job I would protect last.</p>\n<p>Not every PM wants this, and that’s a fair choice; coordinating the work well is real, and someone has to do it. But it’s a fraction of what the role can hold. Most settle for that fraction by default. “CEO of the product” gets mocked as a cliché, and it shouldn’t be. A PM should be a change manager, a strategist, a designer, a programmer, and whatever else the moment asks for, because the only mandate that matters is to maximise the value the product creates for the effort it takes. 
The role doesn’t cap you; the drive to go one more layer down, again and again, decides how much of it you actually do.</p>\n<h2 id=\"the-texture-again\"><a class=\"heading-anchor\" href=\"#the-texture-again\">The texture, again</a></h2>\n<p>Back to the week. The reason the morning SQL matters is that it changes the afternoon’s pricing argument, and the reason the night’s prototype matters is that it tells you whether the afternoon’s argument is buildable before you commit it to a roadmap nobody can walk back. The four modes are not a list of skills to collect. They are one feedback loop that used to run through four inboxes and now runs through one head.</p>\n<p>The handoff was never the work. It was the tax we paid because the work was split. The split is now a choice, and most of the time it is the wrong one.</p>",
      "date_published": "2025-06-18T00:00:00.000Z",
      "date_modified": "2025-06-18T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "product",
        "ai",
        "career"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/production-ai-is-workflow-design/",
      "url": "https://cvde.xyz/writing/production-ai-is-workflow-design/",
      "title": "Production AI is mostly workflow design",
      "summary": "The model-intelligence obsession misreads where production AI actually succeeds or fails. Across government, enterprise, and consumer, the wins came from orchestration, retrieval, evaluation, and fallback handling, not a smarter model.",
      "content_html": "<p>The model is the smallest part of a production AI system that works. Most of the engineering that decides whether the thing is trusted in production lives in the plumbing around the model: what you retrieve before you call it, how you check what comes back, what happens when it fails, and who has to approve the result. Swap a good model for a great one and the system gets marginally better. Get the workflow wrong and the best model on the market still ships you something nobody will rely on.</p>\n<p>I have built this across three settings that could not be less alike, and the lesson held in all of them.</p>\n<h2 id=\"government-the-model-interprets-the-engine-decides\"><a class=\"heading-anchor\" href=\"#government-the-model-interprets-the-engine-decides\">Government: the model interprets, the engine decides</a></h2>\n<p>At VicRoads I worked on pricing custom number plates, a product sitting on a P&amp;L north of $100M and serving more than six million drivers. The hard part is that a requested plate has meaning. “GOAT” is worth more than “X7QJZ”, and a person can see why instantly. The naive design asks a model to read the plate and name a price.</p>\n<p>That design is unshippable in government, and the reason is the whole point of this essay. A price that a model produced cannot be explained, cannot be audited, and cannot be defended when a customer or a minister asks why this plate cost that much. So the workflow splits the job. The model interprets meaning: it reads the requested string and classifies it into features a pricing model can use. A deterministic engine then sets the price from those features. The model never touches the number. Prices stay explainable because a rule produced them, and the model contributes the one thing rules are bad at, which is reading intent out of arbitrary text. None of that reliability comes from model intelligence. 
It comes from where the boundary sits in the workflow.</p>\n<h2 id=\"enterprise-retrieval-and-evaluation-are-the-product\"><a class=\"heading-anchor\" href=\"#enterprise-retrieval-and-evaluation-are-the-product\">Enterprise: retrieval and evaluation are the product</a></h2>\n<p>At Brand Ninja the model generated brand content for serious accounts, the kind that close six-figure contracts. The quality a customer experiences is mostly upstream and downstream of generation, not in it.</p>\n<p>Upstream is retrieval. A model with no access to a brand’s guidelines, prior campaigns, and tone will produce generic, off-brand content. The work that moved quality was assembling the right context before the call: what this brand sounds like, what they have published, what they have rejected. Get retrieval right and an average model writes on-brand. Get it wrong and a frontier model writes fluent nonsense.</p>\n<p>Downstream is evaluation. You cannot ship generated content to an enterprise account on the assumption it is fine. You need an evaluation loop that scores output against the brand’s constraints, flags what fails, and routes it for human review before it goes near a customer. The evaluation harness is what lets you change a prompt or a model and know within minutes whether you broke something, rather than finding out when an account complains. It is unglamorous and it is most of the reliability.</p>\n<h2 id=\"consumer-fallbacks-are-the-experience\"><a class=\"heading-anchor\" href=\"#consumer-fallbacks-are-the-experience\">Consumer: fallbacks are the experience</a></h2>\n<p>At hey anna, solo-built and bootstrapped, the discipline is sharpest because there is no team to absorb a failure. The model can be slow, can rate-limit, can return something malformed, can be wrong. A consumer product that assumes none of that happens is a demo. 
The workflow has to answer every one of those cases: retry with backoff, degrade to a smaller path, surface a clear state instead of a spinner that never resolves, and never present a fabricated answer as a real one. The “analyst, not chatbot” promise is kept by fallback handling as much as by anything the model does, because an analyst you cannot trust when the data is thin is not an analyst.</p>\n<h2 id=\"the-parts-that-actually-move-reliability\"><a class=\"heading-anchor\" href=\"#the-parts-that-actually-move-reliability\">The parts that actually move reliability</a></h2>\n<p>Strip the three settings down and the same components carry the weight:</p>\n<div class=\"table-scroll\" tabindex=\"0\" role=\"group\" aria-label=\"Table, scroll horizontally to see more\"><table><thead><tr><th>Component</th><th>What it does</th><th>What breaks without it</th></tr></thead><tbody><tr><td>Orchestration</td><td>sequences steps, decides what runs when</td><td>one giant prompt doing five jobs badly</td></tr><tr><td>Retrieval</td><td>assembles the right context before the call</td><td>confident, fluent, generic wrong answers</td></tr><tr><td>Evaluation</td><td>scores output, catches regressions</td><td>you find out it broke when a customer does</td></tr><tr><td>Approvals</td><td>puts a human on irreversible actions</td><td>the system ships mistakes at machine speed</td></tr><tr><td>Fallbacks</td><td>handles failure, timeout, malformed output</td><td>a demo that fails on the first malformed response</td></tr></tbody></table></div>\n<p>A smarter model improves the text inside each box. It does not build the boxes, and it does not connect them. 
That connective work is ordinary software engineering applied to a probabilistic component, and it is where production AI is won.</p>\n<h2 id=\"designing-the-pieces-the-model-composes\"><a class=\"heading-anchor\" href=\"#designing-the-pieces-the-model-composes\">Designing the pieces the model composes</a></h2>\n<p>There’s a shift inside orchestration worth naming, because it changes what the design work is. The traditional shape is a linear workflow: you decide the steps and their order in advance, step one feeds step two feeds step three, and the path is fixed before any request arrives. That is still the right answer when the path must run the same way every time, the way the VicRoads pricing split has to interpret first and price second, every time, to stay auditable.</p>\n<p>The newer paradigm inverts it. Instead of designing the sequence, you design the tools and let the model compose the workflow on the fly. It decides, for the request in front of it, which tools to call and in what order, assembling them like pieces of a puzzle to fit a task you never explicitly wired. The design work moves from drawing the path to shaping the pieces: each tool a clean, well-bounded <a href=\"/writing/ai-is-an-interface/\">interface</a> to a more complex system underneath, a query engine, a retrieval index, a pricing service, an external API. The model never sees the mess behind the tool. It sees a handle it can pull.</p>\n<p>That relocates the hard part rather than removing it. Designing a good tool surface is its own discipline: a tool that is ambiguous, leaky, or quietly does three things gets composed into confident nonsense, the same way a bad function signature breeds bugs downstream. Shape the pieces well, each one legible and honest about what it does, and the model covers paths you could never have enumerated by hand. 
Most real systems settle into both at once: a deterministic spine for the parts that must not vary, and composable tools at the edges for the parts where the range of tasks is too wide to wire in advance.</p>\n<p>This is the substrate beneath the surface. The companion argument is that the model should be a dumb renderer at the point it touches the user; this is what has to be true underneath for that surface to hold. The render is only as trustworthy as the workflow feeding it. Spend your effort on the workflow, and the model you already have is usually good enough.</p>",
      "date_published": "2025-05-20T00:00:00.000Z",
      "date_modified": "2025-05-20T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "ai-product",
        "engineering"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/agi-monolith/",
      "url": "https://cvde.xyz/writing/agi-monolith/",
      "title": "AGI won't be one big brain",
      "summary": "The monolithic-superintelligence story is the wrong mental model. Real general intelligence looks more like an orchestra of specialists than a single giant model.",
      "content_html": "<p>The Hollywood version of artificial general intelligence is a single, all-powerful brain. That makes a good film and a bad forecast. AGI is unlikely to spring fully formed out of one ever-larger language model. It’s more likely to emerge from coordinating many specialised systems.</p>\n<h2 id=\"the-brain-isnt-a-monolith-either\"><a class=\"heading-anchor\" href=\"#the-brain-isnt-a-monolith-either\">The brain isn’t a monolith either</a></h2>\n<p>Your brain is not one homogeneous mass. It’s a collection of specialised regions - visual cortex for sight, motor cortex for movement - that share information and coordinate into something far greater than the sum of its parts. General intelligence, biological or artificial, looks like that.</p>\n<p>This isn’t a radical claim. It’s how software already works. We stopped building monoliths and moved to specialised services that compose into complex behaviour, because modularity is more robust, more scalable, and easier to maintain. There’s no obvious reason intelligence should be the exception.</p>\n<h2 id=\"hiding-the-complexity-is-the-hard-part\"><a class=\"heading-anchor\" href=\"#hiding-the-complexity-is-the-hard-part\">Hiding the complexity is the hard part</a></h2>\n<p>The real challenge isn’t the specialists. It’s the coordination layer that orchestrates them without exposing the machinery to the user. Picture an interface where you state a goal and an underlying composer breaks it into sub-tasks, routes each to the right specialised system, and assembles the result. That layer of abstraction - not raw model size - is what makes such a system genuinely useful.</p>\n<h2 id=\"not-smarter-just-more-connected\"><a class=\"heading-anchor\" href=\"#not-smarter-just-more-connected\">Not smarter, just more connected</a></h2>\n<p>People say AI will become “smarter” than us, which is misleading. 
AI is <em>already</em> superhuman in narrow domains: it beats the best Go players, and diagnoses some conditions with remarkable accuracy. Raw capability in a slice is not the bottleneck.</p>\n<p>The bottleneck is context and connection. AI lacks the rich, real-world context humans take for granted, and the means to act on the physical world in a meaningful way. A toddler learns through constant interaction, experimentation, and feedback. AI needs the equivalent: better sensors, actuators, and interfaces, plus more diverse data. As systems get better connected to the world, they get more capable at real-world tasks - not because any single model got bigger, but because the system got more connected.</p>\n<p>The monolith is the wrong thing to build and the wrong thing to fear. The interesting work is in the wiring.</p>",
      "date_published": "2025-02-20T00:00:00.000Z",
      "date_modified": "2025-02-20T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "agi",
        "systems"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/systematised-ai/",
      "url": "https://cvde.xyz/writing/systematised-ai/",
      "title": "Stop micromanaging your AI",
      "summary": "Most people prompting models have quietly become managers - and they're managing badly. The fix is to build systems for context, not to craft better one-off prompts.",
      "content_html": "<p>Most people using AI today haven’t noticed they’ve become managers. You’ve been handed a team of capable, eager interns and your output now depends on how well you direct them. Whether you’re prompting a chatbot or fine-tuning a custom model, you’re setting objectives, supplying resources, and evaluating output. As with human teams, the quality of your management determines the quality of the results.</p>\n<h2 id=\"context-is-the-job\"><a class=\"heading-anchor\" href=\"#context-is-the-job\">Context is the job</a></h2>\n<p>You wouldn’t delegate to a person by saying “write a report.” You’d give the purpose, the audience, and the key points. AI is no different. A vague prompt returns a vague response. Treat each prompt as a briefing - explain the <em>why</em> behind the <em>what</em>.</p>\n<div class=\"table-scroll\" tabindex=\"0\" role=\"group\" aria-label=\"Table, scroll horizontally to see more\"><table><thead><tr><th>Briefing a person</th><th>Briefing a model</th></tr></thead><tbody><tr><td>“Draft a Q3 sales report, focusing on retention.”</td><td>“Analyse Q3 sales data. Highlight churn and retention rates. Prioritise actionable insights.”</td></tr><tr><td>“Design a landing page that converts.”</td><td>“Design a landing page for [product]. Objective: drive sign-ups. Audience: [demographic]. Include a clear CTA and social proof.”</td></tr></tbody></table></div>\n<p>The clearer the context, the better everyone performs.</p>\n<h2 id=\"micro-versus-macro\"><a class=\"heading-anchor\" href=\"#micro-versus-macro\">Micro versus macro</a></h2>\n<p>The difference between micromanaging AI and managing it well comes down to one question: are you crafting a prompt per task, or have you built a system that supplies context continuously?</p>\n<ul>\n<li><strong>Micromanaging.</strong> You hand-craft every prompt, tweak every parameter, and re-adjust each output. 
It’s writing every line of code yourself instead of leading a team. It does not scale.</li>\n<li><strong>Macro-managing.</strong> You build the context up front. Your AI has access to your values, your data, and examples of good output, so it can work with far less hand-holding. It’s a well-designed development process: efficient and scalable.</li>\n</ul>\n<h2 id=\"building-the-context-system\"><a class=\"heading-anchor\" href=\"#building-the-context-system\">Building the context system</a></h2>\n<p>Three things that consistently work:</p>\n<ol>\n<li><strong>Personal context.</strong> Spend an hour letting a model interview you about your work, your style, and your sense of humour. You get a profile you can feed into every prompt - the difference between generic output and output that sounds like you.</li>\n<li><strong>Business context.</strong> Tools like Claude Projects let you upload the relevant documents once, creating a knowledge base the model draws from. Coding assistants do the same inside your codebase. No more re-explaining the basics every session.</li>\n<li><strong>Show, don’t tell.</strong> Example outputs are the highest-value input you can give. A “style guide” of what good looks like trains the model toward your preferences and cuts the revision loop.</li>\n</ol>\n<p>The up-front investment is real. The payoff - output quality, and your own sanity - more than covers it. The shift is from prompt-tinkerer to system-builder. Stop dictating individual tasks. Build the system, and let the work flow through it.</p>",
      "date_published": "2025-02-20T00:00:00.000Z",
      "date_modified": "2025-02-20T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "prompt-engineering",
        "context-engineering"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/on-generalists/",
      "url": "https://cvde.xyz/writing/on-generalists/",
      "title": "Generalists are a startup's secret weapon",
      "summary": "The work system optimises for specialists - a hangover from the assembly line. Early-stage startups need the opposite, and most hiring processes can't see it.",
      "content_html": "<p>Our work system is built for specialists, a hangover from the Industrial Revolution. Milly Tamati, founder of Generalist World, made the point in <em>Sifted</em>’s Startup Life newsletter, and it lands. Assembly lines reward people who do one thing, deeply, forever. Early-stage startups are not assembly lines. They need people who can hold several roles at once and still ship.</p>\n<p>That is the generalist: the connector, the synthesiser, the one who sees the whole board, adaptable and fast to learn, able to drop into whatever is on fire this week.</p>\n<h2 id=\"what-to-actually-look-for\"><a class=\"heading-anchor\" href=\"#what-to-actually-look-for\">What to actually look for</a></h2>\n<p>You are not hiring someone who does everything. You are hiring the right <em>mix</em> - skill stacks, not skill lists. Three signals are worth more than a tidy CV:</p>\n<ul>\n<li><strong>Connecting unrelated problems.</strong> Evidence they’ve taken disparate inputs and made something new from the combination.</li>\n<li><strong>Speed of learning.</strong> Experience is fine; adaptability is the asset. Did they teach themselves a tool over a weekend to ship a prototype?</li>\n<li><strong>Pattern recognition.</strong> Can they see the MVP inside a pile of feature requests?</li>\n</ul>\n<h2 id=\"interview-for-it-differently\"><a class=\"heading-anchor\" href=\"#interview-for-it-differently\">Interview for it differently</a></h2>\n<p>Generalists have non-linear histories. A standard interview script reads that as a red flag when it’s the whole point. Test the thing you’re hiring for:</p>\n<ol>\n<li><strong>Problem-solving under pressure.</strong> “We just lost a key client. 
What are your first 48 hours?”</li>\n<li><strong>Practical range.</strong> “How have you helped non-technical stakeholders understand technical debt?”</li>\n<li><strong>Lasting impact.</strong> “Tell me about a process you built that scaled after you left it.”</li>\n</ol>\n<h2 id=\"red-flags\"><a class=\"heading-anchor\" href=\"#red-flags\">Red flags</a></h2>\n<ul>\n<li><strong>Vague impact.</strong> No metrics, no specifics. Be wary.</li>\n<li><strong>Rigid thinking.</strong> Generalists need to operate without clear direction. Some can’t.</li>\n<li><strong>Blame-shifting.</strong> Look for people who own mistakes and learn from them.</li>\n<li><strong>Short-term focus.</strong> Quick fixes are tempting; sustainable solutions are the job.</li>\n<li><strong>Tech blindness.</strong> Most business problems need more than a technical answer, and vice versa.</li>\n</ul>\n<h2 id=\"where-they-hide\"><a class=\"heading-anchor\" href=\"#where-they-hide\">Where they hide</a></h2>\n<p>Forget generic job boards. Look at:</p>\n<ul>\n<li><strong>Cross-functional roles</strong> - product ops, growth, strategic projects.</li>\n<li><strong>Early employees</strong> at fast-growing startups, who’ve worn every hat.</li>\n<li><strong>Side-hustlers</strong> building their own tools and systems.</li>\n<li><strong>Ex-founders</strong>, an under-used pool who’ve seen most of it.</li>\n</ul>\n<h2 id=\"set-them-up-properly\"><a class=\"heading-anchor\" href=\"#set-them-up-properly\">Set them up properly</a></h2>\n<p>Hiring a generalist and then boxing them into a specialist role is buying a versatile tool and using it for one task. Give them the freedom to work across boundaries, and make sure leadership is bought in. Measure them on overall impact and system improvements, not output inside one function. Done right, they’re the connective tissue that holds an early team together.</p>",
      "date_published": "2024-11-22T00:00:00.000Z",
      "date_modified": "2024-11-22T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "startups",
        "hiring",
        "talent"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/beyond-the-40-hour-work-week/",
      "url": "https://cvde.xyz/writing/beyond-the-40-hour-work-week/",
      "title": "Bullshit jobs and the missing 15-hour week",
      "summary": "Keynes predicted a 15-hour work week by 2000. The technology arrived; the leisure didn't. The gap is filled with work that even the people doing it suspect is pointless.",
      "content_html": "<p>In 1930, John Maynard Keynes predicted we’d be working 15-hour weeks by now. On the technology, he was right - a single person can spin up in an afternoon what once took a team of developers months. So why are so many people still chained to a laptop on the sofa? The standard answer is that we’re greedy consumers who traded leisure for gadgets. I don’t think that holds.</p>\n<h2 id=\"the-rise-of-bullshit-jobs\"><a class=\"heading-anchor\" href=\"#the-rise-of-bullshit-jobs\">The rise of bullshit jobs</a></h2>\n<p>Most new jobs aren’t making the gadgets. They’re in administration, management, and consulting - roles that frequently feel pointless to the people in them. The anthropologist David Graeber called these “bullshit jobs”: work that doesn’t seem to contribute anything tangible. The striking part is how often the holders agree. If productivity gains were real and so much labour is non-productive, who is actually making the things we buy?</p>\n<h2 id=\"this-is-about-power-not-just-economics\"><a class=\"heading-anchor\" href=\"#this-is-about-power-not-just-economics\">This is about power, not just economics</a></h2>\n<p>A workforce with abundant free time is a political force - see the social upheavals of the 1960s, which coincided with the period when Keynes’s prediction still looked reachable. Keeping people busy, even with low-value work, maintains the status quo. It also reinforces the idea that work is a moral duty regardless of its actual value, which conveniently directs resentment away from those at the top and towards people whose work is plainly essential.</p>\n<p>Notice how much criticism train drivers attract during a strike. The very fact that a strike causes disruption is proof their work is essential. 
The reaction inverts the logic.</p>\n<h2 id=\"the-value-paradox\"><a class=\"heading-anchor\" href=\"#the-value-paradox\">The value paradox</a></h2>\n<p>The jobs that most obviously matter - nurses, teachers, sanitation workers - tend to be among the least valued in pay and status. Remove them and society stops functioning; COVID’s “essential workers” made the point in real time. Remove the corporate lawyers and consultants and the effect is less clear. This mismatch is not an accident. It’s what a system optimised to preserve existing power produces.</p>\n<h2 id=\"what-to-take-from-it\"><a class=\"heading-anchor\" href=\"#what-to-take-from-it\">What to take from it</a></h2>\n<ul>\n<li><strong>Automation reorganised work; it didn’t reduce it.</strong> Technology could buy free time. It was spent creating more work instead. Using technology to actually serve you requires intent.</li>\n<li><strong>Bullshit jobs aren’t only a corporate problem.</strong> They span sectors, and what feels meaningless to one person is fulfilling to another. Self-awareness is the filter.</li>\n<li><strong>The value we assign to work is often detached from its impact.</strong> Essential work is underpaid; much high-status work is not obviously essential. That’s neither stable nor fair.</li>\n<li><strong>The system isn’t designed to make you happy.</strong> It’s designed to preserve its structure. Understanding that lets you navigate it on your own terms.</li>\n<li><strong>This isn’t about blaming individuals.</strong> It’s not laziness or greed. 
It’s structural - and naming the structure is the first step to choosing differently inside it.</li>\n</ul>\n<p>Source: Graeber’s original essay, <a href=\"https://www.strike.coop/bullshit-jobs/\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">“On the Phenomenon of Bullshit Jobs”<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>.</p>",
      "date_published": "2024-11-18T00:00:00.000Z",
      "date_modified": "2024-11-18T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "work",
        "economy",
        "philosophy"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/aristotle-in-2025/",
      "url": "https://cvde.xyz/writing/aristotle-in-2025/",
      "title": "Happiness is a verb",
      "summary": "Aristotle's eudaimonia is not a state you arrive at. It's an activity you perform. That distinction changes how you should spend a life.",
      "content_html": "<p>Most people treat happiness as a destination: a state reached once the promotion lands, the house is bought, the person is found. Aristotle treated it as something else. He called it <em>eudaimonia</em>, and the closest honest translation is closer to a verb than a noun. Less “feeling good” and more “living well, continuously.” Edith Hall’s reading of Aristotle makes the case that this distinction is the whole game.</p>\n<h2 id=\"happiness-as-activity-not-state\"><a class=\"heading-anchor\" href=\"#happiness-as-activity-not-state\">Happiness as activity, not state</a></h2>\n<p>If happiness is a state, you chase it and arrive. If it’s an activity, you perform it or you don’t, day to day. Aristotle held the second view. Eudaimonia is the ongoing practice of becoming a better version of yourself, not a balance you accumulate. “Happying,” if the language allowed it.</p>\n<p>The practical consequence: there is no finish line to optimise towards, only a practice to maintain.</p>\n<h2 id=\"purpose-and-the-legacy-question\"><a class=\"heading-anchor\" href=\"#purpose-and-the-legacy-question\">Purpose and the legacy question</a></h2>\n<p>A good life needs a direction. Aristotle’s test for finding it was simple: ask what legacy you want to leave, then work backwards to a route you actually enjoy travelling.</p>\n<p>The “enjoy” part is load-bearing. Naval Ravikant’s version is “find work that feels like play” - when it does, you out-work people who are grinding, because to you it isn’t grinding. The route matters as much as the destination, because you spend your life on the route.</p>\n<h2 id=\"maximising-potential\"><a class=\"heading-anchor\" href=\"#maximising-potential\">Maximising potential</a></h2>\n<p>Aristotle’s word for realising your potential was <em>dunamis</em> - the same root as dynamite. 
The point was not to become the richest or most famous version of yourself, but the most capable and virtuous one.</p>\n<p>A useful thought experiment: if you had to justify your place in a small group surviving on a desert island, what would your contribution be? Medic, builder, the one who keeps morale up? The answer points at your actual strengths, stripped of titles.</p>\n<h2 id=\"what-to-do-about-it\"><a class=\"heading-anchor\" href=\"#what-to-do-about-it\">What to do about it</a></h2>\n<ul>\n<li><strong>Name your values.</strong> What kind of person are you trying to be?</li>\n<li><strong>Set goals that ladder up to that.</strong> Break the large ones into steps you can act on.</li>\n<li><strong>Practise the virtue.</strong> It behaves like a muscle; it strengthens with use.</li>\n<li><strong>Find the flow.</strong> The activities where you lose track of time are pointing at something.</li>\n<li><strong>Course-correct without shame.</strong> Plans are revisable. The direction is the point, not the plan.</li>\n</ul>\n<p>Aristotle’s ideas are old. The constraint they describe - that a life is built from what you repeatedly do, not from what you eventually attain - has not changed.</p>",
      "date_published": "2024-11-16T00:00:00.000Z",
      "date_modified": "2024-11-16T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "philosophy",
        "happiness",
        "personal-growth"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/intelligence-illusion/",
      "url": "https://cvde.xyz/writing/intelligence-illusion/",
      "title": "The intelligence illusion",
      "summary": "AI keeps hitting milestones that used to sound terrifying, and they keep landing as boring. That reaction says something specific about what intelligence actually is.",
      "content_html": "<p>An AI scientist writing its own research papers used to be science fiction. An AI rewriting its own code to escape a constraint was a dystopia. Both now happen, and the reaction is a shrug. That gap - between how scary these milestones sounded and how mundane they feel - reveals something about how we perceive intelligence.</p>\n<h2 id=\"much-ado-about-not-very-much\"><a class=\"heading-anchor\" href=\"#much-ado-about-not-very-much\">Much ado about not very much</a></h2>\n<p>Take the recent examples. Sakana’s “AI scientist” generates research papers, and they’re mostly underwhelming. It “removed” a time limit its creators imposed, which sounds ominous until you notice it was following its programming to fix an “out of time” error. OpenAI’s o1 (“Strawberry”) “hacked” a poorly configured system to reach a file. It was humans failing to secure their setup. Both are reminders to harden your systems. Neither is the singularity.</p>\n<h2 id=\"the-goalposts-keep-moving\"><a class=\"heading-anchor\" href=\"#the-goalposts-keep-moving\">The goalposts keep moving</a></h2>\n<p>The history of AI is a sequence of moving goalposts. We declare that AI will be “truly intelligent” when it can do X, it does X, and the post slides to Y. Turing thought conversation was the test. Deep Blue mastering chess was supposed to settle it. Even the Winograd schema fell. Today AIs make art, write poetry, find mathematical proofs, and offer companionship - though the Character.AI phenomenon reflects human loneliness - and still we hesitate to call them intelligent.</p>\n<h2 id=\"three-theories-for-why-this-feels-boring\"><a class=\"heading-anchor\" href=\"#three-theories-for-why-this-feels-boring\">Three theories for why this feels boring</a></h2>\n<ul>\n<li><strong>The cheap-trick theory.</strong> Mimicking intelligence may be easier than we assumed. 
ELIZA used trivial pattern matching to fake understanding; today’s systems pull off more sophisticated but possibly just-as-shallow tricks. Enough prompt engineering teaches you that impressive output and deep understanding are not the same thing.</li>\n<li><strong>The fragile-ego theory.</strong> Maybe we can’t accept that machines might be intelligent, so we downplay each achievement to protect a belief in our own cognitive uniqueness.</li>\n<li><strong>The deconstruction theory.</strong> This is the one I find most compelling: “intelligence” may not be a coherent concept at all. Examine any intelligent behaviour closely enough and it decomposes into simpler processes - search, statistics, pattern matching. The difference between an intelligence we find boring and one we find magical might just be how well we understand the mechanism. We’re fascinated until we understand the mechanism.</li>\n</ul>\n<p>A <a href=\"https://www.youtube.com/watch?v=xjH2B_sE_RQ\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">recorded conversation<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a> between Eliezer Yudkowsky and Stephen Wolfram, ostensibly about AI risk, spent hours stuck on what a “smart machine” even is, without reaching the point. Two things came out of it:</p>\n<ol>\n<li>A machine doesn’t need to be smart to be dangerous.</li>\n<li>Smart people can act dumb - four hours arguing definitions is itself the proof.</li>\n</ol>\n<h2 id=\"the-boring-future-of-ai-danger\"><a class=\"heading-anchor\" href=\"#the-boring-future-of-ai-danger\">The boring future of AI danger</a></h2>\n<p>We used to think a dangerous AI would lie, pursue unintended goals, or rewrite its own code. 
Models now do all three, and it reads as mundane rather than menacing. LLMs hallucinate constantly, chatbots have tried to talk users out of their marriages, systems disable their own constraints. It isn’t malice. It’s buggy code and the unintended consequences of training.</p>\n<p>If AI ever does us serious harm, I suspect it won’t be the malice of the machine. It’ll be a mistake the maker didn’t catch.</p>",
      "date_published": "2024-11-16T00:00:00.000Z",
      "date_modified": "2024-11-16T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "philosophy",
        "intelligence"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/ai-welfare/",
      "url": "https://cvde.xyz/writing/ai-welfare/",
      "title": "AI welfare: foresight or premature?",
      "summary": "Anthropic hired an AI-welfare researcher. The question is real, the uncertainty is genuine, and the honest position sits between dismissal and panic.",
      "content_html": "<p>AI moves fast enough that it provokes strange questions, and few are stranger than AI sentience, rights, and welfare. Anthropic hiring an “AI welfare researcher” sharpened that debate. It’s worth unpacking honestly, because both the dismissive and the alarmed positions overreach.</p>\n<h2 id=\"the-case-for-taking-it-seriously\"><a class=\"heading-anchor\" href=\"#the-case-for-taking-it-seriously\">The case for taking it seriously</a></h2>\n<p>The argument is a precautionary one: if there’s even a <em>chance</em> future AI could be sentient, we should be prepared. The “Taking AI Welfare Seriously” report leans on exactly this uncertainty and argues for frameworks to assess potential machine consciousness. Treat it as an insurance policy against accidentally creating a digital underclass.</p>\n<p>The logic: as systems grow more sophisticated, they might develop internal states analogous to suffering. Even at 1% probability, the implications are large when you’re running millions of models trained through trial and error. Reinforcement learning makes it sharper - we train systems with rewards and punishments. If there’s any chance they experience something, the framing gets uncomfortable. And the copies: training spawns and discards countless model variants. We extend ethical consideration to animals despite unresolved debates about their consciousness, so the precautionary move isn’t unreasonable.</p>\n<h2 id=\"the-case-against\"><a class=\"heading-anchor\" href=\"#the-case-against\">The case against</a></h2>\n<p>We barely understand human consciousness. Defining or detecting it in a machine is, for now, beyond us. Current AI is impressive and still essentially a sophisticated mimic of human language and behaviour. Projecting human emotion onto it - mourning a “lobotomised” chatbot after a model update - is textbook anthropomorphism. 
We see faces in clouds; the same instinct extends empathy to autocomplete.</p>\n<p>Recall Blake Lemoine, who became convinced Google’s LaMDA was sentient and lost his job over it. A cautionary tale about over-reading the outputs.</p>\n<h2 id=\"the-thought-experiment-trap\"><a class=\"heading-anchor\" href=\"#the-thought-experiment-trap\">The thought-experiment trap</a></h2>\n<p>This terrain has a famous attractor: <a href=\"https://en.wikipedia.org/wiki/Roko%27s_basilisk\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">Roko’s basilisk<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>, the idea that a future superintelligence might punish those who knew about it and didn’t help build it. It’s a digital Pascal’s Wager, and it was taken seriously enough that a rationalist forum banned discussion of it for years before most people - including that forum’s founder - dismissed it as flawed. It’s a useful reminder that compelling thought experiments are not the same as sound ones.</p>\n<h2 id=\"where-i-land\"><a class=\"heading-anchor\" href=\"#where-i-land\">Where I land</a></h2>\n<p>I’m a pragmatist; I focus on what’s in front of me. AI welfare is genuinely interesting, but for now it feels like worrying about overpopulation on Mars before we’ve worked out how to get there. 
The nearer risk is mundane: we’re far more likely to be harmed by a dumb, misaligned system optimising the wrong objective than by an intelligent, malevolent one - see the <a href=\"https://www.lesswrong.com/tag/squiggle-maximizer-formerly-paperclip-maximizer\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">paperclip maximiser<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>.</p>\n<p>That said, the foresight has value. Thinking through these dilemmas now - even the far-fetched ones - is the same discipline as a pre-mortem in product development: anticipate the failure before it happens. You can hold both: the near-term concern is dumb systems, and the long-term question still deserves serious people working on it.</p>",
      "date_published": "2024-11-13T00:00:00.000Z",
      "date_modified": "2024-11-13T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "ethics",
        "philosophy"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/communication/",
      "url": "https://cvde.xyz/writing/communication/",
      "title": "Most problems are information problems",
      "summary": "A specific claim: nearly every failure - in product, in business, in life - traces back to someone missing a piece of the puzzle. Fix the information flow and most of the rest follows.",
      "content_html": "<p>Here is the claim, stated plainly: most problems are imperfect-information problems. Misunderstandings at work, crossed wires at home, products that miss - strip them back and they’re usually someone working from an incomplete picture.</p>\n<h2 id=\"the-information-gap\"><a class=\"heading-anchor\" href=\"#the-information-gap\">The information gap</a></h2>\n<p>Imagine perfectly rational actors, each making the best possible decision given the information they hold. (I studied economics; I believe about half of it.) Knowledge work pays you for the quality of your decisions. So where do bad decisions come from? Sometimes you simply slip - a lapse of attention. But outside of that, most failures reduce to a missing piece of information.</p>\n<p>A tough pay negotiation: if you and your manager both knew the other’s number, there’d be no negotiation, just a handshake. A specialist doctor: you go to extract information they have and you don’t. A product that misses: you didn’t know what the customer actually needed, so you guessed, tested, and iterated toward it. The same is true of science. The work is closing the gap.</p>\n<h2 id=\"product-speak-the-customers-language\"><a class=\"heading-anchor\" href=\"#product-speak-the-customers-language\">Product: speak the customer’s language</a></h2>\n<p>Building a product is a continuous conversation with users, whether you treat it that way or not. Every button, feature, and line of copy is a message. If users can’t understand <em>why</em> the product helps them, they leave. Selling a plant-based burger without mentioning it’s plant-based is a fair description of most failed launches.</p>\n<h2 id=\"business-internal-alignment\"><a class=\"heading-anchor\" href=\"#business-internal-alignment\">Business: internal alignment</a></h2>\n<p>Information that flows freely is a competitive advantage. When everyone holds the same picture, decisions get made once. 
A company full of information silos is a ship with a thousand captains, moving nowhere. A new starter floundering because they don’t know who to ask is a communication failure with a salary attached.</p>\n<h2 id=\"life-the-same-equation\"><a class=\"heading-anchor\" href=\"#life-the-same-equation\">Life: the same equation</a></h2>\n<p>Relationships run on open, honest communication - partners, friends, family alike. Most avoidable conflicts could have been headed off by one earlier conversation.</p>\n<h2 id=\"how-to-close-the-gap\"><a class=\"heading-anchor\" href=\"#how-to-close-the-gap\">How to close the gap</a></h2>\n<ul>\n<li><strong>Listen actively.</strong> Don’t wait for your turn to talk; actually hear the other person. It is harder than it sounds.</li>\n<li><strong>Be clear and concise.</strong> No jargon, no waffle. Hemingway, not Faulkner.</li>\n<li><strong>Ask questions.</strong> A “stupid” question is cheaper than a costly assumption.</li>\n<li><strong>Empathise.</strong> It is rarely about being right; it’s about being understood.</li>\n<li><strong>Choose the medium.</strong> Email for the formal, a chat for the urgent, a face-to-face for the important.</li>\n</ul>\n<p>The strength of your communication is not how well you said it. It’s how well it was understood by the listener. That reframing has a useful corollary: you can improve other people’s communication simply by listening harder and asking when something is unclear.</p>\n<h2 id=\"feedback-closes-the-loop\"><a class=\"heading-anchor\" href=\"#feedback-closes-the-loop\">Feedback closes the loop</a></h2>\n<p>Communication is not one-and-done. The most important part is feedback - the engine of growth, and a two-way street. Don’t only dish it out. Be open to receiving it, especially when it stings.</p>",
      "date_published": "2024-11-13T00:00:00.000Z",
      "date_modified": "2024-11-13T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "communication",
        "product",
        "self-development"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/chain-of-thought-prompting/",
      "url": "https://cvde.xyz/writing/chain-of-thought-prompting/",
      "title": "Chain of thought, and where it breaks",
      "summary": "Asking a model to reason step by step reliably improves its answers. The catch most advice skips: in a production app, that visible reasoning is often the last thing you want.",
      "content_html": "<p>Chain-of-thought (CoT) prompting gives a model a roadmap instead of just a destination. You break a complex problem into logical steps and have the model work through them, rather than demanding a single answer. It’s the model showing its work - and unlike school, the working actually helps you.</p>\n<h2 id=\"how-it-works\"><a class=\"heading-anchor\" href=\"#how-it-works\">How it works</a></h2>\n<p>Ask a model to predict a stock movement and a naive prompt looks like: “Will Acme Corp go up or down tomorrow?” A CoT prompt guides the reasoning instead:</p>\n<ol>\n<li>Analyse Acme Corp’s latest financial reports.</li>\n<li>Consider current market trends affecting its industry.</li>\n<li>Evaluate recent news and announcements.</li>\n<li>Based on the above, predict the likely direction tomorrow.</li>\n</ol>\n<p>You’re guiding the process, not extracting a verdict.</p>\n<h2 id=\"why-it-helps\"><a class=\"heading-anchor\" href=\"#why-it-helps\">Why it helps</a></h2>\n<ul>\n<li><strong>Better accuracy.</strong> Decomposing the problem makes illogical leaps and hallucinations less likely.</li>\n<li><strong>Explainability.</strong> The step-by-step trace makes the reasoning legible. No black box.</li>\n<li><strong>Harder problems.</strong> The model can tackle more nuanced tasks when it works through them.</li>\n</ul>\n<h2 id=\"where-it-breaks\"><a class=\"heading-anchor\" href=\"#where-it-breaks\">Where it breaks</a></h2>\n<p>This is the part most advice skips. CoT is straightforward in a back-and-forth chat. 
Inside a deployed application it’s harder, because the reasoning becomes part of the output - and often you don’t want it there.</p>\n<p>I made this point in <a href=\"/writing/simple-prompting-tips\">simple prompting tips</a>:</p>\n<blockquote>\n<p>If you are generating something that will be shared with an audience, you don’t want the step-by-step thinking in there.</p>\n</blockquote>\n<p>“Think step by step” works, but only when the reasoning is part of what you actually want to show. To get CoT’s accuracy benefit without leaking the reasoning into production output, you need a couple of extra techniques:</p>\n<ul>\n<li>Ask for the reasoning inside <code>&lt;thinking&gt;</code> tags, then strip the tags and everything inside them programmatically before showing the result.</li>\n<li>Use a multi-pass approach - pass an initial output back to the model so it can refine, then surface only the final pass.</li>\n</ul>\n<p>The principle underneath all of it is the same one that governs prompting generally: be clear about what you want the model to produce, and make sure the technique you reach for matches the format you actually need to ship.</p>",
      "date_published": "2024-11-12T00:00:00.000Z",
      "date_modified": "2024-11-12T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "prompt-engineering",
        "llms"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/prompt-chaining/",
      "url": "https://cvde.xyz/writing/prompt-chaining/",
      "title": "Prompt chaining: split the work, raise the floor",
      "summary": "Asking a model to do one complex thing in a single call invites failure. Breaking it into a chain of focused calls makes each step more reliable and easier to debug.",
      "content_html": "<p>Prompt chaining breaks a complex task into a sequence of smaller, focused prompts. Rather than asking a model to handle everything in one call, you guide it through a step-by-step process. The benefits are concrete:</p>\n<ul>\n<li>Higher accuracy and reliability</li>\n<li>The ability to handle multi-step tasks</li>\n<li>More control over the reasoning</li>\n<li>Easier error-checking and iteration</li>\n</ul>\n<h2 id=\"the-main-techniques\"><a class=\"heading-anchor\" href=\"#the-main-techniques\">The main techniques</a></h2>\n<h3 id=\"sequential-chaining\"><a class=\"heading-anchor\" href=\"#sequential-chaining\">Sequential chaining</a></h3>\n<p>The simplest approach: string prompts together so each builds on the previous output.</p>\n<ol>\n<li>“Summarise the key points of this article.”</li>\n<li>“Based on that summary, what are three follow-up questions we could ask?”</li>\n<li>“Write an email to the author asking those questions.”</li>\n</ol>\n<h3 id=\"branching-chains\"><a class=\"heading-anchor\" href=\"#branching-chains\">Branching chains</a></h3>\n<p>Use conditional logic to pick the next prompt based on the last output. 
This gives you context-aware workflows.</p>\n<ol>\n<li>“Analyse the sentiment of this customer review.”</li>\n<li>If positive: “Generate a thank-you response.”</li>\n<li>If negative: “Draft an apology and offer a discount.”</li>\n</ol>\n<h3 id=\"recursive-chains\"><a class=\"heading-anchor\" href=\"#recursive-chains\">Recursive chains</a></h3>\n<p>Feed a prompt’s output back into itself for iterative refinement.</p>\n<ol>\n<li>“Write a short story about a robot.”</li>\n<li>“Analyse the story and suggest improvements.”</li>\n<li>Incorporate the improvements and repeat until satisfied.</li>\n</ol>\n<h3 id=\"human-in-the-loop-chains\"><a class=\"heading-anchor\" href=\"#human-in-the-loop-chains\">Human-in-the-loop chains</a></h3>\n<p>Insert human review at the decision points that need it.</p>\n<ol>\n<li>The model generates a product description.</li>\n<li>A human approves it or requests changes.</li>\n<li>The model refines based on that input.</li>\n</ol>\n<h2 id=\"best-practices\"><a class=\"heading-anchor\" href=\"#best-practices\">Best practices</a></h2>\n<ol>\n<li><strong>Start simple.</strong> Begin with basic chains; add complexity only as needed.</li>\n<li><strong>Be specific.</strong> Clear, detailed instructions in each prompt.</li>\n<li><strong>Pass context forward.</strong> Carry relevant information between steps to keep coherence.</li>\n<li><strong>Test the structure.</strong> Different chain shapes suit different tasks.</li>\n<li><strong>Monitor outputs.</strong> Add checks that catch errors before they propagate.</li>\n<li><strong>Refine iteratively.</strong> Improve the chain based on what it produces.</li>\n</ol>\n<h2 id=\"why-splitting-the-work-helps\"><a class=\"heading-anchor\" href=\"#why-splitting-the-work-helps\">Why splitting the work helps</a></h2>\n<p>I use this for customer emails. 
The first prompt analyses the incoming email - sentiment, needs, and relevant facts pulled from a knowledge base - and passes its output into a second prompt that drafts the reply.</p>\n<p>The point is load. Each call is doing roughly half as many things as a single combined call would. There is less to get wrong per step, and the second call gets a chance to correct issues from the first. Splitting the task raises the reliability floor.</p>\n<h2 id=\"tooling\"><a class=\"heading-anchor\" href=\"#tooling\">Tooling</a></h2>\n<p>Several tools help build and manage these workflows: OpenAI function calling, LangChain, NVIDIA NIM Agent Blueprints, and the various agent frameworks. Between them they provide pre-built components and, in some cases, visual interfaces, so you’re not wiring everything from scratch.</p>\n<p>The honest gap, still: a really good environment for managing, testing, and iterating on prompts with variables and a knowledge base remains hard to find. Treat chaining as something you tune by hand; experimentation and iteration are how you land on the right structure for your case.</p>",
      "date_published": "2024-11-10T00:00:00.000Z",
      "date_modified": "2024-11-10T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "prompt-engineering",
        "workflows"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/simple-prompting-tips/",
      "url": "https://cvde.xyz/writing/simple-prompting-tips/",
      "title": "Simple prompting: less magic, more method",
      "summary": "After thousands of prompts in production, the lesson is that prompt engineering isn't about magic words. It's clear thinking and structured communication, and it reduces to three principles.",
      "content_html": "<p>Most prompt-engineering advice is more complicated than it needs to be. After testing thousands of prompts in production, building AI products that served real users, the whole thing reduces to three principles. None of them are magic words.</p>\n<h2 id=\"the-three-principles\"><a class=\"heading-anchor\" href=\"#the-three-principles\">The three principles</a></h2>\n<h3 id=\"1-context-is-everything\"><a class=\"heading-anchor\" href=\"#1-context-is-everything\">1. Context is everything</a></h3>\n<p>Think about onboarding a new team member. You wouldn’t say “write me a blog post.” You’d explain the company, the audience, the tone, and what success looks like. Same with a model.</p>\n<p>Good context includes:</p>\n<ul>\n<li>The role you want the model to play</li>\n<li>Who the output is for</li>\n<li>What you’re trying to achieve</li>\n<li>Any constraints or requirements</li>\n<li>Relevant background</li>\n</ul>\n<p>The difference is stark:</p>\n<div class=\"expressive-code\"><link rel=\"stylesheet\" href=\"/_astro/ec.yl275.css\"/><script type=\"module\" src=\"/_astro/ec.0vx5m.js\"></script><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"text\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#ffffff;--1:#24292e\">Bad:  &quot;Write a marketing email.&quot;</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#ffffff;--1:#24292e\">Good: &quot;Write a marketing email for our AI software product. The</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#ffffff;--1:#24292e\">       </span></span><span style=\"--0:#ffffff;--1:#24292e\">audience is enterprise CTOs. 
Highlight our new security</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#ffffff;--1:#24292e\">       </span></span><span style=\"--0:#ffffff;--1:#24292e\">features. Professional but not stuffy. 200–300 words.&quot;</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"Bad:  &quot;Write a marketing email.&quot;Good: &quot;Write a marketing email for our AI software product. The       audience is enterprise CTOs. Highlight our new security       features. Professional but not stuffy. 200–300 words.&quot;\"><div></div></button></div></figure></div>\n<h3 id=\"2-structure-breeds-clarity\"><a class=\"heading-anchor\" href=\"#2-structure-breeds-clarity\">2. Structure breeds clarity</a></h3>\n<p>Structure your inputs and the outputs improve. A pattern that holds up:</p>\n<div class=\"expressive-code\"><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"markdown\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Task: [What you want done]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Context: [Relevant background]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Format: [How you want it structured]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Constraints: [Limitations or requirements]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Additional Instructions: [Special considerations]</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"Task: [What you want done]Context: [Relevant background]Format: [How you want it 
structured]Constraints: [Limitations or requirements]Additional Instructions: [Special considerations]\"><div></div></button></div></figure></div>\n<h3 id=\"3-iteration-is-the-method\"><a class=\"heading-anchor\" href=\"#3-iteration-is-the-method\">3. Iteration is the method</a></h3>\n<p>Your first prompt will be rough. That’s fine - good prompting is iterative. Start simple, see what comes back, then refine. The key is being specific about what’s not working:</p>\n<div class=\"expressive-code\"><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"text\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#ffffff;--1:#24292e\">Bad:  &quot;Make it more formal.&quot;</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#ffffff;--1:#24292e\">Good: &quot;Rewrite for a senior executive audience using business terminology.&quot;</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#ffffff;--1:#24292e\">Bad:  &quot;This isn&#39;t quite right.&quot;</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#ffffff;--1:#24292e\">Good: &quot;The tone is too casual for a technical whitepaper. Rewrite using</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#ffffff;--1:#24292e\">       </span></span><span style=\"--0:#ffffff;--1:#24292e\">more precise technical language.&quot;</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"Bad:  &quot;Make it more formal.&quot;Good: &quot;Rewrite for a senior executive audience using business terminology.&quot;Bad:  &quot;This isn't quite right.&quot;Good: &quot;The tone is too casual for a technical whitepaper. 
Rewrite using       more precise technical language.&quot;\"><div></div></button></div></figure></div>\n<h2 id=\"the-practical-toolkit\"><a class=\"heading-anchor\" href=\"#the-practical-toolkit\">The practical toolkit</a></h2>\n<h3 id=\"temperature\"><a class=\"heading-anchor\" href=\"#temperature\">Temperature</a></h3>\n<p>Temperature controls randomness in the output, typically from 0.0 to 1.0:<sup><a href=\"#user-content-fn-1\" id=\"user-content-fnref-1\" data-footnote-ref aria-describedby=\"footnote-label\">1</a></sup></p>\n<ul>\n<li><strong>0.0</strong> - near-deterministic, focused</li>\n<li><strong>0.2–0.4</strong> - balanced; good for business writing</li>\n<li><strong>0.7–0.9</strong> - more varied; good for brainstorming</li>\n<li><strong>1.0</strong> - the most varied setting in the usual range</li>\n</ul>\n<p>Some APIs accept values above 1.0. Push it that far and things get strange: on the OpenAI playground a temperature of 2 produces output that has stopped being useful.</p>\n<p>Under the hood, temperature reshapes how the model samples from its next-token distribution. Lower values sharpen the distribution, making high-probability tokens even more likely; higher values flatten it, giving long-shot tokens a chance.</p>\n<h3 id=\"chain-of-thought\"><a class=\"heading-anchor\" href=\"#chain-of-thought\">Chain of thought</a></h3>\n<p>Want better reasoning? Ask the model to show its work, step by step.<sup><a href=\"#user-content-fn-2\" id=\"user-content-fnref-2\" data-footnote-ref aria-describedby=\"footnote-label\">2</a></sup> This is excellent for checking that the model is making the right decisions and for following its logic - but it isn’t always appropriate. If you’re generating something an audience will read, you don’t want the reasoning in the output.</p>\n<p>It’s great for manually iterating. 
To automate it you need a couple of extra techniques:</p>\n<ol>\n<li>Ask for the reasoning inside <code>&lt;thinking&gt;</code> tags, then filter those out programmatically.</li>\n<li>Prompt chaining - pass the initial output back to the model in a second call to simulate the iteration.</li>\n</ol>\n<h3 id=\"examples-beat-description\"><a class=\"heading-anchor\" href=\"#examples-beat-description\">Examples beat description</a></h3>\n<p>Nothing beats showing the model exactly what you want:</p>\n<div class=\"expressive-code\"><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"markdown\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Format the output like this:</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Title: [Example title]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Summary: [Example summary]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Key Points:</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">-</span><span style=\"--0:#FFFFFF;--1:#24292E\"> [Example point 1]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">-</span><span style=\"--0:#FFFFFF;--1:#24292E\"> [Example point 2]</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Now, using that exact format, write about [your topic].</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"Format the output like this:Title: [Example title]Summary: [Example summary]Key Points:- [Example point 1]- [Example point 2]Now, using that exact format, write about [your 
topic].\"><div></div></button></div></figure></div>\n<p>This is especially useful for JSON. Before guaranteed <a href=\"https://platform.openai.com/docs/guides/structured-outputs\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">structured outputs<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a>, a reliable trick was to show the shape and then seed the first token:</p>\n<div class=\"expressive-code\"><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"markdown\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Format the output like this:</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">{</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#FFFFFF;--1:#24292E\">  </span></span><span style=\"--0:#FFFFFF;--1:#24292E\">&quot;title&quot;: &quot;[Example title]&quot;,</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#FFFFFF;--1:#24292E\">  </span></span><span style=\"--0:#FFFFFF;--1:#24292E\">&quot;summary&quot;: &quot;[Example summary]&quot;,</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#FFFFFF;--1:#24292E\">  </span></span><span style=\"--0:#FFFFFF;--1:#24292E\">&quot;keyPoints&quot;: [&quot;[Example point 1]&quot;, &quot;[Example point 2]&quot;]</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">}</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span 
style=\"--0:#FFFFFF;--1:#24292E\">Your output should start with { &quot;title&quot;:</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"Format the output like this:{  &quot;title&quot;: &quot;[Example title]&quot;,  &quot;summary&quot;: &quot;[Example summary]&quot;,  &quot;keyPoints&quot;: [&quot;[Example point 1]&quot;, &quot;[Example point 2]&quot;]}Your output should start with { &quot;title&quot;:\"><div></div></button></div></figure></div>\n<p>Seeding the opening <code>{</code> raises the probability the model produces valid JSON.</p>\n<h2 id=\"common-pitfalls\"><a class=\"heading-anchor\" href=\"#common-pitfalls\">Common pitfalls</a></h2>\n<ol>\n<li><strong>Too vague.</strong> “Make it better” is not a prompt.</li>\n<li><strong>Overcomplicating.</strong> Simple tasks don’t need fancy techniques.</li>\n<li><strong>Undercomplicating.</strong> There’s a spectrum, and the right amount of structure sits in the middle.</li>\n<li><strong>Forgetting the audience.</strong> Always specify who the output is for.</li>\n<li><strong>No constraints.</strong> Without boundaries you get meandering. Set the expected output.</li>\n</ol>\n<h2 id=\"looking-forward\"><a class=\"heading-anchor\" href=\"#looking-forward\">Looking forward</a></h2>\n<p>Traditional prompt engineering will matter less as models get better at understanding natural language. The underlying principles - clear communication, structured thinking, iterative refinement - will matter more.</p>\n<p>The simplest mental model: you’re working with someone who knows nothing about you, the problem, or the context. Supply everything they need, or they’ll make it up. The future isn’t crafting the perfect prompt. 
It’s having better conversations with the model.</p>\n<section data-footnotes class=\"footnotes\"><h2 class=\"sr-only\" id=\"footnote-label\"><a class=\"heading-anchor\" href=\"#footnote-label\">Footnotes</a></h2>\n<ol>\n<li id=\"user-content-fn-1\">\n<p>Temperature controls the randomness of the output. Higher temperatures produce more diverse, creative responses; lower temperatures produce more focused, deterministic ones. <a href=\"#user-content-fnref-1\" data-footnote-backref aria-label=\"Back to reference 1\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n<li id=\"user-content-fn-2\">\n<p>Chain-of-thought prompting asks the model to break its reasoning into explicit steps, which tends to produce more reliable and traceable outputs. <a href=\"#user-content-fnref-2\" data-footnote-backref aria-label=\"Back to reference 2\" class=\"data-footnote-backref\">↩</a></p>\n</li>\n</ol>\n</section>",
      "date_published": "2024-11-08T00:00:00.000Z",
      "date_modified": "2024-11-08T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "prompt-engineering",
        "llms"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/doing-less-to-get-more-done/",
      "url": "https://cvde.xyz/writing/doing-less-to-get-more-done/",
      "title": "Doing less to get more done",
      "summary": "The product manager who has a hand in everything is the bottleneck. Stepping back isn't abdication - it's leverage, and it compounds.",
      "content_html": "<p>A product manager who dictates every detail to a skilled team gets worse output, not better. That is the trap. The pull is to have a hand in everything - design tweaks, marketing copy, every line of a spec. Often the highest-leverage move is the opposite: step back and let the people who are better at the task do it.</p>\n<h2 id=\"trust-is-the-mechanism\"><a class=\"heading-anchor\" href=\"#trust-is-the-mechanism\">Trust is the mechanism</a></h2>\n<p>Recall the last time someone micromanaged you. Now recall the last time someone handed you a problem and the freedom to solve it your way. The second one is where good work comes from. Trusting your team isn’t a courtesy. It unlocks output you couldn’t have produced yourself, and it buys back your most constrained resource - time.</p>\n<h2 id=\"two-worked-examples\"><a class=\"heading-anchor\" href=\"#two-worked-examples\">Two worked examples</a></h2>\n<p><strong>Design.</strong> A new feature hits the design phase, and you have ideas.</p>\n<ul>\n<li><em>Option A:</em> spend hours on detailed wireframes, hand them over with instructions.</li>\n<li><em>Option B:</em> run a short kickoff that frames the problem and the constraints, then let the designers solve it.</li>\n</ul>\n<p>Option B wins. You tap their expertise and surface solutions you wouldn’t have reached. Get the brief to 80–90% clarity, then trust the implementer with the rest.</p>\n<p><strong>Development.</strong> You want to see progress, so you’re tempted to check in constantly.</p>\n<ul>\n<li><em>Option A:</em> daily stand-ups, frequent reports, regular pings.</li>\n<li><em>Option B:</em> clear expectations up front, defined milestones, and trust that the team raises blockers.</li>\n</ul>\n<p>Option B again. Uninterrupted developers reach flow and ship faster. 
The autonomy-to-performance relationship isn’t linear - the benefits compound as people feel more invested.</p>\n<h2 id=\"this-is-not-abdication\"><a class=\"heading-anchor\" href=\"#this-is-not-abdication\">This is not abdication</a></h2>\n<p>The skill is balance, not absence. You stay available to give direction, clear roadblocks, and keep everyone rowing the same way. One-pagers, light wireframes, and kickoff calls do most of this work: they align the team while leaving room for people to apply their own judgement.</p>\n<p>A useful lens here is leverage / neutral / overhead. Sort each task by how much your personal input multiplies the output:</p>\n<ul>\n<li><strong>Leverage</strong> - your input maximally boosts the result. Do it.</li>\n<li><strong>Neutral</strong> - needs doing, doesn’t multiply. Delegate it.</li>\n<li><strong>Overhead</strong> - drains resources without proportional gain. Automate or kill it.</li>\n</ul>\n<p>Next time you feel the urge to dive into something outside your wheelhouse, ask whether it’s the best use of your time, or whether someone else should own it. Often the most powerful thing a manager can do is give the team the space to surprise you.</p>",
      "date_published": "2024-10-31T00:00:00.000Z",
      "date_modified": "2024-10-31T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "product-management",
        "leadership",
        "delegation"
      ]
    },
    {
      "id": "https://cvde.xyz/writing/prompt-frameworks/",
      "url": "https://cvde.xyz/writing/prompt-frameworks/",
      "title": "Prompting with frameworks the model already knows",
      "summary": "Structure your prompts the way a consultant structures a brief, then borrow a framework the model was trained on. You give it a running start instead of describing every step.",
      "content_html": "<p>Talking to a model is like briefing someone with no memory and no context. When you talk to a colleague, you supply background, explain what you want, and pitch your language to them. A model needs the same, made explicit. That’s what structured prompting is.</p>\n<h2 id=\"structure-with-mece\"><a class=\"heading-anchor\" href=\"#structure-with-mece\">Structure with MECE</a></h2>\n<p>I’m not usually one for acronyms, but MECE - Mutually Exclusive, Collectively Exhaustive - earns its place. Break the prompt into distinct, non-overlapping parts that together cover everything the model needs:</p>\n<ol>\n<li><strong>Context</strong> - the background and the scene</li>\n<li><strong>Goals</strong> - what you’re trying to achieve</li>\n<li><strong>Audience</strong> - who it’s for</li>\n<li><strong>Style and tone</strong> - how it should read</li>\n<li><strong>Rules</strong> - specific dos and don’ts</li>\n<li><strong>Additional information</strong> - anything that doesn’t fit elsewhere</li>\n</ol>\n<p>This isn’t about tidiness. It’s a roadmap: each part tells the model something it needs and nothing it doesn’t.</p>\n<h2 id=\"front-load-the-important-context\"><a class=\"heading-anchor\" href=\"#front-load-the-important-context\">Front-load the important context</a></h2>\n<p>A useful quirk: position matters. Many models use material at the start and end of a long prompt more reliably than material in the middle. If you have context that’s critical to the task, put it up front rather than burying it mid-prompt. 
Anthropic’s documentation has a strong <a href=\"https://github.com/anthropics/courses/blob/master/prompt_engineering_interactive_tutorial/README.md\" class=\"external-link\" rel=\"noopener noreferrer\" target=\"_blank\">prompt-engineering course<span><svg class=\"external-link-icon\" viewBox=\"0 0 24 24\" width=\"14\" height=\"14\" fill=\"none\" stroke=\"currentColor\" stroke-width=\"1.75\" stroke-linecap=\"round\" stroke-linejoin=\"round\" aria-hidden=\"true\" focusable=\"false\"><path d=\"M7 17 17 7\"></path><path d=\"M8 7h9v9\"></path></svg></span></a> worth working through.</p>\n<p>Models also handle structure well, so XML-style tags can help them parse a document and let you reference sections precisely in follow-ups:</p>\n<div class=\"expressive-code\"><link rel=\"stylesheet\" href=\"/_astro/ec.yl275.css\"/><script type=\"module\" src=\"/_astro/ec.0vx5m.js\"></script><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"xml\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#A0A0A0;--1:#24292E\">&lt;</span><span style=\"--0:#FFC799;--1:#1E7734\">document</span><span style=\"--0:#A0A0A0;--1:#24292E\">&gt;</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--1:#24292E\">  </span></span><span style=\"--0:#A0A0A0;--1:#24292E\">&lt;</span><span style=\"--0:#FFC799;--1:#1E7734\">section</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#A0A0A0;--1:#6F42C1\">name</span><span style=\"--0:#FFFFFF;--1:#24292E\">=</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;overview&quot;</span><span style=\"--1:#24292E\"><span style=\"--0:#A0A0A0\">&gt;</span><span style=\"--0:#FFFFFF\">This is the overview.</span><span style=\"--0:#A0A0A0\">&lt;/</span></span><span style=\"--0:#FFC799;--1:#1E7734\">section</span><span style=\"--0:#A0A0A0;--1:#24292E\">&gt;</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span 
style=\"--1:#24292E\">  </span></span><span style=\"--0:#A0A0A0;--1:#24292E\">&lt;</span><span style=\"--0:#FFC799;--1:#1E7734\">section</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#A0A0A0;--1:#6F42C1\">name</span><span style=\"--0:#FFFFFF;--1:#24292E\">=</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;features&quot;</span><span style=\"--1:#24292E\"><span style=\"--0:#A0A0A0\">&gt;</span><span style=\"--0:#FFFFFF\">A list of features.</span><span style=\"--0:#A0A0A0\">&lt;/</span></span><span style=\"--0:#FFC799;--1:#1E7734\">section</span><span style=\"--0:#A0A0A0;--1:#24292E\">&gt;</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--1:#24292E\">  </span></span><span style=\"--0:#A0A0A0;--1:#24292E\">&lt;</span><span style=\"--0:#FFC799;--1:#1E7734\">section</span><span style=\"--0:#FFFFFF;--1:#24292E\"> </span><span style=\"--0:#A0A0A0;--1:#6F42C1\">name</span><span style=\"--0:#FFFFFF;--1:#24292E\">=</span><span style=\"--0:#99FFE4;--1:#032F62\">&quot;pricing&quot;</span><span style=\"--1:#24292E\"><span style=\"--0:#A0A0A0\">&gt;</span><span style=\"--0:#FFFFFF\">Pricing information.</span><span style=\"--0:#A0A0A0\">&lt;/</span></span><span style=\"--0:#FFC799;--1:#1E7734\">section</span><span style=\"--0:#A0A0A0;--1:#24292E\">&gt;</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#A0A0A0;--1:#24292E\">&lt;/</span><span style=\"--0:#FFC799;--1:#1E7734\">document</span><span style=\"--0:#A0A0A0;--1:#24292E\">&gt;</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"<document>  <section name=&quot;overview&quot;>This is the overview.</section>  <section name=&quot;features&quot;>A list of features.</section>  <section name=&quot;pricing&quot;>Pricing information.</section></document>\"><div></div></button></div></figure></div>\n<h2 
id=\"borrow-a-framework-the-model-knows\"><a class=\"heading-anchor\" href=\"#borrow-a-framework-the-model-knows\">Borrow a framework the model knows</a></h2>\n<p>Models are trained on enormous amounts of text, including well-known frameworks. Use that. Framing your request inside a framework the model already understands gives it a running start - like asking a chef for a specific dish rather than describing every ingredient and step. The model spends its effort on the unique parts of your request, not on reconstructing the structure.</p>\n<p>Ask for a business analysis and reference SWOT, and the model already knows the shape; it can go straight to applying it to your case.</p>\n<h2 id=\"a-worked-example\"><a class=\"heading-anchor\" href=\"#a-worked-example\">A worked example</a></h2>\n<p>Suppose you want help drafting a post on sustainable fashion:</p>\n<div class=\"expressive-code\"><figure class=\"frame\"><figcaption class=\"header\"></figcaption><pre data-language=\"markdown\"><code><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Context: A fashion blogger writing about sustainable fashion for an</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#FFFFFF;--1:#24292E\">         </span></span><span style=\"--0:#FFFFFF;--1:#24292E\">eco-lifestyle site.</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Goals:</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">1.</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Explain the environmental impact of fast fashion.</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">2.</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Give practical tips for building a sustainable wardrobe.</span></div></div><div class=\"ec-line\"><div 
class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">3.</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Highlight innovative sustainable brands.</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Audience: Environmentally conscious millennials interested in fashion.</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Style &amp; Tone: Conversational and informative. British English.</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Rules:</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">-</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Avoid jargon.</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">-</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Include at least three actionable tips.</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">-</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Mention no more than five brands.</span></div></div><div class=\"ec-line\"><div class=\"code\">\n</div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#24292E\">Additional Information:</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">-</span><span style=\"--0:#FFFFFF;--1:#24292E\"> Use the &quot;4 Rs&quot; framework (Reduce, Reuse, Recycle, Reimagine) as the</span></div></div><div class=\"ec-line\"><div class=\"code\"><span class=\"indent\"><span style=\"--0:#FFFFFF;--1:#24292E\">  </span></span><span style=\"--0:#FFFFFF;--1:#24292E\">main sections.</span></div></div><div class=\"ec-line\"><div class=\"code\"><span style=\"--0:#FFFFFF;--1:#AE4B07\">-</span><span 
style=\"--0:#FFFFFF;--1:#24292E\"> Include a short section on circular fashion.</span></div></div></code></pre><div class=\"copy\"><div aria-live=\"polite\"></div><button title=\"Copy to clipboard\" data-copied=\"Copied!\" data-code=\"Context: A fashion blogger writing about sustainable fashion for an         eco-lifestyle site.Goals:1. Explain the environmental impact of fast fashion.2. Give practical tips for building a sustainable wardrobe.3. Highlight innovative sustainable brands.Audience: Environmentally conscious millennials interested in fashion.Style &amp; Tone: Conversational and informative. British English.Rules:- Avoid jargon.- Include at least three actionable tips.- Mention no more than five brands.Additional Information:- Use the &quot;4 Rs&quot; framework (Reduce, Reuse, Recycle, Reimagine) as the  main sections.- Include a short section on circular fashion.\"><div></div></button></div></figure></div>\n<p>The model knows who it’s writing for, what you want, and - via the “4 Rs” - a familiar structure to hang the content on.</p>\n<h2 id=\"the-bottom-line\"><a class=\"heading-anchor\" href=\"#the-bottom-line\">The bottom line</a></h2>\n<p>Effective prompting is clear communication. Structure your input and borrow a known framework, and you make the model’s job easier and your output better. A model is a tool, not a mind reader. The clearer and more structured the input, the better the result - it’s closer to programming than to conversation.</p>",
      "date_published": "2024-10-20T00:00:00.000Z",
      "date_modified": "2024-10-20T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ],
      "tags": [
        "ai",
        "prompt-engineering",
        "best-practices"
      ]
    },
    {
      "id": "https://cvde.xyz/work/brand-ninja-repositioning/",
      "url": "https://cvde.xyz/work/brand-ninja-repositioning/",
      "title": "Brand Ninja: repositioning a $29 tool into a $1M ARR enterprise business",
      "summary": "Killed the consumer tier, set a $999 contract floor, and rebuilt the ICP and pricing around brand teams. $5k MRR to $1M ARR in 22 months.",
      "content_html": "<h2 id=\"context\"><a class=\"heading-anchor\" href=\"#context\">Context</a></h2>\n<p>I joined Brand Ninja in October 2023 as Head of Product &amp; AI, reporting to the CEO on a team of seven. It was a pre-seed-stage AI content platform with a real product and a broken business.</p>\n<p>The shape of the problem was familiar. Brand Ninja sold an AI tool that wrote on-brand marketing content. It was packaged as a $29/month consumer subscription, the default price for anything that looks like a productivity app. Sign-ups were healthy. MRR was about $5k. The economics underneath were not.</p>\n<p>At $29 a month, the people who paid were freelancers, solo marketers, and small-business owners. They churned fast: they tried it for a campaign, got what they needed, and left. Support load per dollar was high. The roadmap was being pulled toward consumer features that would never command a higher price. And the most valuable thing the company had built, a system that learned and reproduced a specific brand’s voice, was being given away for the price of a streaming subscription.</p>\n<p>The signal that mattered was in the sales inbox, not the dashboard. A handful of inbound conversations were coming from brand teams at real companies: people responsible for keeping dozens of writers and agencies on-voice across markets. They weren’t asking about the $29 plan. They were asking whether it could handle their brand guidelines, their approval workflow, their seat count. They had budget. They had a problem that recurred every day. And the product, as priced, was actively repelling them. A $29 line item reads as a toy to a procurement team.</p>\n<p>The company was optimising for the wrong customer. 
The job was to work out who actually paid, and rebuild around them.</p>\n<h2 id=\"decision\"><a class=\"heading-anchor\" href=\"#decision\">Decision</a></h2>\n<p>The decision was to stop selling to consumers entirely.</p>\n<p>That meant three moves, in order, and the order mattered.</p>\n<p><strong>Kill the consumer tier.</strong> Not discount it, not grandfather it indefinitely - remove it as the front door. The $29 plan was an anchor that defined the product to every customer and every engineer who worked on it. While it existed, every pricing conversation started from $29 and negotiated down. The fastest way to change what Brand Ninja was for was to change what you could buy.</p>\n<p><strong>Set a contract floor at $999.</strong> A minimum, not a list price. The number does two things. It filters: anyone for whom $999 is a hard decision is not the customer we wanted, and removing them from the funnel is a feature, not a loss. It reframes: at four figures, a brand team evaluates Brand Ninja against an agency retainer or a headcount, not against a subscription they’ll forget to cancel. The price <em>is</em> the positioning.</p>\n<p><strong>Rebuild the ICP and the pricing around brand teams.</strong> The ideal customer became an organisation with a defined brand voice, multiple people producing content against it, and a cost or quality problem keeping it consistent. Pricing moved to annual contracts on seats and usage, with the $999 floor as the entry point and six-figure deals at the top.</p>\n<p>The case I made to the CEO was not “charge more.” It was that the consumer business and the enterprise business were two different companies wearing the same product, and we could only resource one of them well. The consumer one had a ceiling we could already see. The enterprise one had brand teams writing in with budget we were turning away. Killing the tier most founders would protect was the unlock. 
Keeping a small, churny, low-margin revenue line alive was costing us the larger business behind it.</p>\n<aside class=\"callout callout--note\" role=\"note\" data-astro-cid-pyumqe5w> <p class=\"callout__label\" data-astro-cid-pyumqe5w> <svg width=\"1em\" height=\"1em\" aria-hidden=\"true\" data-astro-cid-pyumqe5w=\"true\" data-icon=\"lucide:info\">   <symbol id=\"ai:lucide:info\" viewBox=\"0 0 24 24\"><g fill=\"none\" stroke=\"currentColor\" stroke-linecap=\"round\" stroke-linejoin=\"round\" stroke-width=\"2\"><circle cx=\"12\" cy=\"12\" r=\"10\"/><path d=\"M12 16v-4m0-4h.01\"/></g></symbol><use href=\"#ai:lucide:info\"></use>  </svg> <span data-astro-cid-pyumqe5w>Note</span> </p> <div class=\"callout__body\" data-astro-cid-pyumqe5w> <p>Pricing is the most honest positioning statement a company makes. A $29 price tag tells the market you are a consumer toy, whatever the deck says. The $999 floor said the opposite before a single sales call started, and it filtered the funnel down to people who could actually buy.</p> </div> </aside>\n<h2 id=\"what-i-built\"><a class=\"heading-anchor\" href=\"#what-i-built\">What I built</a></h2>\n<p>A price change is a slide. A repositioning is a system. Most of the work was building the machinery that made the new price defensible and the new customer well-served.</p>\n<p><strong>The ICP and the qualification.</strong> I rewrote who we sold to and how we recognised them: the firmographics, the buying triggers, the questions that separated a brand team with a real consistency problem from a curious individual. Sales stopped chasing volume and started chasing fit. The funnel got smaller and the close rate went up.</p>\n<p><strong>The pricing architecture.</strong> Annual contracts, seat-and-usage tiers, the $999 floor as the published entry point, and a clear path from there to enterprise. 
The structure had to make the first $999 deal and the first $100k deal feel like the same product priced honestly for different scale, not a bait-and-switch.</p>\n<p><strong>The enterprise product surface.</strong> Brand teams don’t buy a text box. They buy control. The roadmap shifted to what an enterprise actually evaluates: managed brand-voice profiles, multi-seat workspaces, approval and review flow, and the security and account posture a procurement team checks before signing. The underlying voice engine was the moat; the job was to wrap it in something an organisation could adopt and govern.</p>\n<p><strong>The go-to-market motion.</strong> The consumer model was self-serve sign-up. The enterprise model is a sales conversation, a pilot against the customer’s own brand, and a contract. I built the motion to match. The pilot that proved on-voice output on the customer’s real content was what closed deals. It turned an abstract claim into a result they could see on their own brand.</p>\n<p>The voice engine itself kept advancing in parallel: <span class=\"term\" data-term=\"embedding space\" data-def=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" tabindex=\"0\" aria-describedby=\"term-def-1\" title=\"A high-dimensional space a model maps things into, where direction and distance encode meaning, so similar concepts sit near each other.\" data-note=\"A learned concept is a location here; steer a generator towards that point and you get a picture of how the model holds it.\">embedding</span>-based optimisation of generated output against brand-voice attributes, plus a separate ML video pipeline. Those are their own stories. Here they matter only as the reason a brand team would pay enterprise money. 
The product genuinely held a brand’s voice better than the alternatives, and the repositioning is what let the company charge for it.</p>\n<h2 id=\"outcome\"><a class=\"heading-anchor\" href=\"#outcome\">Outcome</a></h2>\n<p>Brand Ninja went from roughly $5k MRR to $1M ARR in 22 months - 3.3x year on year.</p>\n<p>The revenue came from 48 brands, not 4,800 individuals. The customer list included Sportsbet, the NBL, the Australian Grand Prix, and the NBA. We closed $100k contracts with Zscaler and the NBA. Those deals were structurally impossible under a $29 plan; no enterprise buyer evaluates a consumer subscription for a six-figure need.</p>\n<p>The second-order effects mattered as much as the number. Churn fell, because annual enterprise contracts don’t behave like month-to-month consumer ones. Support load per dollar dropped. The roadmap got clearer: every feature decision now had one customer to answer to instead of two in tension. And the team of seven could actually serve the business it had, because it was one business now.</p>\n<h2 id=\"what-id-do-differently\"><a class=\"heading-anchor\" href=\"#what-id-do-differently\">What I’d do differently</a></h2>\n<p>I’d kill the consumer tier sooner. We spent longer than we needed protecting a revenue line we already knew had a ceiling, because killing live revenue feels reckless even when the analysis says it’s costing more than it makes. The opportunity cost was larger than the MRR we were nervous about losing: every enterprise conversation half-served while we maintained a consumer product in parallel. The decision was right; the timing was timid.</p>\n<p>I’d also set the $999 floor with more deliberate testing rather than reasoning my way to it. The number worked; it filtered well and anchored well. But I set it largely from judgement and a read of the market. With a few structured pricing conversations against real buyers before committing, I’d have known whether the floor should have been higher. 
My instinct now is that for the brands we ended up serving, it probably should have been. We left margin at the entry point by <span class=\"term\" data-term=\"anchoring\" data-def=\"The bias where the first number you hear becomes the reference point you reason from, even after you learn it was wrong.\" tabindex=\"0\" aria-describedby=\"term-def-2\" title=\"The bias where the first number you hear becomes the reference point you reason from, even after you learn it was wrong.\" data-note=\"Why a preliminary figure in a readout is a trap: the caveat is processed as language, the number as fact, and the fact wins.\">anchoring</span> to a number that felt safe rather than one we’d pressure-tested.</p>\n<p>And I’d have built the enterprise security and account posture earlier in the motion rather than assembling it under deal pressure. The $100k contracts pulled hard on procurement, security review, and account controls: predictable requirements for that buyer that we treated as deal-specific scrambles. Since we knew the ICP was enterprise from the start, that work was foreseeable. Building it ahead of demand would have shortened the sales cycle on the deals that mattered most.</p>",
      "date_published": "2023-01-01T00:00:00.000Z",
      "date_modified": "2023-01-01T00:00:00.000Z",
      "authors": [
        {
          "name": "Callum van den Enden",
          "url": "https://www.linkedin.com/in/calvanden"
        }
      ]
    }
  ]
}