This blog post is also available in German

TL;DR

  • AI approaches like BMAD assume that domain knowledge is fully available and can be extracted directly.
  • Cognitive science shows, however, that much of this knowledge is implicit, socially embedded, and emerges over time.
  • Structured interviews cannot capture these forms of knowledge and crowd out important sense-making processes.
  • Proven methods therefore rely on iteration, shared practice, and continuous learning rather than one-time elicitation.
  • The core shift is that humans adapt to the machine, not the other way around.

This post is part of a series.

  • Part 1: Speed vs. Skill
  • Part 2: AI and Elaboration: Which Coding Patterns Build Understanding?
  • Part 3: Understanding AI Coding Patterns Through Cognitive Load Theory
  • Part 4: Hail Mary: Why domain knowledge cannot be extracted from experts (this post)

This is the fourth post in “Developing with AI Through the Cognitive Lens,” a series exploring how AI tools affect the way programmers and development teams learn, work, and build expertise. Drawing on cognitive psychology research, this series examines what happens when we delegate cognitive work to AI. In this post, the lens widens beyond coding to requirements engineering. The goal of the series isn’t to arrive at a predetermined verdict on AI, but to follow the cognitive evidence wherever it leads. Sometimes, as in this post, it leads to fundamental skepticism about a whole class of tools.

In the previous posts of this series, we focused on what cognitive psychology can tell us about beneficial and harmful ways to use AI for coding. But coding is only a small part of software development. Increasingly, AI agents are being put to use for eliciting requirements. Marketing claims for tools like BMAD speak of hours instead of weeks spent on requirements engineering. Can BMAD and other tools for spec-driven development really replace established methods of learning about requirements? Cognitive psychology, it turns out, has quite a lot to say about this.

How BMAD and other SDD tools work

Tools like BMAD promise to drastically speed up the requirements engineering process. Instead of weeks of workshops, interviews, and iterative refinement, an AI agent guides stakeholders through a structured elicitation process and produces a comprehensive specification document in a matter of hours. Buildmode.dev, one of the more prominent advocates of this approach, claims to reduce requirements discovery from “2–3 weeks to 6 hours.”

The workflow usually starts with a product idea or a rough vision. An AI agent acting as a business analyst (called Mary in the case of BMAD) then interviews the stakeholder or domain expert, asking questions about users, goals, constraints, and technical requirements. It turns the answers into a specification document that serves as the blueprint for implementation. In more ambitious setups like BMAD, additional agents decompose this specification into epics, stories, and tasks, which yet another layer of agents implements. The human moves from doing the work to providing domain knowledge and reviewing the output.
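The one-directional structure of this workflow can be made concrete with a small sketch. All names here (`QUESTION_SCRIPT`, `run_interview`, `decompose`) are hypothetical illustrations, not BMAD's actual interface: the agent iterates over a fixed question script, and the human appears only as a function that supplies answers to questions the machine has chosen.

```python
# Hypothetical sketch of a spec-driven elicitation pipeline.
# All names are illustrative; this is not BMAD's actual API.

QUESTION_SCRIPT = [
    "Who are the primary users?",
    "What goals must the system support?",
    "What constraints apply?",
]

def run_interview(answer_fn):
    """Drive the stakeholder through a fixed question script.

    Note the inversion: the machine sets the agenda, and the human
    is reduced to a callable that fills in answers.
    """
    spec = {}
    for question in QUESTION_SCRIPT:
        spec[question] = answer_fn(question)
    return spec

def decompose(spec):
    """Turn the finished spec into implementable work items."""
    return [{"epic": q, "stories": [a]} for q, a in spec.items()]

spec = run_interview(lambda q: f"stakeholder answer to: {q}")
backlog = decompose(spec)
```

Even in this toy form, the shape of the argument is visible: there is no step at which the stakeholder can reframe a question, add a category the script lacks, or come back later with a better answer.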

All this can sound very appealing, especially to anyone who has sat through lengthy requirements workshops that seemed to produce little more than a long list of assumptions dressed up as decisions. If an agent can do the same job faster and more systematically, why wouldn’t you use it?

The extraction paradigm

What these tools have in common is an implicit assumption about the nature of domain knowledge: that it exists in the heads of stakeholders and domain experts, waiting to be retrieved. The right questions, asked in the right order, will bring it to the surface. The business analyst agent plays the role of a skilled interviewer: systematic, thorough, patient, and, in a way, relentless. With this assumption, the tools follow a paradigm of extraction. Knowledge is seen as a resource to be mined, the human as the deposit.

This assumption is so deeply embedded in the workflow that it rarely gets stated explicitly. But occasionally it does. The Buildmode.dev post mentioned above describes their approach as replacing “iterative discovery”. Iterative discovery implies that requirements emerge over time, through feedback, building and learning. Replacing it means believing that the knowledge is already there, fully formed, and only needs to be drawn out efficiently.

Most tools stop here and leave the human as an imperfect but tolerable source. Some go further. Ouroboros, an agent framework whose old README stated bluntly that “HUMANS ARE NOT RATIONAL,” takes the logical next step: if humans cannot reliably articulate what they know, the problem is not the extraction technique. The problem is the human.

What cognitive science tells us

Michael Polanyi’s observation that “we can know more than we can tell” is probably the most concise summary of the problem. In [1]: The Tacit Dimension (1966), Polanyi argues that much of what experts know is not consciously accessible and thus cannot be articulated. This is implicit or tacit knowledge, in contrast to explicit knowledge.

When an experienced domain expert describes her process to an interviewer, she will inevitably leave things out because she does not know she knows them. Some good candidates for this are the steps she always takes when dealing with a particular edge case, or the implicit check she runs before escalating an issue. There is a good chance that neither of these will surface in a structured interview, because they have long since been internalised below the threshold of conscious reflection.

The SECI model described by [2]: Nonaka (1991) adds another dimension. Tacit knowledge is not merely hard to articulate; it is fundamentally social. Knowledge creation in organisations happens through cycles of socialisation, externalisation, combination, and internalisation. These processes require a shared context, trust, and time.

The important point is that the techniques that actually work for making tacit knowledge explicit are not interview-based. They rely on shared practice and direct observation. For instance, a co-worker who watches over someone’s shoulder and asks “why did you do that just now?” is far more likely to surface implicit knowledge than any structured questioning. An AI agent conducting a text-based interview is, by definition, outside this social fabric.

There is a third dimension that rarely enters the requirements engineering discussion: time. Graham Wallas’s classic model of creative cognition [3] identified incubation as essential to insight. Incubation refers to the fact that the mind continues working on problems below the threshold of conscious attention. [4]: Cai et al. (2009) provided related empirical evidence in a study showing that REM sleep enhances the ability to integrate information and recognise non-obvious connections.

What does this mean for requirements work? First and foremost, we need to accept that some of the most valuable domain insights cannot be scheduled. They arrive when an expert is in the shower, or wakes up at 3am with sudden clarity about why the current process is broken. A six-hour interview session with an AI agent has no room for incubation. It does not compress this phase, it eliminates it.

Taken together, these three dimensions of evidence point to the same conclusion: the extraction paradigm misunderstands the nature of what it is trying to extract. Domain knowledge is not a static deposit waiting to be mined. It’s tacit, social, and temporally distributed. Any elicitation method that ignores these properties will miss the knowledge that matters most.

How to make tacit knowledge explicit

This is not a new problem, and the software development community has developed methods that take it seriously. Domain Storytelling, for instance, uses collaborative narrative sessions where domain experts tell stories about their work while a facilitator captures them in a visual notation to surface the language, the actors, and the workflows that actually matter. The method works because it creates a shared situation: experts and developers are in the same room, the story unfolds in real time, and misunderstandings become visible immediately. Similarly, Event Storming brings together developers and domain experts around a shared timeline of domain events, relying on the productive friction of different perspectives colliding to reveal what no single participant could have articulated alone.

These methods share a common assumption that stands in direct contrast to the extraction paradigm: domain knowledge exists — but it becomes explicit through multiple conversations and iterations, not in advance of them.

Domain-Driven Design is sometimes misread as an argument for thorough upfront domain modelling before implementation begins. Martin Fowler’s foreword to Eric Evans’s original book [5] corrects this directly. Fowler writes that powerful domain models evolve over time, and that even the most experienced modellers find their best ideas emerge after the initial releases of a system. Domain-driven design was never a license for big upfront design. It has always been meant as a method of sustained, iterative engagement with the domain throughout the life of a project. A lot of important knowledge arrives late, earned through the experience of building and using the system.

This is precisely what a six-hour elicitation session cannot buy.

Conclusion

The logic of the extraction paradigm, followed to its conclusion, does not stop at better interviews. Ouroboros, the agent framework I mentioned earlier, makes the next step explicit. The problem, it concludes, is not the extraction technique, it is the human. Its proposed solution is that it “fixes the human, not the machine.” The improved human is one who communicates more clearly, more consistently, more completely. One who, in short, is easier for a machine to process.

This is what the Reverse Centaur looks like in practice. The classical centaur (human judgment directing machine capability) has quietly inverted. The machine sets the agenda, defines the categories, asks the questions. The human’s job is to fit into the structure the machine provides. BMAD does not do this through coercion. It does it through the appearance of helpfulness: a guided process, structured questions, a clear output. You aren’t asked to think differently. You’re simply led through a workflow that rewards machine-readable answers and has no place for the ambiguous, the half-formed, or the tacit that cannot be articulated yet.

This inversion is not an accident. It’s the natural endpoint of a paradigm that treats knowledge as a resource to be extracted rather than a capability to be developed. And that paradigm did not begin with BMAD or Ouroboros. Large language models are themselves its most ambitious expression: trained on the accumulated written knowledge and culture of humanity, compressed into a statistical model, without the consent or compensation of those who produced it. BMAD and Ouroboros are not outliers. They are the same logic applied one step further, from extracting human knowledge into a model, to extracting domain knowledge into a specification, to gradually reshaping the human who provides it into something the model can more readily use.

The question this raises is not primarily technical. It is about the direction of adaptation. Technology has always changed how people work and think. That’s old news. What is worth noticing is when the adaptation runs in one direction only: when the human is expected to become more legible to the machine, while the machine is not expected to become more capable of meeting the human where she is.

  1. Polanyi, M. (1966). The Tacit Dimension. Doubleday.

  2. Nonaka, I. (1991). The knowledge-creating company. Harvard Business Review, 69(6), 96–104.

  3. Wallas, G. (1926). The Art of Thought. Harcourt Brace.

  4. Cai, D. J., Mednick, S. A., Harrison, E. M., Kanady, J. C., & Mednick, S. C. (2009). REM, not incubation, improves creativity by priming associative networks. Proceedings of the National Academy of Sciences, 106(25), 10130–10134.

  5. Evans, E. (2003). Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley. (Foreword by Martin Fowler)