This blog post is also available in German
TL;DR
- Spec-driven development reduces complexity during the thinking phase and sharpens requirements for AI agents.
- During the build phase, SDD often turns into documentation work and lengthens feedback loops.
- Detailed specs simulate clarity, but they don’t create shared understanding.
- With agents, iteration is cheap. Early prototyping is often more valuable than crafting a perfect spec upfront.
- Keep specs short, document architecture decisions separately, and shift safeguards from documents into the harness — tests, linters, architecture rules, reviews.
Over the past few months, spec-driven development has often helped me produce more – and better – AI-generated output than before. At the same time, I’ve also been noticeably more exhausted after many sessions.
Not despite the structure, but often because of it.
At first, that threw me off. Spec-driven development initially promises exactly the right thing: less chaos, more clarity, better results with agents. And that’s true, too – just not consistently.
By now, my takeaway is pretty clear: the same structure that helps me during the thinking phase can slow me down during the build phase. Then I’m no longer building. I’m managing, phrasing, and documenting. And that’s exactly where the hidden costs of spec-driven development come from for me. I’m less interested in the big, abstract question of whether SDD is good or bad. I’m more interested in the practical one: When does this way of working create real clarity – and when does it push thinking to the wrong place?
The right kind of hard
Not everything that’s hard is automatically bad. Some development work is hard because the problem itself is hard. Another part is hard because the way we work creates unnecessary friction.
That difference is what has made spec-driven development interesting to me. With a framework like BMAD, I can think through an initiative more cleanly than I used to. It forces me to make fuzzy requirements explicit. I have to name constraints. I have to cut scope. I have to see gaps before I cover them up with generated code. That’s helpful.
But at the same time, I’m noticing: this structure doesn’t feel equally useful in every phase. Sometimes it reduces complexity. Sometimes it just relocates it. And that difference isn’t theoretical for me – I feel it very concretely in my focus, my pace, and my energy.
That’s why I no longer want to judge SDD in a blanket way. The real question, for me, is whether it’s the right kind of hard right now. Does the structure help me understand? Or does it force me to spell out things that I could clarify much more easily while building, directly on the living thing?
Why SDD can help during the thinking phase
Spec-driven development solves a real problem when you’re starting from scratch. Anyone working with agents quickly learns that good context isn’t a nice-to-have. Well-structured context contributes massively to the solution.
If I capture requirements, rough architecture, UX guardrails, and initial story cuts cleanly, a lot becomes easier later. Diffuse thoughts end up in clearer buckets. The problem gets smaller because I’ve already sorted it once. And later, the agent has something to hold on to—instead of starting from zero for every task.
This helps me a lot, especially early in a project. Not because I never thought things through before, but because the framework forces me to really work out loose thoughts and write them down properly.
What is this or that feature actually supposed to do, and for whom? Which constraints matter? Where do I deliberately want to decide nothing yet? For that kind of thinking, some structure is useful.
If I described this using Daniel Westheide’s cognitive-load lens, it would be like this: in this phase, the framework reduces complexity. It breaks a big problem into smaller, workable parts.
That’s the right kind of hard. Or at least effort that feels meaningful to me.
When building devolves into documentation work
For me, the problem often starts exactly where things should become productive: implementation.
Then I’m no longer sitting in front of the actual problem. I’m sitting in front of a story spec, writing acceptance criteria for things that might never happen. I’m describing behavior for edge cases that I probably would have clarified in two iterations with the agent if I were building directly. I’m trying to press intuition into text upfront.
I notice this most clearly when generating a story from an epic. Before that, I’m still talking about functional requirements, non-functional requirements, and rough story skeletons – who are we building this for, what should happen, how will we know it’s enough. Once the story is generated, though, technical details and code examples creep in very quickly.
One concrete example made this very clear. A story that was actually pretty simple – “Create User Profile” – turned into a massive package: 19 acceptance criteria, database fields, component mapping, dependency notes, file lists, test checklists, and later even review follow-ups. 1,070 lines of Markdown.
Formally, it was still called a story. In practice, it was half specification, half implementation plan, and half QA backlog. Yes, exactly – that’s three halves.
That shifts the whole mode of thinking. I’m no longer talking about a story that makes sense to humans; I’m discussing a pre-decided implementation. The feedback loop gets longer because suddenly there are things on the table that didn’t need to be decided at that point. It gets even more problematic because those code snippets aren’t neutral. They prime the model that later implements the story. If I ever want to refactor the code or regenerate it with a better model, the story already contains a technical bias from an earlier planning step. That pre-commitment has long since seeped into the material.
Why not everything can be made explicit
Part of my discomfort has to do with something else: some of what makes development work good can’t be cleanly translated into language upfront.
The best way I can describe it is through driving: when you’ve been driving for a while, you develop a sense for what will happen next. You see brake lights flicker three cars ahead. You ease off the gas, move into the left lane, pass, react proactively. It happens fast, almost without thinking. Coding feels similar to me. With some experience, you often sense early that a decision will cause problems three steps down the road. You adjust something before it breaks. Not because you could already articulate everything, but because you’re immersed in the thing itself.
Spec-driven frameworks try to close a gap with documents. That’s understandable. But part of that gap isn’t simply a documentation problem. It’s a problem of tacit knowledge. While building, we often know more than we can precisely say in advance. We recognize patterns. We notice something looks off before we can explain why.
Trying to press that experience fully into specs costs a lot of energy. And that’s the real loss for me: I have to switch from a mode of recognizing to a mode of explaining before I’ve even started building anything.
Why thick specs don’t create shared understanding
One thought has stuck with me since a product owner training in 2017 with Jeff Patton and Jeff Gothelf: a story is not a document. It’s a token for a conversation.
First Card, then Conversation, then Confirmation. Not the other way around.
Acceptance criteria should confirm that you understood each other. They shouldn’t replace understanding with more and more upfront detail. That’s exactly where, for me, SDD tips over during the build phase. I write a document. The agent consumes it. There’s no real conversation – just a handoff.
And that removes something stories were originally meant to provide.
I experienced exactly that in a client project last year. Within the team, I invested a lot of effort into detailed user stories (specs). No BMAD involved, and with genuinely good intentions: a solid first draft, including lots of acceptance criteria, so the team could hit the ground running instead of starting from scratch. But the team read the specs very differently. To them, it wasn’t a draft for discussion – it was already a finished handoff document. Everything’s in there already, right? So there was a lot of discussion, but not about the thing we needed to build; it was about the scope and shape of the stories.
The result: on one side, someone wondering why they’re getting only negative feedback. On the other side, a team wondering why they should still have a say if everything already seems decided.
The stories stayed that thick for quite a while. And the review load grew with them. Because from a thick story, an agent quickly produces a huge implementation plan including database schema, component structure, and so on. And at some point, nobody reads it properly anymore. That’s human. But it undermines exactly what the stories were supposed to ensure.
“Shared documents are not shared understanding”
Patton nails it for me. The story gets thicker. I feel briefly safer. But that doesn’t mean something better will be built later. The document grows, but understanding doesn’t automatically grow with it.
In the worst case, the opposite happens: I’ve produced a lot of text and convince myself I’ve thoroughly understood the problem – when what I’ve really produced is a clean handoff document.
It looks like clarity. But sometimes it’s just a well-organized illusion of safety.
BMAD has a so-called party mode where different AI experts, each with a different perspective, discuss a story. It simulates a team conversation and sometimes brings surprising insights. But it remains a simulation. And the insights remain inside the AI session. A team that later has to live with the spec and the code learns nothing from it. No shared understanding flows back into people’s heads.
Why cheap iteration changes the equation
Many SDD frameworks invest during planning as if iteration were still expensive. But with agents, it often isn’t anymore. A few quick rounds of “build it like this,” “no, more like that,” “yes, that’s it” are, in many situations, cheaper than spending hours polishing a perfect spec upfront.
Sometimes the better answer is perhaps: start doing sooner and put better safety nets in place. That doesn’t mean planning becomes pointless. It just means the equation shifts. If the next testable prototype is minutes away instead of days, some of the upfront bureaucracy loses its economic advantage.
More spec isn’t automatically the more reasonable form of risk management. Sometimes it’s more expensive thinking work – in more than one sense.
What works better for me
That’s why I didn’t end up at “ditch the framework, just vibe-code.” That would be the wrong overcorrection. For example, I find an agent’s planning mode extremely useful. I just don’t want everything that is temporarily helpful in that thinking space to end up permanently in a story file.
What I do instead – and what I no longer do:
In the thinking phase, I try to secure shared understanding in the team first, and only then capture the relevant information from that in a spec. Not the other way around. I still use a fixed structure for this – either BMAD or something homegrown. But I only write as much spec as actually creates real clarity for me or my team. The story stays short and describes what should happen for whom and how we’ll know it’s enough. Plus a few links to more context if needed. Essentially, progressive disclosure for potentially relevant additional information such as the associated epic, personas, requirements, etc.
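To make that concrete, here is a hypothetical sketch of what such a short story could look like. Everything in it – the feature, the links, the file paths – is invented for illustration, not taken from a real project:

```markdown
# Story: Create User Profile

For signed-in users who want to manage their account data.

**What should happen:** A signed-in user can create a profile with a
display name and an avatar.

**How we'll know it's enough:** The profile appears on the account page
after saving; invalid input shows a validation message.

**More context (only if needed):**
- Epic: [Account Management](../epics/account-management.md)
- Persona: [Returning User](../personas/returning-user.md)
```

The links at the bottom are the progressive-disclosure part: the epic, personas, and requirements stay one click away instead of being inlined into the story.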
In the build phase, I work in short loops close to the code. Ideally in a flow state. I let the agent plan based on the specs, but that specific plan can be ephemeral. It’s a working document, not an artifact. What I no longer do: stuffing the plan—including code examples, database fields, or file lists—into the story.
Decisions are something I deliberately separate from the spec. If architecture-relevant decisions come up during planning or implementation, I write an ADR. If it’s a product decision, it goes back into refinement or an updated story. If it’s a technical trade-off, it lands as a comment in the code or in architecture documentation. Decisions need a visible home, not a line in a 1,000-line Markdown file.
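As a sketch of what “a visible home” can mean, here is a minimal, entirely hypothetical ADR – the decision and its details are invented for illustration:

```markdown
# ADR 007: Store user avatars in object storage

Status: accepted
Date: 2025-06-12

## Context
Profile images were initially written to the application server's disk,
which breaks horizontal scaling.

## Decision
Avatars are uploaded to an S3-compatible bucket; the database stores
only the object key.

## Consequences
Local development needs a storage emulator; existing files must be
migrated once.
```

The point is not the template but the separation: the decision lives here, findable on its own, instead of on line 700 of a story file.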
I gradually shift safeguards from the document into the harness: tests, linters, architecture rules, reviews, and anything that makes recurring problems reliably, and ideally deterministically, visible to an agent. Relevant learnings from that belong in the harness-specific project memory. I like the harness idea so much because it shifts the emphasis: systematically safeguard what repeatedly goes wrong instead of spelling everything out upfront. Less upfront bureaucracy. More robust guardrails.
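As one illustration of such a harness safeguard, an architecture rule can live as a plain test instead of a paragraph in a spec. This is a minimal sketch under assumed conditions: the package layout (`app/domain`, `app/api`) and the layering rule (“domain code must not import from the API layer”) are invented for illustration, not from this post.

```python
import ast
from pathlib import Path

# Hypothetical layering rule: modules under app/domain must not import
# anything from app.api. Encoding the rule as a test makes violations
# deterministic and visible to an agent, instead of restating the rule
# in every story spec.
FORBIDDEN_PREFIX = "app.api"
DOMAIN_DIR = Path("app/domain")


def forbidden_imports(source: str) -> list[str]:
    """Return the imported names in `source` that violate the layering rule."""
    tree = ast.parse(source)
    hits: list[str] = []
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            # `import app.api.routes` style
            hits += [a.name for a in node.names if a.name.startswith(FORBIDDEN_PREFIX)]
        elif isinstance(node, ast.ImportFrom) and node.module:
            # `from app.api import routes` style
            if node.module.startswith(FORBIDDEN_PREFIX):
                hits.append(node.module)
    return hits


def test_domain_does_not_import_api():
    # Runs in CI on every change, no matter who (or what) wrote the code.
    for path in DOMAIN_DIR.rglob("*.py"):
        assert not forbidden_imports(path.read_text()), f"layering violation in {path}"
```

A rule like this only has to be written once; from then on it fires automatically whenever an agent (or a human) reintroduces the problem. Tools such as import-linter offer the same idea off the shelf.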
As much as needed, as little as possible
I don’t think spec-driven development is a dead end. BMAD has a clear place for me: thinking, sorting, slicing, and clarifying. That doesn’t just help me solo; it can also support teams—especially when this step gets shortchanged or quietly skipped.
But when it comes to implementation work, I quickly end up writing more about work than I actually do. And then the story is no longer a prompt for conversation, but a handoff document written behind closed doors.
SDD frameworks noticeably increase output in every phase. But output is not outcome. Or as Patton put it in that product ownership training:
“Minimize Output, Maximize Outcome & Impact.”
What I learned the long way around: the thicker spec was never the goal. It was a symptom that I was looking for safety in the document instead of in the process.