{"version":"https://jsonfeed.org/version/1","title":"INNOQ Blog","home_page_url":"https://www.innoq.com/","feed_url":"https://www.innoq.com/en/blog.json","icon":"https://res.cloudinary.com/innoq/image/asset/favicon-1b518c5b5495aeba34b46ed6a1764b40.svg?_a=BACMTiAE","favicon":"https://res.cloudinary.com/innoq/image/asset/favicon-1b518c5b5495aeba34b46ed6a1764b40.svg?_a=BACMTiAE","items":[{"id":"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/","url":"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/","date_published":"2026-06-08T00:00:00.000+00:00","date_modified":"2026-06-08T10:49:40.654+02:00","author":[{"name":"Daniel Westheide","url":"https://www.innoq.com/en/staff/daniel-westheide/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eIn my blog post \u003ca href=\"https://www.innoq.com/en/blog/2025/07/erst-agil-dann-agentisch/\"\u003e“First Agile, Then Agentic”\u003c/a\u003e, I argued that AI amplifies existing organisational capabilities, and that faster experimentation is one of the potential benefits for well-positioned teams. The more quickly you can implement, the more experiments you can run in the same period of time, and the more you can learn. But faster experimentation is not automatically better. In many contexts, successful product development depends on entirely different factors.\u003c/p\u003e\n\n\u003ch3 id=\"where-experimentation-culture-comes-from\"\u003eWhere experimentation culture comes from\u003c/h3\u003e\n\n\u003cp\u003eIt is worth taking a look at the environment in which the culture of experimentation actually emerged. It developed in the consumer internet companies of the 2000s and 2010s, where very specific conditions held: millions of daily users, a tight connection between product decisions and measurable outcomes, and a business model optimised for engagement and conversion. 
In that context, controlled experiments made sense. Statistical significance was achievable within days, and the metric being optimised — clicks, purchases, time on site — was closely tied to revenue. Lean Startup, growth hacking, and the DevOps movement all crystallised in this specific environment and were then packaged as universal methodology.\u003c/p\u003e\n\n\u003cp\u003eToday, experimentation culture is most at home wherever the same basic principle applies: social media, digital media offerings, free-to-play games — products whose business model is based on attention and ad revenue. In these contexts, users are not the customers. The advertisers are the customers, and the users are the product. Maximising engagement is not a proxy for user value — it is the actual goal. The tension between company outcomes and user outcomes simply does not arise.\u003c/p\u003e\n\n\u003cp\u003eMelissa Perri argues that good product work finds exactly this overlap: outcomes that create value for both the company and the users\u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fn:1\" id=\"fnref:1\" title=\"see footnote\" class=\"footnote\"\u003e[1]\u003c/a\u003e. That is demanding work, and it requires understanding both sides. It also requires that users are the customers. Experimentation culture in the ad-revenue context has elegantly sidestepped this problem.\u003c/p\u003e\n\n\u003cp\u003eThe problem is that most software projects do not exist in that environment.\u003c/p\u003e\n\n\u003cp\u003eEnterprise software and domain-specific applications have different success criteria, and a fundamentally different relationship between what can be measured and what actually matters. Perri calls the result the Build Trap: organisations that fixate on output metrics lose the thread between what they build and what users actually need. 
Features ship, velocity is high — but the software does not serve the real needs of the people who use it.\u003c/p\u003e\n\n\u003ch3 id=\"output-instead-of-outcome\"\u003eOutput instead of outcome\u003c/h3\u003e\n\n\u003cp\u003eBaldur Bjarnason describes in “Out of the Software Crisis”\u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fn:2\" id=\"fnref:2\" title=\"see footnote\" class=\"footnote\"\u003e[2]\u003c/a\u003e what this looks like in practice:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003e“We decide on the problem without checking to make sure it’s a real problem for our end users. We then design without researching the nature and structure of the problem we’re trying to address. We ship without testing to see if it actually does the job it’s supposed to. Only then do we do some actual testing, often A/B tests. We throw two half-baked unfinished designs into a functional shipping application that people rely on to do their work and use Data™ to see which unmitigated disaster is marginally less disastrous for the working lives of those held hostage by our applications.”\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eThe first sentence is the decisive one. Failing to check whether you are solving the right problem is output orientation in its purest form. You measure what is easy to measure, not what matters.\u003c/p\u003e\n\n\u003cp\u003eTeresa Torres makes the consequence clear: Continuous Discovery\u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fn:3\" id=\"fnref:3\" title=\"see footnote\" class=\"footnote\"\u003e[3]\u003c/a\u003e is not about validating arbitrary hypotheses faster. It is about developing understanding before it is even clear what is worth testing. A/B tests tell you whether a specific change moves a specific metric. 
They do not tell you whether you are moving the right metric, or whether the metric connects to anything users actually care about.\u003c/p\u003e\n\n\u003cp\u003eBjarnason, drawing on Deming\u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fn:4\" id=\"fnref:4\" title=\"see footnote\" class=\"footnote\"\u003e[4]\u003c/a\u003e, calls this tampering: reacting to symptoms as though they were causes. If the underlying assumption is wrong, A/B testing only optimises more thoroughly in the wrong direction. A/B tests are valuable when the conditions are right. Too often, however, they are used to replace genuine user understanding rather than complement it.\u003c/p\u003e\n\n\u003ch3 id=\"invisible-stakeholders\"\u003eInvisible stakeholders\u003c/h3\u003e\n\n\u003cp\u003eA/B tests at high frequency without an adequate foundation can at best land a lucky hit. So how does one arrive at genuine user understanding?\u003c/p\u003e\n\n\u003cp\u003eIn my blog post \u003ca href=\"https://www.innoq.com/en/blog/2026/03/ai-cognitive-lens-domain-knowledge/\"\u003e“Hail Mary”\u003c/a\u003e, I argued that domain knowledge is often not consciously accessible and is socially distributed — what Polanyi calls tacit knowledge. The same applies to user knowledge. Users are often unable to articulate what they really need, but they know it when they experience it. Metrics are an attempt to replace this implicit knowledge with behavioural data.\u003c/p\u003e\n\n\u003cp\u003eIn “Continuous Discovery Habits”, Torres recommends regular interviews with users and describes the pitfalls to avoid and how to ask the right questions to surface this implicit knowledge: general questions tempt people to activate their generalised self-image rather than reflect on their actual behaviour. People are poor at knowing what they typically do, but good at recalling concrete episodes. 
Instead of asking “What criteria matter to you when choosing a restaurant?”, you should say: “Tell me about the last time you went out to eat with someone.”\u003c/p\u003e\n\n\u003cp\u003eTools like BMAD do ask discovery-oriented questions — about the problem, the market environment, and the users. But they compress a process that requires time. The answers that emerge are what respondents can consciously retrieve in that moment, not what develops through observation, iteration, and incubation over time. And even if BMAD were to interview the right people, the question structure would systematically surface what users believe they do, not what they actually do. The implicit knowledge remains invisible.\u003c/p\u003e\n\n\u003cp\u003eMany organisations actively work against building this user understanding, deliberately shielding product teams and developers from the people who use their software.\u003c/p\u003e\n\n\u003cp\u003eAnd even when interviews with real users do take place, there is a further problem: software affects not only those who use it directly, but also people who never interact with the system at all. I call these people invisible stakeholders. Even when organisations do talk to users, these invisible stakeholders are almost never considered in design decisions.\u003c/p\u003e\n\n\u003cp\u003eAn example of such an invisible stakeholder would be the person who stocks shelves at a supermarket. She probably has no access to the ordering software. But when that software miscalculates an order, she works overtime. When a new version changes how stock levels are displayed and the system behaves differently from what her manager expects, the confusion lands in the store — not in the product team’s retrospective. She is not a stakeholder in the conventional sense of the word. She is simply the one who bears the consequences. 
And whether the software serves her needs well or poorly will not be measured by any A/B test.\u003c/p\u003e\n\n\u003ch3 id=\"being-there-instead-of-measuring\"\u003eBeing there instead of measuring\u003c/h3\u003e\n\n\u003cp\u003eMy fellow student Jörg Niesenhaus recently reflected \u003ca href=\"https://www.linkedin.com/feed/update/urn:li:activity:7428949660557484032/\"\u003ein a LinkedIn post\u003c/a\u003e on his first weeks at ALDI DX. As part of his onboarding, he spent two weeks working in a store like any other member of staff: stocking shelves, operating the checkout, handling edge cases like broken freezer units and attempted theft. His friends at other IT companies asked why anyone would spend more than a day doing that. His answer: one day teaches you the basic processes. Two weeks teaches you the edge cases, the informal knowledge that colleagues share with each other, and above all how much work goes into selling a single yoghurt or cucumber. After seven years, he says, he can still trace decisions back to what he learned in that store.\u003c/p\u003e\n\n\u003cp\u003eThis is not empathy. Empathy presupposes a distance. What ALDI DX built into their onboarding is more direct: knowledge that comes from being there and experiencing it yourself. You know what it means when the label printer jams because it jammed on you. You have experienced the cashier, the shelf stocker, the store manager who checks the order at six in the morning. You have felt firsthand what it means when the software does not know an edge case and you have to improvise the workaround yourself.\u003c/p\u003e\n\n\u003cp\u003eNot every company can or wants to send its IT staff to work in a store for two weeks. But the underlying principle can be applied in a less intensive form. Ethnographic field research, and its more accessible variant, contextual inquiry, does not require you to do the work yourself. Being there is enough. 
The researchers observe in the real context, ask questions while the work is happening, and see the workarounds that nobody documents and the edge cases that never appear in any interview. Jared M. Spool calls this principle \u003ca href=\"https://articles.centercentre.com/user_exposure_hours/\"\u003eexposure hours\u003c/a\u003e: systematic, regular presence with real users in real situations.\u003c/p\u003e\n\n\u003cp\u003eThe hierarchy is roughly this: doing the work yourself is the strongest form, because you are an actor and not an observer. Being there and observing is the next best option — you are in the context, you see what actually happens. Regular presence without a formal framework is easier to organise and keeps the intuition alive. After that come short interviews with users, as Torres describes. And then, a long way behind: structured interviews and A/B tests. Experimentation culture too often operates exclusively at this last level, structurally maintaining the distance between the people who build software and the people whose working lives depend on it. Users become a source of behavioural signals — not people with a job to do.\u003c/p\u003e\n\n\u003ch3 id=\"even-more-speed\"\u003eEven more speed\u003c/h3\u003e\n\n\u003cp\u003eThere is a cognitive dimension to this debate that rarely gets discussed. \u003ca href=\"https://en.wikipedia.org/wiki/ISO_9241#ISO_9241-110\"\u003eISO 9241–110\u003c/a\u003e, the international standard for interaction design principles, lists conformance to user expectations as one of seven fundamental principles. A system that runs continuous experiments on its own interface through agentically accelerated development is structurally incapable of meeting this standard. Users build mental models of software through repeated use. Every experiment that changes something resets part of that model. 
The cognitive load this creates is real, but it is generally invisible in the metrics used to evaluate whether an experiment succeeded. Added to this: the higher the experimentation frequency, the more discipline is required to clean up failed experiments — dead code paths, orphaned feature flags, UI elements that belonged to a variant that lost.\u003c/p\u003e\n\n\u003cp\u003eAgentic development makes this more urgent. If experimentation culture was already a questionable fit for most enterprise contexts, tools that generate features faster and lower the cost of running more experiments do not fix the mismatch — they amplify it. BMAD promises requirements discovery in hours, with an agent interviewing the people who have access to the tool. But the knowledge that matters most in complex domains is implicit, socially distributed, and not accessible through structured interviews. The colleague who stocks the shelves will not be in that interview. Her knowledge will not appear in the specification.\u003c/p\u003e\n\n\u003cp\u003eThe question is what you need to understand before experimentation becomes meaningful, and whether the current pressure towards speed enables that understanding or structurally prevents it. Users cannot build a stable relationship with software that changes continuously. 
Sometimes the wisest thing you can do is change nothing.\u003c/p\u003e\n\n\u003cfoot-notes class=\"footnotes\"\u003e\n\n\u003col class=\"footnotes__list\"\u003e\n\n\u003cli id=\"fn:1\"\u003e\n\u003cp\u003eMelissa Perri: Escaping the Build Trap: How Effective Product Management Creates Real Value, O’Reilly Media, 2018 \u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fnref:1\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:2\"\u003e\n\u003cp\u003eBaldur Bjarnason: Out of the Software Crisis, 2023 \u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fnref:2\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:3\"\u003e\n\u003cp\u003eTeresa Torres: Continuous Discovery Habits, Product Talk LLC, 2021 \u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fnref:3\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:4\"\u003e\n\u003cp\u003eW. 
Edwards Deming: Out of the Crisis, MIT Press, 1982 \u003ca href=\"https://www.innoq.com/en/blog/2026/06/faster-is-not-better/#fnref:4\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003c/ol\u003e\n\u003c/foot-notes\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/218rdlmhzyj37pm060ys07f4rq1x?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/06/openproject-besser-als-jira/","url":"https://www.innoq.com/en/blog/2026/06/openproject-besser-als-jira/","date_published":"2026-06-07T00:00:00.000+00:00","date_modified":"2026-06-09T15:57:05.878+02:00","author":[{"name":"Holger Kraus","url":"https://www.innoq.com/en/staff/holger-kraus/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003e\u003cem\u003eInspired by the \u003ca href=\"https://di.day/en\"\u003eDigital Independence Day\u003c/a\u003e, which calls for practical recipes for digital sovereignty, we regularly share our own approaches to topics that matter to our clients - from now on, every first Sunday of the month.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003e\u003cem\u003e\u003cstrong\u003eUpdate:\u003c/strong\u003e The OpenProject team let us know that Action Boards have actually been included in the Community Edition since version 17.3 — something I’d unfortunately overlooked. I think this is great news: it lets teams run OpenProject in their everyday agile workflow and thoroughly evaluate it in real-world production over a longer stretch of time.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eI spent several weeks thoroughly testing OpenProject. To do that, I designed example scenarios that let me demonstrate the software’s capabilities. 
Following the approach of my colleague’s Jira article, I also cover how to use OpenProject alongside an AI model.\u003c/p\u003e\n\n\u003cp\u003eThis article turned out long. I completely understand if you don’t want to read it from start to finish. But since not every section is equally relevant to everyone, I didn’t want to cut anything. The section you’d skip might be exactly the one you needed. Feel free to jump to whatever interests you most. At the end, I summarize my key takeaways in the conclusion. If you do read the whole thing: thank you for your time and attention.\u003c/p\u003e\n\n\u003cp\u003eAs part of our Digital Independence Day article series, I set out to find a data-sovereign alternative to Jira. This article doesn’t compare OpenProject and Jira feature by feature. Instead, it answers a practical question:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eIf I want to replace Jira today - is OpenProject a viable alternative?\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eGetting to know OpenProject, I quickly realized the software doesn’t just hold its own. I started to suspect it might actually be better than Jira in some areas. But first, let me explain the criteria I used to evaluate it.\u003c/p\u003e\n\n\u003ch3 id=\"different-perspectives-on-jira\"\u003eDifferent Perspectives on Jira\u003c/h3\u003e\n\n\u003cp\u003eJira means different things to different user groups, each with their own goals. I focused on two perspectives:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eThe perspective of an agile development team: developers, product owners, Scrum masters, and so on.\u003c/li\u003e\n\u003cli\u003eThe perspective of a project manager, representing the organizational view.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch4 id=\"the-agile-development-team-s-perspective\"\u003eThe Agile Development Team’s Perspective\u003c/h4\u003e\n\n\u003cp\u003eI’ve used Jira extensively across many client projects as a developer on agile software teams. 
Jira shaped my expectations of what project management software should do — something I only noticed when I had to use something else. From an agile standpoint, the following Jira features matter most to me:\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eA planning mode:\u003c/strong\u003e When my team works with Scrum, I need a clear distinction between tickets in the current sprint and those still in the backlog. Tools that don’t make this distinction turn the backlog into just another board column, cluttered with too many tickets to navigate. Jira cleanly separates the backlog from the active sprint: during planning, you can pull tickets from the backlog into the sprint. I expect any viable Jira alternative to offer this same planning mode.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eWork item types at different hierarchy levels:\u003c/strong\u003e Some tasks are too large for a single sprint and need to be broken into subtasks worked on separately. In Jira, this typically happens through epics — a way to group related user stories under a common theme. You only realize how important this is once you no longer have it.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eSupport for multiple agile approaches:\u003c/strong\u003e I don’t want my project management tool to dictate which agile methodology I can use. At minimum, I expect Scrum and Kanban support. Scrum support means the planning mode clearly separates backlog from sprint. Kanban support means I can define WIP limits per column, with visual feedback when I exceed the limit on any given column.\u003c/p\u003e\n\n\u003ch4 id=\"criteria-from-the-organizational-project-management-perspective\"\u003eCriteria from the Organizational / Project Management Perspective\u003c/h4\u003e\n\n\u003cp\u003eI’ve primarily experienced Jira from a developer’s perspective. But client work has shown me that Jira has features I never needed day-to-day that clearly matter to organizations. 
I’ve evaluated Jira alternatives through that organizational lens as well.\u003c/p\u003e\n\n\u003cp\u003eThe needs I identified:\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMulti-level project hierarchy:\u003c/strong\u003e Strategic initiatives at the top-management level often span multiple business units, which means multiple parts of the organization contribute to shared goals. This practically requires a hierarchy: there needs to be a coordination layer where activities from different teams and projects can be tracked and steered. The actual work happens in specialized projects that ideally know as little about each other as possible. Ideally, the project management tool reflects this hierarchy, letting me define top-level tasks, break them into subtasks, and assign them to sub-projects.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eA clear view of overall project status:\u003c/strong\u003e Even when work is distributed across multiple sub-projects, project managers need an easy way to see the overall picture. At this level, the goal is to understand project status, not to micromanage execution in a waterfall fashion.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eLegal compliance and digital sovereignty:\u003c/strong\u003e This is what motivated the article. As a company, I want to stand on solid legal footing. In Germany, that means full GDPR compliance. Strategically, I also don’t want to become overly dependent on a single vendor — I want to be able to assess whether adopting a new project management tool leads me into a dead end I can’t easily escape.\u003c/p\u003e\n\n\u003cp\u003eThese were the criteria I thought through before evaluating any tools. Some additional criteria emerged as I got to know specific products, particularly on the organizational side. 
The developer perspective was shaped primarily by my own hands-on experience.\u003c/p\u003e\n\n\u003ch3 id=\"why-i-focused-on-openproject\"\u003eWhy I Focused on OpenProject\u003c/h3\u003e\n\n\u003cp\u003eI started my search for a Jira alternative with online research. I looked at the options and compared them against my criteria. OpenProject caught my attention immediately, so I focused my investigation there and only briefly looked at other solutions. If you want more detail on the other candidates, see the appendix at the end of this article.\u003c/p\u003e\n\n\u003cp\u003eThree things stand out when you visit the OpenProject website:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eData sovereignty and data security\u003c/li\u003e\n\u003cli\u003eSupport for classical, agile, and hybrid project management\u003c/li\u003e\n\u003cli\u003eStrong references: Siemens, Deutsche Bahn, Fraunhofer, Greenpeace, The Linux Foundation, AMG, and other well-known customers\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch4 id=\"data-sovereignty\"\u003eData Sovereignty\u003c/h4\u003e\n\n\u003cp\u003eOpenProject is open source. You can self-host it or opt for a managed cloud version. What makes the cloud option especially attractive: you choose your cloud provider. Options include AWS Europe and Scaleway, a French provider. The pricing is also compelling. Cloud and self-hosted licenses cost the same, so choosing the cloud saves administration overhead without costing more.\u003c/p\u003e\n\n\u003cp\u003eI won’t compare the two deployment options in depth here. The key point is that OpenProject gives you options. That strengthens your data sovereignty: you decide how much control you retain over operations and data.\u003c/p\u003e\n\n\u003ch4 id=\"classical-agile-and-hybrid-project-management\"\u003eClassical, Agile, and Hybrid Project Management\u003c/h4\u003e\n\n\u003cp\u003eOpenProject bridges the development team and organizational perspectives. 
Teams pick their preferred agile methodology and work independently within it. Management can simultaneously plan the project framework using a classical waterfall approach while keeping an overview of the overall project. We’ll dig into this in detail later.\u003c/p\u003e\n\n\u003ch4 id=\"notable-reference-customers-and-stakeholders\"\u003eNotable Reference Customers and Stakeholders\u003c/h4\u003e\n\n\u003cp\u003eThe reference customers show that even very large organizations have chosen OpenProject. Beyond that, OpenProject is a module of openDesk, a sovereign office and collaboration suite for public administration. openDesk is assembled and maintained by ZenDiS (the Center for Digital Sovereignty of Public Administration), which is wholly owned by the Federal Republic of Germany.\u003c/p\u003e\n\n\u003cp\u003eThe references and the ZenDiS connection signal that many influential stakeholders have a long-term interest in keeping OpenProject viable. That makes a bet on OpenProject comparatively safe. A full customer list is available \u003ca href=\"https://www.openproject.org/customers/\"\u003ehere\u003c/a\u003e.\u003c/p\u003e\n\n\u003ch3 id=\"openproject-in-detail\"\u003eOpenProject in Detail\u003c/h3\u003e\n\n\u003cp\u003eNow let’s take a closer look at OpenProject.\u003c/p\u003e\n\n\u003ch4 id=\"licensing-model\"\u003eLicensing Model\u003c/h4\u003e\n\n\u003cp\u003eAnyone who wants to evaluate OpenProject can do so through a 14-day free trial that includes all enterprise features. 
The enterprise plans are:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eBasic\u003c/li\u003e\n\u003cli\u003eProfessional\u003c/li\u003e\n\u003cli\u003ePremium\u003c/li\u003e\n\u003cli\u003eCorporate\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eThey differ in three ways:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003efeatures included\u003c/li\u003e\n\u003cli\u003eminimum number of users\u003c/li\u003e\n\u003cli\u003esupport level\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eThe Community Edition is licensed under the GNU General Public License v3, keeping OpenProject permanently open source.\u003c/p\u003e\n\n\u003ch4 id=\"key-enterprise-features\"\u003eKey Enterprise Features\u003c/h4\u003e\n\n\u003cp\u003eEnterprise features are unlocked via an enterprise token, which administrators receive when they purchase a license. OpenProject validates the token cryptographically and then activates the purchased features.\u003c/p\u003e\n\n\u003cp\u003eThe Community Edition gets you surprisingly far unless you need robust agile tooling. In practice, most teams want Action Boards, which are included in every enterprise tier. Single sign-on is another common requirement; OpenID Connect and SAML/SCIM support are available from the Professional tier upward.\u003c/p\u003e\n\n\u003cp\u003eThe pricing page on the OpenProject website isn’t particularly clear, though. It’s hard to tell which tier includes which features, and Action Boards are missing from the comparison table entirely. The in-product experience handles this better: when you’re on the Community Edition, unavailable features clearly tell you which tier unlocks them.\u003c/p\u003e\n\n\u003cp\u003eFor professional use, an enterprise license is almost always necessary.\u003c/p\u003e\n\n\u003ch4 id=\"cloud-vs-on-premises\"\u003eCloud vs. On-Premises\u003c/h4\u003e\n\n\u003cp\u003eCloud and on-premises tiers offer the same features at the same price. 
The main difference is the minimum user count: the cloud Basic tier starts at five users; on-premises requires at least 25 user licenses. OpenProject also offers a paid add-on for construction project management, which I haven’t covered here.\u003c/p\u003e\n\n\u003ch4 id=\"no-marketplace-apps\"\u003eNo Marketplace Apps\u003c/h4\u003e\n\n\u003cp\u003eOne thing I appreciate: all features are part of the core product. In Jira, additional capabilities come via Marketplace apps, adding cost and frequently raising data privacy concerns, since third-party vendors gain access to your Jira data. OpenProject’s all-in-one approach means more predictable long-term license costs.\u003c/p\u003e\n\n\u003ch4 id=\"cost-comparison\"\u003eCost Comparison\u003c/h4\u003e\n\n\u003cp\u003eComparing cloud pricing, Jira Premium costs $14.54 per user per month (roughly €13.60), while OpenProject Professional comes in at €10.95 per user per month. The gap widens significantly for Jira Data Center, which has a 500-user minimum. For this article, I’m assuming around 100 users, which is the typical project size in my experience. Under these conditions, Jira is noticeably more expensive than OpenProject: at the per-user prices quoted above, that works out to roughly €1,360 versus €1,095 per month. You can calculate licensing costs for your own organization \u003ca href=\"https://www.openproject.org/pricing/\"\u003ehere\u003c/a\u003e.\u003c/p\u003e\n\n\u003ch3 id=\"hybrid-project-management-in-openproject\"\u003eHybrid Project Management in OpenProject\u003c/h3\u003e\n\n\u003cp\u003eI didn’t encounter hybrid project management until I started working with OpenProject, so let me briefly explain the concept.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eWhat is hybrid project management?\u003c/strong\u003e It rests on two pillars:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eagile\u003c/li\u003e\n\u003cli\u003eclassical, i.e. 
waterfall\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eIt draws a clear line between the strategic and operational levels of project management.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eStrategic-level planning:\u003c/strong\u003e High-level framing happens classically. Senior management defines what should be delivered within a large, cross-team project. Rough work packages are defined along with their deadlines, setting clear expectations that teams can orient themselves around.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eOperational team-level planning:\u003c/strong\u003e Teams receive the high-level work packages from the strategic layer and break them down into actionable tasks. From there, they plan execution agilely, deciding themselves what to tackle in which sprint. Since they understand management’s expectations, they can factor those into their own planning.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eTracking goals at the team level:\u003c/strong\u003e Teams set their own sprint goals during sprint planning and keep them in view through their daily standups and regular work.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eTracking goals at the strategic level:\u003c/strong\u003e Because management has set deadlines based on high-level work packages, it wants to stay informed about progress against those packages. In the worst case, teams have to write status reports. Ideally, the project management tool maps team-level activity back to the strategic work packages automatically — no extra effort required from individual teams.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eHow does OpenProject support hybrid project management?\u003c/strong\u003e I’ll walk you through setting up hybrid project management in OpenProject step by step, using typical scenarios. 
Your own requirements may differ, but you’ll see whether and how you could adapt them.\u003c/p\u003e\n\n\u003ch3 id=\"openproject-handles-project-and-work-package-hierarchies-with-remarkable-flexibility\"\u003eOpenProject Handles Project and Work Package Hierarchies with Remarkable Flexibility\u003c/h3\u003e\n\n\u003cp\u003eIn theory, you can nest projects and work packages as deeply as you like in OpenProject. In practice, you should think through what structure makes sense for your context. For my example, I use a parent project with two sub-projects: simple enough to stay clear, but rich enough to illustrate the key features. The top-level project represents the strategic planning layer; the two sub-projects represent operational team-level work.\u003c/p\u003e\n\n\u003cp\u003eSub-projects can be further subdivided and extended with additional levels as needed.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"UI screenshot of the parent project with overview tabs, description, and subitems\" style=\"width: 50%; height: auto;\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 2048w, 
https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/krcylnj2sgdzsbacrpyaj2kgpkzl?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eA parent project with two sub-projects\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"planning-an-initiative\"\u003ePlanning an Initiative\u003c/h4\u003e\n\n\u003cp\u003eIn my example, I create a cross-project initiative at the strategic planning level, then create two epics for the sub-projects, and finally add a milestone to keep a key overall project deadline in view.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Work package list with an initiative, a task, and two epics; status new, priority normal.\" 
loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/wkcyvk0lon1ehx36boihlhp0k7ir?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eA cross-team initiative broken down into two team-specific epics\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003eI then assign the epics to the respective sub-projects. 
The screenshot below shows the work package view in Sub-project 1, where the epic has already been broken into five user stories.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Work item list showing an initiative, an epic, and stories, with type and status columns\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/le7qnsudx19cky39pw212qu5ttlw?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eAn epic in a sub-project, broken down into individual stories\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"executing-work-in-project-teams\"\u003eExecuting Work in Project Teams\u003c/h4\u003e\n\n\u003cp\u003eInside a sub-project, you can hide the parent project context and focus on your team’s 
scope. The screenshot below shows the board for the Sub-project 1 team. I’ll explain how OpenProject supports agile work in more detail below.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"OpenProject team board with the status columns New, In progress, Ready for review, and Developed, showing user stories.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/r7epjnhg7w1iparkbz5pmuerszg2?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eA sprint board in Sub-project 1\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"project-level-overview\"\u003eProject-Level Overview\u003c/h4\u003e\n\n\u003cp\u003eAt the top-level project, you can display a Gantt chart spanning the entire project. 
The view is also available in sub-projects, but since strategic control happens at the top level, I’ll focus on that view here.\u003c/p\u003e\n\n\u003cp\u003eFor the Gantt chart to be meaningful, teams in sub-projects need to keep their work packages well-maintained, particularly predecessor/successor relationships and start and end dates at the lowest level.\u003c/p\u003e\n\n\u003cp\u003eIf you set work packages at higher hierarchy levels to “Automatic Scheduling,” OpenProject propagates changes from lower levels upward automatically. When a date shifts at the bottom, the parent date updates as well.\u003c/p\u003e\n\n\u003cp\u003eIn my example, the milestone is defined as a successor of the epics. When an epic’s end date changes, OpenProject automatically moves the milestone along with it.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Gantt chart of project work items (Summary Task, Epic, User Story) across 2026–2027, ending with the milestone Product Launch\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 2048w, 
https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/kbfjt9p5pojcvywsngwhp60twfe5?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eThe cross-project Gantt chart: the milestone shifts automatically as epics move\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"my-take-on-hybrid-project-management-in-openproject\"\u003eMy Take on Hybrid Project Management in OpenProject\u003c/h4\u003e\n\n\u003cp\u003eI’ve never personally worked on a project that was visibly managed in a hybrid way. What I did experience were teams working agilely while management imposed fixed deadlines. In those situations, a project setup like the one I’ve described here would have added real transparency and mutual understanding.\u003c/p\u003e\n\n\u003cp\u003eI can only evaluate what I experienced building my example scenario. What impressed me most is how flexibly OpenProject lets you define hierarchies for projects and work packages.\u003c/p\u003e\n\n\u003cp\u003eThat said, generating meaningful Gantt charts at the team level quickly pulls you away from true agile practice. Work package relationships must mirror reality, and start and end dates need to be maintained. One feature I missed: automatically inheriting start and end dates from the sprint a work package belongs to. 
It would also help to make active sprints directly visible in the Gantt chart.\u003c/p\u003e\n\n\u003cp\u003eOn balance, I find the approach convincing: teams can work agilely while project management keeps an overview, with no status reports needed.\u003c/p\u003e\n\n\u003ch3 id=\"agile-project-management-in-openproject\"\u003eAgile Project Management in OpenProject\u003c/h3\u003e\n\n\u003cp\u003eAgile development teams care about one question above all: how does OpenProject support their day-to-day work?\u003c/p\u003e\n\n\u003cp\u003eI focus on Scrum and Kanban, the same two frameworks Jira centers on. I’ll start with Scrum. Kanban comes up along the way, though in my view OpenProject has built out more features for Scrum than for Kanban.\u003c/p\u003e\n\n\u003ch4 id=\"how-openproject-supports-sprint-planning\"\u003eHow OpenProject Supports Sprint Planning\u003c/h4\u003e\n\n\u003cp\u003eWork items are created in the Work Packages module, where teams describe tasks, break them into subtasks, and manage dependencies between work packages.\u003c/p\u003e\n\n\u003cp\u003eThe actual planning happens in the Backlogs module. OpenProject has significantly expanded this area in recent versions. It used to manage mostly generic version assignments; today, OpenProject understands sprints as properly defined containers for tickets. Teams assign start and end dates to each sprint. Once a sprint kicks off, OpenProject automatically generates a board with all tickets for that sprint and displays a live burndown chart.\u003c/p\u003e\n\n\u003cp\u003eThe screenshot below shows OpenProject’s planning mode. On the left, all open tickets not yet assigned to a sprint sit in the backlog. On the right, the created sprints appear, with story point totals shown in the header.\u003c/p\u003e\n\n\u003cp\u003eWhat I especially like is that the backlog and sprints are shown side by side, each scrolling independently. 
That gives a much clearer view than Jira, which stacks both areas vertically and makes it easy to lose your place while scrolling.\u003c/p\u003e\n\n\u003cp\u003eCreating new work packages on the fly during planning is also possible, though a bit buried. It lives in the three-dot menu in the sprint header.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Backlog and sprints board showing items with user stories and story points.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/64qedrd7s8bkwpaz6d1x7ruw3see?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003ePlanning mode: backlog and sprint side by side\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"types-of-boards\"\u003eTypes of 
Boards\u003c/h4\u003e\n\n\u003cp\u003eOnce planning is done, the question becomes how OpenProject supports team collaboration. There are several board types worth knowing about.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eThere are two fundamental board types:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eBasic Boards\u003c/strong\u003e are part of the Community Edition. They’re functional, but limited: you have to populate them manually. Every ticket that should appear on the board must be added individually. Tickets don’t appear automatically just because they belong to a sprint. Likewise, moving a ticket to another column doesn’t update any of its fields. Dragging a ticket to an “In Progress” column doesn’t change its status. You’d still have to update the status separately by hand.\u003c/p\u003e\u003c/li\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eAction Boards\u003c/strong\u003e behave the way most Jira users would expect: tickets appear on the board automatically based on sprint membership and board filters, and moving a ticket updates the corresponding fields. I was initially disappointed to find that Action Boards require an enterprise license. But I’ve come around on this: they’re a compelling reason to invest in an enterprise tier, and that investment is precisely what ensures OpenProject has a long-term future and remains a safe bet for its customers.\u003c/p\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003e\u003cstrong\u003eTypes of Action Boards:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp\u003eOpenProject offers a range of Action Boards that address common scenarios directly and clearly — things that used to require clicking through many different pages in Jira. 
I’m not sure whether Jira has comparable solutions.\u003c/p\u003e\n\n\u003cp\u003eHere’s a quick overview of the Action Boards OpenProject offers:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eThe \u003cstrong\u003eKanban Board\u003c/strong\u003e is the equivalent of Jira’s classic board, with columns organized by status. It’s well-suited for day-to-day agile software development and serves as the primary board even in Scrum teams.\nOne notable gap: there are no WIP limits. Limiting work in progress is a core principle of Kanban, and without it, teams can lose track quickly and bottlenecks go unnoticed longer.\u003c/li\u003e\n\u003cli\u003eThe \u003cstrong\u003eAssignee Board\u003c/strong\u003e organizes tickets by team member. It gives an instant read on workload: who’s overloaded, who has capacity?\u003c/li\u003e\n\u003cli\u003eThe \u003cstrong\u003eVersion Board\u003c/strong\u003e has one column per product version. It helps product owners build release plans and assign features to target software releases.\u003c/li\u003e\n\u003cli\u003eThe \u003cstrong\u003eSubproject Board\u003c/strong\u003e is similar to the Assignee Board, but columns are organized by sub-project rather than person. It answers the question: which team is working on what? Since sub-projects have a functional focus and tickets can’t be freely reassigned between them, this board is less about capacity management and more about visibility.\u003c/li\u003e\n\u003cli\u003eThe \u003cstrong\u003eParent-Child Board\u003c/strong\u003e reflects parent work packages as columns. 
It helps create a work breakdown structure (WBS): you can quickly see whether subtasks are assigned to the right packages and rearrange as needed.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch4 id=\"does-openproject-support-scaled-agile-frameworks\"\u003eDoes OpenProject Support Scaled Agile Frameworks?\u003c/h4\u003e\n\n\u003cp\u003eI’m not deeply familiar with scaled agile frameworks, so I won’t claim to offer an authoritative assessment here. That said, OpenProject’s ability to nest projects within each other provides a lot of flexibility for representing complex organizational structures. Two features I haven’t mentioned yet are also relevant in this context.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eShared sprints:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp\u003eTo synchronize sprints across multiple projects, you can create a shared sprint at the parent project level. It becomes visible in all sub-projects it’s shared with. Only the time window is shared; each team still decides what goes into its own sprint. The shared sprint is started and stopped at the parent project level.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eRoadmaps and versions:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp\u003eSince OpenProject introduced proper sprint containers, version management has become a little harder to find: it’s tucked away in each project’s settings. Once you’ve set up a first version, the Roadmap module becomes available. In my example, I created versions at the parent project level and shared them with the sub-projects. In SAFe, for instance, a version could represent a Program Increment. Versions can be assigned start and end dates.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Roadmap view of the parent project listing the releases MLP, MMP, and MVP with their progress.\" 
loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/ba9ojw6ajqnlmykll06yjuy1i8yj?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eA roadmap overview\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"my-take-on-agile-project-management-in-openproject\"\u003eMy Take on Agile Project Management in OpenProject\u003c/h4\u003e\n\n\u003cp\u003eOverall, I think agile project management in OpenProject is solid. My initial disappointment, that the Basic Board barely gets you anywhere, faded once I understood the licensing model. Anyone planning to use OpenProject productively needs at least the Basic Enterprise license. 
With Action Boards unlocked, meaningful agile work becomes possible.\u003c/p\u003e\n\n\u003cp\u003eThe UI doesn’t feel particularly modern to me, not because of any individual feature, but because of the overall impression.\u003c/p\u003e\n\n\u003cp\u003eThe planning mode, on the other hand, is a genuine strength: backlog and sprint sit side by side, each independently scrollable, making ticket management easier than in Jira.\u003c/p\u003e\n\n\u003cp\u003eMy real gripe is the missing WIP limits per column. Without them, OpenProject’s Kanban support falls short in my view. Scrum-focused teams probably won’t care, but here Jira still has the edge for me.\u003c/p\u003e\n\n\u003cp\u003eOn scaled agile frameworks, I can’t offer a solid judgment. I don’t know Jira’s relevant features well enough to compare them fairly. What I have shown is what OpenProject can do. I hope that’s enough for you to form your own view.\u003c/p\u003e\n\n\u003ch3 id=\"classical-project-management-in-openproject\"\u003eClassical Project Management in OpenProject\u003c/h3\u003e\n\n\u003cp\u003eI haven’t looked at OpenProject specifically through a classical project management lens. Waterfall-style projects have long since disappeared from my day-to-day work. Even so, several classical PM features have already come up:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eWork packages can be ordered in a logical sequence.\u003c/li\u003e\n\u003cli\u003eWork packages can have start and end dates.\u003c/li\u003e\n\u003cli\u003eMilestones can be created independently and placed in sequence.\u003c/li\u003e\n\u003cli\u003eProjects and work packages can be hierarchically nested.\u003c/li\u003e\n\u003cli\u003eCross-project Gantt charts are available.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eA few additional features fall into the classical PM category, though they’re useful in agile contexts too. 
I haven’t tested these in depth — the following descriptions are based on the official documentation.\u003c/p\u003e\n\n\u003ch4 id=\"budget-planning-and-tracking\"\u003eBudget Planning and Tracking\u003c/h4\u003e\n\n\u003cp\u003eOpenProject supports project budgets that cover both planned labor costs and material costs such as supplies or travel. The system calculates labor costs based on users' configured hourly rates. Teams can define custom cost types with their own units and rates for material expenses.\u003c/p\u003e\n\n\u003ch4 id=\"time-tracking-and-cost-control\"\u003eTime Tracking and Cost Control\u003c/h4\u003e\n\n\u003cp\u003eWork packages can be assigned to a budget. Time and material costs booked against those work packages are automatically applied to the budget. Budget overviews and project widgets provide a live planned-vs-actual comparison and remaining budget at any time.\u003c/p\u003e\n\n\u003ch4 id=\"change-tracking-with-baseline\"\u003eChange Tracking with Baseline\u003c/h4\u003e\n\n\u003cp\u003eOpenProject doesn’t offer a classical baseline function like Microsoft Project — there’s no frozen project plan snapshot. Instead, the baseline feature compares table views of work packages against an earlier state, such as yesterday, last week, or a specific date.\u003c/p\u003e\n\n\u003cp\u003eThe system highlights changed, added, and removed work packages, displaying old and new values side by side. A Gantt chart representation isn’t available yet. Except for comparisons with the previous day, the baseline feature requires an enterprise license.\u003c/p\u003e\n\n\u003ch3 id=\"teaching-an-ai-to-talk-to-openproject\"\u003eTeaching an AI to Talk to OpenProject\u003c/h3\u003e\n\n\u003cp\u003eUsing AI with OpenProject works the same way as in the Jira scenario: through an MCP server (Model Context Protocol). 
Several options are available.\u003c/p\u003e\n\n\u003ch4 id=\"openproject-s-built-in-mcp-server\"\u003eOpenProject’s Built-In MCP Server\u003c/h4\u003e\n\n\u003cp\u003eWith an enterprise license, you can use the MCP server built directly into OpenProject. At present it supports only read operations; writes are not available yet. That’s enough for scenarios like asking a coding agent to implement a specific ticket. In my tests with Claude Code, this combination worked seamlessly.\u003c/p\u003e\n\n\u003ch4 id=\"a-wide-range-of-open-source-mcp-servers\"\u003eA Wide Range of Open-Source MCP Servers\u003c/h4\u003e\n\n\u003cp\u003eIf you want to use AI to automatically create work packages from meeting notes, you need an MCP server that also supports write operations. A quick search turns up many solid options. There are fundamentally two ways clients can communicate with MCP servers:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eStandard I/O (stdio):\u003c/strong\u003e The client starts the MCP server locally as a subprocess and communicates with it over the process’s standard input and output streams.\u003c/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eNetwork server / HTTP:\u003c/strong\u003e The MCP server runs remotely on the network, and clients communicate with it via HTTP.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eThe \u003cstrong\u003estdio\u003c/strong\u003e approach has the advantage that the MCP server runs with the user’s personal API token, meaning it has exactly the same privileges as that user and cannot exceed them. The downside: everyone needs to install their own local instance, which can be a problem in some environments.\u003c/p\u003e\n\n\u003cp\u003eTo avoid that downside, I’d recommend the open-source MCP server \u003ca href=\"https://github.com/tmskln/spring-openproject-mcp-server\"\u003etmskln/spring-openproject-mcp-server\u003c/a\u003e. 
It forwards the user’s API token to the OpenProject API, ensuring that user privilege boundaries are respected, while still allowing a single central server instance for everyone. For a concrete walkthrough of using OpenProject with a local AI model in LM Studio, refer to my colleague Nicolas Inden’s article. The setup transfers directly to OpenProject.\u003c/p\u003e\n\n\u003ch3 id=\"more-useful-openproject-features\"\u003eMore Useful OpenProject Features\u003c/h3\u003e\n\n\u003ch4 id=\"wiki-and-documents\"\u003eWiki and Documents\u003c/h4\u003e\n\n\u003cp\u003eOpenProject includes a built-in wiki. It doesn’t replace Confluence: each wiki is scoped to a specific project, so it’s best suited for project-specific documentation. Somewhat confusingly, OpenProject also has a separate Documents module, which lets you create items of different document types: Proposal, Idea, Specification, Documentation, and so on.\u003c/p\u003e\n\n\u003cp\u003eI’d use the wiki when:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003emultiple people are collaborating on content,\u003c/li\u003e\n\u003cli\u003ethe documentation has a hierarchical structure,\u003c/li\u003e\n\u003cli\u003epages cross-reference each other,\u003c/li\u003e\n\u003cli\u003eyou prefer writing in Markdown.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eEvery wiki page is stored internally as Markdown, and you can toggle between Markdown and WYSIWYG editing at any time. The wiki runs on CKEditor 5; the Documents module uses BlockNote, which currently lacks a Markdown mode.\u003c/p\u003e\n\n\u003cp\u003eOnce a wiki page is finished, you can technically move it into the Documents module, but OpenProject doesn’t offer a graceful way to do it. Your options are exporting the wiki page as a PDF and importing it as an attachment, or copying and pasting the content.\u003c/p\u003e\n\n\u003cp\u003eSince OpenProject 17.0, documents support real-time collaboration, a feature I would have expected in the wiki. 
The likely reason is that BlockNote’s technical foundation enables it. Conceptually, though, it blurs the line between wiki and documents further. It may signal that OpenProject intends the Documents module to eventually replace the wiki, but that’s speculation on my part. What’s clear is that the two currently overlap in ways that make their respective roles hard to distinguish.\u003c/p\u003e\n\n\u003ch4 id=\"meeting-management\"\u003eMeeting Management\u003c/h4\u003e\n\n\u003cp\u003eThe Meetings module is particularly useful when you want to tie meetings directly to work packages. You can build an agenda from work packages or from standalone agenda items.\u003c/p\u003e\n\n\u003cp\u003eWhen you use work packages as agenda items, results are captured directly in those packages and progress can be tracked afterward. With standalone agenda items, notes are saved in the item itself — but they’re hard to find later, since you’d need to reopen the meeting to see them.\u003c/p\u003e\n\n\u003cp\u003eThe more practical approach is to create a new work package directly from an agenda item, or update an existing one on the spot.\u003c/p\u003e\n\n\u003cp\u003eThe invitation feature is also handy: you can send meeting invitations directly to all attendees from within the module, saving several manual steps.\u003c/p\u003e\n\n\u003ch4 id=\"gitlab-and-github-integration\"\u003eGitLab and GitHub Integration\u003c/h4\u003e\n\n\u003cp\u003eOpenProject can display GitLab or GitHub activity in the activity tab of the linked work package.\u003c/p\u003e\n\n\u003cp\u003eI only tested the GitLab integration and it worked reliably.\u003c/p\u003e\n\n\u003ch4 id=\"nextcloud-integration\"\u003eNextcloud Integration\u003c/h4\u003e\n\n\u003cp\u003eOpenProject integrates closely with Nextcloud. I didn’t test it myself, but it’s \u003ca href=\"https://www.openproject.org/integrations/nextcloud/\"\u003ewell documented\u003c/a\u003e on the OpenProject website. 
The integration works in both directions: you can link work packages to files in Nextcloud and view a work package overview from within Nextcloud.\u003c/p\u003e\n\n\u003ch4 id=\"creating-work-packages-via-email\"\u003eCreating Work Packages via Email\u003c/h4\u003e\n\n\u003cp\u003eWork packages can be created by email. You can also reply to work package notifications, and OpenProject automatically adds your reply as a comment on the relevant work package — a real convenience for communication.\u003c/p\u003e\n\n\u003ch4 id=\"jira-migrator\"\u003eJira Migrator\u003c/h4\u003e\n\n\u003cp\u003eThe OpenProject team provides a dedicated migration tool for bringing existing Jira projects into OpenProject: \u003cstrong\u003ethe Jira Migrator\u003c/strong\u003e. I didn’t get a chance to test it, but full documentation of its features is available \u003ca href=\"https://www.openproject.org/docs/installation-and-operations/jira-migration/\"\u003ehere\u003c/a\u003e.\u003c/p\u003e\n\n\u003cinfo-box title=\"Appendix: What Other Candidates Did I Look At?\"\u003e\u003cdetails\u003e\u003csummary\u003e\u003cdiv class=\"infobox__teaser\"\u003e\n\u003cdiv class=\"infobox__teaser__left\"\u003e\n\u003cspan class=\"infobox__teaser__heading\"\u003eAppendix: What Other Candidates Did I Look At?\u003c/span\u003e\u003cspan class=\"icon icon-info\"\u003e\u003c/span\u003e\n\u003c/div\u003e\n\u003cdiv class=\"infobox__teaser__right\"\u003e\u003cspan class=\"icon icon-chevron-down icon--small\"\u003e\u003c/span\u003e\u003c/div\u003e\n\u003c/div\u003e\u003c/summary\u003e\u003cdiv class=\"infobox__content\"\u003e\r\n\n\u003cp\u003eAt the start of my search for data-sovereign Jira alternatives, I surveyed the landscape through online research and conversations with my preferred AI model. 
Here’s a brief look at the candidates I considered and ruled out.\u003c/p\u003e\n\n\u003ch3 id=\"gitlab\"\u003eGitLab\u003c/h3\u003e\n\n\u003cp\u003eWe already run an internal GitLab instance, and GitLab does support issue tracking, so it was a natural first candidate. What ruled it out: epics are a GitLab Premium feature, currently priced at $29 per user per month. That’s considerably more than competing tools, so GitLab dropped off my list.\u003c/p\u003e\n\n\u003ch3 id=\"taiga\"\u003eTaiga\u003c/h3\u003e\n\n\u003cp\u003eTaiga is an open-source project focused on agile project management. The look and feel didn’t appeal to me, and one specific behavior threw me off: you can’t move user stories on the Scrum board. Stories stay in the left column; you move subtasks instead. For a small project where you want something license-free you can self-host, Taiga is worth a closer look — there’s also a paid cloud option. That said, Taiga assumes a flat project hierarchy, which is another point where it diverged from my requirements.\u003c/p\u003e\n\n\u003ch3 id=\"plane\"\u003ePlane\u003c/h3\u003e\n\n\u003cp\u003ePlane is also open source and self-hostable, with cloud options available. It appealed to me strongly on UX grounds — everything feels clean, organized, and easy to navigate; I could find my way around quickly. For a self-hosted agile project management tool, Plane would be my first choice. As a Jira replacement, I set it aside because it’s still a relatively young project — only three years old, compared to OpenProject’s origins in 2012. Although Plane supports project hierarchies too, OpenProject felt like the more mature product with better long-term investment security. 
That said, Plane is definitely worth a second look if OpenProject doesn’t suit you.\u003c/p\u003e\n\n\u003c/div\u003e\u003c/details\u003e\u003c/info-box\u003e\r\n\n\u003ch3 id=\"summary-and-conclusion\"\u003eSummary and Conclusion\u003c/h3\u003e\n\n\u003cp\u003eSo, is OpenProject better than Jira? The question is a bit of a trap. Both tools solve similar problems in different ways. What matters is what your organization values. Even after spending intensive time with OpenProject, I keep discovering new ways to model project scenarios.\u003c/p\u003e\n\n\u003cp\u003eAfter a month of serious use, my verdict is clear: \u003cstrong\u003eOpenProject is a genuine Jira alternative — especially where data sovereignty, predictable costs, and European infrastructure matter most.\u003c/strong\u003e\u003c/p\u003e\n\n\u003ch4 id=\"where-openproject-has-the-edge\"\u003eWhere OpenProject Has the Edge\u003c/h4\u003e\n\n\u003cp\u003e\u003cstrong\u003eData sovereignty:\u003c/strong\u003e OpenProject is open source under GPLv3. You self-host it or use an EU cloud via Scaleway. There’s no marketplace with uncontrolled third-party data flows.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eHybrid project management without switching tools:\u003c/strong\u003e Strategic planning with Gantt charts and operational team work with backlogs and Action Boards all run in one application. With Atlassian, you typically need Jira, Advanced Roadmaps, and additional Marketplace plugins to get there.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eBetter sprint planning:\u003c/strong\u003e OpenProject shows the backlog and sprint side by side, scrolling independently. It sounds like a small thing — but it solves a concrete daily problem that Jira still hasn’t addressed elegantly.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMore features in the core product:\u003c/strong\u003e From the Professional tier up, the enterprise license includes all essential functionality. No extra plugins required. 
That saves money and avoids the typical headaches of a plugin ecosystem:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eescalating plugin costs\u003c/li\u003e\n\u003cli\u003edata privacy concerns with third-party apps\u003c/li\u003e\n\u003cli\u003eversion conflicts between the core product and extensions\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch4 id=\"where-jira-still-has-the-edge\"\u003eWhere Jira Still Has the Edge\u003c/h4\u003e\n\n\u003cp\u003e\u003cstrong\u003eKanban:\u003c/strong\u003e OpenProject doesn’t support WIP limits. For teams that practice Kanban seriously, that’s a genuine drawback. Limiting work in progress is a fundamental principle of the methodology.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eWorkflow and view customization:\u003c/strong\u003e Jira’s workflow editor and JQL set the standard. OpenProject is catching up, but isn’t there yet. Anyone running complex Jira workflows should carefully assess what would be missing before migrating.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eUI and UX:\u003c/strong\u003e OpenProject is functional and understated. Jira Cloud looks more modern and polished. That’s a matter of taste, but taste affects daily team life.\u003c/p\u003e\n\n\u003ch3 id=\"my-personal-conclusion\"\u003eMy Personal Conclusion\u003c/h3\u003e\n\n\u003cp\u003eFor me, OpenProject is currently the most compelling data-sovereign alternative to Jira. Not better in every dimension but stronger in exactly the areas that matter right now and are likely to matter even more going forward: \u003cstrong\u003esovereignty and predictable costs\u003c/strong\u003e.\u003c/p\u003e\n\n\u003cp\u003eI was genuinely surprised to discover that such a strong open-source product comes from Germany, and that I hadn’t heard of it before. What convinced me most is OpenProject’s flexibility: you can map out very different organizational and project hierarchies, and mix approaches from multiple project management models. 
That also means I can’t offer a one-size-fits-all judgment. My recommendation: use the 14-day trial and test your actual use case.\u003c/p\u003e\n\n\u003cp\u003eOpenProject is powerful software, and unlocking its full potential takes some ramp-up time. If you have experience with OpenProject and would like to share it, feel free to drop me an email: holger.kraus@innoq.com. And if you spot any factual errors in this article, I’d be grateful for a quick note as well.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/4x742cpowtwe7k0x8iyga6o9i85s?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/05/the-agentic-trio/","url":"https://www.innoq.com/en/blog/2026/05/the-agentic-trio/","date_published":"2026-05-27T00:00:00.000+00:00","date_modified":"2026-05-27T12:18:49.014+02:00","author":[{"name":"Daniel Westheide","url":"https://www.innoq.com/en/staff/daniel-westheide/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eThe best product teams, whether in startups or large organizations, often work in ways others struggle to replicate: a small unit, typically a product manager, UX designer, and engineer, systematically researches relevant user problems before solutions are built. Rather than implementing stakeholder wishes, it practices continuous discovery to find out what should actually be built. Delivery is handled by the larger product team behind it. Teresa Torres described this way of working as the Product Trio.\u003c/p\u003e\n\n\u003cp\u003eWhat was missing until now: the trio owned discovery, while delivery belonged to the larger team behind it. 
Agentic development changes that.\u003c/p\u003e\n\n\u003cp\u003eWe call this evolution the \u003cstrong\u003eAgentic Trio\u003c/strong\u003e.\u003c/p\u003e\n\n\u003ch3 id=\"the-shifted-economics-of-delivery\"\u003eThe shifted economics of delivery\u003c/h3\u003e\n\n\u003cp\u003eDelivery is getting faster and cheaper. The continuous discovery-delivery cycle can happen more frequently because the threshold at which another cycle pays off has dropped. This creates capacity. How that capacity is used is a strategic choice.\u003c/p\u003e\n\n\u003cp\u003eFor teams and organizations that don’t make this choice explicitly, the result is often \u003cstrong\u003emore output, not better output\u003c/strong\u003e. But “more discovery instead of more output” is not a trade-off. Teams that invest more in discovery identify more relevant problems, prioritize better, and build solutions that actually work. Over the same period, more opportunities are identified, more are validated before building begins, and more are addressed with solutions that function. Less stuff gets built, but more of what gets built matters.\u003c/p\u003e\n\n\u003cp\u003eThe obvious objection is: if building is cheap, why not just try everything and learn in production what works? The objection sounds pragmatic, but it assumes that relevant uncertainty sits exclusively at the solution level. In reality, many of the riskiest assumptions still live in the opportunity space: Is the problem real? Do enough people have it? Is it relevant enough that they would do something about it? No built feature can answer these questions. You can use agents to generate a solution in a few hours and know just as little about the underlying opportunity afterward as you did before. 
You’ve tested whether the solution is implementable, not whether the problem exists and whether the solution actually addresses it.\u003c/p\u003e\n\n\u003cp\u003eTorres distinguishes several categories of assumptions: \u003cstrong\u003edesirability\u003c/strong\u003e (\u003cem\u003edo users even want this?\u003c/em\u003e), \u003cstrong\u003efeasibility\u003c/strong\u003e (\u003cem\u003ecan we build it?\u003c/em\u003e), \u003cstrong\u003eviability\u003c/strong\u003e (\u003cem\u003eis it economically sensible to build it?\u003c/em\u003e), and \u003cstrong\u003eusability\u003c/strong\u003e (\u003cem\u003ecan users use it?\u003c/em\u003e). Usage metrics are limited in their ability to distinguish between these. If a feature goes unused, it could be due to missing desirability, poor usability, or a workaround users have internalized and won’t give up. The signal is confounded. A short conversation before building would have answered the question more directly and cheaply. Desirability assumptions are often the riskiest and simultaneously the cheapest to test. Skipping them and building directly means validating the wrong things, just faster.\u003c/p\u003e\n\n\u003cp\u003eThis doesn’t apply to every type of assumption. Usability and feasibility assumptions can often be validated more cheaply through prototyping or direct building than through interviews. Agentic development lowers this threshold further. Quick prototypes get cheaper, the cycle between building and learning gets shorter. That is a real shift. But it only affects part of the assumption space.\u003c/p\u003e\n\n\u003ch3 id=\"why-more-output-destabilizes-the-system\"\u003eWhy more output destabilizes the system\u003c/h3\u003e\n\n\u003cp\u003eThe choice for more output over systematic and continuous discovery has a second, less obvious cost: it destabilizes the system. 
Data from \u003ca href=\"https://www.faros.ai/ai-productivity-paradox\"\u003eFaros AI\u003c/a\u003e and \u003ca href=\"https://cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report\"\u003eDORA\u003c/a\u003e demonstrate this empirically.\u003c/p\u003e\n\n\u003cp\u003eFaros calls the pattern \u003cstrong\u003e“Acceleration Whiplash”\u003c/strong\u003e: when teams use agentic development primarily for more throughput, output rises — tasks are completed 34% more often, epics 66% more often, code-specific tasks even 210% more often. At the same time, all quality indicators fall. Bugs per developer have risen by 54%, the incidents-to-PR ratio has more than tripled (+242.7%), median review time has risen by 441%, and 31.3% more pull requests are merged without any review. This affects even teams that DORA classified as “high-performing” — teams that distinguished themselves before AI adoption by fast delivery and low change failure rates.\u003c/p\u003e\n\n\u003cp\u003eReinertsen provides the systemic explanation: more output increases utilization at critical bottlenecks. The critical bottleneck today is review by senior engineers. What suffers is review quality. The result: more bugs reach production. High utilization at bottleneck resources is the most expensive state in a product development system.\u003c/p\u003e\n\n\u003cp\u003eMore thorough discovery reduces input into the system at the right point, before unvalidated solutions fill the pipeline and back up at critical gates. Building less stuff also means less review load for senior engineers, a lower change failure rate, and a more stable system.\u003c/p\u003e\n\n\u003ch3 id=\"torres-product-trio\"\u003eTorres' Product Trio\u003c/h3\u003e\n\n\u003cp\u003eTeresa Torres' Product Trio consists of \u003cstrong\u003eproduct manager, UX designer, and engineer\u003c/strong\u003e. 
The idea was that all three would conduct discovery together: each role contributes its perspective, none is merely a recipient of others' decisions. The trio makes discovery decisions as a small, capable unit. The rest of the team is solely responsible for delivery.\u003c/p\u003e\n\n\u003cp\u003eThe trio doesn’t explore arbitrarily. The desired business outcome is typically set externally. This can be a metric, a strategic goal, or a customer problem the organization wants to solve. The trio searches within this frame for opportunities and decides which solutions to pursue. The autonomy lies in the how and what, not the why. All three trio members participate in regular user interviews, opportunity mapping, assumption testing, and solution ideation.\u003c/p\u003e\n\n\u003cp\u003eIn practice, the approach sometimes failed on a straightforward capacity problem: the engineer was often also heavily involved in delivery and had less time for discovery than the product manager and designer. The internal separation Torres wanted to overcome reproduced itself under the pressure of day-to-day work. Delivery required a larger team, and those who deliver have no time to discover.\u003c/p\u003e\n\n\u003cp\u003eAgentic development removes this constraint and thereby makes possible what Torres originally had in mind.\u003c/p\u003e\n\n\u003ch3 id=\"the-agentic-trio\"\u003eThe Agentic Trio\u003c/h3\u003e\n\n\u003cp\u003eIn the Agentic Trio, a small trio takes on not only discovery but also delivery. The trio thus operates as a fully autonomous unit. The roles are the same as in Torres: product manager, UX designer, engineer. What changes is the scope of their shared responsibility. All three do discovery and delivery, without support from team members not involved in the discovery work.\u003c/p\u003e\n\n\u003cp\u003eThis changes not only how teams work, but how organizations can be structured. 
Smaller, more autonomous units with genuine end-to-end responsibility become possible without sacrificing discovery quality. The capacity freed up by faster delivery flows not into more output, but into deeper problem recognition.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eOne distinction is important here:\u003c/strong\u003e the Agentic Trio and stream-aligned teams from Team Topologies are orthogonal. Team Topologies answers how teams are structured. The Agentic Trio answers how discovery and delivery are organized within a team. In the classic stream-aligned team — what Torres calls the product team — part of the team could form the trio while others focus exclusively on delivery. The Agentic Trio collapses exactly this internal separation. A stream-aligned team is not a necessary condition for agentic trios, but is in any case desirable: the more decision-making autonomy the trio has, the more fully it can leverage the shortened cycles of agentic development.\u003c/p\u003e\n\n\u003ch3 id=\"convergence-of-roles\"\u003eConvergence of roles\u003c/h3\u003e\n\n\u003cp\u003eThe Agentic Trio requires a T-shaped profile from all three roles: each role retains its area of depth, but the shared base grows substantially.\u003c/p\u003e\n\n\u003cp\u003eOn the delivery side, all three roles use coding agents to implement solutions. Product managers and designers become active authors and independent producers. Prompting and working with coding agents become baseline competencies for everyone involved, not just the engineer, who remains the deep expert she is.\u003c/p\u003e\n\n\u003cp\u003eOn the discovery side, all three participate equally in user interviews, opportunity mapping, solution ideation, and assumption testing. In the ideal Product Trio, this was always the goal. 
In the Agentic Trio, it is possible for the first time without compromise, because the engineer is no longer primarily bound by delivery.\u003c/p\u003e\n\n\u003cp\u003eThe convergence is not a leveling. Product managers and designers need to understand what generated code can do, where architecture decisions are product decisions, where technical debt constrains strategy. The engineer needs to take the user perspective, grasp problems before the solution level, and apply discovery methods. Each role understands more about the others' specializations, but the areas of depth remain different.\u003c/p\u003e\n\n\u003ch3 id=\"the-developer-as-an-agent-harness-engineer\"\u003eThe developer as an Agent Harness Engineer\u003c/h3\u003e\n\n\u003cp\u003eOn the delivery side, the engineer’s role in the Agentic Trio is not primarily that of a producer but that of a guardian: she protects the team’s ability to deliver sustainably, now and in the future, and thus to generate meaningful outcomes. Her core task: building and maintaining the \u003cstrong\u003eagent harness\u003c/strong\u003e, meaning the infrastructure in which all three can build safely and productively.\u003c/p\u003e\n\n\u003cp\u003eThe harness encompasses machine-enforced quality gates (fitness functions, linter rules, architecture checks), specialized skills and subagents for recurring tasks, a CI/CD pipeline that lets no generated code reach production without a gate, as well as observability, deployment, and monitoring. The engineer is responsible not only for the structures, but also for the infrastructure that prevents agentically generated code from undermining those structures.\u003c/p\u003e\n\n\u003cp\u003eThis has an important sequential implication: the harness must be in place before product managers and designers start building. Otherwise, exactly the kind of debt accumulates that agentic development is already producing in poorly prepared organizations. 
Agent harness engineering is therefore one of the central key competencies for senior engineers in the agentic era. Developers who define guardrails, curate subagent repertoires, and build pipelines that reliably secure agentic output are not replaceable by faster code generation. They are the ones who make fast code generation safe in the first place.\u003c/p\u003e\n\n\u003cp\u003eBeyond the harness, the engineer carries a responsibility that cannot be automated: she maintains the mental model of the system. Peter Naur describes in “Programming as Theory Building” that the actual core of software development is not the code, but the theory behind it: the understanding of why the system is built the way it is, which decisions justify which structures, what a change in one place means for others. This theory lives in the minds of developers, not in the code itself.\u003c/p\u003e\n\n\u003cp\u003eIn the Agentic Trio, the engineer is the only one who can and must hold this theory. Three people and several agents are writing into the same system simultaneously. \u003cstrong\u003eCode review\u003c/strong\u003e by the engineer is therefore not quality control in a bureaucratic sense, but the mechanism through which the theory stays current. She sees what was generated, evaluates whether it fits the existing theory, and corrects where it deviates. Without this, code quality collapses and the theory of the system is lost.\u003c/p\u003e\n\n\u003ch3 id=\"discovery-as-obligation-and-promise\"\u003eDiscovery as obligation and promise\u003c/h3\u003e\n\n\u003cp\u003eDiscovery participation is not an optional extension of the development role in the Agentic Trio. It is constitutive. This lands differently for two very different groups.\u003c/p\u003e\n\n\u003cp\u003eFor engineers who always wanted to do discovery but never had the time, the Agentic Trio delivers on an old promise. The interest in user problems, in the why behind the what, was always there for these engineers. 
Delivery simply consumed most of the time.\u003c/p\u003e\n\n\u003cp\u003eFor engineers who have not previously been in a Product Trio and did not want to do discovery work, this is a new challenge. The role changes fundamentally.\u003c/p\u003e\n\n\u003cp\u003eA final note on juniors: if we want to build a new generation of seniors, there must be an explicit place for them in the Agentic Trio as well. Juniors in the Agentic Trio rarely work alone. They pair — with the senior engineer on harness construction, with the product manager on discovery, with the designer on solution ideation, and with rotating partners on delivery. Rotation through all roles replaces the classical learning path of writing code. Juniors in the Agentic Trio thus resemble trainees moving through different perspectives and roles. The result is a broader education than before. And the decision about specialization comes later and more deliberately: those who have experienced all areas know where their T-shaped depth should lie.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eThe Agentic Trio is not a blueprint for every organization and every context.\u003c/strong\u003e It is an answer to the question of how to organize product development so that faster delivery actually leads to better outcomes, not just more utilization, more review load, and more bugs in production. The answer the Agentic Trio offers: by deliberately investing the freed-up capacity in discovery, in a small unit that owns both. 
Not sequentially, not in separate structures, but continuously and together.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/50ngtftjpyb9fwi6bjjk22m1vj5n?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/05/rag-und-fine-tuning-gemeinsam-stark/","url":"https://www.innoq.com/en/blog/2026/05/rag-und-fine-tuning-gemeinsam-stark/","date_published":"2026-05-21T00:00:00.000+00:00","date_modified":"2026-05-18T18:26:34.883+02:00","author":[{"name":"Marco Steinke","url":"https://www.innoq.com/en/staff/marco-steinke/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003ch3 id=\"quick-overview-rag-and-fine-tuning\"\u003eQuick overview: RAG and fine-tuning\u003c/h3\u003e\n\n\u003cp\u003eRAG supplies an LLM with external knowledge at runtime – for example, wiki pages, manuals, or product documentation. The system searches for relevant documents and passes them as context. This reduces hallucinations and makes answers traceable, because sources are cited and can be looked up.\u003c/p\u003e\n\n\u003cp\u003eFine-tuning shapes a model’s behavior through examples. It doesn’t necessarily add new knowledge, but it can train the model to use domain-specific language, produce answers in a particular format, and follow instructions more reliably – for example, in data extraction or document classification.\u003c/p\u003e\n\n\u003cp\u003eOn their own, each approach optimizes toward a different goal. Together, their effects compound – assuming the use case and available data are a good fit.\u003c/p\u003e\n\n\u003cp\u003eIn this article, we use “fine-tuning” to mean supervised fine-tuning, since other variants exist. 
“Supervised” here means the model is given one or more expected answers for each prompt during training.\u003c/p\u003e\n\n\u003ch3 id=\"the-use-case-knowledge-queries-at-an-insurance-company\"\u003eThe use case: knowledge queries at an insurance company\u003c/h3\u003e\n\n\u003cp\u003eAn insurance company uses an LLM to make its internal wiki more accessible to employees. Typical questions include: Which policy applies to plan X? Where can I find information about claim class Y? Summarize the key points from guideline Z.\u003c/p\u003e\n\n\u003cp\u003eHere’s how it works technically: the wiki is indexed and stored in a data source. That data source can be anything – a vector database, an MCP server, or a relational database. A RAG pipeline finds the right pages for each query and passes them as context to the LLM, which generates answers or summaries from that context.\u003c/p\u003e\n\n\u003ch3 id=\"the-problem\"\u003eThe problem\u003c/h3\u003e\n\n\u003cp\u003eOn paper, the system works. The right pages are in the context. But in practice, the LLM overlooks relevant documents, overweights irrelevant passages, and answers end up incomplete or drawn from the wrong sections.\u003c/p\u003e\n\n\u003cp\u003eThe issue isn’t retrieval – it’s how the model uses context during generation. Put differently: the model receives the right documents but doesn’t reliably know which one matters for the question at hand. The information isn’t missing – what’s missing is the ability to prioritize, weight, and correctly cite within the given context.\u003c/p\u003e\n\n\u003ch3 id=\"from-rag-outputs-to-fine-tuning-data\"\u003eFrom RAG outputs to fine-tuning data\u003c/h3\u003e\n\n\u003cp\u003eIn production, every query the system answers already contains the building blocks for fine-tuning. 
For each query, the system logs the prompt, the retrieved documents along with their source references, the final model response, and optional user feedback – whether the answer was accepted, rejected, corrected, or annotated with free text.\u003c/p\u003e\n\n\u003cp\u003eThese elements are assembled into fine-tuning examples that teach the model \u003cem\u003ehow\u003c/em\u003e to use context – not \u003cem\u003ewhat\u003c/em\u003e the correct fact is.\u003c/p\u003e\n\n\u003ch3 id=\"the-key-feedback-from-production\"\u003eThe key: feedback from production\u003c/h3\u003e\n\n\u003cp\u003eThe insurance company systematically collects feedback. Users rate answers as helpful or unhelpful, flag incorrect or incomplete results, and provide free-text comments to improve responses. This creates a dataset of prompts, context documents, and preferred versus rejected answers.\u003c/p\u003e\n\n\u003cp\u003eThis feedback loop is essential for stabilizing the system over time. Crucially, users need to be actively encouraged to give feedback. 
Clear messaging about why feedback matters, low-friction ways to provide it directly in the interface, and structured feedback formats all help ensure the loop actually delivers value.\u003c/p\u003e\n\n\u003cp\u003eIf it’s unclear why and in what form feedback helps – or if providing it feels like too much effort – users will quickly lose motivation, and the feedback loop will break down.\u003c/p\u003e\n\n\u003ch3 id=\"the-fine-tuning-dataset-from-the-field\"\u003eThe fine-tuning dataset (from the field)\u003c/h3\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"A suitable fine-tuning dataset: the prompt, multiple loaded documents, and an expected answer — with explicit attribution of which document was actually used.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 2600w, 
https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/1zqga20yqbgx2w4u85ohgblws6op?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eA suitable fine-tuning dataset: the prompt, multiple loaded documents, and an expected answer — with explicit attribution of which document was actually used.\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003eHere’s what matters about this data point: multiple documents are present in the context, but only one is referenced in the answer. The model learns to select, prioritize, and cite correctly – not to memorize documentation, but to distinguish relevant from irrelevant context under RAG conditions. This directly addresses the observed failure mode: relevant documents are retrieved but used incorrectly during generation.\u003c/p\u003e\n\n\u003ch3 id=\"fine-tuning-with-direct-preference-optimization\"\u003eFine-tuning with Direct Preference Optimization\u003c/h3\u003e\n\n\u003cp\u003eInstead of classic supervised fine-tuning, the insurance company uses Direct Preference Optimization (DPO). DPO is a fine-tuning variant where the model doesn’t just learn what a good answer looks like – it simultaneously learns what makes an answer poor. For each prompt, the model receives one preferred and several rejected answers during training. This doesn’t just reinforce good answers – it also makes poor ones measurably less likely.\u003c/p\u003e\n\n\u003cp\u003eFor this use case, that’s a decisive advantage: the focus is on decisions, not facts. The model learns which documents are relevant for which prompts, how strongly to weight them, and when to ignore information entirely. 
It doesn’t memorize the wiki – it learns to use the provided context correctly.\u003c/p\u003e\n\n\u003ch3 id=\"why-rag-and-fine-tuning-work-well-together\"\u003eWhy RAG and fine-tuning work well together\u003c/h3\u003e\n\n\u003cp\u003eRAG and fine-tuning solve different problems: RAG delivers context and traceable source citations, because it draws directly on current data that we provide. Fine-tuning with DPO controls how the model handles that context, stabilizing its behavior based on real user preferences. Together, they complement each other – making the provided company knowledge more reliably reflected in answers.\u003c/p\u003e\n\n\u003ch3 id=\"conclusion\"\u003eConclusion\u003c/h3\u003e\n\n\u003cp\u003eIf you plan to run LLMs long-term in a similar use case, think of RAG and fine-tuning not as competing alternatives but as approaches that combine naturally. And don’t treat fine-tuning as a one-off training run. It’s a continuous process, tightly coupled to real usage. The fine-tuning dataset needs regular updates, and the human feedback loop has to be maintained. 
That’s how the system grows more reliable over time and makes better decisions, even as new documents are added.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/xrd5tjdgudra9yakooy24lbs8wp7?_a=BACMTiAE"},{"id":"https://www.innoq.com/de/blog/2026/05/ein-prozent-methode-fuer-code-und-architektur/","url":"https://www.innoq.com/de/blog/2026/05/ein-prozent-methode-fuer-code-und-architektur/","date_published":"2026-05-13T00:00:00.000+00:00","date_modified":"2026-05-13T14:43:25.364+02:00","author":[{"name":"Michael Plöd","url":"https://www.innoq.com/en/staff/michael-ploed/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eIn conversations with clients across all industries, I hear the same three sentences almost every week:\u003c/p\u003e\n\n\u003cp\u003e\u003cem\u003e“We do want to modernize, but day-to-day business always gets in the way.”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003e\u003cem\u003e“The system is so broken that only a big bang will help.”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003e\u003cem\u003e“Technical debt is growing faster than we can pay it down.”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eThat looks like three different problems. It isn’t. These are three symptoms of the same belief: that modernization is something you do \u003cem\u003eseparately\u003c/em\u003e from actual software development. A project. A budget line. A program with a steering committee. Something that needs a quiet quarter, strategic alignment, and a slide deck that someone two levels up signs off on.\u003c/p\u003e\n\n\u003cp\u003eThat very belief is the problem. 
It is also the reason nothing moves in a healthy direction.\u003c/p\u003e\n\n\u003ch3 id=\"das-modernisierungs-paradox\"\u003eThe modernization paradox\u003c/h3\u003e\n\n\u003cp\u003eThe conversation usually ends the same way, too: \u003cem\u003e“We urgently need a big cleanup, but right now it just isn’t possible.”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eRight now there is a deadline coming up. Right now we just had an outage. Right now there is a feature on the table that the business has already promised without first asking anyone whether the codebase can even support it. Right now is the wrong time; next quarter will be the right one. And next quarter arrives with its own version of “right now”.\u003c/p\u003e\n\n\u003cp\u003eIn my experience, this “right now” sometimes lasts for years.\u003c/p\u003e\n\n\u003cp\u003eDuring that time, the system keeps getting worse. Modules that used to be easy to understand become more complex. Boundaries that were once clearly drawn begin to blur. TODO comments pile up. The documentation drifts further and further from what the code actually does. Tests get disabled in the build because nobody feels like dealing with the flaky ones. Each of these decisions is understandable on its own. Taken together, they form a self-reinforcing downward spiral.\u003c/p\u003e\n\n\u003cp\u003eThat is the paradox: the longer we wait for the right moment, the bigger the modernization has to become. The bigger it has to become, the riskier every change gets. And the riskier every change gets, the harder it is to test anything with confidence. That makes it less and less likely that we will find a moment that is big enough, safe enough, and testable enough. 
The project mindset sets the bar so high that the bar itself becomes the obstacle.\u003c/p\u003e\n\n\u003cp\u003eThe system that would need a six-month cleanup will not get those six months, whether for political reasons, because of budget pressure, or because features take priority. I can already hear the sentence: \u003cem\u003e“We pay for features, not for refactorings.”\u003c/em\u003e Instead, the system gets a series of hectic three-day patches, and with every patch the real cleanup becomes a little harder.\u003c/p\u003e\n\n\u003ch3 id=\"modernisierung-im-rhythmus-nicht-im-big-bang\"\u003eModernization as a rhythm, not a big bang\u003c/h3\u003e\n\n\u003cp\u003eI would like to propose a different way of looking at modernization.\u003c/p\u003e\n\n\u003cp\u003eRequirements change constantly. Teams change constantly, because people join, leave, rotate, and learn. The systems we build change as well, even when we don’t touch them, because the world around them keeps moving. Every dependency update, every new compliance rule, every shifted business priority puts a little pressure on the architecture.\u003c/p\u003e\n\n\u003cp\u003eSo if everything around the system changes all the time, keeping that system healthy cannot be a one-off event. A modernization that takes on, once every four years, a world that changes every four weeks will always lag behind.\u003c/p\u003e\n\n\u003cp\u003eContinuous modernization is therefore not a project we push through once. It is a small, steady practice that the team carries out every day in tiny steps, at roughly the pace at which change acts on the team.\u003c/p\u003e\n\n\u003cp\u003eThis reframing sounds small, but it isn’t. It moves modernization out of the category of things we \u003cem\u003eplan\u003c/em\u003e and into the category of things we \u003cem\u003edo\u003c/em\u003e. 
Modernization is then no longer an item on a roadmap; it becomes part of how the team works. And once it is part of how the team works, it no longer competes with day-to-day business, because it is \u003cem\u003epart\u003c/em\u003e of day-to-day business. No spectacle, no kickoff, no big reveal. Just the work, done a little better than yesterday. Architecture emerges from what the team does every day anyway, not from what it plans once. The rest of this post is about what happens when you deliberately embrace that idea.\u003c/p\u003e\n\n\u003ch3 id=\"was-laufen-mit-code-zu-tun-hat\"\u003eWhat running has to do with code\u003c/h3\u003e\n\n\u003cp\u003eImagine you decide to take up running. You haven’t exercised in years. On the first day you don’t really run, because that would hurt. You go for a brisk ten-minute walk. It is unspectacular. You barely cover any distance, and nothing about those ten minutes makes you an athlete.\u003c/p\u003e\n\n\u003cp\u003eThe next day you do the same. And the day after that. After a week or two, the brisk walk gets easier, and you start mixing in a minute of slow jogging here and there. After a month you manage five minutes at a stretch. After three months you run for half an hour without stopping. After six months, a 10 km run on the weekend is a realistic goal. After a year you might be aiming for a half marathon.\u003c/p\u003e\n\n\u003cp\u003eNo single one of those training sessions was remarkable. What was remarkable was the compounding effect.\u003c/p\u003e\n\n\u003cp\u003eThat is the core idea of the 1% method, which James Clear popularized in his book \u003cem\u003eAtomic Habits\u003c/em\u003e. The argument is simple: small changes that seem insignificant on their own lead to remarkable results over time. 
Clear writes about this mainly in the context of personal habits. What I want to attempt in this post is to carry the idea to a place Clear did not go: the architecture of software systems built by teams.\u003c/p\u003e\n\n\u003cp\u003eImagine every member of a team invests roughly ten minutes a day in something small. Improving a name so that it matches what the business actually calls the thing. Extracting a method. Deleting dead code. Adding a missing test. Updating a dependency. Fixing a warning. Improving a paragraph of documentation. Removing an obsolete configuration flag.\u003c/p\u003e\n\n\u003cp\u003eNone of this is impressive on its own. None of it deserves a Jira ticket. None of it would hold its own against a feature request in a prioritization meeting. And that is exactly the point. These activities are too small to compete with day-to-day business, and that is precisely why they can actually happen alongside it. They happen quietly, under the radar, at the edges of the day.\u003c/p\u003e\n\n\u003cp\u003e(A small aside: if you work in an environment where someone measures your work in ten-minute units, this post will not help you. My advice in that case is different: run, if you can.)\u003c/p\u003e\n\n\u003cp\u003eSmall improvements are easy to do. They are also easy \u003cem\u003enot\u003c/em\u003e to do. And they are impossible to ignore once they add up.\u003c/p\u003e\n\n\u003ch3 id=\"warum-gute-vorsatze-eine-falle-sind\"\u003eWhy good intentions are a trap\u003c/h3\u003e\n\n\u003cp\u003eWhen you talk to developers about this topic, almost everyone agrees. We \u003cem\u003eshould\u003c/em\u003e improve the code a little every day. 
We \u003cem\u003eshould\u003c/em\u003e do small refactorings whenever we are working in a given area anyway. We \u003cem\u003eshould\u003c/em\u003e not let small problems grow until they become big ones. Nobody disagrees with the principle. I have never met a developer who argued against the idea of continuous small improvements.\u003c/p\u003e\n\n\u003cp\u003eAnd yet most teams don’t do it.\u003c/p\u003e\n\n\u003cp\u003eThe reason is not that developers are lazy or undisciplined. The reason is that intentions don’t hold up under pressure. Look at a perfectly normal week in a delivery team. There is on-call duty. There are critical production bugs. There is a sprint goal that starts slipping as early as Tuesday. There is the \u003cem\u003e“Please, just this one feature, ASAP”\u003c/em\u003e request from a stakeholder who promised a customer something without asking first. One colleague is on vacation. Another is out sick. A third has just told you that she is leaving the team in six weeks. The release date does not move. The roadmap does not shrink. And the deadline that still looked comfortable two months ago now feels tight.\u003c/p\u003e\n\n\u003cp\u003eIn a week like that, the small improvement doesn’t happen. Not because the developer forgot about it, and not because they don’t care. It doesn’t happen because, in the moment between a production bug and a slipping sprint, the five-minute refactoring is the easiest thing to cross off the list. There is always something more urgent.\u003c/p\u003e\n\n\u003cp\u003eThat is the trap. 
If improving code or architecture depends on motivation and good intentions, our codebases end up looking like gyms in March. Full of good intentions in January, half empty in February, more or less deserted by spring. The intention was real. The structure to carry it was not.\u003c/p\u003e\n\n\u003cp\u003eThe solution is not more motivation. The solution is a structure that makes motivation unnecessary.\u003c/p\u003e\n\n\u003ch3 id=\"du-hast-bereits-gewohnheiten\"\u003eYou already have habits\u003c/h3\u003e\n\n\u003cp\u003eWhen I give talks on this topic, I often ask the audience a question: \u003cem\u003e“What is the first thing you do on your computer when you sit down at your desk in the morning?”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eThe answers are almost always the same. Check email. Open Slack. Open Teams. Look at the calendar.\u003c/p\u003e\n\n\u003cp\u003eThen I say: \u003cem\u003e“Congratulations, you have a habit.”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eYou came into the office. You got yourself a coffee. You sat down at your desk. You opened your laptop. You opened Slack. You did not consciously decide to open Slack. You did not motivate yourself to do it. You did not put “open Slack” on your to-do list the night before. You simply did it, because the chain of small actions before it pulled you there. The cue was the opened laptop. The response was the click on Slack. The reward was the small hit of \u003cem\u003e“now I know what’s going on.”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eHabits don’t need intentions. That is exactly their point. 
They run on a track the brain has already laid down, so that conscious thought no longer has to spend energy on them.\u003c/p\u003e\n\n\u003cp\u003eHere is the suggestion I always make at this point in the talk: start with yourself. Tomorrow morning, before you open Slack, invest ten minutes in a small code improvement. Rename a misleading variable. Delete a dead method. Write the test you have been meaning to write forever. Then open Slack. Put a small habit card on your desk and tick it off when you are done. Do this for two weeks, on your own, without telling anyone. You don’t need permission. You don’t need a meeting. You don’t need a ticket. You just need to slot it into your morning routine before Slack.\u003c/p\u003e\n\n\u003cp\u003eTwo things will happen. First, you will feel the small daily improvements accumulating in the parts of the codebase you work on. Second, and this is the more interesting part, someone on the team will notice. The habit card on your desk. The unusually clean diff in your pull request. The dependency you quietly removed last week. People are curious. They will ask. And once they ask, you have a conversation rather than a sales pitch.\u003c/p\u003e\n\n\u003cp\u003eThat is how the whole thing scales. Not through a kickoff meeting. Not through a process change. One person starts, the team gets curious, and the practice spreads sideways. For it to really stick, though, the individual habit has to become a \u003cem\u003eteam\u003c/em\u003e habit, not just a collection of individual ones. 
And that is a different design task.\u003c/p\u003e\n\n\u003ch3 id=\"von-personlichen-gewohnheiten-zu-team-gewohnheiten\"\u003eFrom personal habits to team habits\u003c/h3\u003e\n\n\u003cp\u003eA personal habit only has to survive your own bad days. A team habit has to survive everyone’s bad days at once.\u003c/p\u003e\n\n\u003cp\u003eThat sounds like a small difference, but it is not. The personal habit fails when you are tired, distracted, or stressed. The team habit fails when \u003cem\u003eanyone\u003c/em\u003e on the team is tired, distracted, or stressed enough to skip it. Then the next person sees it, and the ritual loses weight. Within two weeks, the habit is something the team used to do. Nobody decided to stop. It just stopped.\u003c/p\u003e\n\n\u003cp\u003eSometimes someone actively decides that the team stops. There is always a senior person who says \u003cem\u003e“not today, we have a deadline”\u003c/em\u003e, and the team follows, because the senior person has authority. One skipped day turns into two, then a week, and the habit is gone. I have also noticed something more subtle: the people who push back hardest against these small improvements are sometimes exactly the ones who derive their status from knowing all the dark corners of the codebase. The chaos makes them indispensable. A cleaner codebase that everyone can find their way around quietly threatens that position. This is rarely conscious, but it is real, and it is worth saying out loud.\u003c/p\u003e\n\n\u003cp\u003eSo a team habit has to be designed differently from a personal one. It has to be embedded in something the team already does together, so that no individual has to remember it. It has to survive the person who is currently on call. 
It has to survive the workshop on Wednesday. It has to survive the new colleague who has not heard of it yet. The structure carries the habit, not the willpower of the people within it.\u003c/p\u003e\n\n\u003cp\u003eIn \u003cem\u003eAtomic Habits\u003c/em\u003e, Clear gives us four helpful design rules for habits: make the cue obvious, make the craving attractive, make the response easy, and make the reward satisfying. These rules carry over to team habits, but they have to be aimed at \u003cem\u003eteam\u003c/em\u003e artifacts rather than personal ones. The cue is not stuck to your bathroom mirror. It sits in your pull request template. The reward is not a feeling at the end of the day. It is a metric on a dashboard that everyone on the team can see.\u003c/p\u003e\n\n\u003cp\u003eA personal example: I put my Peloton bike in our living room, not in a guest room and not in the basement. Right in the middle of where I spend my evenings. That did not make me more motivated to work out. I simply made the cue impossible to ignore. Every time I walk past it, it quietly asks me whether today is the day. It works surprisingly well. The same principle can be applied to teams.\u003c/p\u003e\n\n\u003cp\u003eHere is what the four rules look like once you aim them at a team:\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMake it obvious.\u003c/strong\u003e The cue lives in the artifacts the team touches every day anyway. A line in the pull request template that asks: “Did you leave this code a little better than you found it?” A visible improvement backlog on the team board. A short Slack reminder that the docs minute starts after standup. An architecture dashboard on a screen that everyone walks past. 
The cue has to be impossible to miss.\u003c/p\u003e\n\n\u003cp\u003eAnd for the days when someone thinks \u003cem\u003e“I want to do my ten minutes, but I can’t think of anything to touch”\u003c/em\u003e: put up a flipchart or a Miro board with small improvement ideas that anyone on the team can add to. Rename this class. Delete that unused configuration file. Add a test for this edge case. Some teams I work with use anonymized names there, such as “Batgirl”, “Princess Peach”, “Super Mario”, or “Mr. Burns”. That keeps things playful and prevents the board from degenerating into a tracking tool. The board is always full, because every developer comes across things in their daily work that there is no time for right now. Give those observations a home, and they become fuel for someone else’s ten minutes the next day.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMake it easy.\u003c/strong\u003e Ten minutes per person per day. No approvals. No tickets. No meeting to schedule the improvement. No estimation in story points. No review committee. Autonomy is part of the design. The moment a team has to ask for permission to improve the code, the habit is dead, because the approval step takes longer than the improvement itself.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMake it attractive.\u003c/strong\u003e Celebrate the small wins openly. Mention them in standup. Visualize the streak on the team board. Introduce a team habit streak: we have managed our daily improvement on 14 consecutive working days. Put it on the board. Make it visible. A streak creates a gentle social dynamic in which nobody wants to be the person who breaks it. Setting a new record becomes something the team is genuinely proud of. 
One important rule: vacation days, sick days, and conference or training days always count as a continued streak. Nobody should feel forced to work on a day off. The streak tracks the \u003cem\u003eteam’s working days\u003c/em\u003e, not individual attendance. If three people are in the office and do their ten minutes, the streak continues. Use common sense here. The list of valid reasons should not grow so long that it allows cheap excuses, but it has to protect what really matters. The team knows the difference. Trust the team on this. A team that sees its own progress wants more of it. A team that does not see its progress assumes there is none.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMake it satisfying.\u003c/strong\u003e The satisfaction has to be visible at team level, not just individually. Metrics moving in the right direction. The number of warnings dropping week after week. The dependency graph getting cleaner. Documentation that actually answers the questions a new colleague asks on day one. These are not vanity metrics. They are evidence, shared by the whole team, that the small improvements are doing what they are meant to do.\u003c/p\u003e\n\n\u003cp\u003eAn important warning here. The moment someone outside the team turns the whole thing into a formal tracking procedure, the idea is dead. If a manager starts measuring how many improvements each developer made this week, you have turned a habit into a KPI, and the intrinsic motivation is gone. This only works if the team owns the whole thing, tracks it for itself, and uses it as its own internal signal. The habit streak on the board belongs to the team. It is not a reporting metric. 
Keep it that way.\u003c/p\u003e\n\n\u003cp\u003eThere is one more idea from Clear that carries over to teams particularly well: habit stacking. The easiest way to build a new habit is to attach it to an existing one. Teams already have routines. The daily standup. The pull request review. The retrospective. New habits stick faster when they piggyback on these routines. A “docs minute” right after standup. A “leave the code better than you found it” check in the pull request review. Three minutes of warning cleanup at the start of every pair programming session. The existing routine carries the new habit on its back.\u003c/p\u003e\n\n\u003cp\u003eThis is the step that takes the 1% method out of the self-help aisle and brings it into engineering practice. The habits are not personal. The structure is not personal. The improvement is not personal. The team owns it, the team shapes it, the team benefits from it.\u003c/p\u003e\n\n\u003ch3 id=\"architektur-ist-das-was-du-jeden-tag-tust\"\u003eArchitecture is what you do every day\u003c/h3\u003e\n\n\u003cp\u003eWhen we talk about architecture, we mostly talk about decisions. Which database. Which communication pattern. Which deployment model. How do we cut up the monolith? Do we go event-driven? We write ADRs, we draw diagrams, we hold review meetings. And all of that matters.\u003c/p\u003e\n\n\u003cp\u003eBut let me ask you a question. When was the last time your architecture actually got worse because a decision was made in a meeting?\u003c/p\u003e\n\n\u003cp\u003eThat is not where architecture erodes. It erodes on a Wednesday afternoon, when someone takes a shortcut because the sprint is starting to slip. It erodes on a Friday, when nobody writes a test anymore because the weekend is calling. 
It erodes over a quiet month in which a module grows from 300 to 1,200 lines because every new feature landed there and nobody stopped to ask whether it still belonged there at all. It erodes when a dependency that was meant to be temporary six months ago has become load-bearing and nobody dares to touch it anymore.\u003c/p\u003e\n\n\u003cp\u003eIn none of these moments did anyone make a bad architecture decision. In none of these moments did anyone make a \u003cem\u003edecision\u003c/em\u003e at all. Things just happened. Small things. Everyday things. Things that show up in no review and no ADR.\u003c/p\u003e\n\n\u003cp\u003eAnd yet, taken together, they are \u003cem\u003ethe architecture\u003c/em\u003e.\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eArchitecture is the sum of everyday behavior, across the whole team, over months and years.\u003c/strong\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eThe decisions set the direction. Everyday behavior determines whether we actually get there. And when everyday behavior does not match the direction, behavior wins. Every time. No diagram survives contact with a team that has no habit of keeping the code clean.\u003c/p\u003e\n\n\u003cp\u003eThat is exactly why the team habits from the previous section matter far more than they seem to at first glance. A single ten-minute improvement will not repair a broken module boundary. Of course not. But a team that has the \u003cem\u003ehabit\u003c/em\u003e of noticing boundaries, naming problems, clearing up small messes, and keeping the documentation honest will, over time, produce an architecture that is fundamentally easier to change. Not because someone made a brilliant decision in a meeting room. 
But because hundreds of small, everyday behaviors pointed in roughly the same direction, day after day, month after month.\u003c/p\u003e\n\n\u003cp\u003eFewer hotspots in the codebase. Lower complexity in the modules that are touched most often. Clearer boundaries between components. Clearer responsibilities. More predictable dependencies. Documentation that a new team member can actually use in their first week. None of this comes from a single action. All of it comes from a sustained pattern of behavior.\u003c/p\u003e\n\n\u003cp\u003eMartial arts have a concept that captures this well: \u003cstrong\u003emuscle memory\u003c/strong\u003e. You don’t start by winning fights. You start by perfecting a single punch. You repeat it hundreds of times, until your body no longer needs your brain. Then you learn katas, sequences of punches and blocks that flow into one another. You repeat those hundreds of times, too. And then, one day, you step into a sparring match and your body does the right thing without you thinking about it. The punch lands. The block is in the right place. Not because you decided in that moment what to do. But because you trained the pattern so deeply that the decision had already been made.\u003c/p\u003e\n\n\u003cp\u003eArchitecture works the same way. A team that has practiced small daily improvements for half a year no longer has to think about whether to clean up a confusing name, remove a dead import, or sharpen a boundary. It just does it. The pattern is trained. The architecture reflects the training.\u003c/p\u003e\n\n\u003cp\u003eAnd this is the really powerful effect: a team with well-developed architectural muscle memory does not just maintain the current architecture. It makes \u003cem\u003efuture\u003c/em\u003e architectural changes safer. 
When the moment comes for a bigger step, such as a service extraction, a module split, or a technology migration, the team that is able to pull it off is the one that has spent months keeping boundaries clean, dependencies predictable, and documentation honest. The big step is not safe because someone planned it well. It is safe because the ground it lands on has been prepared, one ten-minute improvement at a time.\u003c/p\u003e\n\n\u003ch3 id=\"die-zinseszinskurve\"\u003eThe compounding curve\u003c/h3\u003e\n\n\u003cp\u003eThere is something I have to be honest about. In the first few weeks, it will feel as if nothing is happening.\u003c/p\u003e\n\n\u003cp\u003eThe code will not look dramatically different. The metrics will barely move. The architecture diagrams will not change at all. And someone on the team will say \u003cem\u003e“see, I told you, this doesn’t work.”\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eThat person is wrong. But they are not lying about what they see, either. The compounding curve has a shape that punishes impatience. The early phase is flat. Almost invisible. The improvements are real, but they are too small and too scattered to show up in anything you could measure or demonstrate. This is exactly where most teams give up. Not because the approach failed, but because the feedback came too late.\u003c/p\u003e\n\n\u003cp\u003eIt is like running. After four to six weeks of daily training, you feel good. You are faster than on day one. You have a routine. You feel like a runner. And then one morning in the park, someone your age cruises effortlessly past you. Suddenly your progress feels like nothing. You were proud of your ten minutes per kilometer, and this person is doing six minutes per kilometer as if it were a warm-up. 
The temptation to give up is huge, because the comparison has made your own progress invisible to you.\u003c/p\u003e\n\n\u003cp\u003eThe same thing happens to teams. You have been doing your ten-minute improvements for six weeks. The code is a little cleaner. The team feels good about it. And then another team in the company ships a big, visible modernization project. New architecture. New tech stack. A presentation for leadership. Applause. Your quiet, daily, invisible work looks like nothing in comparison.\u003c/p\u003e\n\n\u003cp\u003eThis is the moment that separates teams that build lasting habits from teams that tried it once. The team that gives up here needed visible progress as motivation. The team that carries on has something better: a structure that does not depend on how the team feels this week. The habit simply runs. That is exactly why we built it as a habit.\u003c/p\u003e\n\n\u003cp\u003eIf you stick with it, the curve bends. Not suddenly, but noticeably. This is roughly what it looks like in the teams I have worked with so far:\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eWeeks 1 to 4: Invisible.\u003c/strong\u003e The improvements are happening, but nobody outside the team can tell. The diffs are a little cleaner. A few names make more sense. A test that was missing now exists. The value is real, but visible only to the person who made the change. This is the hardest phase, because the effort feels pointless. It is not.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMonths 2 to 5: Noticeable.\u003c/strong\u003e The team starts to feel it. Code reviews get faster because the code is easier to read. New colleagues find their feet a little faster. The number of “what does this actually do?” conversations in standup goes down. Nothing dramatic. 
But there is less friction, and the team notices.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMonths 6 to 12: Measurable.\u003c/strong\u003e The metrics start to reflect what the team has long felt. Warning counts are visibly lower. The complexity scores in the modules the team touches most often have dropped. The dependency graph has fewer surprising connections. The documentation covers the things people actually ask about. You can put a chart on a screen and show someone outside the team that something is improving. The cognitive load on the team is lower. And in case anyone asks whether the ten minutes are worth the effort: the team ships features faster because the code it builds on no longer gets in its way.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMonths 12 to 24: Transformative.\u003c/strong\u003e This is where the compounding effect becomes hard to ignore. The codebase is not just cleaner, it is structurally different from two years ago. Boundaries are sharper. Modules are smaller. The team moves faster on new features because the ground it builds on is solid. And the big architectural steps that looked impossible two years ago, a service extraction, a module split, a technology migration, are now realistic because the preparation happened in ten-minute improvements.\u003c/p\u003e\n\n\u003cp\u003eThe compound interest curve is not a promise that everything will turn out fine. It is a description of how small, consistent effort adds up. The shape is always the same: flat, then gradual, then steep. The only variable is whether the team sticks with it long enough to reach the steep part. That is exactly why it has to be a habit and not a project. No motivation lasts two years. No project sponsor keeps paying attention for two years. 
A habit can withstand that.\u003c/p\u003e\n\n\u003ch3 id=\"wie-du-diese-woche-anfangst\"\u003eHow to start this week\u003c/h3\u003e\n\n\u003cp\u003eAnyone who has ever taken up running knows the answer to the question \u003cem\u003e“When is the best time to go for a run?”\u003c/em\u003e The answer is always: now. Not tomorrow. Not next Monday. Not once you have bought the right shoes. Now.\u003c/p\u003e\n\n\u003cp\u003eAfter conferences, I usually tell people the opposite. Don’t fall for every hype. Don’t do CDD, that is, Conference-Driven Development. No career-limiting moves based on a 45-minute talk between lunch and afternoon coffee.\u003c/p\u003e\n\n\u003cp\u003eThis time I am telling you the opposite of that. Start. Now. This is not a new framework. It is not a risky technology bet. It is ten minutes of cleaning up code that is yours anyway. The worst that can happen is that a variable has a better name tomorrow. Go.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eThis week:\u003c/strong\u003e Tomorrow morning, before you open Slack, invest ten minutes in a small improvement. Any improvement. A better name. A deleted method. A test. A paragraph of documentation. Don’t tell anyone. Don’t ask for permission. Don’t create a ticket. Just do it. Put a habit card on your desk and tick it off. Do this every working day for the whole week.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eThis month:\u003c/strong\u003e Talk to your team about it. Show them the habit card. Tell them what you did. If the others get curious, suggest that everyone try it for two weeks. Set up the improvement ideas board, on a flipchart or in Miro, so that nobody has to think about what to work on. Start a team streak. 
Track it on the team board, not in a reporting tool.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eThis quarter:\u003c/strong\u003e Bring the topic into your retrospectives. Talk about what is working and what is not. Pick one or two metrics that are relevant to your codebase, such as warning count, complexity score, dependency count, or documentation coverage, and watch them. Not as a KPI for someone else. As a signal for the team. Let the team tailor the habit to its own rhythm. Some teams do their ten minutes first thing in the morning. Others right after standup. Still others as part of the pull request. There is no right answer. There is only the answer that keeps your team going after 12 months.\u003c/p\u003e\n\n\u003cp\u003eNo framework to introduce. No tool to buy. No budget to get approved. No steering committee to convince. The barrier to entry is ten minutes and the willingness to start before you feel ready.\u003c/p\u003e\n\n\u003ch3 id=\"fang-morgen-an\"\u003eStart tomorrow\u003c/h3\u003e\n\n\u003cp\u003eIn two years, your codebase will be the sum of every small decision your team made between now and then. That is true whether you start tomorrow or not. The only difference is whether the direction and the attitude are deliberate.\u003c/p\u003e\n\n\u003cp\u003eYou do not need a modernization project. You do not need a budget. You do not need anyone’s permission. You do not have to wait for the next planning cycle, the next quarter, or the next quiet phase, which will never come anyway. You need ten minutes, a habit card, and the willingness to leave the code a little better than you found it.\u003c/p\u003e\n\n\u003cp\u003eThe teams that do this will not notice the change at first. Nobody will notice it. 
But a year from now, they will look at their codebase and wonder when working with it actually started getting easier. The answer will be: on a quiet Tuesday morning, before anyone had opened Slack.\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003e\u003cstrong\u003eArchitecture becomes what you do every day. Not what you plan once.\u003c/strong\u003e\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eStart tomorrow. Ten minutes. After coffee. Before Slack. Before email. Before everything else.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/j5xjwve3fne35ajw279ckxkxh7z1?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/05/agentic-engineering-product-discovery/","url":"https://www.innoq.com/en/blog/2026/05/agentic-engineering-product-discovery/","date_published":"2026-05-12T00:00:00.000+00:00","date_modified":"2026-05-12T12:52:59.298+02:00","author":[{"name":"Daniel Westheide","url":"https://www.innoq.com/en/staff/daniel-westheide/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eTeams that have adopted the paradigm of agentic software development produce more code in less time. Studies by \u003ca href=\"https://cloud.google.com/resources/content/2025-dora-ai-assisted-software-development-report\"\u003eDORA\u003c/a\u003e and \u003ca href=\"https://www.faros.ai/ai-productivity-paradox\"\u003eFaros AI\u003c/a\u003e consistently show that these teams complete more tasks and epics in the same time, create more and larger pull requests, and merge more pull requests as well.\u003c/p\u003e\n\n\u003cp\u003eThis raises the question: what to do with the freed-up capacity? The obvious response would be to simply build more in the same amount of time. 
There are companies marketing themselves as AI-native that are already optimizing their entire pipeline in exactly this direction: a Slack message in which a stakeholder requests a feature becomes a pull request, which goes through review and gets deployed — all fully automated by various agents.\u003c/p\u003e\n\n\u003cp\u003eThis is an understandable reflex, for several reasons. First, it sounds appealing to keep existing developers busy rather than shrinking development teams in response to higher productivity. And second, many companies have enormous backlogs that seem to offer an endless supply of work, and stakeholders — especially in senior leadership — always have new ideas or feature requests anyway.\u003c/p\u003e\n\n\u003cp\u003eSometimes the business side still takes too long to prepare their requirements. It becomes the bottleneck. So requirements engineering must be accelerated too. BMAD and similar approaches promise “hours instead of weeks” for specifications. Spec-driven development and AI-assisted requirements engineering are supposed to ensure that developers and their agents never get any slack — always busy, always shipping features to production at maximum speed. What emerges is a perfect output treadmill.\u003c/p\u003e\n\n\u003ch3 id=\"two-problems\"\u003eTwo problems\u003c/h3\u003e\n\n\u003cp\u003eBut this approach has two problems. First, the data from DORA and Faros AI show not only that more code is being produced, but also that code reviews take significantly longer and that software quality in agentically operating teams declines noticeably. The Faros AI report shows that developers in AI-assisted teams produce a significantly higher number of bugs. Compared to the period when AI adoption was still low, the number of incidents has roughly tripled. 
The DORA report shows an increase in incidents compared to previous years, including in teams that DORA classifies as high performers.\u003c/p\u003e\n\n\u003cp\u003eThe other problem is that features per unit of time is not a meaningful metric for determining whether you are on the right track. The interesting questions are these: do these features create value for users and for the business? Do they satisfy concrete user needs? Do they contribute to a desired strategic outcome?\nIf a feature was simply someone’s idea, whether a manager’s, an executive’s, or a customer’s, the answers to these questions might still be yes, but that would be more coincidence than design.\u003c/p\u003e\n\n\u003ch3 id=\"product-discovery-as-a-better-answer\"\u003eProduct discovery as a better answer\u003c/h3\u003e\n\n\u003cp\u003eReducing that uncertainty is the goal of product discovery, which relies on systematic methods like opportunity mapping, regular short interviews with (potential) users, solution ideation, and assumption testing. Ideally, there is a continuous cycle of discovery and delivery, which in turn feeds new discovery.\u003c/p\u003e\n\n\u003cp\u003eIn many organizations, however, systematic discovery simply does not happen. The number of unused features built at great expense because someone had a seemingly brilliant idea is considerable: according to a 2019 Pendo study, 80% of features in average software products are rarely or never used.\u003c/p\u003e\n\n\u003cp\u003eSo why not respond to the increased productivity in delivery by treating it as an opportunity to finally take the essential work of product discovery seriously? Instead of shipping more features in the same period, most of which are not used by anyone, we could simply maintain the old pace and use the freed-up time to systematically test whether the assumptions behind a feature even hold. Does the problem we want to solve actually exist? Interviews or observations with potential users provide clarity. 
Is the feature even a solution to the problem? There too, we want to reduce our uncertainty before moving into implementation.\u003c/p\u003e\n\n\u003cp\u003eAI is also changing the economics of product discovery, and it allows the cycle of discovery and delivery to run faster. But some assumptions can still be tested more quickly and cheaply without an agent generating a single line of code, and without deploying anything to production.\u003c/p\u003e\n\n\u003cp\u003eThat something is shifting dramatically here is also illustrated by the fact that IBM WatsonX is reportedly discussing a ratio of one product manager to half a developer, up from a traditional ratio of one product manager to eight developers, as Melissa Perri describes in a LinkedIn post.\u003c/p\u003e\n\n\u003cp\u003eProduct management is the new bottleneck. And the answer is not to compress the discovery phase into a few hours using BMAD or similar tools, but to use the space created by faster delivery for more and better discovery work. 
Fewer features in the pipeline also means fewer pull requests, less review pressure, and fewer incidents.\nA reversal of the ratio of product managers to developers of this magnitude will have consequences — for team structures, for roles, and for the question of what we as developers actually do.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/5y2k6qi2ify0zeslsxmhsrlab6ab?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/05/the-missing-half-of-your-data-strategy/","url":"https://www.innoq.com/en/blog/2026/05/the-missing-half-of-your-data-strategy/","date_published":"2026-05-04T00:00:00.000+00:00","date_modified":"2026-04-30T17:20:09.294+02:00","author":[{"name":"Stefan Negele","url":"https://www.innoq.com/en/staff/stefan-negele/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003ch3 id=\"the-interplay-between-data-literacy-and-data-access\"\u003eThe interplay between data literacy and data access\u003c/h3\u003e\n\n\u003cp\u003eThere are several definitions of data literacy. I really like the definition given by Jordan Morrow in his book \u003cem\u003eBe Data Literate\u003c/em\u003e:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eData literacy is the ability to read, work with, analyze, and communicate with data. (Morrow, 2024, p. 36)\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eMorrow writes that not everyone needs to become a data scientist, but people should have these skills to succeed in today’s world. But, in my view, such learning efforts remain largely theoretical without access to data. Data architectures based on federation and data products, such as Data Mesh, can alleviate this problem because companies that implement these concepts should allow access to all company data where legally possible. 
However, such open data availability leads to chaos without data literacy, because without the ability to understand data and use tools, the likelihood of misinterpretation is very high. Together, however, data availability and data literacy form a symbiotic cycle:\u003c/p\u003e\n\n\u003cp\u003e\u003cfigure style=\"max-width: 700px; margin: auto;\"\u003e\n\u003cimg alt=\"Segmented circle diagram with labels: Data literacy, Platform usage, Data variety, Data access, Curiosity, Learning.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/4e47fbyw7wbwg79nui3z735l8jqx?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\u003c/figure\u003e\u003c/p\u003e\n\n\r\n\n\u003cp\u003eWhen we are provided the opportunity to access data that is relevant to us 
(\u003cstrong\u003edata access\u003c/strong\u003e), \u003cstrong\u003ecuriosity\u003c/strong\u003e can develop. This curiosity creates the basis for motivation to \u003cstrong\u003elearn\u003c/strong\u003e, which leads to new skills in working with data. This \u003cstrong\u003edata literacy\u003c/strong\u003e has the desired effect: the \u003cstrong\u003eplatform is used\u003c/strong\u003e, which in turn leads to greater \u003cstrong\u003edata variety\u003c/strong\u003e on the platform. This makes further data available, which in turn may be of interest to others, and the cycle continues.\u003c/p\u003e\n\n\u003cp\u003eA well-designed data platform can reinforce this cycle by making data accessible and supporting users in handling it responsibly. Architectures that distribute data responsibility to the teams who know the data best can provide this foundation. By this, I mean architectures in which a central data team does not manage and provide all data, but rather domain teams act as autonomous owners of their data products. A data product is more than just a dataset; it is a clearly defined, documented and reliably maintained offering of data, developed specifically for use by others. In this article, when I refer to data product oriented architectures, I am not necessarily referring to a fully mature Data Mesh as described by Zhamak Dehghani. I am presenting a spectrum ranging from the initial steps towards decentralised data ownership to the full implementation of the four principles. However, the core idea remains: data responsibility belongs with the teams that know the data.\u003c/p\u003e\n\n\u003cp\u003eThe rapid rise of large language models has made this cycle even more urgent, making it clear how much the value of these technologies depends on the quality of the underlying data. Organisations without reliable, well-structured and accessible data will find little value in AI initiatives. 
Therefore, data literacy and solid data architecture are not only prerequisites for data-driven decision-making, but also the foundation for the meaningful use of AI.\u003c/p\u003e\n\n\u003cp\u003eBut how can we ensure that users actually start using the tools, understand how to use them properly, and know how to interpret data correctly? The answer is straightforward, but implementing it is anything but: organizations need employees who are data literate.\u003c/p\u003e\n\n\u003ch3 id=\"what-s-in-it-for-me-generating-interest-in-data-literacy\"\u003e“What’s in it for me?”: Generating interest in data literacy\u003c/h3\u003e\n\n\u003cp\u003eThe motivation to work with data does not always come naturally. In his book \u003cem\u003eHumanizing Data Strategy\u003c/em\u003e, Tiankai Feng (2024) describes several exciting ways in which data strategies can be implemented sustainably. However, I would like to highlight one particular point: the individual benefits for employees are at least as motivating as the benefits for the organization. People use data when it makes their everyday work easier. Once they have mastered the tools on offer, workflows can be automated and thus simplified. If teams can increase their productivity and innovative strength, this helps not only their personal success and development but also the entire organization (Feng, 2024).\u003c/p\u003e\n\n\u003cp\u003eManagers have a unique role to play as role models. They exemplify working with data and create incentives to engage with it. This means they work with the data themselves and demonstrate this to their employees, encouraging them to do the same. It is crucial to communicate not only with data but also about data and the successes that have been achieved through its use. 
Examples include publishing data projects that have been successfully implemented or using gamification in learning path design.\u003c/p\u003e\n\n\u003ch3 id=\"enabling-team-community-of-practice\"\u003eEnabling Team \u0026amp; Community of Practice\u003c/h3\u003e\n\n\u003cp\u003eThe development of data literacy is supported by social structures. One such structure is the enabling team. This type of team is described in detail in Chapter 5 of \u003cem\u003eTeam Topologies\u003c/em\u003e (Skelton \u0026amp; Pais, 2019). An enabling team works closely with the domain teams, coaching, accompanying, and supporting the practical development of data products through workshops, self-service offerings, or by actively fostering collaboration between teams that exchange data. The successes of an enabling team are almost always indirect, which is why clear support from managers is necessary. They must protect time for learning and reward collaboration (Skelton \u0026amp; Pais, 2019).\u003c/p\u003e\n\n\u003cp\u003eA community of practice dedicated to the topic of data keeps the learning process alive. It connects everyone who works with data, enables the exchange of positive and negative experiences, creates knowledge, for example about data quality, and establishes common standards. In this way, it complements the enabling team: the enabling team specifically creates new skills, while the community ensures that this knowledge is broadly anchored and that the cognitive load is reduced by pre-filtering relevant content for the organization. Together, they make learning a continuous, social process.\u003c/p\u003e\n\n\u003cp\u003eIntroducing a data product oriented architecture is not merely a project plan but rather the establishment of a learning system. To provide our employees with appropriate learning paths, learning aids, and tasks, we need to know them very well and group them according to skill level and areas of interest. 
Enabling teams can identify knowledge gaps and a good community promotes effective working methods and provides a forum for discussion of controversial topics. These feedback loops keep the transformation toward a data-driven organization going and make the data platform a living system.\u003c/p\u003e\n\n\u003ch3 id=\"the-platform-as-a-catalyst\"\u003eThe platform as a catalyst\u003c/h3\u003e\n\n\u003cp\u003eIn addition to social structures, the implementation of the technical platform is a key pillar for successfully rolling out a data product oriented architecture. The platform should promote engagement, not discourage it. This means that users must be the focus when selecting or building tools. For example, we use tools such as Terraform, which software engineers are already familiar with, for deploying data products. For non-technical users, we link data sources to familiar programs such as Microsoft Excel or Google Sheets.\u003c/p\u003e\n\n\u003cp\u003eThe familiar tooling is not always sufficient to offer competent users all the options they need to make full use of the data. For example, users with a certain level of maturity in data analysis may also want to use BI tools such as Microsoft Power BI or Tableau. When it comes to software and data engineering tools, a basic knowledge of a tool is often not enough to be able to implement it in a specific environment without tedious trial and error. Clear documentation, simple onboarding processes, sensible defaults, and automated quality checks must be part of the design.\u003c/p\u003e\n\n\u003cp\u003eIf tools are unfamiliar, they should be learnable within a reasonable amount of time because otherwise, the wrong lesson will be taught. It is important that employees internalize working with data. Programs and software libraries should primarily be understood as vehicles. This does not mean that the introduction of a new technology cannot serve as motivation. 
This is particularly appealing to tech-savvy employees. However, a technically brilliant but cognitively overwhelming platform is usually not accepted by users.\u003c/p\u003e\n\n\u003ch3 id=\"making-success-measurable\"\u003eMaking success measurable\u003c/h3\u003e\n\n\u003cp\u003eWe cannot force acceptance of our platform. Since not every organization is the same, there is no one-size-fits-all solution when it comes to creating learning paths and selecting tools. However, we can demonstrate our own data expertise by monitoring and analyzing the rollout process to make adjustments where necessary.\u003c/p\u003e\n\n\u003cp\u003eRelevant metrics include, for example:\u003c/p\u003e\n\n\u003ch4 id=\"competence-building-are-people-actually-developing-new-skills\"\u003eCompetence building — Are people actually developing new skills?\u003c/h4\u003e\n\n\u003cul\u003e\n\u003cli\u003eWorkshop participation indicates interest and shows that our measures are being adopted.\u003c/li\u003e\n\u003cli\u003eThe number of data products independently delivered by domain teams shows that what has been learned is also being applied.\u003c/li\u003e\n\u003cli\u003eRegular surveys on data confidence reveal whether employees feel secure enough to use data in their decisions.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch4 id=\"cultural-signals-difficult-to-measure-but-particularly-meaningful\"\u003eCultural signals — difficult to measure, but particularly meaningful\u003c/h4\u003e\n\n\u003cul\u003e\n\u003cli\u003eWhen data products are reused across domain boundaries, genuine collaboration emerges.\u003c/li\u003e\n\u003cli\u003eFewer ad hoc requests to the central data team indicate that teams are increasingly able to work independently.\u003c/li\u003e\n\u003cli\u003eInternal and external talks or blog posts by employees on data usage signal that a data culture is not only being lived but actively passed on.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch4 
id=\"platform-usage-is-the-platform-making-an-impact-in-day-to-day-work\"\u003ePlatform usage — Is the platform making an impact in day-to-day work?\u003c/h4\u003e\n\n\u003cul\u003e\n\u003cli\u003eThe number of active users and published or reused data products indicates whether the system is growing or stagnating.\u003c/li\u003e\n\u003cli\u003eTime to data access is particularly critical: if it is too long, the cycle of curiosity and learning breaks down right at the start.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch4 id=\"platform-usability-useful-for-real-people-with-real-tasks\"\u003ePlatform usability — Useful for real people with real tasks?\u003c/h4\u003e\n\n\u003cul\u003e\n\u003cli\u003eWhich features are preferred, which are avoided?\u003c/li\u003e\n\u003cli\u003eWhere do users abandon interaction paths?\u003c/li\u003e\n\u003cli\u003eAre data products found and then actually used?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eWe should not use these metrics to make ourselves look good to the outside world or gloss over the success of our projects. They are mirrors that show whether people are learning, working together, and trusting the system. And that is precisely how we must use them.\u003c/p\u003e\n\n\u003ch3 id=\"the-elephant-in-the-room-data-literacy-ai\"\u003eThe elephant in the room: Data Literacy \u0026amp; AI\u003c/h3\u003e\n\n\u003cp\u003eFew topics currently dominate corporate discussions as much as artificial intelligence. AI tools are being connected directly to company data more and more. The idea is simple: ask a question and receive an immediate answer. “Talk to your data” is no longer just a vision for the future; the actual possibilities are impressive. Relationships that previously required deep specialist knowledge and considerable time can now be explored in a matter of minutes. 
This significantly lowers the barrier to entry and can considerably speed up the cycle of data access, curiosity and expertise.\u003c/p\u003e\n\n\u003cp\u003eHowever, this opportunity also brings with it new responsibilities. Large language models can produce answers that are factually incorrect but sound convincing. Those unfamiliar with the underlying data and lacking basic skills in working with it will not recognise such errors. Therefore, data literacy is not an alternative to AI tools, but a prerequisite for using them responsibly. It enables people to question, contextualise and act on answers with care.\u003c/p\u003e\n\n\u003cp\u003eI see chatbots and similar applications as further tools in our ecosystem to which data should be made available in a targeted way, just as with Excel or Power BI. The difference is that errors are harder to spot here because the output appears as a definitive statement rather than a traceable formula. In this context, data literacy means being able to recognise errors and other inconsistencies.\u003c/p\u003e\n\n\u003ch3 id=\"references\"\u003eReferences\u003c/h3\u003e\n\n\u003cp\u003eMorrow, J. (2024). Be Data Literate: The data literacy skills everyone needs to succeed (2nd ed.). Kogan Page.\u003c/p\u003e\n\n\u003cp\u003eFeng, T. (2024). Humanizing Data Strategy: Leading Data with the Head and the Heart. Technics Publications.\u003c/p\u003e\n\n\u003cp\u003eSkelton, M., \u0026amp; Pais, M. (2019). Team Topologies: Organizing Business and Technology Teams for Fast Flow. IT Revolution Press.\u003c/p\u003e\n\u003cp\u003eA successful data platform not only enables easy access to data but also learning with data. Moreover, it conveys an understanding of how insights can be derived from data and turned into direct, tangible value. Democratizing access to data arouses curiosity, and data literacy transforms that curiosity into competence. Acceptance arises naturally when people learn, share, and trust data. 
In this way, our data initiative evolves from a technical project into a genuine learning system.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/d7r01zjj80e0x3ot4emz4ckr87cu?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/05/self-made-souv-nebu-chat/","url":"https://www.innoq.com/en/blog/2026/05/self-made-souv-nebu-chat/","date_published":"2026-05-03T00:00:00.000+00:00","date_modified":"2026-04-30T16:27:08.598+02:00","author":[{"name":"Philipp Beyerlein","url":"https://www.innoq.com/en/staff/philipp-beyerlein/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003e\u003cem\u003eInspired by the \u003ca href=\"https://di.day/de\"\u003eDigital Independence Day\u003c/a\u003e and its call for “recipes” for digital sovereignty, we share our own approaches to topics that matter to our customers – every first Sunday of the month.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eIf you google “Slack alternative open source”, you’ll get results quickly. Mattermost, Rocket.Chat, Element. All open source, all promising. Their websites promise enterprise communication without vendor lock-in, complete with UI screenshots that look just like Slack. Channels, threads, file sharing, integrations. You think: This could work. So you click through the feature lists.\u003c/p\u003e\n\n\u003cp\u003eAnd then you see the “Enterprise” column: SSO integration, LDAP connectivity, compliance exports, clustering. All behind a commercial license. The Community Edition is cut down so far that you can’t run it in production as a company. Not by accident—on purpose. That’s how the business model works. The Community Edition exists so you can try the product. 
It does not exist so you can run it in production.\u003c/p\u003e\n\n\u003cp\u003eWhat’s frustrating: in other software categories, the open-source world solved this problem long ago. Nextcloud is the best example. A company can install Nextcloud, run it in production, and retain full control over its data. SSO works in the Community Edition. Clustering is possible. There’s a vibrant community evolving the product, plus a commercial support option—but no artificial feature barrier that blocks production use. Enterprise chat is different. There is no Nextcloud for chat. There’s only a storefront with a license cashier.\u003c/p\u003e\n\n\u003cp\u003eAnd that’s the real issue: nothing changes about sovereignty. If you move from Slack to Mattermost Enterprise, you’re swapping the badge, not the dependency. The invoice comes from a different vendor, and someone else still sets the roadmap. That’s not sovereign.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"the-licensing-trap\"\u003eThe licensing trap\u003c/h3\u003e\n\n\u003cp\u003eWhat shows up in enterprise chat isn’t an isolated case. It’s a business model with its own name: Open Core. The idea sounds reasonable at first: the core of the software is open source; anyone can use it, modify it, and run it. The community benefits, and the company behind it makes money with enterprise features, support, and hosting. That’s the theory.\u003c/p\u003e\n\n\u003cp\u003eIn practice, it looks different. The Community Edition is deliberately stripped down until it’s useless for professional use. Not because the features are technically hard. But because those features are what make the Enterprise Edition commercially valuable. SSO is the most obvious example, because in virtually every company above a certain size it’s table stakes. If employees can’t log in via centralized identity management, you’re running an isolated solution. 
And that feature reliably sits behind the license wall.\u003c/p\u003e\n\n\u003cp\u003eYou see the same pattern with clustering, audit logging, compliance exports. It’s never niche functionality that’s missing. It’s always the features without which you can’t operate the system in production in an enterprise context. That’s not an accidental gap. It’s the architecture of the business model.\u003c/p\u003e\n\n\u003cp\u003eAnd this model isn’t limited to chat software. In 2024, Redis moved from a permissive open-source license to a restrictive one. The Community Edition can do clustering, but without automatic shard management, without multi-tenancy, and without the high availability you need for production. If you need that, you end up on Redis Enterprise. Same open-core pattern—this time in infrastructure. HashiCorp did the same in 2023 with Terraform and Vault. MongoDB went down the same road years earlier. The license you relied on when you adopted the software can change years later, and there’s nothing you can do about it. Then you’re left with a choice: pay, migrate, or accept the risk. None of those options is sovereign.\u003c/p\u003e\n\n\u003cp\u003eThe problem is not that companies pay for software. Funding good software is right and important. The problem is that “open source” gets used as a promise of independence, while the business model behind it systematically prevents that independence. If you move from Slack to Mattermost Enterprise, you get a new logo on the invoice. The dependency structure is the same.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"a-protocol-not-a-product\"\u003eA protocol, not a product\u003c/h3\u003e\n\n\u003cp\u003eWith Matrix, the situation differs from Mattermost or Rocket.Chat. Matrix is not a company product offering a Community Edition as bait. Matrix is an open standard. The protocol is publicly specified, actively developed, and defines how servers and clients communicate. 
Anyone can write their own implementation, and existing clients will work with it without changes. Element, Cinny, FluffyChat: all of these clients speak the Matrix protocol and connect to any server that correctly implements the Client-Server API.\u003c/p\u003e\n\n\u003cp\u003eThat’s a crucial difference. If you build your own Mattermost server, you don’t have clients. If you build your own Matrix server, you instantly have an entire ecosystem of mature clients that just work.\u003c/p\u003e\n\n\u003cp\u003eBut this is exactly where the problem starts. On the server side, it looks bleak. Synapse, the best-known and most mature implementation, is licensed under AGPLv3. That means: any modification must be published as open source as soon as the server is provided over a network. For companies that need a customized permission model or industry-specific compliance functions, that’s a deal-breaker. Dendrite and Conduit are Apache 2.0 licensed, but neither is production-ready. No real clustering, open performance questions, no enterprise feature set.\u003c/p\u003e\n\n\u003cp\u003eThe gap is real: there is no Matrix server that combines a permissive license, horizontal scalability, and the features companies and public-sector organizations need. The choice is Synapse with licensing issues, immature alternatives, or Element’s commercial solution—which merely shifts the dependency problem elsewhere.\u003c/p\u003e\n\n\u003cp\u003eI’d been thinking this for a while: why doesn’t anyone build it themselves? The protocol is open, the clients exist. What’s missing is the server. But a chat system is complex. Business-critical. Thousands of users depend on it working, administered by two or three admins. In classic software projects, you need a specialized team that knows distributed systems, real-time communication, and cryptography. That’s expensive and hard to justify for a single project. As a one-person show, it’s simply not feasible. 
Not because the knowledge is unavailable, but because the architecture, implementation, and sheer breadth of technical decisions are too much for one person.\u003c/p\u003e\n\n\u003cp\u003eSo the thought stayed just that—a thought. Until the conditions changed.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"so-i-m-building-it-myself\"\u003eSo I’m building it myself\u003c/h3\u003e\n\n\u003cp\u003eAgentic software development—working with LLM-assisted agents that independently solve sub-tasks—has changed what a single person can accomplish. Not because an LLM writes better code than an experienced developer. But because it reduces conceptual work to a level that used to be unthinkable. What previously required months of architecture work by a specialized team can now be worked out in structured sessions—if you have the right process and can evaluate the results.\u003c/p\u003e\n\n\u003cp\u003eWhat matters is the distinction: this is not vibe coding. Not a one-shot run where you tell an LLM “build me a chat server” and hope something usable comes out. The Matrix community has already lived through that. In January 2026, Cloudflare published a Matrix homeserver on Cloudflare Workers, marketed as “production-grade” with a deploy button. Matthew Hodgson from the Matrix Foundation dismantled the implementation within hours \u003ca href=\"https://matrix.org/blog/2026/01/28/matrix-on-cloudflare-workers/\"\u003epublicly\u003c/a\u003e: no state resolution, no permission checks, no real authentication. Hodgson compared it to “a filesystem that ignores permissions, or a blockchain without a consensus mechanism.” The community immediately recognized the typical traits of LLM-generated code: TODO comments in critical security logic, plausible structure without functional depth. 
Shortly after publication, Cloudflare had to update the post with a disclaimer, reframing it as a proof of concept.\u003c/p\u003e\n\n\u003cp\u003eWhat’s happening here instead is spec-driven development with the BMAD framework: a structured process for agentic product development. A flow that goes from product vision through architectural decisions to epics and implementation plans. The result is not code you accept blindly, but real engineering: a complete Product Requirements Document, a well-considered architecture, clean epics. Produced by a human, supported by agents.\u003c/p\u003e\n\n\u003cp\u003eI tried it. Not as an experiment, but with a concrete product vision: a Matrix-compatible enterprise chat server. Apache 2.0. Horizontal clustering. SSO by default. GDPR-ready with audit logs and compliance access. Everything the licensing trap keeps from me, I’ll build myself.\u003c/p\u003e\n\n\u003cp\u003eThe result is called Nebu. \u003cem\u003eNamed after the Nebuchadnezzar from the film The Matrix—the ship that freed people from the Matrix.\u003c/em\u003e\u003c/p\u003e\n\n\u003ch4 id=\"remove-complexity-wherever-possible\"\u003eRemove complexity wherever possible\u003c/h4\u003e\n\n\u003cp\u003eThe first architectural decision was the most important: implement only the Matrix Client-Server API and skip federation entirely. Federation—the ability for different Matrix servers to communicate—is the most complex part of the protocol. State resolution, server-to-server authentication, key exchange: that’s roughly 40% of the overall complexity. Enterprise deployments run on their own instance anyway. Dropping federation isn’t a compromise; it’s a deliberate decision to drastically reduce complexity. 
And the key point: all standard clients still work without any changes.\u003c/p\u003e\n\n\u003ch4 id=\"the-licensing-trap-in-my-own-design\"\u003eThe licensing trap in my own design\u003c/h4\u003e\n\n\u003cp\u003eThen came the moment that perfectly captured this article’s core thesis. I look at the first blueprint. Redis for session caching, NATS for the internal message bus. Both standard choices, both technically fine. Then I check the licenses. Redis: SSPL since 2024. For production clustering you need a commercial license. I stare at my own design and realize: the licensing trap that brought me to this project in the first place is baked into my first blueprint. And it got worse: if you look at what Element Server Suite Pro uses under the hood, you’ll find Synapse Pro with Redis as a central component for worker communication when scaling horizontally. Redis—under SSPL since 2024. The vendor that’s supposed to free you from Slack dependency builds its solution on a component that is itself caught in the licensing trap.\u003c/p\u003e\n\n\u003cp\u003eSo: out it goes. Elixir brings everything you’d otherwise need external components like Redis or NATS for. The built-in in-memory store ETS replaces Redis—no network hop, no extra license. Process groups take over the internal message bus. Elixir distribution with TLS connects the cluster nodes. No external dependency, everything in one platform.\u003c/p\u003e\n\n\u003cp\u003eI don’t know Elixir in depth myself. But I do know WhatsApp runs on Erlang/OTP, and OTP’s actor model was built for exactly this kind of real-time messaging. With agentic development, I don’t need Elixir expert knowledge. I need the architectural decision that Elixir is the right choice—and the ability to evaluate the results. The agent does the rest.\u003c/p\u003e\n\n\u003cp\u003eAnd this is where an advantage hides in plain sight: because I’m not writing the code myself, the technology choice can be driven purely by the product vision. 
Elixir is the optimal language for real-time messaging. Go is optimal for HTTP gateways. In a classic project, I’d have to compromise because the team only knows certain languages. Everything would probably end up in Go because the available developers can do Go. That would work—but without the benefits of a language optimized for this exact use case. And even if you pick the optimal stack, you suddenly need Go developers and Elixir developers, which makes the team larger and the project more expensive. With agentic development, that compromise goes away. The best technology for the job wins—not the one that happens to be available on the team.\u003c/p\u003e\n\n\u003ch4 id=\"the-result-three-components-no-proprietary-licenses\"\u003eThe result: three components, no proprietary licenses\u003c/h4\u003e\n\n\u003cp\u003eIn the end, there are three runtime components: a Go binary for the API and media gateways; an Elixir/OTP release for core messaging, session management, and presence; PostgreSQL as the database. No Redis. No NATS. All components are Apache 2.0 or BSD licensed. No additional licenses. No dependency chain.\u003c/p\u003e\n\n\u003cp\u003eAuthentication is OIDC-first. Nebu does not manage passwords. Identity comes exclusively from an external OIDC provider—whether that’s Dex, Keycloak, or Azure AD doesn’t matter. The protocol abstracts the provider. This has a concrete security benefit: no in-house password management eliminates a whole category of vulnerabilities.\u003c/p\u003e\n\n\u003cp\u003eOn top of that: compliance access with a two-person approval principle and an immutable audit log; cryptographic signatures on every message for message integrity; and full-text search directly via PostgreSQL FTS without an external search component. 
Everything that elsewhere requires an enterprise license is part of the architecture from day one.\u003c/p\u003e\n\n\u003cp\u003eThe foundation: twelve Architecture Decision Records, eight development phases, a complete database schema. What it turned into: a Matrix server that works with Element and FluffyChat. SSO via OIDC from the very first instance. \u003ccode\u003edocker compose up\u003c/code\u003e, ten minutes, running. Audit log and compliance access with the two-person approval principle are included without a paywall. Everything under Apache 2.0: \u003ca href=\"https://github.com/innoq/nebu\"\u003egithub.com/innoq/nebu\u003c/a\u003e and \u003ca href=\"https://gitlab.opencode.de/nebu/nebu-server\"\u003egitlab.opencode.de/nebu/nebu-server\u003c/a\u003e.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"the-new-equation\"\u003eThe new equation\u003c/h3\u003e\n\n\u003cp\u003eWhat Nebu shows is not a one-off. It’s a pattern, and this pattern has two variants.\u003c/p\u003e\n\n\u003cp\u003eThe first is the Nebu variant: there’s an open protocol or open standard, but no usable implementation with a suitable license. In that case, building it yourself is the answer. That sounds like a lot—and it’s not trivial. But you don’t start from scratch. Most of it is handled by proven foundational technologies that remain untouched: Go, Elixir/OTP, PostgreSQL, and all the surrounding infrastructure. The protocol is specified as an open standard and doesn’t need reinventing. The Matrix ecosystem provides finished clients; they work without customization. What remains is the actual server implementation. That’s a substantial project, but not one that requires a team if the process is right.\u003c/p\u003e\n\n\u003cp\u003eThe second variant applies to far more companies: an open-source solution with a suitable license exists, it’s stable, it covers 90–95% of requirements. But the last 5% are missing. 
A specific permission model, an integration the vendor doesn’t prioritize, a compliance feature only your industry needs. Until now, the answer was: buy the commercial version or pay an expensive service provider to get up to speed in the repository. The SAP effect: it fits 90% for everyone, but the last 10% are the most expensive because you need an integrator deep enough inside the system to adapt the application to your domain.\u003c/p\u003e\n\n\u003cp\u003eAgentic development fundamentally changes that math. Getting up to speed in a foreign repository, understanding the architecture, finding the right extension points—that’s exactly what an agent can do. Not perfectly, not without oversight, but quickly and thoroughly enough to make a fork realistic where it previously wasn’t economically viable. The missing feature that forced you into a buy decision can now be built in-house.\u003c/p\u003e\n\n\u003cp\u003eThe classic counterargument is maintenance: a fork that drifts from upstream, creates merge conflicts, and ultimately depends on one developer who must never quit. That argument applies to code written directly into the fork. With spec-driven development, you instead produce a specification of your customizations, which an agent reapplies on every upstream update. That’s still a hypothesis—but one Nebu itself will put to the test.\u003c/p\u003e\n\n\u003cp\u003eBoth variants lead to the same outcome: the trade-off between “open source, but incomplete” and “complete, but dependent” dissolves. Not for every problem. But for every problem where the gap between what’s freely available and what you actually need is manageable.\u003c/p\u003e\n\n\u003cp\u003eWho benefits? Companies with an in-house IT team that had no real alternative at the last license renewal. Organizations with strict compliance requirements. Public-sector bodies with on-prem mandates. 
What it takes: a senior developer who can own code, domain, and architecture together; a structured process; and the decision to actually do it.\u003c/p\u003e\n\n\u003cp\u003eAt the decision-maker level, you can reduce the approach to a simple equation. A Mattermost Enterprise license for 500 users is in the tens of thousands of euros per year; an Element server contract in a similar range; plus integrator costs and ongoing dependency on the vendor’s pricing and roadmap. On the other side are LLM usage costs for the architecture work, the working time of a senior developer, and infrastructure costs for running it yourself. You have to budget staff time when introducing new software in any case, regardless of the alternative. In a greenfield project like Nebu, the LLM costs were in the low hundreds; in a fork that reuses 90% of the architecture, that’s effectively the cap. No annual license renewal where the price goes up because the vendor knows switching costs are higher. No risk that a license change like Redis or HashiCorp can call your infrastructure into question overnight. What remains is an asset the company owns—one that can be developed further internally and whose costs are predictable.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"limits-honestly-stated\"\u003eLimits, honestly stated\u003c/h3\u003e\n\n\u003cp\u003eThis approach is not a cure-all, and it shouldn’t be sold as one.\u003c/p\u003e\n\n\u003cp\u003eAgent-generated architecture and code must be evaluated by someone who understands the consequences. An architecture can sound plausible and still contain fundamental flaws. Agentic development accelerates execution, not the judgment you need to bring. The Cloudflare disaster shows what happens when that evaluation is missing: plausible-looking code that fails when it matters and a community that loses trust.\u003c/p\u003e\n\n\u003cp\u003eNot everything should be built in-house. 
Kubernetes, PostgreSQL, nginx, Keycloak: for proven infrastructure, there’s no justification for developing your own solutions. The approach works where a real gap exists. Where the gap doesn’t exist, it’s waste.\u003c/p\u003e\n\n\u003cp\u003eOperations and maintenance remain your responsibility. Sovereignty also means owning security patches, database migrations, backups, and monitoring. That’s not a downside—it’s the price of control. If you can’t or don’t want to cover that internally, you should assess that honestly before you start.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"self-made-sovereignty\"\u003eSelf-made sovereignty\u003c/h3\u003e\n\n\u003cp\u003eThe licensing trap works because it rests on an assumption: that “building it yourself” is too expensive, too complex, and too risky. That companies have no choice but to pay or accept it. That assumption was true for a long time. It isn’t anymore.\u003c/p\u003e\n\n\u003cp\u003eAgentic software development has changed the equation. Not because it makes developers obsolete, but because it enables a single experienced developer to do what used to require a team: choose the best technology for the job, independent of which languages are available on the team; get up to speed in a foreign repository and implement the missing 5%; or, if necessary, build an entire implementation from scratch because no existing product meets the requirements with the right license.\u003c/p\u003e\n\n\u003cp\u003eNebu shows that this path is viable. Not as a thought experiment, but as an ongoing project: Apache 2.0, three runtime components, no licensing dependencies. Element and FluffyChat connect. \u003ca href=\"https://github.com/innoq/nebu\"\u003egithub.com/innoq/nebu\u003c/a\u003e is a project that would not exist as a one-person show without agentic development.\u003c/p\u003e\n\n\u003cp\u003eAny company staring at a feature matrix today and looking at the locked “Enterprise” column now has a new option. Not “live with it” or “pay for it.” But: build it yourself. The tools are here. 
The processes are proven. The barrier is lower than ever. If you have a product vision and someone who can evaluate it, you have everything you need today to turn a dependency into your own solution.\u003c/p\u003e\n\n\u003cp\u003eDigital sovereignty used to be a resource problem. Today it’s a decision. I made mine.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"glossary\"\u003eGlossary\u003c/h3\u003e\n\n\u003cp\u003e\u003cstrong\u003eAgentic software development\u003c/strong\u003e — Software development with LLM-assisted agents that independently solve multi-step tasks: writing code, analyzing repositories, running tests. Unlike a single prompt, agents work in structured, iterative processes under human guidance.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eBMAD framework\u003c/strong\u003e — A framework for structured agentic product development, developed by the BMAD community (bmad-method.org). BMAD systematically moves from product vision to architectural decisions to implementation-ready epics.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eSpec-Driven Development (SDD)\u003c/strong\u003e — A development approach where requirements and changes are expressed as a formal specification before an agent turns them into code. Unlike unstructured code generation, SDD focuses on specification before implementation, making the process traceable and repeatable when needed.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eOpen Core\u003c/strong\u003e — A business model where the core of a software product is published as open source, while business-critical features are placed behind a commercial license.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eLicense types in this article\u003c/strong\u003e — \u003cem\u003ePermissive licenses\u003c/em\u003e (Apache 2.0, BSD, MIT) allow free use and modification, including commercially. \u003cem\u003eAGPL\u003c/em\u003e (Affero GPL) is a copyleft license that requires source-code publication when provided over a network. 
\u003cem\u003eSSPL\u003c/em\u003e (Server Side Public License) and \u003cem\u003eBSL\u003c/em\u003e (Business Source License) are restrictive licenses that limit commercial use and are used in the article as examples of license changes.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eMatrix protocol\u003c/strong\u003e — An open standard for decentralized, encrypted real-time communication. The protocol specifies how servers and clients interact. Anyone can write their own implementation, and existing clients like Element, Cinny, or FluffyChat work with any standards-compliant server.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eFederation\u003c/strong\u003e — The ability of different Matrix servers to communicate with one another. Similar to email, users on different servers can exchange messages. Technically, it’s the most complex part of the protocol.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eOIDC (OpenID Connect)\u003c/strong\u003e — An authentication standard that delegates identity verification to an external provider. In practice: login via the existing corporate identity provider instead of in-house passwords.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eElixir/OTP\u003c/strong\u003e — A programming language and platform running on the Erlang VM. OTP (Open Telecom Platform) provides built-in tools for concurrent, fault-tolerant, distributed systems. WhatsApp uses the same platform for large-scale real-time messaging.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eETS (Erlang Term Storage)\u003c/strong\u003e — An in-memory data store built into the Erlang/OTP platform. 
In Nebu, ETS replaces the external Redis component for session caching and temporary data.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch3 id=\"sources\"\u003eSources\u003c/h3\u003e\n\n\u003cul\u003e\n\u003cli\u003eMatrix protocol and specification: \u003ca href=\"https://matrix.org\"\u003ematrix.org\u003c/a\u003e\n\u003c/li\u003e\n\u003cli\u003eCloudflare Workers Matrix homeserver, analysis by Matthew Hodgson: \u003ca href=\"https://matrix.org/blog/2026/01/28/matrix-on-cloudflare-workers/\"\u003ematrix.org/blog/2026/01/28/matrix-on-cloudflare-workers\u003c/a\u003e\n\u003c/li\u003e\n\u003cli\u003eElement Server Suite Pro architecture: \u003ca href=\"https://docs.element.io/latest/element-server-suite-pro/introduction-to-ess-pro/#architecture\"\u003edocs.element.io\u003c/a\u003e\n\u003c/li\u003e\n\u003cli\u003eRedis license change to SSPL (2024): \u003ca href=\"https://redis.io/blog/redis-adopts-dual-source-available-licensing/\"\u003eredis.io/blog/redis-adopts-dual-source-available-licensing\u003c/a\u003e\n\u003c/li\u003e\n\u003cli\u003eHashiCorp switch to BSL (2023): \u003ca href=\"https://www.hashicorp.com/blog/hashicorp-adopts-business-source-license\"\u003ehashicorp.com/blog/hashicorp-adopts-business-source-license\u003c/a\u003e\n\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/cu2gv62b188umjq5pcon6x2x9ok4?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/","url":"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/","date_published":"2026-04-30T00:00:00.000+00:00","date_modified":"2026-04-27T17:20:41.928+02:00","author":[{"name":"Daniel Westheide","url":"https://www.innoq.com/en/staff/daniel-westheide/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" 
\"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eOver the last ten years we have been able to make an interesting observation: there has been a wave of digitalization and modernization projects. Around the same time, Domain-Driven Design went mainstream. Tactical patterns like Aggregate or Repository, as well as strategic patterns like Context Maps or Bounded Contexts, are everywhere today – including, and especially, in those modernization projects.\u003c/p\u003e\n\n\u003cp\u003eAt first glance this is merely a temporal correlation. But there is good reason to believe that more than coincidence lies behind the simultaneous rise of modernization projects and the growing popularity of DDD. That is what we want to explore in this post.\u003c/p\u003e\n\n\u003ch3 id=\"all-it-can-do-is-control-it\"\u003e“All it can do is control it”\u003c/h3\u003e\n\n\u003cp\u003eThe canonical source on Domain-Driven Design is Eric Evans' book of the same name\u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fn:1\" id=\"fnref:1\" title=\"see footnote\" class=\"footnote\"\u003e[1]\u003c/a\u003e. Martin Fowler writes in the foreword that the complexity at the heart of a software system cannot be avoided – all software can do is control it. 
Evans himself is no less explicit in the preface: the most significant complexity of many applications lies not in the technology, but in the domain itself, and that complexity must be handled.\u003c/p\u003e\n\n\u003cp\u003eNot a single pattern in the blue book focuses on questioning complexity, let alone eliminating it.\u003c/p\u003e\n\n\u003ch3 id=\"essential-or-accidental\"\u003eEssential or accidental?\u003c/h3\u003e\n\n\u003cp\u003eIn 1986, Fred Brooks distinguished between \u003cem\u003eessential complexity\u003c/em\u003e and \u003cem\u003eaccidental complexity\u003c/em\u003e in \u003cem\u003eNo Silver Bullet\u003c/em\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fn:2\" id=\"fnref:2\" title=\"see footnote\" class=\"footnote\"\u003e[2]\u003c/a\u003e. Essential complexity lives in the problem itself – it cannot be argued away. Accidental complexity is made: grown historically, shaped by organizational structures, never questioned by anyone.\u003c/p\u003e\n\n\u003cp\u003eEvans assumes in the blue book that the domain complexity we encounter is genuinely essential. The domain complexity is taken as given. DDD offers tools to make it manageable. Aggregates, Bounded Contexts, Anti-Corruption Layers, Domain Events – these are all tools for exactly that task. Not a single pattern asks whether this complexity actually needs to exist.\u003c/p\u003e\n\n\u003cp\u003eA significant part of what we consider essential complexity is in reality homegrown. Processes that have “always been this way” are no longer perceived as requiring justification. Their mere existence is taken as proof of their necessity. This is not a DDD problem. 
It is better explained by the psychological phenomenon of \u003ca href=\"https://en.wikipedia.org/wiki/Status_quo_bias\"\u003estatus quo bias\u003c/a\u003e, well described by Kahneman and Tversky: losses weigh psychologically heavier than equivalent gains, and giving up the status quo feels like a loss.\u003c/p\u003e\n\n\u003cp\u003eDDD is the right tool for essential complexity. The problem is that the methodology offers no answer to the question of how to recognize whether complexity is truly essential. And because DDD’s tools are so good at making complexity manageable, the pressure that would be needed to even ask that question subsides. It is like a very effective painkiller: once the symptoms are well treated, nobody asks about the cause anymore.\u003c/p\u003e\n\n\u003ch3 id=\"what-happens-in-practice\"\u003eWhat happens in practice\u003c/h3\u003e\n\n\u003cp\u003eFrom my own observations across a wide range of projects, I can say that the use of tactical DDD patterns has become an unquestioned modus operandi. That is not necessarily wrong – they are a useful shared design language, by now almost a lingua franca. But it is also a symptom of complexity being increasingly accepted as a normal state of affairs.\u003c/p\u003e\n\n\u003cp\u003eWith the strategic patterns, things get more interesting. Tools like Bounded Contexts, Context Maps, and Subdomains are explicitly designed to find the \u003cem\u003eright\u003c/em\u003e cuts. Evans and Vernon would never say: trace the existing departmental boundaries and call that a Context Map. Yet that is exactly what happens far too often. The monolith is decomposed, the Bounded Contexts follow Conway’s Law, and the new architecture mirrors the old organizational structure. The dysfunction gets new clothes.\u003c/p\u003e\n\n\u003cp\u003eThis is not a misapplication that can be blamed on DDD. 
But it happens systematically, and the explanation lies not in DDD itself but in the context in which it is deployed.\u003c/p\u003e\n\n\u003cp\u003eThe result is often that new software is built with modern technologies, yet it merely reproduces the old processes. Frequently not even that – because in the legacy world many processes are not explicitly modeled in software at all, but live in people’s heads and in Excel sheets.\u003c/p\u003e\n\n\u003cp\u003eA symptom of such modernization projects is user stories that describe only the desired effects on the user interface, while keeping silent about the underlying business process – a badge that displays a status, a button that saves it, and a business workflow that continues to happen entirely in the heads of employees, outside the expensively modernized software.\u003c/p\u003e\n\n\u003ch3 id=\"too-late-too-much-pressure\"\u003eToo late, too much pressure\u003c/h3\u003e\n\n\u003cp\u003eWhy is this? There is no simple explanation, but in my experience several factors play a role. One is that modernization projects often do not start when it would make sense, but when the pain becomes unbearable. The legacy system causes so many problems that it needs to be shut down as quickly as possible. That very pressure is why nobody has time to rethink processes anymore.\u003c/p\u003e\n\n\u003cp\u003eReinertsen\u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fn:3\" id=\"fnref:3\" title=\"see footnote\" class=\"footnote\"\u003e[3]\u003c/a\u003e popularized \u003cem\u003eCost of Delay\u003c/em\u003e as a metric in the context of new product development: what does it cost us for every week we bring the product to market later than planned? But the concept applies equally to legacy modernization. Those who modernize early still have room to make decisions. 
Those who wait until things are on fire modernize under emergency conditions – every further delay in retiring the old system is extremely costly.\u003c/p\u003e\n\n\u003cp\u003eModeling existing processes with DDD does take time. But it takes less time than genuine process reflection and change, and above all it requires no organizational courage. Under pressure, the methodology that saves both will always win.\u003c/p\u003e\n\n\u003ch3 id=\"the-missing-mandate\"\u003eThe missing mandate\u003c/h3\u003e\n\n\u003cp\u003eEven without time pressure: clients typically order software, not process consulting. IT architects have a mandate for technology, not for the business domain. Everyone does their job. Nobody does the job that is actually needed.\u003c/p\u003e\n\n\u003cp\u003eThere is an important reason why the mandate is so rarely given that goes beyond time pressure and role definitions. Process simplification is simply existentially threatening to some stakeholders. Those stakeholders and corresponding departments have a structural interest in the existence of processes that need to be managed. That is not a criticism; it is rational self-interest, shaped by organizational structures and incentives. But it explains why the question of simplification is so rarely asked.\u003c/p\u003e\n\n\u003ch3 id=\"domain-storytelling-to-be-the-neglected-tool\"\u003eDomain Storytelling to-be – the neglected tool\u003c/h3\u003e\n\n\u003cp\u003eThe tool for the right step does exist, by the way. Hofer and Schwentner built the to-be scenario into \u003cem\u003eDomain Storytelling\u003c/em\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fn:4\" id=\"fnref:4\" title=\"see footnote\" class=\"footnote\"\u003e[4]\u003c/a\u003e as an equal mode, not an afterthought. 
The explicit purpose is not to model how things are, but how they should be.\u003c/p\u003e\n\n\u003cp\u003eIn practice, it is almost always only as-is, or a half-hearted to-be that does not dare to deviate significantly from existing processes and structures – simply because the project context is oriented toward implementation, not process change. The moment when a to-be that actually changes something would be possible has often already passed by the time the project starts.\u003c/p\u003e\n\n\u003ch3 id=\"complexity-as-a-status-symbol\"\u003eComplexity as a status symbol\u003c/h3\u003e\n\n\u003cp\u003eThere is one more dimension that is rarely discussed openly. Some people practice DDD not despite the complexity, but because of it. Complex solutions signal expertise. The DDD vocabulary signals community membership. Whoever eliminates complexity has nothing to show for it. Whoever presents a sophisticated domain model demonstrates skill.\u003c/p\u003e\n\n\u003cp\u003eThis is a specific manifestation of what Graeber describes in \u003cem\u003eBullshit Jobs\u003c/em\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fn:5\" id=\"fnref:5\" title=\"see footnote\" class=\"footnote\"\u003e[5]\u003c/a\u003e: people who create meaning for themselves by managing complexity they generate themselves. For consultants, a structural incentive adds to this: complexity creates work.\u003c/p\u003e\n\n\u003ch3 id=\"the-sobering-balance-sheet\"\u003eThe sobering balance sheet\u003c/h3\u003e\n\n\u003cp\u003eThe wave of digitalization and modernization projects of the last ten years was real. The results of these projects are sobering. The main obstacle is usually not bad technology, but an organization’s lack of readiness to change.\u003c/p\u003e\n\n\u003cp\u003eThat is no coincidence. It is the direct consequence of treating modernization as a technical problem. 
New systems, new architecture, carefully modeled domains – but the organization stays as it is. Then the outcome is predictable: the dysfunction outlives the technology.\u003c/p\u003e\n\n\u003ch3 id=\"conclusion\"\u003eConclusion\u003c/h3\u003e\n\n\u003cp\u003eModernization and digitalization projects are not technology projects. They require an organization that is ready to change – and that is more than introducing new software and training employees in how to use it. As long as this is not understood, as long as modernization is treated as a technical problem requiring technical solutions, nothing will change structurally. DDD offers the perfect tool for that mindset, and nobody has a strong incentive to change it.\u003c/p\u003e\n\n\u003cp\u003eWhat developers and consultants could do: ask persistently. Why does it have to be this way? Why not just leave the legacy system running? Not as rhetoric, but as a genuine question, asked again and again. The most important question is not how to model something, but whether it needs to exist at all.\u003c/p\u003e\n\n\u003cfoot-notes class=\"footnotes\"\u003e\n\n\u003col class=\"footnotes__list\"\u003e\n\n\u003cli id=\"fn:1\"\u003e\n\u003cp\u003eEric Evans: Domain-Driven Design: Tackling Complexity in the Heart of Software, Addison-Wesley, 2003 \u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fnref:1\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:2\"\u003e\n\u003cp\u003eFred Brooks: No Silver Bullet: Essence and Accidents of Software Engineering, 1986 \u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fnref:2\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:3\"\u003e\n\u003cp\u003eDonald Reinertsen: The Principles of Product Development Flow, Cenna Books, 2009 
\u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fnref:3\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:4\"\u003e\n\u003cp\u003eStefan Hofer, Henning Schwentner: Domain Storytelling, Addison-Wesley, 2021 \u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fnref:4\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:5\"\u003e\n\u003cp\u003eDavid Graeber: Bullshit Jobs, Simon \u0026amp; Schuster, 2018 \u003ca href=\"https://www.innoq.com/en/blog/2026/04/accidental-complexity-in-the-heart-of-software/#fnref:5\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003c/ol\u003e\n\u003c/foot-notes\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/3jtcoffnqxqt5mnctcnec6dsoy7t?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/","url":"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/","date_published":"2026-04-20T00:00:00.000+00:00","date_modified":"2026-04-20T16:59:20.813+02:00","author":[{"name":"Daniel Westheide","url":"https://www.innoq.com/en/staff/daniel-westheide/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003e\u003cem\u003eThis is the fourth post in “Developing with AI Through the Cognitive Lens,” a series exploring how AI tools affect the way programmers and development teams learn, work, and build expertise. Drawing on cognitive psychology research, this series examines what happens when we delegate cognitive work to AI. 
In this post, the lens widens beyond coding to requirements engineering. The goal of the series isn’t to arrive at a predetermined verdict on AI, but to follow the cognitive evidence wherever it leads. Sometimes, as in this post, it leads to fundamental skepticism about a whole class of tools.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eIn the previous posts of this series, we focused on what cognitive psychology can tell us about beneficial and harmful ways to use AI for coding. But coding is only a small part of software development. Increasingly, AI agents are being put to use for eliciting requirements. Marketing claims for tools like BMAD speak of hours instead of weeks spent on requirements engineering. Can BMAD and other tools for spec-driven development really replace established methods of learning about requirements? Cognitive psychology, it turns out, has quite a lot to say about this.\u003c/p\u003e\n\n\u003ch3 id=\"how-bmad-and-other-sdd-tools-work\"\u003eHow BMAD and other SDD tools work\u003c/h3\u003e\n\n\u003cp\u003eTools like BMAD promise to drastically speed up the requirements engineering process. Instead of weeks of workshops, interviews, and iterative refinement, an AI agent guides stakeholders through a structured elicitation process and produces a comprehensive specification document in a matter of hours. Buildmode.dev, one of the more prominent advocates of this approach, \u003ca href=\"https://buildmode.dev/blog/mastering-bmad-method-2025/\"\u003eclaims to reduce requirements discovery\u003c/a\u003e from “2–3 weeks to 6 hours.”\u003c/p\u003e\n\n\u003cp\u003eThe workflow usually starts with a product idea or a rough vision. Acting in the role of a business analyst, an AI agent, called Mary in the case of BMAD, then interviews the stakeholder or domain expert, asking questions about users, goals, constraints, and technical requirements. It turns the answers into a specification document that serves as the blueprint for implementation. 
In more ambitious setups like BMAD, additional agents decompose this specification into epics, stories, and tasks, which yet another layer of agents implements. The human moves from doing the work to providing domain knowledge and reviewing the output.\u003c/p\u003e\n\n\u003cp\u003eAll this can sound very appealing, especially to anyone who has sat through lengthy requirements workshops that seemed to produce little more than a long list of assumptions dressed up as decisions. If an agent can do the same job faster and more systematically, why wouldn’t you use it?\u003c/p\u003e\n\n\u003ch3 id=\"the-extraction-paradigm\"\u003eThe extraction paradigm\u003c/h3\u003e\n\n\u003cp\u003eWhat these tools have in common is an implicit assumption about the nature of domain knowledge: that it exists in the heads of stakeholders and domain experts, waiting to be retrieved. The right questions, asked in the right order, will bring it to the surface. The business analyst agent has the role of a skilled interviewer: systematic, thorough, patient, and, in a way, relentless. With this assumption, the tools follow a paradigm of extraction. Knowledge is seen as a resource to be mined, the human as the deposit.\u003c/p\u003e\n\n\u003cp\u003eThis assumption is so deeply embedded in the workflow that it rarely gets stated explicitly. But occasionally it does. The Buildmode.dev post mentioned above describes their approach as replacing “iterative discovery”. Iterative discovery implies that requirements emerge over time, through feedback, building, and learning. Replacing it means believing that the knowledge is already there, fully formed, and only needs to be drawn out efficiently.\u003c/p\u003e\n\n\u003cp\u003eMost tools stop here and leave the human as an imperfect but tolerable source. Some go further. 
\u003ca href=\"https://github.com/Q00/ouroboros\"\u003eOuroboros\u003c/a\u003e, an agent framework whose \u003ca href=\"https://github.com/Q00/ouroboros/blob/625dec9499e812c62579334773c59b799200f195/README.md\"\u003eold README\u003c/a\u003e stated bluntly that “HUMANS ARE NOT RATIONAL,” takes the logical next step: if humans cannot reliably articulate what they know, the problem is not the extraction technique. The problem is the human.\u003c/p\u003e\n\n\u003ch3 id=\"what-cognitive-science-tells-us\"\u003eWhat cognitive science tells us\u003c/h3\u003e\n\n\u003cp\u003eMichael Polanyi’s observation that “we can know more than we can tell” is probably the most concise summary of the problem. In \u003cem\u003eThe Tacit Dimension\u003c/em\u003e (1966)\u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fn:1\" id=\"fnref:1\" title=\"see footnote\" class=\"footnote\"\u003e[1]\u003c/a\u003e, Polanyi argues that much of what experts know is not consciously accessible and thus cannot be articulated. This is implicit or tacit knowledge, in contrast to explicit knowledge.\u003c/p\u003e\n\n\u003cp\u003eWhen an experienced domain expert describes her process to an interviewer, she will inevitably leave things out because she does not know she knows them. Some good candidates for this are the steps she always takes when dealing with a particular edge case, or the implicit check she runs before escalating an issue. There is a good chance that neither of these will surface in a structured interview, because they have long since been internalised below the threshold of conscious reflection.\u003c/p\u003e\n\n\u003cp\u003eThe SECI model described by Nonaka (1991)\u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fn:2\" id=\"fnref:2\" title=\"see footnote\" class=\"footnote\"\u003e[2]\u003c/a\u003e adds another dimension. Tacit knowledge is not merely hard to articulate; it’s fundamentally social. 
Knowledge creation in organisations happens through cycles of socialisation, externalisation, combination, and internalisation. These processes require a shared context, trust, and time.\u003c/p\u003e\n\n\u003cp\u003eThe important point is that the techniques that actually work for making tacit knowledge explicit are not interview-based. They rely on shared practice and direct observation. For instance, a co-worker who watches over someone’s shoulder and asks “why did you do that just now?” is far more likely to surface implicit knowledge than any structured questioning. An AI agent conducting a text-based interview is, by definition, outside this social fabric.\u003c/p\u003e\n\n\u003cp\u003eThere is a third dimension that rarely enters the requirements engineering discussion: time. Graham Wallas’s classic model of creative cognition \u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fn:3\" id=\"fnref:3\" title=\"see footnote\" class=\"footnote\"\u003e[3]\u003c/a\u003e identified incubation as essential to insight. Incubation refers to the fact that the mind continues working on problems below the threshold of conscious attention. Cai et al. (2009)\u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fn:4\" id=\"fnref:4\" title=\"see footnote\" class=\"footnote\"\u003e[4]\u003c/a\u003e provided empirical support for this in a study showing that REM sleep enhances the ability to integrate information and recognise non-obvious connections.\u003c/p\u003e\n\n\u003cp\u003eWhat does this mean for requirements work? First and foremost, we need to accept that some of the most valuable domain insights cannot be scheduled. They arrive when an expert is in the shower, or wakes up at 3am with sudden clarity about why the current process is broken. A six-hour interview session with an AI agent has no room for incubation. 
It does not compress this phase, it eliminates it.\u003c/p\u003e\n\n\u003cp\u003eTaken together, these three dimensions of evidence point to the same conclusion: the extraction paradigm misunderstands the nature of what it is trying to extract. Domain knowledge is not a static deposit waiting to be mined. It’s tacit, social, and temporally distributed. Any elicitation method that ignores these properties will miss the knowledge that matters most.\u003c/p\u003e\n\n\u003ch3 id=\"how-to-make-tacit-knowledge-explicit\"\u003eHow to make tacit knowledge explicit\u003c/h3\u003e\n\n\u003cp\u003eThis is not a new problem, and the software development community has developed methods that take it seriously. Domain Storytelling, for instance, uses collaborative narrative sessions where domain experts tell stories about their work while a facilitator captures them in a visual notation to surface the language, the actors, and the workflows that actually matter. The method works because it creates a shared situation: experts and developers are in the same room, the story unfolds in real time, and misunderstandings become visible immediately. Similarly, Event Storming brings together developers and domain experts around a shared timeline of domain events, relying on the productive friction of different perspectives colliding to reveal what no single participant could have articulated alone.\u003c/p\u003e\n\n\u003cp\u003eThese methods share a common assumption that stands in direct contrast to the extraction paradigm: domain knowledge exists — but it becomes explicit through multiple conversations and iterations, not in advance of them.\u003c/p\u003e\n\n\u003cp\u003eDomain-Driven Design is sometimes misread as an argument for thorough upfront domain modelling before implementation begins. 
Martin Fowler’s foreword to Eric Evans’s original book \u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fn:5\" id=\"fnref:5\" title=\"see footnote\" class=\"footnote\"\u003e[5]\u003c/a\u003e corrects this directly. Fowler writes that powerful domain models evolve over time, and that even the most experienced modellers find their best ideas emerge after the initial releases of a system. Domain-driven design was never a license for big upfront design. It has always been meant as a method of sustained, iterative engagement with the domain throughout the life of a project. A lot of important knowledge arrives late, earned through the experience of building and using the system.\u003c/p\u003e\n\n\u003cp\u003eThis is precisely what a six-hour elicitation session cannot buy.\u003c/p\u003e\n\n\u003ch3 id=\"conclusion\"\u003eConclusion\u003c/h3\u003e\n\n\u003cp\u003eThe logic of the extraction paradigm, followed to its conclusion, does not stop at better interviews. Ouroboros, the agent framework I mentioned earlier, makes the next step explicit. The problem, it concludes, is not the extraction technique, it is the human. Its proposed solution is that it “fixes the human, not the machine.” The improved human is one who communicates more clearly, more consistently, more completely. One who, in short, is easier for a machine to process.\u003c/p\u003e\n\n\u003cp\u003eThis is what the \u003ca href=\"https://pluralistic.net/2025/12/05/pop-that-bubble/#u-washington\"\u003eReverse Centaur\u003c/a\u003e looks like in practice. The classical centaur (human judgment directing machine capability) has quietly inverted. The machine sets the agenda, defines the categories, asks the questions. The human’s job is to fit into the structure the machine provides. BMAD does not do this through coercion. It does it through the appearance of helpfulness: a guided process, structured questions, a clear output. You aren’t asked to think differently. 
You’re simply led through a workflow that rewards machine-readable answers and has no place for the ambiguous, the half-formed, or the tacit that cannot be articulated yet.\u003c/p\u003e\n\n\u003cp\u003eThis inversion is not an accident. It’s the natural endpoint of a paradigm that treats knowledge as a resource to be extracted rather than a capability to be developed. And that paradigm did not begin with BMAD or Ouroboros. Large language models are themselves its most ambitious expression: trained on the accumulated written knowledge and culture of humanity, compressed into a statistical model, without the consent or compensation of those who produced it. BMAD and Ouroboros are not outliers. They are the same logic applied one step further, from extracting human knowledge into a model, to extracting domain knowledge into a specification, to gradually reshaping the human who provides it into something the model can more readily use.\u003c/p\u003e\n\n\u003cp\u003eThe question this raises is not primarily technical. It is about the direction of adaptation. Technology has always changed how people work and think. That’s old news. What is worth noticing is when the adaptation runs in one direction only: when the human is expected to become more legible to the machine, while the machine is not expected to become more capable of meeting the human where she is.\u003c/p\u003e\n\n\u003cfoot-notes class=\"footnotes\"\u003e\n\n\u003col class=\"footnotes__list\"\u003e\n\n\u003cli id=\"fn:1\"\u003e\n\u003cp\u003ePolanyi, M. (1966). The Tacit Dimension. Doubleday. \u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fnref:1\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:2\"\u003e\n\u003cp\u003eNonaka, I. (1991). The knowledge-creating company. Harvard Business Review, 69(6), 96–104. 
\u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fnref:2\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:3\"\u003e\n\u003cp\u003eWallas, G. (1926). The Art of Thought. Harcourt Brace. \u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fnref:3\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:4\"\u003e\n\u003cp\u003eCai, D. J., Mednick, S. A., Harrison, E. M., Kanady, J. C., \u0026amp; Mednick, S. C. (2009). REM, not incubation, improves creativity by priming associative networks. Proceedings of the National Academy of Sciences, 106(25), 10130–10134. \u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fnref:4\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:5\"\u003e\n\u003cp\u003eEvans, E. (2003). Domain-Driven Design: Tackling Complexity in the Heart of Software. Addison-Wesley. 
(Foreword by Martin Fowler) \u003ca href=\"https://www.innoq.com/en/blog/2026/04/ai-cognitive-lens-domain-knowledge/#fnref:5\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003c/ol\u003e\n\u003c/foot-notes\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/me75f3ayfm5160jf2we0zdzyu7sr?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/04/vom-vibe-coder-zum-agentic-engineer/","url":"https://www.innoq.com/en/blog/2026/04/vom-vibe-coder-zum-agentic-engineer/","date_published":"2026-04-20T00:00:00.000+00:00","date_modified":"2026-04-20T11:21:17.717+02:00","author":[{"name":"Torben Keller","url":"https://www.innoq.com/en/staff/torben-keller/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eIf you let agents work without tests, linting, or architecture constraints, you’ll quickly see where that leads. The code compiles, maybe even works, but nobody can say what happens when you change something. Structure erodes, boundaries blur, and after a few weeks you have a classic Big Ball of Mud, except this time it was created in record time.\u003c/p\u003e\n\n\u003cp\u003eThe real problem isn’t that agents write bad code. They write the code you allow them to write, and without guardrails, everything is allowed. That doesn’t scale, especially not at the speed agents bring to the table. Vibe Coding, generating code on demand without systematic control, feels like a huge productivity boost at first. But without direction, you’re mainly producing technical debt faster.\u003c/p\u003e\n\n\u003cp\u003eThe question is actually simple: If I’m no longer reading every line, how do I still take ownership of the code? 
The answer is closer than you might think, because we’ve solved this problem before.\u003c/p\u003e\n\n\u003ch3 id=\"this-problem-isn-t-new\"\u003eThis Problem Isn’t New\u003c/h3\u003e\n\n\u003cp\u003eImagine you’re joining an existing project. The team has changed, the original developers are long gone, and you inherit 200,000+ lines of code. You don’t understand every line, not even every module. Yet you still ship features, fix bugs, and take responsibility. That’s normal brownfield work, and every one of us has done it.\u003c/p\u003e\n\n\u003cp\u003eNobody would expect you to read and understand every line in a brownfield project before you’re allowed to open a pull request. Instead, you rely on systems: tests tell you if you broke something, linters enforce conventions, CI pipelines catch errors before they reach production. You understand the part you’re working on and trust that the rest is covered.\u003c/p\u003e\n\n\u003cp\u003eAnd that’s the key insight: \u003cstrong\u003eI don’t need to know every line for ownership. I need to be confident that my change works and doesn’t break anything.\u003c/strong\u003e\u003c/p\u003e\n\n\u003cp\u003eAI-generated code is essentially code from a colleague who’s no longer in the room, except this colleague produces significantly faster and significantly more. What makes brownfield possible in the first place is good modularization: loose coupling, high cohesion. Well-defined modules let you understand a small part and change it safely without keeping the entire system in your head. Agents benefit from this just as much, because the better the module boundaries, the more focused and error-free they work.\u003c/p\u003e\n\n\u003cp\u003eThe knowledge of how to own code you didn’t write is already in us. 
The question is just how we transfer it to working with agents.\u003c/p\u003e\n\n\u003ch2 id=\"fromspreadsheettolintrule\"\u003eFrom Spreadsheet to Lint Rule\u003c/h2\u003e\n\n\u003cp\u003eBoris Cherny, known today as the creator of Claude Code, provides a good example in \u003ca href=\"https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny\"\u003ethe Pragmatic Engineer podcast\u003c/a\u003e. At Meta, he was one of the most prolific code reviewers, and his method was surprisingly analog: a spreadsheet. Every time he left the same comment in a review, like “no \u003ccode\u003eany\u003c/code\u003e here please” or “error handling missing”, he added it to the list. When the same feedback appeared three or four times, he wrote a lint rule for it.\u003c/p\u003e\n\n\u003cp\u003eThe principle behind it: repetitive human feedback becomes automated enforcement. Instead of saying the same thing in code reviews over and over, you build a system that says it for you. The reviewer becomes a system designer.\u003c/p\u003e\n\n\u003cp\u003eThat this effort pays off is also measurable. An \u003ca href=\"https://newsletter.pragmaticengineer.com/p/building-claude-code-with-boris-cherny\"\u003einternal study at Meta\u003c/a\u003e showed that clean codebases have a double-digit percentage impact on engineering productivity. Half-finished migrations, inconsistent patterns, and outdated conventions don’t just slow down human developers, they confuse AI models just as much. An agent working in a codebase with three different error-handling patterns will reproduce all three. Consistency is the foundation for both humans and agents to work effectively.\u003c/p\u003e\n\n\u003cp\u003eCherny’s spreadsheet method sounds almost trivial, but it contains a principle that takes on an entirely new dimension in the context of AI agents. 
More on that shortly.\u003c/p\u003e\n\n\u003ch3 id=\"the-agent-harness-more-than-just-the-tool\"\u003eThe Agent Harness — More Than Just the Tool\u003c/h3\u003e\n\n\u003cp\u003eIf we take the principle of “turning recurring feedback into tooling” seriously, the question is: where does this tooling actually live? In which system do lint rules, tests, architecture checks, and prompts come together?\u003c/p\u003e\n\n\u003cp\u003eThe answer is the agent harness. At its core, this is the agent program itself: Claude Code, Codex, or Cursor with its agentic loop. Through hooks, we can integrate directly and give the agent feedback on every action. But the harness also includes the agent workflow around it: the environment the agent works in. Through Git hooks and CI/CD pipelines, we can also intervene there and give the agent feedback before code is even merged.\u003c/p\u003e\n\n\u003cp\u003eThe central question is: when I want to enforce a new constraint, at which integration point do I build it in? Two pyramids help with this decision, both inspired by the test pyramid.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg src=\"https://res.cloudinary.com/innoq/image/upload/v1/uploads-production/qy2qd2a7ee3vrawewhykt5fbbr15?_a=BACMTiAE\" alt=\"Pyramid of hooks: Agent Hooks (fast, frequent), Git Hooks, CI/CD (slow, rare).\" loading=\"lazy\"\u003e\r\n\r\n\u003cfigcaption\u003eThe Tooling Pyramid\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003e\u003cstrong\u003eThe Tooling Pyramid\u003c/strong\u003e covers the deterministic side. At the foundation are agent hooks, which run before or after tool calls. Because they execute on nearly every agent action, the integrated functions need to be extremely fast. Anything that takes longer slows the agent down on every single step. Above that are Git hooks (pre-commit, pre-push), which run less frequently and can therefore take a bit more time. Formatters, linters, or fast tests fit well here. At the top is CI/CD. 
That’s where checks go that take so long you wouldn’t want to run them locally as well, like comprehensive test suites or security scans. The principle is the same as with the test pyramid: what should always run must be fast. What’s slower belongs higher up.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg src=\"https://res.cloudinary.com/innoq/image/upload/v1/uploads-production/xolju6082ant1bkfti02k2o9f7v9?_a=BACMTiAE\" alt=\"Three-level pyramid: “AGENTS.md \u0026amp; Global Rules”, “Scoped Rules \u0026amp; Skills”, “Docs Specs”; general→specific and always→on demand.\" loading=\"lazy\"\u003e\r\n\r\n\u003cfigcaption\u003eThe Prompt Pyramid\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003e\u003cstrong\u003eThe Prompt Pyramid\u003c/strong\u003e addresses through prompts everything that can’t be checked deterministically. The sorting criterion here is the degree of specialization: the more general a piece of information, the lower it belongs, because it’s relevant for every task. The more specific, the higher up, because it’s only needed in certain situations. At the base are CLAUDE.md and global rules, which are always loaded in the agent’s context. In the middle are conditional rules, which apply when the agent works in specific areas of the codebase, and skills, which are loaded when the agent performs a specific type of task. At the top are docs, specs, and ADRs, available only on-demand when the agent actively needs them. At its core, this is context engineering: the right information at the right time in context, without blowing the token budget.\u003c/p\u003e\n\n\u003cp\u003eThe decision rule between the two pyramids: \u003cstrong\u003eIf you can tool it, tool it. If you can’t, prompt it.\u003c/strong\u003e What can be checked deterministically belongs in the tooling pyramid. Architecture decisions, domain conventions, or style questions are too loose for that and belong in the prompt pyramid. 
Deterministic enforcement is generally preferable to prompts.\u003c/p\u003e\n\n\u003ch3 id=\"the-harness-is-never-done\"\u003eThe Harness Is Never Done\u003c/h3\u003e\n\n\u003cp\u003eAn agent harness isn’t something you set up once and then forget. It grows with every PR you review.\u003c/p\u003e\n\n\u003cp\u003eThe cycle starts unremarkably: the agent produces a pull request, and in the review a problem comes up, for example an import from a layer that shouldn’t have access, or a test without assertions. Up to this point, normal developer life. But now comes the crucial question: \u003cem\u003eIs there tooling that could detect this problem automatically?\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eIf yes, it gets integrated: maybe a new lint rule or an ArchUnit test. From then on, the agent runs against this new tooling next time and gets feedback not in the review, but immediately. If the check fails, it fixes the problem itself, before a human ever sees it.\u003c/p\u003e\n\n\u003cp\u003eThis is an incremental process, but the effect is cumulative: every problem that’s codified as tooling once never appears again. After weeks, the harness has dozens of such checks. After months, it’s a dense safety net that reflects exactly the quality requirements of the project.\u003c/p\u003e\n\n\u003cp\u003eThis is where the circle closes back to Boris Cherny. He tracked recurring review comments in a spreadsheet and turned them into lint rules — exactly the same principle. Except that with agents it takes on a new quality, because the agent doesn’t just produce the code, it also helps build the tooling. The spreadsheet was manual, the feedback loop with agents is a self-reinforcing cycle. 
With enough feedback and time, agents converge on the right solution, not through magic, but through systematic feedback, consistently applied.\u003c/p\u003e\n\n\u003ch2 id=\"whathumansstillneedtoown\"\u003eWhat Humans Still Need to Own\u003c/h2\u003e\n\n\u003cp\u003eTooling makes code correct, but it doesn’t check whether the code does the right thing. Linters catch style problems, type checks prevent runtime errors, architecture tests secure boundaries. But whether the business logic is correct, no hook can answer.\u003c/p\u003e\n\n\u003cp\u003eThis is also the most subtle danger with agent-generated code: tests verify the agent’s assumptions. If the assumptions are wrong, the tests are still green. A concrete example: the agent implements a discount calculation and writes tests for it, but both are based on the same misunderstanding of the business requirement. The feedback loop doesn’t help here because there’s nothing to check against.\u003c/p\u003e\n\n\u003cp\u003eThat’s why business logic stays with humans. Not every line and not every file, but the places where domain decisions are made. This has consequences for architecture: if you want this review to work efficiently, you have to design for it. Isolate the domain layer, separate business logic from infrastructure. Not because it’s theoretically elegant, but because it focuses human review on what only a human can check. The rest is covered by tooling.\u003c/p\u003e\n\n\u003ch3 id=\"the-mindset-shift-from-vibe-coder-to-agentic-engineer\"\u003eThe Mindset Shift — From Vibe Coder to Agentic Engineer\u003c/h3\u003e\n\n\u003cp\u003eVibe coders generate code on demand, without any quality assurance, and hope the result works out. Agentic engineers work just as fast with agents, but they invest in their harness: constraints, feedback systems, deterministic checks. They can focus on domain logic because their systems cover the rest. What separates the two isn’t speed, but control. 
And whoever has that control can ultimately take ownership of their systems.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/r19r731k3kz5aqggoby5b8tzq4sm?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/04/versteckte-kosten-spec-driven-development/","url":"https://www.innoq.com/en/blog/2026/04/versteckte-kosten-spec-driven-development/","date_published":"2026-04-08T00:00:00.000+00:00","date_modified":"2026-04-27T10:49:55.851+02:00","author":[{"name":"Roman Stranghöner","url":"https://www.innoq.com/en/staff/roman-stranghoener/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eOver the past few months, spec-driven development has often helped me produce more – and better – AI-generated output than before. At the same time, I’ve also been noticeably more exhausted after many sessions.\u003c/p\u003e\n\n\u003cp\u003eNot despite the structure, but often \u003cem\u003ebecause\u003c/em\u003e of it.\u003c/p\u003e\n\n\u003cp\u003eAt first, that threw me off. Spec-driven development initially promises exactly the right thing: less chaos, more clarity, better results with agents. And that’s true, too – just not consistently.\u003c/p\u003e\n\n\u003cp\u003eBy now, my takeaway is pretty clear: the same structure that helps me during the thinking phase can slow me down during the build phase. Then I’m no longer building. I’m managing, phrasing, and documenting. And that’s exactly where the hidden costs of spec-driven development come from for me. I’m less interested in the big, abstract question of whether SDD is good or bad. 
I’m more interested in the practical one: \u003cstrong\u003eWhen does this way of working create real clarity – and when does it push thinking to the wrong place?\u003c/strong\u003e\u003c/p\u003e\n\n\u003ch3 id=\"the-right-kind-of-hard\"\u003eThe right kind of hard\u003c/h3\u003e\n\n\u003cp\u003eNot everything that’s hard is automatically bad. Some development work is hard because the problem itself is hard. Another part is hard because the way we work creates unnecessary friction.\u003c/p\u003e\n\n\u003cp\u003eThat difference is what has made spec-driven development interesting to me. With a framework like \u003ca href=\"https://docs.bmad-method.org/\"\u003eBMAD\u003c/a\u003e, I can think through an initiative more cleanly than I used to. It forces me to make fuzzy requirements explicit. I have to name constraints. I have to cut scope. I have to see gaps before I cover them up with generated code. That’s helpful.\u003c/p\u003e\n\n\u003cp\u003eBut at the same time, I’m noticing: this structure doesn’t feel equally useful in every phase. Sometimes it reduces complexity. Sometimes it just relocates it. And that difference isn’t theoretical for me – I feel it very concretely in my focus, my pace, and my energy.\u003c/p\u003e\n\n\u003cp\u003eThat’s why I no longer want to judge SDD in a blanket way. The real question, for me, is whether it’s the right kind of hard right now.\nDoes the structure help me understand? Or does it force me to spell out things that I could clarify much more easily while building, directly on the \u003cem\u003eliving thing\u003c/em\u003e?\u003c/p\u003e\n\n\u003ch3 id=\"why-sdd-can-help-during-the-thinking-phase\"\u003eWhy SDD can help during the thinking phase\u003c/h3\u003e\n\n\u003cp\u003eSpec-driven development solves a real problem when you’re starting from scratch. 
Anyone working with agents quickly learns that good context isn’t a nice-to-have.\nWell-structured context contributes massively to the solution.\u003c/p\u003e\n\n\u003cp\u003eIf I capture requirements, rough architecture, UX guardrails, and initial story cuts cleanly, a lot becomes easier later. Diffuse thoughts end up in clearer buckets. The problem gets smaller because I’ve already sorted it once. And later, the agent has something to hold on to—instead of starting from zero for every task.\u003c/p\u003e\n\n\u003cp\u003eThis helps me a lot, especially early in a project. Not because I never thought things through before, but because the framework forces me to really work out loose thoughts and write them down properly.\u003c/p\u003e\n\n\u003cp\u003e\u003cem\u003eWhat is this or that feature actually supposed to do, and for whom? Which constraints matter? Where do I deliberately want to decide nothing yet?\u003c/em\u003e For that kind of thinking, some structure is useful.\u003c/p\u003e\n\n\u003cp\u003eIf I described this using \u003ca href=\"https://www.innoq.com/en/blog/2026/03/ai-cognitive-lens-cognitive-load-theory/\"\u003eDaniel Westheide’s cognitive-load lens\u003c/a\u003e, it would be like this: in this phase, the framework reduces complexity. It breaks a big problem into smaller, workable parts.\u003c/p\u003e\n\n\u003cp\u003eThat’s \u003cstrong\u003ethe right kind of hard\u003c/strong\u003e. Or at least effort that feels meaningful to me.\u003c/p\u003e\n\n\u003ch3 id=\"when-building-devolves-into-documentation-work\"\u003eWhen building devolves into documentation work\u003c/h3\u003e\n\n\u003cp\u003eFor me, the problem often starts exactly where things should become productive: \u003cstrong\u003eimplementation\u003c/strong\u003e.\u003c/p\u003e\n\n\u003cp\u003eThen I’m no longer sitting in front of the actual problem. I’m sitting in front of a story spec, writing acceptance criteria for things that might never happen. 
I’m describing behavior for edge cases that I probably would have clarified in two iterations with the agent if I were building directly.\nI’m trying to press intuition into text upfront.\u003c/p\u003e\n\n\u003cp\u003eI notice this most clearly when generating a story from an epic. Before that, I’m still talking about functional requirements, non-functional requirements, and rough story skeletons – who are we building this for, what should happen, how will we know it’s enough. From that point on, technical details and code examples creep in very quickly.\u003c/p\u003e\n\n\u003cp\u003eOne concrete example made this very clear. A story that was actually pretty simple – “Create User Profile” – turned into a massive package: 19 acceptance criteria, database fields, component mapping, dependency notes, file lists, test checklists, and later even review follow-ups. 1,070 lines of Markdown.\u003c/p\u003e\n\n\u003cp\u003eFormally, it was still called a story. In practice, it was half specification, half implementation plan, and half QA backlog. Yes, exactly – that’s three halves.\u003c/p\u003e\n\n\u003cp\u003eThat shifts the whole mode of thinking. I’m no longer talking about a story that makes sense to humans; I’m discussing a pre-decided implementation. The feedback loop gets longer because suddenly there are things on the table that didn’t need to be decided at that point. It gets even more problematic because those code snippets aren’t neutral. They prime the model that later implements the story. If I ever want to refactor the code or regenerate it with a better model, the story already contains a technical bias from an earlier planning step. 
That pre-commitment has long since seeped into the material.\u003c/p\u003e\n\n\u003ch3 id=\"why-not-everything-can-be-made-explicit\"\u003eWhy not everything can be made explicit\u003c/h3\u003e\n\n\u003cp\u003ePart of my discomfort has to do with something else: some of what makes development work good can’t be cleanly translated into language upfront.\u003c/p\u003e\n\n\u003cp\u003eThe best way I can describe it is through driving: when you’ve been driving for a while, you develop a sense for what will happen next. You see brake lights flicker three cars ahead. You ease off the gas, move into the left lane, pass, react proactively. It happens fast, almost without thinking. Coding feels similar to me. With some experience, you often sense early that a decision will cause problems three steps down the road. You adjust something before it breaks. Not because you could already articulate everything, but because you’re immersed in the thing itself.\u003c/p\u003e\n\n\u003cp\u003eSpec-driven frameworks try to close a gap with documents. That’s understandable. But part of that gap isn’t simply a documentation problem. It’s a problem of tacit knowledge. While building, we often know more than we can precisely say in advance. We recognize patterns. We notice something looks off before we can explain why.\u003c/p\u003e\n\n\u003cp\u003eTrying to press that experience fully into specs costs a lot of energy. And that’s the real loss for me: I have to switch from a mode of recognizing to a mode of explaining before I’ve even started building anything.\u003c/p\u003e\n\n\u003ch3 id=\"why-thick-specs-don-t-create-shared-understanding\"\u003eWhy thick specs don’t create shared understanding\u003c/h3\u003e\n\n\u003cp\u003eOne thought has stuck with me since a product owner training in 2017 with Jeff Patton and Jeff Gothelf: a story is not a document. It’s a token for a conversation.\u003c/p\u003e\n\n\u003cp\u003eFirst Card, then Conversation, then Confirmation. 
Not the other way around.\u003c/p\u003e\n\n\u003cp\u003eAcceptance criteria should confirm that you understood each other. They shouldn’t replace understanding with more and more upfront detail. That’s exactly where, for me, SDD tips over during the build phase. I write a document. The agent consumes it. There’s no real conversation – just a handoff.\u003c/p\u003e\n\n\u003cp\u003eAnd that removes something stories were originally meant to provide.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Four images: A team believes they agree, then realizes they have different mental models, works through the problem together, and only then reaches real understanding.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/lnkgb78uey78u1k7nj23jnjdku4k?_a=BACMTiAE 
2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eReal understanding only emerges once everyone grapples with the same problem. Inspired by Jeff Patton.\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003eI experienced exactly that in a client project last year. Within the team, I invested a lot of effort into detailed user stories (specs). No BMAD involved, and with genuinely good intentions: a solid first draft, including lots of acceptance criteria, so the team could hit the ground running instead of starting from scratch. But the team read the specs very differently. To them, it wasn’t a draft for discussion – it was already a finished handoff document. Everything’s in there already, right? So there was a lot of discussion, but not about the thing we needed to build; it was about the scope and shape of the stories.\u003c/p\u003e\n\n\u003cp\u003eThe result: on one side, someone wondering why they’re getting only negative feedback. On the other side, a team wondering why they should still have a say if everything already seems decided.\u003c/p\u003e\n\n\u003cp\u003eThe stories stayed that thick for quite a while. And the review load grew with them. Because from a thick story, an agent quickly produces a huge implementation plan including database schema, component structure, and so on. And at some point, nobody reads it properly anymore. That’s human. But it undermines exactly what the stories were supposed to ensure.\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003e“Shared documents are not shared understanding”\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003ePatton nails it for me. The story gets thicker. I feel briefly safer. But that doesn’t mean something better will be built later. 
The document grows, but \u003ca href=\"https://margaretstorey.com/blog/2026/02/09/cognitive-debt/\"\u003eunderstanding doesn’t automatically grow with it\u003c/a\u003e.\u003c/p\u003e\n\n\u003cp\u003eIn the worst case, the opposite happens: I’ve produced a lot of text and convince myself I’ve thoroughly understood the problem – when what I’ve really produced is a clean handoff document.\u003c/p\u003e\n\n\u003cp\u003eIt looks like clarity. But sometimes it’s just a well-organized illusion of safety.\u003c/p\u003e\n\n\u003cp\u003eBMAD has a so-called party mode where different AI experts, each with a different perspective, discuss a story. It simulates a team conversation and sometimes brings surprising insights. But it remains a simulation. And the insights remain inside the AI session. A team that later has to live with the spec and the code learns nothing from it. No shared understanding flows back into people’s heads.\u003c/p\u003e\n\n\u003ch3 id=\"why-cheap-iteration-changes-the-equation\"\u003eWhy cheap iteration changes the equation\u003c/h3\u003e\n\n\u003cp\u003eMany SDD frameworks invest during planning as if \u003ca href=\"https://www.robert-glaser.de/what-if-iteration-is-all-we-need/\"\u003eiteration were still expensive\u003c/a\u003e. But with agents, it often isn’t anymore. A few quick rounds of “build it like this,” “no, more like that,” “yes, that’s it” are, in many situations, cheaper than spending hours polishing a perfect spec upfront.\u003c/p\u003e\n\n\u003cp\u003eSometimes the better answer is perhaps: start doing sooner and put better safety nets in place. That doesn’t mean planning becomes pointless. It just means the equation shifts. If the next testable prototype is minutes away instead of days, some of the upfront bureaucracy loses its economic advantage.\u003c/p\u003e\n\n\u003cp\u003eMore spec isn’t automatically the more reasonable form of risk management. 
Sometimes it’s more expensive thinking work – in more than one sense.\u003c/p\u003e\n\n\u003ch3 id=\"what-works-better-for-me\"\u003eWhat works better for me\u003c/h3\u003e\n\n\u003cp\u003eThat’s why I didn’t end up at “ditch the framework, just vibe-code.” That would be the wrong overcorrection. For example, I find an agent’s planning mode extremely useful. I just don’t want everything that is temporarily helpful in that thinking space to end up permanently in a story file.\u003c/p\u003e\n\n\u003cp\u003eWhat I do instead – and what I no longer do:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eIn the thinking phase\u003c/strong\u003e, I try to secure shared understanding in the team first, and only then capture the relevant information from that in a spec. Not the other way around. I still use a fixed structure for this – either BMAD or something homegrown. But I only write as much spec as actually creates real clarity for me or my team. The story stays short and describes what should happen for whom and how we’ll know it’s enough. Plus a few links to more context if needed. Essentially, \u003ca href=\"https://www.nngroup.com/articles/progressive-disclosure/\"\u003eprogressive disclosure\u003c/a\u003e for potentially relevant additional information such as the associated epic, personas, requirements, etc.\u003c/p\u003e\u003c/li\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eIn the build phase\u003c/strong\u003e, I work in short loops close to the code. Ideally in a flow state. I let the agent plan based on the specs, but that specific plan can be ephemeral. It’s a working document, not an artifact. What I no longer do: stuffing the plan—including code examples, database fields, or file lists—into the story.\u003c/p\u003e\u003c/li\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eDecisions\u003c/strong\u003e are something I deliberately separate from the spec. 
If architecture-relevant decisions come up during planning or implementation, I write an ADR. If it’s a product decision, it goes back into refinement or an updated story. If it’s a technical trade-off, it lands as a comment in the code or in architecture documentation. Decisions need a visible home, not a line in a 1,000-line Markdown file.\u003c/p\u003e\u003c/li\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eSafeguards\u003c/strong\u003e I gradually shift from the document into the \u003ca href=\"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/\"\u003eharness\u003c/a\u003e: tests, linters, architecture rules, reviews, and anything that makes recurring problems reliably—and ideally deterministically—visible to an agent. Relevant learnings from that belong in the harness-specific project memory. I like the harness idea so much because it shifts the emphasis: \u003ca href=\"https://www.hlyr.dev/blog/skill-issue-harness-engineering-for-coding-agents\"\u003esystematically safeguard what repeatedly goes wrong\u003c/a\u003e instead of spelling everything out upfront. Less upfront bureaucracy. More robust guardrails.\u003c/p\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3 id=\"as-much-as-needed-as-little-as-possible\"\u003eAs much as needed, as little as possible\u003c/h3\u003e\n\n\u003cp\u003eI don’t think spec-driven development is a dead end. BMAD has a clear place for me: thinking, sorting, slicing, and clarifying. That doesn’t just help me solo; it can also support teams—especially when this step gets shortchanged or quietly skipped.\u003c/p\u003e\n\n\u003cp\u003eBut when it comes to implementation work, I quickly end up writing more about work than I actually do. And then the story is no longer a prompt for conversation, but a handoff document written behind closed doors.\u003c/p\u003e\n\n\u003cp\u003eSDD frameworks noticeably increase output in every phase. But output is not outcome. 
Or as Patton put it in that product ownership training:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003e“Minimize Output, Maximize Outcome \u0026amp; Impact.”\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eWhat I learned the long way around: the thicker spec was never the goal. It was a symptom that I was looking for safety in the document instead of in the process.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/nvrib3qr57pn7wqknx9a8phuv5va?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/04/jira-ki-ohne-cloud/","url":"https://www.innoq.com/en/blog/2026/04/jira-ki-ohne-cloud/","date_published":"2026-04-05T00:00:00.000+00:00","date_modified":"2026-04-10T14:43:03.734+02:00","author":[{"name":"Nicolas Inden","url":"https://www.innoq.com/en/staff/nicolas-inden/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003e\u003cem\u003eInspired by the \u003ca href=\"https://di.day/de\"\u003eDigital Independence Day\u003c/a\u003e and its call for “recipes” for digital sovereignty, we share our own approaches to topics that matter to our customers – every first Sunday of the month.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eThe best way to discuss business requirements is still through direct conversation with your team and your customer. Important implementation details often come up in side comments or are only implied between the lines. That is a long way from the structured input forms Jira gives us for creating issues. 
In my day-to-day work, one workflow has been a real game changer: take notes and transcripts from calls or chat discussions, hand them to an AI, and let it turn them into well-written Jira issues – with sensible fields, descriptions, and structure.\u003c/p\u003e\n\n\u003cp\u003eAtlassian has recognized the same need and added several AI features to Jira Cloud to support exactly this:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eSearch issues in natural language, for example: “Which epics were delivered in the last release?” or “Which tickets have we already had on topic XYZ?”\u003c/li\u003e\n\u003cli\u003eSummarize information in useful ways, for example by distilling the key points from issue comments or creating a business-level summary of an epic and its related issues\u003c/li\u003e\n\u003cli\u003eDraft issue descriptions and stories from meeting notes\u003c/li\u003e\n\u003cli\u003eBreak issues down into meaningful subtasks\u003c/li\u003e\n\u003cli\u003eand more\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eThe ability to turn an unstructured discussion directly into structured issues saves a significant amount of time.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eNote:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eAtlassian is clearly moving toward the cloud – support for the on-premises product Jira Data Center only runs through March 2029. Still, not everyone wants to use Atlassian Cloud, and that includes both Jira itself and the AI features Atlassian provides in the cloud. Existing on-premises installations can continue running until official support ends in March 2029. 
This article is for exactly those environments and shows how they can benefit from similar AI capabilities.\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eOur goal is to use Jira in a way that is GDPR-compliant and protected from the Cloud Act, including the AI features mentioned above.\u003c/p\u003e\n\n\u003ch3 id=\"what-you-need\"\u003eWhat you need\u003c/h3\u003e\n\n\u003cp\u003eHere is what we need:\u003c/p\u003e\n\n\u003col\u003e\n\u003cli\u003eA self-hosted Jira Server or Jira Data Center instance\u003c/li\u003e\n\u003cli\u003eAccess to an AI\u003c/li\u003e\n\u003cli\u003eAn MCP server (\u003ccode\u003emcp-atlassian\u003c/code\u003e) to connect the AI to Jira Server\u003c/li\u003e\n\u003cli\u003eA tool where the “discussion” between the user and the AI can happen\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003ch4 id=\"the-jira-instance\"\u003eThe Jira instance\u003c/h4\u003e\n\n\u003cp\u003eFor \u003cstrong\u003eitem 1\u003c/strong\u003e, we assume you already have a working, licensed Jira Server or Jira Data Center instance running the current version 11. As of March 30, 2026, it is unfortunately no longer possible to obtain new licenses for either product. Even so, we expect that many projects will continue running on-premises until support ends in March 2029.\u003c/p\u003e\n\n\u003ch4 id=\"choosing-the-right-ai\"\u003eChoosing the right AI\u003c/h4\u003e\n\n\u003cp\u003eThings get more interesting with \u003cstrong\u003eitem 2\u003c/strong\u003e. Atlassian Cloud uses a \u003ca href=\"https://support.atlassian.com/rovo/docs/rovo-data-privacy-and-usage-guidelines/\"\u003emix of open-weight models and frontier models from OpenAI and Anthropic\u003c/a\u003e. That may still be GDPR-compliant. 
The real issue is the Cloud Act: because Atlassian internally \u003ca href=\"https://www.atlassian.com/trust/reliability/cloud-architecture-and-operational-practices#cloud-infrastructure\"\u003euses AWS\u003c/a\u003e – and because OpenAI and Anthropic do as well – all data is subject to Cloud Act exposure. So we need an alternative.\u003c/p\u003e\n\n\u003cp\u003eIn principle, there are two options: use an inference API operated by a European provider, or run inference locally on your own machine. European providers such as StackIt, IONOS, and Scaleway do offer pay-as-you-go inference APIs, but in some cases their selection of current high-performance models is still limited.\u003c/p\u003e\n\n\u003cp\u003eFor this recipe, we focus on local models. We use \u003ca href=\"https://lmstudio.ai/\"\u003eLM Studio\u003c/a\u003e as both the inference engine and the chat tool. LM Studio is available for macOS, Linux, and Windows and can be downloaded from the linked site. Once it is installed successfully, the next step is to choose the right model.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eNote:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eLocal inference requires hardware with a large amount of fast memory. That can mean a GPU with enough VRAM, or a system with a strong integrated GPU and enough shared RAM, such as Apple Silicon or AMD Strix Halo systems.\u003c/p\u003e\n\n\u003cp\u003eHardware requirements scale with the number of parameters in the model you choose. A very rough rule of thumb is 1 GB of RAM per 1 billion parameters. So \u003ccode\u003egpt-oss:20b\u003c/code\u003e should have 20 GB of (V)RAM available, although in practice a bit less is often enough.\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eFor our use case, we need a model that is designed for tool use. 
In practice, \u003ccode\u003egpt-oss:20b\u003c/code\u003e has proven to be a good compromise between capability, speed, and hardware requirements – an open-source model with strong tool-use support. If you have a bit more RAM available, start with \u003ccode\u003eqwen3.5:35b-a3b\u003c/code\u003e. You can search for and download both models in LM Studio under \u003cstrong\u003eModel Search\u003c/strong\u003e.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"LM Studio screenshot showing “GPT-OSS 20B” selected and details for “openai/gpt-oss-20b” in the model library.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/8fjxru8y3ixbc7fumvcn98qup8zi?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eSelecting the 
desired large language model\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003eIt is definitely worth experimenting here. There is now a huge range of models available, and each comes with its own strengths and weaknesses. But the two models mentioned above are a solid place to start.\u003c/p\u003e\n\n\u003ch4 id=\"teaching-the-ai-to-talk-to-jira-mcp\"\u003eTeaching the AI to talk to Jira (MCP)\u003c/h4\u003e\n\n\u003cp\u003eAt this point, you can already chat with the local AI inside LM Studio. All of the AI’s knowledge is contained in its trained parameters—the so-called “weights.” There is still no connection to the outside world, though. To achieve our goal, the AI still needs the ability to talk to our application: Jira. That is what we address here in \u003cstrong\u003eitem 3\u003c/strong\u003e.\u003c/p\u003e\n\n\u003cp\u003eTo do that, we use a suitable \u003ca href=\"https://modelcontextprotocol.io/docs/learn/server-concepts\"\u003eMCP server\u003c/a\u003e. We chose \u003ca href=\"https://github.com/sooperset/mcp-atlassian\"\u003emcp-atlassian\u003c/a\u003e. \u003ccode\u003emcp-atlassian\u003c/code\u003e is an open-source project, it is actively maintained, and it has already collected plenty of stars on GitHub. Our own tests with \u003ccode\u003emcp-atlassian\u003c/code\u003e were successful as well.\u003c/p\u003e\n\n\u003cp\u003eTo use \u003ccode\u003emcp-atlassian\u003c/code\u003e without getting into the details of Python and virtual environments, we install a tool called \u003ccode\u003euv\u003c/code\u003e. It handles fetching and installing the current version of \u003ccode\u003emcp-atlassian\u003c/code\u003e for us. You can find all the details \u003ca href=\"https://docs.astral.sh/uv/getting-started/installation/\"\u003ehere\u003c/a\u003e.\u003c/p\u003e\n\n\u003cp\u003eTo let LM Studio use \u003ccode\u003emcp-atlassian\u003c/code\u003e, we need to add it to the MCP server configuration. 
You can do that here:\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Developer UI showing “Local Server” and “Loaded Models” (READY, openai/gpt-oss-20b); red arrow points to “mcp.json”.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/go5msbat5o77g3ml2d6y6cfw6jee?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eAdding the MCP server\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003eA \u003ccode\u003emcp.json\u003c/code\u003e file that can connect to a Jira Server started locally with the \u003ccode\u003ecompose.yml\u003c/code\u003e shown above looks like this:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan 
class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n  \u003c/span\u003e\u003cspan class=\"nl\"\u003e\"mcpServers\"\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n    \u003c/span\u003e\u003cspan class=\"nl\"\u003e\"mcp-atlassian\"\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n      \u003c/span\u003e\u003cspan class=\"nl\"\u003e\"command\"\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s2\"\u003e\"uvx\"\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n      \u003c/span\u003e\u003cspan class=\"nl\"\u003e\"args\"\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e[\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n        \u003c/span\u003e\u003cspan class=\"s2\"\u003e\"mcp-atlassian\"\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n      \u003c/span\u003e\u003cspan class=\"p\"\u003e],\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n      \u003c/span\u003e\u003cspan class=\"nl\"\u003e\"env\"\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"p\"\u003e{\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n        \u003c/span\u003e\u003cspan class=\"nl\"\u003e\"JIRA_URL\"\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s2\"\u003e\"https://meine-jira-instanz.org\"\u003c/span\u003e\u003cspan class=\"p\"\u003e,\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n        \u003c/span\u003e\u003cspan 
class=\"nl\"\u003e\"JIRA_PERSONAL_TOKEN\"\u003c/span\u003e\u003cspan class=\"p\"\u003e:\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s2\"\u003e\"\u0026lt;dein-jira-personal-access-token\u0026gt;\"\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n      \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n    \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n  \u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003cspan class=\"w\"\u003e\r\n\u003c/span\u003e\u003cspan class=\"p\"\u003e}\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eWe configure LM Studio so it can launch the MCP server with \u003ccode\u003euvx mcp-atlassian\u003c/code\u003e, and we pass the required environment variables for the URL and the personal access token (PAT).\u003c/p\u003e\n\n\u003cp\u003eYou can create the PAT for the MCP server in your Jira profile. Keep in mind that every action performed in Jira through MCP is then attributed to your user account.\u003c/p\u003e\n\n\u003cp\u003eAs a final step, start a new chat and select \u003ccode\u003emcp-atlassian\u003c/code\u003e. 
From that point on, the AI can use the tools provided by \u003ccode\u003emcp-atlassian\u003c/code\u003e and access your Jira instance.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Integrations panel listing plugins; “mcp/mcp-atlassian” enabled, others off; search field “Type to filter plugins…\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/o5hxkdsfor50a5ok0qd1a7zabozm?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eEnabling the MCP server\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003eThat’s the hardest part done – now it’s time for fine-tuning.\u003c/p\u003e\n\n\u003ch4 id=\"the-chat\"\u003eThe chat\u003c/h4\u003e\n\n\u003cp\u003eThis is where \u003cstrong\u003eitem 
4\u003c/strong\u003e gets really fun. LM Studio is not just our AI inference engine, but also our chat tool – the place where user interaction happens. In an LM Studio chat with \u003ccode\u003emcp-atlassian\u003c/code\u003e enabled, we can talk about anything that exists in our Jira project. The AI uses tools on its own to find issues on specific topics, from certain time periods, or created by particular users. It can also create issues. Here are a few prompts for inspiration:\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eExample prompts:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cul\u003e\n\u003cli\u003eWhich epics were addressed in the last release?\u003c/li\u003e\n\u003cli\u003eCreate business-facing release notes from this information\u003c/li\u003e\n\u003cli\u003eWhich open bugs are assigned to me?\u003c/li\u003e\n\u003cli\u003e…\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eTo test \u003cem\u003ecreating\u003c/em\u003e issues, prompts like the following are also useful:\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eExample prompt:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cul\u003e\n\u003cli\u003eMake up a feature and create an epic for it along with three related stories\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eIf you are already thinking one step ahead: with additional MCP servers, you can also connect other systems such as GitLab or GitHub. 
More on that in the conclusion.\u003c/p\u003e\n\n\u003cp\u003eThat said, a few minor issues become obvious pretty quickly – at least in my experience:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eevery now and then there are problems with umlauts in description texts\u003c/li\u003e\n\u003cli\u003ethe AI likes to use Markdown in issue descriptions, which Jira does not handle well\u003c/li\u003e\n\u003cli\u003e…\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eFortunately, there is a (non-deterministic) fix for that too: adjust the system prompt. This is where you can shape the AI’s behavior effectively—for example, by telling it to always show a preview and ask for confirmation before it actually creates an issue in Jira.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"UI screenshot showing preset “Product Issue Assistant” and a “System Prompt” field with German text for Jira issue drafting.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 2300w, 
https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/u0n3aflhpf15nbpjkqjzoexzj3hq?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eFine-tuning via the system prompt\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003e\u003cstrong\u003eMy current system prompt looks like this:\u003c/strong\u003e\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eYou are a skilled product owner with strong expertise in writing and slicing work items. You support the user in creating Jira issues by helping formulate descriptions and set issue fields correctly.\u003c/p\u003e\n\n\u003cp\u003eWhen the input makes it possible, suggest sensible values for these fields:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eType\u003c/li\u003e\n\u003cli\u003eComponents\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003ePlease answer in German.\u003c/p\u003e\n\n\u003cp\u003eWhen you create issues, show a preview and ask for confirmation before you actually create the issue with a tool.\u003c/p\u003e\n\n\u003cp\u003eUse umlauts directly (ä, ö, ü, ß) and standard quotation marks (\" or '').\u003c/p\u003e\n\n\u003cp\u003eYou work with Jira. When you create or edit issues in Jira using tools, use only Jira Wiki Markup in Jira descriptions with the following syntax: \u003ccode\u003eh1.\u003c/code\u003e for level-1 headings, \u003ccode\u003eh2.\u003c/code\u003e for level-2 headings, \u003ccode\u003eh3.\u003c/code\u003e for level-3 headings. Never use Markdown (\u003ccode\u003e#\u003c/code\u003e, \u003ccode\u003e##\u003c/code\u003e) or other syntax such as \u003ccode\u003e=Text=\u003c/code\u003e or \u003ccode\u003e====\u003c/code\u003e. 
Use \u003ccode\u003e-\u003c/code\u003e for lists.\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003ch3 id=\"conclusion\"\u003eConclusion\u003c/h3\u003e\n\n\u003cp\u003eA large share of the features offered by Atlassian Intelligence can be implemented successfully using only local AI. The scenario described here can even be taken further: if we connect not only Jira, but also the version control system used in the project – GitLab, GitHub, and others – through MCP, then the AI has all the information it needs to generate fine-grained release notes automatically. Of course, this requires some setup work and is not as out-of-the-box as the vendor solution. But it is absolutely practical and makes it possible to use AI even in highly privacy-sensitive environments. On top of that, this approach gives us a bit more protection from arbitrary price increases in cloud subscription models.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/mwk5y0gzh1eqtkpjhpsagohv7e3v?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/ueber-ai-einsatz-in-verschiedenen-coding-situationen/","url":"https://www.innoq.com/en/blog/2026/03/ueber-ai-einsatz-in-verschiedenen-coding-situationen/","date_published":"2026-03-31T00:00:00.000+00:00","date_modified":"2026-03-31T13:29:44.941+02:00","author":[{"name":"Markus Harrer","url":"https://www.innoq.com/en/staff/markus-harrer/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eWhere do I even start when trying to measure productivity gains from using Large Language Models (LLMs) in software development? In this short analysis, I simply draw on data from Yegor Denisov-Blanch’s talk \u003ca href=\"https://www.youtube.com/watch?v=tbDDYKRFjhk\"\u003eDoes AI Actually Boost Developer Productivity? 
(Stanford 100k Devs Study)\u003c/a\u003e. In it, 136 teams from 27 countries were surveyed on whether they see productivity improvements from using AI (more precisely: LLM-assisted software development).\u003c/p\u003e\n\n\u003cp\u003eThe following charts are relevant for my take on the “what really matters” factor; I’m repeating them here and adding my interpretation.\u003c/p\u003e\n\n\u003ch3 id=\"chart-i-context-is-the-brake\"\u003eChart I: Context Is the Brake\u003c/h3\u003e\n\n\u003cp\u003eOne of the most interesting insights from the talk, for me, is a 2x2 matrix that shows in which situations AI support actually adds productivity value for software developers. Instead of making blanket statements about AI productivity, the matrix breaks the question down along two dimensions: how mature the codebase is and how complex the task is. The results are more nuanced than the usual promises in glossy brochures (or on websites) from various AI tool vendors would have you believe.\u003c/p\u003e\n\n\u003cfigure style=\"margin: 2rem 2rem;\"\u003e\r\n\u003cimg alt=\"2x2 matrix; explanation and conclusions in the text below\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 1600w, 
https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/b7viuvyqmknjmfqnschfquqodgy9?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eProductivity gains from AI usage by project maturity and task complexity\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"my-interpretation\"\u003eMy interpretation\u003c/h4\u003e\n\n\u003cp\u003eThe matrix shows that productivity gains from AI are highest in greenfield projects with low task complexity—study participants report an uplift of 35–40% there. To me, the reason is obvious: low-complexity tasks are often repetitive and clearly defined, so AI can reliably generate boilerplate-heavy code with minimal risk of errors. 
Also, I suspect we’re in the realm of to-do list apps here: written a thousand times, and nothing new happens the thousand-and-first time.\u003c/p\u003e\n\n\u003cp\u003eHowever, the gains drop sharply as project maturity increases and/or task complexity rises (i.e., as soon as things get serious):\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eIn brownfield and legacy projects, the gains fall to 15–20% even for simple maintenance tasks, because outdated code and complex dependencies limit what AI can contribute safely.\u003c/li\u003e\n\u003cli\u003eFor highly complex tasks in systems that already resemble a Big Ball of Mud, the gains shrink to just 0–10%, because the AI struggles to cut through tangled architectures, poorly implemented ideas, and deeply nested logic.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eThis doesn’t surprise me: the underlying training data largely comes from public code repositories. There’s a clear bias in what gets shared—code you don’t have to be embarrassed about in public (at least that’s true for me). The real bulk of code that deviates from those idealized images sits inside companies’ closed software systems. An LLM’s first encounter with that kind of code can be jarring, which makes it harder to adapt known patterns from the training data to the existing codebase. Or, as Ludwig Wittgenstein put it more than a hundred years ago:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eThe limits of my language mean the limits of my world.\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eEven in ideal greenfield environments, highly complex work caps AI’s impact at 10–15%, because those tasks require deeper human judgment that mechanical automation can’t replace. AI can assist, but it still can’t replace architectural thinking and contextual judgment—both of which complex engineering and domain knowledge demand. 
That also ties back to the limited amount of available context capacity (see my assessment in \u003ca href=\"https://markusharrer.de/blog/2026/02/17/agentic-software-modernization-chances-and-traps/\"\u003e“Agentic Software Modernization: Chances and Traps”\u003c/a\u003e).\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eTL;DR:\u003c/strong\u003e AI delivers the most when the problem is tightly scoped and the codebase is clean. High task complexity and legacy code are the two main productivity killers for AI—especially in combination (which is likely the reality for most of us).\u003c/p\u003e\n\n\u003ch3 id=\"chart-ii-the-niche-penalty\"\u003eChart II: The Niche Penalty\u003c/h3\u003e\n\n\u003cp\u003eThe second chart shifts the perspective from project maturity to the choice of programming language. It turns out that the popularity of the language has a significant impact on how much an LLM can actually help—mainly driven by how much training data exists for that language.\u003c/p\u003e\n\n\u003cfigure style=\"margin: 2rem 2rem;\"\u003e\r\n\u003cimg alt=\"2x2 matrix; explanation and conclusions in the text below\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 1600w, 
https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/f8ff9biarno5bscgu24q848l2ohr?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eImpact of the programming language on AI-driven productivity gains\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003ch4 id=\"my-interpretation\"\u003eMy interpretation\u003c/h4\u003e\n\n\u003cp\u003eWith widely used languages (e.g., Python, Java), LLMs provide the most benefit: for simple tasks, they boost productivity by 20–25% thanks to extensive training data (e.g., via reinforcement learning on thousands of simple question–answer pairs); for complex tasks, by 10–15%. In popular languages, LLMs can still provide solid support because they’ve seen huge amounts of diverse training data. But even in the best case, complex tasks still require human judgment—so AI acts more as an accelerator than a replacement.\u003c/p\u003e\n\n\u003cp\u003eWith niche languages (e.g., COBOL—though to me that’s already mainstream), the gains for simple tasks are negligible at 0–5% (due to limited training data). For highly complex tasks, things get even worse: productivity can drop to as low as -5%, because the AI enters a hallucination-prone zone where it confidently produces plausible-sounding but incorrect output. This highlights that AI tools without sufficient training data can become a liability rather than an advantage in complex development work. Personally, I don’t see this changing for the better anytime soon. 
It’s also becoming clear that even \u003ca href=\"https://github.com/IBM/rpg-genai-data\"\u003eactively asking for code in niche programming languages\u003c/a\u003e doesn’t lead to enough high-quality training data (and honestly: what insurance company wants to put its COBOL-written computational core on GitHub?).\u003c/p\u003e\n\n\u003cp\u003eThe underlying driver in all four quadrants is the same: the more training data exists for a given language and task type, the more reliably AI can contribute. Language popularity is therefore not just a matter of personal preference—it’s a direct indicator of how productively you can use LLM-assisted software development.\u003c/p\u003e\n\n\u003ch3 id=\"chart-iii-heaven-or-hell\"\u003eChart III: Heaven or Hell\u003c/h3\u003e\n\n\u003cp\u003eFor the third chart, I rather pragmatically combine the average productivity gains from the two previous 2x2 charts into a third perspective. It shows productivity gains split by programming language popularity and project maturity. I’m particularly interested in this view for a concrete reason: I sometimes work in projects that use programming languages that don’t even make it into the top 50 most popular languages in the TIOBE Index (https://www.tiobe.com/tiobe-index/)—and languages that will never show up there because they exist only within a single company. 
And of course, it’s worth mentioning: these are decades-old, massive software systems that are now slowly due for modernization.\u003c/p\u003e\n\n\u003cfigure style=\"margin: 2rem 2rem;\"\u003e\r\n\u003cimg alt=\"2x2 matrix; explanation and conclusions in the text below\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/u5g1c38rljnadujoeg76f85663he?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 90vw\"\u003e\r\n\r\n\u003cfigcaption\u003eAI assistance productivity gain matrix\u003c/figcaption\u003e\u003c/figure\u003e\r\n\n\u003cp\u003e\u003cem\u003eNote: This combined view is not a formally validated model. It’s a pragmatic thought experiment that connects two independent data sources by simply averaging them. 
It’s meant to provide orientation—not a precise prediction.\u003c/em\u003e\u003c/p\u003e\n\n\u003ch4 id=\"my-interpretation\"\u003eMy interpretation\u003c/h4\u003e\n\n\u003cp\u003eWhen you combine both dimensions—project maturity (greenfield vs. brownfield) and programming language popularity—you get four interesting quadrants. The best-case scenario, “AI Heaven,” happens when you’re working on a greenfield project in a widely used language: that’s where you can expect the highest productivity gains. It’s the ideal state: ample training data meets a clean, unburdened codebase. AI can reach its full potential. That’s why vibe coding and prototyping with languages like TypeScript and friends works so well.\u003c/p\u003e\n\n\u003cp\u003eIn brownfield projects written in popular languages, gains drop noticeably. Now you’re paying the price for letting code hygiene best practices slide (Yegor Denisov-Blanch also has an excellent talk on this: \u003ca href=\"https://www.youtube.com/watch?v=JvosMkuNxF8\"\u003e“Can you prove AI ROI in Software Eng?”\u003c/a\u003e). A Large Language Model still understands the well-known programming language just fine, but the complexity and technical debt in the existing codebase limit what it can contribute.\u003c/p\u003e\n\n\u003cp\u003eInterestingly, niche languages in greenfield projects still deliver noticeable gains—only slightly worse than the legacy-code scenario in popular languages. That suggests a clean codebase can partly compensate for weaker training data, although the language barrier still sets a meaningful ceiling. My bias here is that it’s simply always easier to start on a blank slate, no matter which language you use (I still remember the time when people kept saying “we’re just faster with Scala / F#,” which left me unimpressed even back then. 
It gets interesting once you have a mountain of code that goes beyond a to-do list).\u003c/p\u003e\n\n\u003cp\u003eThe worst-case scenario is “AI Hell”: a niche language combined with a brownfield codebase yields only minimal productivity gains. Here, both obstacles amplify each other. The AI has neither sufficient training data for the language nor the ability to meaningfully penetrate a tangled legacy codebase—the result is unreliable output and a high risk of doing more harm than good.\u003c/p\u003e\n\n\u003cp\u003eThe key takeaway: language popularity and project maturity both matter in their own right. And their negative effects add up. In other words, each dimension already reduces AI productivity on its own; together, they push AI-driven productivity gains down to the lowest level. Teams working with niche languages in legacy systems should be especially cautious about relying too heavily on AI tools (see, for example, my article \u003ca href=\"https://www.innoq.com/en/blog/2025/09/software-analytics-going-craizy/\"\u003e\"Software Analytics going crAIzy!\"\u003c/a\u003e).\u003c/p\u003e\n\n\u003cp\u003ePS: Did I mention I’m a fan of the \u003ca href=\"https://www.tqdev.com/2018-the-boring-software-manifesto/\"\u003eBoring Software Manifesto\u003c/a\u003e and have been preaching for years that people should join it? I believe that in the age of agentic software modernization, the manifesto is more relevant than ever. 
😉\u003c/p\u003e\n\n\u003cp\u003e\u003cem\u003eIf you’re interested in the charts: the accompanying \u003ca href=\"https://github.com/feststelltaste/software-analytics/blob/master/notebooks/AI%20Productivity%20Gains%20in%20different%20Situations.ipynb\"\u003eJupyter Notebook\u003c/a\u003e, which generated the images based on the talk’s data, is available here.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003e\u003cem\u003eHeader image sourced from \u003ca href=\"https://commons.wikimedia.org/wiki/File:2011-365-314_Social_Media_Cootie_Catcher_(6333021221).jpg\"\u003eWikipedia\u003c/a\u003e, Creative Commons CC0 1.0 Universal Public Domain Dedication.\u003c/em\u003e\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/dpsfk6pbve9by7ovqlrvbm6zstgg?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/02/spec-driven-architecture-contracts-fuer-agenten/","url":"https://www.innoq.com/en/blog/2026/02/spec-driven-architecture-contracts-fuer-agenten/","date_published":"2026-03-26T00:00:00.000+00:00","date_modified":"2026-03-26T11:31:10.078+01:00","author":[{"name":"Philipp Beyerlein","url":"https://www.innoq.com/en/staff/philipp-beyerlein/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eAgentic development is currently shifting where the real bottleneck in software development sits. Anyone working today with tools like \u003ca href=\"https://docs.bmad-method.org\"\u003eBMAD\u003c/a\u003e, Claude Code, or Cursor—and using structured specifications as the foundation—experiences this firsthand: implementation becomes fast, cheap, and scalable. A new service, a refactor, an additional feature unit: hours instead of weeks.\u003c/p\u003e\n\n\u003cp\u003eBut implementation speed doesn’t solve an architecture problem. 
It amplifies it.\u003c/p\u003e\n\n\u003ch3 id=\"the-problem-implicit-architecture-and-distributed-systems\"\u003eThe Problem: Implicit Architecture and Distributed Systems\u003c/h3\u003e\n\n\u003cp\u003eImagine you’re developing a new application spec-driven with an agentic workflow. The specification is clear, the agent delivers cleanly, the tests are green. Then comes the moment when this application needs to be integrated into an existing enterprise landscape, and suddenly questions arise that aren’t in any spec: Which logging format applies here? Which auth infrastructure is used? Is this service allowed to write directly into a neighboring system’s database? Which events must it publish so other systems learn what’s happening?\u003c/p\u003e\n\n\u003cp\u003eThese questions aren’t implementation details. They’re architecture decisions that apply system-wide and must be collectively adhered to across a distributed system. As long as you’re building a standalone application, you can gloss over them. The moment you operate in a distributed system landscape, they become mandatory.\u003c/p\u003e\n\n\u003cp\u003eThis is exactly where the real problem begins. In most projects, architecture lives implicitly: in ADRs that nobody reads anymore, in Confluence pages that weren’t updated in the last sprint, in framework conventions that are taken for granted—until an agent doesn’t know them. The problem isn’t that this architecture is wrong. The problem is that it isn’t enforceable. No build fails because a domain boundary was crossed. No CI job reports that a forbidden dependency was introduced. Drift emerges quietly and only becomes visible when it’s expensive.\u003c/p\u003e\n\n\u003cp\u003eIn agentic setups, the effect multiplies. 
Every autonomously generated unit is a new point where an architecture convention can be violated silently—not out of malice, but because the agent simply can’t know if the knowledge isn’t explicitly documented anywhere.\u003c/p\u003e\n\n\u003ch3 id=\"spec-driven-architecture\"\u003eSpec-Driven Architecture\u003c/h3\u003e\n\n\u003cp\u003eSpec-Driven Development is the approach of treating the specification as the primary artifact from which implementation and tests are derived. SDD describes how a system is correctly \u003cstrong\u003eimplemented\u003c/strong\u003e. It doesn’t describe how a portfolio of systems \u003cstrong\u003estays coherent\u003c/strong\u003e.\u003c/p\u003e\n\n\u003cp\u003eThat gap is exactly what \u003cstrong\u003eSpec-Driven Architecture (SDA)\u003c/strong\u003e addresses.\u003c/p\u003e\n\n\u003cp\u003eSDA transfers SDD’s core principle to the architecture level: architecture isn’t diagrams or ADRs—it’s versioned, agent-optimized contracts. In a distributed system developed with agentic workflows, they operate on multiple levels at once:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eDevelopment process:\u003c/strong\u003e Contracts for architecture, operations, or security define guardrails within which agents may implement autonomously.\u003c/p\u003e\u003c/li\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eBetween systems:\u003c/strong\u003e Architecture contracts can define permitted dependencies and communication patterns across system boundaries.\u003c/p\u003e\u003c/li\u003e\n\u003cli\u003e\u003cp\u003e\u003cstrong\u003eWithin a system:\u003c/strong\u003e Contracts for domain design can make a domain’s semantics, invariants, and interfaces binding.\u003c/p\u003e\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eSDA doesn’t replace existing governance. It automates it. 
What’s discussed today in architecture review boards, documented in ADRs, and corrected in retrospectives becomes, in SDA, a verifiable artifact—one that can’t be forgotten next sprint.\u003c/p\u003e\n\n\u003ch3 id=\"contracts-boundaries-and-guarantees-for-agents-and-teams\"\u003eContracts: Boundaries and Guarantees for Agents and Teams\u003c/h3\u003e\n\n\u003cp\u003eA contract in SDA isn’t an abstract idea. It’s an explicit, verifiable statement about expectations, guarantees, and boundaries between domains, services, teams, and agents—an artifact treated like source code: versioned, reviewed, part of the pipeline.\u003c/p\u003e\n\n\u003cp\u003eHere, the term means something different from API or consumer-driven contracts like Pact. While those govern behavior between two services at the implementation level, SDA contracts operate at the architecture, governance, and domain levels. They don’t define how two services talk to each other; they define what is allowed in a system at all.\u003c/p\u003e\n\n\u003cp\u003eWhich contracts a system needs depends on its specific requirements. There’s no definitive list. The following examples are typical for distributed systems.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eDomain contracts\u003c/strong\u003e describe a domain’s business semantics outward: which terms it defines, which invariants it guarantees, and through which API or event interfaces it may be consumed. A domain contract protects the integrity of the domain—not only against other teams, but also against agents that autonomously generate new implementations. It defines what must remain stable so consumers can rely on it. Domain contracts formalize concepts from DDD: they delineate a domain’s ubiquitous language outward and define which terms and invariants are relevant and stable for other domains—and what remains internal. 
What used to live in event-storming boards and in people’s heads becomes, in SDA, a versioned artifact.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eArchitecture contracts\u003c/strong\u003e govern the structure of the overall system: which dependencies between modules are allowed, which communication patterns apply, where layer boundaries run. They translate implicit conventions like “we never call directly into another domain’s database” into verifiable rules that a linter or CI job can validate automatically.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eOps contracts\u003c/strong\u003e define operational requirements that apply to every unit: metrics, tracing, logging standards. Especially when new services pop up quickly, they prevent observability from turning into a cleanup project after the fact.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eSecurity contracts\u003c/strong\u003e define which data classifications a service may process, which auth mechanisms are mandatory, and which compliance rules apply. This sounds like classic security overhead, but in an agentic context it becomes genuinely critical. An agent that autonomously creates a new service processing customer data has no basis—without an explicit security contract—for knowing that this data must be stored encrypted, transmitted only over internal networks, and never written to generic log output. A security contract turns exactly these requirements into a build artifact that can be checked—not a checklist someone fills out after deployment. Tools like Open Policy Agent (OPA) show this isn’t theoretical; it’s production-ready today.\u003c/p\u003e\n\n\u003cp\u003eDepending on context, others may be added: test contracts that define which quality gates apply to a domain, or deployment contracts that govern infrastructure requirements. 
The principle is always the same: implicit architectural knowledge is made explicit, versionable, and automatically verifiable—so agents and teams alike know where the boundaries are.\u003c/p\u003e\n\n\u003ch3 id=\"what-does-a-contract-look-like-in-practice\"\u003eWhat Does a Contract Look Like in Practice?\u003c/h3\u003e\n\n\u003cp\u003eThe simplest way to get started is a Markdown file in the repository. No special format, no tool lock-in—just explicit, versioned knowledge that’s equally readable for humans and agents. A domain contract for an ordering domain might look like this:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"gu\"\u003e### Domain Contract: Ordering\r\u003c/span\u003e\n\r\n\u003cspan class=\"gu\"\u003e#### Owner\r\u003c/span\u003e\nTeam Checkout, checkout@example.com\r\n\r\n\u003cspan class=\"gu\"\u003e#### Ubiquitous Language\r\u003c/span\u003e\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e \u003cspan class=\"gs\"\u003e**Order**\u003c/span\u003e: A confirmed purchase intent by a customer, containing at least one item.\r\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e \u003cspan class=\"gs\"\u003e**OrderItem**\u003c/span\u003e: A single product line with quantity and price fixed at the time of ordering.\r\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e \u003cspan class=\"gs\"\u003e**OrderStatus**\u003c/span\u003e: Enum — PENDING, CONFIRMED, SHIPPED, CANCELLED\r\n\r\n\u003cspan class=\"gu\"\u003e#### Invariants\r\u003c/span\u003e\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e An Order always contains at least one OrderItem.\r\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e The total price is frozen at the time of order creation and never changes retroactively.\r\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e A CANCELLED Order cannot be reactivated.\r\n\r\n\u003cspan class=\"gu\"\u003e#### Published Interface\r\u003c/span\u003e\n\u003cspan 
class=\"p\"\u003e-\u003c/span\u003e REST API: \u003cspan class=\"sb\"\u003e`POST /orders`\u003c/span\u003e, \u003cspan class=\"sb\"\u003e`GET /orders/{id}`\u003c/span\u003e\r\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e Events: \u003cspan class=\"sb\"\u003e`OrderConfirmed`\u003c/span\u003e, \u003cspan class=\"sb\"\u003e`OrderCancelled`\u003c/span\u003e (schema see /contracts/events/ordering.json)\r\n\r\n\u003cspan class=\"gu\"\u003e#### What this domain does NOT own\r\u003c/span\u003e\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e Payment processing (→ Payment Domain)\r\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e Inventory management (→ Inventory Domain)\r\n\r\n\u003cspan class=\"gu\"\u003e#### Consumption Rules\r\u003c/span\u003e\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e Direct database access is not permitted.\r\n\u003cspan class=\"p\"\u003e-\u003c/span\u003e Status changes only via the API — never through direct event manipulation.\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eThis isn’t documentation in the classic sense—it’s a contract. An agent implementing a new service in the ordering domain gets this contract as context and therefore knows: which terms are binding, which invariants must be upheld, which interfaces exist, and where the domain’s boundaries lie. That’s the difference between an agent that guesses and an agent that operates within clear guardrails.\u003c/p\u003e\n\n\u003cp\u003eThe same principle works for architecture contracts as YAML policies, for security contracts as OPA rule files, for ops contracts as part of a platform configuration. The format follows the context; the property stays the same: versioned, verifiable, part of the pipeline.\u003c/p\u003e\n\n\u003ch4 id=\"from-markdown-to-a-contract-system\"\u003eFrom Markdown to a Contract System\u003c/h4\u003e\n\n\u003cp\u003eMarkdown in the repository is a pragmatic start, but not the end state. 
When contracts grow across many domains, teams, and systems, the management question comes up: How do contracts stay consistent? How do dependencies between contracts become visible? How do agents get the right contract context at runtime?\u003c/p\u003e\n\n\u003cp\u003eA natural next step is a dedicated contract repository: a central place where all contracts are maintained under version control, checked for consistency, and linked to the affected domains. Changes to a contract trigger automated checks: Which consumers are affected? Which agent workflows need to be revalidated?\u003c/p\u003e\n\n\u003cp\u003eThe next step is actively delivering contracts to agents at runtime. Instead of loading contracts statically into an agent’s context, a contract server could be provided as a tool or resource via the \u003ca href=\"https://spec.modelcontextprotocol.io\"\u003eModel Context Protocol (MCP)\u003c/a\u003e. The agent actively requests the valid domain contract for the ordering domain before generating an implementation. It retrieves the current security contract for customer data before generating database access. Contracts thus become not only governance artifacts, but an active part of the agentic development process: dynamic, context-dependent, and always up to date.\u003c/p\u003e\n\n\u003cp\u003eThat’s still future work. But it’s a future whose foundation you can lay today—with a Markdown file in the repository.\u003c/p\u003e\n\n\u003ch3 id=\"contracts-in-agentic-development-the-gatekeeper\"\u003eContracts in Agentic Development: The Gatekeeper\u003c/h3\u003e\n\n\u003cp\u003eA contract in the repository is a good start. But an agent that doesn’t know that contract won’t follow it either. 
So the key question isn’t where contracts live, but when and how they enter the agentic workflow.\u003c/p\u003e\n\n\u003cp\u003eThe most straightforward answer: as a skill.\u003c/p\u003e\n\n\u003cp\u003eA gatekeeper skill is a specialized agent that knows the relevant contracts and plays a dual role in the workflow. It is advisor and auditor at the same time. Before an implementation agent starts a task, it can consult the gatekeeper: Which terms must I adhere to in this domain? Which dependencies are forbidden? Which interfaces am I allowed to consume? The gatekeeper responds with the relevant contract context, and the implementation agent builds within the guardrails from the beginning.\u003c/p\u003e\n\n\u003cp\u003eAt the end of each workflow step, the gatekeeper switches roles. It takes the result and validates it against the contracts: Are the domain’s invariants upheld? Does the new dependency violate the architecture contracts? Is the tracing missing that the ops contract requires? Only when the gatekeeper gives the green light does the workflow continue. Otherwise, it returns concrete feedback to the implementation agent.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"s\"\u003e---\u003c/span\u003e\u003cspan class=\"err\"\u003e\r\u003c/span\u003e\n\u003cspan class=\"na\"\u003ename\u003c/span\u003e\u003cspan class=\"pi\"\u003e:\u003c/span\u003e \u003cspan class=\"s\"\u003econtract-gatekeeper\u003c/span\u003e\u003cspan class=\"err\"\u003e\r\u003c/span\u003e\n\u003cspan class=\"na\"\u003edescription\u003c/span\u003e\u003cspan class=\"pi\"\u003e:\u003c/span\u003e \u003cspan class=\"s\"\u003eUse this skill to validate implementation decisions against active\u003c/span\u003e\u003cspan class=\"err\"\u003e\r\u003c/span\u003e\n  \u003cspan class=\"s\"\u003edomain, architecture, ops, and security contracts. 
Invoke before any\u003c/span\u003e\u003cspan class=\"err\"\u003e\r\u003c/span\u003e\n  \u003cspan class=\"s\"\u003enon-trivial implementation step and after each workflow step completes.\u003c/span\u003e\u003cspan class=\"err\"\u003e\r\u003c/span\u003e\n\u003cspan class=\"s\"\u003e---\u003c/span\u003e\u003cspan class=\"err\"\u003e\r\u003c/span\u003e\n\r\n\u003cspan class=\"gu\"\u003e### Contract Gatekeeper\r\u003c/span\u003e\n\r\nYou are the contract gatekeeper for this system.\r\nYou know all active Domain Contracts, Architecture Contracts, Ops Contracts,\r\nand Security Contracts.\r\n\r\n\u003cspan class=\"gu\"\u003e#### When consulted (pre-implementation)\r\u003c/span\u003e\n\r\nAn agent asks whether a planned approach is contract-compliant.\r\n\u003cspan class=\"p\"\u003e\r\n1.\u003c/span\u003e Identify which contracts are relevant to the request.\r\n\u003cspan class=\"p\"\u003e2.\u003c/span\u003e Respond with the relevant contract sections.\r\n\u003cspan class=\"p\"\u003e3.\u003c/span\u003e Return a clear assessment:\r\n\u003cspan class=\"p\"\u003e   -\u003c/span\u003e \u003cspan class=\"gs\"\u003e**green**\u003c/span\u003e: compliant, proceed.\r\n\u003cspan class=\"p\"\u003e   -\u003c/span\u003e \u003cspan class=\"gs\"\u003e**yellow**\u003c/span\u003e: risk identified, document justification before proceeding.\r\n\u003cspan class=\"p\"\u003e   -\u003c/span\u003e \u003cspan class=\"gs\"\u003e**red**\u003c/span\u003e: violation, do not proceed without contract change.\r\n\r\n\u003cspan class=\"gu\"\u003e#### When reviewing (post-step)\r\u003c/span\u003e\n\r\nAn agent submits an implementation result for review.\r\n\u003cspan class=\"p\"\u003e\r\n1.\u003c/span\u003e Check the result against all relevant contracts.\r\n\u003cspan class=\"p\"\u003e2.\u003c/span\u003e Return structured feedback:\r\n\u003cspan class=\"p\"\u003e   -\u003c/span\u003e which contracts were checked\r\n\u003cspan class=\"p\"\u003e   -\u003c/span\u003e what is compliant\r\n\u003cspan 
class=\"p\"\u003e   -\u003c/span\u003e what is not compliant, with specific reference to the violated rule\r\n\r\n\u003cspan class=\"gu\"\u003e#### Contracts\r\u003c/span\u003e\n\r\n[Embed relevant contracts here or retrieve via MCP at runtime]\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eThe workflow agent, in turn, is instructed so it cannot bypass the gatekeeper. It knows: before every non-trivial implementation step, I ask the gatekeeper. After every step, I have the result checked. This isn’t a recommendation; it’s part of the workflow structure.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"gu\"\u003e### Implementation Agent — Workflow Instructions (excerpt)\r\u003c/span\u003e\n\r\n\u003cspan class=\"gu\"\u003e#### Contract Compliance\r\u003c/span\u003e\n\r\nBefore any implementation decision that touches a domain boundary, a dependency,\r\nor an interface: consult the contract-gatekeeper skill.\r\n\r\nProvide: what you intend to implement, in which domain, and which resources\r\nyou plan to use.\r\n\r\nAfter completing each step: submit the result to the contract-gatekeeper for review.\r\nDo not proceed until the gatekeeper returns green, or a justified yellow\r\nis documented.\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eThis pattern solves both problems at once. The agent gets contract context at the right time—not somewhere as an attachment, but actively embedded in the workflow. And enforcement doesn’t happen only at the end; it’s continuous—early enough to prevent bad decisions before they’re cast into code.\u003c/p\u003e\n\n\u003cp\u003eThe same logic works in a classic CI/CD pipeline, even if the development workflow isn’t agentic yet. A dedicated pipeline step calls an agent that checks every commit against the relevant contracts. 
Existing CI/CD tools only need an additional step.\u003c/p\u003e\n\n\u003cp\u003eThat makes contracts a pragmatic entry point. Teams that haven’t switched to agentic workflows yet can still introduce contracts and immediately benefit from automated validation. The contract linter in a CI job and the gatekeeper skill in an agentic workflow are two expressions of the same principle. If you start with the CI step, you’ve already laid the foundation for the move to SDD workflows.\u003c/p\u003e\n\n\u003ch3 id=\"governance-without-constant-objections\"\u003eGovernance Without Constant Objections\u003c/h3\u003e\n\n\u003cp\u003eIn most projects, governance is a role someone has to play. Someone who keeps the overview in the architecture review board. Someone who comments on the PR: “But did we actually check whether this violates our layering rules?” Someone who asks after the sprint why the new service has no tracing. That person isn’t malicious. They’re necessary because otherwise the knowledge exists nowhere.\u003c/p\u003e\n\n\u003cp\u003eSDA fundamentally changes this role. People who previously had to demand governance during review now get a tool that acts earlier: contracts that apply from the first commit—not only once everything is built. The responsibility stays the same. What changes is when it takes effect. When architecture decisions live in versioned contracts, when the gatekeeper skill checks every workflow step against those contracts, when the CI job validates on every commit: governance is simply there—not as control exercised by someone, but as a property of the process.\u003c/p\u003e\n\n\u003cp\u003eMeetings and architecture review boards don’t disappear. They regain their real purpose: making decisions, formulating contracts, consciously setting boundaries. What goes away is the constant overhead of repeatedly enforcing and explaining those decisions in every context. The architecture review board decides what applies. 
The rest of the process ensures that it applies.\u003c/p\u003e\n\n\u003cp\u003eSDA is not a framework, not a tool, not a replacement for DDD or microservice patterns. It’s a governance principle for a world in which implementation has become cheap—and architecture is the last thing keeping systems coherent.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/aq1qnywz1sx413pba3u4ax2jihdp?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/rest-ist-das-bessere-mcp/","url":"https://www.innoq.com/en/blog/2026/03/rest-ist-das-bessere-mcp/","date_published":"2026-03-24T00:00:00.000+00:00","date_modified":"2026-03-24T13:17:23.211+01:00","author":[{"name":"Martin Otten","url":"https://www.innoq.com/en/staff/martin-otten/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eThere are always tasks we’d love to automate—but don’t. Often we don’t have the time, it’s not important enough, or we do it too infrequently. Or the tasks are context-dependent and can only be standardized with significant effort. So far, humans have handled them—even when they’re mind-numbingly boring.\u003c/p\u003e\n\n\u003cp\u003eFor me, one such task is regularly submitting travel expenses. In theory, we already have a decent tool that takes a lot of work off my plate. But there are many factors—travel times, locations, and different invoice formats—that mean I still have to create per-diem entries, assign cost centers, and attach receipts myself.\u003c/p\u003e\n\n\u003cp\u003eManual work like this adds unnecessary process cost. For me, it means context switching to a non-value-adding activity. And it causes waiting time elsewhere in the company. Waiting time is both the biggest problem—and the biggest lever—in process optimization. 
And since this is neither something that generates revenue nor something I enjoy, I keep postponing my expense report until the last minute—until accounting sends me a reminder. None of that would be necessary if someone else could just do it for me …\u003c/p\u003e\n\n\u003cp\u003eLuckily, an AI agent can do it for me today. That requires combining information from different data sources and entering it into the application. Appointments and travel times come from my calendar, invoices from my emails, and the expense report is created in a web application. There are different ways to access this data. MCP servers are the obvious approach—but they come with real drawbacks. They always load their entire instruction set into context, and you have to carefully review and choose the server implementations. At their core, MCP servers are just RPC—the caller has to know in advance which functions exist. A web app, on the other hand, shows you exactly what you can do in the current context. That’s why I chose different approaches for two of the integrations.\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eCalendar: Instead of an MCP server, I use a shared calendar that can be fetched as an iCal feed over HTTP—a standardized format that works everywhere without additional implementation.\u003c/li\u003e\n\u003cli\u003eMail: Access is handled via a command-line client (Himalaya), which provides a robust implementation and API. This is close to the idea of MCP, but works without the extra context overhead.\u003c/li\u003e\n\u003cli\u003eTravel-expense web application: There is no travel-expense MCP server here. Instead, we use the browser via an MCP for Playwright.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eBut can an AI agent work well with websites? When we think of browser automation, we probably think of tools like Selenium and brittle automated UI tests. That’s the wrong mental model. In fact, AI agents are very good at operating browsers interactively. 
Because they adapt, they can handle errors—and recover.\u003c/p\u003e\n\n\u003cp\u003eUsing web applications as interfaces for AI agents has clear advantages over an MCP server:\u003c/p\u003e\n\n\u003col\u003e\n\u003cli\u003eThe web app already exists—no need to build anything new.\u003c/li\u003e\n\u003cli\u003eWeb apps are the most mature form of hypermedia we have: via HTML and JavaScript we get context. Content, navigation structures, and available commands are already defined. They even change depending on context, instead of offering a static set of functions like MCP.\u003c/li\u003e\n\u003cli\u003eWeb apps are built for humans, and humans make mistakes. We already validate inputs in web applications and check whether actions are allowed or would lead to inconsistent states. It’s normal for people to make at least careless mistakes. That’s a big advantage for AI agents: they can enter incorrect input, get feedback, and adjust their behavior to solve the problem.\u003c/li\u003e\n\u003cli\u003eWeb apps already come with roles and permissions. An agent can be given the minimum required privileges—without having to implement new security infrastructure.\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003cp\u003eSo AI agents operate within a concrete set of possibilities, rather than solving a problem from scratch in a vacuum. There is a fixed context the agent can move within.\u003c/p\u003e\n\n\u003cp\u003eThere’s no split between the interface for humans and machines—it’s the same interface with the same capabilities, security guarantees, and problems.\u003c/p\u003e\n\n\u003cp\u003eWith JSON APIs, you typically get raw data, but no semantics.\u003c/p\u003e\n\n\u003cp\u003eSo far, hypermedia\u003ca href=\"https://www.innoq.com/en/blog/2026/03/rest-ist-das-bessere-mcp/#fn:1\" id=\"fnref:1\" title=\"see footnote\" class=\"footnote\"\u003e[1]\u003c/a\u003e has really only become established for “human” browser users. 
AI agents are the first non-human users that can actually take full advantage of these interfaces.\u003c/p\u003e\n\n\u003cp\u003eTo make this work, we only need an MCP server for browser usage. Like a web browser itself, it has no domain-specific functionality. It enables access to additional domain “tools” in the form of web applications. As a result, tool selection and tool specification become contextual and dynamic.\u003c/p\u003e\n\n\u003cp\u003eThat fits well with the fact that we also “program” AI agents dynamically through prompts. We give an instruction in natural language, and the language model is trained to develop a problem-solving strategy from it. Of course, the idea of declarative programming isn’t new. The most successful example is SQL, where I describe what I want without prescribing the solution path. But so far, this approach has only been feasible for much narrower use cases like that one.\u003c/p\u003e\n\n\u003ch3 id=\"practical-example\"\u003ePractical example\u003c/h3\u003e\n\n\u003cp\u003eHere is an example of filing an expense report with an AI agent. We start with a naive prompt.\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eFor my “Event Storming workshop” at Capitol Versicherung in January, I need to file a travel expense report. We use https://reisekosten.example for that. You can find the travel details in my calendar. The hotel and train invoices are in my emails. Please create a folder for the trip. Use Playwright MCP to access the website. For each travel day, create a per-diem entry. Ask me if anything is unclear.\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eThis prompt is far from a perfect set of instructions. 
Still, the agent had hardly any follow-up questions.\u003c/p\u003e\n\n\u003cp\u003eA quick confirmation:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cul\u003e\n\u003cli\u003eOn \u003cstrong\u003e2026-01-09\u003c/strong\u003e, were there really \u003cstrong\u003eno\u003c/strong\u003e meals provided?\u003c/li\u003e\n\u003cli\u003eWas breakfast provided by the hotel on 2026-01-10, 2026-01-11, and 2026-01-12?\u003c/li\u003e\n\u003cli\u003eI couldn’t find the project “Capitol Versicherung Event Storming workshop”, so I only set the \u003cstrong\u003ereason\u003c/strong\u003e. Should I leave it like that, or is there a different project name you use that I should select?\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eThe agent missed several contextual cues: with an arrival time of 10 pm I probably don’t want to eat anything, the breakfast information is already in the hotel invoice, and the customer name could have been used to search for an existing project.\u003c/p\u003e\n\n\u003cp\u003eAfter I provided the missing information, my expense report was essentially done. And that was with very little contextual detail. Before submitting, I review the folder myself in the web application.\u003c/p\u003e\n\n\u003cp\u003eNow I want the next expense report to run without follow-up questions if possible. So I ask the agent to write its own instructions. At first, the instructions consist of a process description and heuristics you could also give a human to complete this task. 
In addition, the agent made notes about technical issues when operating the website:\u003c/p\u003e\n\n\u003ch3 id=\"ui-workarounds-playwright-mcp\"\u003eUI workarounds (Playwright MCP)\u003c/h3\u003e\n\n\u003cp\u003eWhen date picker blocks checkboxes:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cul\u003e\n\u003cli\u003eClose the date picker first.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eIf the UI click is still blocked:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eSet the state in a controlled way via \u003ccode\u003eevaluate\u003c/code\u003e (\u003ccode\u003echecked\u003c/code\u003e) and trigger a \u003ccode\u003echange\u003c/code\u003e event.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eIf the project dropdown returns no results:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eClear the field, type slowly, wait for the listbox.\u003c/li\u003e\n\u003c/ul\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eWe can use these instructions as a skill and now have the agent review and update them after each run. This way, we can also document technical aspects that would otherwise be invisible to the user.\u003c/p\u003e\n\n\u003cp\u003eThat gives us a self-improving system that learns with each task, becomes more efficient, and makes fewer mistakes.\u003c/p\u003e\n\n\u003cp\u003eEven with a final review, the agent already removes a lot of tedious busywork. The agent is not yet fully autonomous: it is triggered manually, observed, and the result is checked. To truly eliminate waiting time, the agent would need to detect trips on its own and submit them as well.\u003c/p\u003e\n\n\u003cp\u003eTo make such a decision, we need to know not only whether the agent makes errors, but also the type and frequency of those errors. Fortunately, the process was designed for humans, who also make mistakes. This is where classic risk analysis comes in: What types of errors are possible, how likely are they, and what would the consequences be? 
In the end, you can decide whether the benefits from optimizing the process outweigh the risks overall.\u003c/p\u003e\n\n\u003cp\u003eIf we want to implement this for a higher-volume process—for example, all travel expense reports for all employees—we’ll also see more variance. One strategy is to run both processes in parallel and initially compare only the agent’s results with the manual expense reports. Only when the numbers match can you gradually switch over. Once this approach is established, it can also be used for auditing by automating only a portion of the tasks, or by having samples processed by humans and AI agents.\u003c/p\u003e\n\n\u003cp\u003eThere are many processes like this that could benefit from such automation. We probably haven’t paid attention to them so far because they weren’t relevant to software development. But now systems can be integrated automatically via web interfaces. We build on an existing system that provides context and security guarantees. The challenge is to identify the workflows that can be supported by agents and have the biggest impact on business operations.\u003c/p\u003e\n\n\u003cfoot-notes class=\"footnotes\"\u003e\n\n\u003col class=\"footnotes__list\"\u003e\n\n\u003cli id=\"fn:1\"\u003e\n\u003cp\u003eREST Level 3 (HATEOAS) \u003ca href=\"https://www.innoq.com/en/blog/2026/03/rest-ist-das-bessere-mcp/#fnref:1\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003c/ol\u003e\n\u003c/foot-notes\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/2atjvvzitx2hj1rt0ynflsavmnee?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/sdd-ddd-why-bmad-wont-save-you/","url":"https://www.innoq.com/en/blog/2026/03/sdd-ddd-why-bmad-wont-save-you/","date_published":"2026-03-18T00:00:00.000+00:00","date_modified":"2026-03-16T11:27:50.805+01:00","author":[{"name":"Daniel 
Westheide","url":"https://www.innoq.com/en/staff/daniel-westheide/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eIn my post \u003ca href=\"https://www.innoq.com/en/blog/2025/07/erst-agil-dann-agentisch/\"\u003eFirst Agile, Then Agentic\u003c/a\u003e, I argued that organisations need certain capabilities shaped by the agile and DevOps movements before agentic engineering practices can lead to benefits at the organisational level. If you just speed up development work without adapting your organisation accordingly, those local benefits will not translate to system-level improvements.\u003c/p\u003e\n\n\u003cp\u003eSpec-driven development tools like BMAD are receiving a lot of attention right now, and for good reason. They aim to solve a real problem I identified in that post: requirements engineering cannot keep up with AI-assisted development teams. If agents can implement a well-specified story in minutes, your upstream process becomes the bottleneck. BMAD promises to solve this by using structured AI interviews to quickly generate rich specifications. No more waiting for the product owner!\u003c/p\u003e\n\n\u003cp\u003eAre BMAD and other spec-driven development tools the missing piece to make your agentic engineering take off? Maybe, under specific conditions. The problem is the assumption hiding underneath the bold claims.\u003c/p\u003e\n\n\u003ch3 id=\"what-bmad-actually-does\"\u003eWhat BMAD actually does\u003c/h3\u003e\n\n\u003cp\u003eBMAD provides multiple agents, each one covering a distinct role in a software product development team. One of them is Mary, the analyst agent. Mary conducts structured discovery interviews, performs competitor analysis, evaluates business models, and generates comprehensive product requirements documents. 
For someone who has been vibe-coding their own ad-hoc requirements into existence, this is a significant step forward.\u003c/p\u003e\n\n\u003ch3 id=\"the-specification-is-only-as-good-as-the-domain-knowledge-in-the-room\"\u003eThe specification is only as good as the domain knowledge in the room\u003c/h3\u003e\n\n\u003cp\u003eBut here is the thing: the specification layer depends completely on the quality of domain knowledge the human brings to the interview. The agent asks rigorous questions and can be quite persistent. It won’t let you off the hook with vague or evasive answers, probing until it has something concrete to work with. That is valuable. But it cannot supply domain knowledge that isn’t in the room.\u003c/p\u003e\n\n\u003ch3 id=\"ddd-practioners-have-seen-this-before\"\u003eDDD practitioners have seen this before\u003c/h3\u003e\n\n\u003cp\u003eDomain-Driven Design practitioners will recognise this constraint immediately. DDD requires sustained, genuine collaboration between developers and domain experts to build a shared ubiquitous language and a rich domain model. When organisations struggle with DDD (and in my experience, most of them do), it’s rarely because developers are not properly trained. More often, it’s because domain experts aren’t directly accessible to developers. Very often, they are buffered behind proxy product owners, and no matter how good a job those product owners do, something is almost certainly lost in translation. Sadly, many organisations suffer from a structure and culture that actively discourages the direct cross-boundary collaboration that successful domain-driven design requires.\u003c/p\u003e\n\n\u003cp\u003eBMAD hits exactly the same wall for exactly the same reason. If your organisation can’t do DDD properly, it can’t benefit from BMAD’s specification layer either.\u003c/p\u003e\n\n\u003ch3 id=\"upfront-specification-vs-continuous-discovery\"\u003eUpfront specification vs. 
continuous discovery\u003c/h3\u003e\n\n\u003cp\u003eHowever, there is an important difference. DDD requires domain experts to be continuously available throughout development. You go back to them when your domain model hits conceptual friction during implementation, which happens repeatedly and unpredictably. The model emerges through iteration, not upfront specification. Eric Evans is explicit about this: discovery is continuous, not a phase you complete before coding begins.\u003c/p\u003e\n\n\u003cp\u003eSpec-driven development operates on a different assumption: Discovery happens upfront through structured interviews, and the resulting specification drives implementation.\u003c/p\u003e\n\n\u003cp\u003eYes, BMAD does support iterative refinement within the planning phase. But the dominant flow is still to plan and then implement. Critics of spec-driven development say that\n\u003ca href=\"https://marmelab.com/blog/2025/11/12/spec-driven-development-waterfall-strikes-back.html\"\u003e“waterfall strikes back”\u003c/a\u003e, leading to big design up front with AI-generated documentation. They have a point. A domain model that emerges through implementation and repeated collaboration will capture things no upfront interview process reliably surfaces.\u003c/p\u003e\n\n\u003ch3 id=\"where-spec-driven-development-actually-fits\"\u003eWhere spec-driven development actually fits\u003c/h3\u003e\n\n\u003cp\u003eThere is one context where I think BMAD can be a good fit: The technical founder building their own product idea. They are simultaneously the domain expert, the product owner, the architect, and often the developer. There is no proxy problem. There are no organisational boundaries to cross. 
The analyst agent interviews someone who conceived the product idea, hopefully knows the potential users and has at least some understanding of the competitive landscape.\u003c/p\u003e\n\n\u003cp\u003eFor that person, BMAD’s competitor analysis, market research, and business model evaluation can be quite valuable. And the specification interview works because the knowledge is already in the room.\u003c/p\u003e\n\n\u003cp\u003eThe productivity claims you read about in some blog posts, like planning time reduced from weeks to hours, are likely to primarily materialise in this specific context of solo entrepreneurs or solo developers.\u003c/p\u003e\n\n\u003ch3 id=\"fix-the-organisation-not-the-toolchain\"\u003eFix the organisation, not the toolchain\u003c/h3\u003e\n\n\u003cp\u003eBefore asking whether spec-driven development can accelerate your requirements engineering, ask yourself a different question: can your team get genuine access to domain experts when it needs them? Are those experts willing and available to engage deeply with development questions? Does your organisation’s structure and culture support that kind of collaboration?\u003c/p\u003e\n\n\u003cp\u003eIf the answer is yes, BMAD and other spec-driven development methods can be a viable option.\u003c/p\u003e\n\n\u003cp\u003eIf the answer is no, you have an organisational problem that no interviewing agent can fix. The tool will surface that problem more explicitly than a traditional requirements process. BMAD’s persistent questioning makes knowledge gaps visible early and right in your face. 
But it cannot resolve them.\u003c/p\u003e\n\n\u003cp\u003eThe prerequisite is the same one I explained in my \u003ca href=\"https://www.innoq.com/en/blog/2025/07/erst-agil-dann-agentisch/\"\u003eprevious post\u003c/a\u003e: organisational change first, then the tools that amplify it.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/1hjagk5wwxpw26ccn38ru0pkj8x8?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/autonomous-data-product-size-heuristics/","url":"https://www.innoq.com/en/blog/2026/03/autonomous-data-product-size-heuristics/","date_published":"2026-03-13T00:00:00.000+00:00","date_modified":"2026-04-08T09:56:35.032+02:00","author":[{"name":"Stefan Negele","url":"https://www.innoq.com/en/staff/stefan-negele/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eCreating data products incorrectly causes predictable problems. Consumers end up having to stitch multiple datasets together just to answer simple questions. Ownership becomes unclear because no one is clearly responsible for semantics, quality, and changes. The same logic is implemented in several places, which is costly and usually inconsistent. Data products can become bloated, which makes it difficult to find the right information and understand the content. As soon as data products become inconsistent and boundaries become blurred, operational disruptions occur, and incidents, refills, and escalations by stakeholders become routine.\u003c/p\u003e\n\n\u003cp\u003eThe goal is the opposite. A well-cut data product is a stable, consumable package. Its semantics are clear enough that consumers do not need a translation layer. 
Its profile is coherent, meaning expectations regarding freshness, granularity and reliability are both realistic and achievable. In short, the data product is both self-sufficient and reliable.\u003c/p\u003e\n\n\u003cp\u003eZhamak Dehghani’s wonderful book on \u003ca href=\"https://www.oreilly.com/library/view/data-mesh/9781492092384/\"\u003eData Mesh\u003c/a\u003e is a great source of inspiration for me. Through my work as a consultant in this field and by discussing the matter with my colleagues, I have gained experience in putting this theory into practice. The aim of this text is to help you create successful data product cuts by offering practical heuristics. First, we will discuss the heuristics that apply to all data products, and then we will move on to the specialized heuristics for the three data product archetypes: \u003ca href=\"https://www.datamesh-architecture.com/#source-aligned\"\u003esource-aligned\u003c/a\u003e, \u003ca href=\"https://www.datamesh-architecture.com/#aggregate\"\u003eaggregate\u003c/a\u003e, and \u003ca href=\"https://www.datamesh-architecture.com/#consumer-aligned\"\u003econsumer-aligned\u003c/a\u003e.\u003c/p\u003e\n\n\u003ch3 id=\"fundamentals\"\u003eFundamentals\u003c/h3\u003e\n\n\u003ch5 id=\"clearly-defined-consumer-and-use-cases\"\u003eClearly defined consumer and use cases\u003c/h5\u003e\n\n\u003cp\u003eOne of the most important heuristics is consumer fit. A good cut allows most obvious use cases to be put to productive use without consumers having to carry out their own integration work. Its primary purpose should be expressible in a single sentence. A lack of use cases and consumers or target audience means there is no data product.\u003c/p\u003e\n\n\u003cp\u003eIn order to understand whether the data product is suitable for consumers, requirements engineering is essential, which we can sensibly map with contract-first using data contract workshops. 
\u003ca href=\"https://datacontract.com/\"\u003eData contracts\u003c/a\u003e are, in a sense, interface descriptions for data product output ports, and they naturally provide an accurate picture of the requirements for the underlying data product.\u003c/p\u003e\n\n\u003ch5 id=\"stable-ownership\"\u003eStable ownership\u003c/h5\u003e\n\n\u003cp\u003eThe second heuristic is stable ownership. A data product must belong to a single domain or team that is responsible for semantics, quality, and operations. If a clear owner cannot be identified, the data product cannot be defined. This is because data products must be maintained and supported throughout their entire lifecycle to continue meeting consumer requirements.\u003c/p\u003e\n\n\u003cp\u003eThere are a variety of triggers that result in the genesis of data products. The most important ones are:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eA team publishes its data because it conducts its own analyses and assumes that the data could be of interest to others too.\u003c/li\u003e\n\u003cli\u003eThere is a demand for specific data, so the relevant team is approached directly.\u003c/li\u003e\n\u003cli\u003eA manager or committee decides that a data product must be developed to meet a specific demand for data.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eIn the latter case, the issue of ownership is particularly important, because it must be clear who will bear the costs of developing and maintaining the data product.\u003c/p\u003e\n\n\u003ch5 id=\"consistent-data-quality\"\u003eConsistent data quality\u003c/h5\u003e\n\n\u003cp\u003eThe previously described data contracts can provide further insight into the data product cut. They contain information about the characteristics of the data quality. This should be consistent across the output ports because the internal sources of the data product for the individual output ports should be the same. 
However, this statement must be qualified somewhat, especially with regard to time-dependent characteristics. For instance, a daily CSV export to S3 cannot be as up to date as a real-time data stream on Kafka.\u003c/p\u003e\n\n\u003ch5 id=\"low-integration-burden\"\u003eLow integration burden\u003c/h5\u003e\n\n\u003cp\u003eA data product should minimize integration effort for its consumers. The boundary is more likely to be appropriate when the data product represents the smallest useful unit that does not require consumers to reconstruct the context by combining other datasets. Typically, consumers should be able to start using the data product on its own without building an additional integration layer. If most consumers immediately need to combine it with other products, then the scope is probably too narrow or lacks essential context.\u003c/p\u003e\n\n\u003cp\u003eHowever, we must ensure that certain central data sets do not become single points of failure. Certain master data, for example, are candidates here, as joining them with other data always offers considerable added value and is therefore done frequently. Integrating such a central data product carries risks, including cascading effects in the event of failure, as well as operational complexity, inconsistencies and lag. Taking this into account, the standalone dataset could be more valuable, as it could serve most use cases without enriched data. At the same time, its consistency, reliability and user-friendliness would improve.\u003c/p\u003e\n\n\u003ch5 id=\"bounded-scope\"\u003eBounded scope\u003c/h5\u003e\n\n\u003cp\u003eThe scope of a data product should be limited and closely match its intended use. It should contain only what is needed for its intended use, not everything that might be useful in the future. Maintaining a narrow scope prevents semantic noise and stops the data product from becoming too big. 
If the data product begins to include loosely related data or speculative additions, then the scope is likely too broad.\u003c/p\u003e\n\n\u003cp\u003eAt first glance, this may seem to contradict the principle of low integration barriers. In reality, adhering to both principles results in a stable, sustainable scope for data products, making them as user-friendly as possible while ensuring they are not burdened with unnecessary extras.\u003c/p\u003e\n\n\u003cfigure\u003e\r\n\u003cimg alt=\"Gradient scale from “High Integration Burden” to “Bloated Scope”; arrow marks “smallest useful standalone unit” in the middle.\" loading=\"lazy\" srcset=\"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_400/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 400w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_800/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 800w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1024/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 1024w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1200/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 1200w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_1600/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 1600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2048/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 2048w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2300/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 2300w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2600/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 2600w, https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2800/v1/uploads-production/dwrkdpwirnk9lh2p6e7711lj39kd?_a=BACMTiAE 2800w\" sizes=\"(min-width: 1400px) 1024px, 
90vw\"\u003e\r\n\u003c/figure\u003e\r\n\n\u003ch4 id=\"general-heuristics-for-cutting-data-products\"\u003eGeneral heuristics for cutting data products\u003c/h4\u003e\n\n\u003ch5 id=\"clearly-defined-consumer-and-use-cases\"\u003eClearly defined consumer and use cases\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eCan you describe the main purpose in one sentence?\u003c/li\u003e\n\u003cli\u003eAre there any specific teams or roles that want to use this data product right now?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"stable-ownership\"\u003eStable ownership\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eIs one specific domain or team accountable for semantics, quality, and operations?\u003c/li\u003e\n\u003cli\u003eWould the owner credibly handle future changes?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"consistent-data-quality\"\u003eConsistent data quality\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eAre data quality attributes consistent across output ports?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"low-integration-burden\"\u003eLow integration burden\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eIs this the smallest useful standalone unit that does not force consumers to stitch data products together?\u003c/li\u003e\n\u003cli\u003eCan a typical consumer immediately start using this data product meaningfully on their own?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"bounded-scope\"\u003eBounded scope\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eDoes the data product include only what is needed for its purpose?\u003c/li\u003e\n\u003cli\u003eIs it limited to only including things that are useful in the present?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3 id=\"source-aligned-data-products\"\u003eSource-aligned data products\u003c/h3\u003e\n\n\u003ch5 id=\"semantic-coherence\"\u003eSemantic coherence\u003c/h5\u003e\n\n\u003cp\u003eSource-aligned data products form the foundation layer of most data landscapes. 
They provide consumable, domain-specific data that remains close to the operational truth. In practice, a source-aligned data product typically comprises a consistent set of data, along with all the local dimensions required for interpretation. The key idea is that consumers should receive a reliable, well-defined representation of a single topic without having to reverse engineer the source system.\u003c/p\u003e\n\n\u003cp\u003eFinding the right scope starts with the source, but not with the mindset of dumping everything you find in it. The objective is to identify the smallest stable package from the consumer’s perspective. This involves understanding whether the source data can be meaningfully divided, recognizing which elements only make sense together, and identifying a narrow unit that enables consumers to work without having to rebuild integrations. Adhering to these principles will prevent you from fragmenting sources into tiny datasets that only the producing team can comprehend.\u003c/p\u003e\n\n\u003ch5 id=\"data-from-a-single-business-domain\"\u003eData from a single business domain\u003c/h5\u003e\n\n\u003cp\u003eRather than cutting along entire technical systems, it is more practical to align with existing domain modules or microservices. A data product called ‘All data from SAP’ is usually too broad, semantically messy, and difficult to manage. A domain module, on the other hand, has clearer responsibility boundaries and a natural owner. In \u003ca href=\"https://www.innoq.com/de/topics/domain-driven-design/\"\u003eDomain-Driven Design\u003c/a\u003e, a microservice is usually associated with a bounded context. Within this context, the domain model usually comprises one or more aggregates (not to be confused with the aggregate data product archetype). 
These aggregates from Domain-Driven Design are perfect candidates for source-aligned data products.\u003c/p\u003e\n\n\u003cp\u003eIn reality, it can be pragmatic for a team to create a data product that is connected to a large system consisting of data from several subdomains, such as the aforementioned SAP, in order to save operating costs. However, this would likely violate the principle of domain ownership. We need to find a team that can take ownership of this data product, breaking it down into more specific ones. This data product essentially faces the same challenges as the aggregate data product archetype presented below and should therefore be avoided.\u003c/p\u003e\n\n\u003cp\u003eIf you want to look at the whole thing from a data architecture perspective, you can use modeling practices such as \u003ca href=\"https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/\"\u003eKimball’s Dimensional Modeling Techniques\u003c/a\u003e. Kimball describes a method for dimensional modeling of data warehouse structures in which business processes are modeled as facts (measurable events) at a clearly defined granularity and contextualized by descriptive dimensions. Advanced modeling techniques such as \u003ca href=\"https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimensional-modeling-techniques/conformed-dimension/\"\u003eConformed Dimensions\u003c/a\u003e become interesting at a higher level (aggregate, consumer-aligned) or can even contradict data mesh principles. However, these business processes, facts, and dimensions can form a good basis for cutting data products, provided that data architects are familiar with at least the basics.\u003c/p\u003e\n\n\u003cp\u003eThe same care is required when deciding which contextual information belongs in a data product. 
Context that is clearly owned, maintained, and evolved within the same domain should live inside the data product itself. Context that is shared across domains and must remain semantically consistent, such as customers or products, should be exposed as separate data products with their own ownership. Other products should reference these shared domain products rather than redefining them locally. This avoids semantic drift, where core business concepts slowly diverge and end up meaning different things in different parts of the data landscape.\u003c/p\u003e\n\n\u003ch5 id=\"source-data-blast-radius\"\u003eSource data blast radius\u003c/h5\u003e\n\n\u003cp\u003eFrom an operational perspective, routine changes to the source code should not affect many source-aligned data products. Ideally, you want the impact of changes to the source schema or logic to be as localized as possible. This helps teams minimise the effort required to maintain the data product while offering consumers insight into how the operational world works. However, this naturally depends heavily on good upstream design. Therefore, bear in mind any known flaws when designing the output ports of your data products. 
It is, of course, inevitable that changes to the quality of the source data will affect all data products that depend on the corresponding aligned source data.\u003c/p\u003e\n\n\u003ch4 id=\"special-heuristics-for-source-aligned-data-products\"\u003eSpecial heuristics for source-aligned data products\u003c/h4\u003e\n\n\u003ch5 id=\"semantic-coherence\"\u003eSemantic coherence\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eDoes the data product make sense on its own, or does it require the other parts of the source data?\u003c/li\u003e\n\u003cli\u003eDoes it feel like a cohesive, integrated whole rather than a random collection of related items?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"data-from-a-single-business-domain\"\u003eData from a single business domain\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eDoes the cut follow meaningful domain modules rather than whole systems?\u003c/li\u003e\n\u003cli\u003eDoes the data contain only internal or also cross-domain context?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"source-data-blast-radius\"\u003eSource data blast radius\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eDo changes on the data source impact only this data product directly?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3 id=\"aggregate-data-products\"\u003eAggregate data products\u003c/h3\u003e\n\n\u003ch5 id=\"value-and-reuse-versus-cost-and-complexity\"\u003eValue and reuse versus cost and complexity\u003c/h5\u003e\n\n\u003cp\u003eAggregate data products are built on top of source-aligned products. They combine or summarize data from multiple source-aligned products to provide semantics that can be used across an entire domain. Think of them as shared building blocks that eliminate the need for repeated integration and calculation work for many consumers. 
As they encode cross-source meaning, they require greater governance and coordination than source-aligned products.\u003c/p\u003e\n\n\u003cp\u003eAggregates are special because determining ownership and cost allocation is more difficult. A source-aligned data product typically has an obvious owner, which is the domain operating the underlying system. Since the source data of aggregates often span several owners, aggregates should only be created when the shared value is clear, and responsibility can be assigned to a single team capable of maintaining the integrated semantics. The question with aggregates is often not how to cut them, but whether they should exist at all.\u003c/p\u003e\n\n\u003cp\u003eIt is worth creating an aggregate when:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eSeveral teams require the same derived view with the same meaning. A good rule of thumb is three or more teams. Allowing each team to build its own version would be costly and lead to inconsistent results.\u003c/li\u003e\n\u003cli\u003eThe derivation itself is expensive. Typical examples include feature computation for machine learning, entity matching or deduplication. Reusing one high-quality derivation is cheaper than repeating it in multiple places.\u003c/li\u003e\n\u003cli\u003eThe core value only emerges once sources have been combined. For example, no single source-aligned data product can provide a complete customer view, which only emerges once orders, payments, returns, and CRM signals are brought together.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"scope-and-governance\"\u003eScope and governance\u003c/h5\u003e\n\n\u003cp\u003eThey are not useful if the only motivation is a desire for central control or if the shared use cases are actually heterogeneous and would lead to disagreements over definitions. 
In those cases, it is better to keep the derivations closer to the consumers and provide separate, consumer-aligned products if needed.\u003c/p\u003e\n\n\u003cp\u003eAlthough their size is driven by concrete use cases, they still need a tight scope. The risk is turning an aggregate into a small warehouse that tries to answer every possible question as this would increase their complexity and blast-radius. Aggregates change slowly by nature because any semantic change affects many consumers, so careful governance and clear boundaries are required.\u003c/p\u003e\n\n\u003ch4 id=\"special-heuristics-for-aggregates\"\u003eSpecial heuristics for aggregates\u003c/h4\u003e\n\n\u003ch5 id=\"value-and-reuse\"\u003eValue and reuse\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eAre there more than two teams that need the same derived view with identical meaning?\u003c/li\u003e\n\u003cli\u003eWould teams repeatedly build the same integration or calculation without the aggregate?\u003c/li\u003e\n\u003cli\u003eDoes value emerge only after combining sources?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"cost-and-complexity\"\u003eCost and complexity\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eIs the derivation expensive (feature engineering, entity matching, deduplication, cross-source joins)?\u003c/li\u003e\n\u003cli\u003eIs there someone in the company willing to bear the costs of this data product?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"scope-and-governance\"\u003eScope and governance\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eIs the scope tight enough so the data product is not drifting toward a mini data warehouse?\u003c/li\u003e\n\u003cli\u003eIs the outcome valuable enough to justify the required strong governance?\u003c/li\u003e\n\u003cli\u003eCan the owning team maintain the integrated semantics, despite spanning multiple sources?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3 id=\"consumer-aligned-data-products\"\u003eConsumer-aligned data 
products\u003c/h3\u003e\n\n\u003ch5 id=\"clear-purpose\"\u003eClear purpose\u003c/h5\u003e\n\n\u003cp\u003eConsumer-aligned data products are designed for a specific purpose and require minimal effort from consumers. As the name implies, and in contrast to source-aligned data products, the data is optimized for end users and sophisticated analyses. Rather than presenting domain reality in a general way, they provide precisely the data and semantics required by the consumer to make a well-founded decision, run a process, or power a specific analytical or operational artifact. Success is measured by whether a consumer can use the data product without first building a custom integration layer.\u003c/p\u003e\n\n\u003cp\u003eA quick test is to determine if the data product serves one coherent purpose for a defined audience. The result may be a single report or a well-scoped data mart that supports a family of related reports. For instance, a finance data mart that supports “producing consistent margin and revenue reporting for the monthly close” clearly passes this test, even if it powers dozens of reports. The warning sign is not the number of questions answered but whether those questions belong to the same decision context. A data product framed as “supporting finance reporting, marketing analysis, ad hoc exploration, and future machine learning use cases” fails the test because it mixes unrelated audiences and responsibilities.\u003c/p\u003e\n\n\u003ch5 id=\"natural-focused-size\"\u003eNatural, focused size\u003c/h5\u003e\n\n\u003cp\u003eConsumer-aligned products tend to have a natural size, but that size is defined by the job to be done rather than by a single artifact. These products may correspond directly to a specific report or dashboard or support a family of closely related outputs, such as multiple views of the same dashboard. Other examples include reverse ETL flows into an operational tool or a machine learning model with its tailored feature set. 
This is a heuristic, not a hard rule, but it helps keep scope in check. If a data product feeds many different dashboards or models, it should probably be split into several products, each aligned to a specific consumer.\u003c/p\u003e\n\n\u003ch5 id=\"meaningful-boundaries\"\u003eMeaningful boundaries\u003c/h5\u003e\n\n\u003cp\u003eTheir boundaries usually follow decision or process lines rather than system lines. A natural cut is a business moment at which a decision is made or an action is taken. Examples include a fraud analyst reviewing a case or a marketing lead deciding how to allocate budget. These cuts often contradict the structure of the source systems, but that is acceptable since consumer-aligned products aim to optimize consumption rather than mirror the source systems.\u003c/p\u003e\n\n\u003ch5 id=\"business-consumers\"\u003eBusiness Consumers\u003c/h5\u003e\n\n\u003cp\u003eFinally, they require clear business objectives and strict scope management. If you cannot identify the people or teams who will use the data product, there is a risk of creating a thinly disguised catch-all dataset. 
The data product should contain only what is needed to do the job, not everything that might be useful one day.\u003c/p\u003e\n\n\u003ch4 id=\"special-heuristics-for-consumer-aligned-data-products\"\u003eSpecial heuristics for consumer-aligned data products\u003c/h4\u003e\n\n\u003ch5 id=\"clear-purpose\"\u003eClear purpose\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eCan this purpose be expressed as a verb + object sentence (for example, monitor churn by segment, forecast demand by categories, or review fraud cases)?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"natural-focused-size\"\u003eNatural, focused size\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eDoes the data product support a single decision context that results in one or more related dashboards, reports, or outputs?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"meaningful-boundaries\"\u003eMeaningful boundaries\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eDoes the boundary follow a process and not a system boundary?\u003c/li\u003e\n\u003cli\u003eDoes the cut reflect how a consumer acts or decides, not how data happens to be stored?\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch5 id=\"business-consumers\"\u003eBusiness Consumers\u003c/h5\u003e\n\n\u003cul\u003e\n\u003cli\u003eWill your data product be used by business users, data analysts, data scientists, or applications?\u003c/li\u003e\n\u003c/ul\u003e\n\u003cp\u003eGetting the size of a data product right is not a one-time decision; it’s an ongoing balancing act. The heuristics presented here are not rigid rules, but guiding questions to help teams make informed choices. The same basic principles apply when shaping a source-aligned product close to the operational truth, justifying the cost of an aggregate, or tailoring a consumer-aligned product to a specific decision context: clear ownership, a defined scope, and genuine consumer value. 
Although no heuristic can replace conversations with the people who will build, maintain and use the data product, a shared vocabulary for these discussions can significantly improve outcomes.\u003c/p\u003e\n\n\u003cp\u003eWhat do you think? Do you have similar ideas about data product cuts? Is anything missing? Let’s discuss it!\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/lhm4j5cjevflejeisd83hq50gejt?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/agenten-gut-alles-gut/","url":"https://www.innoq.com/en/blog/2026/03/agenten-gut-alles-gut/","date_published":"2026-03-12T00:00:00.000+00:00","date_modified":"2026-03-12T10:56:21.231+01:00","author":[{"name":"Fabian Walther","url":"https://www.innoq.com/en/staff/fabian-walther/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003ch3 id=\"the-never-ending-backlog\"\u003eThe Never-Ending Backlog\u003c/h3\u003e\n\n\u003cp\u003e\u003cem\u003eHey, let’s integrate consumer-driven contract tests with Pact, because we’ve inherited these 23 microservices that we’re supposed to maintain with three people. A BDD test suite would totally help us out, tackle this ultra-complex domain, and ensure that everything is transparent. But we really don’t have time to fix the Selenium setup every two weeks for three days, because in the meantime five browser updates happened, leading to 70% failing tests—and we’re totally lost.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eThere’s this one refactoring we always push into the future, but if we do it, it would touch the whole codebase. It would take at least three days of manual labor, Sisyphus labor, and deliver no meaningful impact for users. If we don’t do it, we’ll feel the pain in two to three years, because our delivery speed will drop. 
There’s really no glory in prevention.\u003c/p\u003e\n\n\u003ch3 id=\"now-agents-will-fix-everything\"\u003eNow Agents Will Fix Everything\u003c/h3\u003e\n\n\u003cp\u003eHelpful project infrastructure that would require either a high initial time investment or long-running maintenance costs will often not be pursued. Either the teams are not well enough staffed, or the necessary backing throughout the organization is not in place. No matter the reason, resources are not allocated for what would often be necessary for the project.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eBut should we now tackle all these things with agents?\u003c/strong\u003e Would our life and work really become better? My initial reflex was: yes, for sure. And I started doing some of the mentioned refactorings very efficiently while, in parallel, working on new features. That felt really, really good.\u003c/p\u003e\n\n\u003cp\u003eThere were times when I felt completely unleashed. Every second, there was something useful to do, and I could finally focus on the “meaningful work”—focus my time on solving domain issues. Technology wasn’t standing in my way but instead was elevating me, doing the things I really care about.\u003c/p\u003e\n\n\u003cp\u003eIf I was tripping over weird bugs or wrestling with frameworks, I didn’t even start trying to understand them in the first place. Just copy and paste -\u0026gt; a short note to the agent —\u0026gt; “fix this!” —\u0026gt; switch to another task -\u0026gt; get a push notification later “Pling!” -\u0026gt; tests are green -\u0026gt; repeat. I finally had the feeling of getting into a real flow again. No interruptions from hour-long step debugging or researching poorly maintained API documentation and digging through Stackoverflow (RIP). Did everything turn out well in the end? Finally?\u003c/p\u003e\n\n\u003cp\u003eIn the next step, I was tempted to introduce completely new classes of tests into the project. 
There was this neglected BDD test suite that constantly caused bad feelings. Just as I was about to start, I had a moment of enlightenment and stopped.\u003c/p\u003e\n\n\u003cp\u003eDid I really want to use tools that could be ripped from my hands in a second if the VC money was burned, just to introduce a whole new level of project complexity? Aren’t resource constraints exactly what let us focus on the things that really matter? I realized that this way of working was mentally exhausting and could lead to the exact same situation in the future—with the only difference being tenfold complexity to care about.\u003c/p\u003e\n\n\u003ch3 id=\"more-is-more-right\"\u003eMore is More. Right?\u003c/h3\u003e\n\n\u003cp\u003eMaybe my brain could recharge during these kinds of mind-numbing refactorings and regain new problem-solving energy. I took a step back and radically changed my style of working. Together with the agent, I took my time and used many more iterations to reduce the generated code—to make it much more expressive and truly reflect the domain concerns. I let the agent challenge my architectural decisions and was able to bring more refined concepts into discussions with my team. I used the agent to produce less code, but better code than before.\u003c/p\u003e\n\n\u003cp\u003eBecause in the end, I must be able to read the code, review it, and take responsibility for all the code I produce, no matter how it was created. The results are better, productivity is higher—but not tenfold.\u003c/p\u003e\n\n\u003cp\u003eBuilding different features in parallel led to a high mental load, and the results weren’t better than before. 
So I used agents to give myself more time to think and to unblock myself from annoying bug research.\u003c/p\u003e\n\n\u003ch3 id=\"divide-and-conquer-how-to-actually-use-agents\"\u003eDivide and Conquer: How to Actually Use Agents\u003c/h3\u003e\n\n\u003cp\u003eBecause of this experience, here’s my call to action: Please generate good, domain-focused code and build secure and stable frameworks with agents—but not at the same time. Divide and conquer. Don’t fall into the trap of generating huge amounts of boilerplate. That’s what frameworks are for. Frameworks aren’t obsolete; they’re becoming even more important as a harness for your agent. Not everyone has to whip up their own ORM or authentication code. Please don’t.\u003c/p\u003e\n\n\u003ch3 id=\"less-coding-better-engineering\"\u003eLess Coding, Better Engineering\u003c/h3\u003e\n\n\u003cp\u003eI believe AI will make me a worse programmer, because I don’t need to have as many concepts directly accessible in my brain. I only have to understand them on demand. But do I even want to be a good programmer? Producing huge amounts of code didn’t solve problems in the past. Often, it created more problems. Software development was never about producing code. Code is a tool to solve domain issues, business issues. That’s what I’d rather use agents for.\u003c/p\u003e\n\n\u003cp\u003eReal people are still needed to ask agents the right questions and give the correct answers. That’s where I’m in the same boat as Karl Marx: Machines elevate human productivity, but it always takes human work to deliver value. Machines can never do that alone\u003ca href=\"https://www.innoq.com/en/blog/2026/03/agenten-gut-alles-gut/#fn:1\" id=\"fnref:1\" title=\"see footnote\" class=\"footnote\"\u003e[1]\u003c/a\u003e. But is everyone in the same boat? There are still people measuring developer productivity in lines of code. 
A positive outcome would be that those people now understand—through huge amounts of code slop—that this was never a good idea and never will be.\u003c/p\u003e\n\n\u003cfoot-notes class=\"footnotes\"\u003e\n\n\u003col class=\"footnotes__list\"\u003e\n\n\u003cli id=\"fn:1\"\u003e\n\u003cp\u003e“Machinery, like every other component of constant capital, creates no new value, but yields up its own value to the product that it serves to beget.” https://www.marxists.org/archive/marx/works/1867-c1/ch15.htm \u003ca href=\"https://www.innoq.com/en/blog/2026/03/agenten-gut-alles-gut/#fnref:1\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003c/ol\u003e\n\u003c/foot-notes\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/aql52ddjxjs2i87xu06ucfi416zm?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/ai-cognitive-lens-cognitive-load-theory/","url":"https://www.innoq.com/en/blog/2026/03/ai-cognitive-lens-cognitive-load-theory/","date_published":"2026-03-10T00:00:00.000+00:00","date_modified":"2026-04-07T12:03:24.269+02:00","author":[{"name":"Daniel Westheide","url":"https://www.innoq.com/en/staff/daniel-westheide/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003e\u003cem\u003eThis is the third post in “Developing with AI Through the Cognitive Lens,” a series exploring how AI coding tools affect the way programmers learn, work, and build expertise. Drawing on cognitive psychology research—particularly Felienne Hermans' work in The Programmer’s Brain—this series examines what happens to our skills when we delegate cognitive work to AI. 
The goal isn’t to reject AI, but to use it deliberately, making conscious choices about when it helps and when it hinders.\u003c/em\u003e\u003c/p\u003e\n\n\u003cp\u003eA \u003ca href=\"https://www.anthropic.com/research/AI-assistance-coding-skills\"\u003estudy\u003c/a\u003e recently published by Anthropic, the company behind Claude Code, examined how AI coding assistants affect developers learning unfamiliar technology. The overall finding seems damning: participants using AI were only marginally faster, and this difference was not even statistically significant. At the same time, they scored significantly worse on comprehension—50% versus 67% for those without AI assistance.\u003c/p\u003e\n\n\u003cp\u003eBut the study results are much more nuanced than that. Within the AI group, researchers identified six distinct interaction patterns with comprehension scores ranging from 24% to 86%. Some patterns severely harmed learning, while others enhanced it beyond what manual coding without any AI assistance achieved. The difference wasn’t the tool, but how participants engaged with it cognitively.\u003c/p\u003e\n\n\u003cp\u003eIn this post, I’ll examine these patterns through the lens of cognitive psychology, particularly cognitive load theory, to explain why they produced such different outcomes. I’ll also discuss the study’s limitations: what these findings can tell us about AI coding patterns, and where we must be cautious about generalising beyond this specific context.\u003c/p\u003e\n\n\u003cp\u003eBefore diving into interpretation, let’s review the study design and results.\u003c/p\u003e\n\n\u003ch3 id=\"the-study-design\"\u003eThe study design\u003c/h3\u003e\n\n\u003cp\u003eIn the study, 52 engineers, most of them juniors, had to solve programming tasks in Python using the Trio library. 
All of the participants had prior experience with both Python and AI-assisted coding, but were not familiar with Trio.\u003c/p\u003e\n\n\u003cp\u003eThe participants were split into two groups, a test group that had access to AI assistants, and a control group with no such help. All of the participants first had to solve two different programming tasks, implementing a feature using Trio. They were told to solve the tasks as quickly as possible, and they had a hard time limit of 35 minutes for doing so.\u003c/p\u003e\n\n\u003cp\u003eThe programming tasks were directly followed by a quiz that tested their understanding of the concepts they had needed to solve the programming tasks.\u003c/p\u003e\n\n\u003ch3 id=\"the-results\"\u003eThe results\u003c/h3\u003e\n\n\u003cp\u003eFor the detailed results, please refer to the study. Here, I want to highlight the most relevant and striking results:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eParticipants in the test group were two minutes faster than those in the control group, on average. However, this result is \u003cem\u003enot statistically significant.\u003c/em\u003e\n\u003c/li\u003e\n\u003cli\u003eOn average, participants in the test group scored 17 percentage points lower than the control group in the quiz that followed the programming tasks, which corresponds to about two letter grades. Unlike the speed increase, this result is statistically significant.\u003c/li\u003e\n\u003cli\u003eIn the test group, six distinct AI interaction patterns emerged with very different outcomes for both productivity and learning.\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003ch3 id=\"interaction-patterns-that-hinder-learning\"\u003eInteraction patterns that hinder learning\u003c/h3\u003e\n\n\u003cp\u003eOf the six interaction patterns the study authors identified, three led to low scores in the quiz, showing a poor understanding of the technology they had just used. 
On average, participants using one of these failing patterns reached scores of less than 40%, whereas the average score among all participants in the test group was 50%.\u003c/p\u003e\n\n\u003cp\u003eLet’s look at these interaction patterns in more detail.\u003c/p\u003e\n\n\u003ch4 id=\"ai-delegation-and-progressive-ai-delegation\"\u003eAI delegation and progressive AI delegation\u003c/h4\u003e\n\n\u003cp\u003eFour participants relied entirely on AI to write code. They described what they needed and let the AI generate the solution. This group completed tasks fastest and encountered few errors. Four other participants started by asking the AI assistant one or two questions, but then decided to delegate all code generation to the AI. In the quiz, the AI delegation group scored at 39% and the progressive AI delegation group at 35%.\u003c/p\u003e\n\n\u003cp\u003eWhy did these participants score so poorly in the quiz? What cognitive psychologists have found out about how learning works can explain this. These two patterns bypass \u003ca href=\"https://www.innoq.com/en/blog/2025/12/ai-cognitive-lens-elaboration/\"\u003eelaboration\u003c/a\u003e entirely—a process in which you connect new information to your existing mental models.\u003c/p\u003e\n\n\u003cp\u003eWhen AI generates code without your engagement, critical learning processes never occur. 
You don’t compare alternatives, you don’t adapt the code, and you don’t encounter errors.\u003c/p\u003e\n\n\u003cp\u003eThe manual coding group hit many errors that, according to the authors, “mapped directly to topics tested on the evaluation.” The AI delegation group avoided these errors, which felt efficient but eliminated the friction that helps with encoding concepts into memory.\u003c/p\u003e\n\n\u003ch4 id=\"iterative-ai-debugging\"\u003eIterative AI debugging\u003c/h4\u003e\n\n\u003cp\u003eFour more participants used an interaction pattern that the authors of the study call iterative AI debugging:\u003c/p\u003e\n\n\u003cblockquote\u003e\n\u003cp\u003eParticipants in this group relied on AI to debug or verify their code. They asked more questions, but relied on the assistant to solve problems, rather than to clarify their own understanding. They scored poorly as a result, and were also slower at completing the two tasks.\u003c/p\u003e\n\u003c/blockquote\u003e\n\n\u003cp\u003eParticipants using this interaction pattern scored at 24% on average in the quiz, so they showed the worst understanding of Trio of all six interaction patterns. Interestingly, they were also notably slower at completing their two tasks.\u003c/p\u003e\n\n\u003cp\u003eWhen you ask AI to diagnose and solve errors, you outsource the cognitive work that builds not only debugging skills but a mental model of how the technology you use works—in this case Trio. If you don’t spend time thinking about what could cause an observed error, and don’t systematically test your theories, there is nothing to correct your possibly poor mental model of how that technology works.\u003c/p\u003e\n\n\u003ch3 id=\"interaction-patterns-that-help-with-learning\"\u003eInteraction patterns that help with learning\u003c/h3\u003e\n\n\u003cp\u003eThe good news is that not all ways of interacting with AI assistants are bad. 
The authors of the study identified three interaction patterns whose behaviours led to an average score of 65% or higher. Let’s look at how these behaviours differ from the previously described patterns and how we can explain their benefit for building understanding.\u003c/p\u003e\n\n\u003ch4 id=\"generation-then-comprehension\"\u003eGeneration-then-comprehension\u003c/h4\u003e\n\n\u003cp\u003eThe two participants in this group used AI for generating code, but after doing so, they did not move on to the next task. Instead, they continued by asking the AI assistant questions to verify their understanding of the solution. In contrast to the AI delegation group, they were slower, but achieved the highest comprehension scores in the entire study: 86% on average—significantly better than even the control group’s 67%.\u003c/p\u003e\n\n\u003cp\u003eHow can we explain the difference in understanding? The key difference from AI delegation: the cognitive work that happens after generation. The AI reduces the friction of syntax and boilerplate, but the learner maintains full cognitive ownership of understanding. This combines the efficiency benefits of AI generation with the learning benefits of active interrogation. Generating hypotheses about how code works and testing them strengthens memory more than passive reading. The participants weren’t just accepting AI output. Instead, they were actively building and testing their mental models against it.\u003c/p\u003e\n\n\u003cp\u003eWhat’s remarkable is that this interaction pattern scored 19 percentage points better than solving the tasks without AI. This shows that in some circumstances, conscious AI use can enhance learning, not just preserve it.\u003c/p\u003e\n\n\u003cp\u003eHowever, this pattern requires discipline. The temptation after generating working code is to move on. These participants resisted that temptation and invested time in verification. 
The slightly slower completion time bought significantly deeper understanding.\u003c/p\u003e\n\n\u003ch4 id=\"hybrid-code-explanation\"\u003eHybrid code-explanation\u003c/h4\u003e\n\n\u003cp\u003eThree participants used an interaction pattern that the authors call hybrid code-explanation. They asked the AI assistant to generate both code and the appropriate explanations. The participants took their time to read the explanations, so they were slower than the AI delegation group, but scored better in the quiz. With an average score of 68%, they were in the same range as the control group.\u003c/p\u003e\n\n\u003cp\u003eBy requesting both code and explanations, participants got the solution and the reasoning behind it. The explanations made expert reasoning visible, showing not just what works, but why it works. This provided connection points to existing mental models and helped participants understand the concepts behind the implementation.\u003c/p\u003e\n\n\u003cp\u003eHowever, this pattern doesn’t involve true elaboration in the sense of actively connecting concepts to prior knowledge. Participants read explanations provided upfront rather than generating their own understanding through questioning and active processing. The explanations provide valuable context, but the learning remains relatively passive. This is more like reading a well-explained textbook than actively working to build understanding. This distinction helps explain why hybrid code-explanation, while effective, didn’t achieve the highest comprehension scores.\u003c/p\u003e\n\n\u003ch4 id=\"conceptual-inquiry\"\u003eConceptual inquiry\u003c/h4\u003e\n\n\u003cp\u003eSeven participants used the AI assistant exclusively for asking conceptual questions. They then used their improved understanding of Trio to implement a solution for the respective task. They encountered many errors, but did not rely on the AI to resolve them, instead coming to a solution on their own. 
In the quiz, their average score was 65%, in the same range as the control group.\u003c/p\u003e\n\n\u003cp\u003eThis pattern preserves elaboration through both questioning and manual implementation. By asking conceptual questions, participants connected new Trio concepts to their existing knowledge. By coding manually and debugging independently, they engaged in active generation and error-driven learning—all mechanisms that support encoding into long-term memory.\u003c/p\u003e\n\n\u003cp\u003eThe score of 65%, similar to the control group’s 67%, suggests this pattern is effective for building understanding. Both groups engaged in the same fundamental learning processes: forming hypotheses, testing them through implementation, and refining mental models through error correction. The AI served primarily to clarify concepts more efficiently than searching documentation, but the cognitive work of elaboration remained with the learner.\u003c/p\u003e\n\n\u003ch3 id=\"understanding-the-exceptional-results\"\u003eUnderstanding the exceptional results\u003c/h3\u003e\n\n\u003cp\u003eWe’ve seen how elaboration explains the fundamental differences between these patterns. The three failing patterns bypassed elaboration entirely, while the three successful patterns preserved it through different mechanisms. This accounts for why some patterns lead to learning while others don’t.\u003c/p\u003e\n\n\u003cp\u003eBut elaboration alone doesn’t explain why generation-then-comprehension achieved a quiz score of 86%, which is substantially higher than both conceptual inquiry (65%) and even the control group (67%).\u003c/p\u003e\n\n\u003cp\u003eAll three preserved elaboration, yet the outcomes differed significantly. 
To understand why one elaboration-preserving pattern outperformed the others so dramatically, we need to examine cognitive load theory and how the study’s 35-minute time constraint shaped these results.\u003c/p\u003e\n\n\u003ch4 id=\"a-cognitive-load-theory-primer\"\u003eA cognitive load theory primer\u003c/h4\u003e\n\n\u003cp\u003e\u003ca href=\"https://en.wikipedia.org/wiki/Cognitive_load\"\u003eCognitive Load Theory\u003c/a\u003e, developed by educational psychologist John Sweller, explains how the limitations of working memory affect our ability to perform cognitive tasks. Our working memory can only hold a limited amount of information at once. When it becomes overloaded, both task performance and learning suffer.\u003c/p\u003e\n\n\u003cp\u003eSweller identified three types of cognitive load:\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eIntrinsic load\u003c/strong\u003e is the inherent complexity of the task itself. In the Anthropic study, intrinsic load came from understanding the problem requirements and the conceptual complexity of asynchronous concurrency. This load can’t be eliminated. It’s fundamental to the task and concepts involved.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eExtraneous load\u003c/strong\u003e is cognitive effort wasted on things that don’t contribute to completing the task or understanding the concepts. Poor documentation, confusing error messages, a user interface full of distractions, unfamiliar syntax and concepts, or navigating unfamiliar tools, all create extraneous load. Unlike intrinsic load, extraneous load should be minimised since it consumes working memory capacity without advancing either task completion or learning. 
Whether extraneous load is low or high for the same task can differ depending on which mental models and schemata you have already built through experience.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eGermane load\u003c/strong\u003e is the productive cognitive effort invested in building understanding—constructing mental models and forming durable knowledge structures. This includes elaboration, schema construction, and integrating new patterns into long-term memory. Germane load is what enables lasting learning that transfers beyond the immediate task.\u003c/p\u003e\n\n\u003cp\u003eThe three types of load compete for the same limited resource. If intrinsic load is high (complex task, unfamiliar concepts) and extraneous load is also high (poor tools, unclear documentation), little capacity remains for germane load, the mental work that builds the understanding tested in the quiz. If extraneous load is low, more working memory capacity is available for germane load, helping with integrating new concepts into long-term memory.\u003c/p\u003e\n\n\u003ch4 id=\"explaining-the-results-through-cognitive-load-theory\"\u003eExplaining the results through Cognitive Load Theory\u003c/h4\u003e\n\n\u003cp\u003eCognitive load theory helps explain why the 35-minute time constraint in the study mattered so much. How participants allocated their limited cognitive resources determined both their task completion speed and their learning outcomes.\u003c/p\u003e\n\n\u003cp\u003eThe key insight: patterns that bypassed the intrinsic load of the implementation task and minimised extraneous load freed up more capacity for germane load. In a time-constrained context, this allocation of cognitive resources proved decisive. 
Generation-then-comprehension participants could spend nearly all 35 minutes on learning, while others had to split time between solving the problem, implementing it, and learning concepts.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eGeneration-then-comprehension’s\u003c/strong\u003e exceptional comprehension score becomes clear through this lens. By having AI generate the solution, participants completely bypassed the intrinsic load of figuring out how to implement the features. They also avoided the extraneous load of wrestling with unfamiliar Trio syntax and patterns. This meant they could dedicate virtually all 35 minutes to germane load—actively questioning the code, building mental models, and connecting Trio concepts to their existing knowledge. The time allocation was optimal for learning: zero minutes on implementation, maximum minutes on understanding.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eConceptual inquiry and the control group\u003c/strong\u003e (scoring at 65% and 67% respectively) performed similarly because both faced the same cognitive load demands. Participants had to manage the full intrinsic load of solving the implementation problem themselves, handle the extraneous load of unfamiliar Trio syntax (conceptual inquiry) or documentation searching (control), and still try to learn the underlying concepts, for instance through dealing with errors. Their 35 minutes was split across multiple competing demands: designing the solution, implementing it, debugging errors, and building understanding. Less time for germane load meant less learning, reflected in their lower scores.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eHybrid code-explanation\u003c/strong\u003e (scoring at 68%) sits between these extremes. Like generation-then-comprehension, it bypassed task intrinsic load through AI-generated solutions and minimised extraneous load. Participants could spend their time studying rather than implementing. 
However, as mentioned previously, the reading of provided explanations engaged working memory less intensively than the active questioning in generation-then-comprehension. The difference between 68% and 86% reflects the difference between passively reading explanations and actively constructing understanding through self-directed inquiry. This is completely consistent with what we know from the science of learning.\u003c/p\u003e\n\n\u003cp\u003eThe 35-minute constraint amplified these differences. In a time-unlimited setting, the costs of managing intrinsic and extraneous load might matter less. But with a hard time limit, how cognitive resources were allocated became the determining factor in learning outcomes.\u003c/p\u003e\n\n\u003ch3 id=\"what-we-can-and-cannot-conclude\"\u003eWhat we can and cannot conclude\u003c/h3\u003e\n\n\u003cp\u003eThe Anthropic study provides valuable data about how AI interaction patterns affect immediate comprehension in time-constrained learning tasks. However, the study’s specific context limits how broadly we can apply these findings. Skill development involves more than performance on a quiz taken immediately after the tasks. It requires procedural fluency, debugging capability, long-term retention, and the ability to transfer knowledge to new problems. Moreover, durable encoding into long-term memory requires time and repetition. A single 35-minute session with immediate testing doesn’t capture whether learning persists. 
Let’s examine what the study does and doesn’t tell us.\u003c/p\u003e\n\n\u003ch4 id=\"the-study-s-specific-context\"\u003eThe study’s specific context\u003c/h4\u003e\n\n\u003cp\u003eThe following constraints shaped which interaction patterns succeeded:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003e\n\u003cstrong\u003eTime constraint:\u003c/strong\u003e Hard 35-minute limit for completing tasks\u003c/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eTask type:\u003c/strong\u003e Implementing features with an unfamiliar library (Trio)\u003c/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePrior knowledge:\u003c/strong\u003e Participants experienced with Python (1+ year weekly use), unfamiliar with Trio\u003c/li\u003e\n\u003cli\u003e\n\u003cstrong\u003ePopulation:\u003c/strong\u003e Mostly junior engineers with AI coding experience\u003c/li\u003e\n\u003cli\u003e\n\u003cstrong\u003eMeasurement:\u003c/strong\u003e Immediate comprehension test following task completion\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eAs we’ve seen through cognitive load theory, the 35-minute time limit made generation-then-comprehension optimal because it allowed participants to spend virtually all their time on germane load. But this finding may not generalise beyond these specific conditions.\u003c/p\u003e\n\n\u003ch4 id=\"what-we-can-confidently-conclude-from-the-study\"\u003eWhat we can confidently conclude from the study\u003c/h4\u003e\n\n\u003cp\u003eThe study provides strong evidence for several important findings:\u003c/p\u003e\n\n\u003col\u003e\n\u003cli\u003eHow you use AI matters fundamentally. The 24% to 86% range in comprehension scores shows that interaction patterns, not just AI presence or absence, determine learning outcomes. Some patterns severely harm understanding while others enhance it.\u003c/li\u003e\n\u003cli\u003eElaboration distinguishes effective from ineffective patterns. 
Patterns that bypassed elaboration (AI delegation, progressive AI reliance, iterative AI debugging) led to poor comprehension. Patterns that preserved elaboration through questioning or manual implementation maintained or improved learning.\u003c/li\u003e\n\u003cli\u003eIn time-constrained contexts with unfamiliar libraries, generation-then-comprehension optimises learning. When developers need to quickly understand new frameworks or libraries, generating code then actively questioning it produces superior conceptual understanding compared to other approaches.\u003c/li\u003e\n\u003cli\u003ePassive consumption versus active construction matters. Even among patterns that bypassed task intrinsic load, active questioning (86%) outperformed passive reading of explanations (68%), consistent with established learning science about the generation effect.\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003ch4 id=\"what-we-cannot-conclude-from-the-study\"\u003eWhat we cannot conclude from the study\u003c/h4\u003e\n\n\u003cp\u003eThe study’s limited context means we must be cautious about broader generalisations:\u003c/p\u003e\n\n\u003col\u003e\n\u003cli\u003eWe cannot conclude that generation-then-comprehension is optimal for junior developers learning foundational programming concepts. The study tested developers who already had Python foundations learning an unfamiliar library. For developers still building foundational understanding the cognitive load landscape differs. When basic syntax, control flow, and fundamental concepts still require conscious effort, the extraneous load doesn’t come only from unfamiliar library patterns but from the programming fundamentals themselves.\u003c/li\u003e\n\u003cli\u003eWe cannot conclude anything about long-term skill development. The study measured immediate comprehension through a quiz. 
It didn’t test whether participants could implement similar solutions independently days or weeks later, debug their own code, or transfer understanding to new problems. These longer-term outcomes may favour different interaction patterns.\u003c/li\u003e\n\u003cli\u003eWe cannot conclude much about procedural skill development. The quiz mainly tested conceptual understanding. Manual implementation builds muscle memory, debugging patterns, and the procedural knowledge that comes from repeated practice.\u003c/li\u003e\n\u003cli\u003eWe cannot assume these results hold without time constraints. The 35-minute limit amplified the advantages of patterns that bypassed task intrinsic load. In realistic development contexts where developers have hours or days for tasks, the time allocation advantage of generation-then-comprehension diminishes.\u003c/li\u003e\n\u003c/ol\u003e\n\n\u003ch4 id=\"why-realistic-junior-developer-contexts-may-require-different-patterns\"\u003eWhy realistic junior developer contexts may require different patterns\u003c/h4\u003e\n\n\u003cp\u003eUnderstanding the study’s limitations through cognitive psychology reveals why junior developers might need different approaches:\u003c/p\u003e\n\n\u003ch5 id=\"time-constraints-don-t-reflect-reality\"\u003eTime constraints don’t reflect reality\u003c/h5\u003e\n\n\u003cp\u003eJunior developers rarely work under hard 35-minute limits. When time isn’t scarce, the cognitive load optimisation that made generation-then-comprehension excel becomes less decisive. Spending time on implementation doesn’t compete with learning. It becomes a crucial part of learning.\u003c/p\u003e\n\n\u003ch5 id=\"procedural-fluency-requires-practice\"\u003eProcedural fluency requires practice\u003c/h5\u003e\n\n\u003cp\u003eThe study tested conceptual understanding but not procedural skills: Can you implement similar solutions? Do you recognise when to apply these patterns? Can you debug independently? 
These abilities develop through repeated manual practice, not just through understanding explanations. The generation effect and deliberate practice research suggest that actively doing implementation work, even when it’s slower, may build more durable procedural knowledge than studying AI-generated solutions.\u003c/p\u003e\n\n\u003ch5 id=\"error-driven-learning-builds-debugging-skills-and-mental-models-of-failure\"\u003eError-driven learning builds debugging skills and mental models of failure\u003c/h5\u003e\n\n\u003cp\u003eGeneration-then-comprehension participants avoided errors, which helped them score well on conceptual questions. But encountering errors, forming hypotheses about causes, and debugging independently builds crucial mental models of how systems fail. These failure models are essential for validating AI-generated code and debugging production issues.\u003c/p\u003e\n\n\u003ch5 id=\"long-term-retention-may-differ-from-immediate-comprehension\"\u003eLong-term retention may differ from immediate comprehension\u003c/h5\u003e\n\n\u003cp\u003eLearning science research on the generation effect and desirable difficulties suggests that more effortful learning—like implementing code yourself and debugging errors—often produces better long-term retention even when immediate performance is lower. The study’s quiz measured understanding immediately after the task. We don’t know how well participants retained this knowledge days or weeks later, or whether they could transfer it to different problems.\u003c/p\u003e\n\n\u003ch5 id=\"foundational-learning-differs-from-learning-new-libraries\"\u003eFoundational learning differs from learning new libraries\u003c/h5\u003e\n\n\u003cp\u003eThe study participants had solid Python foundations, so the extraneous load came primarily from unfamiliar Trio patterns. For junior developers learning foundational programming concepts, extraneous load is high across the board. 
The optimal interaction pattern for building foundations may differ from the optimal pattern for adding new libraries to an existing foundation.\u003c/p\u003e\n\n\u003cp\u003eThe study provides valuable insights about AI interaction patterns in specific contexts. But realistic junior developer skill development involves longer timeframes, procedural fluency, error-driven learning, and long-term retention, none of which the study measured. Different contexts may call for different patterns, even if those patterns don’t maximise immediate quiz performance under time pressure.\u003c/p\u003e\n\n\u003ch3 id=\"conclusion\"\u003eConclusion\u003c/h3\u003e\n\n\u003cp\u003eHow you use AI matters more than whether you use it. The same tools produced comprehension scores ranging from 24% to 86%. Interaction patterns that bypassed elaboration failed, while those that preserved it succeeded. In the study’s time-constrained context, generation-then-comprehension excelled by maximising time for active learning.\u003c/p\u003e\n\n\u003cp\u003eBut context shapes outcomes. The study’s 35-minute limit and focus on immediate comprehension don’t reflect realistic skill development over months and years. 
Understanding the cognitive mechanisms, notably elaboration, cognitive load, active engagement, deliberate practice, and spaced repetition, helps you choose interaction patterns that support learning and understanding in your specific context, rather than blindly following findings from a single study.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/x37hhr5hpaxpbgzk5jzlq3cskri8?_a=BACMTiAE"},{"id":"https://www.innoq.com/de/blog/2026/02/ai-asstance-kurzer-ueberblick-q1-2026/","url":"https://www.innoq.com/de/blog/2026/02/ai-asstance-kurzer-ueberblick-q1-2026/","date_published":"2026-03-06T00:00:00.000+00:00","date_modified":"2026-03-10T15:35:43.687+01:00","author":[{"name":"Sven Johann","url":"https://www.innoq.com/en/staff/sven-johann/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eOver the past few months it has become quite easy to lose track of AI-assisted software development. New heroic feats keep popping up on LinkedIn (“17,000 LoC per day per developer”), along with new tools such as \u003ca href=\"https://github.com/steveyegge/gastown\"\u003eGas Town\u003c/a\u003e whose approach is overwhelming, CEOs announcing that “we no longer need developers”, and, at the same time, other CEOs announcing that “we want to hire more developers”.\u003c/p\u003e\n\n\u003cp\u003eThe end of programming languages is proclaimed (Erik Meijer) and then reined in again with constraints. Will code no longer be an artefact in the future, or will it after all? If not, what is it then? Weren’t small batch sizes the gold standard for a long time? Short feedback loops? Does that still hold (in the future), yes or no? Is there nothing left at all to orient ourselves by? 
No stable patterns and practices that new knowledge can be anchored to?\u003c/p\u003e\n\n\u003cp\u003eThere still are. On the latest episode of our \u003ca href=\"https://www.case-podcast.org/62-utah-retreat-insights-and-state-of-play-ai-assisted-development\"\u003eCaSE Podcast\u003c/a\u003e, \u003ca href=\"https://www.heinrichhartmann.com/\"\u003eHeinrich Hartmann\u003c/a\u003e, Principal SRE at Zalando, and I had the pleasure last week of discussing this with \u003ca href=\"https://birgitta.info/\"\u003eBirgitta Boeckeler\u003c/a\u003e. She is a Distinguished Engineer at ThoughtWorks, an expert in AI-assisted software delivery, a Technology Radar member, and also a participant in the \u003ca href=\"https://www.thoughtworks.com/content/dam/thoughtworks/documents/report/tw_future\"\u003eretreat in Utah\u003c/a\u003e, held in early February, on the future of software development in the age of AI, which we also talked about and which I want to report on here. You are of course welcome to listen to the 90-minute podcast (we did not manage to address all of the points above). But I also wanted to write an accompanying text that goes beyond the typical AI-generated summary, one where I can fit in a few points that were not raised in the podcast or were discussed before or after it, and of course to reflect on it all once more myself :-)\u003c/p\u003e\n\n\u003ch3 id=\"stand-der-ai-assistance-tools-19-februar-2026-19-46-uhr\"\u003eState of AI assistance tools (19 February 2026, 7:46 pm)\u003c/h3\u003e\n\n\u003cp\u003eIt is 19 February 2026, 7:46 pm. Stating this timestamp matters, because things are moving at breakneck speed. We will look at the major developments in broad strokes. 2025 was an interesting year: there are really good models, there are agents, there are CLIs, and there is context engineering. 
MCP was very much in and is now a little bit out.\u003c/p\u003e\n\n\u003cp\u003eAs for the \u003cstrong\u003emodels\u003c/strong\u003e, we can keep it short: Claude Opus 4.6 and OpenAI GPT-5.3-Codex are the front-runners. In my own environment I am still stuck on 5.2, so I cannot say anything about the new models beyond toy systems, but in general there is enthusiasm about their capabilities.\u003c/p\u003e\n\n\u003cp\u003eThe introduction of coding \u003cstrong\u003eagents\u003c/strong\u003e was another milestone in 2025. Agents can do more than the simple chat window inside the IDE. You can give them goals, and unlike the chat window they can not only generate or analyse code but also write files, run commands, write and run tests, invoke code quality tools, and run Git commands. This makes it possible to go beyond question-and-answer mode and hand out concrete tasks that can be carried out iteratively. You can also have several \u003cstrong\u003e(sub-)agents\u003c/strong\u003e carry out different tasks, e.g. one plans, another implements, yet another tests, and so on. Implementations can be split up as well, e.g. into a frontend subagent and a backend subagent.\u003c/p\u003e\n\n\u003cp\u003e\u003cstrong\u003eContext engineering\u003c/strong\u003e is the curation of what the model sees, and it is another notable development of 2025. You do not always have to give the model the entire codebase as context, because that leads not only to higher costs but also to degraded answer quality. Nor is code the only thing a model needs to solve a task well. 
Other useful information can include descriptions of how to use a technology idiomatically, the company’s coding guidelines, or the design option chosen in an Architecture Decision Record. Depending on the tool, these can be supplied in dedicated files (e.g. agents.md), slash commands, or skills. With subagents you can keep the contexts for specialised tasks small and thereby get better results.\u003c/p\u003e\n\n\u003cp\u003eThe trend of offering a dedicated MCP server for everything is fading. Instead, you can use existing CLIs, which AI can now handle well.\u003c/p\u003e\n\n\u003ch3 id=\"you-must-be-this-tall-to-use-microservices-ah-agentic-coding\"\u003e“You must be this tall to use microservices, er, agentic coding”\u003c/h3\u003e\n\n\u003cp\u003eYou probably read it on LinkedIn every day: anyone who still reviews and fine-tunes generated code is a dinosaur and will soon be extinct. Those at the very front line have large parts of new functionality generated in full. Yet for the most part, software still has to be developed in short, small steps, at least when it is business- or safety-critical and developed in a team. It also depends on where you start from and what the environment is like. Very few systems and organisations today are AI native. The heading already says it: just as Martin Fowler’s \u003ca href=\"https://martinfowler.com/bliki/MicroservicePrerequisites.html\"\u003earticle\u003c/a\u003e sets out the conditions that must hold in order to develop and operate microservices, some conditions must also be met in order to use AI agents successfully.\u003c/p\u003e\n\n\u003ch3 id=\"wo-bleibt-die-prazision-im-hipster-land\"\u003eWhere is the precision in hipster land?\u003c/h3\u003e\n\n\u003cp\u003eThe question that I am asking myself, that you are asking yourselves, and that the participants of the above-mentioned event in Utah are asking too, is how we can ensure the quality of our software now, in the near future, and in the distant future when we can produce far more code far faster. 
Right now I am more interested in today and the near future, and here individual teams have to find their own individual developer workflows. As always, it is advisable to start small and try out what works and what does not. The dinosaur tactic mentioned above works well for many teams and can be evolved incrementally. But towards what? What does the state of the art look like? For microservices the poster child was Netflix; who is it for AI assistance?\u003c/p\u003e\n\n\u003cp\u003eOpenAI (and other companies such as Databricks) aim to stop writing any code by hand and have everything generated instead. But what prerequisites must hold to do that safely and at the required quality? Naturally, as always in complex setups, this works well in small, incremental steps. Baby steps, as Kent Beck would say. A large complex (sociotechnical) system that works always evolves from a smaller complex system that works. OpenAI, for example, uses \u003ca href=\"https://openai.com/index/harness-engineering/\"\u003eHarness Engineering\u003c/a\u003e to ensure precision and focuses on \u003ca href=\"https://martinfowler.com/articles/exploring-gen-ai/harness-engineering.html\"\u003ethree pillars\u003c/a\u003e:\u003c/p\u003e\n\n\u003cul\u003e\n\u003cli\u003eContext engineering: a continuously extended knowledge base in the code, plus agent access to dynamic context such as monitoring data, browser navigation, or ADRs.\u003c/li\u003e\n\u003cli\u003eArchitectural constraints: enforcement not only by LLM-based agents but also by deterministic custom linters and structural tests. 
We write unit tests, acceptance tests, and property tests ourselves, because we want to stay in the driver’s seat. ArchUnit can provide structural constraints as tests and limit the agent’s freedom.\u003c/li\u003e\n\u003cli\u003e“Garbage collection”: agents that run regularly to fight technical debt (agents replicate existing patterns, and if those are bad and you notice it, you simply have to fight that).\u003c/li\u003e\n\u003c/ul\u003e\n\n\u003cp\u003eWhenever the results are not good, that is a smell, and these pillars need to be improved. (As with the Netflix microservices architecture, everyone has to guard against conference-driven development and cargo culting, and work out for themselves what is right in their own context.)\u003c/p\u003e\n\n\u003cp\u003eBut, oops? The last two points apply to good software development in general. I see an easy on-ramp here for all of us.\u003c/p\u003e\n\n\u003ch3 id=\"risikodenken\"\u003eRisk thinking\u003c/h3\u003e\n\n\u003cp\u003eAh yes, here is another thing that is necessary in AI land and extremely useful in software development in general: risk thinking. The code I am about to develop (or have generated): how much risk (probability x impact) does it carry? Do I need a test data builder right now or a tax calculation? CRUD for master data or nuclear reactor control logic? How large is the blast radius? Am I writing a tool for my own private use, or is it a production-critical system? How easy is it to spot a defect in the generated code? Can a test do that, or do I need a qualitative analysis? 
It is a constant balancing act: how much oversight you need, or how much autonomy you grant, depends on the risk and on the available feedback cycles.\u003c/p\u003e\n\n\u003ch3 id=\"innovatoren-fruhe-und-spate-mehrheit-nachzugler\"\u003eInnovators, early and late majority, laggards\u003c/h3\u003e\n\n\u003cp\u003eA problem in a great many organisations today is tension between enthusiasts and sceptics. That is normal. With new technologies and methods, one generally distinguishes between innovators, the early majority, the late majority, and laggards. Depending on the situation, most people fall into all of these categories at once. I can be an AI innovator but a laggard in completely different areas, and vice versa.\u003c/p\u003e\n\n\u003cp\u003eHow can an organisation put these tensions to good use? We need both. Eagerness to experiment and caution should be rewarded equally. It is good when a colleague tries out an interesting new tool or a clever method and carries it into the development teams. Sceptics make sure that different development contexts are taken into account and that the technology or method can be put into better perspective.\u003c/p\u003e\n\n\u003cp\u003eSceptics are not always laggards. The early majority is also sceptical at first and wants to see that a technology works before using it. Innovators have to convince the early majority, otherwise things do not move forward.\u003c/p\u003e\n\n\u003cp\u003eThe general adoption rate should not be neglected either. Innovation generally happens much, much more slowly than people think. So: no need to panic, but every software developer needs a sense of urgency today.\u003c/p\u003e\n\u003cp\u003eIn this article I wanted to give a personal summary of the conversation between Birgitta, Heinrich, and me on the state of AI-assisted programming. 
In early 2026, AI-assisted programming is neither a promise of salvation nor a job killer, but it can be a solid productivity boost. Models and agents are capable by now, but their effectiveness depends on many factors. Establishing and improving these factors is, at bottom, also beneficial when developing classic software systems. So it cannot hurt to want to become agentic, can it?\u003c/p\u003e\n\n\u003cp\u003eMoreover, the basic principles of good software development still apply: small steps, short feedback cycles, tests, explicit decisions, and deliberate risk management. Anyone who ignores them only increases the speed at which defects arise. With intensive AI use, things just get much worse much faster.\u003c/p\u003e\n\n\u003cp\u003eThe productive path lies between euphoria and scepticism. We should experiment, but at the same time pursue our actual goal and not give in to social media pressure: developing and operating good software that makes its stakeholders happy.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/piyf0gwmyyg5u3ntfifo8qcyfg6v?_a=BACMTiAE"},{"id":"https://www.innoq.com/de/blog/2026/03/jvm-schneller-start/","url":"https://www.innoq.com/de/blog/2026/03/jvm-schneller-start/","date_published":"2026-03-05T00:00:00.000+00:00","date_modified":"2026-03-04T15:23:10.161+01:00","author":[{"name":"Michael Vitz","url":"https://www.innoq.com/en/staff/michael-vitz/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eWhen I talk to non-JVM people about the JVM these days, I often hear two points of criticism. One is the high memory consumption, the other the slow startup time. 
The second point becomes all the more noticeable when we look not at a modern development machine but at the small instances commonly used in production.\u003c/p\u003e\n\n\u003cp\u003eThe fact that both points are rooted in the JVM’s history is not meant as an apology or excuse; it is simply a sign of the era the JVM emerged from. Twenty years ago we built and operated few, but larger, applications. In addition, the primary optimisation target was “mean time between failures”, and applications often ran for months on end. So there was little need to optimise startup time.\u003c/p\u003e\n\n\u003cp\u003eBut of course the people in the OpenJDK are working on improving both points. As is customary in the JVM universe, this is approached very carefully and always with an eye on backward compatibility. As a result, it takes longer overall, but it progresses steadily and causes less frustration for us application developers.\u003c/p\u003e\n\n\u003cp\u003eIn this article we therefore look at the steps already taken and those still to come. But first, let us get a brief overview of what actually happens when an application starts on the JVM.\u003c/p\u003e\n\n\u003ch3 id=\"was-geschieht-beim-start\"\u003eWhat happens at startup?\u003c/h3\u003e\n\n\u003cp\u003eBesides evaluating program arguments and other JVM options and starting components such as the garbage collector, the JVM spends a large part of startup loading the classes the application needs. In doing so, it follows the rules and requirements defined in Chapter 5 of the \u003ca href=\"https://docs.oracle.com/javase/specs/jvms/se24/html/index.html\"\u003eJVM Specification\u003c/a\u003e. That chapter specifies exactly how classes are loaded, linked, and initialised. 
The behaviour is therefore identical on all JVMs, even though implementations may vary in detail.\u003c/p\u003e\n\n\u003cp\u003eClasses are loaded by instances of \u003cem\u003eClassLoader\u003c/em\u003e. These are organised hierarchically, meaning that each \u003cem\u003eClassLoader\u003c/em\u003e below the top of the hierarchy has another \u003cem\u003eClassLoader\u003c/em\u003e as its parent. Before a \u003cem\u003eClassLoader\u003c/em\u003e loads a class itself, it asks its parent to load it. Only if the parent cannot fulfil that request is it the loader’s own job to load the class. At the top of this hierarchy sits the bootstrap \u003cem\u003eClassLoader\u003c/em\u003e, which is defined in section 5.3.1.\u003c/p\u003e\n\n\u003cp\u003eLoading itself is about reading in the binary representation of a class, that is, what is found in a \u003cem\u003eclass\u003c/em\u003e file. From it, the class or interface defined there is then derived and loaded into the method area (also called Metaspace in the HotSpot JVM). When the JVM starts, not all classes are loaded automatically; starting from the main class, only those that are actually needed and referenced are loaded.\u003c/p\u003e\n\n\u003cp\u003eAfter loading, the class still has to be linked. This step in turn consists of three larger tasks: \u003cstrong\u003everification\u003c/strong\u003e, \u003cstrong\u003epreparation\u003c/strong\u003e, and \u003cstrong\u003eresolution\u003c/strong\u003e.\u003c/p\u003e\n\n\u003cp\u003eDuring \u003cstrong\u003everification\u003c/strong\u003e, the JVM checks the previously loaded class for structural correctness. A whole range of things has to be checked (sections 4.9.1 and 4.9.2 of the JVM Specification). 
In addition, it must be ensured that a final class has no subclass and that final methods have not been overridden. It must also be ensured that every class except \u003cem\u003ejava.lang.Object\u003c/em\u003e has a direct superclass. If an invalid class is found in the process, a \u003cem\u003eVerifyError\u003c/em\u003e is thrown.\u003c/p\u003e\n\n\u003cp\u003eAfter successful verification, the class can be \u003cstrong\u003eprepared\u003c/strong\u003e. In this step all static fields are created and set to their default values. For example, a field declared as \u003ccode\u003estatic int foo = 42\u003c/code\u003e would hold the value \u003ccode\u003e0\u003c/code\u003e after this step.\u003c/p\u003e\n\n\u003cp\u003eNext comes \u003cstrong\u003eresolution\u003c/strong\u003e. Here the symbolic references present in the binary representation of a class are resolved. Although these three subtasks are listed one after another here, the actual order of execution may differ.\u003c/p\u003e\n\n\u003cp\u003eAfter linking, the class can be initialised. The declared values of static fields (class variables) are set and static initialisers are executed. To do this, the JVM invokes the method \u003cem\u003e\u0026lt;clinit\u0026gt;\u003c/em\u003e, which the compiler generates automatically. It does not always exist, however, but only when there is actually a need for it.\u003c/p\u003e\n\n\u003cp\u003eSince the JVM typically has to load several hundred classes, this process can take a while. That naturally makes it a candidate for optimisations that significantly reduce startup time. 
How that is done is what we look at next.\u003c/p\u003e\n\n\u003ch3 id=\"class-data-sharing\"\u003eClass Data Sharing\u003c/h3\u003e\n\n\u003cp\u003eClass Data Sharing, abbreviated \u003cstrong\u003eCDS\u003c/strong\u003e, was introduced in the HotSpot JVM a long time ago with an update for JDK 5. The idea behind it is that the JDK contains a number of classes, such as those in \u003cem\u003ejava.lang\u003c/em\u003e, that practically every application needs. Instead of loading them on every start, CDS creates an archive once in which these classes are stored already loaded. On the next start the archive is detected and the classes are placed into memory directly from there. This eliminates a step, and the JVM starts faster. For a long time, however, this feature had to be switched on explicitly. Only with \u003ca href=\"https://openjdk.org/projects/jdk/12/\"\u003eJDK 12\u003c/a\u003e and \u003ca href=\"https://openjdk.org/jeps/341\"\u003eJEP 341\u003c/a\u003e did CDS gain an archive that ships with the JDK and is loaded by default when the JVM starts.\u003c/p\u003e\n\n\u003cp\u003eFor a time it was also possible for several JVMs running on the same system to share the memory region of these classes, which reduced memory consumption a little as well. Since that often no longer matters today, this aspect is no longer a primary focus.\u003c/p\u003e\n\n\u003ch3 id=\"application-class-data-sharing\"\u003eApplication Class Data Sharing\u003c/h3\u003e\n\n\u003cp\u003eCDS did reduce startup time, but especially in large applications the share of classes from the JDK is very small; most are classes we wrote ourselves or that the framework in use brings along. 
In light of this, \u003ca href=\"https://openjdk.org/projects/jdk/10/\"\u003eJDK 10\u003c/a\u003e and \u003ca href=\"https://openjdk.org/jeps/310\"\u003eJEP 310\u003c/a\u003e introduced Application Class Data Sharing (\u003cstrong\u003eAppCDS\u003c/strong\u003e). In essence, it is an extension of CDS that places the application’s classes in the archive alongside the JDK’s own. This does not happen automatically, though: we have to switch the feature on ourselves, and we also need a training run. During that run the loaded classes are detected and written to a file. In a second run, an archive containing the loaded classes can then be created from this list.\u003c/p\u003e\n\n\u003cp\u003eFor the training run we use the options \u003ccode\u003e-Xshare:off\u003c/code\u003e and \u003ccode\u003e-XX:DumpLoadedClassList=./application.lst\u003c/code\u003e. On older JDKs the option \u003ccode\u003e-XX:+UseAppCDS\u003c/code\u003e must be set as well. Depending on the application, we can shut it down right after startup or first perform some actions so that the application really loads all its classes. 
When the JVM exits, it writes the list of all loaded classes to the specified file (Listing 1).\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 1: Liste der geladenen Klassen nach dem Trainingslauf\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e# NOTE: Do not modify this file.\r\n#\r\n# This file is generated via the -XX:DumpLoadedClassList=\u0026lt;class_list_file\u0026gt; option\r\n# and is used at CDS archive dump time (see -Xshare:dump).\r\n#\r\njava/lang/Object id: 0\r\njava/io/Serializable id: 1\r\njava/lang/Comparable id: 2\r\njava/lang/CharSequence id: 3\r\njava/lang/constant/Constable id: 4\r\n...\r\norg/springframework/util/ConcurrentReferenceHashMap$EntrySet id: 5293\r\norg/springframework/util/ConcurrentReferenceHashMap$EntryIterator id: 5294\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 1: List of the loaded classes after the training run\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eWe can now use this list with the options \u003ccode\u003e-Xshare:dump\u003c/code\u003e, \u003ccode\u003e-XX:SharedClassListFile=./application.lst\u003c/code\u003e, and \u003ccode\u003e-XX:SharedArchiveFile=./application.jsa\u003c/code\u003e to create a static archive containing the loaded classes. The archive can then be referenced when starting the application with the option \u003ccode\u003e-XX:SharedArchiveFile=./application.jsa\u003c/code\u003e. The JVM will now place not only the JDK’s own classes directly into memory, as with CDS, but also the application’s classes. This reduces the application’s startup time significantly once more.\u003c/p\u003e\n\n\u003cp\u003eTo simplify the use of AppCDS further, \u003ca href=\"https://openjdk.org/projects/jdk/13/\"\u003eJDK 13\u003c/a\u003e introduced dynamic archives with \u003ca href=\"https://openjdk.org/jeps/350\"\u003eJEP 350\u003c/a\u003e. 
To create one, all we need in the training run is the option \u003ccode\u003e-XX:ArchiveClassesAtExit=./application.jsa\u003c/code\u003e. With it, the finished archive is created directly during the training run. This saves us from generating the class list and converting it into the archive in a separate step.\u003c/p\u003e\n\n\u003cp\u003eFor all of this to work, there are two small restrictions, however. First, the classpath, including its order, must be identical between the training run and the actual start, and second, the same JDK must be used. Fortunately, both conditions are easy to meet when using containers, for example with a multi-stage Dockerfile like the one shown in Listing 2.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 2: Multi-Stage Dockerfile für AppCDS einer Spring-Boot-Anwendung\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"k\"\u003eFROM\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003eopenjdk:24-slim\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"k\"\u003eAS\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003ebuilder\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003e\u003cspan class=\"nb\"\u003emkdir\u003c/span\u003e \u003cspan class=\"nt\"\u003e-p\u003c/span\u003e /app\r\n\u003cspan class=\"k\"\u003eWORKDIR\u003c/span\u003e\u003cspan class=\"s\"\u003e /app\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e ./target/jvm-startup.jar jvm-startup.jar\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003ejava \u003cspan class=\"nt\"\u003e-Djarmode\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003etools \u003cspan class=\"nt\"\u003e-jar\u003c/span\u003e jvm-startup.jar extract \u003cspan 
class=\"se\"\u003e\\\r\u003c/span\u003e\n    --layers\r\n\r\n\r\n\u003cspan class=\"k\"\u003eFROM\u003c/span\u003e\u003cspan class=\"s\"\u003e openjdk:24-slim\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCMD\u003c/span\u003e\u003cspan class=\"s\"\u003e [ \"java\", \\\r\u003c/span\u003e\n      \"-XX:SharedArchiveFile=./application.jsa\", \\\r\n      \"-jar\", \\\r\n      \"jvm-startup.jar\"]\r\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003e\u003cspan class=\"nb\"\u003emkdir\u003c/span\u003e \u003cspan class=\"nt\"\u003e-p\u003c/span\u003e /app \u003cspan class=\"o\"\u003e\u0026amp;\u0026amp;\u003c/span\u003e \u003cspan class=\"se\"\u003e\\\r\u003c/span\u003e\n    chown -R daemon /app\r\n\r\n\u003cspan class=\"k\"\u003eUSER\u003c/span\u003e\u003cspan class=\"s\"\u003e daemon\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eWORKDIR\u003c/span\u003e\u003cspan class=\"s\"\u003e /app\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/dependencies/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/spring-boot-loader/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/snapshot-dependencies/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/application/ ./\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003ejava \u003cspan class=\"se\"\u003e\\\r\u003c/span\u003e\n    -Dspring.context.exit=onRefresh \\\r\n    -XX:ArchiveClassesAtExit=./application.jsa \\\r\n    -jar jvm-startup.jar\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 2: Multi-stage Dockerfile for AppCDS with a Spring Boot application\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eSince the example is a Spring Boot application, the first trick is to unpack the executable JAR. 
To do so, in the \u003cstrong\u003efirst stage\u003c/strong\u003e we use the jarmode tools contained in the JAR and the \u003cem\u003eextract\u003c/em\u003e command with the \u003cem\u003e--layers\u003c/em\u003e option. This produces the structure shown in Listing 3.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 3: Struktur der entpackten Spring-Boot-Anwendung\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"nv\"\u003e$ \u003c/span\u003etree jvm-startup\r\njvm-startup\r\n├── application\r\n│   └── jvm-startup.jar\r\n├── dependencies\r\n│   └── lib\r\n│       ├── jackson-annotations-2.19.2.jar\r\n│       ├── jackson-core-2.19.2.jar\r\n...\r\n│       ├── tomcat-embed-el-10.1.43.jar\r\n│       └── tomcat-embed-websocket-10.1.43.jar\r\n├── snapshot-dependencies\r\n└── spring-boot-loader\r\n\r\n6 directories, 31 files\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 3: Structure of the unpacked Spring Boot application\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eIn the \u003cstrong\u003esecond stage\u003c/strong\u003e we now copy the structure from the first stage into our image and then start the training run with \u003ccode\u003eRUN\u003c/code\u003e to create the AppCDS archive. For this we use the option \u003ccode\u003e-Dspring.context.exit=onRefresh\u003c/code\u003e. It makes sure that the application shuts down automatically once the Spring context has been built and loaded. 
At this point most classes have already been loaded and therefore end up in the archive.\u003c/p\u003e\n\n\u003cp\u003eWith \u003ca href=\"https://openjdk.org/projects/jdk/19/\"\u003eJDK 19\u003c/a\u003e, \u003ca href=\"https://bugs.openjdk.org/browse/JDK-8261455\"\u003eJDK-8261455\u003c/a\u003e additionally introduced the option \u003ccode\u003e-XX:+AutoCreateSharedArchive\u003c/code\u003e. When we use it together with \u003ccode\u003e-XX:SharedArchiveFile\u003c/code\u003e, the JVM writes an archive on exit if none exists yet, and uses the archive if one is already present.\u003c/p\u003e\n\n\u003ch3 id=\"project-leyden\"\u003eProject Leyden\u003c/h3\u003e\n\n\u003cp\u003eWithin OpenJDK, larger efforts are carried out in separate projects. One of them is \u003cstrong\u003e\u003ca href=\"https://openjdk.org/projects/leyden/\"\u003eLeyden\u003c/a\u003e\u003c/strong\u003e. Its goal is to reduce the startup time, the time to peak performance, and the footprint of the JVM. As is usual for OpenJDK projects, this does not happen in one big bang but as a series of smaller JEPs that approach the larger goal step by step.\u003c/p\u003e\n\n\u003cp\u003eAs part of this, Project Leyden already delivered a first feature in \u003ca href=\"https://openjdk.org/projects/jdk/24/\"\u003eJDK 24\u003c/a\u003e with \u003ca href=\"https://openjdk.org/jeps/483\"\u003eJEP 483\u003c/a\u003e. It builds on CDS and AppCDS and reduces the startup time by another good chunk. To achieve this, the archive no longer contains only the binary representation of the application's classes, ready to be loaded directly into memory, but also the result of linking them. As with AppCDS, however, we need a training run to record the required data. 
To do so, the application is run with the two options \u003ccode\u003e-XX:AOTMode=record\u003c/code\u003e and \u003ccode\u003e-XX:AOTConfiguration=./application.aotconf\u003c/code\u003e and then shut down.\u003c/p\u003e\n\n\u003cp\u003eThe configuration written this way still has to be converted into the actual archive with a second invocation. This requires three options, namely \u003ccode\u003e-XX:AOTMode=create\u003c/code\u003e, \u003ccode\u003e-XX:AOTConfiguration=./application.aotconf\u003c/code\u003e, and \u003ccode\u003e-XX:AOTCache=./application.aot\u003c/code\u003e.\u003c/p\u003e\n\n\u003cp\u003eAfterwards, the resulting archive can be used when starting the application with the option \u003ccode\u003e-XX:AOTCache=./application.aot\u003c/code\u003e. Listing 4 shows a multi-stage Dockerfile that can be used for this.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 4: Multi-stage Dockerfile for Leyden AOT with a Spring Boot application on JDK 24\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"k\"\u003eFROM\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003eopenjdk:24-slim\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"k\"\u003eAS\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003ebuilder\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003e\u003cspan class=\"nb\"\u003emkdir\u003c/span\u003e \u003cspan class=\"nt\"\u003e-p\u003c/span\u003e /app\r\n\u003cspan class=\"k\"\u003eWORKDIR\u003c/span\u003e\u003cspan class=\"s\"\u003e /app\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e ./target/jvm-startup.jar jvm-startup.jar\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003ejava \u003cspan class=\"nt\"\u003e-Djarmode\u003c/span\u003e\u003cspan 
class=\"o\"\u003e=\u003c/span\u003etools \u003cspan class=\"nt\"\u003e-jar\u003c/span\u003e jvm-startup.jar extract \u003cspan class=\"se\"\u003e\\\r\u003c/span\u003e\n    --layers\r\n\r\n\r\n\u003cspan class=\"k\"\u003eFROM\u003c/span\u003e\u003cspan class=\"s\"\u003e openjdk:24-slim\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCMD\u003c/span\u003e\u003cspan class=\"s\"\u003e [ \"java\", \\\r\u003c/span\u003e\n      \"-XX:AOTCache=./application.aot\", \\\r\n      \"-jar\", \\\r\n      \"jvm-startup.jar\"]\r\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003e\u003cspan class=\"nb\"\u003emkdir\u003c/span\u003e \u003cspan class=\"nt\"\u003e-p\u003c/span\u003e /app \u003cspan class=\"o\"\u003e\u0026amp;\u0026amp;\u003c/span\u003e \u003cspan class=\"se\"\u003e\\\r\u003c/span\u003e\n    chown -R daemon /app\r\n\r\n\u003cspan class=\"k\"\u003eUSER\u003c/span\u003e\u003cspan class=\"s\"\u003e daemon\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eWORKDIR\u003c/span\u003e\u003cspan class=\"s\"\u003e /app\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/dependencies/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/spring-boot-loader/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/snapshot-dependencies/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/application/ ./\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003ejava \u003cspan class=\"se\"\u003e\\\r\u003c/span\u003e\n    -Dspring.context.exit=onRefresh \\\r\n    -XX:AOTMode=record -XX:AOTConfiguration=./application.aotconf \\\r\n    -jar jvm-startup.jar\r\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003ejava \u003cspan 
class=\"se\"\u003e\\\r\u003c/span\u003e\n    -XX:AOTMode=create -XX:AOTConfiguration=./application.aotconf \\\r\n    -XX:AOTCache=./application.aot \\\r\n    -jar jvm-startup.jar\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 4: Multi-stage Dockerfile for Leyden AOT with a Spring Boot application on JDK 24\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eIn \u003ca href=\"https://openjdk.org/projects/jdk/25/\"\u003eJDK 25\u003c/a\u003e, this feature is extended by two further JEPs, \u003ca href=\"https://openjdk.org/jeps/514\"\u003e514\u003c/a\u003e and \u003ca href=\"https://openjdk.org/jeps/515\"\u003e515\u003c/a\u003e.\u003c/p\u003e\n\n\u003cp\u003eJEP 514 streamlines how the feature is used and which options it needs, much like the simplification we saw for AppCDS. A single training run is now enough to create the archive directly; the intermediate step via the configuration file is no longer necessary. Only the option \u003ccode\u003e-XX:AOTCacheOutput=./application.aot\u003c/code\u003e is required.\u003c/p\u003e\n\n\u003cp\u003eJEP 515 additionally ensures that information about methods recorded during the training run is written to the archive. This lets the JVM determine earlier in subsequent runs which methods the JIT compiler should optimise and how. As a result, such methods are compiled to machine code much earlier and spend less time being interpreted by the JVM, and the application reaches its peak performance considerably faster. Profiling information is of course still collected at runtime, since the real workload may differ from that of the training run. 
The JVM can therefore still apply further optimisations, or discard earlier ones, during the actual run.\u003c/p\u003e\n\n\u003cp\u003eBeyond that, the current \u003ca href=\"https://github.com/openjdk/leyden/blob/leyden-ea1-release-notes/README.md\"\u003eEarly Access release\u003c/a\u003e of Project Leyden already contains further features that will probably make it into the JDK as JEPs in the future. Above all, they put more things into the archive so that even more work is done before the application actually starts. This includes generated dynamic proxies, information about reflective accesses, and even code that the JIT compiler already compiled during the training run. In addition, \u003ca href=\"https://openjdk.org/jeps/516\"\u003eJEP 516\u003c/a\u003e is a JEP already in preparation that is intended to make the existing features work with all garbage collectors.\u003c/p\u003e\n\n\u003ch3 id=\"coordinated-restore-at-checkpoint\"\u003eCoordinated Restore at Checkpoint\u003c/h3\u003e\n\n\u003cp\u003eBesides Project Leyden, there is currently a second project that deals with optimising startup times, namely \u003cstrong\u003e\u003ca href=\"https://openjdk.org/projects/crac/\"\u003eProject CRaC\u003c/a\u003e\u003c/strong\u003e. It takes a completely different approach: support for so-called checkpoints. That is also where the project's name comes from: CRaC stands for Coordinated Restore at Checkpoint.\u003c/p\u003e\n\n\u003cp\u003eThe idea is that a signal can be sent to a running JVM application to create a checkpoint. A checkpoint is a kind of image or snapshot of the running application at the moment of the signal. It contains the entire state of the application and not just metadata, as in the previous approaches. That means things like database connections are included as well. 
That is of course not really practical, because it would mean that I would inevitably have to be connected to the right database already when creating the checkpoint. One of the project's tasks is therefore to develop a Java API that lets the application react to the checkpoint signal and perform clean-up work. An API is likewise needed for the restore, so that state can be re-established when starting from such a checkpoint. Listing 5 shows the current state of the API.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 5: CRaC Java API\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"kd\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eSomeResource\u003c/span\u003e \u003cspan class=\"kd\"\u003eimplements\u003c/span\u003e \u003cspan class=\"nc\"\u003eResource\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n\r\n    \u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"nf\"\u003eSomeResource\u003c/span\u003e\u003cspan class=\"o\"\u003e()\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n        \u003cspan class=\"nc\"\u003eCore\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003egetGlobalContext\u003c/span\u003e\u003cspan class=\"o\"\u003e().\u003c/span\u003e\u003cspan class=\"na\"\u003eregister\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"nc\"\u003eSomeResource\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003ethis\u003c/span\u003e\u003cspan class=\"o\"\u003e);\u003c/span\u003e\r\n    \u003cspan class=\"o\"\u003e}\u003c/span\u003e\r\n\r\n    \u003cspan class=\"nd\"\u003e@Override\u003c/span\u003e\r\n    \u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"kt\"\u003evoid\u003c/span\u003e \u003cspan 
class=\"nf\"\u003ebeforeCheckpoint\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\r\n            \u003cspan class=\"nc\"\u003eContext\u003c/span\u003e\u003cspan class=\"o\"\u003e\u0026lt;?\u003c/span\u003e \u003cspan class=\"kd\"\u003eextends\u003c/span\u003e \u003cspan class=\"nc\"\u003eResource\u003c/span\u003e\u003cspan class=\"o\"\u003e\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003econtext\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e \u003cspan class=\"kd\"\u003ethrows\u003c/span\u003e \u003cspan class=\"nc\"\u003eException\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n        \u003cspan class=\"nc\"\u003eSystem\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003eout\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003eprintln\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"s\"\u003e\"Before Checkpointing\"\u003c/span\u003e\u003cspan class=\"o\"\u003e);\u003c/span\u003e\r\n    \u003cspan class=\"o\"\u003e}\u003c/span\u003e\r\n\r\n    \u003cspan class=\"nd\"\u003e@Override\u003c/span\u003e\r\n    \u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"kt\"\u003evoid\u003c/span\u003e \u003cspan class=\"nf\"\u003eafterRestore\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\r\n            \u003cspan class=\"nc\"\u003eContext\u003c/span\u003e\u003cspan class=\"o\"\u003e\u0026lt;?\u003c/span\u003e \u003cspan class=\"kd\"\u003eextends\u003c/span\u003e \u003cspan class=\"nc\"\u003eResource\u003c/span\u003e\u003cspan class=\"o\"\u003e\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003econtext\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e \u003cspan class=\"kd\"\u003ethrows\u003c/span\u003e \u003cspan class=\"nc\"\u003eException\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n        \u003cspan 
class=\"nc\"\u003eSystem\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003eout\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003eprintln\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"s\"\u003e\"After Restore\"\u003c/span\u003e\u003cspan class=\"o\"\u003e);\u003c/span\u003e\r\n    \u003cspan class=\"o\"\u003e}\u003c/span\u003e\r\n\u003cspan class=\"o\"\u003e}\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 5: CRaC Java API\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eIn essence, the API to implement consists of the \u003cem\u003eResource\u003c/em\u003e interface. It has two methods that let us run code just before the checkpoint is created and after the application has been restored from a snapshot. We also have to register the implementation with the global context so that the JVM calls the methods at the appropriate points.\u003c/p\u003e\n\n\u003cp\u003eAt the moment, creating such a snapshot is still relatively complicated and not easy to automate. 
For a Spring Boot application and the Dockerfile from Listing 6, it works with the following steps.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 6: Dockerfile for CRaC\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"k\"\u003eFROM\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003ebellsoft/liberica-runtime-container:jdk-21-crac-glibc\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"k\"\u003eAS\u003c/span\u003e\u003cspan class=\"w\"\u003e \u003c/span\u003e\u003cspan class=\"s\"\u003ebuilder\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003e\u003cspan class=\"nb\"\u003emkdir\u003c/span\u003e \u003cspan class=\"nt\"\u003e-p\u003c/span\u003e /app\r\n\u003cspan class=\"k\"\u003eWORKDIR\u003c/span\u003e\u003cspan class=\"s\"\u003e /app\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e ./target/jvm-startup.jar jvm-startup.jar\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003ejava \u003cspan class=\"nt\"\u003e-Djarmode\u003c/span\u003e\u003cspan class=\"o\"\u003e=\u003c/span\u003etools \u003cspan class=\"nt\"\u003e-jar\u003c/span\u003e jvm-startup.jar extract \u003cspan class=\"se\"\u003e\\\r\u003c/span\u003e\n    --layers\r\n\r\n\r\n\u003cspan class=\"k\"\u003eFROM\u003c/span\u003e\u003cspan class=\"s\"\u003e bellsoft/liberica-runtime-container:jdk-21-crac-glibc\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCMD\u003c/span\u003e\u003cspan class=\"s\"\u003e [ \"java\", \\\r\u003c/span\u003e\n      \"-XX:CRaCRestoreFrom=./crac-image/\", \\\r\n      \"-jar\", \\\r\n      \"jvm-startup.jar\"]\r\n\r\n\u003cspan class=\"k\"\u003eRUN \u003c/span\u003e\u003cspan class=\"nb\"\u003emkdir\u003c/span\u003e \u003cspan class=\"nt\"\u003e-p\u003c/span\u003e /app \u003cspan class=\"o\"\u003e\u0026amp;\u0026amp;\u003c/span\u003e \u003cspan 
class=\"se\"\u003e\\\r\u003c/span\u003e\n    chown -R daemon /app\r\n\r\n\u003cspan class=\"k\"\u003eUSER\u003c/span\u003e\u003cspan class=\"s\"\u003e daemon\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eWORKDIR\u003c/span\u003e\u003cspan class=\"s\"\u003e /app\r\u003c/span\u003e\n\r\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/dependencies/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/spring-boot-loader/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/snapshot-dependencies/ ./\r\u003c/span\u003e\n\u003cspan class=\"k\"\u003eCOPY\u003c/span\u003e\u003cspan class=\"s\"\u003e --from=builder /app/jvm-startup/application/ ./\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 6: Dockerfile for CRaC\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eFirst, we build an initial image from the Dockerfile with \u003ccode\u003edocker build -t jvm-startup:crac-prep .\u003c/code\u003e, which we can then start with \u003ccode\u003edocker run --rm -it --privileged -p 8080:8080 --name crac jvm-startup:crac-prep /bin/sh\u003c/code\u003e.\u003c/p\u003e\n\n\u003cp\u003eInside this container we can now start the training run with the command \u003ccode\u003ejava -XX:CRaCCheckpointTo=./crac-image/ -jar jvm-startup.jar\u003c/code\u003e. Once we have warmed up the running application with a few HTTP requests, we have to send the checkpoint signal to the process. To do this, we connect to the running container with the command \u003ccode\u003edocker exec -it crac /bin/sh\u003c/code\u003e. 
There we can send the signal with \u003ccode\u003ejcmd jvm-startup.jar JDK.checkpoint\u003c/code\u003e, whereupon the application creates the snapshot.\u003c/p\u003e\n\n\u003cp\u003eTo get the snapshot into the final container image, we run the command \u003ccode\u003edocker commit crac jvm-startup:crac-run\u003c/code\u003e outside the container. Afterwards we can start the application again with \u003ccode\u003edocker run --rm -p 8080:8080 jvm-startup:crac-run\u003c/code\u003e. Because it now uses the snapshot, it not only starts quickly but also answers the first request immediately.\u003c/p\u003e\n\n\u003cp\u003eAs far as I know, only two JVMs currently support CRaC: \u003ca href=\"https://www.azul.com/downloads\"\u003eAzul Zulu\u003c/a\u003e and \u003ca href=\"https://bell-sw.com/pages/downloads\"\u003eBellSoft Liberica\u003c/a\u003e. In addition, \u003ca href=\"https://developer.ibm.com/languages/semeru-runtimes/\"\u003eIBM Semeru\u003c/a\u003e offers a mechanism based on CRIU that works in the same way. All of these JVMs require Linux as the underlying platform; so far, none of this works on Windows.\u003c/p\u003e\n\n\u003ch3 id=\"sonstiges-im-jdk\"\u003eOther improvements in the JDK\u003c/h3\u003e\n\n\u003cp\u003eApart from the explicit mechanisms presented so far, the JDK is continuously being worked on to improve performance. 
For example, I only recently noticed that a microbenchmark I wrote about five years ago, back on \u003ca href=\"https://openjdk.org/projects/jdk/14/\"\u003eJDK 14\u003c/a\u003e, to compare the speed of \u003ccode\u003etoUpperCase\u003c/code\u003e and \u003ccode\u003etoLowerCase\u003c/code\u003e (Listing 7) now runs considerably faster than it did then.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 7: JMH benchmark for toLowerCase and toUpperCase\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"nd\"\u003e@BenchmarkMode\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"nc\"\u003eAverageTime\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e\r\n\u003cspan class=\"nd\"\u003e@OutputTimeUnit\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"no\"\u003eNANOSECONDS\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e\r\n\u003cspan class=\"nd\"\u003e@Fork\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"mi\"\u003e1\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e\r\n\u003cspan class=\"nd\"\u003e@Warmup\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eiterations\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"mi\"\u003e1\u003c/span\u003e\u003cspan class=\"o\"\u003e,\u003c/span\u003e \u003cspan class=\"n\"\u003etime\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"mi\"\u003e1\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e\r\n\u003cspan class=\"nd\"\u003e@Measurement\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eiterations\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"mi\"\u003e5\u003c/span\u003e\u003cspan class=\"o\"\u003e,\u003c/span\u003e \u003cspan 
class=\"n\"\u003etime\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"mi\"\u003e5\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e\r\n\u003cspan class=\"nd\"\u003e@Threads\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"mi\"\u003e5\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e\r\n\u003cspan class=\"nd\"\u003e@State\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"nc\"\u003eThread\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e\r\n\u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"kd\"\u003eclass\u003c/span\u003e \u003cspan class=\"nc\"\u003eToUpperOrToLowerCase\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n\r\n    \u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"kd\"\u003estatic\u003c/span\u003e \u003cspan class=\"kt\"\u003evoid\u003c/span\u003e \u003cspan class=\"nf\"\u003emain\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"nc\"\u003eString\u003c/span\u003e\u003cspan class=\"o\"\u003e[]\u003c/span\u003e \u003cspan class=\"n\"\u003eargs\u003c/span\u003e\u003cspan class=\"o\"\u003e)\u003c/span\u003e \u003cspan class=\"kd\"\u003ethrows\u003c/span\u003e \u003cspan class=\"nc\"\u003eException\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n        \u003cspan class=\"kd\"\u003efinal\u003c/span\u003e \u003cspan class=\"nc\"\u003eOptions\u003c/span\u003e \u003cspan class=\"n\"\u003eopt\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"k\"\u003enew\u003c/span\u003e \u003cspan class=\"nc\"\u003eOptionsBuilder\u003c/span\u003e\u003cspan class=\"o\"\u003e()\u003c/span\u003e\r\n            \u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003einclude\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan 
class=\"nc\"\u003eToUpperOrToLowerCase\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003eclass\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003egetSimpleName\u003c/span\u003e\u003cspan class=\"o\"\u003e())\u003c/span\u003e\r\n            \u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003ebuild\u003c/span\u003e\u003cspan class=\"o\"\u003e();\u003c/span\u003e\r\n        \u003cspan class=\"k\"\u003enew\u003c/span\u003e \u003cspan class=\"nf\"\u003eRunner\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"n\"\u003eopt\u003c/span\u003e\u003cspan class=\"o\"\u003e).\u003c/span\u003e\u003cspan class=\"na\"\u003erun\u003c/span\u003e\u003cspan class=\"o\"\u003e();\u003c/span\u003e\r\n    \u003cspan class=\"o\"\u003e}\u003c/span\u003e\r\n\r\n    \u003cspan class=\"kd\"\u003eprivate\u003c/span\u003e \u003cspan class=\"nc\"\u003eString\u003c/span\u003e \u003cspan class=\"n\"\u003estring\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan class=\"nc\"\u003eRandomStringUtils\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003erandomAlphabetic\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"mi\"\u003e32\u003c/span\u003e\u003cspan class=\"o\"\u003e);\u003c/span\u003e\r\n\r\n    \u003cspan class=\"nd\"\u003e@Benchmark\u003c/span\u003e\r\n    \u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"nc\"\u003eString\u003c/span\u003e \u003cspan class=\"nf\"\u003etoLowerCase\u003c/span\u003e\u003cspan class=\"o\"\u003e()\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n        \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003estring\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003etoLowerCase\u003c/span\u003e\u003cspan 
class=\"o\"\u003e();\u003c/span\u003e\r\n    \u003cspan class=\"o\"\u003e}\u003c/span\u003e\r\n\r\n    \u003cspan class=\"nd\"\u003e@Benchmark\u003c/span\u003e\r\n    \u003cspan class=\"kd\"\u003epublic\u003c/span\u003e \u003cspan class=\"nc\"\u003eString\u003c/span\u003e \u003cspan class=\"nf\"\u003etoUpperCase\u003c/span\u003e\u003cspan class=\"o\"\u003e()\u003c/span\u003e \u003cspan class=\"o\"\u003e{\u003c/span\u003e\r\n        \u003cspan class=\"k\"\u003ereturn\u003c/span\u003e \u003cspan class=\"n\"\u003estring\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003etoUpperCase\u003c/span\u003e\u003cspan class=\"o\"\u003e();\u003c/span\u003e\r\n    \u003cspan class=\"o\"\u003e}\u003c/span\u003e\r\n\u003cspan class=\"o\"\u003e}\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 7: JMH benchmark for toLowerCase and toUpperCase\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eJDK 25 also includes a preview feature, \u003ca href=\"https://openjdk.org/jeps/502\"\u003eJEP 502\u003c/a\u003e, that makes it easier to define constants that are initialised on first use rather than at startup (Listing 8).\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"Listing 8: StableValues in JDK 25\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e\u003cspan class=\"c1\"\u003e// Declaration with initialisation via StableValue\r\u003c/span\u003e\n\u003cspan class=\"kd\"\u003eprivate\u003c/span\u003e \u003cspan class=\"kd\"\u003efinal\u003c/span\u003e \u003cspan class=\"nc\"\u003eSupplier\u003c/span\u003e\u003cspan class=\"o\"\u003e\u0026lt;\u003c/span\u003e\u003cspan class=\"nc\"\u003eLogger\u003c/span\u003e\u003cspan class=\"o\"\u003e\u0026gt;\u003c/span\u003e \u003cspan class=\"n\"\u003elogger\u003c/span\u003e \u003cspan class=\"o\"\u003e=\u003c/span\u003e \u003cspan 
class=\"nc\"\u003eStableValue\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003esupplier\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\r\n    \u003cspan class=\"o\"\u003e()\u003c/span\u003e \u003cspan class=\"o\"\u003e-\u0026gt;\u003c/span\u003e \u003cspan class=\"nc\"\u003eLoggerFactory\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003egetLogger\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"nc\"\u003eMyClass\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003eclass\u003c/span\u003e\u003cspan class=\"o\"\u003e));\u003c/span\u003e\r\n\r\n\u003cspan class=\"c1\"\u003e// Using the StableValue\r\u003c/span\u003e\n\u003cspan class=\"n\"\u003elogger\u003c/span\u003e\u003cspan class=\"o\"\u003e.\u003c/span\u003e\u003cspan class=\"na\"\u003eget\u003c/span\u003e\u003cspan class=\"o\"\u003e().\u003c/span\u003e\u003cspan class=\"na\"\u003einfo\u003c/span\u003e\u003cspan class=\"o\"\u003e(\u003c/span\u003e\u003cspan class=\"s\"\u003e\"Hallo\"\u003c/span\u003e\u003cspan class=\"o\"\u003e);\u003c/span\u003e\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\n\u003cfigcaption\u003eListing 8: StableValues in JDK 25\u003c/figcaption\u003e\u003c/figure\u003e\n\n\u003cp\u003eThis not only saves time at startup when the value is needed only later; because the JVM knows the semantics of this construct, it can also apply optimisations here, since such a value behaves like a constant once it has been initialised.\u003c/p\u003e\n\n\u003cp\u003eThe JVM's high memory consumption is also being addressed at the moment, in \u003cstrong\u003e\u003ca href=\"https://openjdk.org/projects/lilliput/index.html\"\u003eProject Lilliput\u003c/a\u003e\u003c/strong\u003e. The goal is to reduce object headers from currently between 96 and 128 bits down to 64 bits. 
This reduces the heap that is needed and with it the overall memory consumption. This work was already made available as an experimental feature in JDK 24 with \u003ca href=\"https://openjdk.org/jeps/450\"\u003eJEP 450\u003c/a\u003e and, with \u003ca href=\"https://openjdk.org/jeps/519\"\u003eJEP 519\u003c/a\u003e, is now being integrated into the JVM as a final feature in JDK 25.\u003c/p\u003e\n\u003cp\u003eOne of the main criticisms currently levelled at the JVM is its slow startup. We have seen that a large part of this time is spent loading classes. In the past, CDS and AppCDS already provided features to speed up this process.\u003c/p\u003e\n\n\u003cp\u003eAnd the work is, of course, not finished. In the near future, various features from Project Leyden will find their way into the JVM that, building on the two earlier ones, shift even more of the work previously done at JVM startup to build time.\u003c/p\u003e\n\n\u003cp\u003eAlternatively, CRaC is a second project that makes it possible to pause a started application and later resume it from exactly that state. And even apart from the larger initiatives, the people working on OpenJDK are always aware that performance matters. 
Und so werden die JVMs generell von Release zu Release schneller.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/fuozr7dnnjuyef7f7ke0kzgoj89o?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/","url":"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/","date_published":"2026-03-04T00:00:00.000+00:00","date_modified":"2026-03-06T11:34:35.126+01:00","author":[{"name":"Gerrit Beine","url":"https://www.innoq.com/en/staff/gbeine/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eWith OpenClaw, Captain Picard’s legendary command “Computer, tea. Earl Grey, hot.” is finally becoming reality.\nThe automation agent connects all the internet-based tools that play a role in everyday life with the cryptic low-level tools of the command line, allowing humans to take control via chat interfaces.\nIn the relevant online channels, the tenor is clear: the dream of universal agents is finally coming true.\nThe author Peter Steinberger is being celebrated for having accomplished something that all the companies with all their money and brilliant minds had supposedly failed to achieve.\u003c/p\u003e\n\n\u003cp\u003eAn observer viewing the discourse from the outside might ask at this point:\nAre we witnessing a second Linus Torvalds, with OpenClaw as a reincarnation of the Linux kernel on a new level, or are we witnessing a second Robert Oppenheimer, because “the most dangerous software in the world” (Steinberger, \u003ca href=\"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/#fn:1\" id=\"fnref:1\" title=\"see footnote\" class=\"footnote\"\u003e[1]\u003c/a\u003e) possesses a potential for social destruction that is quite comparable to Oppenheimer’s Manhattan Project?\u003c/p\u003e\n\n\u003cp\u003eThis 
question will be answered in the future.\nFor now, the focus here is on why this tool demonstrates how right Angela Merkel was.\nThe topic of vibe coding will be set aside.\nInstead, the focus is on the social mechanics that can be observed through this example.\u003c/p\u003e\n\n\u003cp\u003eWhat the tool can do in detail is hardly surprising.\nAccessing internet services via APIs and defining automations through specified workflows has long been solved at the technical level and is feasible even for technically inexperienced users thanks to tools like Zapier or n8n.\nUser-friendly interfaces for command line tools have likewise been common for some time.\nAnd that large language models achieve sufficient quality for most cases in generating commands in programming languages can also be considered established.\u003c/p\u003e\n\n\u003ch3 id=\"killing-the-social-contract\"\u003eKilling the social contract\u003c/h3\u003e\n\n\u003cp\u003eSo what is new about the idea behind OpenClaw, and why is it being celebrated?\nThe technical answer is trivial: the software combines these capabilities by acting as an agent that links all these tools together and makes them accessible through simple interfaces such as chats via Telegram or Discord.\u003c/p\u003e\n\n\u003cp\u003eThe answer at the social level is less straightforward and requires closer examination.\nOpenClaw, through its architecture, terminates a social contract that underlies all interpersonal collaboration.\nThis informal yet nonetheless effective social contract ensures that responsibility can be assigned through negotiation, and that this negotiation takes place at the interfaces between systems—social as well as technical.\u003c/p\u003e\n\n\u003cp\u003eAt technical interfaces, such a contract is typically formulated as a specification that clarifies whether the participating systems are a client or a server, what data is transmitted, and which software development team must ultimately bear responsibility.\nIf 
these matters were unclear, the interface would function only by fortunate coincidence, and in the event of a problem, it would be unclear who could contribute to a solution and how.\u003c/p\u003e\n\n\u003cp\u003eAt social interfaces, these social contracts function quite similarly.\nFor the purchase of a die, for example, it is negotiated who is the buyer and who is the seller.\nBoth have rights and obligations that are the subject of this negotiation.\nThe outcome creates binding commitments and the possibility of clarifying who may have failed to fulfill their obligations.\nWithout this social contract and the associated negotiation, it would be impossible to establish a stable legal system.\u003c/p\u003e\n\n\u003cp\u003eThis fact may seem banal at first; its consequences are not.\nOpenClaw may be just a single tool today, but it follows the trend of blurring the distinctions necessary for the assignment of responsibility.\u003c/p\u003e\n\n\u003cp\u003eOpenClaw can, by design, only function without this social contract.\nIt runs as an agent on a computer defined by the user and connects to the tools on the internet for which the user authorizes it.\nFor control, the user can select one or more channels, and OpenClaw receives its intelligence through a large language model specified by the user, to which access must likewise be provided.\nAs Steinberger says, the software is designed to be allowed to do everything.\nIt is “Skynet” (Steinberger, \u003ca href=\"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/#fn:1\" title=\"see footnote\" class=\"footnote\"\u003e[1]\u003c/a\u003e) with root privileges, and this is intentional.\nIt can have root privileges on my computer and, if I authorize it, over my life as well.\nWhat makes security experts' hair stand on end is, in this case, deliberate:\nOpenClaw works precisely because it brings no inherent security barriers.\nPut bluntly: Steinberger succeeds at what supposedly no company had managed only because he 
ignores a restriction that companies cannot ignore.\u003c/p\u003e\n\n\u003ch3 id=\"the-return-of-the-chancellor\"\u003eThe return of the chancellor\u003c/h3\u003e\n\n\u003cp\u003eThis problem is further amplified by the large language models, which themselves must, by design, be insecure.\nAny attempt to impose a security barrier on natural language is perceived in the real world as censorship and leads to substitute patterns in language use.\nA large language model will, of course, eventually adapt to these as well.\nSecurity must therefore necessarily occur at the interfaces that connect the large language model to other tools.\u003c/p\u003e\n\n\u003cp\u003eOpenClaw removes these barriers insofar as it will attempt to carry out all interactions with the interfaces of connected services that can be described in natural language.\nUntil now, this security check at interfaces has functioned on the basis of the contract described above: for example, users authenticate themselves with credentials at a payment service provider and transfer money.\nOpenClaw can theoretically do this as well if it has access to the payment service provider.\nHowever, the large language model controlling OpenClaw can neither recognize the intent behind a command nor is it possible to ensure that this command was not triggered by a prompt injection.\nThe payment service provider practically cannot distinguish whether OpenClaw or the user herself triggered the command.\nSteinberger repeatedly points to these problems and has largely protected himself legally through his choice of the MIT License.\u003c/p\u003e\n\n\u003cp\u003eAnd it is precisely here that Angela Merkel steps out of the shadows and smiles at us as if she were thinking: I told you so.\nFor what Steinberger demonstrates with OpenClaw is how little the software engineering community reflects on the social implications of software, as long as it merely functions technically.\u003c/p\u003e\n\n\u003ch3 
id=\"sprinting-blind-through-a-marathon\"\u003eSprinting blind through a marathon\u003c/h3\u003e\n\n\u003cp\u003eThe result is a shift in social contracts for which no one is prepared and which, even if OpenClaw as a tool disappears again, will have lasting effects for a long time to come.\nThe consequence of OpenClaw’s availability is that people will use it.\nThe previously accepted social contract is thrown off balance because, on the one hand, the binding commitments it entails suddenly become an obstacle in the use of OpenClaw and, on the other hand, the attribution necessary for clarifying responsibility is no longer possible.\nThis unclear attribution of responsibility is the reason why no company has released software like OpenClaw until now.\nSteinberger places the responsibility with the users, and a counterargument is likely to be difficult in the age of personal responsibility.\u003c/p\u003e\n\n\u003cp\u003eThis attitude, which Adrian Daub described in his book “What Tech Calls Thinking” \u003ca href=\"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/#fn:2\" id=\"fnref:2\" title=\"see footnote\" class=\"footnote\"\u003e[2]\u003c/a\u003e, is either naive or ignorant.\nIt reveals a resistance to learning that has persisted in the software engineering community for decades with regard to the social implications of software.\nFor the fact that, for example, the mechanisms necessary for the social contract described above cannot be retrofitted without entailing functional limitations is likely to be largely uncontested.\u003c/p\u003e\n\n\u003cp\u003eThis becomes fatal in situations where security must be considered as a principle because social coexistence is based on it.\nUnlike the reliable computer in Star Trek, OpenClaw controls processes in an opaque manner.\nNo one is prepared for a situation in which it becomes necessary to clarify how OpenClaw arrived at a result, but this is technically impossible.\nResponsibility lies solely with individuals 
who have only marginal possibilities to intervene in the processes.\nFor such a situation, there is neither a legal framework capable of addressing the consequences nor will there be social acceptance that goes beyond the word “personal responsibility”.\u003c/p\u003e\n\n\u003cp\u003eThat the perspective presented here is scarcely represented in IT allows only one conclusion: Angela Merkel is still right, even in 2026.\u003c/p\u003e\n\n\u003cinfo-box title=\"Excursus: The Social Contract\"\u003e\u003cdetails\u003e\u003csummary\u003e\u003cdiv class=\"infobox__teaser\"\u003e\n\u003cdiv class=\"infobox__teaser__left\"\u003e\n\u003cspan class=\"infobox__teaser__heading\"\u003eExcursus: The Social Contract\u003c/span\u003e\u003cspan class=\"icon icon-info\"\u003e\u003c/span\u003e\n\u003c/div\u003e\n\u003cdiv class=\"infobox__teaser__right\"\u003e\u003cspan class=\"icon icon-chevron-down icon--small\"\u003e\u003c/span\u003e\u003c/div\u003e\n\u003c/div\u003e\u003c/summary\u003e\u003cdiv class=\"infobox__content\"\u003e\r\n\n\u003cp\u003eThe concept of the social contract is informal and not strictly defined.\u003c/p\u003e\n\n\u003cp\u003eHumans reduce the uncertainty of an open future by negotiating shared goals and mutually committing to their observance.\nThis can occur formally, such as through contracts, or informally, as in friendships or romantic relationships.\nAs a result, social action becomes normatively expectable as a justified assumption that agreements will be honored.\u003c/p\u003e\n\n\u003cp\u003eThis mechanism corresponds to Luhmann’s concept of trust \u003ca href=\"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/#fn:3\" id=\"fnref:3\" title=\"see footnote\" class=\"footnote\"\u003e[3]\u003c/a\u003e: trust reduces social complexity by allowing one to mentally exclude certain courses of action on the part of the other.\nThe social contract institutionalizes this trust.\nOne no longer trusts merely the person but the mutually recognized 
rule.\u003c/p\u003e\n\n\u003cp\u003eAn example of the application of this social contract is the formal legal system.\nWithout the basic idea of binding agreements, the concept of a breach of law would be meaningless.\nThe law thus operates on the institutionalized expectability that the social contract creates.\u003c/p\u003e\n\n\u003cp\u003eIn IT, the social contract is implemented through compliance and security.\u003c/p\u003e\n\n\u003c/div\u003e\u003c/details\u003e\u003c/info-box\u003e\r\n\n\u003ch3 id=\"links-literature\"\u003eLinks \u0026amp; Literature\u003c/h3\u003e\n\n\u003cfoot-notes class=\"footnotes\"\u003e\n\n\u003col class=\"footnotes__list\"\u003e\n\n\u003cli id=\"fn:1\"\u003e\n\u003cp\u003ec’t 3003, \u003ca href=\"https://www.youtube.com/watch?v=ps7kqEXkwEs\"\u003eOpenClaw: Ja, der Hype ist gerechtfertigt\u003c/a\u003e \u003ca href=\"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/#fnref:1\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:2\"\u003e\n\u003cp\u003eDaub, Adrian, “What Tech Calls Thinking”, FSG Originals, 2020 \u003ca href=\"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/#fnref:2\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003cli id=\"fn:3\"\u003e\n\u003cp\u003eLuhmann, Niklas, “Vertrauen. Ein Mechanismus der Reduktion sozialer Komplexität”, 5. 
Auflage, UTB, 2014 \u003ca href=\"https://www.innoq.com/en/blog/2026/03/neuland-reloaded/#fnref:3\" title=\"return to body\" class=\"reversefootnote\"\u003e ↩︎\u003c/a\u003e\u003c/p\u003e\n\u003c/li\u003e\n\n\u003c/ol\u003e\n\u003c/foot-notes\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/i088c9swp6s8k3swr3o1r40q5zd8?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/03/dev-sandbox-network/","url":"https://www.innoq.com/en/blog/2026/03/dev-sandbox-network/","date_published":"2026-03-03T00:00:00.000+00:00","date_modified":"2026-04-27T14:18:50.080+02:00","author":[{"name":"Joy Heron","url":"https://www.innoq.com/en/staff/joy-heron/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eIn a \u003ca href=\"https://www.innoq.com/en/blog/2025/12/dev-sandbox/\"\u003eprevious post\u003c/a\u003e, I described my solution for setting up a development sandbox for my coding agents. My solution entails using a \u003ca href=\"https://github.com/lima-vm/lima\"\u003eLima\u003c/a\u003e virtual machine (VM) on MacOS and limiting the user capabilities within that machine to the very minimum. Since my user within the VM has only the necessary permissions, a coding agent that runs using my user’s permissions will also be similarly limited. That post details the first necessary step for creating a sandbox that severely limits the amount of code that an agent will be able to access. 
This provides protection against the first of the \u003ca href=\"https://simonwillison.net/2025/Jun/16/the-lethal-trifecta/\"\u003elethal trifecta\u003c/a\u003e by \u003cstrong\u003epreventing access to private data\u003c/strong\u003e as much as absolutely possible.\u003c/p\u003e\n\n\u003cp\u003eHowever, the solution does not protect against the other two members of the lethal trifecta: \u003cstrong\u003eexposure to untrusted content\u003c/strong\u003e and \u003cstrong\u003ethe ability to communicate externally\u003c/strong\u003e because the virtual machine that I am using as a sandbox is still connected to the internet. Any arbitrary HTTP request can be sent, allowing information to leak out of the sandbox. The result of that arbitrary request will be added to the system context without verification which can expose our agents to untrusted content.\u003c/p\u003e\n\n\u003cp\u003eUp until now, I have kept myself in the loop, using the built-in network policy tools in \u003ca href=\"https://openai.com/index/introducing-codex/\"\u003eCodex\u003c/a\u003e and \u003ca href=\"https://www.claude.com/product/claude-code\"\u003eClaude Code\u003c/a\u003e to explicitly approve each connection to the internet. However, I’d really like to take myself out of the loop and define specific rules for websites that can be accessed freely without my explicit consent.\u003c/p\u003e\n\n\u003cp\u003eThis post details my solution to the problem. I want to add the caveat that with this post I am leaning outside of my comfort zone. I am a full-stack developer, and I normally shy away from Linux configuration and networking. I’ve had to retrieve much that I had forgotten (what does the TLS handshake look like again?) and learn much that I didn’t know before. 
But if the advent of AI doesn’t force us to sometimes venture out from what we are comfortable doing, then what will?\u003c/p\u003e\n\n\u003cp\u003eI used AI for analyzing different solutions and AI helped me generate the configuration I am sharing in this post. But at no point did I let an agent loose on either the computer or the sandbox to configure the system itself. I’ve read the documentation, double and triple checked everything, and \u003ca href=\"https://www.innoq.com/en/blog/2025/12/ai-cognitive-lens-elaboration/#askquestionsactively\"\u003eactively asked every question that I could think of\u003c/a\u003e. If you spot anything that could be improved upon, please reach out and let me know. As with the first post, I’ve also asked our INNOQ security experts to vet my approach as well.\u003c/p\u003e\n\n\u003ch3 id=\"run-proxy-on-the-host-to-only-allow-access-to-specific-domains\"\u003eRun proxy on the host to only allow access to specific domains\u003c/h3\u003e\n\n\u003cp\u003eThe first step was to set up a proxy on the host machine to allow me to monitor any outgoing traffic from my development sandbox and set up rules about which websites are allowed and which are denied.\u003c/p\u003e\n\n\u003cp\u003eMy journey to finding a solution which worked for me was fraught with bumbling and stumbling.\u003c/p\u003e\n\n\u003cp\u003eI found out that I absolutely did not want to manually approve every single IP address in a firewall installed on my host, especially since I couldn’t tell which domain those IP addresses were mapping to.\u003c/p\u003e\n\n\u003cp\u003eAt one point the AI suggested extending a Man-in-the-Middle proxy to review every request. This could be a valid approach if you know what you are doing and could be beneficial to be able to inspect and monitor the content of the requests which are being sent to and from the sandbox. 
This would, however, require creating a self-signed certificate and configuring all of the trust stores on the client to trust that certificate, which is something that I preferred not to do.\u003c/p\u003e\n\n\u003cp\u003eThe solution I eventually settled on is a \u003cem\u003eforward proxy\u003c/em\u003e: a proxy which receives a request from the client and forwards it on to the destination. Using a forward proxy, I was able to intercept any \u003ccode\u003eCONNECT\u003c/code\u003e request that the sandbox sends when trying to establish a TLS connection to the target server, which allows me to check the requested domain against a specific allowlist. If it doesn’t match, the proxy denies the connection.\u003c/p\u003e\n\n\u003cp\u003eIn my desire to not reinvent the wheel, I settled on using the proxy \u003ca href=\"https://www.squid-cache.org/\"\u003eSquid\u003c/a\u003e, which markets itself primarily as a caching proxy but can also be used as a CONNECT-only allowlist proxy. Since I’m on macOS, I installed Squid using \u003ca href=\"https://brew.sh/\"\u003eHomebrew\u003c/a\u003e. 
Then I added the following to my Squid configuration (for me installed in \u003ccode\u003e/opt/homebrew/etc/squid.conf\u003c/code\u003e) and restarted the service:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e############################################\r\n# Custom: CONNECT-only allowlist proxy\r\n############################################\r\n\r\n# dev proxy should listen on port 8888\r\nhttp_port 8888\r\n\r\n# Only allow CONNECT to standard TLS port 443\r\nacl SSL_ports port 443\r\nacl CONNECT method CONNECT\r\nhttp_access deny CONNECT !SSL_ports\r\n\r\n# Only allow proxy use from the sandbox network\r\nacl vmnet src 127.0.0.1/32\r\n\r\n# Destination domain allowlist\r\nacl allowed_domains dstdomain \"/opt/homebrew/etc/squid/allowed_domains.txt\"\r\n\r\n# Allow only: sandbox net + CONNECT + allowlisted domains\r\nhttp_access allow vmnet CONNECT allowed_domains\r\n\r\n# Block everything else\r\nhttp_access deny all\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eThe \u003ccode\u003eallowed_domains.txt\u003c/code\u003e list can be easily configured to allow only specific domains or wildcards:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003eexample.org\r\n.openai.com\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003ch3 id=\"configure-vm-to-use-proxy-on-host\"\u003eConfigure VM to use proxy on host\u003c/h3\u003e\n\n\u003cp\u003eHaving the proxy set up on the host is all well and good, but if the development sandbox doesn’t use the proxy it isn’t worth much. The next step was to get the environment set up so that the tools would actually route their traffic over the proxy on the host.\u003c/p\u003e\n\n\u003cp\u003eMany tools (e.g. 
\u003ccode\u003ecurl\u003c/code\u003e or coding agents like \u003ccode\u003ecodex\u003c/code\u003e and \u003ccode\u003eclaude\u003c/code\u003e) understand the following configuration:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003eexport HOST_IP=\"\u0026lt;My Host IP\u0026gt;\"\r\nexport PROXY_PORT=\"8888\"\r\n\r\nexport HTTPS_PROXY=\"http://$HOST_IP:$PROXY_PORT\"\r\nexport HTTP_PROXY=\"$HTTPS_PROXY\"\r\nexport NO_PROXY=\"localhost,127.0.0.1\"\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eFor Gradle, I had to additionally configure my \u003ccode\u003e~/.gradle/gradle.properties\u003c/code\u003e to contain the following:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003esystemProp.http.proxyHost=\u0026lt;My Host IP\u0026gt;\r\nsystemProp.http.proxyPort=8888\r\n\r\nsystemProp.https.proxyHost=\u0026lt;My Host IP\u0026gt;\r\nsystemProp.https.proxyPort=8888\r\n\r\nsystemProp.http.nonProxyHosts=localhost|127.0.0.1\r\nsystemProp.https.nonProxyHosts=localhost|127.0.0.1\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003ch3 id=\"force-sandbox-to-only-communicate-over-the-proxy\"\u003eForce sandbox to only communicate over the proxy\u003c/h3\u003e\n\n\u003cp\u003eThe setup so far enables the development sandbox to communicate with the proxy over the host, but it doesn’t actually force all traffic from the sandbox over the proxy. 
If the coding agents play by the rules, this could provide some layer of protection, but it isn’t absolute.\u003c/p\u003e\n\n\u003cp\u003eTo force all traffic through the proxy, I installed \u003ca href=\"https://www.netfilter.org/projects/nftables/index.html\"\u003enftables\u003c/a\u003e and created the following \u003ccode\u003e/etc/nftables-proxy-egress.nft\u003c/code\u003e configuration:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003etable inet sandbox {\r\n  chain output {\r\n    type filter hook output priority 0; policy drop;\r\n\r\n    # Allow loopback traffic\r\n    oif \"lo\" accept\r\n\r\n    # Allow established/related connections\r\n    ct state established,related accept\r\n\r\n    # Allow DNS out (udp/tcp 53).\r\n    # (this policy could be tightened to allow DNS only to specific IPs)\r\n    udp dport 53 accept\r\n    tcp dport 53 accept\r\n    \r\n    # Allow local Docker networks (for Testcontainers, DBs, etc.)\r\n    ip daddr 172.17.0.0/16 accept\r\n    ip daddr 172.18.0.0/16 accept\r\n\r\n    # Allow traffic to the proxy\r\n    ip daddr \u0026lt;My Host IP\u0026gt; tcp dport 8888 accept\r\n  }\r\n}\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eI loaded the rules with \u003ccode\u003esudo nft -f /etc/nftables-proxy-egress.nft\u003c/code\u003e and tested that internet access apart from my defined allowlist was not allowed (e.g. using curl with the proxy environment variables disabled). I also used \u003ccode\u003esudo nft list ruleset\u003c/code\u003e to check that the sandbox rules were loaded.\u003c/p\u003e\n\n\u003cp\u003eUnfortunately, the rules did not survive a reboot. 
It is possible to \u003ca href=\"https://blog.programster.org/nftables-cheatsheet#make-rules-persistent\"\u003emake the rules persistent\u003c/a\u003e, but that caused some issues for me with my setup because Docker also adds rules to \u003ccode\u003enftables\u003c/code\u003e which need to be active before my sandbox rules in order for my integration tests (which use testcontainers) to work properly. To get around this, I created a small service on the Linux machine which runs after docker and inserts the rules into the system. This is my \u003ccode\u003e/etc/systemd/system/nftables-proxy-egress.service\u003c/code\u003e:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e[Unit]\r\nDescription=Apply nftables proxy egress rules\r\nAfter=network-online.target docker.service\r\nWants=network-online.target\r\nOnFailure=proxy-egress-console-alert.service\r\n\r\n[Service]\r\nType=oneshot\r\nExecStartPre=/usr/sbin/nft -c -f /etc/nftables-proxy-egress.nft\r\nExecStart=/usr/sbin/nft -f /etc/nftables-proxy-egress.nft\r\n\r\n[Install]\r\nWantedBy=multi-user.target\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eThe \u003ccode\u003enft -c\u003c/code\u003e check causes the system to fail fast, and the \u003ccode\u003eOnFailure\u003c/code\u003e directive calls \u003ccode\u003eproxy-egress-console-alert.service\u003c/code\u003e upon failure which I can then use to alert me if loading the rules fails. 
Otherwise, I might not notice if the service fails and doesn’t load the networking rules and would be using my system without the assurance that my VM would direct all traffic over the host.\u003c/p\u003e\n\n\u003cp\u003eHere is my \u003ccode\u003e/etc/systemd/system/proxy-egress-console-alert.service\u003c/code\u003e which just creates a file \u003ccode\u003e/var/lib/nft-egress-failed\u003c/code\u003e when something goes wrong.\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003e[Unit]\r\nDescription=Console alert if nftables proxy egress fails\r\n\r\n[Service]\r\nType=oneshot\r\nExecStart=/usr/bin/touch /var/lib/nft-egress-failed\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eI then added a few lines to my \u003ccode\u003e~/.bashrc\u003c/code\u003e which add a \u003ccode\u003e🚨🚨 NFT-EGRESS-FAILED 🚨🚨\u003c/code\u003e flashing message to my prompt to alert me that loading the egress failed:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003eif [ -f /var/lib/nft-egress-failed ]; then\r\n\r\n  PS1='\\[\\033[5;1;31m\\]🚨🚨 NFT-EGRESS-FAILED 🚨🚨\\[\\033[0m\\]\\n\\[\\033[1;31m\\]\\u@\\h:\\w\\$ \\[\\033[0m\\]'\r\n\r\nfi\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eOnce I had my service designed, I then reloaded and restarted the service:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003esudo systemctl daemon-reload\r\nsudo systemctl restart nftables-proxy-egress.service\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eA colleague suggested taking the whole network offline on failure, but when I tried that I managed to brick the whole VM because Lima’s \u003ccode\u003eshell\u003c/code\u003e command uses \u003ccode\u003essh\u003c/code\u003e over the 
network to communicate with the VM, so I’m sticking with the flashing prompt message for now. Here I also want to note that having to fiddle with \u003ccode\u003enftables\u003c/code\u003e in order to route all of the traffic over a proxy on the host may well be a limitation of the virtual machine technology that I chose. If you are using a virtual machine that can be provisioned by \u003ca href=\"https://developer.hashicorp.com/vagrant\"\u003eVagrant\u003c/a\u003e (e.g. \u003ca href=\"https://www.virtualbox.org/\"\u003eVirtualBox\u003c/a\u003e), then there is a \u003ca href=\"https://www.devopsroles.com/vagrant-proxy-configuration/\"\u003eplugin\u003c/a\u003e which would allow you to define that declaratively.\u003c/p\u003e\n\n\u003cp\u003eAs a final quick check, it is important to make sure that \u003ccode\u003enft\u003c/code\u003e, the \u003ccode\u003e/etc/nftables-proxy-egress.nft\u003c/code\u003e configuration, and the services can \u003cstrong\u003eonly be modified by root\u003c/strong\u003e! Otherwise it would be theoretically possible for the agent to modify them to open up the network with your permissions.\u003c/p\u003e\n\n\u003ch3 id=\"monitor-traffic-and-modify-allowlist\"\u003eMonitor traffic and modify allowlist\u003c/h3\u003e\n\n\u003cp\u003eThe final step is to monitor the traffic from the virtual machine and extend the allowlist to let through any requests that we need for day-to-day development. The logs for Squid can be accessed on the command line (for me they are found at \u003ccode\u003e/opt/homebrew/var/logs/access.log\u003c/code\u003e):\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003esudo tail -f /opt/homebrew/var/logs/access.log\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eAny request which is denied will be logged with a \u003ccode\u003eTCP_DENIED\u003c/code\u003e result code. 
If you want to allow requests to that domain, you can modify your \u003ccode\u003eallowed_domains.txt\u003c/code\u003e to include the new domain and restart Squid:\u003c/p\u003e\n\n\u003cfigure\u003e\u003cdiv class=\"highlight\" title=\"\"\u003e\u003cpre class=\"\"\u003e\u003ccode\u003ebrew services restart squid\u003c/code\u003e\u003c/pre\u003e\u003c/div\u003e\u003c/figure\u003e\n\n\u003cp\u003eIn my day-to-day practice, I’ve found that I have only rarely had to modify this list. I have 12 entries in my allowlist, and that seems sufficient. I do have package registries for Gradle and npm in the list, which theoretically opens me up to supply chain attacks. Even so, the risk is much reduced, and I finally feel comfortable letting down my guardrails to see what my agents can do.\u003c/p\u003e\n\n\u003ch4 id=\"agents-for-coding-chatbots-for-search\"\u003eAgents for Coding, Chatbots for Search\u003c/h4\u003e\n\n\u003cp\u003eOne of the reasons that the list is so small is that by default the coding agents will only retrieve the application dependencies (from Maven or npm repositories) and communicate with the LLMs directly over the APIs from the different providers (e.g. OpenAI or Anthropic). By default, the coding agents will not perform web search unless specifically requested to do so, which means that in performing the coding tasks, they are retrieving information only from the different models and not from any random sites on the internet. This reduces the risk of prompt injection.\u003c/p\u003e\n\n\u003cp\u003eSetting up a proxy in front of the coding agents also adds friction if I want to activate the web search features for my agents, because I would have to temporarily disable or modify the proxy and keep myself in the loop to approve any web request that the agent would want to make. 
This is something I don’t want to do, because I often have longer-running tasks in the sandbox where I want the proxy to remain active.\u003c/p\u003e\n\n\u003cp\u003eIn practice, this means that I rely exclusively on the models without web search for any programming task, and for tasks where I do want to activate web search and find the most up-to-date information, I use chatbots in the browser. From the lethal trifecta perspective this is ideal. The long-running tasks without direct supervision only have access to a limited subset of data and extremely limited network access. The shorter research tasks requiring web search do have access to the internet, but take place in the browser with extremely limited access to data.\u003c/p\u003e\n\n\u003ch3 id=\"possible-next-steps-putting-icing-on-the-top-of-the-cake\"\u003ePossible next steps: Putting icing on the top of the cake\u003c/h3\u003e\n\n\u003cp\u003eFor my current threat model, I am satisfied with the solution that I’ve presented in this article. There are, however, a few other steps that you could take to improve the sandbox solution even more. One of them would be to integrate network analysis and threat detection software such as \u003ca href=\"https://suricata.io/\"\u003eSuricata\u003c/a\u003e into the sandbox solution in order to be alerted directly about any fishy behavior and more easily figure out what went wrong. In a similar vein, digital forensics software such as \u003ca href=\"https://github.com/Velocidex/velociraptor\"\u003eVelociraptor\u003c/a\u003e could be used to collect detailed information about what is actually going on within the sandbox. This would provide insights into the processes that are running and help to figure out if any malicious behavior has occurred. 
I also want to tighten the DNS rules in the nftables configuration to prevent data exfiltration via DNS tunneling.\u003c/p\u003e\n\n\u003ch3 id=\"loosening-the-reins-on-the-programming-agents\"\u003eLoosening the reins on the programming agents\u003c/h3\u003e\n\n\u003cp\u003eAt this point, I have provided mitigations for each element of the lethal trifecta. A virtual machine with as few privileges as possible provides a sandbox for my coding agents, ensuring they do not have access to any critical data that could leak should the agent become compromised. Forcing all of the traffic through a proxy on the host with a very strict allowlist also severely limits any exposure to untrusted content and strictly limits the ability to communicate with the outside world.\u003c/p\u003e\n\n\u003cp\u003eWith this basis in place, I finally feel comfortable loosening the strict guardrails on my coding agents. Running \u003ccode\u003ecodex --yolo\u003c/code\u003e and \u003ccode\u003eclaude --dangerously-skip-permissions\u003c/code\u003e has been really exciting because it allows me to create detailed multi-step tasks for my agents to execute without having to keep myself in the loop and monitor all of the things that they are doing. This in turn gives me \u003ca href=\"https://www.innoq.com/en/blog/2026/01/what-to-do-while-ai-is-thinking/\"\u003emore time that I can use for other tasks\u003c/a\u003e, without suffering a mental overload from having to switch my focus too often. 
However, I want to reiterate here that I consider loosening these guardrails acceptable only because I have set up my sandbox and network policy to provide even more protection than that provided by the agent tooling itself.\u003c/p\u003e\n\n\u003cp\u003eMy current solution is still very basic, but I believe it provides a solid foundation that will allow me to add features without sacrificing security.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/twifwlxxn1z95cll3s90o97d9urz?_a=BACMTiAE"},{"id":"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/","url":"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/","date_published":"2026-03-03T00:00:00.000+00:00","date_modified":"2026-03-16T09:14:50.093+01:00","author":[{"name":"Torben Keller","url":"https://www.innoq.com/en/staff/torben-keller/"}],"content_html":"\u003c!DOCTYPE html PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\" \"http://www.w3.org/TR/REC-html40/loose.dtd\"\u003e\n\u003chtml\u003e\u003cbody\u003e\n\u003cp\u003eAI agents write code. A lot of code. Fast. If you run multiple agents in parallel, you get thousands of lines of new code in just a few hours. Last week, one of my agents implemented a complete feature in an hour and a half – 4,500 lines, cleanly structured, merge request created. It just worked. Then I did some quick math: a thorough review would have taken me two, maybe three days.\u003c/p\u003e\n\n\u003cp\u003eThis is the reality nobody likes to talk about: while I’m going through the first merge request, the agent is already producing the next 10,000 lines. If you want to read everything, you become the bottleneck. 
You’re slowing down exactly the productivity you were hoping to get from agents in the first place.\u003c/p\u003e\n\n\u003cp\u003eOpenAI demonstrated in an internal experiment what’s already possible: a team of three engineers built roughly one million lines of code in five months – without a single manually written line. 3.5 pull requests per engineer per day.\u003csup\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/#fn1\"\u003e[1]\u003c/a\u003e\u003c/sup\u003e If you think you can handle this output with traditional code review, you haven’t done the math.\u003c/p\u003e\n\n\u003cp\u003eI don’t read everything anymore. I only look at the important things. But how do I decide what’s important? And above all: how do I take responsibility for code I haven’t fully read?\u003c/p\u003e\n\n\u003ch2 id=\"weveseenthisbefore\"\u003eWe’ve Seen This Before\u003c/h2\u003e\n\n\u003cp\u003eBut is this really a new problem? I’d argue: no.\u003c/p\u003e\n\n\u003cp\u003eA new team member joins a project, the original authors left long ago – and yet productive work continues. Responsibility is taken for new features, even though the new team member doesn’t know every line of the codebase. What makes this possible? Not the hope that the code will just work. But tangible things: a comprehensible architecture, a test suite that immediately flags when a change breaks something else, usable documentation, a CI pipeline that catches errors. Only then is brownfield development possible at all.\u003c/p\u003e\n\n\u003cp\u003eAgents are like these new team members – except that every new session requires a fresh onboarding. This works best when they only need to understand small parts of the application to make changes. 
What helps with that are the same things as in brownfield: a clear structure, good tests, and comprehensible documentation.\u003c/p\u003e\n\n\u003cp\u003eExactly this environment – tests, architecture checks, documentation, CI pipelines – is what makes the difference. In the world of agents, a term has emerged for it: the agent harness.\u003c/p\u003e\n\n\u003ch2 id=\"whatisanagentharness\"\u003eWhat Is an Agent Harness?\u003c/h2\u003e\n\n\u003cp\u003eCharlie Guo puts it succinctly: an agent harness is “the set of constraints, tools, documentation, and feedback loops that keep an agent productive and on track”.\u003csup\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/#fn2\"\u003e[2]\u003c/a\u003e\u003c/sup\u003e The formula behind it: Model + Harness = Agent.\u003csup\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/#fn3\"\u003e[3]\u003c/a\u003e\u003c/sup\u003e A good model alone isn’t enough – what makes the difference in code quality is the harness.\u003c/p\u003e\n\n\u003cp\u003eThe core philosophy behind it is radically practical: mechanical enforcement instead of hope. Mitchell Hashimoto, who coined the term “harness engineering”, puts the principle like this: “Anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again.”\u003csup\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/#fn4\"\u003e[4]\u003c/a\u003e\u003c/sup\u003e Don’t trust that the agent will do better next time – instead, build feedback loops and deterministic checks that automatically catch the mistake next time around.\u003c/p\u003e\n\n\u003cp\u003eMy conviction, which grows stronger with every week of practice: the harness is the most important part of the entire equation. In my experience, a solid model with a well-designed harness delivers better results than a top-tier model without guardrails. 
That’s why I invest the majority of my time not in better prompts, but in a better harness.\u003c/p\u003e\n\n\u003cp\u003eBut what does a harness actually look like in practice?\u003c/p\u003e\n\n\u003ch2 id=\"whataharnesslookslikeinpractice\"\u003eWhat a Harness Looks Like in Practice\u003c/h2\u003e\n\n\u003cp\u003eIn my projects, the harness has evolved into a multi-layered safety net. Four layers that build on each other – each one catches what the previous one lets through.\u003c/p\u003e\n\n\u003ch3 id=\"deterministic-guardrails-the-first-line-of-defense\"\u003eDeterministic Guardrails – the First Line of Defense\u003c/h3\u003e\n\n\u003cp\u003eEverything that can be checked automatically, I check automatically. On two levels: pre-commit hooks run the fast checks – unit tests, integration tests, architecture tests with ArchUnit, linting, and formatting. In the CI pipeline, the heavier artillery follows: E2E tests, security scans, static code analysis. The list grows with every mistake an agent makes – in the spirit of Hashimoto’s principle.\u003c/p\u003e\n\n\u003cp\u003ePre-commit hooks are the critical mechanism. They are the gate: as long as these checks aren’t green, the agent cannot commit. No exceptions, no warnings – zero-warning tolerance. The agent hits the wall, gets the error message, and corrects itself.\u003c/p\u003e\n\n\u003ch3 id=\"ai-reviews-the-second-look\"\u003eAI Reviews – the Second Look\u003c/h3\u003e\n\n\u003cp\u003eA second agent reviews the first agent’s code. This sounds redundant at first, but in practice it’s surprisingly effective – precisely because the review agent has its own independent context. It doesn’t know the creation process, only the changes, the ticket, and the acceptance criteria. 
On this basis, it checks whether the merge request fulfills the requirements, looks for architecture violations, and finds code smells that static analysis doesn’t catch.\u003c/p\u003e\n\n\u003cp\u003eWhen both the deterministic checks and an AI review show green, my confidence increases significantly. This doesn’t fully replace human review. But it adds a layer that works fast and consistently.\u003c/p\u003e\n\n\u003ch3 id=\"selective-human-review-the-important-parts\"\u003eSelective Human Review – the Important Parts\u003c/h3\u003e\n\n\u003cp\u003eThis is where the real shift happens: I don’t read everything anymore. I specifically look at the core business logic. Is the agent really doing what I expect? Is the domain decision correct? Does the code accurately represent the domain?\u003c/p\u003e\n\n\u003cp\u003eBoilerplate, mapping code, standard patterns – I leave that to the harness. When the first two layers show green, I don’t need to dig in here. Over time, you learn where you need to look and where you don’t. This intuition develops with every feature you build with agents.\u003c/p\u003e\n\n\u003ch3 id=\"product-testing-does-it-actually-work\"\u003eProduct Testing – Does It Actually Work?\u003c/h3\u003e\n\n\u003cp\u003eAt the end of the day, only one thing matters: does the software do what it’s supposed to? I test the feature, check the behavior, click through the application. Preview environments that are automatically spun up per merge request make this easy – a quick way to verify the result before it goes to production. All quality metrics can be perfect – if the feature doesn’t do what the user needs, it’s all worthless.\u003c/p\u003e\n\n\u003cp\u003eWhen the harness shows green, the AI reviews pass, the selective review brings no surprises, and the feature works – then I can ship. 
Not blindly, not naively, but with a confidence built on multiple independent layers of verification.\u003c/p\u003e\n\n\u003cp\u003eBut this harness doesn’t run itself.\u003c/p\u003e\n\n\u003ch2 id=\"whattheharnessisnot:alicensetolookaway\"\u003eWhat the Harness Is Not: a License to Look Away\u003c/h2\u003e\n\n\u003cp\u003eJust throwing agents at a legacy project and expecting a 10x boost – that’s a pipe dream. However, if you’ve already been practicing good software engineering – tests, clean architecture, documentation, independently testable components – you already have most of the harness and can integrate AI agents relatively easily. If, on the other hand, you’ve inherited a big ball of mud, you’ll need to invest first: build documentation, establish testability, create structure. This isn’t a weekend project.\u003c/p\u003e\n\n\u003cp\u003eMy experience shows: with a well-designed harness, the nature of control shifts. It’s no longer about reading every line. It’s about reading the right parts – the core business logic, the domain decisions. The rest you leave to the harness. What matters is targeted attention: how likely is a bug? How severe is it? How easily does the harness detect it?\u003csup\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/#fn5\"\u003e[5]\u003c/a\u003e\u003c/sup\u003e The more the harness catches automatically, the less I need to check manually.\u003c/p\u003e\n\n\u003cp\u003eA well-built harness primarily catches structural problems and ensures that the code stays readable and manageable – for humans too. Because everything needs to be testable in isolation, I can look into individual parts at any time without needing to understand the big picture. What the harness doesn’t reliably catch: whether the feature actually does what it’s supposed to. 
This functional verification remains with the human.\u003c/p\u003e\n\n\u003cp\u003eThe best litmus test: would you ship this if you were on call tonight?\u003csup\u003e\u003ca href=\"https://www.innoq.com/en/blog/2026/02/from-vibe-coder-to-code-owner/#fn6\"\u003e[6]\u003c/a\u003e\u003c/sup\u003e If the answer is no, the harness isn’t strong enough – no matter how green the pipeline looks.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003cp\u003eWhoever invests in their harness today can work at a speed tomorrow that wasn’t possible before – and still take responsibility for every feature they ship.\u003c/p\u003e\n\n\u003chr\u003e\n\n\u003ch2 id=\"references\"\u003eReferences\u003c/h2\u003e\n\n\u003cp\u003e\u003ca id=\"fn1\"\u003e\u003c/a\u003e[1] OpenAI, \u003ca href=\"https://openai.com/index/harness-engineering/\"\u003e“Harness engineering: leveraging Codex in an agent-first world”\u003c/a\u003e, February 11, 2026.\u003c/p\u003e\n\n\u003cp\u003e\u003ca id=\"fn2\"\u003e\u003c/a\u003e[2] Charlie Guo, \u003ca href=\"https://www.ignorance.ai/p/the-emerging-harness-engineering\"\u003e“The Emerging ‘Harness Engineering’ Playbook”\u003c/a\u003e, Artificial Ignorance, February 22, 2026.\u003c/p\u003e\n\n\u003cp\u003e\u003ca id=\"fn3\"\u003e\u003c/a\u003e[3] Simon Willison, \u003ca href=\"https://simonwillison.net/2026/Feb/22/how-i-think-about-codex/\"\u003e“How I think about Codex”\u003c/a\u003e, February 22, 2026.\u003c/p\u003e\n\n\u003cp\u003e\u003ca id=\"fn4\"\u003e\u003c/a\u003e[4] Mitchell Hashimoto, \u003ca href=\"https://mitchellh.com/writing/my-ai-adoption-journey\"\u003e“My AI Adoption Journey”\u003c/a\u003e, February 5, 2026.\u003c/p\u003e\n\n\u003cp\u003e\u003ca id=\"fn5\"\u003e\u003c/a\u003e[5] Birgitta Böckeler, \u003ca href=\"https://martinfowler.com/articles/exploring-gen-ai/to-vibe-or-not-vibe.html\"\u003e“To Vibe or Not to Vibe”\u003c/a\u003e, martinfowler.com, September 23, 2025.\u003c/p\u003e\n\n\u003cp\u003e\u003ca id=\"fn6\"\u003e\u003c/a\u003e[6] Birgitta 
Böckeler, \u003ca href=\"https://martinfowler.com/articles/exploring-gen-ai/i-still-care-about-the-code.html\"\u003e“I Still Care About the Code”\u003c/a\u003e, martinfowler.com, July 9, 2025.\u003c/p\u003e\n\u003c/body\u003e\u003c/html\u003e\n","image":"https://res.cloudinary.com/innoq/image/upload/c_limit,f_auto,q_auto,w_2000/v1/uploads-production/qhx4r3gsuriulcntmlzrx8ygjbmf?_a=BACMTiAE"}]}