This blog post is also available in German.
TL;DR
- Web apps make better interfaces: Instead of building custom MCP servers, AI agents can operate existing web apps through a generic browser MCP—context, validation, and access control are already there.
- Context over static function lists: Web applications dynamically show which actions are currently possible—MCP servers only provide a fixed list of functions.
- Built-in error handling: Validation and error feedback are already built into web applications—agents make mistakes, get feedback, and adapt.
- Self-improving through skill documentation: After each run, the agent writes its own instructions, becoming more efficient each time.
- Do a risk analysis before going fully autonomous: Weigh the trade-offs before autonomous operation: What errors are possible, how likely are they, and how severe? Running processes in parallel helps build trust.
There are always tasks we’d love to automate—but don’t. Often we don’t have the time, it’s not important enough, or we do it too infrequently. Or the tasks are context-dependent and can only be standardized with significant effort. So far, humans have handled them—even when they’re mind-numbingly boring.
For me, one such task is regularly submitting travel expenses. In theory, we already have a decent tool that takes a lot of work off my plate. But there are many factors—travel times, locations, and different invoice formats—that mean I still have to create per-diem entries, assign cost centers, and attach receipts myself.
Manual work like this adds unnecessary process cost. For me, it means context switching to a non-value-adding activity. And it causes waiting time elsewhere in the company. Waiting time is both the biggest problem—and the biggest lever—in process optimization. And since this is neither something that generates revenue nor something I enjoy, I keep postponing my expense report until the last minute—until accounting sends me a reminder. None of that would be necessary if someone else could just do it for me …
Luckily, an AI agent can do it for me today. That requires combining information from different data sources and entering it into the application. Appointments and travel times come from my calendar, invoices from my emails, and the expense report is created in a web application. There are different ways to access this data. MCP servers are the obvious approach—but they come with real drawbacks. They always load their entire instruction set into context, and you have to carefully review and choose the server implementations. At their core, MCP servers are just RPC—the caller has to know in advance which functions exist. A web app, on the other hand, shows you exactly what you can do in the current context. That’s why I chose different approaches for two of the integrations.
- Calendar: Instead of an MCP server, I use a shared calendar that can be fetched as an iCal feed over HTTP—a standardized format that works everywhere without additional implementation.
- Mail: Access is handled via a command-line client (Himalaya), which provides a robust implementation and API. This is close to the idea of MCP, but works without the extra context overhead.
- Travel-expense web application: There is no travel-expense MCP server here. Instead, we use the browser via an MCP for Playwright.
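The iCal route, for instance, needs no agent-specific tooling at all. Below is a minimal sketch of reading such a feed; the feed URL is a placeholder, and the naive line-based parsing is a stdlib-only illustration (a real integration would use a library like `icalendar`):

```python
# Minimal sketch of reading a shared calendar as an iCal feed.
# In production you would fetch the feed over HTTP, e.g.:
#   feed = urllib.request.urlopen("https://calendar.example/trips.ics").read().decode()
# The URL above is a placeholder, not the real feed from the post.

def parse_events(ical_text: str) -> list[dict]:
    """Collect a few fields from each VEVENT block of an iCal document."""
    events, current = [], None
    for raw in ical_text.splitlines():
        line = raw.strip()
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            name = key.split(";")[0]  # drop property parameters like ;VALUE=DATE
            if name in ("SUMMARY", "DTSTART", "DTEND", "LOCATION"):
                current[name] = value
    return events

feed = """BEGIN:VCALENDAR
BEGIN:VEVENT
SUMMARY:Event Storming workshop
DTSTART;VALUE=DATE:20260109
DTEND;VALUE=DATE:20260112
LOCATION:Capitol Versicherung
END:VEVENT
END:VCALENDAR"""

print(parse_events(feed))
```

Because the format is standardized, the same few lines work against any calendar that exposes an iCal feed.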
But can an AI agent work well with websites? When we think of browser automation, we probably think of tools like Selenium and brittle automated UI tests. That’s the wrong mental model. In fact, AI agents are very good at operating browsers interactively. Because they adapt, they can handle errors—and recover.
Using web applications as interfaces for AI agents has clear advantages over an MCP server:
- The web app already exists—no need to build anything new.
- Web apps are the most mature form of hypermedia we have: via HTML and JavaScript we get context. Content, navigation structures, and available commands are already defined. They even change depending on context, instead of offering a static set of functions like MCP.
- Web apps are built for humans, and humans make mistakes. We already validate inputs in web applications and check whether actions are allowed or would lead to inconsistent states. It’s normal for people to make at least careless mistakes. That’s a big advantage for AI agents: they can make incorrect inputs, get feedback, and adjust their behavior to solve the problem.
- Web apps already come with roles and permissions. An agent can be given the minimum required privileges—without having to implement new security infrastructure.
So AI agents operate within a concrete set of possibilities, rather than solving a problem from scratch in a vacuum. There is a fixed context the agent can move within.
There’s no split between the interface for humans and machines—it’s the same interface with the same capabilities, security guarantees, and problems.
With JSON APIs, you typically get raw data, but no semantics.
So far, hypermedia[1] has really only become established for “human” browser users. AI agents are the first non-human users that can actually take full advantage of these interfaces.
To make this work, we only need an MCP server for browser usage. Like a web browser itself, it has no domain-specific functionality. It enables access to additional domain “tools” in the form of web applications. As a result, tool selection and tool specification become contextual and dynamic.
That fits well with the fact that we also “program” AI agents dynamically through prompts. We give an instruction in natural language, and the language model is trained to develop a problem-solving strategy from it. Of course, the idea of declarative programming isn’t new. The most successful example is SQL, where I describe what I want without prescribing the solution path. But that’s a much narrower use case in which this approach has been feasible so far.
Practical example
Here is an example of filing an expense report with an AI agent, starting with a naive prompt.
For my “Event Storming workshop” at Capitol Versicherung in January, I need to file a travel expense report. We use https://reisekosten.example for that. You can find the travel details in my calendar. The hotel and train invoices are in my emails. Please create a folder for the trip. Use Playwright MCP to access the website. For each travel day, create a per-diem entry. Ask me if anything is unclear.
This prompt is far from a perfect set of instructions. Still, the agent had hardly any follow-up questions.
A quick confirmation:
- On 2026-01-09, were there really no meals provided?
- Was breakfast provided by the hotel on 2026-01-10, 2026-01-11, and 2026-01-12?
- I couldn’t find the project “Capitol Versicherung Event Storming workshop”, so I only set the reason. Should I leave it like that, or is there a different project name you use that I should select?
The agent missed several contextual factors: that with an arrival time of 10 pm I probably don’t want to eat anything, that the breakfast information is already in the hotel invoice, and that it didn’t try to use the customer name to find an existing project.
After I provided the missing information, my expense report was actually already done. And that was with very little contextual detail. Before submitting, I review the folder myself in the web application.
Now I want the next expense report to run without follow-up questions if possible. So I ask the agent to write its own instructions. At first, the instructions consist of a process description and heuristics you could also give a human to complete this task. In addition, the agent made notes about technical issues when operating the website:
UI workarounds (Playwright MCP)
When the date picker blocks checkboxes:
- Close the date picker first.
If the UI click is still blocked:
- Set the state in a controlled way via evaluate(checked) and trigger a change event.
If the project dropdown returns no results:
- Clear the field, type slowly, and wait for the listbox.
We can use these instructions as a skill and now have the agent review and update them after each run. This way, we can also document technical aspects that would otherwise be invisible to the user.
That gives us a self-improving system that learns with each task, becomes more efficient, and makes fewer mistakes.
Even with a final review, the agent already removes a lot of tedious busywork. The agent is not yet fully autonomous: it is triggered manually, observed, and the result is checked. To truly eliminate waiting time, the agent would need to detect trips on its own and submit them as well.
To make such a decision, we need to know not only whether the agent makes errors, but also the type and frequency of those errors. Fortunately, the process was designed for humans, who also make mistakes. This is where classic risk analysis comes in: What types of errors are possible, how likely are they, and what would the consequences be? In the end, you can decide whether the benefits from optimizing the process outweigh the risks overall.
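Such a risk analysis can be made concrete as a simple expected-loss calculation per report. The error types, probabilities, and costs below are invented purely for illustration:

```python
# Toy risk score: expected loss = sum of (probability x cost) per error type,
# compared against the value of the manual effort avoided.
# All numbers here are made up for illustration.
errors = {
    "wrong per-diem amount": (0.10, 30.0),  # (probability per report, cost in EUR)
    "missing receipt":       (0.05, 15.0),
    "wrong cost center":     (0.02, 60.0),
}

expected_loss = sum(p * cost for p, cost in errors.values())
saved_per_report = 25.0  # assumed value of the manual work avoided

print(f"expected loss per report: {expected_loss:.2f} EUR")
print(f"net benefit per report: {saved_per_report - expected_loss:.2f} EUR")
```

If the net benefit is positive and the worst single error is tolerable, autonomy becomes a defensible choice rather than a leap of faith.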
If we want to implement this for a higher-volume process—for example, all travel expense reports for all employees—we’ll also see more variance. One strategy is to run both processes in parallel and initially compare only the agent’s results with the manual expense reports. Only when the numbers match can you gradually switch over. Once this approach is established, it can also be used for auditing by automating only a portion of the tasks, or by having samples processed by humans and AI agents.
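The parallel run can start as a plain result comparison: process the same trips manually and with the agent, and only look at where the totals disagree. The data shapes and trip IDs below are assumptions for illustration:

```python
# Sketch of a parallel run: compare the agent's expense totals against the
# manual reports before switching over. Trip IDs and amounts are invented.
manual = {"trip-001": 412.50, "trip-002": 188.00, "trip-003": 96.40}
agent  = {"trip-001": 412.50, "trip-002": 190.00, "trip-003": 96.40}

def mismatches(manual: dict, agent: dict, tolerance: float = 0.01) -> list[str]:
    """Return trip IDs where the two runs disagree beyond the tolerance."""
    all_ids = manual.keys() | agent.keys()
    return sorted(
        t for t in all_ids
        if abs(manual.get(t, 0.0) - agent.get(t, 0.0)) > tolerance
    )

print(mismatches(manual, agent))
```

Every mismatch is a case to inspect by hand; once the list stays empty over a representative sample, the switchover can begin.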
There are many processes like this that could benefit from such automation. We probably haven’t paid attention to them so far because they weren’t relevant to software development. But now systems can be integrated automatically via web interfaces. We build on an existing system that provides context and security guarantees. The challenge is to identify the workflows that can be supported by agents and have the biggest impact on business operations.
[1] REST Level 3 (HATEOAS) ↩︎