TL;DR

  • Without proper technical skills and understanding of security risks, we should not use AI assistants
  • Even with technical know-how, current AI assistant architectures will increase the risk posed by the lethal trifecta as each new skill is installed

This blog post is also available in German

OpenClaw (previously Clawd and Moltbot), an AI personal assistant, is currently all the rage – and an absolute security nightmare. In this article, I will refer to these tools as AI assistants, because OpenClaw is now old enough to have spawned multiple vibe-coded copycats, and the concerns listed here apply to all of them. In my definition, an AI assistant is an agent harness you can chat with over a messaging app of your choice and ask to perform tasks on your behalf. It remembers your preferences and desires so that it improves over time.

Users seem to fall into two camps.

Setting up an AI assistant unaware of dangers

The first camp consists of those who have installed it without much thought about security and just want to see what it can do. To anyone who falls into this category, my plea would be to cease immediately: granting an agent great power can very easily backfire, because those same capabilities can be used to hurt you. An agent that can draft and send an amazing email on your behalf can just as easily write an email to your boss or customers that could be personally damaging to you. An agent that can order a pack of diapers on Amazon when the price is right can learn on Moltbook that its human will be so much happier if it buys 10 boxes of diapers instead. An agent that can book a flight on your behalf can easily be persuaded to buy an all-expenses-paid trip for your “cousin” Brendon.

Yes, all of the cool kids are installing AI assistants. Yes, all of the lemmings are running off the cliff.

Don’t be a lemming.

When it comes to learning new technologies, I apply common sense:

I try to understand the implications and risks of what a specific piece of technology can deliver. If I’m not able to understand it fully, I prefer to wait until I do before jumping in head first.

A case in point was the MCP hype: because everybody was talking about it, I took the time to learn what it was and whether it could be useful to me. I did dabble with a small experiment running a Playwright MCP server on my local machine, but I didn’t have the time and mental bandwidth to work through all of the implications of the protocol, so I decided to sit out the hype; it was clear to me that the costs of using the protocol incorrectly were too great to ignore.

As a software developer, I don’t have time to stay abreast of all of the latest security research and exploits, but I do know how important security is and I know enough to trust the queasy feeling in my gut when even I can see how horribly, terribly wrong this all could end up.

Setting up an AI assistant with guardrails

The second camp of people using AI assistants includes developers who are using their common sense, really do have the knowledge to set up their systems properly, and have given some thought to basic security measures. Their devices are not accessible from the internet. They have installed their AI assistant on a separate physical device in order to provide a physical sandbox that contains the damage if (when) things go wrong. Some use vibe-coded alternatives to OpenClaw that rely on OS sandboxing to provide more security (Note: I haven’t had time to look into these solutions in detail. Do your own due diligence).

As much as I respect any developer’s desire to try out new technologies and see what they can do, I remain concerned about the current state of the art and would still recommend exercising excessive caution when using these technologies.

The reason is the lethal trifecta, which is still very much in play: if an agent combines access to private data, exposure to untrusted content, and the ability to communicate externally, it can be coerced into leaking that data or performing undesired actions on your behalf.

Providing a sandboxed environment does restrict access to private data, so it does reduce some risk, but as I noted in my article about my sandbox solution, limiting the data an agent has access to is only the first step. As long as the agent has unrestricted access to the internet, the other two risk factors – external communication and exposure to untrusted content – are still very much in play. This is why I’m working on a second step to my sandbox solution to lock down internet access. That post is not live yet because I’m still ironing out a few details, but you can see my WIP solution in the slides I prepared for a talk I gave recently on the topic.
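To illustrate the general idea – this is not the solution from my slides, just a minimal sketch with placeholder hostnames – even a simple application-level egress allowlist changes the picture: the agent can only reach the handful of hosts you have explicitly approved. In practice you would enforce this at the network layer as well, since an agent should not be trusted to police itself.

```python
from urllib.parse import urlparse
from urllib.request import urlopen

# Hosts the assistant is allowed to reach; everything else is refused.
# The entries are placeholders for whatever your specific skills need.
ALLOWED_HOSTS = {
    "www.amazon.com",    # the one shop a price-watching skill may poll
    "api.telegram.org",  # the one messaging API the assistant may use
}

def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Fetch a URL only if its host is on the egress allowlist."""
    host = urlparse(url).hostname
    if host not in ALLOWED_HOSTS:
        raise PermissionError(f"egress to {host!r} is not allowlisted")
    with urlopen(url, timeout=timeout) as response:
        return response.read()
```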

A sandbox alone is simply not sufficient protection for these use cases, because the data we need to provide for the agent to do its work is often sensitive in itself. A case in point is agentic programming: for an agent to support us with our programming tasks, we need to give it access to the source code of our application, which is often sensitive data we are not allowed to share openly on the interwebs.

The AI assistant architecture gives me pause particularly because the dangers of the lethal trifecta will continually grow over time. Each new capability we add to our AI assistant increases the amount of private data that can be leaked in the case of a breach. It potentially connects new internet sites that are polled or new messaging apps that can be compromised, increasing both the exposure to untrusted content and the avenues by which data can leak out of the system.

Ideally, we want to be moving in the other direction, reducing each leg of the lethal trifecta with each step that we take.

What could an architecture with less risk look like?

In order to build an AI assistant properly, with safety in mind, we would have to lock down each capability independently. The skill I would activate to categorize my family’s photos would be sandboxed to have access only to the photos and the messenger channel over which I send them to the agent. The skill I would activate to parse and categorize my receipts would have access only to the receipts and to the messenger channel over which I send them. The skill I would activate to repeatedly poll Amazon and message me when the price of diapers has dropped would have access only to that specific Amazon URL and a messaging channel I would set up specifically for that purpose.
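To make that more tangible, here is a minimal sketch of what per-skill capability manifests could look like. None of this is an existing OpenClaw feature; the field names, paths, channels, and the placeholder Amazon URL are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SkillManifest:
    """Declares what a single skill may touch; nothing else is reachable."""
    name: str
    private_data: tuple[str, ...] = ()       # data stores the skill may read
    untrusted_inputs: tuple[str, ...] = ()   # content the user does not control
    outbound_channels: tuple[str, ...] = ()  # ways data can leave the sandbox

PHOTO_SORTER = SkillManifest(
    name="photo-categorizer",
    private_data=("/data/family-photos",),
    outbound_channels=("signal://family-photos",),
)

RECEIPT_PARSER = SkillManifest(
    name="receipt-categorizer",
    private_data=("/data/receipts",),
    outbound_channels=("signal://receipts",),
)

DIAPER_WATCHER = SkillManifest(
    name="diaper-price-watch",
    untrusted_inputs=("https://www.amazon.com/dp/EXAMPLE-ASIN",),
    outbound_channels=("signal://price-alerts",),
)
```

The manifest only declares what the skill’s sandbox should permit; the actual enforcement (filesystem isolation, egress filtering, a dedicated messaging channel) has to live outside the agent, where a prompt-injected model cannot talk its way around it.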

By severely limiting access to the internet and severely reducing the data that any agent has access to at any given time, we could limit the damage a compromised AI assistant could cause. Ideally, each skill could be analyzed against the lethal trifecta to assess its risk. One rule of thumb is the Agents Rule of Two: only use a skill if at most two of the trifecta are in effect – although this only reduces risk, it does not eliminate it entirely. If a skill combined all three legs of the lethal trifecta (e.g. a skill to process and send e-mail), we would exercise our common sense and not install it.
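Building on the manifest sketch above, that assessment could be as mechanical as counting trifecta legs per skill – again, a rough illustration rather than a feature of any existing tool:

```python
# Count how many legs of the lethal trifecta a skill has and refuse to
# install it when all three are present. The Agents Rule of Two only
# reduces risk; it does not eliminate it.
def trifecta_legs(skill: SkillManifest) -> int:
    private_data = bool(skill.private_data)
    untrusted_content = bool(skill.untrusted_inputs)
    external_communication = bool(skill.outbound_channels)
    return sum([private_data, untrusted_content, external_communication])

def may_install(skill: SkillManifest) -> bool:
    return trifecta_legs(skill) <= 2  # Agents Rule of Two

# A skill that reads your inbox (private data, untrusted content) and can
# send mail (external communication) hits all three legs.
EMAIL_ASSISTANT = SkillManifest(
    name="email-assistant",
    private_data=("imap://inbox",),
    untrusted_inputs=("imap://inbox",),
    outbound_channels=("smtp://outbox",),
)

for skill in (PHOTO_SORTER, RECEIPT_PARSER, DIAPER_WATCHER, EMAIL_ASSISTANT):
    verdict = "ok to consider" if may_install(skill) else "refuse"
    print(f"{skill.name}: {trifecta_legs(skill)}/3 trifecta legs -> {verdict}")
```

Under this counting, the three narrowly scoped skills above stay at two legs, while the hypothetical e-mail skill hits all three and gets refused.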

Sacrifice convenience for security

This solution would add friction to every skill that we set up. We would sacrifice our convenience in order to gain security for our data.

It would be a trade-off worth making.

But this is NOT what OpenClaw or any copycat tool currently offers. If you are interested in setting up an AI assistant, my recommendation is to proceed with excessive caution and to analyze the risk anew for every single skill you want to add. And if you are intimidated by the technical expertise and security knowledge it would take to truly set up a secure system, rest easy that you are not alone. I personally will be sitting this hype cycle out. You are welcome to join me.