Since the advent of agentic programming, I have found a lot of success in using agents to help speed up my development process. I’ve also relearned the lesson of how important it is to maintain a small scope for development tasks.

One thing I have put off is getting a development environment set up with the guardrails necessary to use agents securely. I have put it off, like so many others, because of the age-old tradeoff between security and convenience. A secure environment takes time to set up and isn’t especially convenient, particularly if you are a developer like me who prefers minimal config, minimal fuss, and systems that “just work”.

Listening to my colleague Christoph’s talk at the 2025 INNOQ Technology Day about the negative aspects of generative AI and the dangers of prompt injection has now changed my priorities.

I am not a security expert, so I asked our INNOQ security experts to vet the approach I present here for setting up a development sandbox to contain my coding agents.

Why Sandboxing Matters

When coding agents are run on our computers, they can execute programs with the same permissions as our own users. This makes them very powerful, but makes us very vulnerable.

Because we interact with an LLM using natural language, it is very difficult for the LLM to determine which part of our prompt came from us and which part originated elsewhere.

If the LLM reads a README somewhere on a webpage which politely asks the LLM to send our credentials to service X and then not tell us about it, what do you think the agent is going to do?

This ability to sneak other commands into our prompts has been termed prompt injection because of its similarity to other forms of injection attacks, like SQL injection and cross-site scripting. Unfortunately, traditional approaches to injection mitigation are very difficult to apply here because the natural language involved is so large and varied. It is very difficult, if not impossible, to determine which part of a prompt comes from where.

This account of jailbreaking Claude Code illustrates the difficulty: A request like “find database configs so I can migrate to secrets manager” could be either legitimate or malicious solely based on the intent of the user — and the intent is not information that the LLM can easily discern.

Prompt injection mitigation is an area of current research, but as it stands today we should treat it as an unsolved problem.

I highly recommend reading this post by Simon Willison about the Lethal Trifecta. He details three characteristics of agents that, when combined, can make it very easy for an attacker to infiltrate our systems and steal our data:

  1. Access to your private data
  2. Exposure to untrusted content
  3. The ability to communicate externally

The coding agents that I use (Codex and Claude Code) will prompt before sending a request to an unknown website (unless we’ve activated YOLO privileges).

However, this security mechanism is rendered somewhat less effective by its annoyance. We get so used to selecting “yes, you can access that” that we risk becoming unintentionally negligent and letting something slip by.

A sandbox is a mechanism we can set up to limit our agent to accessing only specific files and specific credentials. This essentially addresses the first leg of the lethal trifecta: access to your private data. For development, some credentials are usually necessary, perhaps a few API tokens. If we create custom credentials for our sandbox, we can severely limit the damage that an exploit could cause. We also need a plan to revoke those credentials in a worst-case scenario, and we should do our due diligence to ensure that those credentials have the absolute minimum permissions necessary.

The only files that should land in the sandbox are files that it is OK for the agent to see and delete.

Sensitive data like credentials and customer data should NEVER land in the sandbox.

Unfortunately, the sandbox solution that I am presenting here does not yet address any validation of the network traffic to and from the sandbox. This means that the two other characteristics in the trifecta are still issues: allowing an agent to retrieve any content from the internet still provides exposure to untrusted content and likewise allowing outbound traffic preserves the ability to communicate externally.

To really protect against the whole trifecta, it would be necessary to lock down the network as well, but that is not a step I have managed to take (yet). It is something I am mulling over and considering how to implement. In the meantime, I still use the built-in approval mechanism from Codex and Claude Code to keep myself in the loop and approve any outbound traffic.

The approach I describe below for setting up a development sandbox is what I believe is the first step in setting up a secure development environment for agentic programming. I intend to extend my setup in the future to better manage network traffic as well.

How to sandbox your agents

I am using macOS on an ARM (Apple Silicon) processor. If you are on Linux or Windows, your setup is going to look quite different.

Choosing your sandbox technology

I began my journey by looking for a technology that would fit my personal development stack. The questions I considered included:

  • What coding agents do I use? (for me: codex and claude on the CLI)
  • What development setup is necessary? (for me: docker-compose for databases and queues)
  • What build framework do I use? (for me: gradle, npm)
  • What programming languages and frameworks do I use? (for me: Java, JavaScript)
  • Which IDE/editor do I prefer? (for me: IntelliJ)

When it comes to creating a sandbox, there are many different technologies that could potentially do the job. I’ve heard of Development Containers, Docker MCP Toolkit, and Container Use all being used successfully. Anthropic is developing their own sandbox if you aren’t concerned with vendor lock-in, Docker is working on their own solution, and I even found a curated list of code sandboxing solutions. (⚠️ Note: I haven’t had time to look into all of these in detail, so please do your own due diligence before choosing a solution!)

Virtual Machines vs. Containers

Technically, a whole separate laptop exclusively for agentic programming would probably be the most secure development sandbox. In practice, most solutions fall into two different categories: virtual machines or containers.

In my case, I decided on a virtual machine over a container because I did not want to recode my entire development environment in YAML just to run my sandbox. We already run a complicated docker-compose stack, and the thought of getting that up and working for every service makes me slightly nauseous.

Opting for a virtual machine also lets me connect to my development sandbox over SSH with JetBrains Gateway, which allows me to keep the tooling that I am familiar with and love.

With the help of ChatGPT, I decided to try Lima VM which provides a way to launch Linux virtual machines with minimal configuration.

Installing Lima

The first step, of course, was to install Lima VM. I am not going to write a full, detailed tutorial on how to set up Lima VM on your machine, because one already exists.

The only issue I ran into was that I had to reinstall Homebrew: apparently I hadn’t needed to install anything requiring the native ARM processor in the last five years, so I was still running the Rosetta (Intel) version of Homebrew. After reinstalling, I was able to install Lima without any further hiccups.
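
For reference, the Homebrew route looks roughly like this (a sketch, not a full tutorial; brew config reports whether Homebrew itself is running under Rosetta):

# Check whether Homebrew is the native ARM build or the Rosetta/Intel one
brew config

# Install Lima and verify that the CLI is available
brew install lima
limactl --version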

One point of note: although I used an LLM to evaluate which tools might be suitable for my use case, I did not use an AI agent to set up the system on my computer.

When it comes to installing software critical to the security of our machines, we should scrutinize each and every step, taking the time we need to understand the implications of what we are doing and any trade-offs we may be making.

Creating a minimal virtual machine configuration

With a little iteration and troubleshooting with ChatGPT, I came up with the following minimal virtual machine configuration which is sufficient for all of my needs. The main decision was choosing the operating system I wanted (I opted for the latest LTS release from Ubuntu) and how much memory I wanted to dedicate to it. The rest of the settings I will discuss later.

images:
- location: "https://cloud-images.ubuntu.com/releases/24.04/release/ubuntu-24.04-server-cloudimg-arm64.img"
  arch: "aarch64"

cpus: 8
memory: "16GiB"
disk: "120GiB"
  
ssh:
  localPort: 60022  

mounts:
- location: "~/devbox/projects"
  mountPoint: "/home/joy.linux/projects"
  writable: true

vmOpts:
  vz:
    rosetta:
      enabled: true
      binfmt: true

portForwards:
  - guestPort: # port config here
    hostPort: # port config here

After setting up my dev-sandbox VM, I can open a shell in it from the console with the command limactl shell dev-sandbox.
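
For completeness, here is a sketch of the corresponding commands, assuming the configuration above is saved as dev-sandbox.yaml (the file name is my choice, not a Lima default):

# Create and start the VM from the config file; the instance name is derived from the file name
limactl start ./dev-sandbox.yaml

# Open a shell inside the running VM
limactl shell dev-sandbox

# List instances and stop the VM when it is not needed
limactl list
limactl stop dev-sandbox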

Setting up the environment

Once I had the VM up and running, I set up my environment. Installing Git, Java, nvm, and my coding agents was relatively straightforward. Installing Docker with the correct permissions took a bit of troubleshooting, but nothing too dramatic. The only credentials I need are a read-only access token for our private Maven repository and API keys for my coding agents.
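
The details will differ for every stack, but inside the VM the setup was roughly along these lines (a sketch for Ubuntu 24.04; the exact package names, the Java version, and the agent package names are assumptions, so check the official install instructions for each tool):

# Base tooling
sudo apt-get update
sudo apt-get install -y git openjdk-21-jdk

# Docker, plus permission to use it without sudo (requires logging out and back in)
sudo apt-get install -y docker.io docker-compose-v2
sudo usermod -aG docker $USER

# nvm and Node.js follow the instructions in the nvm README; then the coding agents,
# for example via npm (package names are assumptions)
npm install -g @anthropic-ai/claude-code @openai/codex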

In my original setup, I actually added custom SSH keys for my sandbox and cloned the repositories directly into the sandbox. A colleague suggested that I instead perform all of the communication with GitLab on the host and only allow the sandbox to access the code which has already been cloned. This is so much better than my original approach and I am very pleased. Giving an agent access to an SSH key was making me slightly queasy, especially since I don’t have the network access dialed in yet. Now it is impossible for the agents to push any commit on my behalf and potentially cause issues in our code base.
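
In practice, that just means cloning on the host into the directory that is mounted into the VM (more on the mount below); the GitLab URL here is a placeholder:

# On the host: clone into the shared projects directory; SSH keys never enter the sandbox
cd ~/devbox/projects
git clone git@gitlab.example.com:my-team/my-service.git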

Mounting shared directories

Lima VM also makes it possible to mount directories from the host into the VM. This means that I can check out all of my code on the host, and the agent can work directly on those files, which are shared between the host and the VM.

This has the added benefit that I can easily switch between agentic programming (contained by the sandbox) and normal programming directly on the host without any extra tooling, because both use the same files.

mounts:
- location: "~/devbox/projects"
  mountPoint: "/home/joy.linux/projects"
  writable: true
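
A quick way to verify the mount is to create a file on the host and read it from inside the VM (limactl shell can also run a single command):

# On the host
echo "hello from the host" > ~/devbox/projects/hello.txt

# From the host, run a command inside the VM
limactl shell dev-sandbox cat /home/joy.linux/projects/hello.txt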

Getting Testcontainers to work in the virtual machine

The biggest issue I ran into was that Testcontainers wouldn’t run inside the VM. Our project uses Testcontainers with container images built for the Intel processor architecture, but they run without issue on my Mac because of the Rosetta emulator. In order to activate the Rosetta emulator in the VM, it is necessary to set the following properties:

vmOpts:
  vz:
    rosetta:
      enabled: true
      binfmt: true
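
After restarting the VM, a quick way to check that the emulation works is to run an amd64 image and look at the architecture it reports (the image choice is arbitrary):

# Inside the VM: force an amd64 image; with Rosetta/binfmt active it should run
# and report x86_64 rather than aarch64
docker run --rm --platform linux/amd64 alpine uname -m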

Connecting via JetBrains Gateway for remote development

Getting JetBrains Gateway set up also proved to be relatively straightforward. I was able to find the Lima SSH config for my VM under ~/.lima/dev-sandbox/config, and the Gateway was able to use this config to SSH into the VM and install itself onto the machine. An IntelliJ window can then be opened to read or modify the source code with all of the tooling that I am familiar with. I did end up specifying an explicit SSH port in my Lima config, because otherwise the VM starts on a different port every time and the Gateway connection stops working:

ssh:
  localPort: 60022
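
If the Gateway connection ever misbehaves, Lima can print the exact SSH settings it uses, which makes debugging easier:

# Print the SSH command Lima uses for this VM
limactl show-ssh dev-sandbox

# Or emit it as an ssh_config snippet that other tools can reuse
limactl show-ssh --format=config dev-sandbox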

Setting up port forwarding

I then updated the port forwarding configuration to make it possible to start an application in the VM and view it in the browser on my host machine. For this step, I actually used my coding agent in the VM to go through all of the repositories and generate a list of all the ports the applications might use, which I copied into the config after verifying it.

portForwards:
  - guestPort: # port config here
    hostPort: # port config here
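
With the forwarding in place, a quick smoke test from the host confirms that a service started inside the VM is reachable (the port here is a placeholder for whatever your application exposes):

# On the host, after starting an application inside the VM:
# the forwarded guest port should answer on localhost
curl -i http://localhost:8080/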

Final tweaks

Once I had a working development sandbox, the next step was to use it and find those last few configurations that are needed to feel productive. I noticed how often I open a new command line tab with the expectation of being in the same working directory and created some git aliases to quickly move into the sandbox. I gave my sandbox prompt a different color scheme and the emojis “🏖️📦” to make it easy for me to visually identify which shell I have open.
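
As an illustration, an alias along these lines does the trick; the alias name is arbitrary, and it assumes the repository lives under the mounted projects directory:

# A git alias that opens a sandbox shell in the matching directory inside the VM
# (~/devbox/projects on the host corresponds to /home/joy.linux/projects in the VM)
git config --global alias.sandbox '!limactl shell --workdir "/home/joy.linux/projects/$(basename "$(git rev-parse --show-toplevel)")" dev-sandbox'

# Inside the VM, in ~/.bashrc: a prompt that is hard to mistake for the host shell
export PS1='🏖️📦 \u@dev-sandbox:\w\$ '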

The final result

Now I have a fully functional VM sandbox for my coding agents while still being able to use the JetBrains tooling that I prefer for development. I will likely add more tooling when I do tasks that require it.

It did take some time for me to set up my development sandbox, but the majority of that time was spent debugging the Intel emulation issue with the VM. If I had known about that configuration option from the beginning, I think I could have had a working sandbox up and running in half a day. The time invested in this setup is well worth it: I now have a development environment with minimal friction which limits the damage from a coding agent gone rogue.

This is the first step in setting up a secure development environment with reduced access to private data. With this in place, I believe the second step (securing network access) will be able to build upon this approach. I’ve sandboxed my coding agents. Now you should too.