This blog post is also available in German
TL;DR
I’ve controlled the network access for my development sandbox by
- using Squid as a forwarding proxy with a custom allowlist
- forcing all traffic from my VM sandbox over the proxy with proxy configuration and nftables
This post is part of a series.
- Part 1: I sandboxed my coding agents. You should too.
- Part 2: I sandboxed my coding agents. Now I control their network. (this post)
In a previous post, I described my solution for setting up a development sandbox for my coding agents. My solution entails using a Lima virtual machine (VM) on macOS and limiting the user capabilities within that machine to the bare minimum. Since my user within the VM has only the necessary permissions, a coding agent that runs with that user’s permissions will be similarly limited. That post details the first necessary step for creating a sandbox that severely limits the amount of code that an agent is able to access. This provides protection against the first leg of the lethal trifecta by restricting access to private data as much as possible.
However, the solution does not protect against the other two members of the lethal trifecta: exposure to untrusted content and the ability to communicate externally. The virtual machine that I am using as a sandbox is still connected to the internet, so any arbitrary HTTP request can be sent, allowing information to leak out of the sandbox. The result of that arbitrary request is then added to the system context without verification, which can expose our agents to untrusted content.
Up until now, I have kept myself in the loop, using the built-in network policy tools in Codex and Claude Code to explicitly approve each connection to the internet. However, I’d really like to take myself out of the loop and define specific rules for websites that can be accessed freely without my explicit consent.
This post details my solution to the problem. I want to add the caveat that with this post I am leaning outside of my comfort zone. I am a full-stack developer, and I normally shy away from Linux configuration and networking. I’ve had to retrieve much that I had forgotten (what does the TLS handshake look like again?) and learn much that I didn’t know before. But if the advent of AI doesn’t force us to sometimes venture out from what we are comfortable doing, then what will?
I used AI to analyze different solutions, and AI helped me generate the configuration I am sharing in this post. But at no point did I let an agent loose on either the computer or the sandbox to configure the system itself. I’ve read the documentation, double- and triple-checked everything, and actively asked every question that I could think of. If you spot anything that could be improved upon, please reach out and let me know. As with the first post, I’ve asked our INNOQ security experts to vet my approach.
Run a proxy on the host to allow access only to specific domains
The first step was to set up a proxy on the host machine to allow me to monitor any outgoing traffic from my development sandbox and set up rules about which websites are allowed and which are denied.
My journey to finding a solution which worked for me was fraught with bumbling and stumbling.
I found out that I absolutely did not want to manually approve every single IP address in a firewall installed on my host, especially since I couldn’t tell which domain those IP addresses were mapping to.
At one point the AI suggested extending a Man-in-the-Middle proxy to review every request. This could be a valid approach if you know what you are doing, and it would make it possible to inspect and monitor the content of the requests being sent to and from the sandbox. It would, however, require creating a self-signed certificate and configuring all of the trust stores on the client to trust that certificate, which is something I preferred not to do.
The solution I eventually settled on is a forward proxy: a proxy which intercepts a request and forwards it on to the destination. Using a forward proxy, I was able to intercept any CONNECT method that the sandbox sends when trying to build a TLS connection to the target server, which allows me to check that request against a specific allowlist of domains. If it doesn’t match, the connection will be dropped.
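To make this concrete: when a client is pointed at a forward proxy, an HTTPS connection starts with a plain-text CONNECT request naming the destination host and port. That line is all the proxy ever sees of the request, and it is enough to match the destination against an allowlist. A minimal illustration, with example.org as a stand-in destination and 8888 as an assumed proxy port:

```shell
# A client such as `curl -x http://127.0.0.1:8888 https://example.org`
# opens a TCP connection to the proxy and first sends exactly this:
printf 'CONNECT example.org:443 HTTP/1.1\r\nHost: example.org:443\r\n\r\n'
# Only after the proxy answers "HTTP/1.1 200 Connection established"
# does the TLS handshake begin, tunneled opaquely through the proxy.
```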
In my desire to not reinvent the wheel, I settled on using the proxy Squid, which markets itself primarily as a caching proxy but can also be used as a CONNECT-only allowlist proxy. Since I’m on macOS, I installed Squid using Homebrew. Then I added the following to my Squid configuration (installed for me at /opt/homebrew/etc/squid.conf) and restarted the service:
############################################
# Custom: CONNECT-only allowlist proxy
############################################
# dev proxy should listen on port 8888
http_port 8888
# Only allow CONNECT to standard TLS port 443
acl SSL_ports port 443
acl CONNECT method CONNECT
http_access deny CONNECT !SSL_ports
# Only allow proxy use from the sandbox network
acl vmnet src 127.0.0.1/32
# Destination domain allowlist
acl allowed_domains dstdomain "/opt/homebrew/etc/squid/allowed_domains.txt"
# Allow only: sandbox net + CONNECT + allowlisted domains
http_access allow vmnet CONNECT allowed_domains
# Block everything else
http_access deny all

The allowed_domains.txt list can be easily configured to allow only specific domains or wildcards:
example.org
.openai.com

Configure VM to use proxy on host
Having the proxy set up on the host is all well and good, but if the development sandbox doesn’t use the proxy it isn’t worth much. The next step was to get the environment set up so that the tools would actually route their traffic over the proxy on the host.
Many tools (e.g. curl or coding agents like codex and claude) understand the following configuration:
export HOST_IP="<My Host IP>"
export PROXY_PORT="8888"
export HTTPS_PROXY="http://$HOST_IP:$PROXY_PORT"
export HTTP_PROXY="$HTTPS_PROXY"
export NO_PROXY="localhost,127.0.0.1"

For Gradle, I had to additionally configure my ~/.gradle/gradle.properties to contain the following:
systemProp.http.proxyHost=<My Host IP>
systemProp.http.proxyPort=8888
systemProp.https.proxyHost=<My Host IP>
systemProp.https.proxyPort=8888
systemProp.http.nonProxyHosts=localhost|127.0.0.1
systemProp.https.nonProxyHosts=localhost|127.0.0.1

Force sandbox to only communicate over the proxy
The setup so far enables the development sandbox to communicate with the proxy on the host, but it doesn’t actually force all traffic from the sandbox over the proxy. If the coding agents play by the rules, this provides some layer of protection, but it isn’t absolute.
In order to do this, I installed nftables and created the following /etc/nftables-proxy-egress.nft configuration:
table inet sandbox {
chain output {
type filter hook output priority 0; policy drop;
# Allow loopback traffic
oif "lo" accept
# Allow established/related connections
ct state established,related accept
# allow DNS out (udp/tcp 53).
# (this policy could be tightened to allow DNS only to specific IPs)
udp dport 53 accept
tcp dport 53 accept
# Allow local Docker networks (for Testcontainers, DBs, etc.)
ip daddr 172.17.0.0/16 accept
ip daddr 172.18.0.0/16 accept
# Allow traffic to the proxy
ip daddr <My Host IP> tcp dport 8888 accept
}
}

I loaded the rules with sudo nft -f /etc/nftables-proxy-egress.nft and tested that internet access outside of my defined allowlist was blocked (e.g. using curl with the proxy environment variables disabled). I also used sudo nft list ruleset to check that the sandbox rules were loaded.
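The “curl with the proxy variables disabled” check can be scripted so that it fails loudly. This is a sketch under the assumption that curl is installed in the VM and the rules above are loaded; the helper name leak_test is my own invention:

```shell
# Deliberately bypass the proxy; with the output chain in place the packet
# is dropped and curl fails (typically exit code 28, a timeout).
leak_test() {
  # $1 is curl's exit code: 0 would mean the request escaped the sandbox
  if [ "$1" -eq 0 ]; then
    echo "LEAK: direct egress possible"
  else
    echo "direct egress blocked"
  fi
}
rc=0
env -u HTTPS_PROXY -u HTTP_PROXY curl -sS --max-time 5 https://example.org >/dev/null 2>&1 || rc=$?
leak_test "$rc"
```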
Unfortunately, the rules did not survive a reboot. It is possible to make the rules persistent, but that caused some issues for me with my setup because Docker also adds rules to nftables which need to be active before my sandbox rules in order for my integration tests (which use testcontainers) to work properly. To get around this, I created a small service on the Linux machine which runs after docker and inserts the rules into the system. This is my /etc/systemd/system/nftables-proxy-egress.service:
[Unit]
Description=Apply nftables proxy egress rules
After=network-online.target docker.service
Wants=network-online.target
OnFailure=proxy-egress-console-alert.service
[Service]
Type=oneshot
ExecStartPre=/usr/sbin/nft -c -f /etc/nftables-proxy-egress.nft
ExecStart=/usr/sbin/nft -f /etc/nftables-proxy-egress.nft
[Install]
WantedBy=multi-user.target

The nft -c check causes the service to fail fast, and the OnFailure directive calls proxy-egress-console-alert.service upon failure, which I can use to alert me when loading the rules fails. Otherwise, I might not notice that the service failed to load the networking rules and would be using my system without the assurance that my VM directs all traffic over the host.
Here is my /etc/systemd/system/proxy-egress-console-alert.service which just creates a file /var/lib/nft-egress-failed when something goes wrong.
[Unit]
Description=Console alert if nftables proxy egress fails
[Service]
Type=oneshot
ExecStart=/usr/bin/touch /var/lib/nft-egress-failed

I then added a few lines to my ~/.bashrc which add a flashing 🚨🚨 NFT-EGRESS-FAILED 🚨🚨 message to my prompt to alert me that loading the egress rules failed:
if [ -f /var/lib/nft-egress-failed ]; then
PS1='\[\033[5;1;31m\]🚨🚨 NFT-EGRESS-FAILED 🚨🚨\[\033[0m\]\n\[\033[1;31m\]\u@\h:\w\$ \[\033[0m\]'
fi

Once I had my service defined, I reloaded systemd and restarted the service:
sudo systemctl daemon-reload
sudo systemctl restart nftables-proxy-egress.service

A colleague suggested taking the whole network offline on failure, but when I tried that I managed to brick the whole VM, because Lima’s shell command uses ssh over the network to communicate with the VM. So I’m sticking with the flashing prompt message for now. I also want to note that having to fiddle with nftables in order to route all of the traffic over a proxy on the host may well be a limitation of the virtual machine technology that I chose. If you are using a virtual machine that can be provisioned by Vagrant (e.g. VirtualBox), there is a plugin which allows you to define this declaratively.
As a final quick check, it is important to make sure that nft, the /etc/nftables-proxy-egress.nft configuration, and the services are modifiable only by root! Otherwise it would theoretically be possible for the agent to modify them to open up the network using your permissions.
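One way to check this is stat, which can print owner and octal mode. The expectation for each file is root ownership with no group or world write bit (644 or stricter). A sketch, using the paths from this post; the scratch-file demo at the end only illustrates what the check looks like:

```shell
# Owner and mode of the egress config and services; each line should read
# like "root 644 /etc/..." (owned by root, no group/world write):
#   sudo stat -c '%U %a %n' /etc/nftables-proxy-egress.nft \
#       /etc/systemd/system/nftables-proxy-egress.service \
#       /etc/systemd/system/proxy-egress-console-alert.service
# Demo of the same check on a scratch file:
f=$(mktemp)
chmod 644 "$f"
stat -c '%a' "$f"
rm -f "$f"
```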
Monitor traffic and modify allowlist
The final step is to monitor the traffic from the virtual machine and extend the allowlist to let through any requests that we need for day-to-day development. The logs for Squid can be accessed on the command line (for me they are found at /opt/homebrew/var/logs/access.log):

sudo tail -f /opt/homebrew/var/logs/access.log

Any request which is dropped will be logged with a TCP_DENIED status. If you want to allow requests to that URL, you can modify your allowed_domains.txt to include the new domain and restart Squid:
brew services restart squid

In my day-to-day practice, I’ve found that I have only rarely had to modify this list. I have 12 URLs in my allowlist, and that seems sufficient. I do have the package registries for Gradle and npm in the list, which theoretically opens me up to supply chain attacks, but the risk is very much minimized, and I finally feel comfortable letting down my guardrails to see what my agents can do.
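For combing through the log after the fact, the denied destinations can be extracted directly. This assumes Squid’s default native log format, where the result code (e.g. TCP_DENIED/403) is the fourth field and the requested URL the seventh:

```shell
# Print each unique destination that Squid denied; in day-to-day use:
#   sudo awk '$4 ~ /TCP_DENIED/ {print $7}' /opt/homebrew/var/logs/access.log | sort -u
# Demonstrated here on a sample log line (hypothetical values):
log='1700000000.123     45 127.0.0.1 TCP_DENIED/403 3893 CONNECT evil.example.com:443 - HIER_NONE/- text/html'
echo "$log" | awk '$4 ~ /TCP_DENIED/ {print $7}'
```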
Agents for Coding, Chatbots for Search
One of the reasons that the list of URLs is so small is that by default the coding agents will only retrieve the application dependencies (from Maven or npm registries) and communicate with the LLMs directly over the APIs of the different providers (e.g. OpenAI or Anthropic). By default, the coding agents do not perform web search unless specifically requested to do so, which means that while performing coding tasks, they retrieve information only from the models and not from random sites on the internet. This reduces the risk of prompt injection.
Setting up a proxy in front of the coding agents also adds friction if I want to activate the web search features for my agents, because I would have to temporarily disable or modify the proxy and keep myself in the loop to approve any web request the agent wants to make. This is something I don’t want to do, because I often have longer-running tasks in the sandbox during which I want the proxy to remain active.
In practice what this means is that I exclusively rely on the models with no web search for any programming task, and for tasks where I do want to activate web search and find the most up-to-date information, I use chatbots in the browser. From the lethal trifecta perspective this is ideal. The long running tasks without direct supervision only have access to a limited subset of data and extremely limited network access. The shorter research tasks requiring web search do have access to the internet, but take place in the browser with extremely limited access to data.
Possible next steps: Putting icing on the cake
For my current threat model, I am satisfied with the solution that I’ve presented in this article. There are, however, a few other steps that you could take to improve the sandbox solution even more. One of them would be to integrate network analysis and threat detection software like Suricata into the sandbox in order to be alerted directly about any fishy behavior and more easily figure out what went wrong. In a similar vein, digital forensics software like Velociraptor could be used to collect detailed information about what is actually going on within the sandbox. This would provide insights into the processes that are running and help figure out whether any malicious behavior has occurred. I also want to tighten down the DNS rules in the nftables configuration to prevent data exfiltration via DNS tunneling.
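That DNS tightening could be a small change to the output chain: instead of accepting port 53 traffic to anywhere, accept it only toward a known resolver. A sketch, where 192.168.5.3 stands in for whichever resolver the VM actually uses (check /etc/resolv.conf inside the VM):

```nft
# Replace the blanket DNS accepts with resolver-specific rules
ip daddr 192.168.5.3 udp dport 53 accept
ip daddr 192.168.5.3 tcp dport 53 accept
```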
Loosening the reins on the programming agents
At this point, I have provided mitigations for each leg of the lethal trifecta. A virtual machine with as few privileges as possible provides a sandbox for my coding agents, ensuring they do not have access to critical data that could leak should an agent become compromised. Forcing all of the traffic through a proxy on the host with a very strict allowlist severely limits both exposure to untrusted content and the ability to communicate with the outside world.
With this basis in place, I finally feel comfortable loosening the strict guardrails on my coding agents. Running codex --yolo and claude --dangerously-skip-permissions has been really exciting because it allows me to create detailed multi-step tasks for my agents to execute without having to keep myself in the loop and monitor all of the things that they are doing. This in turn gives me more time that I can use for other tasks, without suffering a mental overload from having to switch my focus too often. However, I want to reiterate here that I consider loosening these guardrails only to be acceptable because I have set up my sandbox and network policy to provide even more protection than those provided by the agent tooling itself.
My current solution is still very basic, but I believe it provides a solid foundation that will allow me to add features without sacrificing security.