Rate Limiting with Spring Boot

Rate limiting is an architectural tactic for a server to limit access to an API. It helps to:

protect against server overload caused by clients that call the server too often within a short time frame
increase the fairness of how clients use server resources
allow pricing schemes for different amounts of requests

Spring Boot 3.0 does not provide rate limiting out of the box. As an application developer, you often do not need to take care of rate limiting yourself, because your infrastructure may already handle it, e.g. through a reverse proxy such as HAProxy. The infrastructure can enforce a global rate limit, whereas rate limiting at the application level only enforces a local one. So, if you have n instances and each applies its own rate limit, the effective limit can be up to n times higher. You may check this blog post if you have multiple instances of your server.
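To make the "n times higher" effect concrete, here is a minimal sketch in plain Java. It assumes a round-robin load balancer in front of the instances and a single time window; the class name LocalLimitDemo and the method acceptedRequests are made up for this illustration.

```java
/** Hypothetical illustration: every instance enforces only its own local limit. */
class LocalLimitDemo {

    /**
     * Counts how many of the given requests are accepted within one time window
     * when a round-robin balancer spreads them over several instances.
     */
    static int acceptedRequests(int instances, int limitPerInstance, int requests) {
        int[] seen = new int[instances];   // requests accepted so far by each instance
        int accepted = 0;
        for (int i = 0; i < requests; i++) {
            int target = i % instances;    // assumed round-robin load balancing
            if (seen[target] < limitPerInstance) {
                seen[target]++;
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        // One client sends 1000 requests within a single time window.
        System.out.println(acceptedRequests(1, 200, 1000)); // prints 200
        System.out.println(acceptedRequests(3, 200, 1000)); // prints 600
    }
}
```

With three instances and a local limit of 200 each, the same client gets 600 requests through, three times the intended limit.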

However, adding rate limit protection to your application gives developers fine-grained control per HTTP endpoint, even if the infrastructure is maintained by someone else. If you are looking for a small solution without additional dependencies for a single-instance application, the solution suggested by this blog post might be an option.

The idea

We define an annotation that you can add to any HTTP endpoint that should have rate limit protection. We define an aspect for methods with that annotation that counts HTTP requests per sender IP address. If the rate limit is exceeded, we throw an exception. In the exception handling, we return the HTTP status code 429 (Too Many Requests). The rate limit can be configured using properties in the Spring configuration file.
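The counting part of this idea can be tried out in plain Java before any Spring code is involved. The following is only a sketch under assumptions: the class SlidingWindowLimiter and its method tryAcquire are hypothetical names, and the current time is passed in as a parameter so that the behaviour is easy to check.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical plain-Java sketch of counting calls per key within a time window. */
class SlidingWindowLimiter {

    private final int limit;
    private final long windowMillis;
    private final Map<String, Deque<Long>> calls = new ConcurrentHashMap<>();

    SlidingWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    /** Records a call from the given key at time nowMillis and reports whether it is within the limit. */
    boolean tryAcquire(String key, long nowMillis) {
        Deque<Long> timestamps = calls.computeIfAbsent(key, k -> new ArrayDeque<>());
        synchronized (timestamps) {   // one key may be hit by several threads at once
            // Forget calls that have left the time window.
            while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() > windowMillis) {
                timestamps.pollFirst();
            }
            timestamps.addLast(nowMillis);   // rejected calls are counted as well
            return timestamps.size() <= limit;
        }
    }

    public static void main(String[] args) {
        SlidingWindowLimiter limiter = new SlidingWindowLimiter(2, 1000);
        System.out.println(limiter.tryAcquire("10.0.0.1", 0));    // true
        System.out.println(limiter.tryAcquire("10.0.0.1", 100));  // true
        System.out.println(limiter.tryAcquire("10.0.0.1", 200));  // false: third call within 1000 ms
        System.out.println(limiter.tryAcquire("10.0.0.2", 200));  // true: different address
        System.out.println(limiter.tryAcquire("10.0.0.1", 1201)); // true: earlier calls have expired
    }
}
```

Note that rejected calls are recorded too, so a client that keeps sending requests stays blocked until it slows down.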

Starting Point: A plain Controller

Let’s assume you have a Spring Boot server, you have added the spring-boot-starter-aop dependency (2.x or higher), and you have an endpoint similar to this one:

package com.innoq.test;

import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Spring Boot 3.x uses the jakarta namespace; on Spring Boot 2.x import javax.validation.Valid instead
import jakarta.validation.Valid;

@RestController
@RequestMapping("/api")
public class MyRestController {

    // MyService is assumed to be a Spring bean that contains the business logic
    private final MyService myService;

    public MyRestController(final MyService myService) {
        this.myService = myService;
    }

    @PostMapping
    public MyResponse processRequest(@Valid @RequestBody final MyRequest request) {
        return myService.process(request);
    }
}

Let’s assume MyResponse and MyRequest are simple JavaBeans. Spring Boot uses Jackson's ObjectMapper to translate the incoming JSON into an instance of MyRequest. After the call, the returned MyResponse is translated back to JSON. You can omit the @Valid annotation if you are not using Bean Validation.

Detecting the rate limit

We start by writing an annotation @WithRateLimitProtection that marks HTTP endpoints, like processRequest above, that should have rate limit protection. We define the annotation like this:

package com.innoq.test;

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface WithRateLimitProtection {
}
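The RUNTIME retention matters here: as far as I understand, the proxy-based Spring AOP looks for the annotation at run time, so it has to be visible via reflection. The following plain-Java sketch illustrates the difference between RUNTIME and the default CLASS retention; the names RetentionDemo, RuntimeMarker, ClassMarker and Endpoints are invented for this example.

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

/** Hypothetical demo: only RUNTIME annotations can be found via reflection. */
class RetentionDemo {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface RuntimeMarker { }

    @Retention(RetentionPolicy.CLASS)   // the default if @Retention is omitted
    @Target(ElementType.METHOD)
    @interface ClassMarker { }

    static class Endpoints {
        @RuntimeMarker
        public void visible() { }

        @ClassMarker
        public void invisible() { }
    }

    /** Reports whether the named method of Endpoints carries the marker at run time. */
    static boolean isMarked(String methodName, Class<? extends Annotation> marker) {
        try {
            Method method = Endpoints.class.getMethod(methodName);
            return method.isAnnotationPresent(marker);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isMarked("visible", RuntimeMarker.class));   // true
        System.out.println(isMarked("invisible", ClassMarker.class));   // false
    }
}
```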

Now we can add this annotation to the controller method that should have rate limit protection:

@RestController
@RequestMapping("/api")
public class MyRestController {
    @PostMapping
    @WithRateLimitProtection
    public MyResponse processRequest(@Valid @RequestBody final MyRequest request) {
        return myService.process(request);
    }
}

If the rate limit is exceeded at the endpoint, a RateLimitException should be thrown that we define like this:

package com.innoq.test.exceptions;

import com.innoq.test.ApiErrorMessage;
import org.springframework.http.HttpStatus;
import org.springframework.web.bind.annotation.ResponseStatus;

@ResponseStatus(value = HttpStatus.TOO_MANY_REQUESTS)
public class RateLimitException extends RuntimeException {

    public RateLimitException(final String message) {
        super(message);
    }

    public ApiErrorMessage toApiErrorMessage(final String path) {
        return new ApiErrorMessage(HttpStatus.TOO_MANY_REQUESTS.value(), HttpStatus.TOO_MANY_REQUESTS.name(), this.getMessage(), path);
    }
}

where ApiErrorMessage is translated to a JSON body in the response, so that our JSON API also answers with JSON in case of an error and not with the Spring default, i.e. an HTML page:

package com.innoq.test;

import com.fasterxml.jackson.annotation.JsonInclude;

import java.time.Clock;
import java.time.LocalDateTime;
import java.util.UUID;

@JsonInclude(JsonInclude.Include.NON_EMPTY)
public class ApiErrorMessage {

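    private final UUID          id        = UUID.randomUUID();
    private final int           status;
    private final String        error;
    private final String        message;
    private final LocalDateTime timestamp = LocalDateTime.now(Clock.systemUTC());
    private final String        path;

    // Constructor used by RateLimitException#toApiErrorMessage; it initializes the final fields
    public ApiErrorMessage(final int status, final String error, final String message, final String path) {
        this.status = status;
        this.error = error;
        this.message = message;
        this.path = path;
    }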

    public UUID getId() {
        return id;
    }

    public int getStatus() {
        return status;
    }

    public String getError() {
        return error;
    }

    public String getMessage() {
        return message;
    }

    public LocalDateTime getTimestamp() {
        return timestamp;
    }

    public String getPath() {
        return path;
    }

}

Let’s define an aspect that implements the rate limiting using Spring AOP. Its advice runs before the annotated endpoint method is executed:

package com.innoq.test;

import com.innoq.test.exceptions.RateLimitException;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Before;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Component;
import org.springframework.web.context.request.RequestContextHolder;
import org.springframework.web.context.request.ServletRequestAttributes;

import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

@Aspect
@Component
public class RateLimitAspect {

    public static final String ERROR_MESSAGE = "Too many requests at endpoint %s from IP %s! Please try again after %d milliseconds!";
    private final ConcurrentHashMap<String, List<Long>> requestCounts = new ConcurrentHashMap<>();

    // Canonical lower-case property names, so that app.rate.limit from the configuration file
    // and the APP_RATE_LIMIT environment variable are both picked up; 200 is the default
    @Value("${app.rate.limit:200}")
    private int rateLimit;

    @Value("${app.rate.durationinms:60000}")
    private long rateDuration;

    /**
     * Executed by each call of a method annotated with {@link WithRateLimitProtection} which should be an HTTP endpoint.
     * Counts calls per remote address. Calls older than {@link #rateDuration} milliseconds will be forgotten. If there have
     * been more than {@link #rateLimit} calls within {@link #rateDuration} milliseconds from a remote address, a {@link RateLimitException}
     * will be thrown.
     * @throws RateLimitException if the rate limit for a given remote address has been exceeded
     */
    @Before("@annotation(com.innoq.test.WithRateLimitProtection)")
    public void rateLimit() {
        final ServletRequestAttributes requestAttributes = (ServletRequestAttributes) RequestContextHolder.currentRequestAttributes();
        final String key = requestAttributes.getRequest().getRemoteAddr();
        final long currentTime = System.currentTimeMillis();
        // CopyOnWriteArrayList, because requests from the same address may arrive on several threads at once
        requestCounts.putIfAbsent(key, new CopyOnWriteArrayList<>());
        requestCounts.get(key).add(currentTime);
        cleanUpRequestCounts(currentTime);
        if (requestCounts.get(key).size() > rateLimit) {
            throw new RateLimitException(String.format(ERROR_MESSAGE, requestAttributes.getRequest().getRequestURI(), key, rateDuration));
        }
    }

    private void cleanUpRequestCounts(final long currentTime) {
        requestCounts.values().forEach(l -> {
            l.removeIf(t -> timeIsTooOld(currentTime, t));
        });
    }

    private boolean timeIsTooOld(final long currentTime, final long timeToCheck) {
        return currentTime - timeToCheck > rateDuration;
    }
}

The @Before annotation marks the method as advice: Spring AOP calls it before a method annotated with @WithRateLimitProtection is executed. We use Spring’s RequestContextHolder to fetch information on the request that called the annotated endpoint. We are especially interested in the remote address, and we obtain the current time in milliseconds from the system. Per remote address, we maintain a list of timestamps that we store in a ConcurrentHashMap in RAM. In my use case, there was only a limited group of calling remote addresses, and timestamps that are older than rateDuration are deleted by cleanUpRequestCounts in each call, so we do not run out of memory here. If requestCounts contains more entries for the current remote address than rateLimit, we throw a RateLimitException with information on who has violated the rate limit. To configure the rate limit, you add the following to your application.properties (or the equivalent to your application.yml):

app.rate.limit=200
app.rate.durationinms=60000

So we allow up to 200 calls per minute from the same remote address. As usual with Spring Boot, you can also override the values with system environment variables, e.g. export APP_RATE_LIMIT=100. Moreover, the Java code contains hard-coded defaults that apply if those properties have not been specified.
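The environment variable names follow the rule documented for Spring Boot's binding from environment variables: replace periods with underscores, remove dashes, and convert to upper case. The helper below only illustrates that naming rule; EnvVarNames and toEnvVar are hypothetical names and not part of Spring Boot.

```java
import java.util.Locale;

/** Hypothetical helper that derives the environment variable name for a property name. */
class EnvVarNames {

    /** Replaces '.' with '_', removes '-', and converts the result to upper case. */
    static String toEnvVar(String propertyName) {
        return propertyName
                .replace('.', '_')
                .replace("-", "")
                .toUpperCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(toEnvVar("app.rate.limit"));        // APP_RATE_LIMIT
        System.out.println(toEnvVar("app.rate.durationinms")); // APP_RATE_DURATIONINMS
    }
}
```

So export APP_RATE_DURATIONINMS=30000 would be the counterpart of app.rate.durationinms=30000.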

Handling Rate Limit Exceptions

With the code above, a RateLimitException is thrown if there are too many requests. We still need exception handling in Spring that translates the thrown RateLimitException to an HTTP 429 response with some information on the error in its response body. For that, we write an exception handler:

package com.innoq.test.exceptions;

import com.innoq.test.ApiErrorMessage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.core.Ordered;
import org.springframework.core.annotation.Order;
import org.springframework.http.HttpStatus;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.ExceptionHandler;

// Spring Boot 3.x uses the jakarta namespace; on Spring Boot 2.x import javax.servlet.http.HttpServletRequest instead
import jakarta.servlet.http.HttpServletRequest;

@ControllerAdvice
@Order(Ordered.HIGHEST_PRECEDENCE)
public class RateLimitExceptionHandler {
    private static final Logger LOG = LoggerFactory.getLogger(RateLimitExceptionHandler.class);

    @ExceptionHandler(RateLimitException.class)
    public ResponseEntity<ApiErrorMessage> handleRateLimitException(final RateLimitException rateLimitException, final HttpServletRequest request) {
        final ApiErrorMessage apiErrorMessage = rateLimitException.toApiErrorMessage(request.getRequestURI());
        logIncomingCallException(rateLimitException, apiErrorMessage);
        return new ResponseEntity<>(apiErrorMessage, HttpStatus.TOO_MANY_REQUESTS);
    }

    private static void logIncomingCallException(final RateLimitException rateLimitException, final ApiErrorMessage apiErrorMessage) {
        LOG.error(String.format("%s: %s", apiErrorMessage.getId(), rateLimitException.getMessage()), rateLimitException);
    }
}
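On the calling side, a client can react to the 429 status by waiting and trying again, as the error message suggests. The following sketch shows one possible way to do that in plain Java. It is not part of the server code: the class RetryOn429 and the method callWithRetry are made up, and the actual HTTP call is abstracted as an IntSupplier that returns the status code.

```java
import java.util.function.IntSupplier;

/** Hypothetical client-side sketch: retry a call while the server answers with 429. */
class RetryOn429 {

    static final int TOO_MANY_REQUESTS = 429;

    /**
     * Performs the request up to maxAttempts times and waits waitMillis after each 429.
     * Returns the status code of the last attempt.
     */
    static int callWithRetry(IntSupplier request, int maxAttempts, long waitMillis) {
        int status = request.getAsInt();
        for (int attempt = 1; attempt < maxAttempts && status == TOO_MANY_REQUESTS; attempt++) {
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // keep the interrupt flag and stop retrying
                return status;
            }
            status = request.getAsInt();
        }
        return status;
    }

    public static void main(String[] args) {
        // Simulated server: two rejections, then success.
        int[] responses = {429, 429, 200};
        int[] next = {0};
        int status = callWithRetry(() -> responses[next[0]++], 5, 10);
        System.out.println(status); // prints 200
    }
}
```

A real client would probably wait at least as long as the time mentioned in the error message instead of a fixed short delay.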

Conclusion

This blog post has shown how to limit requests per remote address for a Spring Boot server using just Spring AOP. If your requirements do not allow you to use the small, simple solution above and you can add dependencies, you may check out Resilience4j as an alternative. If you have multiple instances of your server, it might also be interesting to read this blog post, which describes a solution with Bucket4j for distributed rate limiting.
