Docker Swarm, persistent connections and Spring Boot graceful shutdown


The problem

Spring Boot containers in Docker Swarm still receive requests after a shutdown has been triggered, which leads to errors.

Background

We have a production environment running on Docker Swarm, based mainly on Spring Boot microservices. The containers communicate with each other over HTTP (with RestTemplate) through the Docker Swarm overlay network.

Swarm's internal load balancing (VIP) is used: services call each other by service name. To ensure 100% availability, services are updated with rolling updates, with 2 instances of each service container running.

Analysis and experiment

The issue is easily reproduced. You just need 2 services. A ‘server’ service with just 1 endpoint:

and a ‘client’ service which calls this endpoint through RestTemplate:

The application context shutdown needs to be delayed a bit to emulate a practical situation. I use a simple dummy component for that:

Now, this is built into Java jars and Docker images. We also use a health check in the Dockerfile, based on the Spring health endpoint, to ensure the component is only called after it is fully ‘up’:

This is then deployed to a Docker Swarm with docker stack deploy stack using the following compose file:

I then use JMeter to put continuous load on the client ‘hello’ endpoint on port 8080, while updating the server service:

It doesn’t take long to get errors in the JMeter responses, and if I look closely at the (debug) logging I find the error in a ‘server’ service component:

Failed to complete request: org.springframework.beans.factory.BeanCreationNotAllowedException: Error creating bean with name 'controller': Singleton bean creation not allowed while singletons of this factory are in destruction (Do not request a bean from a BeanFactory in a destroy method implementation!)

Exactly the error we see in our production environment.

When I look at the time of this error, I discover that it occurs after the component has started shutting down. Is Docker still sending requests to the component after the SIGTERM signal has been sent? Or is a connection still open and being reused?

The latter seems to be the case. To prove it, we redeploy the client component with connection keep-alive turned off. Because RestTemplate uses Java’s HttpURLConnection by default, this can be done with a JVM system property:

System.setProperty("http.keepAlive", "false");
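One detail worth keeping in mind: as far as I know, the JDK's HTTP client reads this property once, when it is first used, so it should be set before the first outgoing call (for example at the very start of main) or passed on the command line. The following is a minimal JDK-only sketch of that ordering, not the code from our services. The class name, the throwaway local server and the /hello path are assumptions made for illustration. It reports which Connection header the client actually sent.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of the original services.
class KeepAliveOffDemo {

    /** Sends one GET to a throwaway local server and returns the Connection header that server saw. */
    static String connectionHeaderSeenByServer() {
        // Set this before the first HttpURLConnection is opened in this JVM.
        System.setProperty("http.keepAlive", "false");

        AtomicReference<String> seen = new AtomicReference<>();
        try {
            // Port 0 lets the OS pick a free port; this server only stands in for the 'server' service.
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext("/hello", exchange -> {
                seen.set(exchange.getRequestHeaders().getFirst("Connection"));
                byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                exchange.getResponseBody().write(body);
                exchange.close();
            });
            server.start();
            try {
                String url = "http://127.0.0.1:" + server.getAddress().getPort() + "/hello";
                HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
                try (InputStream in = conn.getInputStream()) {
                    in.readAllBytes(); // consume the response fully
                }
            } finally {
                server.stop(0);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println("Connection header sent: " + connectionHeaderSeenByServer());
    }
}
```

If the property is set only after the application has already made an HTTP call, the setting may have no effect, which is why the -D command-line form is the safer choice in a container.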

It works! The error is gone after this modification.

Of course, there is a penalty for not using persistent connections. The client has to create a connection for every call and the handshake is done every time, leading to more latency. Also, more connections may be used in total.

However, I did some measurements with JMeter in the Swarm and I saw no substantial performance loss.

Now you can argue that this is a bug in Spring Boot. It would be better if Spring Boot closed all (unused) connections as the first step in its shutdown process.

Indeed, there is an issue registered with Spring Boot for this, https://github.com/spring-projects/spring-boot/issues/4657, in which ‘wilkinsona’ proposes a fix. Implementing his suggestion in the above setup also fixes the problem.

But it has to be done manually in every Spring Boot component in the Swarm.
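In outline, the suggestion in that issue is to pause the Tomcat connector when the application context starts closing, and then to shut down the connector's request thread pool and wait, with a timeout, for in-flight requests to finish. The sketch below shows only that second, draining step with plain JDK classes. The Tomcat and Spring wiring is deliberately left out, a plain ExecutorService stands in for the connector's pool, and the class and method names are made up for illustration, so treat it as a reading aid rather than the fix itself.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper illustrating the "stop intake, then drain" idea.
class DrainOnShutdown {

    /**
     * Stops accepting new tasks and waits for running ones.
     * Returns true if all tasks finished within the timeout.
     */
    static boolean drain(ExecutorService requestPool, long timeout, TimeUnit unit) {
        requestPool.shutdown(); // already submitted tasks keep running, new ones are rejected
        try {
            return requestPool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    /** Sleeps without forcing callers to handle InterruptedException. */
    static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.execute(() -> sleepQuietly(200)); // stands in for a request still being handled
        System.out.println("drained cleanly: " + drain(pool, 5, TimeUnit.SECONDS));
    }
}
```

The order matters: intake has to stop before the wait begins, otherwise a reused keep-alive connection can keep feeding new requests into a context that is already being destroyed.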

Conclusion

To summarize, keep-alive (persistent) connections conflict with rolling updates when Spring Boot containers are called through RestTemplate. Existing connections to a component are still used while it is in (graceful) shutdown.

The two possible solutions for this are:

• Disabling persistent connections, either as shown above or with -Dhttp.keepAlive=false. Keep-alive can also be switched off per endpoint, by sending the Connection: close header on those requests.

• Implementing the fix described in https://github.com/spring-projects/spring-boot/issues/4657.
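For the per-endpoint variant of the first option, a minimal sketch with the plain JDK client could look like this. The class name and the URL http://server:8080/hello are placeholders, not the real service address, and the same header would have to be set through whatever HTTP client a service actually uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;

// Hypothetical client helper; only the Connection header handling is the point here.
class ConnectionCloseClient {

    /** Prepares a GET that asks for the connection to be closed after this one response. */
    static HttpURLConnection openWithoutKeepAlive(String url) {
        try {
            HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setRequestProperty("Connection", "close"); // opt out of reuse for this request only
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Performs the request and returns the response body as text. */
    static String get(String url) {
        HttpURLConnection conn = openWithoutKeepAlive(url);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder URL; replace with a reachable endpoint before running.
        System.out.println(get("http://server:8080/hello"));
    }
}
```

This keeps persistent connections for all other calls, so the latency cost is paid only on the endpoints that are affected by rolling updates.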

Solved.
