Preventing the PID 1 Zombie Reaping Problem in Docker

By richardtylee

Our team had been seeing instability in many of our Docker environments on Elastic Beanstalk.  This usually meant we had to rebuild the environments to get them working again.  While researching possible causes, we came across a post about the PID 1 zombie reaping problem.  I won't go into detail on why this is a problem, as that post covers it pretty thoroughly.  Our problem was this: on deploys, zombie processes get left behind when we kill a container's process to start a new one.

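To resolve this issue, we must understand the difference between what I'll call eval and exec.  With eval (the Docker documentation calls this the shell form), a shell starts first and spawns the command as a child process; with exec, the command replaces the current process instead of running under it, which is what we want.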
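The difference is easy to observe from any POSIX shell by comparing PIDs.  The snippet below is only an illustrative sketch, not something from our setup; the variable names are made up for the example.

```shell
# Spawning: command substitution runs the inner shell in a separate process,
# so the PID it prints differs from this shell's PID.
parent_pid=$$
child_pid=$(sh -c 'echo $$')

# exec: the outer sh prints its PID, then replaces itself with a second sh.
# The second sh prints the same PID because no new process was created.
exec_pids=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
pid_before_exec=$(printf '%s\n' "$exec_pids" | sed -n 1p)
pid_after_exec=$(printf '%s\n' "$exec_pids" | sed -n 2p)

echo "spawned: parent=$parent_pid child=$child_pid"
echo "exec:    before=$pid_before_exec after=$pid_after_exec"
```

The first line of output should show two different PIDs, the second the same PID twice.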

The CMD instruction in a Dockerfile can be written in two forms that matter here:

CMD command param1 param2

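uses the shell form: Docker runs the command through /bin/sh -c, so the command ends up as a child of that shell (the eval case above)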

CMD ["command","param1","param2"]

uses the exec form: Docker runs the command directly, with no shell in between

Details on the CMD instruction can be found in the Dockerfile reference.

Note that this alone is not a complete solution to our problem.  If the command is a start script, the script also needs to exec its final command.  Otherwise the script's shell stays on as the container's main process, and the real server runs as its child.

For example, if the start script ended with:

unicorn -c docker/config/unicorn.rb

it would need to be changed to:

exec unicorn -c docker/config/unicorn.rb
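To see the effect without a real server, here is a rough sketch that builds two throwaway start scripts and compares PIDs.  Everything in it is hypothetical: fake_server.sh merely stands in for unicorn, and the file names are invented for the demo.

```shell
# Scratch directory for the demo scripts (all names below are hypothetical).
demo_dir=$(mktemp -d)

# Stand-in for the real server (unicorn in this post): it only reports its PID.
cat > "$demo_dir/fake_server.sh" <<'EOF'
echo "server=$$"
EOF

# Start script that ends with a plain command: the server becomes a child.
cat > "$demo_dir/start_plain.sh" <<EOF
echo "script=\$\$"
sh "$demo_dir/fake_server.sh"
EOF

# Start script that ends with exec: the server takes over the script's process.
cat > "$demo_dir/start_exec.sh" <<EOF
echo "script=\$\$"
exec sh "$demo_dir/fake_server.sh"
EOF

# Run both and flatten each two-line output onto one line for comparison.
plain_out=$(sh "$demo_dir/start_plain.sh" | tr '\n' ' ')
exec_out=$(sh "$demo_dir/start_exec.sh" | tr '\n' ' ')
echo "without exec: $plain_out"
echo "with exec:    $exec_out"
rm -rf "$demo_dir"
```

Without exec, the script and the server should report different PIDs; with exec, they should report the same one, which is the behaviour we want for the container's main process.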

Thanks to Tung, Benson and Eddie for explaining this to me.