How to Make Docker Images Smaller: App Code Separation

By richardtylee

Application code, with its source files, images, and other assets, often takes up hundreds of megabytes. For example, in one of our images, the application code is about 400 MB. To separate this code from the image, we first create a baseline image to use for our Rails application. It includes all the libraries and applications we typically want in our Rails images, like mysql-client, vim, htop, etc. We'll call it docker_rails; here is the Dockerfile.

Next, we build an image for the web application, railsapp. To differentiate it from the existing railsapp repo, we are calling it docker_railsapp.

Let's look at the Dockerfile and go through how this works:

FROM richardtylee/docker_rails
 
ADD start_railsapp.sh start_railsapp.sh
 
EXPOSE 3000
 
ENV HOME / 
ENV ROLE app
 
CMD ./start_railsapp.sh

Line 1: Uses richardtylee/docker_rails as our base image.

Line 10: Runs a script to clone and start railsapp.

Now let's take a look at the start_railsapp.sh script:

#!/bin/bash

# Shallow clone railsapp repo 
git clone --depth 1 https://github.com/richardtylee/railsapp.git /app
 
cd /app
 
# TODO: Figure out bundle cache
bundle install
 
# Start rails server 
rails s

Line 4: Shallow clone of railsapp. A shallow clone fetches only the latest commit, not the whole git history. This saves space and time.

Line 9: Installs the gems. We could save time by keeping the gems in a cache, but that's outside the scope of this proof of concept.
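For readers curious about the gem cache mentioned in the TODO, here is one possible sketch, not something this proof of concept does. It assumes a host directory is mounted into the container at /bundle_cache (a hypothetical path) and relies on Bundler's BUNDLE_PATH setting together with the common `bundle check || bundle install` idiom. The function name install_gems is made up for illustration.

```shell
#!/bin/bash
# Sketch only: reuse gems across container starts via a mounted volume.
# /bundle_cache is a hypothetical mount point, e.g. docker run -v <host_dir>:/bundle_cache ...
export BUNDLE_PATH="${BUNDLE_PATH:-/bundle_cache}"

install_gems() {
  # bundle check exits non-zero when gems required by the Gemfile are missing,
  # so bundle install only runs when the cache cannot satisfy the Gemfile.
  bundle check || bundle install
}
```

In start_railsapp.sh, a call to install_gems would take the place of the plain bundle install on line 9.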

Now, we build the image.

REPOSITORY                     TAG                 VIRTUAL SIZE
richardtylee/docker_railsapp   latest              919.3 MB
richardtylee/docker_rails      latest              919.3 MB

As we can see, docker_railsapp is the same size as docker_rails: railsapp adds nothing to the image size.
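For reference, here is a rough sketch of commands that would build the image and produce a size listing like the one above. It assumes the Dockerfile and start_railsapp.sh sit in the current directory. The function names are only for illustration, and the port mapping in run_app is an assumption based on the EXPOSE 3000 line.

```shell
#!/bin/bash
# Sketch: build docker_railsapp and compare its size with the base image.
IMAGE="richardtylee/docker_railsapp"
BASE="richardtylee/docker_rails"

build_image() {
  # Build from the Dockerfile in the current directory and tag the result
  docker build -t "$IMAGE" .
}

list_sizes() {
  # docker images accepts a repository name to narrow the listing
  docker images "$BASE"
  docker images "$IMAGE"
}

run_app() {
  # Publish container port 3000 on the host, detached
  docker run -d -p 3000:3000 "$IMAGE"
}
```

Calling build_image and then list_sizes covers the build step described here; run_app would start a container that clones and boots railsapp.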

There are some pros and cons to this app code separation approach.

Pros

  • Smaller image sizes.
  • Separation of concerns.  Removes the redundancy of having the code in both the Docker Hub image and the GitHub repo.  The Dockerfile is separated from the app code.
  • On an EC2 deploy, the docker pull step will be shorter because there is less to download. The saving will probably be small.

Cons

  • Adds a dependency on GitHub. GitHub is generally reliable, so this may be minor.
  • Deploying on an EC2 instance will take longer, as git clone is run after docker pull.
  • Rollbacks will also take longer, because git clone has to run again.
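On the rollback point, one way the start script could choose which revision to clone is sketched below. This is an illustration, not part of the setup described here: APP_REF is a hypothetical environment variable holding a branch or tag name, and clone_app is a made-up function name. It relies on git clone's --branch option, which also accepts tags.

```shell
#!/bin/bash
# Sketch: let the container pick the revision to clone at start time.
# APP_REF is a hypothetical variable, e.g. set with docker run -e APP_REF=<tag>.
REPO_URL="https://github.com/richardtylee/railsapp.git"

clone_app() {
  dest="${1:-/app}"
  if [ -n "${APP_REF:-}" ]; then
    # --branch accepts tag names as well as branches, so a release tag works
    git clone --depth 1 --branch "$APP_REF" "$REPO_URL" "$dest"
  else
    # No ref given: shallow clone of the default branch
    git clone --depth 1 "$REPO_URL" "$dest"
  fi
}
```

A rollback would then mean restarting the container with APP_REF set to an earlier tag. The clone still has to run, so the time cost noted above remains, but no new image is needed.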

The above example uses a public repo.  If you want to use a private repo, SSH access needs to be set up in the Dockerfile.  Here's a code snippet:

ADD railsapp_id_rsa /root/.ssh/id_rsa
 
# Create known_hosts
RUN touch /root/.ssh/known_hosts && \
    ssh-keyscan github.com >> /root/.ssh/known_hosts

This was set up using these instructions. It copies the SSH deploy key for railsapp into the image and allows the container to clone the railsapp repo.
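With the key in place, the clone step in start_railsapp.sh would also need to use the SSH form of the repo URL. A minimal sketch follows, assuming the key path from the snippet above. The function name clone_private_app and the KEY_FILE variable are hypothetical.

```shell
#!/bin/bash
# Sketch: clone over SSH once the deploy key and known_hosts are in the image.
KEY_FILE="${KEY_FILE:-/root/.ssh/id_rsa}"
# SSH form of the railsapp repo URL
REPO_SSH_URL="git@github.com:richardtylee/railsapp.git"

clone_private_app() {
  # Fail early with a clear message if the deploy key was not added
  if [ ! -f "$KEY_FILE" ]; then
    echo "deploy key missing: $KEY_FILE" >&2
    return 1
  fi
  git clone --depth 1 "$REPO_SSH_URL" "${1:-/app}"
}
```

The rest of the script (cd /app, bundle install, rails s) would stay the same.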

Props to Sid Patel for pairing with me on this concept and Tung Nguyen for the code review and mentorship.