If you're like me, you're probably using containers to deploy and run applications. Containers are an incredibly convenient way to package and run your applications consistently, with all the necessary dependencies. However, most containers I would typically build were built off an operating system Docker base image. Early in my container adoption days, it was Ubuntu 14.04, which would typically result in massive images that were no less than 1GB each. More recently, as the Docker community started using Alpine Linux to build minimal images, I started to use Alpine too.
Alpine is essentially busybox with a package manager. Alpine was much lighter than Ubuntu, but it was still a full-fledged operating system image.
However, one thing was common in both situations. My Docker images were way larger than they needed to be. That's because I was running an entire OS image, with package managers, shells, and myriad utilities that were completely unnecessary for my applications. My apps were typically just a Python or NodeJS app, or a single binary, that I needed running in a container. Running an entire OS just for this was overkill, not only from a performance perspective but from a security perspective as well.
I think the logic behind this is pretty simple. The more code you run in your application environment, the higher the chance of a vulnerability creeping into said environment. This directly correlates with the possibility of pwnage of said environment, with an attacker being able to leverage existing exploits against the programs running in the OS.
This is when I found distroless from Google.
Distroless images allow you to package only your application and its dependencies in a container image and run the container with a really light footprint.
Distroless container images come with NO package manager, shell, or other programs that ship with a typical OS container image, thereby reducing not only unnecessary code but the attack surface along with it.
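If you want to see what an image actually ships, one rough approach is to look for common tools on its PATH. The sketch below is a hypothetical helper, not anything provided by distroless: the name `find_tools` and the list of tool names are my own assumptions, and it only needs a Python interpreter in the image.

```python
import os


def find_tools(names, path_dirs=None):
    """Return {tool name: full path or None} for each name in `names`.

    `path_dirs` defaults to the directories listed in the PATH variable.
    """
    if path_dirs is None:
        path_dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    found = {}
    for name in names:
        found[name] = None
        for directory in path_dirs:
            candidate = os.path.join(directory, name)
            # A tool counts as present only if it is an executable file.
            if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
                found[name] = candidate
                break
    return found


if __name__ == "__main__":
    # Hypothetical list of tools an attacker would find handy.
    tools = ["sh", "bash", "apt-get", "apk", "curl"]
    for tool, location in sorted(find_tools(tools).items()):
        print("%-8s %s" % (tool, location or "not found"))
```

Running the same script in an OS-based image and in a distroless image gives a quick side-by-side view of what each one carries beyond your application.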
Since v17.05, Docker supports multi-stage builds. Multi-stage builds in Docker are an effort to compose minimal images. When building Docker images, it's a challenge to keep the size of the image down. Each instruction in a Dockerfile adds a layer (and code) to the Docker image. Multi-stage builds allow you to build the program in one container image and copy over only the required artifacts from that first image to a target image, which is what is used to run your program. Here's an example of a multi-stage build from the Docker website:
FROM golang:1.7.3
WORKDIR /go/src/github.com/alexellis/href-counter/
RUN go get -d -v golang.org/x/net/html
COPY app.go .
RUN CGO_ENABLED=0 GOOS=linux go build -a -installsuffix cgo -o app .

FROM alpine:latest
RUN apk --no-cache add ca-certificates
WORKDIR /root/
COPY --from=0 /go/src/github.com/alexellis/href-counter/app .
CMD ["./app"]
In the above example, you'll observe that there are 2 FROM statements. In the first (build) image, we are downloading dependencies and compiling a golang program. In the second (target) image, we are using the --from=0 flag on the COPY instruction to copy the artifacts from the first build image and run them in the target image.
Distroless containers leverage the same feature: I can build my Python package in a build container image, which doesn't need to be hardened or minimal, and copy all the "built" artifacts to a target container that runs the minimal distroless image with the application and its runtime dependencies.
In this example, I am going to demonstrate distroless with a Python 2.7 Flask web application. In the first example, I will run my Python app on an ubuntu:14.04 Docker image. In the subsequent example, I will run the same Python app on a distroless python2.7 container image.
FROM ubuntu:14.04
ADD . /app
WORKDIR /app
RUN apt-get update && apt-get install -y python-pip
RUN pip install -r requirements.txt
CMD ["python", "app.py"]
This is a pretty simple Dockerfile. I am using the base image ubuntu:14.04, copying files to a workdir /app, installing my dependencies, and running my Python program on the Ubuntu 14.04 container.
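# Stage 1: install the app and its dependencies in a regular Python image
FROM python:2.7-slim AS build-env
ADD . /app
WORKDIR /app
RUN pip install --upgrade pip
RUN pip install -r ./requirements.txt

# Stage 2: copy only the app and its installed packages into distroless
FROM gcr.io/distroless/python2.7
COPY --from=build-env /app /app
COPY --from=build-env /usr/local/lib/python2.7/site-packages /usr/local/lib/python2.7/site-packages
WORKDIR /app
ENV PYTHONPATH=/usr/local/lib/python2.7/site-packages
# The image's entrypoint is the Python interpreter, so CMD only names the script
CMD ["app.py"]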
The distroless image is a stark contrast to the Ubuntu example. Here, I am building my program and dependencies in the first (build) container image, which I am calling build-env, and referencing the artifacts of build-env in a target distroless container. The target is built on the gcr.io/distroless/python2.7 image, which is meant to run only Python 2.7 applications and their dependencies. I am using Docker 18.09, where I am leveraging multi-stage builds for this.
After I built these images, I pushed them to the Docker Hub and used Clair (Container Vulnerability Scanning Tool) to scan them for vulnerabilities. I also verified the results with a commercial container security scanning service, Anchore. The results were quite revealing.
When I scanned the Flask app built on Ubuntu 14.04, I found a bunch of flaws right out of the gate. Some were medium and some low, but all of the flaws pertained to packages and apps that had NOTHING to do with my Python application.
Running the same scan on the distroless container image gave me 0 findings with both Clair and Anchore. This is reasonably clear evidence that the distroless container didn't have any unnecessary packages that could lead to more vulnerabilities (and thus exploits) being identified.
In addition, I saw that my Python app in the distroless container image was 22MB, while the same app on the Ubuntu 14.04 image was a whopping 273MB. That is a pretty significant amount of bloat, just because I added an OS image to the mix.
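To put those two numbers side by side, here is a small arithmetic sketch. The function name `size_comparison` is my own; the only inputs are the 22MB and 273MB figures reported above.

```python
def size_comparison(small_mb, large_mb):
    """Return (ratio, percent_saved) when moving from large_mb to small_mb."""
    ratio = large_mb / float(small_mb)
    percent_saved = (large_mb - small_mb) / float(large_mb) * 100.0
    return ratio, percent_saved


# Image sizes in MB, as measured above.
ratio, saved = size_comparison(22, 273)
print("Ubuntu image is %.1fx larger; distroless saves %.1f%%" % (ratio, saved))
# prints: Ubuntu image is 12.4x larger; distroless saves 91.9%
```

In other words, the Ubuntu-based image is roughly twelve times the size of the distroless one for the same app.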