
Reducing the size of a Python application Docker image using Python wheels

With Docker multi-stage builds and Python wheels, we compile the dependencies in the first stage and install them in the second stage.

10 March 2019 Updated 30 August 2019

When using Docker, we want the image to be as small as possible. Why? There are many reasons. Disk usage is one, especially when running many Python Flask websites on an ISPConfig3 server. Fortunately, since Docker 17.05 we can use multi-stage builds. Using them, we reduced our image size from 376MB to 211MB!

Below are two Python base images we can choose from:

> docker images

REPOSITORY   TAG          IMAGE ID       CREATED        SIZE
python       3.6-alpine   1837080c5e87   5 weeks ago    74.4MB
python       3.6.7        1ec4d11819ad   2 months ago   918MB

74MB vs. 918MB is a huge difference. Of course, the small size comes at a cost: many programs and scripts are left out of the Alpine image, so sometimes we run into problems.

When debugging on Alpine, we can easily add programs like telnet and netstat:

> apk add busybox-extras
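Debugging tools are usually not wanted in the production image. One possible way to keep them out, which is not part of the original setup, is a separate debug stage in a multi-stage Dockerfile. The sketch below uses the hypothetical stage names app and debug; the debug image is only produced when you ask for it with docker build --target debug.

```dockerfile
# Hypothetical stage name "app": the normal application image
FROM python:3.6-alpine AS app
# ... install the dependencies and copy the application code here ...

# Hypothetical stage name "debug": the application plus telnet etc.
FROM app AS debug
RUN apk add --no-cache busybox-extras

# The last stage is what a plain "docker build ." produces:
# the application without the debugging tools
FROM app
```

With this layout, docker build -t myapp . gives the lean image and docker build --target debug -t myapp:debug . gives the debug variant (myapp is a placeholder tag).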

For the Flask app that runs this blog, requirements.txt is:

alembic==1.0.0
asn1crypto==0.24.0
Babel==2.6.0
beautifulsoup4==4.6.3
cffi==1.11.5
Click==7.0
cryptography==2.3.1
Flask==1.0.2
Flask-Babel==0.11.2
Flask-Login==0.4.1
Flask-Session==0.3.1
Flask-WTF==0.14.2
gunicorn==19.9.0
html2text==2018.1.9
idna==2.7
itsdangerous==0.24
Jinja2==2.10
Mako==1.0.7
MarkupSafe==1.0
Pillow==5.3.0
pycparser==2.19
PyMySQL==0.9.2
python-dateutil==2.7.3
python-editor==1.0.3
python-magic==0.4.15
python-slugify==1.2.6
pytz==2018.5
six==1.11.0
SQLAlchemy==1.2.12
Unidecode==1.0.22
Werkzeug==0.14.1
WTForms==2.2.1

With python:3.6-alpine, building our image fails with errors, in this case for cffi and Pillow:

  • 'No working compiler found' error.
  • 'The headers or library files could not be found for jpeg, a required dependency when compiling Pillow from source.'

The solutions, found here, involve adding more packages to the image:

  • CFFI dependencies bloats Docker image #458
    https://github.com/gliderlabs/docker-alpine/issues/458
    Add to Dockerfile: RUN apk add --no-cache curl python3 pkgconfig python3-dev openssl-dev libffi-dev musl-dev make gcc
  • How to install pillow, psycopg, pylibmc packages in python:alpine image
    https://blog.sneawo.com/blog/2017/09/07/how-to-install-pillow-psycopg-pylibmc-packages-in-pythonalpine-image/
    Add to Dockerfile: RUN apk add --no-cache jpeg-dev zlib-dev

Unfortunately our image has now grown to 376MB!

Rewrite Dockerfile for multi-stage

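Since Docker 17.05 we can use multi-stage builds. The idea is that we build things in a first stage, which contains all the compilers and header files needed, and then copy only the result into the final image, which starts again from a clean base image.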
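Stripped down to a single package, the pattern looks roughly like this. It is a minimal sketch, assuming the stage name builder and using MarkupSafe==1.0 from requirements.txt as the example because it contains a C extension and therefore needs gcc and musl-dev to build:

```dockerfile
# Stage 1: has the compiler and headers, builds a wheel into /wheels
FROM python:3.6-alpine AS builder
RUN apk add --no-cache gcc musl-dev \
    && pip wheel --wheel-dir=/wheels MarkupSafe==1.0

# Stage 2: starts again from the clean base image; gcc and musl-dev are
# not carried over, only what is copied explicitly with COPY --from
FROM python:3.6-alpine
COPY --from=builder /wheels /wheels
RUN pip install --no-index --find-links=/wheels MarkupSafe==1.0
```

The key lines are FROM ... AS builder, which names the first stage, and COPY --from=builder, which is the only way anything from that stage reaches the final image.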

Rewriting the Dockerfile from:

FROM python:3.6-alpine
MAINTAINER Peter Mooring peterpm@xs4all.nl peter@petermooring.com

# create and set working directory
RUN mkdir -p /home/flask/app/web
WORKDIR /home/flask/app/web

# install package dependencies
COPY requirements.txt ./
# Solve 'No working compiler found' error, 
# see: https://github.com/gliderlabs/docker-alpine/issues/458
RUN apk add --no-cache curl python3 pkgconfig python3-dev openssl-dev libffi-dev musl-dev make gcc \
# Solve 'The headers or library files could not be found for jpeg, a required dependency when compiling Pillow from source.', 
# see https://blog.sneawo.com/blog/2017/09/07/how-to-install-pillow-psycopg-pylibmc-packages-in-pythonalpine-image/
    jpeg-dev zlib-dev \
    libmagic \
  && pip install --no-cache-dir -r ./requirements.txt \
  && rm -rf /var/cache/apk/*

# copy app code into container
COPY . ./

# create group and user used in this container
RUN addgroup flaskgroup && adduser -D flaskuser -G flaskgroup
RUN chown -R flaskuser:flaskgroup /home/flask

USER flaskuser

to:

FROM python:3.6-alpine as base
MAINTAINER Peter Mooring peterpm@xs4all.nl peter@petermooring.com

RUN mkdir /svc
WORKDIR /svc
COPY requirements.txt .

# install package dependencies
# COPY requirements.txt /requirements.txt, requirements.txt already copied 
# Solve 'No working compiler found' error, 
# see: https://github.com/gliderlabs/docker-alpine/issues/458
# Solve 'The headers or library files could not be found for jpeg, a required dependency when compiling Pillow from source.', 
# see https://blog.sneawo.com/blog/2017/09/07/how-to-install-pillow-psycopg-pylibmc-packages-in-pythonalpine-image/


RUN rm -rf /var/cache/apk/* && \
    rm -rf /tmp/*

RUN apk update

# Build a wheel for every package in requirements.txt; the wheels go to /svc/wheels.

RUN apk add --update \
    curl \
    python3 \
    pkgconfig \
    python3-dev \
    openssl-dev \
    libffi-dev \
    musl-dev \
    make \
    gcc \
    jpeg-dev zlib-dev \
    libmagic \
    && rm -rf /var/cache/apk/* \
    && pip wheel -r requirements.txt --wheel-dir=/svc/wheels

# the wheels are now here: /svc/wheels

FROM python:3.6-alpine

RUN apk add --no-cache \
    jpeg-dev zlib-dev \
    libmagic

COPY --from=base /svc /svc

WORKDIR /svc
RUN pip install --no-index --find-links=/svc/wheels -r requirements.txt

# Removing the wheels after installation does not free up space: they were added
# by the COPY --from layer above, and deleting them in a later layer only hides them.
# Too bad, it is some 20MB.
#RUN rm -R *

# create and set working directory
RUN mkdir -p /home/flask/app/web
WORKDIR /home/flask/app/web

# copy app code into container
COPY . ./

# create group and user used in this container
RUN addgroup flaskgroup && adduser -D flaskuser -G flaskgroup && chown -R flaskuser:flaskgroup /home/flask

USER flaskuser

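The Python wheels directory /svc/wheels is listed below. Most wheels are pure Python (tagged py2.py3-none-any or py3-none-any). Only the packages with C extensions (cffi, cryptography, MarkupSafe, Pillow and SQLAlchemy) are platform-specific (tagged cp36-cp36m-linux_x86_64). Those were compiled against Alpine's musl libc, which is why the second stage uses the same python:3.6-alpine base image as the first: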

-rw-r--r-- 1 root root 8098645 Feb 15 12:56 Babel-2.6.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 81299 Feb 15 12:56 Click-7.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 91364 Feb 15 12:56 Flask-1.0.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 9267 Feb 15 12:56 Flask_Babel-0.11.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 4936158 Feb 15 12:56 Flask_CKEditor-0.4.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 15935 Feb 15 12:57 Flask_Login-0.4.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 7535 Feb 15 12:56 Flask_Session-0.3.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 14903 Feb 15 12:56 Flask_WTF-0.14.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 126381 Feb 15 12:56 Jinja2-2.10-py2.py3-none-any.whl
-rw-r--r-- 1 root root 76583 Feb 15 12:57 Mako-1.0.7-py3-none-any.whl
-rw-r--r-- 1 root root 29273 Feb 15 12:57 MarkupSafe-1.0-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 1101554 Feb 15 12:57 Pillow-5.3.0-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 47758 Feb 15 12:56 PyMySQL-0.9.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 1144841 Feb 15 12:57 SQLAlchemy-1.2.12-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 235421 Feb 15 12:56 Unidecode-1.0.22-py2.py3-none-any.whl
-rw-r--r-- 1 root root 166353 Feb 15 12:56 WTForms-2.2.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 322863 Feb 15 12:56 Werkzeug-0.14.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 158276 Feb 15 12:56 alembic-1.0.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 101571 Feb 15 12:56 asn1crypto-0.24.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 90375 Feb 15 12:56 beautifulsoup4-4.6.3-py3-none-any.whl
-rw-r--r-- 1 root root 385610 Feb 15 12:56 cffi-1.11.5-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 813672 Feb 15 12:57 cryptography-2.3.1-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 112930 Feb 15 12:56 gunicorn-19.9.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 21118 Feb 15 12:56 html2text-2018.1.9-py3-none-any.whl
-rw-r--r-- 1 root root 58213 Feb 15 12:56 idna-2.7-py2.py3-none-any.whl
-rw-r--r-- 1 root root 10622 Feb 15 12:57 itsdangerous-0.24-py3-none-any.whl
-rw-r--r-- 1 root root 111031 Feb 15 12:57 pycparser-2.19-py2.py3-none-any.whl
-rw-r--r-- 1 root root 211414 Feb 15 12:56 python_dateutil-2.7.3-py2.py3-none-any.whl
-rw-r--r-- 1 root root 6686 Feb 15 12:57 python_editor-1.0.3-py3-none-any.whl
-rw-r--r-- 1 root root 5543 Feb 15 12:56 python_magic-0.4.15-py2.py3-none-any.whl
-rw-r--r-- 1 root root 4595 Feb 15 12:57 python_slugify-1.2.6-py2.py3-none-any.whl
-rw-r--r-- 1 root root 510974 Feb 15 12:56 pytz-2018.5-py2.py3-none-any.whl
-rw-r--r-- 1 root root 10702 Feb 15 12:56 six-1.11.0-py2.py3-none-any.whl

Summary

Before: 376MB, after: 211MB.

We did this using a multi-stage build:

Stage 1:

  • Build Python wheels for requirements.txt

Stage 2:

  • Copy Python wheels from stage 1
  • Install dependencies using Python wheels
  • Do other things like creating the user and copying the code

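Unfortunately, with this Dockerfile the /svc/wheels directory copied from stage 1 stays in the final image: we cannot install straight from stage 1's directory, and deleting the wheels in a later layer does not shrink the image. Getting rid of them would save another 25MB!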
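Newer Docker releases with BuildKit enabled (for example DOCKER_BUILDKIT=1 docker build .) offer a way around this that was not used in this post. Instead of copying /svc into the final image, the second stage can bind-mount it from the first stage only while pip install runs, so the wheels never land in an image layer. The sketch below assumes BuildKit and the RUN --mount syntax; everything else follows the Dockerfile above.

```dockerfile
# syntax=docker/dockerfile:1
# Requires BuildKit, e.g.: DOCKER_BUILDKIT=1 docker build -t myapp .
FROM python:3.6-alpine AS base
WORKDIR /svc
COPY requirements.txt .
# Same build dependencies as in the Dockerfile above
RUN apk add --no-cache curl python3 pkgconfig python3-dev openssl-dev libffi-dev \
        musl-dev make gcc jpeg-dev zlib-dev libmagic \
    && pip wheel -r requirements.txt --wheel-dir=/svc/wheels

FROM python:3.6-alpine
RUN apk add --no-cache jpeg-dev zlib-dev libmagic

# /svc from the base stage is mounted (read-only) for this RUN only,
# so the wheels are used for installation but not stored in the layer
RUN --mount=type=bind,from=base,source=/svc,target=/svc \
    pip install --no-cache-dir --no-index --find-links=/svc/wheels -r /svc/requirements.txt

# create and set working directory, copy app code, create user
RUN mkdir -p /home/flask/app/web
WORKDIR /home/flask/app/web
COPY . ./
RUN addgroup flaskgroup && adduser -D flaskuser -G flaskgroup && chown -R flaskuser:flaskgroup /home/flask
USER flaskuser
```

Here myapp is a placeholder tag. With the classic (non-BuildKit) builder the --mount flag is not understood, so this variant only applies when BuildKit is used.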

Links / credits

Building Minimal Docker Containers for Python Applications
https://blog.realkinetic.com/building-minimal-docker-containers-for-python-applications-37d0272c52f3

How do I reduce a python (docker) image size using a multi-stage build? (**python specific**)
https://stackoverflow.com/questions/48543834/how-do-i-reduce-a-python-docker-image-size-using-a-multi-stage-build-pytho

Leveraging Docker multi-stage builds in Python development
https://www.merixstudio.com/blog/docker-multi-stage-builds-python-development/

Lighter Python images using multi-stage Dockerfile
https://lekum.org/post/multistage-dockerfile/

Smaller Python Docker Containers with Multi-Stage Builds and Python Wheels
https://softwarejourneyman.com/docker-python-install-wheels.html

Use multi-stage builds
https://docs.docker.com/develop/develop-images/multistage-build/

Comments (1)

You can use
RUN pip install --no-cache /wheels/* \
&& rm -rf /wheels/*
to delete wheels