Reducing the size of a Python application Docker image using Python wheels
With Docker multi-stage builds and Python wheels, we compile the dependencies in the first stage and install them in the second stage.
When using Docker we want the image to be as small as possible. Why? Many reasons. The footprint on the server is one, especially when running many Python Flask websites on an ISPConfig3 server. Fortunately, from Docker 17.05 onwards we can use multi-stage builds. With these we reduce our image size from 376MB to 211MB!
Below are two images we can use:
> docker images
REPOSITORY   TAG          IMAGE ID       CREATED        SIZE
python       3.6-alpine   1837080c5e87   5 weeks ago    74.4MB
python       3.6.7        1ec4d11819ad   2 months ago   918MB
74 MB vs 918 MB is a huge difference. Of course the small size comes at a cost. Many programs and scripts have been removed from the Alpine image, so we may sometimes run into problems.
When debugging on Alpine, we can easily add programs such as telnet and netstat:
> apk add busybox-extras
For this blog Flask app the requirements.txt is:
alembic==1.0.0
asn1crypto==0.24.0
Babel==2.6.0
beautifulsoup4==4.6.3
cffi==1.11.5
Click==7.0
cryptography==2.3.1
Flask==1.0.2
Flask-Babel==0.11.2
Flask-Login==0.4.1
Flask-Session==0.3.1
Flask-WTF==0.14.2
gunicorn==19.9.0
html2text==2018.1.9
idna==2.7
itsdangerous==0.24
Jinja2==2.10
Mako==1.0.7
MarkupSafe==1.0
Pillow==5.3.0
pycparser==2.19
PyMySQL==0.9.2
python-dateutil==2.7.3
python-editor==1.0.3
python-magic==0.4.15
python-slugify==1.2.6
pytz==2018.5
six==1.11.0
SQLAlchemy==1.2.12
Unidecode==1.0.22
Werkzeug==0.14.1
WTForms==2.2.1
With python:3.6-alpine we run into errors when building our image, in this case for cffi and Pillow:
- 'No working compiler found' error.
- 'The headers or library files could not be found for jpeg, a required dependency when compiling Pillow from source.'
The solutions are described here and involve adding more programs and libraries to the image:
- CFFI dependencies bloats Docker image #458
https://github.com/gliderlabs/docker-alpine/issues/458
Add to Dockerfile: RUN apk add --no-cache curl python3 pkgconfig python3-dev openssl-dev libffi-dev musl-dev make gcc
- How to install pillow, psycopg, pylibmc packages in python:alpine image
https://blog.sneawo.com/blog/2017/09/07/how-to-install-pillow-psycopg-pylibmc-packages-in-pythonalpine-image/
Add to Dockerfile: RUN apk add --no-cache jpeg-dev zlib-dev
Unfortunately our image has now grown to 376MB!
Rewrite Dockerfile for multi-stage
Since Docker 17.05 we can use multi-stage builds. The idea is that a first stage, which contains the compilers and header files, builds what we need, and we then copy only the result into the final container image.
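In its barest form the pattern looks roughly like the sketch below. The stage name builder and the paths /build and /wheels are placeholders chosen for illustration, and the apk package list is deliberately short; it would have to match whatever the packages in requirements.txt compile against.

```dockerfile
# Stage 1: contains the compiler and headers and only produces wheels.
# "builder", /build and the package list are illustrative placeholders.
FROM python:3.6-alpine AS builder
RUN apk add --no-cache gcc musl-dev libffi-dev
WORKDIR /build
COPY requirements.txt .
RUN pip wheel -r requirements.txt --wheel-dir=/build/wheels

# Stage 2: starts again from the clean base image. Nothing from stage 1
# ends up in this image unless it is copied explicitly.
FROM python:3.6-alpine
COPY --from=builder /build/wheels /wheels
COPY requirements.txt .
# --no-index keeps pip away from PyPI, so only the prebuilt wheels are used.
RUN pip install --no-index --find-links=/wheels -r requirements.txt
```

The compiler and the -dev packages stay behind in the first stage; only the wheels cross over into the final image.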
Rewriting the Dockerfile from:
FROM python:3.6-alpine
LABEL maintainer="Peter Mooring peterpm@xs4all.nl peter@petermooring.com"
# create and set working directory
RUN mkdir -p /home/flask/app/web
WORKDIR /home/flask/app/web
# install package dependencies
COPY requirements.txt ./
# Solve 'No working compiler found' error,
# see: https://github.com/gliderlabs/docker-alpine/issues/458
RUN apk add --no-cache curl python3 pkgconfig python3-dev openssl-dev libffi-dev musl-dev make gcc \
# Solve 'The headers or library files could not be found for jpeg, a required dependency when compiling Pillow from source.',
# see https://blog.sneawo.com/blog/2017/09/07/how-to-install-pillow-psycopg-pylibmc-packages-in-pythonalpine-image/
jpeg-dev zlib-dev \
libmagic \
&& pip install --no-cache-dir -r ./requirements.txt \
&& rm -rf /var/cache/apk/*
# copy app code into container
COPY . ./
# create group and user used in this container
RUN addgroup flaskgroup && adduser -D flaskuser -G flaskgroup
RUN chown -R flaskuser:flaskgroup /home/flask
USER flaskuser
to:
FROM python:3.6-alpine as base
LABEL maintainer="Peter Mooring peterpm@xs4all.nl peter@petermooring.com"
RUN mkdir /svc
WORKDIR /svc
COPY requirements.txt .
# install build dependencies, they are only needed in this stage
# Solve 'No working compiler found' error,
# see: https://github.com/gliderlabs/docker-alpine/issues/458
# Solve 'The headers or library files could not be found for jpeg, a required dependency when compiling Pillow from source.',
# see https://blog.sneawo.com/blog/2017/09/07/how-to-install-pillow-psycopg-pylibmc-packages-in-pythonalpine-image/
# pip wheel then builds a wheel for every package in requirements.txt
RUN apk add --update \
curl \
python3 \
pkgconfig \
python3-dev \
openssl-dev \
libffi-dev \
musl-dev \
make \
gcc \
jpeg-dev zlib-dev \
libmagic \
&& rm -rf /var/cache/apk/* \
&& pip wheel -r requirements.txt --wheel-dir=/svc/wheels
# the wheels are now here: /svc/wheels
FROM python:3.6-alpine
RUN apk add --no-cache \
jpeg-dev zlib-dev \
libmagic
COPY --from=base /svc /svc
WORKDIR /svc
RUN pip install --no-index --find-links=/svc/wheels -r requirements.txt
# Removing the wheels after installation does not free up space, because the
# COPY above already stored them in an earlier layer. Too bad, it is some 20MB.
#RUN rm -R *
# create and set working directory
RUN mkdir -p /home/flask/app/web
WORKDIR /home/flask/app/web
# copy app code into container
COPY . ./
# create group and user used in this container
RUN addgroup flaskgroup && adduser -D flaskuser -G flaskgroup && chown -R flaskuser:flaskgroup /home/flask
USER flaskuser
The Python wheels directory /svc/wheels:
-rw-r--r-- 1 root root 8098645 Feb 15 12:56 Babel-2.6.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 81299 Feb 15 12:56 Click-7.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 91364 Feb 15 12:56 Flask-1.0.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 9267 Feb 15 12:56 Flask_Babel-0.11.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 4936158 Feb 15 12:56 Flask_CKEditor-0.4.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 15935 Feb 15 12:57 Flask_Login-0.4.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 7535 Feb 15 12:56 Flask_Session-0.3.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 14903 Feb 15 12:56 Flask_WTF-0.14.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 126381 Feb 15 12:56 Jinja2-2.10-py2.py3-none-any.whl
-rw-r--r-- 1 root root 76583 Feb 15 12:57 Mako-1.0.7-py3-none-any.whl
-rw-r--r-- 1 root root 29273 Feb 15 12:57 MarkupSafe-1.0-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 1101554 Feb 15 12:57 Pillow-5.3.0-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 47758 Feb 15 12:56 PyMySQL-0.9.2-py2.py3-none-any.whl
-rw-r--r-- 1 root root 1144841 Feb 15 12:57 SQLAlchemy-1.2.12-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 235421 Feb 15 12:56 Unidecode-1.0.22-py2.py3-none-any.whl
-rw-r--r-- 1 root root 166353 Feb 15 12:56 WTForms-2.2.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 322863 Feb 15 12:56 Werkzeug-0.14.1-py2.py3-none-any.whl
-rw-r--r-- 1 root root 158276 Feb 15 12:56 alembic-1.0.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 101571 Feb 15 12:56 asn1crypto-0.24.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 90375 Feb 15 12:56 beautifulsoup4-4.6.3-py3-none-any.whl
-rw-r--r-- 1 root root 385610 Feb 15 12:56 cffi-1.11.5-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 813672 Feb 15 12:57 cryptography-2.3.1-cp36-cp36m-linux_x86_64.whl
-rw-r--r-- 1 root root 112930 Feb 15 12:56 gunicorn-19.9.0-py2.py3-none-any.whl
-rw-r--r-- 1 root root 21118 Feb 15 12:56 html2text-2018.1.9-py3-none-any.whl
-rw-r--r-- 1 root root 58213 Feb 15 12:56 idna-2.7-py2.py3-none-any.whl
-rw-r--r-- 1 root root 10622 Feb 15 12:57 itsdangerous-0.24-py3-none-any.whl
-rw-r--r-- 1 root root 111031 Feb 15 12:57 pycparser-2.19-py2.py3-none-any.whl
-rw-r--r-- 1 root root 211414 Feb 15 12:56 python_dateutil-2.7.3-py2.py3-none-any.whl
-rw-r--r-- 1 root root 6686 Feb 15 12:57 python_editor-1.0.3-py3-none-any.whl
-rw-r--r-- 1 root root 5543 Feb 15 12:56 python_magic-0.4.15-py2.py3-none-any.whl
-rw-r--r-- 1 root root 4595 Feb 15 12:57 python_slugify-1.2.6-py2.py3-none-any.whl
-rw-r--r-- 1 root root 510974 Feb 15 12:56 pytz-2018.5-py2.py3-none-any.whl
-rw-r--r-- 1 root root 10702 Feb 15 12:56 six-1.11.0-py2.py3-none-any.whl
Summary
Before: 376MB, after: 211MB.
We did this using multi-stage:
Stage1:
- Build Python wheels for requirements.txt
Stage2:
- Copy Python wheels from stage1
- Install dependencies using Python wheels
- Do other things like creating user, copying code
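Unfortunately, with this Dockerfile we cannot install directly from the directory /svc/wheels in stage1, and removing the directory after installation does not shrink the image, because the COPY has already stored it in an earlier layer. Getting rid of it would save another 25MB!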
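One possible way around this, assuming a Docker version with BuildKit enabled (the default builder in recent Docker releases), is to bind-mount /svc from stage1 for the duration of the install instead of copying it. The sketch below shows only the start of the second stage and has not been tested against this blog app; the mount target /mnt/svc is an arbitrary name chosen for illustration.

```dockerfile
# syntax=docker/dockerfile:1
# The syntax line above must be the very first line of the Dockerfile.

# ... stage "base" unchanged, ending with the wheels in /svc/wheels ...

FROM python:3.6-alpine
RUN apk add --no-cache \
    jpeg-dev zlib-dev \
    libmagic
# Mount /svc from the "base" stage for this RUN only (read-only by default).
# pip can read the wheels and requirements.txt during the install, but they
# are never written into a layer of the final image.
# /mnt/svc is a hypothetical mount point, not a path from the original post.
RUN --mount=type=bind,from=base,source=/svc,target=/mnt/svc \
    pip install --no-cache-dir --no-index \
        --find-links=/mnt/svc/wheels -r /mnt/svc/requirements.txt
# ... rest of the stage (WORKDIR, COPY of the app code, user) as before ...
```

This replaces the COPY --from=base and the separate pip install step. The trade-off is that the build then depends on BuildKit, whereas the COPY variant works on any Docker from 17.05 on.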
Links / credits
Building Minimal Docker Containers for Python Applications
https://blog.realkinetic.com/building-minimal-docker-containers-for-python-applications-37d0272c52f3
How do I reduce a python (docker) image size using a multi-stage build? (**python specific**)
https://stackoverflow.com/questions/48543834/how-do-i-reduce-a-python-docker-image-size-using-a-multi-stage-build-pytho
Leveraging Docker multi-stage builds in Python development
https://www.merixstudio.com/blog/docker-multi-stage-builds-python-development/
Lighter Python images using multi-stage Dockerfile
https://lekum.org/post/multistage-dockerfile/
Smaller Python Docker Containers with Multi-Stage Builds and Python Wheels
https://softwarejourneyman.com/docker-python-install-wheels.html
Use multi-stage builds
https://docs.docker.com/develop/develop-images/multistage-build/
Comments (1)
You can use
RUN pip install --no-cache /wheels/* \
&& rm -rf /wheels/*
to delete wheels