This article describes how to build a Docker image for a Python application, optimized for production use, using multi-stage builds.

The idea is to use a separate build stage to build the dependencies (the build process itself may require other dependencies, like build tools, compilers, header files, etc.), then copy only the required artifacts (the built dependencies) to the final stage, which is then used to run the application.

The final image will only contain the application runtime dependencies, and not the build requirements. A smaller image with fewer software packages means faster deployments, less storage space, and a smaller attack surface.

Dependencies of a Python application can be categorized as:

  • Base dependencies: required to bootstrap or set up the application, or used by helper scripts for initialization, etc. These dependencies are not directly used by the application, but are needed in all stages. The application dependency manager itself (like poetry, pipenv, etc.) is also a base dependency.
  • Application dependencies: required to run the app itself, like frameworks, database drivers, etc.
  • Development dependencies: required during the test or development phase, like test runners, static checkers and linters, document generation tools, etc.

In this article poetry is used as the dependency manager, but the same approach can be used with other dependency managers as well. psycopg2 is used as another base dependency (let's assume it is required by initialization scripts to check for database port availability before starting the application); it has both runtime and build-time native dependencies, which makes it a good example of how to handle such libraries.
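
For example, such an initialization script could look like the following minimal sketch (the script name, the DATABASE_URL variable, and the retry logic are assumptions for illustration, not part of the actual application):

# wait_for_db.py - hypothetical helper using psycopg2 outside the application virtualenv
import os
import sys
import time

import psycopg2


def wait_for_db(retries: int = 30, delay: float = 1.0) -> None:
    # DATABASE_URL and its default value are assumptions for this sketch
    dsn = os.environ.get("DATABASE_URL", "postgresql://postgres@localhost:5432/postgres")
    for _ in range(retries):
        try:
            psycopg2.connect(dsn).close()
            return
        except psycopg2.OperationalError:
            time.sleep(delay)
    sys.exit("database is not reachable")


if __name__ == "__main__":
    wait_for_db()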

The application dependencies are defined in the pyproject.toml file, and the development dependencies are defined in the dev dependency group. The production build will only contain the application dependencies, while the development build will contain both application and development dependencies.
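
A minimal pyproject.toml along these lines could look like the following sketch (the package names and versions are placeholders, not recommendations):

[tool.poetry]
name = "myapp"
version = "0.1.0"
description = "Example application"
authors = ["Example Author <author@example.com>"]

[tool.poetry.dependencies]
python = "^3.11"
# application (runtime) dependencies -- placeholders
flask = "^2.3"

[tool.poetry.group.dev.dependencies]
# development dependencies -- placeholders
pytest = "^7.4"
ruff = "*"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"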

Image Build Stages

The build stages should be defined in a way that provides the following:

  • Build requirements won't be included in the final image
  • Production build won't include development dependencies

There may be different approaches to defining the image build stages, for example:

  • Define one stage for build and one for runtime. This is simple and straightforward; however, the issue of separating the production and development dependencies remains. One solution is to use build arguments to pass the dependency manager options that install the desired dependencies for the build target: when building the production image, only the application dependencies are built, and when building the development image, the development dependencies are installed as well.
  • Define more granular stages to build and run the application for production and development environments. One approach is to add extra build stages: the first stage builds the application dependencies, the second stage runs the application in production, and the last stage installs the development dependencies. Different stages are then used as build targets for different environments. The development stage may still require and include build dependencies (unless this is addressed explicitly); a rough skeleton of such a layout is sketched right after this list.
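
As a sketch only (the stage names and layout below are illustrative; the rest of this article follows the first approach):

FROM python:3.11-bookworm as base
# common settings shared by all stages

FROM base as build
# install build tools, then build and install the application dependencies

FROM base as production
# copy the built dependencies from "build" and add the application code
# --> build target for production images

FROM build as development
# install the development dependencies on top of "build"
# --> build target for development/test images (still carries the build tools)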

Different approaches have different trade-offs, and the best approach depends on the use case. Using more specific stages may help to utilize the build cache more efficiently (or may not, depending on the build pipeline setup, etc.), but it may also make the Dockerfile more complicated.

Regardless of such details, in most cases reducing the final image size provides the most benefit, and both approaches achieve that. In this article the first approach is used: build arguments pass the dependency manager options that install the desired dependencies for the build target.

The Dockerfile layout is as follows:

  • Base stage: defines the base image for other stages, pinning the python version and venv/app paths
  • Build stage: installs all build dependencies, builds the application dependencies for the target environment, and installs them in a virtualenv
  • Run stage: sets up the application itself to use the virtualenv pre-built during the build stage

Build arguments allow customizing the build, for example the base Python version, the virtualenv path, etc., as well as the dependency manager options used to install the desired dependencies for the build target (production or development).

# Define build args (used for all stages) here
# These args can be overridden at build time with --build-arg NAME=VALUE
# see: https://docs.docker.com/engine/reference/builder/#arg

ARG PYTHON_VERSION=3.11-bookworm

# pin base dependency versions to build the base image
# poetry as the base dependency
ARG POETRY_VERSION=1.6.1
# psycopg2 as a package with native dependencies and needed outside of the virtualenv for various reasons
ARG PSYCOPG2_VERSION=2.9.8

# PYTHON_WHEEL_PATH is used to store the built wheels
ARG PYTHON_WHEEL_PATH=/var/cache/python/wheels

# use a common path for all poetry data and virtualenvs to share them between stages
ARG POETRY_DATA_DIR=/usr/local/poetry/data
# poetry stores the virtualenvs by default under the cache path, maybe it's better to keep all poetry data in one place
ARG POETRY_CACHE_DIR=/usr/local/poetry/cache

# set POETRY_INSTALL_OPTS='--no-root --with dev' to build with dev dependencies
# --no-root: do not install the app itself in the venv, to keep the venv and the app in separate paths
ARG POETRY_INSTALL_OPTS='--no-root'

################################################################################
# Base stage
################################################################################
# base stage defines the base for other stages, to set defaults and define build args

FROM python:${PYTHON_VERSION} as base-python

ARG PYTHON_WHEEL_PATH
# export as ENV so the path is also available in the stages built from this one
ENV PYTHON_WHEEL_PATH=${PYTHON_WHEEL_PATH}

ARG POETRY_VERSION
ENV POETRY_VERSION=${POETRY_VERSION}

ARG PSYCOPG2_VERSION
ENV PSYCOPG2_VERSION=${PSYCOPG2_VERSION}

ARG POETRY_DATA_DIR
ENV POETRY_DATA_DIR=${POETRY_DATA_DIR}
ARG POETRY_CACHE_DIR
ENV POETRY_CACHE_DIR=${POETRY_CACHE_DIR}

ENV APP_PATH=/opt/myapp
WORKDIR ${APP_PATH}

RUN mkdir -p ${PYTHON_WHEEL_PATH} \
    && mkdir -p ${POETRY_DATA_DIR} \
    && mkdir -p ${POETRY_CACHE_DIR} \
    && mkdir -p ${APP_PATH}

################################################################################
# Build stage
################################################################################

FROM base-python as app-build-stage

# re-declare the global build arg so it is available in this stage
ARG POETRY_INSTALL_OPTS

# install all build dependencies to build wheels
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        build-essential \
        libpq-dev  # libpq-dev is needed to build psycopg2

# explicitly download base dependencies (poetry, etc.) wheels, then install them
# keep the wheels so they're available for the next stage
RUN pip wheel --wheel-dir ${PYTHON_WHEEL_PATH} poetry==${POETRY_VERSION} \
    && pip wheel --wheel-dir ${PYTHON_WHEEL_PATH} psycopg2==${PSYCOPG2_VERSION} \
    && pip install --no-cache-dir --no-index --find-links=${PYTHON_WHEEL_PATH} ${PYTHON_WHEEL_PATH}/poetry* \
    && pip install --no-cache-dir --no-index --find-links=${PYTHON_WHEEL_PATH} ${PYTHON_WHEEL_PATH}/psycopg2*

# copy application dependency specs
COPY pyproject.toml poetry.lock ${APP_PATH}/

# build and install the application dependencies in a virtualenv
# clean up cached files once the build is done, since these paths are copied into the next stage.
# do not fail the build on cleanup errors.
RUN poetry install ${POETRY_INSTALL_OPTS} \
    && (poetry cache clear --all --no-interaction PyPI || true) \
    && (poetry cache clear --all --no-interaction _default_cache || true)

################################################################################
# Run stage
################################################################################

FROM base-python as app-run-stage

# install required runtime dependencies, and cleanup cached files for a smaller layer
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        # psycopg2 runtime dependencies
        libpq5 \
    # cleaning up unused files
    && apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false \
    && rm -rf /var/lib/apt/lists/*

# copy all base dependency wheel files from build stage, then install them
COPY --from=app-build-stage ${PYTHON_WHEEL_PATH} ${PYTHON_WHEEL_PATH}
RUN pip install --no-cache-dir --no-index --find-links=${PYTHON_WHEEL_PATH} ${PYTHON_WHEEL_PATH}/poetry* \
    && pip install --no-cache-dir --no-index --find-links=${PYTHON_WHEEL_PATH} ${PYTHON_WHEEL_PATH}/psycopg2* \
    && rm -rf ${PYTHON_WHEEL_PATH}

# now copy the application dependencies (the virtualenv pre-built in the build stage)
COPY pyproject.toml poetry.lock ${APP_PATH}/
COPY --from=app-build-stage ${POETRY_DATA_DIR} ${POETRY_DATA_DIR}
COPY --from=app-build-stage ${POETRY_CACHE_DIR} ${POETRY_CACHE_DIR}

# now copy the application code itself, which may change more often than the dependencies
COPY . .
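
Note that the run stage as shown does not define a start command; that is left to the application. As an illustration only (the actual entrypoint depends on the application, and the module name here is just an assumption), the pre-built virtualenv can be used through poetry:

CMD ["poetry", "run", "python", "-m", "myapp"]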

Using this setup we can build the app image for production (using default build args and dependencies):

docker build --target app-run-stage -t myapp .

And for development, using:

docker build --target app-run-stage --build-arg POETRY_INSTALL_OPTS='--no-root --with dev' -t myapp-dev .
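
Other build arguments can be overridden the same way, for example to use a different base Python version (the values below are only examples):

docker build --target app-run-stage \
    --build-arg PYTHON_VERSION=3.12-bookworm \
    --build-arg POETRY_INSTALL_OPTS='--no-root --with dev' \
    -t myapp-dev .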

Or, using docker-compose, with the production file (not a complete example):

---
version: '3'

services:
  myapp:
    build:
      context: .
      target: app-run-stage
      args:
        POETRY_INSTALL_OPTS: '--no-root'  # install main dependencies only in production build
    image: myapp
    restart: unless-stopped

And the development override file:

---
version: '3'

services:
  myapp:
    build:
      context: .
      target: app-run-stage
      args:
        POETRY_INSTALL_OPTS: '--no-root --with dev'  # install dev dependencies during test/devel build
    image: myapp-dev
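
Assuming the two files are saved as docker-compose.yml and docker-compose.dev.yml (the file names here are just an assumption), the development image can then be built by passing both files to compose, letting the override values take precedence:

docker compose -f docker-compose.yml -f docker-compose.dev.yml build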