Docker

Introduction

  • 100+ Docker Concepts you Need to Know • Fireship 📺
  • Docker Tutorial for Beginners • mCoding 📺
  • Docker Tutorial for Beginners • Programming with Mosh 📺
    • You can “Dockerize” any app by adding a Dockerfile to it
    • The Dockerfile contains instructions Docker uses to package the app into an image that contains everything necessary to run the app
    • Once Docker has an image, it can use it to create a container, which is a running process with its own file system provided by the image
    • Images can be pushed to the Docker Hub registry (like npm/github for docker) and downloaded and run anywhere
    • See my docker-hello-world for a quick starter example
    • docker run <image> will look for it locally and pull and run it from Docker Hub if not found locally
    • if using ubuntu image, the relevant package manager is apt
      • start by running apt update to update the local db of available packages
      • then apt list or apt search to find your package, and apt install to install it (see the sketch after this list)
    • in linux, everything is a file, including folders and running processes
  • Inbox:
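A quick sketch of the run/apt flow above (htop is just an arbitrary example package):

docker run -it ubuntu bash       # pulls ubuntu from Docker Hub if not cached, opens a shell
# then, inside the container:
apt update                       # refresh the local db of available packages
apt search htop                  # find the package
apt install htop                 # install it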

General

Containers

  • containers are just processes running in a restricted area
  • they are nothing like a virtual machine
  • images vs containers:
    • image = binaries, libraries and source code that make up your application
    • container = running instance of that image (can have many)
  • docker container commands:
    • Many commands do tab completion, so don’t forget to try that out! For example docker container stop <tab> will display all running containers and cycle through their IDs
    • docker container --help = all docker container subcommands
    • docker container run = start container
    • docker container run --publish 80:80 --detach nginx
      • nginx = docker engine looks for an image called “nginx” (and pulls down latest version from docker hub if not found locally)
      • container run = starts a new container from the nginx image
      • --publish 80:80 = opens port 80 on the host IP and routes its traffic to the container IP (port 80)
      • --detach = run in the background (outputs the unique ID of the container on this run)
    • docker container ls = list running containers
    • docker container ls -a
      • list all running and stopped containers (that haven’t been removed)
      • will include all randomly generated container IDs and names (even if they all come from the same image, each time you run the container, the ID and name are different)
    • docker container run --publish 80:80 --detach --name webhost nginx
      • --name = replace automatic name with custom name
    • docker container stop <first few characters of ID>
    • docker container logs <container-name>
    • docker container rm <first few ID chars> <first few ID chars>
      • removes the listed containers
      • to remove a container that’s still running, add -f (force)
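Putting the commands above together, one pass through a container’s lifecycle:

docker container run --publish 80:80 --detach --name webhost nginx
docker container ls                # webhost shows up as running
docker container logs webhost      # its output so far
docker container stop webhost
docker container ls -a             # stopped, but not yet removed
docker container rm webhost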

Networking

  • virtual networks
  • containers can talk to each other using the container name as host name, but only on user-defined bridge networks (on the default bridge, name-based DNS doesn’t work; create a network and attach containers to it, as in the sketch below)
  • to listen to a host port, use -p to map host port to container port
  • docker container run --publish 80:80
    • HOST:CONTAINER
    • forward host port 80 to container port 80
  • docker container port <container>
    • list container ports and what host ports they map to
  • docker network ls = list virtual networks
  • docker network inspect <network> = see network config
  • Inbox:
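A minimal sketch of name-based DNS on a user-defined network (the names my_net and web, and the alpine image, are arbitrary choices):

docker network create my_net
docker container run --detach --name web --network my_net nginx
docker container run --rm --network my_net alpine ping -c 1 web   # "web" resolves by container name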

Images

  • A series of file system changes and metadata
  • A container is a single write layer on top of an image
  • Layers
    • Images can have many
    • Defined in Dockerfile
    • docker history <image> lists the locally cached layers for each image (in reverse order of Dockerfile) and when each layer was last updated
    • Locally cached image layers are reused across images, since each layer has a unique SHA
    • Layers are stored, rather than entire images
  • docker inspect <image> shows its metadata
  • Tags
    • labels that point to an image id
    • Many labels can point to the same one
    • Images are cached by ID, so there’s no double downloading, no matter how many tags in docker image ls point to the same ID
    • Tags point to a specific image commit
    • docker image tag <source image> <new repo>:<tag> creates a new tag pointing at the same image
      • can use this to fork an existing image
    • docker image push <image repo> pushes local image layers to docker hub
  • Dockerfile
    • Adding a Dockerfile to an existing app is a way to make the app better!
    • It takes whatever may have previously been complex and hard to remember and makes it clear and predictable
  • Dockerfile layer types
    • FROM = distribution; usually a minimal one; required; pin version instead of using latest
    • WORKDIR
      • changes the current working directory
      • preferred over RUN cd <path>
    • COPY
      • copy folder/file from host to container
    • ENV = set environment variables; key-value options
    • RUN
      • run CLI commands
      • common to concatenate a bunch with && so they act as one layer
      • common to symlink any log files output by the app to stdout and stderr (best practice is to forward all app logs there rather than having your container publish its own logs)
    • EXPOSE
      • define ports it’s possible to connect to
      • still need -p to open them
    • CMD
      • default command that runs on startup; required unless one already included in FROM image
    • for build efficiency, keep the layers that change least often at the top and the ones that change most often at the bottom (see the example Dockerfile after this list)
    • Docker Step By Step: Containerizing Zookeeper • Gradually adds layers to a Dockerfile • Kevin Sookocheff 📖
  • docker image build -t <tag> .
    • build image layers and cache them
    • on reruns, rebuild only layers that have changed
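A minimal example Dockerfile for a hypothetical node app, tying the layer types together (file names and port are assumptions):

FROM node:20-slim                        # pinned minimal base instead of latest
ENV NODE_ENV=production                  # key-value env vars
WORKDIR /app                             # preferred over RUN cd /app
COPY package.json package-lock.json ./   # changes rarely, so near the top for cache reuse
RUN npm install                          # chained commands here would still be one layer
COPY . .                                 # app source changes most often, so near the bottom
EXPOSE 3000                              # documents the port; still need -p to open it
CMD ["node", "index.js"]                 # default startup command

Build it with docker image build -t myapp . — a change in any layer forces a rebuild of that layer and every layer after it, which is why the ordering above matters.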

Volumes

  • Containers are usually immutable and ephemeral
    • immutable infrastructure = we don’t update a running container;
    • we update the app by deploying a new container
  • The problem of persistent data
    • We want databases and other long-lived data produced by the container process to outlive the container
  • Data Volumes
    • a special location outside the container to store data
    • creates a new path on the host
    • need to be manually deleted (don’t go away when container stops)
    • can create with a VOLUME layer in Dockerfile
    • can name a volume by preceding it with a label: <name>:<path>
  • Bind Mounts
    • map a container path to a host path
    • no new path needed (using an existing one)
    • host files mask any already in the container at that path
    • can’t use in Dockerfile (only in docker container run or docker-compose.yaml)
    • Bind mounts | Docker Docs
  • Volumes vs Bind Mounts:
    • Docker recommends named volumes over bind mounts due to their extended functionality and lack of reliance on the host’s file structure
    • bind mounts are basically the same as named volumes except that you can specify the host path (which is helpful if you need to access the files)
    • they can be less portable than volumes to other hosts (or to Kubernetes) since they make assumptions about the paths available on the host
    • they also require matching host and container permissions
    • docker volumes are at risk of being permanently deleted when running docker volume prune (or docker system prune --volumes), while bind mounts are in no danger of accidental deletion (so may be a safer choice for data that needs to persist and cannot be regenerated)
    • What Is The Difference Between Binding Mounts And Volumes While Handling Persistent Data In Docker Containers? - Stack Overflow
    • Docker: Volumes Vs Bind Mounts
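The two -v forms side by side (the mydata name and html paths are just illustrations):

docker container run -d -v mydata:/usr/share/nginx/html nginx        # named volume (a name, not a path)
docker container run -d -v $(pwd)/html:/usr/share/nginx/html nginx   # bind mount (starts with a host path)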
  • Troubleshooting file permissions:
    • When setting a Dockerfile’s USER, use numbers, which work better in Kubernetes than using names.
    1. Use the command ps aux in each container to see a list of processes and usernames. The process needs a matching user ID or group ID to access the files in question.
      • If ps doesn’t work in your container, you may need to install it. In debian-based images with apt, you can add it with apt-get update && apt-get install procps
    2. Find the UID/GID in each container’s /etc/passwd and /etc/group to translate names to numbers. You’ll likely find a mismatch there, where one container’s process originally wrote the files with its UID/GID and the other container’s process is running as a different UID/GID.
    3. Figure out a way to ensure both containers are running with either a matching user ID or group ID. This is often easier to manage in your own custom app (when using a language base image like python or node) rather than trying to change a 3rd party app’s container (like nginx or postgres)… but it all depends. This may mean creating a new user in one Dockerfile and setting the startup user with USER. (see USER docs) The node default image has a good example of the commands for creating a user and group with hard-coded IDs:
RUN groupadd --gid 1000 node \
  && useradd --uid 1000 --gid node --shell /bin/bash --create-home node
USER 1000:1000

Resources

Logging

Healthchecks

  • supported in Dockerfile, compose, swarm and docker run
  • You provide a custom command to test the health of the container (every 30 sec by default)
    • e.g. curl localhost
  • Docker engine will exec the healthcheck command in the container
  • It expects exit 0 (OK) or exit 1 (Error)
    • make sure error is 1 (e.g. by returning false on a failure)
  • Healthcheck states = starting, healthy, unhealthy
  • Appears in:
    • docker container ls
    • docker container inspect
  • Swarm stack will replace unhealthy containers and service updates will wait for health before continuing rollout
  • Options = interval, timeout, start-period, retries
  • healthcheck | Compose file version 3 reference | Docker Docs
  • HEALTHCHECK | Dockerfile reference | Docker Docs
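A minimal sketch of the Dockerfile form, combining the options above (the values are illustrative): curl -f makes HTTP errors return non-zero, and || exit 1 guarantees the exit code Docker expects.

HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD curl -f http://localhost/ || exit 1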

Docker Compose

  • Docker Compose is…
    1. A YAML file that declares the containers, networks, volumes, environment variables, images etc (replaces multiple docker container run commands)
    2. A CLI tool for local development and testing of the YAML file
  • Benefits (compared to a bunch of docker container run commands):
    • easier to configure relationships between containers
    • a way to save all docker container run options in one easy-to-read file
    • can spin up the entire container collection with one command
  • YAML file:
    • can have any file name (not just docker-compose.yaml)
    • can be versioned with version: <number>
      • if not specified, v1 is assumed
    • can be used in prod with swarm
    • docker compose --help = all docker compose commands
    • you can refactor any docker container run command into a docker-compose.yaml service block
version: '3'
 
services: # containers
  service_name: # becomes the DNS name inside the network
    image: # required if not using "build"; if using build, becomes the name of the built image
    command: # optional; replaces default CMD in the image
    environment: # optional; same as docker run -e
    volumes: # optional; same as docker run -v
    depends_on: # optional; list of services that should start first
    restart: # optional; policy for automatically restarting if it stops
  service_2_name:
 
volumes: # optional; same as docker volume create
 
networks: # optional; same as docker network create
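For example, the earlier docker container run --publish 80:80 --detach --name webhost nginx refactors to the block below (--detach just becomes docker compose up -d):

version: '3'

services:
  webhost:           # same as --name webhost; doubles as the DNS name
    image: nginx
    ports:
      - "80:80"      # same as --publish 80:80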

Container Orchestration

  • making many servers act like one
  • Benefits increase with # servers * their change rate
    • With few servers that don’t change very often, the complexity of orchestration may not be needed
  • Which orchestrator to choose?
    • Docker Swarm + Kubernetes are the big ones that can run in multiple environments and platforms
  • Docker Swarm
    • Stacks = groups of related services (plus their networks/volumes/secrets) deployed together from one compose file onto a swarm
      • docker stack services <stack-name>
        • see all services in stack and how many replicas of each are running
      • docker stack ps <stack-name>
        • see all processes (tasks) running in the stack and which node is running each one
      • see a visualization in the browser of the stack and its nodes and services at <stack-ip>:8080
      • instead of updating a single running service (after changing it in the compose file), rerun docker stack deploy <stack-name>
      • Secrets in stacks:
        • Secrets Storage
        • built in feature
        • swarm raft db is encrypted on disk
        • only stored on disk on manager nodes
        • secrets get to containers via TLS connection between managers and workers
        • first, put them in db
          • docker secret create <secret-name-in-db> <file-containing-secret.txt>
          • docker secret ls to see a list of secrets
        • then, assign to a service
        • only containers in assigned services can see them (unlike env vars)
        • they look like files in the container, but are actually in RAM only
        • local docker compose can use file-based secrets via a workaround, but it isn’t secure the way swarm secrets are (real secrets are a swarm-only feature that relies on the swarm db); fine for local development, but not for production (where compose isn’t used anyway)
        • (Move whole secrets topic to its own section with subsections for CLI vs Dockerfile vs compose vs swarm?)
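A sketch of that flow (the names db_pass and db, and the postgres image’s *_FILE convention, are assumptions):

echo "s3cr3t" | docker secret create db_pass -             # "-" reads the secret from stdin
docker secret ls
docker service create --name db --secret db_pass \
  -e POSTGRES_PASSWORD_FILE=/run/secrets/db_pass postgres  # shows up as an in-RAM file in the container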
    • Updating running services:
      • When there’s an update to a service, all of its containers are replaced in a rolling fashion
      • Limits downtime
      • Similar options to those of the create command, suffixed with -add or -rm (e.g. --env-add, --publish-rm)
      • Includes rollback and healthcheck options
      • e.g. docker service update --image myapp:1.2.1 myservice
        • update image version
      • e.g. docker service scale web=8 api=6
        • change number of replicas for two services at once
      • docker stack deploy -c file.yml mystack
        • deploy updates and roll them out across containers
  • Multipass orchestrates virtual Ubuntu instances
  • Kubernetes
    • Released by Google in 2015, now maintained by a large community and offered by many cloud vendors as a custom distribution
    • Runs on top of docker as a set of APIs in containers
    • Provides API/CLI to manage containers across servers
  • Kubernetes or Swarm?
    • Swarm is easier to deploy and manage
      • Can be managed by a small team
      • 80/20 solution
      • Comes with Docker, so only dealing with one vendor
      • Runs everywhere Docker runs
      • Easy to troubleshoot (same approach as Docker)
    • Kubernetes has more features and wider support
      • all cloud platforms will deploy and manage K8s for you

Container Registries

  • Docker Hub
    • The most popular public image registry
    • Basically Docker Registry plus lightweight image building
    • Can auto-build images on commit by linking GitHub repo
    • After enabling automatic builds for an image, use Repository Links to automatically rebuild when an upstream image changes (i.e. images referenced in FROM layers)
    • Use webhooks to trigger another service to do something after an image is updated
  • Docker Store
  • Docker Cloud
    • Use Swarms feature to connect local computer to a swarm
  • Docker Registry
    • open source
    • no UI
    • web api + storage options (local or cloud)
    • open by default, but can be configured to use TLS and auth
    • install with docker run -p 5000:5000 registry (it’s an image itself)
    • add an existing hub or local image to your registry by retagging and pushing it:
      • docker tag <image> <ip>:<port>/<new-image>
      • docker push <ip>:<port>/<new-image>
    • then remove the originals from your local cache (after stopping any containers)
      • docker image rm <image>
    • and pull the new version from your registry (full sketch below):
      • docker pull <ip>:<port>/<new-image>
    • works the same way in swarm
    • more convenient to use a hosted registry instead of rolling your own (even for private images) unless there’s a really good reason to maintain your own
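The whole round trip against a local registry, using the usual hello-world demo image:

docker run -d -p 5000:5000 --name registry registry
docker pull hello-world
docker tag hello-world 127.0.0.1:5000/hello-world
docker push 127.0.0.1:5000/hello-world
docker image rm hello-world 127.0.0.1:5000/hello-world    # clear the local cache
docker pull 127.0.0.1:5000/hello-world                    # comes back from your registry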
  • Third party registries
    • Google Cloud, AWS, Azure, etc all have their own registry options that are well integrated with their other tools

Other uses

Inbox