Dockerfile basics

Basic Dockerfile orientation.

Just a bit of background and then Dockerfile familiarization through a few simple examples.

1 - Overview

The examples in this section of the guide will

  • Introduce you to the basics of writing a Dockerfile to containerize an app.
  • Introduce you to common Docker CLI commands for manipulating images and containers.

What does containerizing an app actually mean?

When we talk about containerizing an app, we mean configuring the app to execute in a virtualized environment, called a container, that isolates it from the rest of the host machine (and, conversely, insulates the host machine from it):

  • From the perspective of the host computer, processes in a container effectively execute in a sandbox that restricts access and imposes resource (cpu, memory, network, file system, etc.) consumption limits on that set of processes.
  • From the perspective of any processes running inside a container, it appears as if they have exclusive use of the computer’s resources in the environment they’ve been provided with.

In essence, a container is a logical abstraction built on Linux kernel isolation features that govern access to, and consumption limits on, system resources for a group of one or more processes.

In general, there is one primary process that runs in a container (such as a web server or database), although multiple, generally related, processes (such as an agent or worker process) can run within the same container.

🤓 Nerd note

If containers sound a bit like virtual machines to you, that’s completely understandable.

To clarify the container concept further, however, if you’re somewhat familiar with Linux and virtual machines, then understand that when it comes to containers, it’s the Linux kernel that provides the system abstraction, whereas for virtual machines it’s the hypervisor that provides it.

Processes in containers can start so quickly because they’re just Linux processes using the standard system call interface to the shared host kernel; the kernel is responsible for virtualizing access to system resources.

Virtual machines do not share the host kernel space and need to boot up their own copies of an entire operating system before they can run a target application. This often takes several minutes. Access to physical resources is virtualized by the hypervisor, not the kernel.

For modern cloud computing, container technology helps applications scale quickly according to demand. It’s also interesting to note that most clouds run containers on virtual machines provisioned for your use, as this is generally more cost effective and pragmatic than provisioning dedicated physical machines.

What does this entail?

To run a container, you first need an image.

This is somewhat analogous to the concept that to run a program, you first need an executable binary. You either install it from some source or build it yourself. In a similar vein, you either get the image from somewhere or you build it.

Docker does a number of things to make working with container technology easier for developers.

Similar to package managers like apt and yum, Docker launched a container ecosystem by creating a public registry (Docker Hub) where you can find repositories for various images. Nowadays, you can use many other registries as well, such as Container Registry, Artifact Registry, and GitHub Container Registry.

The other way is that Docker defines an instruction set that you can use in a text file, called a Dockerfile, for building your own image.

An image is used to package up all the bits necessary for the runtime environment for a container – minus the Linux kernel, which is provided by the host computer. This is a big difference from virtual machines, which must also package up a host operating system that needs to boot up before running anything else.

Below is a simplified view of a file system with all the bits for the Linux kernel, distribution, runtime dependencies (such as Node.js, Python, etc.), and application plus any static resources.

[figure: file system with kernel, distribution, runtime dependencies, and application]

When you build an image, this is what it comprises:

[figure: contents of a built image]

As you’ll find out soon, you often can find (or build separately) a base image that you can use as a foundation for your application image, as shown below:

[figure: application image layered on a base image]

A Docker image only needs to contain as much of a Linux distribution, beyond the kernel, as is necessary to provide the runtime support for your application. Some Docker images include a full Linux distribution, some include just a very small subset of one, and some don’t need anything at all if the intended executable has no dependencies on shared system runtime files.

By the time you’re finished with this guide, you’ll be familiar with strategies for choosing appropriate base images as well as for creating your own images to suit your requirements.

What’s next

While it’s possible to create Docker images interactively, as a developer you’ll want to understand how to create them programmatically using a Dockerfile. A Dockerfile is a text file that uses specific instructions for building a Docker image.

Coming up next you’ll write a short Dockerfile. You’ll use the Dockerfile to build a Docker image that incorporates your program.

Then you’ll use the image to launch a container. Your program executes “inside” this container.

🧐 If you want to know more…

See this informative blog post if you’re interested in learning more about the history of container technology.

1.1 - Dockerfile intro

A Dockerfile is a text file that contains instructions for building an image that defines the runtime environment for each container launched using that image.

In the upcoming examples, you will be introduced to the following instructions:

FROM

The FROM instruction is used to declare the base image for the new image. This makes it possible to compose images from existing images, daisy-chaining the dependencies your app might require, such as:

  • A language runtime, such as Node.js, which in turn depends on
  • A Linux distribution, such as Ubuntu.

COPY

Normally what distinguishes your image from its base image are the files that you add to it. You might be building an image intended to be used as a more specialized base image for yet another image, or you might be building an image for a program that you’ve created and want to deploy.

Either way, you will normally use the COPY instruction to copy files from your host system into the image.

🏆 Best practice

There is also an ADD instruction that is similar, but we won’t be covering it since using COPY is preferred. See this best practices section to learn more.

ENTRYPOINT and CMD

These instructions are used to specify the command that should be run when a container is launched, along with any default options. There are differences between the effect of using the two instructions, and there are reasons why it is often ideal to use them together.

The helloworld example will introduce ENTRYPOINT, and the time example will expand and clarify both the ENTRYPOINT and CMD instructions when used separately and together.

🤓 Nerd note

Keep in mind, of course, that if you’re building an image that you intend to be used as a specialized base image for other images, it’s perfectly normal not to set either an ENTRYPOINT or a CMD.
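For instance, a specialized base image might do nothing more than layer common tooling onto a distribution, leaving ENTRYPOINT and CMD for downstream images to define. A hypothetical sketch (the package choices are illustrative, not from this guide’s examples):

```dockerfile
# Hypothetical base image: Debian plus curl and CA certificates.
# Note there is no ENTRYPOINT or CMD -- images built FROM this one
# are expected to supply their own.
FROM debian
RUN apt-get update \
 && apt-get install -y --no-install-recommends curl ca-certificates \
 && rm -rf /var/lib/apt/lists/*
```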

2 - Hello World

Let’s start simple. This example exists in the basics/hello directory in the examples repo, but it’s so simple you might want to create your own version by following the instructions below.

Create a unique directory for this example, such as hello, and then change directory to it.

mkdir hello
cd hello

Important! You’ll create a unique directory for every example. A directory must contain its own distinct Dockerfile as well as any other files (symbolic links won’t work) that will need to be copied into the generated Docker image.

You’re going to write a small program – just a basic shell script to print the obligatory Hello World greeting.

Next you’ll write a short Dockerfile. You’ll use the Dockerfile to build a Docker image that incorporates your program.

Finally you’ll use the image to launch a container. Your program executes inside the container.

The container is a logical abstraction based on Linux features that simply runs the process for your program in a way that isolates it from the rest of the system.

2.1 - The shell script

Create a simple shell script to print a greeting.

The script will print Hello and the name supplied as the script’s first argument (defaulting to world if none).

#!/bin/sh

NAME=${1:-world}
echo "Hello, $NAME!"

Set the executable bit and run the script in your terminal.

chmod +x hello.sh
./hello.sh
Hello, world!

Try the script passing a name as an argument.

./hello.sh Docker
Hello, Docker!
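The `${1:-world}` parameter expansion is what supplies the default. A quick way to see it in isolation in your shell (greet is just a throwaway function for this demonstration):

```shell
# Demonstrate the ${1:-world} default expansion used by hello.sh
greet() {
  NAME=${1:-world}
  echo "Hello, $NAME!"
}

greet          # prints: Hello, world!
greet Docker   # prints: Hello, Docker!
```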

2.2 - The Dockerfile

Create a Dockerfile.

FROM alpine
COPY hello.sh /
RUN chmod +x /hello.sh
ENTRYPOINT [ "/hello.sh" ]

Build a Docker image with the following command. The trailing dot (.) indicates that the path to the Dockerfile is the current working directory.

docker build -t hello .
[+] Building 8.0s (8/8) FINISHED
 => [internal] load build definition from Dockerfile                                                                 0.9s
 => => transferring dockerfile: 97B                                                                                  0.0s
 => [internal] load .dockerignore                                                                                    1.1s
 => => transferring context: 2B                                                                                      0.0s
 => [internal] load metadata for docker.io/library/alpine:latest                                                     2.6s
 => [auth] library/alpine:pull token for registry-1.docker.io                                                        0.0s
 => [internal] load build context                                                                                    0.7s
 => => transferring context: 84B                                                                                     0.0s
 => [1/3] FROM docker.io/library/alpine@sha256:e1c082e3d3c45cccac829840a25941e679c25d438cc8412c2fa221cf1a824e6a      2.1s
 => => resolve docker.io/library/alpine@sha256:e1c082e3d3c45cccac829840a25941e679c25d438cc8412c2fa221cf1a824e6a      0.3s
 => => sha256:e1c082e3d3c45cccac829840a25941e679c25d438cc8412c2fa221cf1a824e6a 1.64kB / 1.64kB                       0.0s
 => => sha256:b06a5cf61b2956088722c4f1b9a6f71dfe95f0b1fe285d44195452b8a1627de7 528B / 528B                           0.0s
 => => sha256:bb3de5531c18f185667b0be0e400ab24aa40f4440093de82baf4072e14af3b84 1.49kB / 1.49kB                       0.0s
 => => sha256:552d1f2373af9bfe12033568ebbfb0ccbb0de11279f9a415a29207e264d7f4d9 2.71MB / 2.71MB                       0.6s
 => => extracting sha256:552d1f2373af9bfe12033568ebbfb0ccbb0de11279f9a415a29207e264d7f4d9                            0.1s
 => [2/3] COPY hello.sh /                                                                                            0.7s
 => [3/3] RUN chmod +x /hello.sh                                                                                     1.2s
 => exporting to image                                                                                               0.6s
 => => exporting layers                                                                                              0.5s
 => => writing image sha256:35be16e40caf802a0fb0d7a09dca7bce970162b62635d870134e64b754c67967                         0.1s
 => => naming to docker.io/library/hello

What just happened?

You can see a bunch of things happened in the output. In a nutshell, the docker build command submitted the Dockerfile and the rest of the contents of the directory (anything that isn’t ignored by a .dockerignore file, if present) to the Docker daemon running on your system. Most of the output reflects the administrative details of the process used to create the final image.

An image is composed of cacheable layers. In the output line that begins with => [1/3] FROM docker.io/library/alpine, we can see that Docker pulled the alpine base image from the alpine repository at a remote registry (docker.io); its layers provide the Linux distribution for our program. This operating environment runs on top of the Linux kernel on your host system (on a Mac or Windows machine, that kernel belongs to a virtual machine running on your system).

For the line of output that contains => [2/3] COPY hello.sh /, Docker executes the COPY instruction that tells the Docker daemon to copy hello.sh from the root of the build context (the set of files that the Docker CLI docker build command sent to the Docker daemon) into the root (/) of the image.

In the line that contains => [3/3] RUN chmod +x /hello.sh, Docker updates the image by running the Linux command to set the executable bit on the program. If you set the executable bit on the script when you tested it, it’s likely that the file was copied correctly with the bit set, but you shouldn’t assume this. Always ensure you set any permission bits explicitly.
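You can see the same test -x check in action on your host. A minimal sketch using a throwaway temporary file (check_exec is just an illustrative helper, not part of the example):

```shell
# Set the executable bit on a fresh temporary file, then verify it.
check_exec() {
  local f
  f=$(mktemp)
  chmod +x "$f"
  if [ -x "$f" ]; then echo "executable"; else echo "not executable"; fi
  rm -f "$f"
}

check_exec   # prints: executable
```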

The final ENTRYPOINT instruction in our Dockerfile doesn’t add any layers to the image, which is why there’s no corresponding [4/4] build step. Instead, it adds metadata that specifies the process that you want to execute when a container is created.

Try this

Run the docker image ls command:

docker image ls hello
REPOSITORY   TAG       IMAGE ID       CREATED          SIZE
hello        latest    35be16e40caf   2 minutes ago    5.34MB

You can confirm that the image was created, check its size (the bulk of the image comes from the Alpine Linux distribution), and note that the image ID fragment (35be16e40caf) corresponds to the image sha256 written at the end of the build output.

2.3 - Run a container

Now that you have an image, you can launch a container to run your program.

docker container run --rm hello
Hello, world!

The --rm option tells Docker to remove the container once the process running in it terminates. It’s generally a good idea to clean up resources when no longer needed.

One reason not to remove a container automatically when its process terminates is so that you can inspect the logs afterward.

docker container run --name helloctr hello

You’ll see the same output as before. This time we gave the container a name using the --name option to make it easy to refer to. You can inspect everything the process wrote to stdout and stderr with the docker logs command.

docker container logs helloctr
Hello, world!

The logs show the same output that was printed to the terminal.

Even though the process exited and the container is no longer usable, you can see that it still exists with the docker container ls command.

docker container ls --all
CONTAINER ID   IMAGE  COMMAND       CREATED          STATUS                      PORTS      NAMES
c038996a5e75   hello  "/hello.sh"   20 seconds ago   Exited (0) 18 seconds ago              helloctr

The --all option was needed to show containers that have exited.

You can remove the container with the docker container rm command.

docker container rm helloctr
helloctr

The output indicates the name of the container that was removed.

The program accepts a name argument. You can supply arguments to the docker run command when you run a container.

docker container run --rm hello Docker
Hello, Docker!

The original, shorter forms of these docker container commands are still available and can be used as convenient shortcuts.

docker container run   =>  docker run
docker container logs  =>  docker logs
docker container ls    =>  docker ps
docker container rm    =>  docker rm

2.4 - Summary

The helloworld example introduced you to the following:

  • Writing a basic Dockerfile with the FROM, COPY, RUN, and ENTRYPOINT instructions.
  • Building a tagged image with docker build.
  • Running a container with docker container run, including the --rm and --name options.
  • Inspecting images and containers with docker image ls, docker container ls, and docker container logs.
  • Cleaning up with docker container rm.

3 - Time

Navigate to the basics/time directory in the examples repo.

Take a look at the get-time.sh bash script.

#!/bin/bash

# Ref: See section on Daytime Protocol
# https://www.nist.gov/pml/time-and-frequency-division/time-distribution/internet-time-service-its

# first arg is frequency, default is 5 seconds
freq=${1:-5}
echo "Fetch UTC(NIST) time every ${freq} seconds..."

while true; do
  if dt=$(cat </dev/tcp/time.nist.gov/13 | tail -n 1); then
    if [[ "$dt" =~ .*"UTC(NIST)".* ]]; then
      d=$(echo "$dt" | cut -d " " -f 2)
      t=$(echo "$dt" | cut -d " " -f 3)
      echo "$d $t"
    fi
  fi
  sleep "${freq}"
done

In a loop, this program uses bash’s /dev/tcp facility to open a TCP connection to the NIST Internet time service, then prints a formatted version of the response to the terminal.

The program accepts an argument specifying the frequency for fetching the time in seconds, defaulting to 5 if not provided.
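You can exercise the parsing logic from the loop on a canned response line without opening a socket. The sample line below mimics the Daytime Protocol response format (the values are illustrative), and the case statement is a POSIX equivalent of the script’s [[ =~ ]] check:

```shell
# A sample NIST Daytime response line (values illustrative)
dt='59459 21-09-02 18:15:02 50 0 0 572.7 UTC(NIST) *'

# The same field extraction the script performs
case "$dt" in
  *"UTC(NIST)"*)
    d=$(echo "$dt" | cut -d " " -f 2)   # date field
    t=$(echo "$dt" | cut -d " " -f 3)   # time field
    echo "$d $t"                        # prints: 21-09-02 18:15:02
    ;;
esac
```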

You can test the program on your own machine.

chmod +x get-time.sh
./get-time.sh 1
Fetch UTC(NIST) time every 1 seconds...
21-09-02 18:15:02
21-09-02 18:15:03
21-09-02 18:15:04
21-09-02 18:15:12
^C

😲 Note

If this doesn’t work on your machine, don’t worry – you’ve just experienced one of the big reasons why containers are so useful!

Continue to the next page to containerize the time app so we can run it in a standard Linux environment that will work everywhere Docker is supported.

3.1 - Dockerfile with CMD

In the Dockerfile for the previous example, you used the ENTRYPOINT instruction to specify the process to execute in the container. This time you’ll use the CMD instruction instead and then explore the differences between the two.

FROM debian
COPY get-time.sh /usr/local/bin
RUN chmod +x /usr/local/bin/get-time.sh
CMD [ "/usr/local/bin/get-time.sh" ]

Build a Docker image to run this.

docker build -f Dockerfile.using_cmd -t get-time .

Since Docker looks for Dockerfile by default and you’re going to try using a few different Dockerfiles, you need to use the -f option to be explicit about which one to use for the build.

Now print the time using a container.

docker run -it --rm --name timectr get-time
Fetch UTC(NIST) time every 5 seconds...
21-09-02 19:28:45
21-09-02 19:28:51

We used a new option for the docker run command: -it. This is actually two short flags combined, -i and -t. The first keeps stdin open so we can pass input to the container; the second attaches a pseudo-terminal, which, among other things, lets you terminate the process with SIGINT by pressing Ctrl-C.

The container appears to behave the same way as when we used ENTRYPOINT before. However, one difference is that CMD acts more like a default. You can easily override the default command, as shown below.

docker run -it --rm --name timectr get-time echo hello
hello

The get-time.sh script is no longer executed; instead, the command echo with the argument hello was executed in the container. Since the base image for the get-time image was debian, the standard echo command exists in the path when the container is launched.

Because the argument that you provide in the docker run command replaces the default command, you can’t just provide the frequency option like this (since there is no 2 command).

# this won't work!
docker run -it --rm --name timectr get-time 2

Instead, you need to supply the entire command to override the default one:

docker run -it --rm --name timectr get-time get-time.sh 2
Fetch UTC(NIST) time every 2 seconds...
21-09-02 19:29:42
21-09-02 19:29:44
^C

Since /usr/local/bin is in the path for the debian image, it wasn’t necessary to fully qualify the path for get-time.sh. Using the fully qualified path in the CMD instruction, however, is often good practice for clarity.

Let’s revisit using ENTRYPOINT next.

3.2 - Dockerfile with ENTRYPOINT

Let’s rebuild the get-time image using an ENTRYPOINT.

FROM debian
COPY get-time.sh /usr/local/bin
RUN chmod +x /usr/local/bin/get-time.sh
ENTRYPOINT [ "/usr/local/bin/get-time.sh" ]

Rebuild the Docker image.

docker build -f Dockerfile.using_entrypoint -t get-time .

Now run a container.

docker run -it --rm --name timectr get-time
Fetch UTC(NIST) time every 5 seconds...
21-09-02 19:30:53
^C

So far the behavior seems the same. However, unlike with CMD, arguments don’t override the ENTRYPOINT; instead, they are passed to it. This behavior is more aligned with our expectations for containers that run specific programs.

docker run -it --rm --name timectr get-time 2
Fetch UTC(NIST) time every 2 seconds...
21-09-02 19:33:26
21-09-02 19:33:28
^C

3.3 - Dockerfile with both

What if we want the container to behave the way it does with an ENTRYPOINT, but we also want to change the default frequency of 5 seconds?

You use both ENTRYPOINT and CMD.

FROM debian
COPY get-time.sh /usr/local/bin
RUN chmod +x /usr/local/bin/get-time.sh
ENTRYPOINT [ "/usr/local/bin/get-time.sh" ]
CMD [ "2" ]

Rebuild the Docker image.

docker build -f Dockerfile.using_entrypoint -t get-time .

Run a container.

docker run -it --rm --name timectr get-time
Fetch UTC(NIST) time every 2 seconds...
21-09-02 19:30:53
^C

What happens here is that the value for CMD acts as a default argument that will be passed to the command specified by ENTRYPOINT. You can still pass an argument when you create the container and it will just override CMD’s default value.

docker run -it --rm --name timectr get-time 3
Fetch UTC(NIST) time every 3 seconds...
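One way to internalize the rule: Docker builds the container’s command line by concatenating ENTRYPOINT with either the runtime arguments (if any were given) or CMD. A simplified shell model of that composition (compose_argv is our own illustrative helper, not a Docker command):

```shell
# Simplified model: final argv = ENTRYPOINT + (runtime args if given, else CMD)
compose_argv() {
  entrypoint=$1; cmd=$2; runtime_args=$3
  if [ -n "$runtime_args" ]; then
    echo "$entrypoint $runtime_args"   # runtime args replace CMD
  else
    echo "$entrypoint $cmd"            # CMD supplies the default args
  fi
}

compose_argv /usr/local/bin/get-time.sh 2 ""   # prints: /usr/local/bin/get-time.sh 2
compose_argv /usr/local/bin/get-time.sh 2 3    # prints: /usr/local/bin/get-time.sh 3
```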

3.4 - Summary

The time example introduced you to the difference between ENTRYPOINT and CMD, and how you can use both together.

In summary, use CMD when you might want to run other processes in the container. Some containers are specifically designed to make multiple tool commands available, so using CMD is convenient.

Use ENTRYPOINT when you want a container to behave like a specific tool and you want to easily supply arguments without inadvertently overriding the default command.

Use ENTRYPOINT and CMD together when you want the behavior of using ENTRYPOINT, but want to specify potentially more appropriate default arguments using CMD that will override the containerized program’s defaults.

🤯

The nice thing about this combination is that you can still override the CMD defaults (which override the program’s defaults) when you launch a container!

Options that should never be overridden should be specified as part of the ENTRYPOINT instruction. Any arguments set using CMD, or supplied when running a container, are appended to the ENTRYPOINT command.

Overriding the ENTRYPOINT when running a container

Finally, as with CMD, you can even override the ENTRYPOINT when you run a container. It just requires a little more effort on the command line.

You use the --entrypoint option to specify the command (e.g., echo), and then supply any arguments after the image name (e.g., hello) as usual.

docker run -it --rm --name timectr --entrypoint echo get-time hello
hello