Introduction

You would expect problem in Docker environment because there are so many factors that can go wrong. This article covers commands, common scenarios and tips how to successfully troubleshoot potential Docker issues. It is hard to cover all scenarios, things can go wrong at so many levels, but for you we will try to distinguish and group common categories – where the logs are stored in the first place, how to enter container, what are the useful command for troubleshoot and many more. Let’s start with Docker troubleshooting.

Docker container logs

First line of defense in Docker troubleshooting is to know where the logs are. If there is no logs things can get complicated. By default, Docker stores its logs in following format:

/var/lib/docker/containers/[container-id]/[container-id]-json. log.

This is default JSON output which stores logs on host machine. This is how you can check the default logging driver:

[root@swmanager ~]# docker info –format ‘{{.LoggingDriver}}’

json-file

JSON file is just one of the logging drivers you can use: none, local, syslog, journald, gelf, fluentd, splunk, amazon and GCP.

Docker container automation started with specific ID (note the ID):

You can change it by configuring /etc/docker/daemon.json file:

{

“log-driver”: “splunk”

}

This line tells Docker to use splunk as default logging destination.

[root@swmanager ~]# docker run -dit -p 5000:5000 automation

f544bfad10f312c3befbc96cfdb3e2461f55579addff9c5a6b67b11f99c35b2a

Go to the /var/lib/docker/containers and find directory name with ID above. Logs are available there:

[root@swmanager f544bfad10f312c3befbc96cfdb3e2461f55579addff9c5a6b67b11f99c35b2a]# cat f544bfad10f312c3befbc96cfdb3e2461f55579addff9c5a6b67b11f99c35b2a-json.log

{“log”:” * Serving Flask app \”__init__\” (lazy loading)\r\n”,”stream”:”stdout”,”time”:”2020-08-17T13:22:37.407939247Z”}

{“log”:” * Environment: production\r\n”,”stream”:”stdout”,”time”:”2020-08-17T13:22:37.407991902Z”}

……

Logs can be analyzed with useful command docker logs which shows the command’s STDOUT and STDERR. But if your logging drivers sends logs to remote location or database, docker logs will not work as expected. For Docker service you can use docker service log command.

Start and analyze mysql container logs:

[root@swmanager /]# docker run -dit -p 3306:3306 –name mysql -e MYSQL_ROOT_PASSWORD=password mysql:5.7

72b4567c09c2f9eef3cdd03c3acc0acf472aa337402a59932a4ae5dc8a756311

[root@swmanager /]# docker logs mysql

2020-08-31 12:45:18+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.31-1debian10 started.

2020-08-31 12:45:18+00:00 [Note] [Entrypoint]: Switching to dedicated user ‘mysql’

2020-08-31 12:45:18+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.31-1debian10 started.

2020-08-31 12:45:18+00:00 [Note] [Entrypoint]: Initializing database files

There are useful flags you can use like –details (extra details), –follow (follow log output),–since and –until.

Types of logging

Completely different story is how logs are logged. Default way is blocking mode. These mode will write logs to its destination not caring about application work. These can bring latency and performance issues if logs are sent to remote location. If Docker uses default JSON format which stores logs on local system, impact is not high.

Second type of logging (non-blocking) does not have impact on application work because its sends logs to memory ring buffer. This way each process has separated stream and everything looks good. But downside is that memory buffer can be full and logs can leak. Default memory size of memory ring buffer is 1MB.

Because non-blocking mode is not default you have to define parameter –log.opt mode to non-blocking. Also you can tweak memory ring buffer size (our example 2M):

[root@swmanager ~]# docker run -dit –name mysql –log-opt mode=non-blocking –log-opt max-buffer-size=4m -e MYSQL_ROOT_PASSWORD=my-secret-pw mysql:5.7

d765e45169947d6e461374b8ca595de40e5a73ac58d03a52425a5c7150d1bfd5

Docker troubleshooting commands

There are few handy commands to troubleshooting potential issues on Docker containers. Command docker container logs <container> will display log output of desired container:

[root@swmanager ~]# docker container logs mysql

2020-09-24 06:01:21+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.31-1debian10 started.

2020-09-24 06:01:21+00:00 [Note] [Entrypoint]: Switching to dedicated user ‘mysql’

2020-09-24 06:01:21+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.31-1debian10 started.

2020-09-24 06:01:21+00:00 [Note] [Entrypoint]: Initializing database files

2020-09-24T06:01:21.799449Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use –explicit_defaults_for_timestamp server option (see documentation for more details).

2020-09-24T06:01:22.560336Z 0 [Warning] InnoDB: New log files created, LSN=45790

2020-09-24T06:01:22.681348Z 0 [Warning] InnoDB: Creating foreign key constraint system tables.

2020-09-24T06:01:22.788340Z 0 [Warning] No existing UUID has been found, so we assume that this is the first time that this server has been started. Generating a new UUID: 5f85edc3-fe2b-11ea-b777-0242ac110003.

2020-09-24T06:01:22.793316Z 0 [Warning] Gtid table is not ready to be used. Table ‘mysql.gtid_executed’ cannot be opened.

2020-09-24T06:01:25.148931Z 0 [Warning] CA certificate ca.pem is self signed.

2020-09-24T06:01:25.493338Z 1 [Warning] root@localhost is created with an empty password ! Please consider switching off the –initialize-insecure option.

If you want to see complete container configuration you can use command docker container inspect <container>. In our example we are checking if container is running:

[root@swmanager ~]# docker container inspect mysql | grep -i running

“Status”: “running”,

“Running”: true,

Command docker events covers full logging on operations performed on container, images, daemons, services, volumes and more.

Let’s display all docker events after specified date:

[root@swmanager ~]# docker events –since ‘2020-09-23’

2020-09-24T07:54:44.191241177+02:00 container create 24fb97271fdf5ca46c2f98e1bda66dfdc0265499952ccdcf416b3ae9551ef560 (image=centos, name=awesome_spence, org.label-schema.build-date=20200611, org.label-schema.license=GPLv2, org.label-schema.name=CentOS Base Image, org.label-schema.schema-version=1.0, org.label-schema.vendor=CentOS)

2020-09-24T07:54:44.383378689+02:00 network connect a37fd6c73c1681e55ffabecf0cf6ec46d4524764275107c417041f941944ebe2 (container=24fb97271fdf5ca46c2f98e1bda66dfdc0265499952ccdcf416b3ae9551ef560, name=bridge, type=bridge)

All events on mysql image since specific date:

[root@swmanager ~]# docker events -f “image”=”mysql:5.7” –since ‘2020-09-23’

2020-09-24T08:01:16.359790543+02:00 container destroy 8e9af83ffde5e08feaadc04e10ae4f14c928c11abd41a289a8f01907fd5a0dc1 (image=mysql:5.7, name=mysql)

2020-09-24T08:01:19.790819485+02:00 container create d765e45169947d6e461374b8ca595de40e5a73ac58d03a52425a5c7150d1bfd5 (image=mysql:5.7, name=mysql)

2020-09-24T08:01:21.343794131+02:00 container start d765e45169947d6e461374b8ca595de40e5a73ac58d03a52425a5c7150d1bfd5 (image=mysql:5.7, name=mysql)

Specific network can be inspected with:

[root@swmanager ~]# docker network inspect bridge

[

{

“Name”: “bridge”,

“Id”: “a37fd6c73c1681e55ffabecf0cf6ec46d4524764275107c417041f941944ebe2”,

“Created”: “2020-09-24T07:53:06.82258442+02:00”,

“Scope”: “local”,

“Driver”: “bridge”,

Docker swarm services can also be inspected with few commands. Please check Docker swarm article how to set up the service. We are running mysql service in one replica:

[root@swmanager ~]# docker service create –name mysql –replicas 1 -e MYSQL_ROOT_PASSWORD=my-secret-pw mysql:5.7

3pktdampz2f5mzt77yek920an

overall progress: 1 out of 1 tasks

1/1: running [==================================================>]

verify: Service converged

Command Docker service ps <container> will display basic information of service (on which worker is running, desired states, name, etc.):

[root@swmanager ~]# docker service ps mysql

ID NAME IMAGE NODE DESIRED STATE CURRENT STATE ERROR PORTS

ksdi8jfubdzb mysql.1 mysql:5.7 swworker2 Running Running about a minute ago

Like we used Docker logs command for plain docker containers, here we can use Docker service logs <container>:

[root@swmanager ~]# docker service logs mysql

mysql.1.ksdi8jfubdzb@swworker2 | 2020-09-24 06:28:51+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.31-1debian10 started.

mysql.1.ksdi8jfubdzb@swworker2 | 2020-09-24 06:28:51+00:00 [Note] [Entrypoint]: Switching to dedicated user ‘mysql’

mysql.1.ksdi8jfubdzb@swworker2 | 2020-09-24 06:28:51+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.31-1debian10 started.

mysql.1.ksdi8jfubdzb@swworker2 | 2020-09-24 06:28:51+00:00 [Note] [Entrypoint]: Initializing database files

mysql.1.ksdi8jfubdzb@swworker2 | 2020-09-24T06:28:51.976853Z 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use –explicit_defaults_for_timestamp server option (see documentation for more details).

To display swarm members with its basic information use following command:

[root@swmanager ~]# docker node ls

ID HOSTNAME STATUS AVAILABILITY MANAGER STATUS ENGINE VERSION

uhcwhi00nw008n14pase5twd1 * swmanager.local Ready Active Leader 18.09.1

viuunheq5bm2xcmc5hvlnqwr2 swworker1.local Ready Active 18.09.1

1zgfm15evbtd5l4njk53bu63h swworker2 Ready Active 18.09.1

To see complete service configuration:

[root@swmanager ~]# docker service inspect mysql

[

{

“ID”: “3pktdampz2f5mzt77yek920an”,

“Version”: {

“Index”: 890

},

“CreatedAt”: “2020-09-24T06:28:50.503417901Z”,

“UpdatedAt”: “2020-09-24T06:28:50.503417901Z”,

“Spec”: {

“Name”: “mysql”,

Inside Docker container

Good troubleshooting approach is to connect to container and check what is going on inside. There are several ways to do it: commands exec and attach and SSH into the container.

Run container (in our example CentOS):

[root@swmanager ~]# docker run -it –name centos -d centos /bin/bash

Check that container is running:

[root@swmanager ~]# docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

67056cff24e9 centos “/bin/bash” 4 seconds ago Up 1 second centos

Attach command

Use attach command to connect to container;

[root@swmanager ~]# docker attach centos

[root@67056cff24e9 /]#

Check file systems and IP configuration:

[root@67056cff24e9 log]# df –h

Filesystem Size Used Avail Use% Mounted on

overlay 6.2G 6.1G 124M 99% /

tmpfs 64M 0 64M 0% /dev

tmpfs 411M 0 411M 0% /sys/fs/cgroup

/dev/mapper/cl-root 6.2G 6.1G 124M 99% /etc/hosts

shm 64M 0 64M 0% /dev/shm

tmpfs 411M 0 411M 0% /proc/asound

tmpfs 411M 0 411M 0% /proc/acpi

tmpfs 411M 0 411M 0% /proc/scsi

tmpfs 411M 0 411M 0% /sys/firmware

[root@67056cff24e9 log]# ip a

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000

link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

inet 127.0.0.1/8 scope host lo

valid_lft forever preferred_lft forever

20: eth0@if21: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default

link/ether 02:42:ac:11:00:03 brd ff:ff:ff:ff:ff:ff link-netnsid 0

inet 172.17.0.3/16 brd 172.17.255.255 scope global eth0

valid_lft forever preferred_lft forever

Run Ctrl P and Ctrl-Q to exit container without stopping it:

[root@67056cff24e9 log]# read escape sequence

[root@swmanager ~]# docker ps

CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES

67056cff24e9 centos “/bin/bash” About a minute ago Up About a minute centos

7a24bd538b49 mysql:5.7 “docker-entrypoint.s…” 6 minutes ago Up 6 minutes 3306/tcp, 33060/tcp mysql.1.e14q7a7xqt3h64huwtjc2

Exec command

With exec command you can also get into container:

[root@swmanager ~]# docker exec -it centos /bin/bash

[root@67056cff24e9 /]# ls

bin dev etc home lib lib64 lost+found media mnt opt proc root run sbin srv sys tmp usr var

SSH

Check the container ip address:

[root@swmanager ~]# docker inspect centos | grep -i ipaddress

“SecondaryIPAddresses”: null,

“IPAddress”: “172.17.0.3”,

“IPAddress”: “172.17.0.3”,

In our example SSH connection is refused because there is no running ssh service inside container:

[root@swmanager ~]# ssh root@172.17.0.3

ssh: connect to host 172.17.0.3 port 22: Connection refused

Docker copy file from container to host

Docker troubleshooting is all about identifying problems inside container. There is a way to copy files for analysis from container to host and vice versa. Files can be send from running or stopped container. Paths are relative to container root (path) and the current directory on host where docker command is run. User and group ownership is applied from destination (host) side. But with parameter –a you can preserves container user and group rights. Command used for copying files is docker cp:

Run mysql container:

docker run -dit -p 3306:3306 –name mysql -e MYSQL_ROOT_PASSWORD=password mysql:5.7

Copy log file from container to current directory:

[root@swmanager tmp]# docker cp mysql:/var/log/dpkg.log .

[root@swmanager tmp]# ls

dpkg.log

If you don’t specify –a parameter destination host will take over ownership

[root@swmanager tmp]# docker exec -it mysql /bin/bash

root@b815e68950a0:/var/log# useradd joe

root@b815e68950a0:/var/log# chown joe:joe dpkg.log

ls -la

-rw-r–r– 1 joe 28636 Sep 10 06:54 dpkg.log

Owner of dpkg.log was Joe, now is root user:

[root@swmanager tmp]# docker cp mysql.1.x6rxpcur4vj6l4z931kf2ywyt:/var/log/dpkg.log .

[root@swmanager tmp]# ls -la

total 1

-rw-r–r– 1 root root 28636 Sep 10 08:54 dpkg.log

You can rename log file at destination:

[root@swmanager tmp]# docker cp mysql.1.x6rxpcur4vj6l4z931kf2ywyt:/var/log/dpkg.log mysql.log

Also copy file from host back to container:

[root@swmanager tmp]# docker cp mysql.log mysql.1.x6rxpcur4vj6l4z931kf2ywyt:/var/log/

Check that log file is there:

[root@swmanager tmp]# docker exec -it mysql.1.x6rxpcur4vj6l4z931kf2ywyt /bin/bash

root@b815e68950a0:/# ls -la /var/log/mysql.log

-rw-r–r– 1 root root 28636 Sep 10 06:54 /var/log/mysql.log

Troubleshoot the Docker daemon

Docker daemon starts automatically when machine starts. This ensures that dockerd daemon always runs by default. You can manually stop and start docker service:

[root@swmanager ~]# systemctl stop docker

Typing dockerd will start daemon in foreground.

[root@swmanager ~]# dockerd

INFO[2020-11-02T12:07:52.620216713+01:00] parsed scheme: “unix” module=grpc

INFO[2020-11-02T12:07:52.620320249+01:00] scheme “unix” not registered, fallback to default scheme module=grpc

INFO[2020-11-02T12:07:52.620393949+01:00] parsed scheme: “unix” module=grpc

INFO[2020-11-02T12:07:52.620408835+01:00] scheme “unix” not registered, fallback to default scheme module=grpc

….

Ctrl-C will terminate service.

Configuration of daemon service can be done with configuration file (preferred) or putting flags when running docker daemon.

Configuration file is located at /etc/docker/daemon.json. Usually if not touched it is not created. Let’s put debug mode on:

{

„debug:“ true

}

Restart docker daemon:

[root@swmanager ~]# systemctl daemon-reload

[root@swmanager ~]# systemctl restart docker

[root@swmanager ~]# systemctl stop docker

Same thing can be achieved with flags:

[root@swmanager ~]# dockerd –debug=true

DEBU[2020-11-02T12:14:37.426522035+01:00] Listener created for HTTP on unix (/var/run/docker.sock)

DEBU[2020-11-02T12:14:37.431237114+01:00] Golang’s threads limit set to 5580

With flag –help you can see all available options:

[root@swmanager ~]# dockerd –help

Please make sure that there are no conflicts between docker configuration file and flags which can result in error.

Log level can be set with –log level flag. Values are debug, info, warn, error, fatal. Default is info.

[root@swmanager ~]# dockerd –log-level=”warn”

ERRO[0000] time=”2020-11-02T11:28:42Z” level=info msg=”listening on /run/docker/plugins/sshfs.sock” plugin=113af6aff2fae68d7134adcc71abb652827c37a87433e0689a

Stack trace

In the events of unresponsive daemon you can force flush of full stack trace with

kill -SIGUSR1 $(pidof dockerd)

ps -aux | grep -i docker # to check dockerd process id

How to check full stack trace?

journalctl -u docker.service

On CentOS check /var/log/messages.

Systemctl status docker

Out of memory

Containers by default don’t have any resource constraints. That can lead to unexpected situations where hosts can run out of memory, resulting in the system crash. Linux kernel has mechanism to deal with out of memory situations. Kernel throws out of memory exception and starts killing processes which have lowest OOM (Out of memory) priority. Docker daemon has high OOM but individual containers have very low. This can lead to situation where random containers will be shut down but dockerd daemon will continue to run. It is not recommended to tweak default OOM settings.

To prevent memory situations ensure that your app us running on host with adequate resources, limit the container resources. Best approach is limit the service. There are two types of limits hard and soft. Container will never go above hard limit, but will go above soft limit till certain conditions are met like out of memory exceptions.

Override binaries

Large number of images do not contain classic troubleshooting commands like traceroute, netcat, ip, telnet. Then it is hard to troubleshoot container without right commands. But you can always override binaries and mount host binary folders to container binary folder. It will override the default container binary folders like /bin, /sbin and /lib.

Run container in classic way:

[root@swmanager ~]# docker run -dit –name centos centos

Go inside container:

[root@swmanager ~]# docker attach centos

Check that command netstat is not available:

[root@6ce63969192d /]# netstat

bash: netstat: command not found

Now, run same image overriding the binary folder /bin:

[root@swmanager ~]# docker run -it -v /bin:/bin centos /bin/bash

Command netstat is now available:

[root@84317829b5bd /]# netstat

Active Internet connections (w/o servers)

Proto Recv-Q Send-Q Local Address Foreign Address State

Active UNIX domain sockets (w/o servers)

Proto RefCnt Flags Type State I-Node Path

Active Bluetooth connections (w/o servers)

Proto Destination Source State PSM DCID SCID IMTU OMTU Security

Proto Destination Source State Channel