Docker Swarm? It is about time. We need some kind of cluster and orchestration tool for all these containers, because one or more containers on a single host cannot run anything serious or production-grade in a practical way. In this chapter, we will present the basics of Docker Swarm and what it is used for, with a few good examples. Let’s go…
What is Docker Swarm?
Docker Swarm is a tool for cluster creation, orchestration and management of swarm nodes. Nodes are the physical or virtual machines that make up a Swarm cluster. Swarm makes running containers highly available. It offers a whole range of benefits such as high container availability, redundancy, scaling, service upgrades, load balancing and much more. The big advantage of swarm services over regular containers is that configuration changes, such as network and storage modifications, can be applied live without degrading the service. It is important to note that it is possible to run standalone containers on Docker swarm nodes that are not managed by the swarm manager.
Docker Swarm components
Docker Swarm architecture consists of managers (handling membership and delegation) and workers (running swarm services). A Docker node can be both a manager and a worker at the same time. Services are the core of Docker swarm. We can think of a service as a set of containers that run on the swarm and make up some functional entity, i.e. they provide some service. Services are run with defined resource needs such as network, storage, number of replicas, a list of ports open to the outside world, etc. The task of the swarm manager is to maintain the service state, which means that if one of the nodes fails unexpectedly, its containers are recreated on the first available node.
There are two types of nodes: managers and workers. You can run Docker swarm on your laptop, but production clusters run on multiple machines or in the cloud. A request to create a service is sent to the manager node, which then distributes tasks to the worker nodes that do the installation and configuration of the service. The manager makes sure that the cluster state is maintained. Workers report the status of the service to the manager through an agent installed on each node.
The manager node is in charge of maintaining cluster state and scheduling services. In a test environment it is possible to run a single manager instance, but an odd number of manager nodes larger than one is always recommended for production. It is important to know how many manager outages can be tolerated for a given total number of managers in the cluster; it is calculated by the formula (n-1)/2. For example, a cluster with three manager nodes tolerates a single manager outage ((3-1)/2 = 1). A maximum of seven manager nodes in production is recommended. Manager nodes maintain the cluster state via an internal distributed datastore.
Worker nodes receive commands from the manager node and run service tasks. By default, all manager nodes are also worker nodes. If you do not want containers to be scheduled on manager nodes, set the availability of the manager to Drain, so the manager is excluded when containers are instantiated. Any worker can be promoted to a manager node with the docker node promote command.
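As a quick sketch, promoting a worker and draining a manager might look like this (the node names worker1 and manager1 are hypothetical; use the names shown by docker node ls):

```shell
# Promote a worker node to a manager (node name is illustrative)
docker node promote worker1

# Exclude a manager from running service containers
docker node update --availability drain manager1
```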
Services and tasks
When the swarm manager receives a request through the API, it allocates tasks to worker nodes depending on the number of replicas defined. For example, if there is a request for three replicas, the manager creates three separate tasks on three workers, which execute them independently of one another. If one of the tasks fails, the manager creates another task because it has to maintain the desired cluster state. Each task has its own natural state flow: assigned, running, and so on. A service is a description of what something should look like, while a task is the one doing the job.
Replicated and global services
We can recognize two types of services by the way they are deployed: replicated and global. Replicated services are services for which we define a set of identical tasks, i.e. replicas, that we want to deploy. For example, installing a web server in three copies falls into this category. It is important to note that the number of replicas does not have to correspond to the number of nodes: having four nodes and three replicas leaves one node without the given service. With global services, the swarm manager runs exactly one task on every node. One example is antivirus software that should run on each node.
A task does the job described by the service. A task moves through several states, such as new, pending, and running. Use the command docker service ps SERVICENAME to check the state of a service’s tasks.
The swarm manager uses ingress load balancing to expose services to the external world through published ports. It is the same technique used with regular container port forwarding. In a swarm this is called the published port, and it is assigned automatically if not defined when creating the service. An important swarm component is the internal DNS, which assigns a DNS entry to each service. Last but not least, the ingress load balancer balances traffic to the individual service containers based on those DNS names.
Examples will cover creating a swarm cluster, adding manager and worker nodes, installing services, and managing the swarm. Prerequisites:
- 3 nodes (physical or virtual machines) – 1 manager, 2 workers
- Open ports between nodes: 2377 (TCP), 7946 (TCP/UDP) and 4789 (UDP)
- IP address of the manager node reachable from the other nodes
- Docker installed
Example 1 (Docker swarm initialization)
Initialize Docker swarm with docker swarm init:
Add the --advertise-addr parameter with the IP address on which the swarm manager will advertise itself so that other nodes can reach it.
Copy the token from the output of the command; it will be used later to add workers to the swarm.
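A minimal initialization might look like this (the IP address is a placeholder for your manager’s address):

```shell
# Initialize the swarm; --advertise-addr is the address other nodes will use
docker swarm init --advertise-addr 192.168.1.10
```

The command prints a ready-made docker swarm join command containing the worker token; that is the token to copy.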
With the docker info command, we can get a lot of information, including the role of the current node:
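On the manager, the Swarm section of the output shows Swarm: active and Is Manager: true:

```shell
# Show general Docker information, including swarm state and node role
docker info
```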
If you want to check the node list, use docker node ls:
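This command works only on a manager node; the MANAGER STATUS column marks the current leader:

```shell
# List all nodes in the swarm with their availability and manager status
docker node ls
```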
We can see that this node is the leader manager because we initialized the swarm from it.
To check which containers a service includes and how they are scheduled on the nodes, use the command docker service ps SERVICENAME:
Example 2 (Adding nodes to the swarm)
When the swarm is active, it is time to add two workers to the swarm cluster. Use the token from the previous output (Example 1).
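Run the join command on each worker (the token and the manager address below are placeholders; copy the exact command printed by docker swarm init):

```shell
# Join the swarm as a worker; token and manager IP are illustrative
docker swarm join --token SWMTKN-1-<token> 192.168.1.10:2377
```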
Let’s check swarm cluster members:
We can see two members: manager and worker.
Repeat all for second worker:
Example 3 (Service creation)
Create a service with two replicas using docker service create:
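A sketch of the command (the service name and the sleep command that keeps the centos containers alive are our choices for illustration):

```shell
# Create a service with two replicas; 'sleep infinity' keeps centos running
docker service create --name centos --replicas 2 centos sleep infinity
```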
Notice the --replicas parameter, where we defined that two identical centos service replicas will be created.
View the service status with docker service ls:
For more details, use docker service inspect centos:
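For example:

```shell
# List services with their replica counts and images
docker service ls

# Full details; --pretty prints a readable summary instead of JSON
docker service inspect --pretty centos
```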
Example 4 (Service scaling)
If you want to increase the number of containers in the service, use the docker service scale command.
The service with two replicas is now scaled to three replicas:
We can decrease the number of replicas in the same way:
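Both directions use the same syntax (the service name centos is from the earlier example):

```shell
# Scale the centos service up to three replicas
docker service scale centos=3

# And back down to two
docker service scale centos=2
```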
Example 5 (Deleting a service)
Delete a service with docker service rm:
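For example:

```shell
# Remove the service and all of its tasks
docker service rm centos
```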
Example 6 (Rolling updates)
For rolling updates we use the command docker service create with the following parameters:
We always use the --update-delay parameter, which defines how long to wait before upgrading the next container in the service. The delay starts only once the previous container is upgraded, up and running, and in an active state. By default, the number of tasks updated at once is set to 1, so with a delay of 10s the swarm updates one container, waits 10 seconds, and then moves to the next one. If you want the update to be applied to multiple containers at the same time, set --update-parallelism to a higher value. By default, if the upgrade of any task fails, the whole update process is paused. This behavior can be changed with the --update-failure-action parameter.
The example uses a mysql service running the older 5.7 version:
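A sketch of the create command (the root password is a placeholder; the mysql image requires one to start):

```shell
# Create a mysql 5.7 service with two replicas and a 10s update delay
docker service create \
  --name mysql \
  --replicas 2 \
  --update-delay 10s \
  -e MYSQL_ROOT_PASSWORD=changeme \
  mysql:5.7
```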
We created the mysql service (5.7), which is why we specified the image name mysql:5.7, two replicas, and an update delay of 10 seconds.
Analyze service settings:
We can see our defined parameters together with some default values like Parallelism=1.
It is time to perform a rolling update to the latest mysql version:
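The update reuses the delay and parallelism settings stored in the service definition:

```shell
# Roll the service over to the latest mysql image, one task at a time
docker service update --image mysql:latest mysql
```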
What is really going on here:
- Because we installed the mysql service with two replicas, one replica was stopped first
- An update was scheduled for the stopped container (task)
- The upgraded container was started; the 10-second delay period begins only when the task status of the previous container is RUNNING
- The next container (task) was then shut down and its upgrade started
- If any task status is FAILED at any point, the upgrade is paused
Let’s check the service details (notice that the latest mysql image is now used):
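Both views confirm the rollout (old tasks appear as Shutdown, new ones run mysql:latest):

```shell
# The summary now shows the new image and the update status
docker service inspect --pretty mysql

# The task list shows old tasks shut down and new ones running
docker service ps mysql
```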
Example 7 (Service restart)
Example 8 (Putting a node in drain mode)
If you want a node to be temporarily or permanently excluded from the swarm cluster, for example for maintenance operations, put it in drain mode. When a node is in drain mode, the swarm manager no longer assigns tasks to it and reschedules all tasks active on that node to other nodes. Let’s see how it works.
Check the current status and container placement:
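Assuming the centos service from the earlier examples is still running with three replicas:

```shell
# Node availability (Active/Drain) and manager status
docker node ls

# Where each task of the service is currently placed
docker service ps centos
```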
Now there are three replicas and each container is placed on a different node.
Let’s put worker 2 in drain mode:
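The node name worker2 below is whatever docker node ls reports for that node:

```shell
# Put worker2 into drain mode; its tasks move to other nodes
docker node update --availability drain worker2
```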
We can see drain status for worker2:
We can see that there are no containers running on worker2:
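A quick check with:

```shell
# worker2 no longer appears in the NODE column for running tasks
docker service ps centos
```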
Two things happened:
- All containers on the drained node were recreated on other nodes
- The swarm manager no longer considers the worker2 node active and does not assign any tasks to it
Bring the node back to normal mode:
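For example:

```shell
# Return worker2 to active availability; it can receive new tasks again
docker node update --availability active worker2
```

Note that existing tasks are not automatically rebalanced back to the node; it only becomes eligible for new tasks.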
Swarm cluster in routing mesh mode
Routing mesh is a Docker swarm cluster functionality that offers service load balancing and port forwarding to the outside world. The routing mesh is used exclusively for inbound traffic, i.e. traffic originating from the outside world whose destination is a service running in a Docker swarm container. Make sure that the following ports are open: 4789 (UDP) and 7946 (TCP/UDP), as well as any ports that must be open to communicate with external services such as load balancers.
When making ports accessible, we define two parameters: published (the port we want to make available externally) and target (the port the application listens on inside the container). If we omit the published port, swarm assigns a random port. When a published port is accessed, the internal Docker swarm load balancer forwards the traffic to an active container on its target port.
Example (Service and published ports)
We will create a service with the popular web server nginx with two replicas. The internal nginx port is 80; we will map it to host port 8080, which in our case will be the published port:
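With the routing mesh, port 8080 is then reachable on every node in the swarm, not only the nodes running a replica:

```shell
# Publish container port 80 as port 8080 on every swarm node
docker service create \
  --name nginx \
  --replicas 2 \
  --publish published=8080,target=80 \
  nginx
```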
You can later change port configuration:
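Port mappings are changed by removing the old mapping and adding a new one (the port 9090 here is an arbitrary example):

```shell
# Replace the published port: remove 8080 and publish 9090 instead
docker service update \
  --publish-rm published=8080,target=80 \
  --publish-add published=9090,target=80 \
  nginx
```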
External load balancer
Production environments should always include an external load balancer as an additional component for balancing traffic into the cluster. This requires opening the relevant ports between the swarm nodes and the external load balancer.