
Mastering Docker Swarm Monitoring: A Step-by-Step Guide with Prometheus and Grafana
In a distributed environment like Docker Swarm, maintaining visibility into your cluster’s health and performance isn’t a luxury—it’s a necessity. Without proper monitoring, you’re flying blind, unable to proactively address resource bottlenecks, identify failing services, or optimize your infrastructure. Fortunately, a powerful, open-source stack exists to provide deep insights into every layer of your Swarm.
This guide will walk you through setting up a robust monitoring solution for your Docker Swarm cluster using Prometheus for data collection and Grafana for visualization. By the end, you’ll have a real-time dashboard displaying critical metrics for both your cluster nodes and the containers running on them.
The Core Components of Our Monitoring Stack
To achieve comprehensive observability, we will deploy a suite of specialized tools, each with a distinct role:
- Prometheus: The heart of our system. Prometheus is a time-series database that pulls (scrapes) metrics from configured targets, stores them efficiently, and allows for powerful querying using its native language, PromQL.
- Node Exporter: This tool provides crucial host-level metrics. It runs on every node in the Swarm and exposes a wide range of hardware and OS metrics, such as CPU usage, memory consumption, disk space, and network I/O.
- cAdvisor (Container Advisor): Developed by Google, cAdvisor provides deep insights into container performance. It collects, aggregates, and exposes resource usage and performance data for every running container, giving you a granular view of your applications.
- Grafana: The visual front-end. Grafana connects to Prometheus as a data source and transforms raw metrics into beautiful, intuitive, and highly customizable dashboards with graphs, charts, and alerts.
Prerequisites
Before we begin, you should have a functioning Docker Swarm cluster with at least one manager and one worker node. All subsequent steps will be performed from your manager node.
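A quick way to confirm the cluster is ready is to list its nodes from the manager (hostnames and node counts will differ in your environment):

docker node ls

Every node should report a STATUS of Ready, and at least one node should show a MANAGER STATUS of Leader.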
Step 1: Create a Dedicated Network for Monitoring
Isolating your monitoring components on their own network is a security best practice. It ensures that monitoring traffic is contained and services can communicate reliably.
Create a dedicated overlay network that can span across all nodes in the Swarm:
docker network create --driver=overlay monitoring
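If you want to double-check that the network exists and uses the overlay driver before moving on, a quick listing will do:

docker network ls --filter name=monitoring

The output should show a single network named monitoring with the overlay driver and swarm scope.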
Step 2: Configure Prometheus to Discover Targets
Prometheus needs to know where to find Node Exporter and cAdvisor to scrape their metrics. We’ll create a configuration file named prometheus.yml that uses Docker’s built-in DNS for service discovery.
Create the prometheus.yml file with the following content:
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.node-exporter']
        type: 'A'
        port: 9100

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080
This configuration tells Prometheus to scrape itself, and more importantly, to look for services named tasks.node-exporter and tasks.cadvisor to collect metrics from them. Docker Swarm’s internal DNS will resolve these names to the IP addresses of the respective containers.
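Before deploying anything, you can optionally validate the file's syntax with promtool, which is bundled in the prom/prometheus image. The --entrypoint override below assumes promtool is on the image's PATH, which has been the case for recent releases:

docker run --rm \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml:ro" \
  --entrypoint promtool \
  prom/prometheus:latest check config /etc/prometheus/prometheus.yml

A successful run should report SUCCESS along with the number of scrape configs it found.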
Step 3: Define the Full Monitoring Stack with Docker Compose
Now, we’ll bring all the components together in a single docker-compose.yml file. This file defines each service, its image, network connections, and deployment configuration.
Create a file named docker-compose.yml:
version: '3.7'

services:
  prometheus:
    image: prom/prometheus:latest
    networks:
      - monitoring
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    deploy:
      placement:
        constraints:
          - node.role == manager

  node-exporter:
    image: prom/node-exporter:latest
    networks:
      - monitoring
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    deploy:
      mode: global # This ensures it runs on every node

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    networks:
      - monitoring
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global # This ensures it runs on every node

  grafana:
    image: grafana/grafana:latest
    networks:
      - monitoring
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    deploy:
      placement:
        constraints:
          - node.role == manager

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring:
    external: true
Key configurations in this file:
- Deployment Modes: Node Exporter and cAdvisor are deployed in global mode, meaning Docker Swarm will automatically run one instance on every node in the cluster.
- Volume Mounts: We use named volumes (prometheus-data, grafana-data) for data persistence. This ensures your metrics and dashboards survive container restarts. We also mount host directories into Node Exporter and cAdvisor to allow them to access host-level and container-level data (a common Node Exporter variant is sketched just after this list).
- Placement Constraints: Prometheus and Grafana are constrained to run only on manager nodes for centralized access and control.
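One note on the Node Exporter mounts: depending on the exporter version, you may also want to point it explicitly at the mounted host paths so that filesystem metrics reflect the host rather than the container. A commonly used variant of the service definition (shown here as an optional tweak, not part of the stack file above) looks like this:

  node-exporter:
    image: prom/node-exporter:latest
    command:
      - '--path.procfs=/host/proc'   # read host process/CPU/memory stats from the bind mount
      - '--path.sysfs=/host/sys'     # read host kernel/sysfs data from the bind mount
      - '--path.rootfs=/rootfs'      # report disk usage for the host's root filesystem
    networks:
      - monitoring
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    deploy:
      mode: global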
Step 4: Deploy the Monitoring Stack
With the configuration files in place, deploy the entire stack with a single command from your manager node:
docker stack deploy -c docker-compose.yml monitor
You can check the status of your newly deployed services by running docker stack ps monitor. Wait for all services to show a Running state.
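Once everything reports Running, you can also confirm from the manager that Prometheus has picked up its scrape targets. The Prometheus HTTP API lists them at /api/v1/targets; piping through grep is just a rough way to eyeball the health fields without extra tooling (this assumes curl is installed on the manager):

# Show the services in the stack and their replica counts
docker stack services monitor

# Spot-check scrape target health via the Prometheus API
curl -s http://localhost:9090/api/v1/targets | grep -o '"health":"[^"]*"'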
Step 5: Configure Grafana and Visualize Your Data
Your monitoring backend is now running. The final step is to configure Grafana to visualize the metrics collected by Prometheus.
Access Grafana: Open your web browser and navigate to http://<YOUR_SWARM_MANAGER_IP>:3000. The default login credentials are admin / admin. You will be prompted to change the password on your first login.
Add Prometheus as a Data Source:
- Click the gear icon on the left sidebar to go to Configuration > Data Sources.
- Click “Add data source” and select Prometheus.
- In the URL field, enter http://prometheus:9090. Since Grafana and Prometheus are on the same Docker network, they can communicate using their service names.
- Click “Save & Test”. You should see a green confirmation message.
Import a Dashboard:
- The best way to get started is by importing a pre-built community dashboard.
- Click the “+” icon on the left sidebar and select “Import”.
- For a great Node Exporter dashboard, enter the ID 1860 into the “Import via grafana.com” field and click “Load”.
- On the next screen, select your Prometheus data source from the dropdown menu and click “Import”.
You will instantly be presented with a comprehensive dashboard showing detailed real-time metrics for every node in your Docker Swarm cluster, including CPU, memory, disk usage, and network statistics. You can find other dashboards for cAdvisor (e.g., ID 13989) and other services on the official Grafana Dashboards website.
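If you prefer to build panels by hand instead of importing, the underlying queries are plain PromQL. For example, a widely used expression for per-node CPU utilization based on Node Exporter metrics (a standard community query, not tied to any particular dashboard ID) is:

# Per-node CPU utilization (%), averaged over 5-minute windows
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

# Memory usage per container, as reported by cAdvisor
sum by (name) (container_memory_usage_bytes{name!=""})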
Key Security and Operational Tips
- Secure Grafana: Always change the default admin password. For production environments, consider setting up authentication via OAuth or LDAP.
- Set Up Alerts: Now that you have data, configure alerting rules in Grafana. You can set up notifications via Slack, email, or other channels to be proactively informed about issues like high CPU usage or low disk space.
- Manage Data Retention: By default, Prometheus stores data for 15 days. You can configure this retention period using the --storage.tsdb.retention.time flag in the Prometheus service definition to match your storage capacity and needs (a minimal example follows this list).
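As a rough sketch of that last point: the retention flag is passed as a command-line argument, and because overriding command: replaces the image's default arguments, the config file and storage path flags need to be repeated. The 30-day value here is purely an example:

  prometheus:
    image: prom/prometheus:latest
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'   # keep 30 days of metrics instead of the default 15
    # ... the rest of the service definition stays as in Step 3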
By following this guide, you have successfully deployed a powerful, scalable, and open-source monitoring solution that provides the critical visibility needed to operate a healthy and efficient Docker Swarm cluster.
Source: https://kifarunix.com/monitor-docker-swarm-node-metrics-using-grafana/


