Monitoring Docker Swarm Node Metrics with Grafana

10/11/2025

Mastering Docker Swarm Monitoring: A Step-by-Step Guide with Prometheus and Grafana

In a distributed environment like Docker Swarm, maintaining visibility into your cluster’s health and performance isn’t a luxury—it’s a necessity. Without proper monitoring, you’re flying blind, unable to proactively address resource bottlenecks, identify failing services, or optimize your infrastructure. Fortunately, a powerful, open-source stack exists to provide deep insights into every layer of your Swarm.

This guide will walk you through setting up a robust monitoring solution for your Docker Swarm cluster using Prometheus for data collection and Grafana for visualization. By the end, you’ll have a real-time dashboard displaying critical metrics for both your cluster nodes and the containers running on them.

The Core Components of Our Monitoring Stack

To achieve comprehensive observability, we will deploy a suite of specialized tools, each with a distinct role:

Prometheus: The heart of our system. Prometheus is a time-series database that pulls (scrapes) metrics from configured targets, stores them efficiently, and allows for powerful querying using its native language, PromQL.
Node Exporter: This tool provides crucial host-level metrics. It runs on every node in the Swarm and exposes a wide range of hardware and OS metrics, such as CPU usage, memory consumption, disk space, and network I/O.
cAdvisor (Container Advisor): Developed by Google, cAdvisor provides deep insights into container performance. It collects, aggregates, and exposes resource usage and performance data for every running container, giving you a granular view of your applications.
Grafana: The visual front-end. Grafana connects to Prometheus as a data source and transforms raw metrics into beautiful, intuitive, and highly customizable dashboards with graphs, charts, and alerts.

Prerequisites

Before we begin, you should have a functioning Docker Swarm cluster with at least one manager and one worker node. All subsequent steps will be performed from your manager node.
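To confirm the prerequisite, the node list can be summarised from the manager. The following is a minimal sketch; count_node_roles is a hypothetical helper name, and it assumes node hostnames contain no spaces.

```shell
# Summarise "HOSTNAME [MANAGER_STATUS]" lines read from stdin.
# Workers have an empty manager status, so their lines contain one field only.
# count_node_roles is a hypothetical helper, not part of the Docker CLI.
count_node_roles() {
  awk 'NF == 0 { next }
       { if (NF >= 2) managers++; else workers++ }
       END { printf "managers=%d workers=%d\n", managers, workers }'
}
```

On a manager node it could be fed with `docker node ls --format '{{.Hostname}} {{.ManagerStatus}}' | count_node_roles`; the guide expects at least one of each role.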

Step 1: Create a Dedicated Network for Monitoring

Isolating your monitoring components on their own network is a security best practice. It ensures that monitoring traffic is contained and services can communicate reliably.

Create a dedicated overlay network that can span across all nodes in the Swarm:

docker network create --driver=overlay monitoring
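Running the create command a second time fails because the network already exists. If you expect to re-run these steps, a small wrapper along the following lines makes the step repeatable. This is a sketch; ensure_overlay_network is a hypothetical helper name.

```shell
# Create the overlay network only when it is missing, so the step is safe to re-run.
# ensure_overlay_network is a hypothetical helper, not a Docker command.
ensure_overlay_network() {
  net_name="$1"
  if docker network inspect "$net_name" >/dev/null 2>&1; then
    echo "network '$net_name' already exists"
  else
    docker network create --driver=overlay "$net_name"
  fi
}
```

Call it as `ensure_overlay_network monitoring`, then list overlay networks with `docker network ls --filter driver=overlay` to confirm the result.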

Step 2: Configure Prometheus to Discover Targets

Prometheus needs to know where to find Node Exporter and cAdvisor to scrape their metrics. We’ll create a configuration file named prometheus.yml that uses Docker’s built-in DNS for service discovery.

Create the prometheus.yml file with the following content:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names: ['tasks.node-exporter']
        type: 'A'
        port: 9100

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names: ['tasks.cadvisor']
        type: 'A'
        port: 8080

This configuration tells Prometheus to scrape itself and, more importantly, to resolve the DNS names tasks.node-exporter and tasks.cadvisor. In Docker Swarm, tasks.<service-name> returns one A record per running task of that service, so Prometheus discovers every Node Exporter and cAdvisor container on the monitoring network rather than a single load-balanced virtual IP.
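A YAML indentation mistake in prometheus.yml only surfaces once the container starts, so it may be worth validating the file first. The sketch below runs promtool from the prom/prometheus image; check_prometheus_config is a hypothetical helper name, and the default path assumes the file sits in the current directory.

```shell
# Validate prometheus.yml with promtool before deploying the stack.
# Assumptions: the file lives in the current directory unless a path is given,
# and the prom/prometheus image provides the promtool binary.
# check_prometheus_config is a hypothetical helper name.
check_prometheus_config() {
  config_path="${1:-$PWD/prometheus.yml}"   # bind mounts need an absolute path
  docker run --rm \
    --entrypoint promtool \
    -v "$config_path:/etc/prometheus/prometheus.yml:ro" \
    prom/prometheus:latest \
    check config /etc/prometheus/prometheus.yml
}
```

Run `check_prometheus_config` from the directory holding the file. This checks syntax only; it does not try to resolve the tasks.* DNS names, which exist only inside the Swarm network.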

Step 3: Define the Full Monitoring Stack with Docker Compose

Now, we’ll bring all the components together in a single docker-compose.yml file. This file defines each service, its image, network connections, and deployment configuration.

Create a file named docker-compose.yml:

version: '3.7'

services:
  prometheus:
    image: prom/prometheus:latest
    networks:
      - monitoring
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    deploy:
      placement:
        constraints:
          - node.role == manager

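  node-exporter:
    image: prom/node-exporter:latest
    # Point Node Exporter at the host paths mounted below; without these
    # flags it reads the container's own /proc and /sys.
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
    networks:
      - monitoring
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    deploy:
      mode: global # This ensures it runs on every node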

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    networks:
      - monitoring
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro # read-only is sufficient for cAdvisor
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    deploy:
      mode: global # This ensures it runs on every node

  grafana:
    image: grafana/grafana:latest
    networks:
      - monitoring
    ports:
      - "3000:3000"
    volumes:
      - grafana-data:/var/lib/grafana
    deploy:
      placement:
        constraints:
          - node.role == manager

volumes:
  prometheus-data:
  grafana-data:

networks:
  monitoring:
    external: true

Key configurations in this file:

Deployment Modes: Node Exporter and cAdvisor are deployed in global mode, meaning Docker Swarm will automatically run one instance on every node in the cluster.
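Volume Mounts: We use named volumes (prometheus-data, grafana-data) for data persistence, so your metrics and dashboards survive container restarts. Named volumes are local to a node, so the data stays on whichever manager runs the task. The prometheus.yml bind mount is resolved on the node where the Prometheus task is scheduled, so the file must exist at that path on the manager; if you have more than one manager, copy it to each of them. We also mount host directories into Node Exporter and cAdvisor to allow them to access host-level and container-level data.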
Placement Constraints: Prometheus and Grafana are constrained to run only on manager nodes for centralized access and control.

Step 4: Deploy the Monitoring Stack

With the configuration files in place, deploy the entire stack with a single command from your manager node:

docker stack deploy -c docker-compose.yml monitor

You can check the status of your newly deployed services by running docker stack ps monitor. Wait until the CURRENT STATE column shows Running for every task. For a per-service replica summary, run docker stack services monitor.
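Rather than re-running the status command by hand, the wait can be scripted. The sketch below polls the replica counts until every service reports as many running tasks as desired. Both function names and the defaults (30 attempts, 5 seconds apart) are illustrative choices, not part of Docker.

```shell
# Succeeds only if every "NAME CURRENT/DESIRED" line on stdin has CURRENT == DESIRED.
# Empty input counts as "not converged".
replicas_converged() {
  awk 'BEGIN { bad = 0 }
       { split($2, r, "/"); if (r[1] != r[2]) bad = 1 }
       END { exit (bad || NR == 0) }'
}

# Poll the stack until all services converge or the attempts run out.
# wait_for_stack and its defaults are hypothetical, chosen for illustration.
wait_for_stack() {
  stack="$1"; attempts="${2:-30}"; delay="${3:-5}"
  while [ "$attempts" -gt 0 ]; do
    if docker stack services "$stack" --format '{{.Name}} {{.Replicas}}' | replicas_converged; then
      echo "stack $stack converged"
      return 0
    fi
    attempts=$((attempts - 1))
    sleep "$delay"
  done
  echo "stack $stack did not converge" >&2
  return 1
}
```

With the stack name used in this guide, the call would be `wait_for_stack monitor`. For the global services, the desired count equals the number of eligible nodes.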

Step 5: Configure Grafana and Visualize Your Data

Your monitoring backend is now running. The final step is to configure Grafana to visualize the metrics collected by Prometheus.
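Before moving to Grafana, it can help to confirm that Prometheus actually sees its scrape targets. The Prometheus HTTP API exposes them at /api/v1/targets. The sketch below counts targets by health without needing jq; summarize_target_health is a hypothetical helper, and the text matching assumes the compact JSON that Prometheus emits.

```shell
# Count scrape targets by health from the JSON returned by /api/v1/targets.
# Assumption: Prometheus returns compact JSON such as "health":"up".
# This is plain text matching, not a JSON parser; prefer jq for anything more robust.
# summarize_target_health is a hypothetical helper name.
summarize_target_health() {
  grep -o '"health":"[a-z]*"' \
    | cut -d'"' -f4 \
    | sort \
    | uniq -c \
    | awk '{ print $2, $1 }'
}
```

A possible invocation is `curl -s "http://<YOUR_SWARM_MANAGER_IP>:9090/api/v1/targets?state=active" | summarize_target_health`. On a healthy cluster you would expect one up target for Prometheus itself, plus one per node for each of Node Exporter and cAdvisor.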

Access Grafana: Open your web browser and navigate to http://<YOUR_SWARM_MANAGER_IP>:3000. The default login credentials are admin / admin. You will be prompted to change the password on your first login.
Add Prometheus as a Data Source:
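- Open the data source settings. In older Grafana versions, click the gear icon on the left sidebar and go to Configuration > Data Sources; in recent versions the page is under Connections > Data sources.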
- Click “Add data source” and select Prometheus.
- In the URL field, enter http://prometheus:9090. Since Grafana and Prometheus are on the same Docker network, they can communicate using their service names.
- Click “Save & Test”. You should see a green confirmation message.
Import a Dashboard:
- The best way to get started is by importing a pre-built community dashboard.
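- Open the import page. In older Grafana versions, click the “+” icon on the left sidebar and select “Import”; in recent versions, use Dashboards > New > Import.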
- For a great Node Exporter dashboard, enter the ID 1860 into the “Import via grafana.com” field and click “Load”.
- On the next screen, select your Prometheus data source from the dropdown menu and click “Import”.
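The data source step can also be scripted through Grafana's HTTP API (POST /api/datasources), which is handy when rebuilding the stack. The following is a sketch under assumptions: GRAFANA_URL and GRAFANA_AUTH are hypothetical environment variables, both function names are made up for illustration, and the admin:admin default only works until the password has been changed.

```shell
# Build the JSON body for Grafana's POST /api/datasources endpoint.
# datasource_payload is a hypothetical helper; name and url must not contain quotes.
datasource_payload() {
  name="$1"; url="$2"
  printf '{"name":"%s","type":"prometheus","access":"proxy","url":"%s","isDefault":true}' \
    "$name" "$url"
}

# Register Prometheus as a data source.
# GRAFANA_URL and GRAFANA_AUTH are assumed variables, with defaults for a fresh install.
add_prometheus_datasource() {
  curl -sS -X POST "${GRAFANA_URL:-http://localhost:3000}/api/datasources" \
    -u "${GRAFANA_AUTH:-admin:admin}" \
    -H 'Content-Type: application/json' \
    -d "$(datasource_payload Prometheus http://prometheus:9090)"
}
```

The URL inside the payload stays http://prometheus:9090 because Grafana resolves it from inside the monitoring network, exactly as in the manual steps.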

You will instantly be presented with a comprehensive dashboard showing detailed real-time metrics for every node in your Docker Swarm cluster, including CPU, memory, disk usage, and network statistics. You can find other dashboards for cAdvisor (e.g., ID 13989) and other services on the official Grafana Dashboards website.
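The dashboard panels are built from PromQL queries, and the same queries can be run directly against the Prometheus HTTP API when you want a quick check outside Grafana. Below is a minimal sketch: prom_query, PROM_URL, CPU_QUERY, and MEM_QUERY are hypothetical names, and the two expressions are common Node Exporter queries rather than the exact ones used by the imported dashboard.

```shell
# Run an instant PromQL query against the Prometheus HTTP API.
# PROM_URL is an assumed variable; prom_query is a hypothetical helper name.
prom_query() {
  curl -sS -G "${PROM_URL:-http://localhost:9090}/api/v1/query" \
    --data-urlencode "query=$1"
}

# Per-instance CPU utilisation (%) averaged over the last 5 minutes.
CPU_QUERY='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'

# Per-instance available memory as a percentage of total memory.
MEM_QUERY='node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes * 100'
```

For example, `prom_query "$CPU_QUERY"` returns a JSON result with one series per Node Exporter task. With the DNS-based discovery used in this guide, the instance label is the task's overlay IP and port, not the node hostname.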

Key Security and Operational Tips

Secure Grafana: Always change the default admin password. For production environments, consider setting up authentication via OAuth or LDAP.
Set Up Alerts: Now that you have data, configure alerting rules in Grafana. You can set up notifications via Slack, email, or other channels to be proactively informed about issues like high CPU usage or low disk space.
Manage Data Retention: By default, Prometheus stores data for 15 days. You can configure this retention period using the --storage.tsdb.retention.time flag in the Prometheus service definition to match your storage capacity and needs.
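As a sketch of the retention tip, the flag can be applied to the running service with docker service update. Two assumptions apply: the service is named monitor_prometheus (stack name "monitor"), and overriding the arguments replaces the image defaults, so the config file and storage path are restated to match the mounts in docker-compose.yml. Both function names are hypothetical.

```shell
# Build the Prometheus argument list with an explicit retention period.
# Overriding the arguments replaces the image defaults, so the config file
# and storage path used in docker-compose.yml are restated here.
# prometheus_args is a hypothetical helper; 15d mirrors the Prometheus default.
prometheus_args() {
  retention="${1:-15d}"
  printf '%s %s %s' \
    '--config.file=/etc/prometheus/prometheus.yml' \
    '--storage.tsdb.path=/prometheus' \
    "--storage.tsdb.retention.time=$retention"
}

# Apply the arguments to the running service.
# monitor_prometheus assumes the stack was deployed under the name "monitor".
set_prometheus_retention() {
  docker service update --args "$(prometheus_args "$1")" monitor_prometheus
}
```

`set_prometheus_retention 30d` would keep 30 days of data. The change is overwritten by the next docker stack deploy, so for a permanent setting add the same flags under a command: key in the Prometheus service definition.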

By following this guide, you have successfully deployed a powerful, scalable, and open-source monitoring solution that provides the critical visibility needed to operate a healthy and efficient Docker Swarm cluster.

Source: https://kifarunix.com/monitor-docker-swarm-node-metrics-using-grafana/