Proper monitoring and logging are essential for maintaining healthy containerized applications. Docker Compose makes it straightforward to run the tools that collect, aggregate, and analyze logs and metrics alongside your services. This guide covers common monitoring and logging strategies for Docker environments.

Logging Configuration

Docker provides several logging drivers that can be configured in your Docker Compose files. The right logging approach depends on your application's scale, complexity, and monitoring requirements.

Basic Logging Setup

The simplest way to configure logging is using the built-in json-file driver with rotation settings to prevent log files from consuming too much disk space:

# docker-compose.yml
version: '3.8'

services:
  web:
    image: nginx:alpine
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
    ports:
      - "80:80"

  app:
    build: .
    logging:
      # Requires a syslog daemon listening on the Docker host at tcp/514
      driver: "syslog"
      options:
        syslog-address: "tcp://localhost:514"
        tag: "myapp"

Centralized Logging with ELK Stack

For production environments, the ELK Stack (Elasticsearch, Logstash, Kibana) provides a robust centralized logging solution. In the setup below, containers ship their logs to Logstash via the GELF logging driver, Logstash parses and forwards them to Elasticsearch, and Kibana provides a web interface for searching and visualizing them:

# docker-compose.logging.yml
version: '3.8'

services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.14.0
    environment:
      - discovery.type=single-node
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    volumes:
      - es_data:/usr/share/elasticsearch/data
    ports:
      - "9200:9200"
    networks:
      - logging

  logstash:
    image: docker.elastic.co/logstash/logstash:7.14.0
    volumes:
      - ./logstash/config:/usr/share/logstash/pipeline
    ports:
      - "5000:5000"
      - "12201:12201/udp"   # GELF input; the gelf log driver on the host sends here
    depends_on:
      - elasticsearch
    networks:
      - logging

  kibana:
    image: docker.elastic.co/kibana/kibana:7.14.0
    ports:
      - "5601:5601"
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    depends_on:
      - elasticsearch
    networks:
      - logging

  # Application services with logging
  web:
    image: nginx:alpine
    logging:
      driver: "gelf"
      options:
        # The gelf driver runs in the Docker daemon, so localhost refers to the host, where Logstash's 12201/udp port is published
        gelf-address: "udp://localhost:12201"
        tag: "nginx"
    depends_on:
      - logstash
    networks:
      - app
      - logging

  app:
    build: .
    logging:
      driver: "gelf"
      options:
        gelf-address: "udp://localhost:12201"
        tag: "app"
    depends_on:
      - logstash
    networks:
      - app
      - logging

volumes:
  es_data:

networks:
  app:
  logging:
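
The Logstash service mounts ./logstash/config into the container's pipeline directory, but the pipeline itself isn't shown above. Here is a minimal sketch, assuming the stock gelf input and elasticsearch output plugins (the file name and index name are illustrative):

# logstash/config/logstash.conf
input {
  gelf {
    port => 12201
  }
}

output {
  elasticsearch {
    hosts => ["http://elasticsearch:9200"]
    index => "docker-logs-%{+YYYY.MM.dd}"
  }
}

With this pipeline in place, logs sent by the gelf driver arrive on UDP 12201, land in Elasticsearch, and become searchable in Kibana once you create an index pattern matching docker-logs-*.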

Logging with Fluentd

Fluentd is a lightweight log collector and processor that can stand in for Logstash, or for the full ELK pipeline at smaller scale. It collects logs from various sources, processes them, and forwards them to one or more destinations such as Elasticsearch, S3, or cloud logging services:

# docker-compose.fluentd.yml
version: '3.8'

services:
  fluentd:
    build: ./fluentd
    volumes:
      - ./fluentd/conf:/fluentd/etc
      - ./logs:/var/log
    ports:
      - "24224:24224"
      - "24224:24224/udp"
    networks:
      - logging

  web:
    image: nginx:alpine
    logging:
      driver: "fluentd"
      options:
        fluentd-address: localhost:24224
        tag: httpd.access
    ports:
      - "80:80"
    depends_on:
      - fluentd
    networks:
      - app
      - logging

networks:
  app:
  logging:
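
The Fluentd service is built from ./fluentd and mounts ./fluentd/conf as its configuration directory; neither is shown above. A minimal configuration sketch, assuming an image based on the official fluent/fluentd image, simply accepts forwarded records and prints them:

# fluentd/conf/fluent.conf
<source>
  @type forward
  port 24224
  bind 0.0.0.0
</source>

# Print everything tagged httpd.* to stdout; swap in elasticsearch or s3 output plugins as needed
<match httpd.**>
  @type stdout
</match>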

Monitoring with Prometheus and Grafana

While logging focuses on events and messages, monitoring captures metrics and performance data over time. Prometheus and Grafana are powerful open-source tools for container monitoring.

Complete Monitoring Stack

This stack provides comprehensive monitoring with:

  • Prometheus for metrics collection and storage
  • Grafana for visualization and dashboards
  • Node Exporter for host-level metrics
  • cAdvisor for container-level metrics

# docker-compose.monitoring.yml
version: '3.8'

services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--web.console.libraries=/etc/prometheus/console_libraries'
      - '--web.console.templates=/etc/prometheus/consoles'
      - '--storage.tsdb.retention.time=200h'
      - '--web.enable-lifecycle'
    networks:
      - monitoring

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
      - ./grafana/provisioning:/etc/grafana/provisioning
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=grafana
      - GF_USERS_ALLOW_SIGN_UP=false
    depends_on:
      - prometheus
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.rootfs=/rootfs'
      - '--path.sysfs=/host/sys'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    ports:
      - "9100:9100"
    networks:
      - monitoring

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /sys:/sys:ro
      - /var/lib/docker:/var/lib/docker:ro
    ports:
      - "8080:8080"
    networks:
      - monitoring

  # Your application services (exporting nginx metrics requires a sidecar; see the Prometheus configuration below)
  web:
    image: nginx:alpine
    ports:
      - "80:80"
    volumes:
      - ./nginx/nginx.conf:/etc/nginx/nginx.conf
    networks:
      - app
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  app:
  monitoring:

Prometheus Configuration

# prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  - job_name: 'cadvisor'
    static_configs:
      - targets: ['cadvisor:8080']

  - job_name: 'nginx'
    static_configs:
      - targets: ['web:9113']   # 9113 is the default nginx-prometheus-exporter port; see the exporter sketch below

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093
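
The nginx job above scrapes port 9113, which nginx itself does not expose; 9113 is the default port of nginx-prometheus-exporter. A sketch of such a sidecar is shown below; the service name and scrape URI are assumptions, and nginx must serve stub_status at the given path:

# Added to docker-compose.monitoring.yml (sketch)
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter:latest
    command:
      - '--nginx.scrape-uri=http://web:80/stub_status'   # stub_status must be enabled in the mounted nginx.conf
    depends_on:
      - web
    networks:
      - app
      - monitoring

With a separate exporter service like this, the nginx scrape target in prometheus.yml would point at nginx-exporter:9113 instead of web:9113.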

Application Performance Monitoring (APM)

Metrics and logs provide valuable information, but in complex microservices architectures, distributed tracing helps you understand request flows and identify bottlenecks.

Jaeger for Distributed Tracing

Jaeger is an open-source distributed tracing system that tracks requests as they flow across your services, making it easier to pinpoint latency and failures:

# docker-compose.tracing.yml
version: '3.8'

services:
  jaeger:
    image: jaegertracing/all-in-one:latest
    ports:
      - "16686:16686"   # Jaeger UI
      - "14268:14268"   # collector HTTP endpoint
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    networks:
      - tracing

  # Application with tracing
  app:
    build: .
    environment:
      - JAEGER_AGENT_HOST=jaeger
      - JAEGER_AGENT_PORT=6831
      - JAEGER_SAMPLER_TYPE=const
      - JAEGER_SAMPLER_PARAM=1
    depends_on:
      - jaeger
    networks:
      - app
      - tracing

networks:
  app:
  tracing:
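
Once the stack is up, the Jaeger UI is available at http://localhost:16686, and you can quickly confirm that services are reporting spans through the query API:

# List services that have reported spans to Jaeger
curl http://localhost:16686/api/services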

Health Checks and Service Discovery

Docker supports built-in health checks to monitor the health of your containers. Health status is surfaced to orchestrators and tooling, which can restart unhealthy containers, drive service discovery, and keep traffic away from instances that aren't ready.

Advanced Health Monitoring

Health checks can be configured for various services to ensure they're responding correctly:

# docker-compose.health.yml
version: '3.8'

services:
  web:
    image: nginx:alpine
    healthcheck:
      # Assumes an /health location in your nginx config; point at "/" if you don't define one
      test: ["CMD", "curl", "-f", "http://localhost/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
    ports:
      - "80:80"

  db:
    image: postgres:13-alpine
    healthcheck:
      # $$ defers expansion to the container, where POSTGRES_USER and POSTGRES_DB are set
      test: ["CMD-SHELL", "pg_isready -U $$POSTGRES_USER -d $$POSTGRES_DB"]
      interval: 10s
      timeout: 5s
      retries: 5
    environment:
      - POSTGRES_DB=myapp
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password

  redis:
    image: redis:6-alpine
    healthcheck:
      test: ["CMD", "redis-cli", "ping"]
      interval: 10s
      timeout: 3s
      retries: 3

  # Health check aggregator
  healthcheck:
    image: willfarrell/docker-healthcheck
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    ports:
      - "8000:8000"

Log Management Commands

Once your logging infrastructure is set up, you'll need to know how to access, filter, and manage your logs effectively.

Useful Logging Commands

Here are essential Docker Compose commands for working with logs:

# View logs for all services
docker-compose logs

# Follow logs in real-time
docker-compose logs -f

# View logs for specific service
docker-compose logs web

# View logs with timestamps
docker-compose logs -t

# Limit log output
docker-compose logs --tail=50

# View logs since specific time
docker-compose logs --since="2024-01-01T12:00:00"

# Export logs to file
docker-compose logs > application.log

# Search logs for errors
docker-compose logs | grep -i error

# Monitor multiple services
docker-compose logs -f web db redis

Monitoring Commands

In addition to dashboard-based monitoring, Docker provides CLI commands for real-time resource monitoring and troubleshooting.

Container Resource Monitoring

Use these commands for quick insights into container performance:

# Real-time resource usage
docker stats $(docker-compose ps -q)

# Show running processes inside each container
docker-compose top

# Check container health status
docker-compose ps

# Monitor specific service
docker stats container_name

# Export metrics to file
docker stats --no-stream $(docker-compose ps -q) > metrics.txt
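
For a more compact view, docker stats accepts a Go-template format string, which is handy when exporting snapshots:

# One-shot snapshot with selected columns
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}\t{{.NetIO}}" $(docker-compose ps -q)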

Alerting Configuration

Monitoring is only effective if you're notified when things go wrong. Alertmanager integrates with Prometheus to handle alerts and route them to the appropriate notification channels.

Alertmanager Setup

This setup lets you define alerting rules and the channels that receive them. For Prometheus to actually send alerts, uncomment the alertmanager target in the alerting block of prometheus.yml and list alert_rules.yml under rule_files:

# docker-compose.alerting.yml
version: '3.8'

services:
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml
      - alertmanager_data:/alertmanager
    command:
      - '--config.file=/etc/alertmanager/alertmanager.yml'
      - '--storage.path=/alertmanager'
      - '--web.external-url=http://localhost:9093'
    networks:
      - monitoring

volumes:
  alertmanager_data:

networks:
  monitoring:
    external: true   # joins the monitoring network created by the main monitoring stack (check its exact name with docker network ls)
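
The Compose file above mounts ./alertmanager/alertmanager.yml, which defines how alerts are routed and delivered. Here is a minimal sketch with a single webhook receiver; the URL is a placeholder you would replace with your Slack, email, or PagerDuty integration:

# alertmanager/alertmanager.yml
route:
  receiver: 'default'
  group_by: ['alertname']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'default'
    webhook_configs:
      - url: 'http://example.com/alert-hook'   # placeholder notification endpoint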

Sample Alert Rules

# prometheus/alert_rules.yml
groups:
- name: docker_alerts
  rules:
  - alert: ContainerDown
    expr: up == 0
    for: 5m
    labels:
      severity: critical
    annotations:
      summary: "Container {{ $labels.instance }} is down"
      description: "Container {{ $labels.instance }} has been down for more than 5 minutes."

  - alert: HighMemoryUsage
    expr: (container_memory_usage_bytes / container_spec_memory_limit_bytes) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High memory usage on {{ $labels.name }}"
      description: "Container {{ $labels.name }} is using {{ $value }}% of memory."

  - alert: HighCPUUsage
    expr: rate(container_cpu_usage_seconds_total[5m]) * 100 > 80
    for: 5m
    labels:
      severity: warning
    annotations:
      summary: "High CPU usage on {{ $labels.name }}"
      description: "Container {{ $labels.name }} CPU usage is above 80%."

Best Practices for Monitoring and Logging

  1. Structured Logging: Use JSON format for logs to enable better parsing
  2. Log Rotation: Configure log rotation to prevent disk space issues
  3. Centralized Logging: Use centralized logging solutions for production
  4. Metrics Collection: Implement application-specific metrics
  5. Alerting: Set up alerts for critical issues
  6. Regular Monitoring: Monitor resource usage and performance trends
  7. Log Security: Ensure logs don't contain sensitive information
  8. Backup Monitoring Data: Regularly backup monitoring and logging data

Together, these pieces give your Docker Compose applications end-to-end observability, letting you track performance, debug issues, and maintain high availability.

Conclusion

Setting up proper monitoring and logging for your Docker environment is a critical aspect of maintaining production-ready applications. By combining logging solutions like ELK or Fluentd with metrics-based monitoring tools like Prometheus and Grafana, you can achieve complete visibility into your application's health and performance.

Remember that the specific tools and configurations you choose should align with your application's scale, complexity, and specific needs. Start with the basics and expand your monitoring capabilities as your application grows.