Grafana
Grafana is an open-source platform for monitoring and observability that allows you to query, visualize, alert on, and understand your metrics no matter where they are stored. It's particularly powerful in containerized environments where it can aggregate data from multiple sources and provide unified dashboards for comprehensive system monitoring.
Table of Contents
- Overview
- Key Features
- Installation Methods
- Configuration
- Data Sources
- Dashboard Management
- Alerting
- Security
- Container Deployment
- Best Practices
- Troubleshooting
- Resources
Overview
Grafana serves as a centralized visualization platform that connects to various data sources to create comprehensive dashboards and monitoring solutions. In containerized environments, Grafana excels at:
- Unified Monitoring: Consolidating metrics from multiple container platforms
- Real-time Visualization: Displaying live data through customizable dashboards
- Alerting: Proactive notification system for critical events
- Multi-tenancy: Supporting multiple teams and organizations
- Extensibility: Plugin ecosystem for enhanced functionality
Architecture Components
- Grafana Server: Core application handling UI, API, and dashboard rendering
- Database: Stores dashboards, users, and configuration (SQLite, MySQL, PostgreSQL)
- Data Sources: External systems providing metrics and logs
- Plugins: Extensions for additional functionality
- Authentication: User management and access control
Key Features
Visualization Capabilities
- Multiple Panel Types: Time series, bar charts, heatmaps, tables, stat panels
- Interactive Dashboards: Drill-down capabilities and dynamic filtering
- Template Variables: Dynamic dashboard content based on selections
- Annotations: Contextual information overlay on graphs
- Alerting Integration: Visual indicators for alert states
Data Source Support
- Time Series Databases: Prometheus, InfluxDB, Graphite
- Logging Platforms: Elasticsearch, Loki, Splunk
- Cloud Platforms: CloudWatch, Azure Monitor, Google Cloud Monitoring
- Databases: MySQL, PostgreSQL, Microsoft SQL Server
- Application Monitoring: Jaeger, Zipkin, New Relic
Advanced Features
- Team Management: Role-based access control
- Dashboard Sharing: Public dashboards and snapshot sharing
- API Access: Programmatic dashboard and configuration management
- Provisioning: Configuration as code for automated deployments
- Plugin Ecosystem: Extensive marketplace for additional functionality
Installation Methods
Docker Container (Recommended)
# Basic Grafana container
docker run -d \
--name grafana \
-p 3000:3000 \
-v grafana-storage:/var/lib/grafana \
grafana/grafana:latest
# With environment variables
docker run -d \
--name grafana \
-p 3000:3000 \
-e "GF_SECURITY_ADMIN_USER=admin" \
-e "GF_SECURITY_ADMIN_PASSWORD=your-password" \
-v grafana-storage:/var/lib/grafana \
grafana/grafana:latest
Docker Compose
version: '3.8'
services:
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
- GF_USERS_ALLOW_SIGN_UP=false
volumes:
- grafana-data:/var/lib/grafana
- ./config/grafana.ini:/etc/grafana/grafana.ini
- ./dashboards:/var/lib/grafana/dashboards
- ./provisioning:/etc/grafana/provisioning
networks:
- monitoring
restart: unless-stopped
volumes:
grafana-data:
networks:
monitoring:
driver: bridge
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: grafana
namespace: monitoring
spec:
replicas: 1
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
containers:
- name: grafana
image: grafana/grafana:latest
ports:
- containerPort: 3000
env:
- name: GF_SECURITY_ADMIN_USER
value: "admin"
- name: GF_SECURITY_ADMIN_PASSWORD
valueFrom:
secretKeyRef:
name: grafana-secret
key: password
volumeMounts:
- name: grafana-storage
mountPath: /var/lib/grafana
- name: grafana-config
mountPath: /etc/grafana
volumes:
- name: grafana-storage
persistentVolumeClaim:
claimName: grafana-pvc
- name: grafana-config
configMap:
name: grafana-config
---
apiVersion: v1
kind: Service
metadata:
name: grafana-service
namespace: monitoring
spec:
selector:
app: grafana
ports:
- port: 3000
targetPort: 3000
type: LoadBalancer
Package Manager Installation
# Ubuntu/Debian
sudo apt-get install -y software-properties-common
sudo add-apt-repository "deb https://packages.grafana.com/oss/deb stable main"
wget -q -O - https://packages.grafana.com/gpg.key | sudo apt-key add -
sudo apt-get update
sudo apt-get install grafana
# CentOS/RHEL
sudo yum install https://dl.grafana.com/oss/release/grafana-9.0.0-1.x86_64.rpm
# Enable and start service
sudo systemctl enable grafana-server
sudo systemctl start grafana-server
Configuration
Main Configuration File
The primary configuration is handled through grafana.ini
:
[default]
# Paths
data = /var/lib/grafana
logs = /var/log/grafana
plugins = /var/lib/grafana/plugins
provisioning = /etc/grafana/provisioning
[server]
# Protocol (http, https, socket)
protocol = http
http_addr = 0.0.0.0
http_port = 3000
domain = localhost
root_url = %(protocol)s://%(domain)s:%(http_port)s/
[database]
type = sqlite3
host = 127.0.0.1:3306
name = grafana
user = root
password =
[session]
provider = file
provider_config = sessions
[security]
admin_user = admin
admin_password = admin
secret_key = SW2YcwTIb9zpOOhoPsMm
disable_gravatar = false
[users]
allow_sign_up = false
allow_org_create = false
auto_assign_org = true
auto_assign_org_role = Viewer
[auth.anonymous]
enabled = false
[smtp]
enabled = true
host = localhost:587
user =
password =
from_address = admin@grafana.localhost
Environment Variables
Key environment variables for container deployments:
# Security
GF_SECURITY_ADMIN_USER=admin
GF_SECURITY_ADMIN_PASSWORD=secure_password
GF_SECURITY_SECRET_KEY=your_secret_key
# Server settings
GF_SERVER_HTTP_PORT=3000
GF_SERVER_DOMAIN=your-domain.com
GF_SERVER_ROOT_URL=https://your-domain.com
# Database
GF_DATABASE_TYPE=postgres
GF_DATABASE_HOST=postgres:5432
GF_DATABASE_NAME=grafana
GF_DATABASE_USER=grafana
GF_DATABASE_PASSWORD=password
# Users
GF_USERS_ALLOW_SIGN_UP=false
GF_USERS_AUTO_ASSIGN_ORG=true
GF_USERS_AUTO_ASSIGN_ORG_ROLE=Viewer
# Authentication
GF_AUTH_ANONYMOUS_ENABLED=false
GF_AUTH_DISABLE_LOGIN_FORM=false
Provisioning
Grafana supports configuration as code through provisioning:
Data Sources Provisioning
Create /etc/grafana/provisioning/datasources/datasources.yml
:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
editable: false
- name: Loki
type: loki
access: proxy
url: http://loki:3100
editable: false
- name: InfluxDB
type: influxdb
access: proxy
url: http://influxdb:8086
database: monitoring
user: grafana
password: password
editable: false
Dashboard Provisioning
Create /etc/grafana/provisioning/dashboards/dashboards.yml
:
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
updateIntervalSeconds: 10
allowUiUpdates: true
options:
path: /var/lib/grafana/dashboards
Data Sources
Prometheus Integration
Prometheus is the most common data source for container monitoring:
# Prometheus configuration
- name: Prometheus
type: prometheus
access: proxy
url: http://prometheus:9090
isDefault: true
jsonData:
timeInterval: "30s"
httpMethod: "POST"
queryTimeout: "60s"
Common Queries for Container Monitoring
# CPU Usage by Container
sum(rate(container_cpu_usage_seconds_total[5m])) by (name)
# Memory Usage by Container
sum(container_memory_usage_bytes) by (name)
# Network Traffic
sum(rate(container_network_receive_bytes_total[5m])) by (name)
# Disk I/O
sum(rate(container_fs_reads_bytes_total[5m])) by (name)
InfluxDB Integration
For time-series data storage:
- name: InfluxDB
type: influxdb
url: http://influxdb:8086
database: monitoring
user: grafana
password: ${INFLUXDB_PASSWORD}
jsonData:
timeInterval: "15s"
httpMode: "GET"
Elasticsearch/Loki for Logs
- name: Loki
type: loki
url: http://loki:3100
jsonData:
maxLines: 1000
derivedFields:
- name: "traceID"
matcherRegex: "trace_id=(\\w+)"
url: "${__value.raw}"
Dashboard Management
Dashboard Best Practices
- Consistent Layout: Use standard panel sizes and arrangements
- Meaningful Names: Clear titles and descriptions for all panels
- Appropriate Time Ranges: Set sensible default time windows
- Template Variables: Use variables for dynamic filtering
- Alert Integration: Include alert status indicators
Template Variables
// Query variable for container selection
label_values(container_cpu_usage_seconds_total, name)
// Custom variable for environment
production,staging,development
// Ad-hoc filters for dynamic exploration
Panel Configuration Examples
Time Series Panel
{
"targets": [
{
"expr": "rate(container_cpu_usage_seconds_total{name=~\"$container\"}[5m])",
"legendFormat": "{{name}}",
"refId": "A"
}
],
"title": "Container CPU Usage",
"type": "timeseries",
"fieldConfig": {
"defaults": {
"unit": "percent",
"min": 0,
"max": 100
}
}
}
Stat Panel
{
"targets": [
{
"expr": "count(up{job=\"node-exporter\"})",
"refId": "A"
}
],
"title": "Active Nodes",
"type": "stat",
"options": {
"colorMode": "background",
"graphMode": "area",
"justifyMode": "center"
}
}
Dashboard Organization
- Folders: Organize dashboards by team, environment, or service
- Tags: Use consistent tagging for easy discovery
- Permissions: Set appropriate access levels
- Favorites: Star frequently used dashboards
- Search: Implement searchable naming conventions
Alerting
Alert Rule Configuration
Grafana's unified alerting system provides comprehensive monitoring:
# Example alert rule
groups:
- name: container_alerts
rules:
- alert: HighCPUUsage
expr: rate(container_cpu_usage_seconds_total[5m]) > 0.8
for: 5m
labels:
severity: warning
service: "{{ $labels.name }}"
annotations:
summary: "High CPU usage detected"
description: "Container {{ $labels.name }} CPU usage is above 80%"
Notification Channels
Slack Integration
{
"type": "slack",
"settings": {
"url": "https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK",
"channel": "#alerts",
"username": "Grafana",
"title": "Alert: {{ .GroupLabels.alertname }}",
"text": "{{ .CommonAnnotations.description }}"
}
}
Email Notifications
{
"type": "email",
"settings": {
"addresses": "admin@company.com;ops@company.com",
"subject": "Grafana Alert: {{ .GroupLabels.alertname }}",
"message": "{{ .CommonAnnotations.description }}"
}
}
Webhook Integration
{
"type": "webhook",
"settings": {
"url": "https://your-webhook-endpoint.com/alerts",
"httpMethod": "POST",
"maxAlerts": 10
}
}
Alert Rule Best Practices
- Meaningful Thresholds: Set realistic alert thresholds
- Appropriate Timing: Use proper evaluation intervals
- Clear Labels: Include relevant context in alert labels
- Escalation Paths: Define different severity levels
- Documentation: Provide runbook links in annotations
Security
Authentication Methods
Local Authentication
[auth]
disable_login_form = false
[users]
allow_sign_up = false
auto_assign_org = true
auto_assign_org_role = Viewer
LDAP Integration
[auth.ldap]
enabled = true
config_file = /etc/grafana/ldap.toml
allow_sign_up = true
LDAP configuration (/etc/grafana/ldap.toml
):
[[servers]]
host = "ldap.company.com"
port = 389
use_ssl = false
start_tls = false
ssl_skip_verify = false
bind_dn = "cn=grafana,ou=services,dc=company,dc=com"
bind_password = 'password'
search_filter = "(uid=%s)"
search_base_dns = ["ou=users,dc=company,dc=com"]
[servers.attributes]
name = "givenName"
surname = "sn"
username = "uid"
member_of = "memberOf"
email = "email"
OAuth Integration
[auth.google]
enabled = true
client_id = your_google_client_id
client_secret = your_google_client_secret
scopes = https://www.googleapis.com/auth/userinfo.profile https://www.googleapis.com/auth/userinfo.email
auth_url = https://accounts.google.com/o/oauth2/auth
token_url = https://accounts.google.com/o/oauth2/token
allowed_domains = company.com
Role-Based Access Control
# Team permissions
teams:
- name: "Infrastructure Team"
permissions:
- action: "dashboards:read"
scope: "dashboards:*"
- action: "dashboards:write"
scope: "folders:infrastructure:*"
- name: "Development Team"
permissions:
- action: "dashboards:read"
scope: "dashboards:*"
- action: "dashboards:write"
scope: "folders:applications:*"
Security Best Practices
- Change Default Credentials: Never use default admin/admin
- Use HTTPS: Enable TLS for production deployments
- Regular Updates: Keep Grafana updated with security patches
- Access Control: Implement least-privilege access
- Network Security: Restrict network access appropriately
- Audit Logging: Enable and monitor audit logs
Container Deployment
Complete Monitoring Stack
Docker Compose example with Prometheus, Grafana, and supporting services:
version: '3.8'
services:
prometheus:
image: prom/prometheus:latest
container_name: prometheus
ports:
- "9090:9090"
volumes:
- ./prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
- prometheus-data:/prometheus
command:
- '--config.file=/etc/prometheus/prometheus.yml'
- '--storage.tsdb.path=/prometheus'
- '--web.console.libraries=/etc/prometheus/console_libraries'
- '--web.console.templates=/etc/prometheus/consoles'
- '--storage.tsdb.retention.time=200h'
- '--web.enable-lifecycle'
networks:
- monitoring
restart: unless-stopped
node-exporter:
image: prom/node-exporter:latest
container_name: node-exporter
ports:
- "9100:9100"
volumes:
- /proc:/host/proc:ro
- /sys:/host/sys:ro
- /:/rootfs:ro
command:
- '--path.procfs=/host/proc'
- '--path.rootfs=/rootfs'
- '--path.sysfs=/host/sys'
- '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
networks:
- monitoring
restart: unless-stopped
cadvisor:
image: gcr.io/cadvisor/cadvisor:latest
container_name: cadvisor
ports:
- "8080:8080"
volumes:
- /:/rootfs:ro
- /var/run:/var/run:rw
- /sys:/sys:ro
- /var/lib/docker:/var/lib/docker:ro
- /dev/disk:/dev/disk:ro
networks:
- monitoring
restart: unless-stopped
grafana:
image: grafana/grafana:latest
container_name: grafana
ports:
- "3000:3000"
environment:
- GF_SECURITY_ADMIN_USER=admin
- GF_SECURITY_ADMIN_PASSWORD=${GRAFANA_PASSWORD}
- GF_USERS_ALLOW_SIGN_UP=false
- GF_SERVER_DOMAIN=localhost
- GF_SMTP_ENABLED=true
- GF_SMTP_HOST=smtp.gmail.com:587
- GF_SMTP_USER=${SMTP_USER}
- GF_SMTP_PASSWORD=${SMTP_PASSWORD}
volumes:
- grafana-data:/var/lib/grafana
- ./grafana/provisioning:/etc/grafana/provisioning
- ./grafana/dashboards:/var/lib/grafana/dashboards
depends_on:
- prometheus
networks:
- monitoring
restart: unless-stopped
loki:
image: grafana/loki:latest
container_name: loki
ports:
- "3100:3100"
volumes:
- ./loki/loki-config.yaml:/etc/loki/local-config.yaml
- loki-data:/loki
command: -config.file=/etc/loki/local-config.yaml
networks:
- monitoring
restart: unless-stopped
promtail:
image: grafana/promtail:latest
container_name: promtail
volumes:
- /var/log:/var/log:ro
- /var/lib/docker/containers:/var/lib/docker/containers:ro
- ./promtail/promtail-config.yaml:/etc/promtail/config.yml
command: -config.file=/etc/promtail/config.yml
networks:
- monitoring
restart: unless-stopped
volumes:
prometheus-data:
grafana-data:
loki-data:
networks:
monitoring:
driver: bridge
Kubernetes Deployment with Helm
# Add Grafana Helm repository
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
# Install Grafana with custom values
helm install grafana grafana/grafana \
--namespace monitoring \
--create-namespace \
--set adminPassword=your-secure-password \
--set service.type=LoadBalancer \
--set persistence.enabled=true \
--set persistence.size=10Gi
Custom values file (grafana-values.yaml
):
adminUser: admin
adminPassword: your-secure-password
service:
type: LoadBalancer
port: 80
targetPort: 3000
persistence:
enabled: true
type: pvc
size: 10Gi
storageClassName: fast-ssd
resources:
limits:
cpu: 500m
memory: 512Mi
requests:
cpu: 250m
memory: 256Mi
datasources:
datasources.yaml:
apiVersion: 1
datasources:
- name: Prometheus
type: prometheus
url: http://prometheus-server:80
access: proxy
isDefault: true
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
editable: true
options:
path: /var/lib/grafana/dashboards/default
dashboards:
default:
kubernetes-cluster:
gnetId: 7249
revision: 1
datasource: Prometheus
node-exporter:
gnetId: 1860
revision: 27
datasource: Prometheus
Best Practices
Dashboard Design
- Consistent Color Schemes: Use organization-standard colors
- Logical Grouping: Group related metrics together
- Appropriate Visualizations: Choose the right panel type for data
- Performance Optimization: Limit query complexity and panel count
- Responsive Design: Ensure dashboards work on different screen sizes
Data Source Management
- Connection Pooling: Configure appropriate connection limits
- Query Optimization: Use efficient queries to reduce load
- Caching: Enable appropriate caching mechanisms
- Backup Strategies: Regular backup of data source configurations
- Monitoring: Monitor data source health and performance
Operational Excellence
- Version Control: Store dashboard JSON in version control
- Testing: Test dashboard changes in staging environments
- Documentation: Maintain documentation for custom dashboards
- Standardization: Use consistent naming conventions
- Training: Provide user training and documentation
Performance Optimization
# Grafana performance settings
[database]
# Connection pool settings
max_idle_conn = 2
max_open_conn = 14
conn_max_lifetime = 14400
[session]
# Session provider optimization
provider = memory
provider_config =
[caching]
# Enable query result caching
enabled = true
ttl = 300
High Availability Setup
# Load balancer configuration for HA Grafana
version: '3.8'
services:
grafana-1:
image: grafana/grafana:latest
environment:
- GF_DATABASE_TYPE=postgres
- GF_DATABASE_HOST=postgres:5432
- GF_DATABASE_NAME=grafana
volumes:
- ./grafana/config:/etc/grafana
networks:
- grafana-ha
grafana-2:
image: grafana/grafana:latest
environment:
- GF_DATABASE_TYPE=postgres
- GF_DATABASE_HOST=postgres:5432
- GF_DATABASE_NAME=grafana
volumes:
- ./grafana/config:/etc/grafana
networks:
- grafana-ha
nginx:
image: nginx:alpine
ports:
- "80:80"
volumes:
- ./nginx/nginx.conf:/etc/nginx/nginx.conf
depends_on:
- grafana-1
- grafana-2
networks:
- grafana-ha
Troubleshooting
Common Issues and Solutions
Dashboard Not Loading
# Check Grafana logs
docker logs grafana
# Check data source connectivity
curl -H "Authorization: Bearer <api-key>" \
http://grafana:3000/api/datasources/proxy/1/api/v1/query?query=up
# Verify database connectivity
docker exec grafana grafana-cli admin data-migration --config /etc/grafana/grafana.ini
Performance Issues
# Monitor Grafana metrics
curl http://grafana:3000/metrics
# Check query performance
tail -f /var/log/grafana/grafana.log | grep "query took"
# Database connection monitoring
SELECT * FROM pg_stat_activity WHERE application_name = 'grafana';
Authentication Problems
# Reset admin password
docker exec -it grafana grafana-cli admin reset-admin-password newpassword
# Check LDAP configuration
docker exec grafana grafana-cli admin ldap-test
# Verify user permissions
curl -H "Authorization: Bearer <api-key>" \
http://grafana:3000/api/user
Monitoring Grafana Health
# Health check endpoint monitoring
healthcheck:
test: ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"]
interval: 30s
timeout: 10s
retries: 3
start_period: 60s
Log Analysis
# Enable debug logging
[log]
level = debug
mode = console file
# Important log patterns to monitor
grep -E "(error|Error|ERROR)" /var/log/grafana/grafana.log
grep -E "(datasource|query)" /var/log/grafana/grafana.log
grep -E "(auth|login)" /var/log/grafana/grafana.log
Backup and Recovery
# Backup Grafana database
docker exec postgres pg_dump -U grafana grafana > grafana_backup.sql
# Backup dashboards via API
curl -H "Authorization: Bearer <api-key>" \
http://grafana:3000/api/search?type=dash-db \
| jq -r '.[].uri' \
| xargs -I {} curl -H "Authorization: Bearer <api-key>" \
http://grafana:3000/api/dashboards/{} > dashboards_backup.json
# Restore dashboard
curl -X POST \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <api-key>" \
-d @dashboard.json \
http://grafana:3000/api/dashboards/db
Resources
Official Documentation
- Grafana Official Documentation
- Grafana API Documentation
- Grafana Plugin Development
- Grafana Provisioning
Container Resources
Community and Learning
Dashboard Libraries
- Grafana Dashboard Repository
- Prometheus Node Exporter Dashboard
- Kubernetes Cluster Monitoring
- Docker Container Monitoring