Homelab Monitoring with Grafana and Prometheus

By LK Wood IV · 2026-06-13 · ~14 min read · St. Louis County, MO

Uptime Kuma tells you when a service is down. Grafana + Prometheus tells you why it went down, what the CPU was doing three hours before it crashed, which disk is filling up, and which VM is hammering the network. They solve different problems. This guide sets up the full observability stack.

What each component does

Prometheus is a time-series database and scraping engine. You define scrape targets (exporters running on each machine), and Prometheus polls them every 15–60 seconds, storing the metrics. It handles the data collection and storage.

Grafana is the dashboard layer. It connects to Prometheus as a data source, and you build (or import) dashboards that visualize the metrics as graphs, gauges, and tables.

Node Exporter is a Prometheus exporter that runs on Linux machines and exposes system metrics: CPU per-core, RAM, disk IO, filesystem usage, network throughput. One process, ~15MB RAM, runs on every machine you want to monitor.

Alertmanager handles alert routing. Prometheus evaluates alert rules, fires alerts to Alertmanager, and Alertmanager sends them to your preferred notification channel (Slack, Discord, PagerDuty, email, Telegram).

Stack setup with Docker Compose

All components run in Docker containers on your existing Docker host. Create the monitoring stack directory:

mkdir -p /opt/stacks/monitoring/{prometheus,grafana,alertmanager}
cd /opt/stacks/monitoring

Prometheus configuration:

# /opt/stacks/monitoring/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alerts.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  # Monitor the Prometheus instance itself
  - job_name: "prometheus"
    static_configs:
      - targets: ["localhost:9090"]

  # Monitor the Docker host
  - job_name: "node-docker-host"
    static_configs:
      - targets: ["node-exporter:9100"]
    relabel_configs:
      - target_label: instance
        replacement: "docker-host"

  # Remote Proxmox nodes — install node_exporter on each
  - job_name: "proxmox-nodes"
    static_configs:
      - targets:
          - "192.168.1.10:9100"    # pve01
          - "192.168.1.11:9100"    # pve02
          - "192.168.1.12:9100"    # pve03
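    relabel_configs:
      # Strip the port so instance reads "192.168.1.10" rather than "192.168.1.10:9100"
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
        replacement: '$1'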

Alert rules:

# /opt/stacks/monitoring/prometheus/alerts.yml
groups:
  - name: homelab
    rules:
      - alert: NodeDown
        expr: up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"

      - alert: HighCPU
        expr: 100 - (avg by(instance)(rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "CPU usage over 85% on {{ $labels.instance }}"

      - alert: DiskAlmostFull
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|fuse.lxcfs"} / node_filesystem_size_bytes) * 100 < 15
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk {{ $labels.mountpoint }} on {{ $labels.instance }} has only {{ $value | printf \"%.0f\" }}% free"

      - alert: HighRAMUsage
        expr: (1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 > 90
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "RAM usage over 90% on {{ $labels.instance }}"

Alertmanager configuration:

# /opt/stacks/monitoring/alertmanager/alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'instance']
  group_wait: 10s
  group_interval: 10m
  repeat_interval: 12h
  receiver: 'discord'

receivers:
  - name: 'discord'
    discord_configs:
      - webhook_url: 'https://discord.com/api/webhooks/YOUR-WEBHOOK-URL'
        title: '{{ .GroupLabels.alertname }}'
        message: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

Replace the Discord webhook URL with your own. Alertmanager also supports Slack, PagerDuty, Telegram, email, and many others — see the Alertmanager docs for other receivers.
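
The alert rules above attach a severity label, but the route sends everything through the same path. A minimal sketch of severity-based routing follows, assuming you want critical alerts repeated more often than warnings. The 1h interval is an illustrative choice, not a recommendation from the Alertmanager docs.

```yaml
# /opt/stacks/monitoring/alertmanager/alertmanager.yml (route section only)
route:
  group_by: ['alertname', 'instance']
  group_wait: 10s
  group_interval: 10m
  repeat_interval: 12h
  receiver: 'discord'            # default path: warnings and anything unmatched
  routes:
    - matchers:
        - 'severity="critical"'  # matches the label set in alerts.yml
      receiver: 'discord'        # could point at a second, louder receiver
      repeat_interval: 1h        # assumed value: re-notify hourly while still firing
```

Child routes inherit any setting they do not override, so only the fields that differ need to be listed.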

Docker Compose file:

# /opt/stacks/monitoring/docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    restart: unless-stopped
    volumes:
      - ./prometheus:/etc/prometheus
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=15d'
      - '--web.enable-lifecycle'
    ports:
      - "127.0.0.1:9090:9090"   # host-local access, used by the reload call later on
    networks:
      - monitoring
      - proxy

  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    restart: unless-stopped
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      GF_SECURITY_ADMIN_PASSWORD: "change-this-password"
      GF_USERS_ALLOW_SIGN_UP: "false"
    ports:
      - "3000:3000"   # direct access at http://monitoring-host:3000
    networks:
      - monitoring
      - proxy

  alertmanager:
    image: prom/alertmanager:latest
    container_name: alertmanager
    restart: unless-stopped
    volumes:
      - ./alertmanager:/etc/alertmanager
    networks:
      - monitoring

  node-exporter:
    image: prom/node-exporter:latest
    container_name: node-exporter
    restart: unless-stopped
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'
      - '--collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)'
    networks:
      - monitoring

volumes:
  prometheus_data:
  grafana_data:

networks:
  monitoring:
    driver: bridge
  proxy:
    external: true

Start the stack:

cd /opt/stacks/monitoring
docker compose up -d

Install Node Exporter on each Proxmox node

On every machine you want to monitor (Proxmox hosts, NAS, etc.):

# Download and install Node Exporter
NODE_EXPORTER_VERSION="1.8.2"
wget https://github.com/prometheus/node_exporter/releases/download/v${NODE_EXPORTER_VERSION}/node_exporter-${NODE_EXPORTER_VERSION}.linux-amd64.tar.gz
tar xzf node_exporter-*.tar.gz
mv node_exporter-*/node_exporter /usr/local/bin/

Create a systemd service:

# /etc/systemd/system/node_exporter.service
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
User=nobody
ExecStart=/usr/local/bin/node_exporter
Restart=on-failure

[Install]
WantedBy=multi-user.target

Enable and start the service:

systemctl daemon-reload
systemctl enable --now node_exporter
# Verify it's up
curl -s http://localhost:9100/metrics | head -5

Add the node’s IP to the proxmox-nodes job in prometheus.yml, then reload Prometheus:

curl -X POST http://localhost:9090/-/reload

Proxmox-specific monitoring with pve-exporter

Node Exporter monitors the Proxmox host OS. For per-VM and per-LXC metrics (CPU, RAM, disk per VM), use pve-exporter:

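# Debian 12-based Proxmox releases refuse system-wide pip installs, so use a virtualenv
apt install -y python3-venv
python3 -m venv /opt/prometheus-pve-exporter
/opt/prometheus-pve-exporter/bin/pip install prometheus-pve-exporter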

# Create a config file
mkdir -p /etc/pve_exporter
cat > /etc/pve_exporter/pve.yml << 'EOF'
default:
  user: pve-monitor@pve
  password: "strong-password-here"
  verify_ssl: false
EOF

Create the Proxmox API user:

pveum user add pve-monitor@pve
pveum passwd pve-monitor@pve
pveum aclmod / -user pve-monitor@pve -role PVEAuditor

Run pve-exporter as a systemd service and add it to your Prometheus scrape config:

  - job_name: "pve"
    static_configs:
      - targets:
          - "192.168.1.10:9221"    # pve-exporter running on pve01
    metrics_path: /pve
    params:
      module: [default]
      cluster: ["1"]
      node: ["1"]

Grafana setup

  1. Open Grafana at http://monitoring-host:3000 (or via NPM at grafana.yourdomain.com)
  2. Log in with admin / your configured password
  3. Add a data source: Connections → Data sources → Add new data source → Prometheus → URL: http://prometheus:9090 (on older Grafana versions the menu is Configuration → Data Sources)
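
If you would rather not click through the UI, Grafana can also load the data source from a provisioning file at startup. This is a sketch under assumptions: the host path is hypothetical, and it only takes effect if you add a mount such as ./grafana/provisioning:/etc/grafana/provisioning:ro to the grafana service's volumes.

```yaml
# ./grafana/provisioning/datasources/prometheus.yml (hypothetical host path)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy                  # Grafana's backend makes the requests
    url: http://prometheus:9090    # service name on the shared "monitoring" network
    isDefault: true
```

Restart the grafana container after adding the file; the data source then appears without any manual setup.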

Import dashboards:

Grafana’s dashboard library at grafana.com/grafana/dashboards has ready-made dashboards for Node Exporter.

Import these by ID (Dashboards → Import → enter ID):

  • 1860 — Node Exporter Full (the definitive node metrics dashboard)
  • 10229 — Node Exporter for Prometheus Dashboard
  • 7039 — Proxmox via Prometheus (pve-exporter dashboard)

After import, set the data source to your Prometheus instance. Dashboard 1860 immediately shows:

  • CPU usage over time per core
  • RAM used, available, cached, buffered
  • Disk IO (reads/writes per second)
  • Filesystem usage with trend
  • Network traffic per interface

Building a custom alert for disk space

The alert in the alerts.yml above fires when any filesystem drops below 15% free. This is a broad catch. For a homelab with a NAS that’s intentionally at 95% capacity (by design), you’d get false alerts. Refine it:

      - alert: DiskAlmostFull
        expr: |
          (node_filesystem_avail_bytes{
            fstype!~"tmpfs|fuse.lxcfs",
            mountpoint!~"/boot.*|/run.*"
          } / node_filesystem_size_bytes) * 100 < 15
          and
          node_filesystem_size_bytes > 10 * 1024^3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk {{ $labels.mountpoint }} at {{ printf \"%.0f\" $value }}% free on {{ $labels.instance }}"

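The node_filesystem_size_bytes > 10 * 1024^3 clause limits the alert to filesystems larger than 10 GiB, which excludes tiny boot partitions that are supposed to be “full.” (tmpfs mounts are already excluded by the fstype matcher.) A large volume that is intentionally kept near capacity, such as the NAS mentioned above, still needs its own mountpoint exclusion.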
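
The size filter alone does not silence a large volume that is full by design. One way to handle that case is to exclude its mountpoint from the general rule and give it a tighter threshold of its own. In this sketch /mnt/nas, the NasAlmostFull rule, and the 3% threshold are all hypothetical; substitute whatever your NAS mount actually uses.

```yaml
      - alert: DiskAlmostFull
        expr: |
          (node_filesystem_avail_bytes{
            fstype!~"tmpfs|fuse.lxcfs",
            mountpoint!~"/boot.*|/run.*|/mnt/nas.*"
          } / node_filesystem_size_bytes) * 100 < 15
          and
          node_filesystem_size_bytes > 10 * 1024^3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Disk {{ $labels.mountpoint }} at {{ printf \"%.0f\" $value }}% free on {{ $labels.instance }}"

      # Separate, tighter rule for the volume that runs near capacity on purpose
      - alert: NasAlmostFull
        expr: (node_filesystem_avail_bytes{mountpoint=~"/mnt/nas.*"} / node_filesystem_size_bytes) * 100 < 3
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "NAS volume {{ $labels.mountpoint }} at {{ printf \"%.0f\" $value }}% free on {{ $labels.instance }}"
```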

What to monitor beyond nodes

Docker containers. cAdvisor exports per-container CPU, memory, and network metrics to Prometheus. Add it to your Docker Compose stack and import the cAdvisor dashboard (ID: 14282).
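
A minimal sketch of what adding cAdvisor might look like: one extra Compose service and one extra scrape job. The mounts follow cAdvisor's commonly documented Docker setup, and the service name cadvisor is an assumption chosen to match the naming used in this stack.

```yaml
# docker-compose.yml: add under services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    container_name: cadvisor
    restart: unless-stopped
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
    networks:
      - monitoring
---
# prometheus.yml: add under scrape_configs:
  - job_name: "cadvisor"
    static_configs:
      - targets: ["cadvisor:8080"]   # cAdvisor listens on 8080 by default
```

Because both containers sit on the monitoring network, Prometheus reaches cAdvisor by service name and no port needs to be published on the host.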

UPS status. If you’re running NUT (Network UPS Tools) for your UPS, nut-exporter exposes battery charge, load percentage, and estimated runtime to Prometheus. Alert when the battery drops below 50% so you have warning time before the power problem becomes a shutdown problem. Not sure how much runtime your UPS actually gives you? The UPS Runtime Calculator estimates runtime at load before you wire up monitoring.
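
The 50% battery alert could be written roughly as follows. The metric name network_ups_tools_battery_charge is an assumption based on one common NUT exporter; check your exporter's /metrics output and adjust the name before relying on this.

```yaml
      - alert: UPSBatteryLow
        # Metric name is an assumption; it varies between NUT exporters
        expr: network_ups_tools_battery_charge < 50
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "UPS battery at {{ $value | printf \"%.0f\" }}% on {{ $labels.instance }}"
```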

SMART disk health. smartmon-textfile is a shell script that runs smartctl and outputs Prometheus text format. Run it as a cron job and Node Exporter picks it up via the textfile collector. Alert when reallocated sector count is non-zero.
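
A sketch of the reallocated-sector alert, assuming the script emits a metric named smartmon_reallocated_sector_ct_raw_value with a disk label. Both names depend on which version of the script you run, so confirm them in Node Exporter's output first. Node Exporter also needs --collector.textfile.directory pointed at the directory the cron job writes to.

```yaml
      - alert: DiskReallocatedSectors
        # Metric and label names are assumptions; verify against your textfile output
        expr: smartmon_reallocated_sector_ct_raw_value > 0
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.disk }} on {{ $labels.instance }} reports {{ $value }} reallocated sectors"
```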

Proxmox backup job status. PBS exposes a metrics endpoint — or you can write a simple exporter that checks PBS backup job last-run status via the API and exposes a gauge. Alert when last backup is older than 25 hours.
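
If your exporter publishes the last successful backup time as a Unix timestamp, the staleness check is a subtraction from time(). The gauge name pbs_last_backup_timestamp_seconds below is hypothetical and stands in for whatever your exporter actually exposes.

```yaml
      - alert: BackupStale
        # pbs_last_backup_timestamp_seconds is a hypothetical gauge (Unix time of last success)
        expr: time() - pbs_last_backup_timestamp_seconds > 25 * 3600
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Last backup on {{ $labels.instance }} finished {{ $value | humanizeDuration }} ago"
```

The 25-hour threshold gives a daily job one hour of slack before the alert fires.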

Resource overhead

The full monitoring stack (Prometheus + Grafana + Alertmanager + Node Exporter) on a 5-node homelab:

  • Prometheus RAM: 150–300MB (grows with time series count and retention period)
  • Grafana RAM: 80–150MB
  • Alertmanager RAM: 20–40MB
  • Node Exporter per node: 10–20MB

Total: 300–600MB for the monitoring infrastructure. Cheap. The 15-day Prometheus retention stores approximately 500MB–2GB of metrics data for 5 nodes at 15-second scrape intervals — fit it on an NVMe, not an HDD (Prometheus is write-intensive).
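
As a rough cross-check on the storage figure, the Prometheus documentation estimates disk use as retention time × ingested samples per second × bytes per sample, at roughly 1 to 2 bytes per sample. The worked example below assumes about 1,000 active series per node, which is an illustrative number and not a measurement, and takes 2 bytes per sample:

```latex
\begin{aligned}
\text{samples/s} &= \frac{5 \times 1000}{15\ \text{s}} \approx 333 \\
\text{retention} &= 15 \times 86400\ \text{s} = 1{,}296{,}000\ \text{s} \\
\text{disk} &\approx 1{,}296{,}000 \times 333 \times 2\ \text{bytes} \approx 0.86\ \text{GB}
\end{aligned}
```

That lands inside the 500MB–2GB range quoted above. Adding cAdvisor or pve-exporter raises the series count and pushes the total toward the upper end.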


Using Docker for this stack? The Docker Compose Starter Stack covers the baseline services (NPM, Portainer, Uptime Kuma) that complement this monitoring setup.