Proxmox Cluster Setup: 2-Node and 3-Node HA (2026)
By LK Wood IV · 2026-05-29 · ~18 min read · St. Louis County, MO
A single Proxmox node is fine until it isn’t. The drive fails. The NIC dies. You need to reboot for a kernel update and everything goes down. A Proxmox cluster solves this — two or more nodes that share cluster state, can migrate VMs between them, and optionally restart VMs automatically when a node fails.
This guide covers building a cluster from scratch: the correct 2-node architecture with a QDevice, the cleaner 3-node architecture that doesn’t need one, shared storage options, live migration, HA groups, and the fencing question you can’t skip.
What a Proxmox cluster actually gives you
Proxmox clustering has two distinct features that people sometimes conflate:
1. Cluster management (no shared storage required)
- Unified web UI: manage all nodes from one panel
- Centralized user and permission management
- Pool-based resource allocation across nodes
- Live migration between nodes (fastest with shared storage; VMs on local disks can also migrate, but their disks are copied over the network)
2. High Availability (HA) — requires shared storage + fencing
- Automatic VM restart on surviving node when a node fails
- HA groups with per-node priority ordering
- Fencing to prevent split-brain and data corruption
You can run a cluster for live migration and unified management without HA. HA is the more complex addition and has specific hardware requirements (shared storage, fencing).
Architecture: 2-node vs 3-node
3-node cluster (recommended)
Each of the three nodes has one vote in corosync. Quorum is 2 votes (a majority of 3). If any single node fails:
- 2 surviving nodes form quorum
- Cluster continues operating
- HA restarts VMs on surviving nodes
This is the clean architecture. No workarounds, no QDevice, straightforward HA.
2-node cluster with QDevice
Two nodes cannot keep quorum through a failure: if either node goes down, the survivor holds 1 of 2 votes, which is not a majority. The solution is a QDevice: a third machine running corosync-qnetd that provides a tie-breaking vote (the cluster nodes reach it through the corosync-qdevice client).
Node 1 (vote) + Node 2 (vote) + QDevice (vote) = 3 votes total
Quorum = 2 votes required
If Node 1 fails: Node 2 + QDevice = 2 votes → quorum → cluster continues. If the QDevice fails: Node 1 + Node 2 = 2 votes → quorum maintained (the QDevice is only a tiebreaker). If Node 1 and the QDevice both fail: Node 2 alone = 1 vote → no quorum → /etc/pve becomes read-only, and if Node 2 is running HA resources, its watchdog fences (reboots) it.
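The vote arithmetic above is a strict-majority rule: quorum = floor(total votes / 2) + 1. A small bash sketch that replays the failure cases; the function names are made up for illustration, and it models plain vote counting only (it ignores special corosync options such as two_node or last_man_standing).

```bash
#!/usr/bin/env bash
# Strict-majority quorum: more than half of all configured votes.
quorum_needed() {
  local total_votes=$1
  echo $(( total_votes / 2 + 1 ))
}

# Prints "quorate" or "no quorum" for a given number of reachable votes.
quorum_state() {
  local total_votes=$1 alive_votes=$2
  if (( alive_votes >= $(quorum_needed "$total_votes") )); then
    echo "quorate"
  else
    echo "no quorum"
  fi
}

# Node 1 + Node 2 + QDevice = 3 votes in total.
echo "Quorum threshold:       $(quorum_needed 3)"
echo "Node 1 down:            $(quorum_state 3 2)"
echo "QDevice down:           $(quorum_state 3 2)"
echo "Node 1 + QDevice down:  $(quorum_state 3 1)"
# Plain 2-node cluster without a QDevice, one node down:
echo "2 nodes, one down:      $(quorum_state 2 1)"
```

The last line is the reason a bare 2-node cluster needs the QDevice: with 2 votes configured, the threshold is still 2.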
The QDevice can be anything running Debian/Ubuntu with a network connection — a Raspberry Pi, an LXC on a third server, even a VM on a NAS. It requires minimal resources: 256MB RAM, a few hundred MB of disk, and a stable network connection to both nodes.
Prerequisites
For a 2-node + QDevice cluster:
- 2 Proxmox hosts (any hardware, same or different specs)
- 1 QDevice host (Raspberry Pi, mini PC, or any Debian/Ubuntu system)
- All three on the same network (or VLANs with routing between them)
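- No existing cluster on any node: clusters can only be created, not merged
- No VMs or containers on the node being added (node 2): joining overwrites its /etc/pve configuration with the cluster's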
For HA (optional, beyond basic clustering):
- Shared storage accessible from all nodes (NFS, iSCSI, or Ceph)
- Fencing through a watchdog: the default softdog kernel module, or a hardware watchdog exposed by IPMI/BMC
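A node that joins a cluster must not hold any VMs or containers, because joining replaces its /etc/pve configuration with the cluster's. A minimal bash pre-flight sketch for that check, assuming the standard /etc/pve/qemu-server and /etc/pve/lxc config directories; the PVE_DIR override and both function names are hypothetical, added only so the check can be tried against a scratch directory.

```bash
#!/usr/bin/env bash
# Pre-flight check for the node that is about to run "pvecm add".
# PVE_DIR is a hypothetical override for testing; on a real node it is /etc/pve.
PVE_DIR=${PVE_DIR:-/etc/pve}

count_guests() {
  local n=0 f
  # VM configs live in qemu-server/, container configs in lxc/.
  for f in "$PVE_DIR"/qemu-server/*.conf "$PVE_DIR"/lxc/*.conf; do
    [ -e "$f" ] && n=$((n + 1))
  done
  echo "$n"
}

check_join_ready() {
  local guests
  guests=$(count_guests)
  if [ "$guests" -gt 0 ]; then
    echo "NOT READY: $guests guest config(s) found; back up and remove them first"
    return 1
  fi
  echo "READY: no guests on this node"
}
```

Run check_join_ready on node 2 before pvecm add; a non-zero exit means something still has to be moved off or backed up.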
Step 1: Create the cluster on node 1
Log into node 1’s Proxmox web UI or SSH in:
# Create the cluster (run on node 1 only)
pvecm create mycluster --link0 <node1-ip>
Verify the cluster is created:
pvecm status
# Should show Quorate: Yes and Nodes: 1
Step 2: Join node 2 to the cluster
From node 2, join the existing cluster:
# Run on node 2
pvecm add <node1-ip> --link0 <node2-ip>
This prompts for node 1’s root password. After joining, verify from node 1:
pvecm nodes
# Should show both nodes listed
At this point, both nodes appear in the web UI under Datacenter. The cluster is quorate only while both nodes are up: with 2 votes required for quorum, losing either node leaves the survivor without quorum.
Step 3: Set up the QDevice
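On the QDevice machine (Debian/Ubuntu), install the QNet daemon:
apt update && apt install -y corosync-qnetd
On every Proxmox node, install the QDevice client:
apt install -y corosync-qdevice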
On node 1, configure the QDevice:
pvecm qdevice setup <qdevice-ip>
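This copies the cluster's SSH key to the QDevice (root SSH login must be allowed there, at least during setup), sets up the TLS certificates qnetd uses, and configures corosync on all nodes to include the QDevice vote. Verify: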
pvecm status
# Quorate: Yes
# Expected votes: 3 (node1 + node2 + qdevice)
# Total votes: 3
# Quorum: 2 (votes needed to stay quorate)
# Flags: Quorate Qdevice
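For scripts or monitoring, the same check can be made non-interactive by parsing pvecm status. A sketch, assuming the field labels corosync-quorumtool prints (Quorate: Yes/No, Expected votes, Total votes, Quorum); the function name is hypothetical. It reads the status text from stdin, so it works on live output or a saved copy.

```bash
#!/usr/bin/env bash
# Summarise quorum fields from "pvecm status" output read on stdin.
# Exit 0 only when the cluster is quorate with 3 votes present
# (2 nodes + QDevice).
check_qdevice_quorum() {
  awk -F':' '
    # Labels and values are padded with spaces; trim both sides.
    { gsub(/^[ \t]+|[ \t]+$/, "", $1); gsub(/^[ \t]+|[ \t]+$/, "", $2) }
    $1 == "Quorate"        { quorate = $2 }
    $1 == "Expected votes" { expected = $2 }
    $1 == "Total votes"    { total = $2 }
    $1 == "Quorum"         { quorum = $2 }
    END {
      printf "quorate=%s expected=%s total=%s quorum=%s\n", quorate, expected, total, quorum
      exit !(quorate == "Yes" && total + 0 == 3)
    }'
}

# Usage on a node:
#   pvecm status | check_qdevice_quorum
```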
Step 4: Configure shared storage (required for live migration and HA)
NFS (easiest for homelab)
If you have a TrueNAS or any NFS server:
In Proxmox web UI: Datacenter → Storage → Add → NFS
| Setting | Value |
|---|---|
| ID | nfs-shared |
| Server | 192.168.1.x (NFS server IP) |
| Export | /mnt/pool/proxmox (your NFS export path) |
| Content | Disk image, CT template, ISO image, VZDump backup file |
| Nodes | All (select all cluster nodes) |
After adding, both nodes should see the same NFS storage in their storage list.
Verify from both nodes:
# On both node1 and node2
ls /mnt/pve/nfs-shared/
# Both should show the same directory contents
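The same NFS storage can be added from a shell with pvesm, which helps when rebuilding a cluster. A sketch mirroring the table above; the server address and export path are placeholders to replace, and run is a hypothetical dry-run helper that only prints the command unless DRY_RUN=0. The content keywords images, vztmpl, iso and backup correspond to Disk image, CT template, ISO image and VZDump backup file.

```bash
#!/usr/bin/env bash
# Placeholders: replace with your NFS server and export path.
NFS_SERVER=192.168.1.50
NFS_EXPORT=/mnt/pool/proxmox

# Hypothetical helper: print the command (default) or run it with DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Storage definitions live in the cluster-wide /etc/pve/storage.cfg,
# so this only needs to run on one node.
add_nfs_storage() {
  run pvesm add nfs nfs-shared \
    --server "$NFS_SERVER" \
    --export "$NFS_EXPORT" \
    --content images,vztmpl,iso,backup
}

add_nfs_storage
# Afterwards, on every node: pvesm status --storage nfs-shared
```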
iSCSI
For better performance than NFS, an iSCSI target (TrueNAS, OpenMediaVault, or a dedicated SAN):
# Discover iSCSI targets
iscsiadm -m discovery -t st -p <iscsi-server-ip>
# Log in to target
iscsiadm -m node -T <target-name> -p <iscsi-server-ip> --login
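Add in Proxmox UI: Datacenter → Storage → Add → iSCSI, then Datacenter → Storage → Add → LVM on top of that LUN with Shared checked. Use plain (thick) LVM here: LVM-thin cannot be shared between nodes, so it is not safe on a LUN that every node sees.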
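The LVM layer can also be set up from the shell. A sketch, assuming the iSCSI LUN appears on node 1 as /dev/sdX (a placeholder: confirm the right disk with lsblk, because pvcreate overwrites its label) and using the hypothetical names vg_san and san-lvm. The volume group is created once, on one node; the other nodes see it once they are logged in to the same target. run is again a dry-run helper that only prints unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
ISCSI_DISK=/dev/sdX   # placeholder: the LUN as seen on this node
VG_NAME=vg_san        # hypothetical volume group name
STORAGE_ID=san-lvm    # hypothetical Proxmox storage ID

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

setup_shared_lvm() {
  # Once, on one node only: initialise the LUN and create the volume group.
  run pvcreate "$ISCSI_DISK"
  run vgcreate "$VG_NAME" "$ISCSI_DISK"
  # Register it cluster-wide as thick LVM with shared=1 so every node
  # knows it can reach the same volumes (LVM-thin cannot be shared).
  run pvesm add lvm "$STORAGE_ID" --vgname "$VG_NAME" --shared 1 --content images,rootdir
}

setup_shared_lvm
```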
Ceph (advanced, distributed storage)
Ceph runs on the Proxmox nodes themselves — each node contributes disks to a distributed object store. No separate NAS required.
Ceph requires:
- 3+ Proxmox nodes (Ceph's monitors need a majority for quorum, and the default pool size keeps 3 copies of each object, one per node)
- A dedicated disk per node for Ceph OSD (separate from the OS disk)
- Low-latency network between nodes (10GbE strongly recommended)
Set up via Proxmox web UI: Datacenter → Ceph → Install → follow wizard. Ceph is powerful but adds complexity; for a 2-node homelab, NFS is simpler.
Step 5: Live migration
With shared storage, you can migrate running VMs between nodes. In the web UI:
- Select a VM
- Right-click → Migrate
- Target node: select the other node
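- Check “Online” to keep the VM running during the move; the guest pauses only briefly at the final switchover (fastest with the VM disk on shared storage, since local disks must be copied too)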
- Click Migrate
From the CLI:
# Live migrate VM ID 100 to node2
qm migrate 100 node2 --online
Migration over 10GbE takes 10–30 seconds for a typical VM. On 1GbE with a large-memory VM, plan for several minutes.
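Those times mostly come from copying the VM's RAM across the migration network. A back-of-the-envelope bash sketch; the 70% link efficiency is an assumption, and real migrations take longer when the guest keeps dirtying memory, because changed pages must be sent again.

```bash
#!/usr/bin/env bash
# Rough lower bound for live-migration time: RAM size / effective bandwidth.
# Args: RAM in GiB, link speed in Gbit/s, assumed efficiency in percent.
estimate_migration_seconds() {
  local ram_gib=$1 link_gbps=$2 efficiency_pct=${3:-70}
  local bits=$(( ram_gib * 1024 * 1024 * 1024 * 8 ))
  # Integer arithmetic: bits / (Gbit/s * 1e9 * efficiency)
  echo $(( bits * 100 / (link_gbps * 1000000000 * efficiency_pct) ))
}

for link in 1 10; do
  echo "16 GiB VM over ${link}GbE: ~$(estimate_migration_seconds 16 "$link") s"
done
```

For a 16 GiB VM this gives about 196 s on 1GbE and about 19 s on 10GbE.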
Step 6: Enable High Availability
HA depends on fencing, so make sure fencing works before you enable HA for any VM. Without fencing, HA is dangerous: if a node becomes unreachable (but is not actually down), the surviving node might restart VMs that the original node is still running, causing data corruption on shared storage.
Fencing option 1: IPMI/iLO/iDRAC hardware watchdog (recommended for production)
If your servers have IPMI (most enterprise/workstation boards):
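Proxmox VE's HA stack does not use external IPMI fence agents; it fences through a watchdog. On IPMI-capable boards you can replace the default softdog with the BMC's hardware watchdog by selecting the module on each node and then rebooting it:
# /etc/default/pve-ha-manager
WATCHDOG_MODULE=ipmi_watchdog
The file /etc/pve/ha/resources.cfg holds HA resource settings, not fencing. ha-manager writes it for you; an entry looks like this:
vm: 100
    group ha-group
    max_restart 3
    max_relocate 3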
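Proxmox VE selects a hardware watchdog through the WATCHDOG_MODULE variable in /etc/default/pve-ha-manager. A small bash sketch that sets it idempotently, replacing an existing uncommented setting or appending one; the function name is hypothetical and the file argument is overridable only for testing.

```bash
#!/usr/bin/env bash
# Set WATCHDOG_MODULE in the pve-ha-manager defaults file.
set_watchdog_module() {
  local module=$1 file=${2:-/etc/default/pve-ha-manager}
  touch "$file"
  if grep -q '^WATCHDOG_MODULE=' "$file"; then
    # Replace the existing active setting.
    sed -i "s/^WATCHDOG_MODULE=.*/WATCHDOG_MODULE=${module}/" "$file"
  else
    # No active setting yet: append one.
    echo "WATCHDOG_MODULE=${module}" >> "$file"
  fi
}

# On each IPMI-capable node, then reboot it so the module is loaded:
#   set_watchdog_module ipmi_watchdog
```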
Fencing option 2: Kernel watchdog (homelab without IPMI)
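The Linux kernel watchdog makes a node reboot itself if it loses quorum while it is running HA resources. It's not hardware fencing, but it prevents split-brain. Proxmox VE's watchdog-mux service normally loads softdog on its own when no hardware watchdog module is configured, so the commands below mostly make that explicit.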
# On each Proxmox node:
echo "softdog" >> /etc/modules
modprobe softdog
# Verify
ls /dev/watchdog
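Nothing goes in /etc/pve/ha/resources.cfg for this: watchdog-mux is a service that ships with the HA manager, not a config option. Check that it is running:
systemctl status watchdog-mux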
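To confirm the software watchdog is in place, check that the softdog module is loaded and the device node exists. A sketch; the function name is hypothetical, and the file and device arguments are overridable only so it can be exercised without a real node.

```bash
#!/usr/bin/env bash
# Succeeds when softdog is listed in /proc/modules and /dev/watchdog exists.
check_softdog() {
  local modules_file=${1:-/proc/modules} device=${2:-/dev/watchdog}
  if ! grep -q '^softdog ' "$modules_file"; then
    echo "softdog module not loaded"
    return 1
  fi
  if [ ! -e "$device" ]; then
    echo "$device missing"
    return 1
  fi
  echo "softdog loaded, $device present"
}

# On a node: check_softdog && systemctl is-active watchdog-mux
```

On a node that uses a hardware watchdog instead (option 1), softdog is expected to be absent, so this check only applies to option 2.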
Enable HA for a VM
Once fencing is configured:
- Web UI → Select VM → More → Manage HA
- Click “Add HA”
- Set Group (optional: assign to a specific HA group)
- Max Restart: 3 (restart attempts before abandoning)
- Max Relocate: 3 (move attempts to another node)
Or CLI:
ha-manager add vm:100 --state started --max_restart 3 --max_relocate 3
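With more than a handful of VMs, a loop saves clicking. A sketch that applies the same ha-manager add options to several guests; the VM IDs are examples, and run is a hypothetical dry-run helper that prints the commands unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add each VM as an HA resource with the same restart/relocate limits.
add_ha_vms() {
  local id
  for id in "$@"; do
    run ha-manager add "vm:${id}" --state started --max_restart 3 --max_relocate 3
  done
}

add_ha_vms 100 101 102
```

Containers use the ct: prefix instead of vm:.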
Check HA status:
ha-manager status
# Expect a line like: service vm:100 (node1, started)
When node1 fails, HA manager on node2 detects the node timeout, confirms fencing (node has powered off or watchdog-rebooted), and starts VM 100 on node2.
Step 7: HA groups and priorities
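HA groups let you prefer specific VMs on specific nodes while allowing failover to others. (The commands below use the group syntax of Proxmox VE 8; Proxmox VE 9 replaces HA groups with HA node-affinity rules and migrates existing groups to them.)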
# Create a group that prefers node1 but allows node2
ha-manager groupadd ha-group1 --nodes node1:2,node2:1 --restricted 0
# node1:2 = priority 2 (preferred), node2:1 = priority 1 (failover)
# restricted 0 = prefer the listed nodes, but allow other cluster nodes if none of them is available (restricted 1 = listed nodes only)
Assign VMs to the group:
ha-manager set vm:100 --group ha-group1
With this config, VM 100 preferably runs on node1. If node1 fails, HA restarts it on node2. When node1 comes back, the VM migrates back to node1 (because nofailback defaults to 0).
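To see where a resource ended up after a failover or failback, filter the service lines of ha-manager status. A sketch assuming lines of the form service vm:100 (node1, started); the function name is hypothetical and the parsing is deliberately simple.

```bash
#!/usr/bin/env bash
# Print "<node> <state>" for one HA resource, reading ha-manager status on stdin.
ha_placement() {
  local sid=$1
  awk -v sid="$sid" '
    $1 == "service" && $2 == sid {
      line = $0
      sub(/^[^(]*\(/, "", line)   # drop everything up to "("
      sub(/\).*$/, "", line)      # drop ")" and anything after it
      split(line, parts, ", *")   # "node1, started" -> node1 / started
      print parts[1], parts[2]
      found = 1
    }
    END { exit !found }'
}

# On a node: ha-manager status | ha_placement vm:100
```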
Monitoring cluster health
# Overall cluster status
pvecm status
# Node status
pvecm nodes
# HA manager status
ha-manager status
# Corosync ring status (network health between nodes)
corosync-cfgtool -s
# Check for quorum issues
corosync-quorumtool
In the web UI: Datacenter → Summary shows all nodes with CPU/RAM/disk graphs and cluster status.
Common issues
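“Cluster not quorate”: Not enough votes are reachable, typically because a node and the QDevice are down or cut off at the same time (losing the QDevice alone does not cost quorum). Check corosync-cfgtool -s for link status. On the nodes, confirm the client is running with systemctl status corosync-qdevice; on the QDevice host, check systemctl status corosync-qnetd.
Live migration fails because of a local disk: The VM disk is on local storage, not shared storage. Move the disk to shared storage first (VM → Hardware → select the disk → Disk Action → Move Storage → target shared storage), or pass --with-local-disks to copy it during the migration.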
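From the CLI, the disk can be moved with qm disk move, or the migration can copy local disks itself with --with-local-disks. A sketch with VM 100, disk scsi0 and the nfs-shared storage as example values (check the real disk key with qm config 100); run is a hypothetical dry-run helper that prints unless DRY_RUN=0. qm disk move exists on Proxmox VE 7 and later; older releases use qm move_disk.

```bash
#!/usr/bin/env bash
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

VMID=100
DISK=scsi0                 # example; see "qm config 100" for the real key
TARGET_STORAGE=nfs-shared

# Option A: move the disk to shared storage once (works while the VM runs);
# --delete 1 removes the local copy instead of keeping it as an unused disk.
move_to_shared() {
  run qm disk move "$VMID" "$DISK" "$TARGET_STORAGE" --delete 1
}

# Option B: keep the disk local and copy it during this one migration.
migrate_with_local_disks() {
  run qm migrate "$VMID" node2 --online --with-local-disks
}

move_to_shared
```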
HA manager won’t start VMs: Check ha-manager status for fencing errors. If fencing isn't working, HA manager refuses to start VMs on surviving nodes (correct behavior: without confirmed fencing, it can't safely assume the failed node is truly offline).
Corosync ring 0 errors: Network instability between nodes. Check physical connectivity, that both switch ports negotiated full speed and full duplex, and that the corosync network isn't saturated. Corosync is very sensitive to latency and dropped packets.
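Proxmox's documentation asks for LAN-grade latency on corosync links, roughly under 5 ms. A quick check is to ping the peer's corosync address and read the average from the summary line. A sketch, assuming the iputils summary format (rtt min/avg/max/mdev = ...); the function name and the 5 ms default are the only parameters, and the peer address in the usage comment is a placeholder.

```bash
#!/usr/bin/env bash
# Read ping output on stdin; succeed if the average RTT is below the limit (ms).
check_rtt() {
  local limit_ms=${1:-5}
  awk -v limit="$limit_ms" '
    /^(rtt|round-trip) / {
      split($0, halves, " = ")
      split(halves[2], vals, "/")      # min/avg/max/mdev
      avg = vals[2] + 0
      printf "avg rtt %.3f ms (limit %s ms)\n", avg, limit
      seen = 1
    }
    END { if (!seen) exit 2; exit !(avg < limit + 0) }'
}

# Between nodes, over the corosync network (placeholder address):
#   ping -c 50 -i 0.2 10.10.10.2 | check_rtt 5
```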
Network recommendations
- Dedicate a network interface or VLAN to corosync (the --link0 parameter)
- VM traffic on a separate interface or VLAN from cluster communication
- 10GbE between nodes for live migration (1GbE works but migration is slow)
- Storage traffic on a third interface if possible
The VLANs for the homelab guide covers network segmentation for cluster + storage + VM traffic with OPNsense and managed switches.
Local storage on each node is commonly ZFS. The ZFS on Proxmox guide covers local storage pool creation; for cluster scenarios, create local ZFS pools before joining nodes to the cluster, then re-add them under Datacenter → Storage afterwards, because joining replaces the node's storage configuration with the cluster's. For backup strategy across a cluster, the Proxmox Backup Server guide covers PBS with cluster-aware storage repos.