Proxmox - Cheatsheet
Put Node Into Maintenance
ha-manager crm-command node-maintenance enable MUT7PVE201
Remove Node from Maintenance
ha-manager crm-command node-maintenance disable MUT7PVE201
Remove Old Nodes From HA
- Delete the old node.
pvecm delnode M1LHOSTPROX504
- Navigate to the old node files on the current primary node.
cd /etc/pve/nodes
- Remove the old node by name.
rm -rf M1LHOSTPROX504
- Run the following command on all active nodes.
systemctl stop pve-ha-crm.service
- Run the command on the current active node.
rm -f /etc/pve/ha/manager_status
- Restart the service on all the nodes, starting with the node you just removed the above file from.
systemctl start pve-ha-crm.service
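The sequence above can be sketched as a small dry-run helper that only prints the commands in order, so they can be reviewed before anything is executed. The function name print_ha_cleanup_steps and the trailing comments are assumptions made for illustration; the commands themselves are the ones listed above, with the cd and rm steps combined into one absolute path.

```shell
# Hypothetical helper: prints the HA cleanup commands for one removed node.
# It executes nothing; run each printed line on the node(s) named in its comment.
print_ha_cleanup_steps() {
    old_node="$1"
    # Refuse an empty name so the rm line can never point at /etc/pve/nodes/ itself.
    if [ -z "$old_node" ]; then
        echo "usage: print_ha_cleanup_steps OLD_NODE_NAME" >&2
        return 1
    fi
    cat <<EOF
pvecm delnode $old_node
rm -rf /etc/pve/nodes/$old_node   # on the current primary node
systemctl stop pve-ha-crm.service   # on all active nodes
rm -f /etc/pve/ha/manager_status   # on the current active node
systemctl start pve-ha-crm.service   # on all nodes, starting with the one above
EOF
}
```

Example: print_ha_cleanup_steps M1LHOSTPROX504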
Wipe SSH Keys
- Run the following command on the server that is failing to connect to the target node. Then run it a second time with the hostname replaced by the target's IP address to ensure all entries are wiped.
ssh-keygen -f "/etc/ssh/ssh_known_hosts" -R "MUT7PVE201"
- Run the following command to test the new connection and add the new keys to the keystore. Replace the following field with the target hostname and the field below that with the relevant IP address.
HostKeyAlias=MUT7PVE201
root@192.168.1.75
/usr/bin/ssh -e none -o 'HostKeyAlias=MUT7PVE201' root@192.168.1.75 /bin/true
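Because the removal has to be run once for the hostname and once for the IP address, the two steps above can be wrapped in a loop. This is a minimal sketch: the function name wipe_host_keys and the RUN and KNOWN_HOSTS variables are hypothetical, and by default it only prints the commands (dry run).

```shell
# Hypothetical helper: removes known_hosts entries for both the hostname and
# the IP address, then reconnects once so the new key is stored.
# RUN defaults to "echo" (dry run); call with RUN="" to execute for real.
wipe_host_keys() {
    host="$1"
    ip="$2"
    known_hosts="${KNOWN_HOSTS-/etc/ssh/ssh_known_hosts}"
    for name in "$host" "$ip"; do
        ${RUN-echo} ssh-keygen -f "$known_hosts" -R "$name"
    done
    # Test the connection and record the new key under the hostname alias.
    ${RUN-echo} /usr/bin/ssh -e none -o "HostKeyAlias=$host" "root@$ip" /bin/true
}
```

Example: wipe_host_keys MUT7PVE201 192.168.1.75 prints the three commands; RUN="" wipe_host_keys MUT7PVE201 192.168.1.75 runs them.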
zfs error: cannot open /rpool/data/vm-110-disk-0: dataset does not exist
- Verify that the disk does not exist on the node where the VM currently resides.
zfs list | grep 110 && echo exist || echo not exist
- Check other nodes using the command above to see where the disk exists.
- Move the VM configuration file to the node where the disk actually resides. First, run the following command on the node that's currently trying to start the VM (replace 100 with the ID of the affected VM).
mv /etc/pve/qemu-server/100.conf /var/lib/vz/template/iso/template/iso/
- Run the following command on the node that actually hosts the disk files.
mv /var/lib/vz/template/iso/template/iso/100.conf /etc/pve/qemu-server
- Re-enable the VM in the cluster so it can start normally, using the method in the next section.
- Once confirmed, delete the .conf file on the node where it was having problems.
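The existence check in the first step uses a plain grep 110, which can also match unrelated lines such as vm-1100-disk-0 or a size column containing 110. A stricter variant, sketched below, matches only dataset names ending in vm-<id>-disk-<n>. The function name filter_vm_disks is hypothetical, and the naming pattern is assumed from the vm-110-disk-0 name in the error above.

```shell
# Hypothetical helper: reads dataset names on stdin (one per line) and prints
# only those that belong to the given VM ID, e.g. rpool/data/vm-110-disk-0.
# Exits non-zero when nothing matches, like grep.
filter_vm_disks() {
    vmid="$1"
    grep -E "(^|/)vm-${vmid}-disk-[0-9]+\$"
}

# Intended use on a node (zfs list -H -o name prints bare dataset names):
#   zfs list -H -o name | filter_vm_disks 110 && echo exist || echo "not exist"
```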
VM In Error State
- Fix the underlying issue that caused the error state, then run the following command to clear it.
ha-manager set vm:100 --state disabled
"trying to acquire lock...TASK ERROR: can't lock file '/var/lock/qemu-server/lock-109.conf' - got timeout"
There are several ways to fix this error: restart the node, restart the pve-cluster services, or, probably the safest, follow the steps below to see what's causing the lock and fix it.
- Run the following command to check what's locking the file.
lsof /var/lock/qemu-server/lock-100.conf
- If necessary, run the following command to further identify what's causing the lock, replacing PID with the process ID reported by lsof.
ps aux | grep PID
- You can kill this process, or as stated above, restart services to force it to stop.
ISOs Not Showing In ISOs Share
- Run the following command to identify shares that can be accessed via NFS from the node(s) you want this share attached to.
pvesm scan nfs 192.168.1.1 #(NAS address)
- Run the following command to add the storage to the cluster, paying attention to the content types.
pvesm add nfs MUT7NAS-NFS-ISOs --server 192.168.20.20 --path /var/lib/vz/template/iso/ --export /volume1/MUT7PVE-ISOs --content images,iso
- Verify that the storage was added properly, both in the GUI and with this command.
cat /etc/pve/storage.cfg
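On a cluster with many storages, the cat output can be long. The sketch below prints only the stanza for one storage ID. It assumes the usual storage.cfg layout, where each stanza starts with a "<type>: <storage-id>" header line followed by indented property lines. The function name show_storage_stanza is hypothetical.

```shell
# Hypothetical helper: reads storage.cfg-style text on stdin and prints the
# stanza whose header line is "<type>: <storage-id>".
show_storage_stanza() {
    awk -v id="$1" '
        # A header line such as "nfs: MUT7NAS-NFS-ISOs" starts a new stanza.
        /^[a-z]+:[ \t]/ { printing = ($2 == id) }
        # Print the non-empty lines of the matching stanza only.
        printing && NF  { print }
    '
}

# Intended use:
#   show_storage_stanza MUT7NAS-NFS-ISOs < /etc/pve/storage.cfg
```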
Change HA Master Node
pvecm add MUT7PVE201
Change ZFS Pool Name While In Use
- Deactivate the pool through the web GUI
- Run the following command to export the pool
zpool export SSD_POOL_PVE2
- Run the following command to import the pool under the new name.
zpool import SSD_POOL_PVE2 SSD_POOL_PVE1
local node address: cannot use IP '10.0.40.50', not found on local node!
Check the /etc/hosts entry for that server and verify all entries are correct. Modify or delete any that are incorrect or stale.
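To spot stale or duplicate entries quickly, the hosts file can be filtered for every line that names the node. This is a minimal sketch; the function name hosts_entries_for is hypothetical. Any address it prints that is not configured on the node (compare with the output of ip addr) is a candidate for correction or removal.

```shell
# Hypothetical helper: reads hosts-file text on stdin and prints
# "<ip> <line-number>" for every entry that names the given host.
hosts_entries_for() {
    awk -v name="$1" '
        { sub(/#.*/, "") }    # ignore comments (fields are re-split afterwards)
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1, NR; break }
        }
    '
}

# Intended use:
#   hosts_entries_for "$(hostname)" < /etc/hosts
```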
Delete Replication Job
pvesr delete '106-4' --force
Cluster Not Ready - No Quorum When Removing Nodes
pvecm expected 1
Remove Node From Cluster
pvecm delnode MUT7PVE205
Mirror System Disk
sgdisk /dev/sda -R /dev/sdb
sgdisk --randomize-guids /dev/sdb
pve-efiboot-tool format /dev/sdb2 --force
pve-efiboot-tool init /dev/sdb2
zpool attach rpool /dev/disk/by-id/ata-SSDSC2KG960G8R_BTYG9284085N960CGN-part3 /dev/disk/by-id/ata-SSDSC2KG960G8R_BTYG2055022C960CGN-part3
(Note: "zpool status -v" showed the existing rpool member as the -part3 partition, i.e. /dev/sda3.)
Troubleshooting
warning: cannot send disk (Broken pipe) (I/O error) (already exists)
If you see errors such as these when trying to replicate:
2023-11-14 14:38:08 100-0: warning: cannot send 'NVMe-R10-6D/vm-100-disk-0@__replicate_100-0_1699990683__': Broken pipe
2023-11-14 14:38:08 100-0: cannot send 'NVMe-R10-6D/vm-100-disk-0': I/O error
2023-11-14 14:38:08 100-0: command 'zfs send -Rpv -- NVMe-R10-6D/vm-100-disk-0@__replicate_100-0_1699990683__' failed: exit code 1
2023-11-14 14:38:08 100-0: [mut7pve202] volume 'NVMe-R10-6D/vm-100-disk-0' already exists
Identify the disk named in the error. Then, on the target node where the volume already exists, destroy the stale copy together with its snapshots, replacing the pool and disk name with the ones from the error.
zfs destroy -r rpool/data/vm-100-disk-0
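To avoid mistyping the dataset name before a destructive zfs destroy, the target node and dataset can be pulled straight out of the replication log. This is a minimal sketch that assumes the "[node] volume '...' already exists" line format shown in the log above. The function name existing_replica_volumes is hypothetical.

```shell
# Hypothetical helper: reads replication log text on stdin and prints a
# "<target-node> <dataset>" pair for each volume reported as already existing.
existing_replica_volumes() {
    sed -n "s/.*\[\([^]]*\)] volume '\([^']*\)' already exists.*/\1 \2/p"
}

# Intended use: paste the failed task log into a file, then
#   existing_replica_volumes < replication.log
# and inspect the dataset on the printed node (for example with
# zfs list -t snapshot -r DATASET) before destroying anything.
```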