This document outlines the complete setup of a home lab infrastructure designed for learning and development. The lab consists of a Proxmox virtualization cluster, Kubernetes container orchestration, and infrastructure automation using Terraform and Ansible.
Overview
The home lab architecture includes:
- Proxmox VE: Three-node hypervisor cluster for virtualization
- Kubernetes: Container orchestration with one master and multiple worker nodes
- Terraform: Infrastructure as Code for VM provisioning
- Ansible: Configuration management and automation
- Cloud-init: Automated VM initialization and configuration
Note
This setup is designed for learning and development purposes. For production environments, additional security hardening and high availability considerations would be required.
Architecture
graph TB
subgraph "Proxmox Cluster"
PVE1[Proxmox Node 1]
PVE2[Proxmox Node 2]
PVE3[Proxmox Node 3]
end
subgraph "Kubernetes Cluster"
Master[K8s Master]
Worker1[K8s Worker 1]
Worker2[K8s Worker 2]
end
TF[Terraform] --> PVE1
TF --> PVE2
TF --> PVE3
PVE1 --> Master
PVE2 --> Worker1
PVE3 --> Worker2
Ansible --> Master
Ansible --> Worker1
Ansible --> Worker2
Proxmox Virtual Environment
Proxmox VE forms the foundation of the lab, providing a robust virtualization platform. The setup consists of three compact PCs with Proxmox VE 8.3 installed and configured as a cluster.
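Joining the three nodes into a cluster is a short procedure. A minimal sketch, assuming fresh installs with root access; the cluster name "homelab" and the node placeholder are assumptions, not values from this lab:
# On the first node: create the cluster ("homelab" is a placeholder name)
pvecm create homelab
# On each remaining node: join using the first node's IP address
pvecm add <first-node-ip>
# From any node: confirm quorum and membership
pvecm status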
Tip
The following helper scripts automate the initial Proxmox setup and configuration. Run these scripts on each node after the initial Proxmox installation.
Initial Setup Script
The pve_setup.sh script configures the Proxmox environment:
- Sets the non-enterprise apt repositories for pve and ceph
- Removes the enterprise repositories for pve and ceph
- Removes the "No Subscription" pop-up
- Upgrades installed packages
- Creates a snippets directory used for install-time scripts in templates
- Creates a cloud-init snippet in snippets that installs and starts the QEMU guest agent on newly cloned VMs
- Installs Terraform from the HashiCorp APT repository
# pve_setup.sh
echo "Configure the 'non-subscription' repositories for pve and ceph...."
echo "deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription" | tee -a /etc/apt/sources.list
echo "deb http://download.proxmox.com/debian/ceph-quincy bookworm no-subscription" | tee -a /etc/apt/sources.list
echo "Disable the Enterprise repositories for pve and ceph...."
sed -i '/^deb/s/^/# /' /etc/apt/sources.list.d/pve-enterprise.list
sed -i '/^deb/s/^/# /' /etc/apt/sources.list.d/ceph.list
echo "Disable the subscription pop-up...."
sed -Ezi.bak "s/(Ext.Msg.show\(\{\s+title: gettext\('No valid sub)/void\(\{ \/\/\1/g" /usr/share/javascript/proxmox-widget-toolkit/proxmoxlib.js
systemctl restart pveproxy.service
echo "Upgrade installed sources...."
apt update
apt upgrade -y
echo "Set up 'snippets' for VM configuration...."
mkdir -p /var/lib/vz/snippets
tee /var/lib/vz/snippets/qemu-guest-agent.yml <<EOF
#cloud-config
runcmd:
- apt update
- apt install -y qemu-guest-agent
- systemctl start qemu-guest-agent
EOF
echo "Install Terraform...."
pushd /tmp
apt install -y lsb-release
wget -O - https://apt.releases.hashicorp.com/gpg | gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/hashicorp.list
apt update && apt install terraform -y
popd
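One way to roll the script out to all three nodes from a workstation, as a sketch assuming root SSH access; the hostnames pve1 through pve3 are placeholders:
# Copy and run the setup script on every cluster node
# (pve1..pve3 are placeholders for your actual node names)
for node in pve1 pve2 pve3; do
  scp pve_setup.sh root@"$node":/root/
  ssh root@"$node" 'bash /root/pve_setup.sh'
done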
Terraform Setup Script
The pve_terraform_setup.sh script creates the necessary Terraform user and permissions:
- Creates a role for Terraform with appropriate permissions
- Creates a user for Terraform authentication
- Assigns the user to the role with root-level access
- Generates an API token for authentication
Important
Replace <password> with a secure password for the Terraform user.
# pve_terraform_setup.sh
# Create role with necessary permissions for Terraform
pveum role add TerraformProv -privs "Datastore.AllocateSpace Datastore.AllocateTemplate Datastore.Audit Pool.Allocate Sys.Audit Sys.Console Sys.Modify VM.Allocate VM.Audit VM.Clone VM.Config.CDROM VM.Config.Cloudinit VM.Config.CPU VM.Config.Disk VM.Config.HWType VM.Config.Memory VM.Config.Network VM.Config.Options VM.Migrate VM.Monitor VM.PowerMgmt SDN.Use"
# Create user for Terraform
pveum user add terraform-prov@pve --password "<password>"
# Apply role to user at root level
pveum aclmod / -user terraform-prov@pve -role TerraformProv
# Create API token for the user (privilege separation disabled so the
# token inherits the user's role permissions rather than an empty set)
pveum user token add terraform-prov@pve terraform --privsep=0
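The token add command prints the token secret exactly once; record it for terraform.tfvars. As a quick sanity check, the token can be exercised against the API with curl. A sketch, assuming the API URL used later in this document and the secret printed above:
# List cluster nodes using the new token (replace <secret> with the value
# printed by 'pveum user token add'); -k accepts the default self-signed cert
curl -k -H "Authorization: PVEAPIToken=terraform-prov@pve!terraform=<secret>" \
  https://192.168.127.113:8006/api2/json/nodes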
Template Creation Script
The pve_template_create.sh script downloads a cloud image and creates a VM template:
- Downloads the Debian 12 cloud image if not present
- Creates a VM with the specified configuration
- Configures the VM with cloud-init support
- Converts the VM to a template for cloning
Warning
The original version of this script contained a duplicate qm create command. The corrected version is shown below.
# pve_template_create.sh
VMID=9000
TEMPLATENAME="debian12-cloudinit"
TEMPLATEURL="https://cloud.debian.org/images/cloud/bookworm/latest/debian-12-genericcloud-amd64.qcow2"
FILE="debian-12-genericcloud-amd64.qcow2"
# Download cloud image if it doesn't exist
pushd /root/
if [ -f "$FILE" ]; then
echo "Image ($FILE) exists."
else
echo "Image ($FILE) does not exist. Downloading..."
wget "$TEMPLATEURL"
fi
popd
# Create VM from cloud image
qm create "$VMID" --name "$TEMPLATENAME" --memory 2048 --cores 2 --net0 virtio,bridge=vmbr0
qm set "$VMID" --scsi0 local-lvm:0,import-from=/root/"$FILE"
qm set "$VMID" --ide2 local-lvm:cloudinit
qm set "$VMID" --boot order=scsi0
qm set "$VMID" --serial0 socket --vga serial0
qm set "$VMID" --agent enabled=1
# Convert to template
qm template "$VMID"
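Before handing the template to Terraform, it is worth confirming that it clones and boots. A quick manual test, assuming VMID 123 is unused:
# Clone the template to a throwaway VM, boot it, then clean up
qm clone 9000 123 --name template-test --full
qm start 123
# ...verify the VM boots (serial console or SSH), then remove it
qm stop 123
qm destroy 123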
Kubernetes Cluster Deployment
The Kubernetes cluster is deployed using Terraform for infrastructure provisioning and Ansible for configuration management. The cluster consists of one master node and multiple worker nodes.
Prerequisites
Before deploying the cluster, ensure the following (each can be verified with the commands sketched after this list):
- Proxmox cluster is configured and running
- Terraform user and API token are created
- Cloud-init template is available
- Ansible is installed on the control machine
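A minimal pre-flight check, assuming the names used earlier in this document:
# On a Proxmox node: cluster health, Terraform token, and template presence
pvecm status
pveum user token list terraform-prov@pve
qm list | grep debian12-cloudinit
# On the control machine: Ansible availability
ansible --version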
Terraform Configuration
The Terraform configuration consists of several files that define the infrastructure.
Tip
The Terraform configuration uses version 3.0.1-rc7 of the Telmate Proxmox provider. Check for newer versions when implementing.
Provider Configuration
The providers.tf file defines the Proxmox provider:
# providers.tf
terraform {
required_providers {
proxmox = {
source = "telmate/proxmox"
version = "3.0.1-rc7"
}
}
}
provider "proxmox" {
pm_api_url = var.proxmox_api_url
pm_api_token_id = var.proxmox_api_token_id
pm_api_token_secret = var.proxmox_api_token_secret
pm_tls_insecure = true
}
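As an alternative to storing the token secret in terraform.tfvars, the Telmate provider can read credentials from environment variables, which keeps the secret out of files on disk. A sketch, assuming your provider version supports these variables (check the provider documentation):
# Export credentials in the shell instead of writing them to terraform.tfvars
export PM_API_URL="https://192.168.127.113:8006/api2/json"
export PM_API_TOKEN_ID='terraform-prov@pve!terraform'
export PM_API_TOKEN_SECRET="<your-token-secret>"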
Main Resource Configuration
The main.tf file defines the VM resources. The count parameter controls the number of VMs created.
Important
The disk size specified in the resource must be equal to or larger than the disk configured in the template. If smaller, a new disk will be created and the cloned disk will be marked as "unused," preventing the VM from booting.
# main.tf
resource "proxmox_vm_qemu" "kubernetes_nodes" {
count = 3
name = "k8s-node-${count.index + 1}"
target_node = var.proxmox_host
vmid = 200 + count.index + 1
agent = 1
cores = 2
memory = 4096
boot = "order=scsi0"
clone = var.template_name
scsihw = "virtio-scsi-single"
vm_state = "running"
automatic_reboot = true
# Cloud-Init configuration
cicustom = "vendor=local:snippets/qemu-guest-agent.yml"
ciupgrade = true
nameserver = "192.168.127.1 8.8.8.8"
ipconfig0 = "ip=192.168.127.2${count.index + 1}/24,gw=192.168.127.1,ip6=dhcp"
skip_ipv6 = true
ciuser = var.ciuser
cipassword = var.password
sshkeys = var.ssh_key
# Serial console for cloud-init images
serial {
id = 0
}
# Disk configuration
disks {
scsi {
scsi0 {
disk {
storage = var.storage
size = "20G" # Increased for Kubernetes requirements
}
}
}
ide {
ide1 {
cloudinit {
storage = var.storage
}
}
}
}
# Network configuration
network {
id = 0
bridge = "vmbr0"
model = "virtio"
}
}
Output Configuration
The output.tf file exposes the node names and assigned IP configurations:
# output.tf
output "kubernetes_node_ips" {
value = proxmox_vm_qemu.kubernetes_nodes[*].ipconfig0
description = "IP addresses assigned to the Kubernetes nodes"
}
output "kubernetes_node_names" {
value = proxmox_vm_qemu.kubernetes_nodes[*].name
description = "Names of the Kubernetes nodes"
}
Variable Definitions
The variables.tf file defines all variables used in the configuration:
Note
The template name comment mentions Ubuntu 24.04, but the actual template is Debian 12. This has been corrected in the description.
# variables.tf
variable "proxmox_api_url" {
type = string
description = "URL of the Proxmox API"
default = "https://192.168.127.113:8006/api2/json"
}
variable "proxmox_api_token_id" {
type = string
description = "Proxmox API token ID"
sensitive = true
default = "terraform-prov@pve!terraform"
}
variable "proxmox_api_token_secret" {
type = string
description = "Proxmox API token secret"
sensitive = true
}
variable "proxmox_host" {
type = string
default = "FR-VH-01"
description = "Target Proxmox node for VM deployment"
}
variable "template_name" {
type = string
default = "debian12-cloudinit"
description = "Name of the Debian 12 cloud-init template"
}
variable "ssh_key" {
type = string
description = "SSH public key for VM access"
}
variable "ciuser" {
type = string
description = "Cloud-init user for VM configuration"
default = "debian"
}
variable "password" {
type = string
description = "Password for the cloud-init user"
sensitive = true
}
variable "storage" {
type = string
default = "local-lvm"
description = "Proxmox storage pool for VM disks"
}
Variable Values
The terraform.tfvars file provides values for the variables:
Warning
This file contains sensitive information and should never be committed to version control. Add it to your .gitignore file.
# terraform.tfvars
proxmox_api_url = "https://<proxmox-host-ip>:8006/api2/json"
proxmox_api_token_id = "terraform-prov@pve!terraform"
proxmox_api_token_secret = "<your-token-secret>"
proxmox_host = "<proxmox-node-name>"
template_name = "debian12-cloudinit"
ciuser = "debian"
password = "<secure-password>"
ssh_key = "<your-ssh-public-key>"
storage = "local-lvm"
Deployment Commands
To deploy the infrastructure:
# Initialize Terraform
terraform init
# Plan the deployment
terraform plan
# Apply the configuration
terraform apply
# Destroy the infrastructure (when needed)
terraform destroy
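Once the apply completes, the values defined in output.tf can be read back at any time:
# Print the node names and IP configurations recorded in the state
terraform output kubernetes_node_names
terraform output kubernetes_node_ips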
Ansible Configuration Management
Once the VMs are deployed, Ansible configures them as a Kubernetes cluster. The configuration includes one master node and multiple worker nodes.
Ansible Prerequisites
- Ansible installed on the control machine
- SSH access to all nodes
- Python 3 installed on target nodes
Inventory Configuration
The hosts file defines the cluster topology:
Tip
The ansible_user should match the ciuser configured in the Terraform variables.
# hosts
[master]
master1 ansible_host=192.168.127.21
[workers]
worker1 ansible_host=192.168.127.22
worker2 ansible_host=192.168.127.23
[all:vars]
ansible_python_interpreter=/usr/bin/python3
ansible_ssh_extra_args='-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null'
ansible_user=<user>
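Before running any playbooks, confirm that Ansible can reach every node over SSH:
# Each node should answer with "pong"
ansible -i hosts all -m ping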
Node Dependencies
The kube-depends.yml playbook installs and configures prerequisites on all nodes:
# kube-depends.yml
- hosts: all
become: yes
tasks:
- name: update APT packages
apt:
update_cache: yes
- name: reboot and wait for reboot to complete
reboot:
- name: disable SWAP (Kubeadm requirement)
shell: |
swapoff -a
- name: disable SWAP in fstab (Kubeadm requirement)
replace:
path: /etc/fstab
regexp: '^([^#].*?\sswap\s+sw\s+.*)$'
replace: '# \1'
- name: create an empty file for the Containerd module
copy:
content: ""
dest: /etc/modules-load.d/containerd.conf
force: no
- name: configure modules for Containerd
blockinfile:
path: /etc/modules-load.d/containerd.conf
block: |
overlay
br_netfilter
- name: create an empty file for Kubernetes sysctl params
copy:
content: ""
dest: /etc/sysctl.d/99-kubernetes-cri.conf
force: no
- name: configure sysctl params for Kubernetes
lineinfile:
path: /etc/sysctl.d/99-kubernetes-cri.conf
line: "{{ item }}"
with_items:
- 'net.bridge.bridge-nf-call-iptables = 1'
- 'net.ipv4.ip_forward = 1'
- 'net.bridge.bridge-nf-call-ip6tables = 1'
- name: apply sysctl params without reboot
command: sysctl --system
- name: install APT Transport HTTPS
apt:
name: apt-transport-https
state: present
- name: add Docker apt-key
get_url:
url: https://download.docker.com/linux/debian/gpg
dest: /etc/apt/keyrings/docker-apt-keyring.asc
mode: '0644'
force: true
- name: add Docker's APT repository
apt_repository:
repo: "deb [arch={{ 'amd64' if ansible_architecture == 'x86_64' else 'arm64' }} signed-by=/etc/apt/keyrings/docker-apt-keyring.asc] https://download.docker.com/linux/debian {{ ansible_distribution_release }} stable"
state: present
update_cache: yes
- name: add Kubernetes apt-key
get_url:
url: https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key
dest: /etc/apt/keyrings/kubernetes-apt-keyring.asc
mode: '0644'
force: true
- name: add Kubernetes' APT repository
apt_repository:
repo: "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.asc] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /"
state: present
update_cache: yes
- name: install Containerd
apt:
name: containerd.io
state: present
- name: create Containerd directory
file:
path: /etc/containerd
state: directory
- name: add Containerd configuration
shell: /usr/bin/containerd config default > /etc/containerd/config.toml
- name: configuring the systemd cgroup driver for Containerd
lineinfile:
path: /etc/containerd/config.toml
regexp: ' SystemdCgroup = false'
line: ' SystemdCgroup = true'
- name: enable the Containerd service and start it
systemd:
name: containerd
state: restarted
enabled: yes
daemon_reload: yes
- name: install Kubelet
apt:
name: kubelet=1.29.*
state: present
update_cache: true
- name: install Kubeadm
apt:
name: kubeadm=1.29.*
state: present
- name: enable the Kubelet service persistently
service:
name: kubelet
enabled: yes
- name: load br_netfilter kernel module
modprobe:
name: br_netfilter
state: present
- name: set bridge-nf-call-iptables
sysctl:
name: net.bridge.bridge-nf-call-iptables
value: 1
- name: set ip_forward
sysctl:
name: net.ipv4.ip_forward
value: 1
- name: reboot and wait for reboot to complete
reboot:
- hosts: master
become: yes
tasks:
- name: install Kubectl
apt:
name: kubectl=1.29.*
state: present
force: yes # allow downgrades
The master.yml playbook configures the node identified as the master and initializes the cluster:
# master.yml
- hosts: master
become: yes
tasks:
- name: create an empty file for Kubeadm configuring
copy:
content: ""
dest: /etc/kubernetes/kubeadm-config.yaml
force: no
- name: configuring the container runtime including its cgroup driver
blockinfile:
path: /etc/kubernetes/kubeadm-config.yaml
block: |
kind: ClusterConfiguration
apiVersion: kubeadm.k8s.io/v1beta3
networking:
podSubnet: "10.244.0.0/16"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
runtimeRequestTimeout: "15m"
cgroupDriver: "systemd"
systemReserved:
cpu: 100m
memory: 350M
kubeReserved:
cpu: 100m
memory: 50M
enforceNodeAllocatable:
- pods
- name: initialize the cluster (this could take some time)
shell: kubeadm init --config /etc/kubernetes/kubeadm-config.yaml >> cluster_initialized.log
args:
chdir: /home/{{ ansible_user }}
creates: cluster_initialized.log
- name: create .kube directory
become: yes
become_user: "{{ ansible_user }}"
file:
path: /home/{{ ansible_user }}/.kube
state: directory
mode: 0755
- name: copy admin.conf to user's kube config
copy:
src: /etc/kubernetes/admin.conf
dest: /home/{{ ansible_user }}/.kube/config
remote_src: yes
owner: "{{ ansible_user }}"
- name: install Pod network
become: yes
become_user: "{{ ansible_user }}"
shell: kubectl apply -f https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml >> pod_network_setup.log
args:
chdir: /home/{{ ansible_user }}
creates: pod_network_setup.log
Worker Node Configuration
The workers.yml playbook configures worker nodes and joins them to the cluster:
Note
This playbook first retrieves the join command from the master node, then executes it on all worker nodes.
# workers.yml
- hosts: master
become: yes
tasks:
- name: get join command
shell: kubeadm token create --print-join-command
register: join_command_raw
- name: set join command
set_fact:
join_command: "{{ join_command_raw.stdout_lines[0] }}"
- hosts: workers
become: yes
tasks:
- name: TCP port 6443 on master is reachable from worker
wait_for: "host={{ hostvars['master1']['ansible_default_ipv4']['address'] }} port=6443 timeout=1"
- name: join cluster
shell: "{{ hostvars['master1'].join_command }} >> node_joined.log"
args:
chdir: /home/{{ ansible_user }}
creates: node_joined.log
Deployment Workflow
Execute the playbooks in the following order:
# 1. Install dependencies on all nodes
ansible-playbook -i hosts kube-depends.yml
# 2. Configure the master node
ansible-playbook -i hosts master.yml
# 3. Configure worker nodes and join them to the cluster
ansible-playbook -i hosts workers.yml
Cluster Verification
After deployment, verify the cluster is functioning correctly:
# Check cluster status
kubectl get nodes
# Check system pods
kubectl get pods --all-namespaces
# Check cluster info
kubectl cluster-info
Troubleshooting
Common Issues
Warning
If you encounter issues during deployment, check the following common problems:
Terraform Issues
- Authentication errors: Verify API token and permissions
- Template not found: Ensure the cloud-init template exists
- Disk size errors: Check that disk size matches or exceeds template size
Ansible Issues
- SSH connection failures: Verify SSH keys and connectivity
- Permission errors: Ensure the ansible_user has sudo privileges
- Kubernetes version compatibility: Check that all components use compatible versions
Kubernetes Issues
- Nodes not joining: Check network connectivity and firewall rules
- Pods not starting: Verify container runtime and CNI configuration
- DNS resolution: Ensure CoreDNS is running properly
Useful Commands
# Terraform troubleshooting
terraform plan -detailed-exitcode
terraform state list
terraform state show <resource>
# Ansible troubleshooting
ansible-playbook -i hosts playbook.yml --check
ansible-playbook -i hosts playbook.yml -vvv
# Kubernetes troubleshooting
kubectl describe nodes
kubectl logs -n kube-system <pod-name>
journalctl -u kubelet
Next Steps
Once the cluster is operational, consider these enhancements:
- Storage: Configure persistent storage with Longhorn or Rook
- Ingress: Set up an ingress controller such as Traefik or NGINX (a Helm-based sketch follows this list)
- Monitoring: Deploy Prometheus and Grafana
- Security: Implement network policies and RBAC
- Backup: Set up etcd backup and restore procedures
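As a concrete example of one of these enhancements, a minimal sketch of installing the NGINX ingress controller with Helm, assuming Helm is installed on the control machine and kubectl is configured for this cluster:
# Add the ingress-nginx chart repository and install the controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace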