Architecture
Velero is a single binary that runs both the server-side controller and the CLI. Understanding the internal process model is essential before reading the code.
Process topology
┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster │
│ │
│ ┌── namespace: velero ──────────────────────────────────────┐ │
│ │ │ │
│ │ ┌─── velero-server pod ────┐ ┌─── node-agent DaemonSet─┐│ │
│ │ │ BackupController │ │ DataUploadController ││ │
│ │ │ RestoreController │ │ DataDownloadController ││ │
│ │ │ ScheduleController │ │ Kopia repository engine││ │
│ │ │ GCController │ │ hostPath: / (ro) ││ │
│ │ │ BSLController │ └─────────────────────────┘│ │
│ │ └──────────────────────────┘ │ │
│ │ │ │
│ │ ┌─── API Server (controller-runtime informers) ─────────┐│ │
│ │ └───────────────────────────────────────────────────────┘│ │
│ │ │ │
│ │ ┌─── Velero CRDs (etcd) ──┐ ┌─── PVC/VolumeSnapshot ──┐ │ │
│ │ └─────────────────────────┘ └─────────────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
└─────────────────────────────────────────────────────────────────┘
│ │
▼ (go-plugin / gRPC) ▼
┌─── Plugin process ───┐ ┌─── Object Storage ───┐
│ ObjectStore impl │──────────│ S3 / GCS / Azure / │
│ VolumeSnapshotter │ │ custom │
└──────────────────────┘ └──────────────────────┘
Controllers
Velero uses controller-runtime (the same library as operator-sdk). Each CRD has a reconciler that watches for changes and drives the state toward spec.
| Controller | Watches | Action |
|---|---|---|
| BackupController | Backup CRD | Drives backup from New → InProgress → Completed/Failed |
| RestoreController | Restore CRD | Downloads backup, replays resources via dynamic client |
| ScheduleController | Schedule CRD | Creates Backup objects on cron cadence |
| GCController | Backup CRD (expired) | Deletes backup files from object store + CRD from API |
| BackupSyncController | BackupStorageLocation | Syncs Backup objects from BSL into cluster (cross-cluster restores) |
| BackupDeletionController | DeleteBackupRequest CRD | Handles explicit backup deletion requests |
| DataUploadController | DataUpload CRD (node-agent) | Runs Kopia upload from node-local PVC mount |
| DataDownloadController | DataDownload CRD (node-agent) | Runs Kopia download to restore PVC data |
Plugin Process Model
Velero uses hashicorp/go-plugin to run plugins as separate OS processes communicating over gRPC.
This design has deliberate consequences:
- Crash isolation: a crashing plugin doesn't take down velero-server.
- Language agnostic: plugins can be written in any language that speaks gRPC (though the Go SDK is the only officially supported one).
- No hot reload: plugins are discovered at startup from the /plugins directory in the velero pod. Changing plugins requires a pod restart.
// pkg/client/factory.go: plugin manager setup (simplified)
pluginManager := clientmgmt.NewManager(logger, logLevel, pluginRegistry)
// Plugin registry scans the /plugins dir in the pod at startup
// Each binary exposes its capabilities via the SDK handshake
objectStore, err := pluginManager.GetObjectStore("velero.io/aws")
if err != nil {
    return err // plugin not registered, or its process failed to start
}
snapshotter, err := pluginManager.GetVolumeSnapshotter("velero.io/aws")
if err != nil {
    return err
}
velero CLI
The velero binary is the same binary as the server: it branches on
subcommand. velero server starts the controller manager.
The CLI communicates with the cluster exclusively through CRD objects and the API server: there is no direct channel to the velero-server pod.
Prototyping Tip
Because CLI actions work through CRDs, you can prototype behavior by
manually creating YAML and watching reconciliation:
no UI or CLI shim needed.
HA and Leader Election
When running multiple replicas of velero-server (for HA),
controller-runtime's built-in leader election (via Kubernetes leases)
ensures only one replica runs the reconcilers at a time.
The lease is in the velero namespace.
# velero server flags for HA
--leader-elect=true
--leader-elect-lease-duration=15s
--leader-elect-renew-deadline=10s