
Architecture

Velero ships as a single binary that serves as both the server-side controller manager and the CLI. Understanding the internal process model is essential before reading the code.

Process topology

┌─────────────────────────────────────────────────────────────────┐
│ Kubernetes Cluster                                              │
│                                                                 │
│  ┌── namespace: velero ──────────────────────────────────────┐  │
│  │                                                           │  │
│  │  ┌─── velero-server pod ────┐  ┌─── node-agent DaemonSet─┐│  │
│  │  │  BackupController        │  │  DataUploadController   ││  │
│  │  │  RestoreController       │  │  DataDownloadController ││  │
│  │  │  ScheduleController      │  │  Kopia repository engine││  │
│  │  │  GCController            │  │  hostPath: / (ro)       ││  │
│  │  │  BSLController           │  └─────────────────────────┘│  │
│  │  └──────────────────────────┘                             │  │
│  │                                                           │  │
│  │  ┌─── API Server (controller-runtime informers) ─────────┐│  │
│  │  └───────────────────────────────────────────────────────┘│  │
│  │                                                           │  │
│  │  ┌─── Velero CRDs (etcd) ──┐  ┌─── PVC/VolumeSnapshot ──┐ │  │
│  │  └─────────────────────────┘  └─────────────────────────┘ │  │
│  └───────────────────────────────────────────────────────────┘  │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
          │                                    │
          ▼ (go-plugin / gRPC)                 ▼
   ┌─── Plugin process ───┐          ┌─── Object Storage ───┐
   │  ObjectStore impl    │──────────│  S3 / GCS / Azure /  │
   │  VolumeSnapshotter   │          │  custom              │
   └──────────────────────┘          └──────────────────────┘

Controllers

Velero uses controller-runtime (the same library that operator-sdk builds on). Each CRD has a reconciler that watches for changes and drives the observed state toward the spec.

Controller               | Watches                       | Action
-------------------------+-------------------------------+-------------------------------------------------------------------
BackupController         | Backup CRD                    | Drives backup from New → InProgress → Completed/Failed
RestoreController        | Restore CRD                   | Downloads backup, replays resources via dynamic client
ScheduleController       | Schedule CRD                  | Creates Backup objects on cron cadence
GCController             | Backup CRD (expired)          | Deletes backup files from object store + CRD from API
BackupSyncController     | BackupStorageLocation         | Syncs Backup objects from BSL into cluster (cross-cluster restores)
BackupDeletionController | DeleteBackupRequest CRD       | Handles explicit backup deletion requests
DataUploadController     | DataUpload CRD (node-agent)   | Runs Kopia upload from node-local PVC mount
DataDownloadController   | DataDownload CRD (node-agent) | Runs Kopia download to restore PVC data
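To make the reconcile pattern concrete, here is a minimal sketch of the Backup phase progression from the table (New → InProgress → Completed/Failed). It uses only the standard library; the Phase and Backup types, the reconcile function, and the runBackup callback are hypothetical stand-ins for illustration, not Velero's API types or controller code.

```go
package main

import (
	"errors"
	"fmt"
)

// Phase mirrors the lifecycle named in the table. The type and constants
// are hypothetical stand-ins, not Velero's API types.
type Phase string

const (
	PhaseNew        Phase = "New"
	PhaseInProgress Phase = "InProgress"
	PhaseCompleted  Phase = "Completed"
	PhaseFailed     Phase = "Failed"
)

// Backup is a toy object holding only what the sketch needs.
type Backup struct {
	Name  string
	Phase Phase
}

// reconcile advances the backup by at most one phase per call and reports
// whether it should be called again. runBackup stands in for the real work
// and is an assumption of this sketch.
func reconcile(b *Backup, runBackup func(*Backup) error) (requeue bool) {
	switch b.Phase {
	case "", PhaseNew:
		b.Phase = PhaseInProgress
		return true // come back to do the actual work
	case PhaseInProgress:
		if err := runBackup(b); err != nil {
			b.Phase = PhaseFailed
		} else {
			b.Phase = PhaseCompleted
		}
		return false
	default:
		return false // terminal phases are left alone
	}
}

func main() {
	ok := &Backup{Name: "nightly"}
	for reconcile(ok, func(*Backup) error { return nil }) {
	}
	fmt.Println(ok.Name, ok.Phase)

	bad := &Backup{Name: "broken", Phase: PhaseNew}
	for reconcile(bad, func(*Backup) error { return errors.New("upload failed") }) {
	}
	fmt.Println(bad.Name, bad.Phase)
}
```

The point of the shape is that each call inspects the current phase and takes one small step, so the function can be re-invoked safely after a restart or a missed event.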

Plugin Process Model

Velero uses hashicorp/go-plugin to run plugins as separate OS processes communicating over gRPC.

This design has deliberate consequences:

  • Crash isolation: a crashing plugin doesn't take down velero-server.
  • Language-agnostic: plugins can be written in any language that speaks gRPC, though the Go SDK is the only officially supported one.
  • No hot reload: plugins are discovered at startup from the /plugins directory in the velero pod. Changing plugins requires a pod restart.

// pkg/client/factory.go: plugin manager setup (simplified)
pluginManager := clientmgmt.NewManager(logger, logLevel, pluginRegistry)
// The plugin registry scans the /plugins dir in the pod at startup.
// Each binary exposes its capabilities via the SDK handshake.
objectStore, err := pluginManager.GetObjectStore("velero.io/aws")
if err != nil {
	return err // plugin missing or failed to start
}
snapshotter, err := pluginManager.GetVolumeSnapshotter("velero.io/aws")
if err != nil {
	return err
}
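The "no hot reload" consequence follows from scanning the plugin directory once. The sketch below illustrates that idea with the standard library only: discoverPlugins, writeFile, demo, and the file names are hypothetical, and the executable-bit check is an assumption about what a startup scan might look for, not Velero's registry code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// discoverPlugins returns the paths of executable regular files in dir.
// It is a hypothetical stand-in for a registry's startup scan.
func discoverPlugins(dir string) ([]string, error) {
	entries, err := os.ReadDir(dir)
	if err != nil {
		return nil, err
	}
	var found []string
	for _, e := range entries {
		info, err := e.Info()
		if err != nil {
			return nil, err
		}
		// Keep regular files with at least one execute bit set.
		if info.Mode().IsRegular() && info.Mode().Perm()&0o111 != 0 {
			found = append(found, filepath.Join(dir, e.Name()))
		}
	}
	return found, nil
}

// writeFile creates a placeholder file and applies mode regardless of umask.
func writeFile(dir, name string, mode os.FileMode) error {
	p := filepath.Join(dir, name)
	if err := os.WriteFile(p, []byte("placeholder\n"), mode); err != nil {
		return err
	}
	return os.Chmod(p, mode)
}

// demo scans a temp directory once, adds another executable, and scans again.
// It returns the size of the startup snapshot and of the fresh scan.
func demo() (int, int, error) {
	dir, err := os.MkdirTemp("", "plugins")
	if err != nil {
		return 0, 0, err
	}
	defer os.RemoveAll(dir)

	if err := writeFile(dir, "velero-plugin-a", 0o755); err != nil {
		return 0, 0, err
	}
	if err := writeFile(dir, "README.txt", 0o644); err != nil {
		return 0, 0, err
	}
	registry, err := discoverPlugins(dir) // the one-time startup scan
	if err != nil {
		return 0, 0, err
	}

	// A binary added later is invisible to the snapshot taken above.
	if err := writeFile(dir, "velero-plugin-b", 0o755); err != nil {
		return 0, 0, err
	}
	fresh, err := discoverPlugins(dir) // what a restart would see
	if err != nil {
		return 0, 0, err
	}
	return len(registry), len(fresh), nil
}

func main() {
	atStartup, afterRescan, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("known at startup:", atStartup, "found by a fresh scan:", afterRescan)
}
```

In this toy model, the snapshot taken at startup never learns about the second binary; only a new scan does, which corresponds to the pod restart described above.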

velero CLI

The velero CLI and the server are the same binary, which branches on the subcommand: velero server starts the controller manager.
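As a rough illustration of one binary serving two roles, the sketch below branches on the first argument. The dispatch function and the strings it returns are hypothetical labels chosen for this example; they are not taken from Velero's command tree.

```go
package main

import (
	"fmt"
	"os"
)

// dispatch picks a mode from the first argument. The returned descriptions
// are illustrative labels, not Velero identifiers.
func dispatch(args []string) string {
	if len(args) > 0 && args[0] == "server" {
		return "server: start the controller manager"
	}
	return "cli: create or read CRD objects via the API server"
}

func main() {
	// os.Args[0] is the program name, so subcommands start at index 1.
	fmt.Println(dispatch(os.Args[1:]))
}
```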

The CLI communicates with the cluster exclusively through CRD objects and the API server: there is no direct channel to the velero-server pod.

Prototyping Tip

Because CLI actions work through CRDs, you can prototype behavior by manually creating YAML and watching reconciliation: no UI or CLI shim needed.
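One way to try this is to generate a Backup manifest yourself. The sketch below builds a minimal velero.io/v1 Backup object and prints it as JSON, which kubectl accepts as well as YAML. The Go struct names and the newBackup helper are hypothetical and cover only a few fields; the object name, the included namespace, and the TTL value are placeholder choices for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ObjectMeta, BackupSpec, and Backup are trimmed-down, hypothetical mirrors
// of the manifest shape; they are not Velero's Go API types.
type ObjectMeta struct {
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
}

type BackupSpec struct {
	IncludedNamespaces []string `json:"includedNamespaces,omitempty"`
	TTL                string   `json:"ttl,omitempty"`
}

type Backup struct {
	APIVersion string     `json:"apiVersion"`
	Kind       string     `json:"kind"`
	Metadata   ObjectMeta `json:"metadata"`
	Spec       BackupSpec `json:"spec"`
}

// newBackup fills in the fixed fields and places the object in the velero
// namespace, where the server watches for it.
func newBackup(name string, namespaces []string) Backup {
	return Backup{
		APIVersion: "velero.io/v1",
		Kind:       "Backup",
		Metadata:   ObjectMeta{Name: name, Namespace: "velero"},
		Spec:       BackupSpec{IncludedNamespaces: namespaces, TTL: "72h0m0s"},
	}
}

func main() {
	out, err := json.MarshalIndent(newBackup("manual-test", []string{"default"}), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Assuming kubectl access to the cluster, the output could be piped to kubectl apply -f - and the object's status watched as the BackupController reconciles it.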

HA and Leader Election

When running multiple replicas of velero-server (for HA), controller-runtime's built-in leader election (via Kubernetes leases) ensures only one replica runs the reconcilers at a time.

The lease object lives in the velero namespace.

# velero server flags for HA
--leader-elect=true
--leader-elect-lease-duration=15s
--leader-elect-renew-deadline=10s
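The following is a simplified model of lease-based election, included to show why only one replica is active at a time. It is not controller-runtime's implementation: the Lease type, tryAcquireOrRenew, simulate, and the replica names are hypothetical, and the real mechanism stores the lease in a Kubernetes object rather than in memory. The 15s duration matches the lease duration shown above.

```go
package main

import (
	"fmt"
	"time"
)

// Lease is a toy, in-memory model of a coordination lease (hypothetical type).
type Lease struct {
	Holder    string
	RenewTime time.Time
	Duration  time.Duration
}

// tryAcquireOrRenew reports whether candidate holds the lease after the call.
// The current holder always renews; anyone else succeeds only if the lease is
// free or has not been renewed within Duration.
func tryAcquireOrRenew(l *Lease, candidate string, now time.Time) bool {
	expired := l.Holder == "" || now.Sub(l.RenewTime) > l.Duration
	if l.Holder == candidate || expired {
		l.Holder = candidate
		l.RenewTime = now
		return true
	}
	return false
}

// simulate replays a small timeline with two replicas and returns the result
// of each attempt in order.
func simulate() []bool {
	start := time.Date(2024, 1, 1, 0, 0, 0, 0, time.UTC)
	lease := &Lease{Duration: 15 * time.Second}
	return []bool{
		tryAcquireOrRenew(lease, "replica-a", start),                     // a takes the free lease
		tryAcquireOrRenew(lease, "replica-b", start.Add(5*time.Second)),  // b is refused while a is fresh
		tryAcquireOrRenew(lease, "replica-b", start.Add(21*time.Second)), // a stopped renewing, b takes over
	}
}

func main() {
	for i, ok := range simulate() {
		fmt.Printf("attempt %d acquired=%v\n", i+1, ok)
	}
}
```

In this model a standby replica takes over only after the holder has gone a full lease duration without renewing, which is the trade-off the timing flags tune: shorter values mean faster failover but less tolerance for slow renewals.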
