Node maintenance for Aerospike on Kubernetes
When performing Kubernetes node maintenance (such as version upgrades, patching, or hardware changes), you need to safely migrate Aerospike pods off the affected nodes. The Aerospike Kubernetes Operator (AKO) provides multiple approaches to handle this:
| Approach | Use Case | Storage Type |
|---|---|---|
| Safe Pod Eviction | kubectl drain operations | Any |
| Scheduling Policies | Planned migrations of Aerospike pods | Network-attached |
| K8sNodeBlockList | Planned migrations of Aerospike pods | Any |
Safe pod eviction webhook
AKO provides a webhook that intercepts pod eviction API calls triggered by commands like kubectl drain or by node scale-down from cluster autoscalers such as Karpenter. The webhook blocks eviction API calls for Aerospike pods and safely migrates those pods to other Kubernetes nodes, ensuring all safety checks for data migration are completed before a pod moves.
This feature works with both network-attached and local-attached storage configurations. It is disabled by default.
Enabling safe pod eviction
To enable the safe pod eviction webhook, set the ENABLE_SAFE_POD_EVICTION environment variable to true in the operator deployment.
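For reference, a minimal sketch of how the variable appears in the operator Deployment's container spec is shown below; the container name is an assumption, so check your actual Deployment:

```yaml
# Fragment of the operator Deployment spec; the container name "manager" is an assumption.
spec:
  template:
    spec:
      containers:
        - name: manager
          env:
            - name: ENABLE_SAFE_POD_EVICTION
              value: "true"
```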
If you installed the operator using Helm, enable it by setting the value during installation or upgrade:
```bash
helm upgrade aerospike-kubernetes-operator aerospike/aerospike-kubernetes-operator \
  --set safePodEviction.enable="true"
```

Or add it to your values.yaml:
```yaml
# Enable the eviction webhook to safely block Aerospike pod evictions during node maintenance
# Also enables Prometheus metrics: aerospike_ako_eviction_webhook_requests_total (labels: eviction_namespace, decision)
safePodEviction:
  enable: "true"
  # Eviction webhook timeout in seconds
  timeoutSeconds: "20"
```

If you installed the operator using OLM (Operator Lifecycle Manager), patch the Subscription to add the environment variable:
```bash
kubectl -n operators patch subscription SUBSCRIPTION_NAME \
  --type='merge' \
  -p '{"spec":{"config":{"env":[{"name":"ENABLE_SAFE_POD_EVICTION","value":"true"}]}}}'
```

Using kubectl drain
Once the safe pod eviction webhook is enabled, you can use standard Kubernetes commands to drain nodes:
```bash
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

The webhook intercepts the eviction request for pods that belong to an AerospikeCluster and denies it. For non-Aerospike pods, the eviction request is passed through without modification.
If the eviction is blocked, the webhook sets an annotation aerospike.com/eviction-blocked on the pod. AKO receives this event and starts migrating the Aerospike pods safely.
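To see which pods the webhook has blocked, you can list the annotation directly. The namespace below is an assumption; substitute the one your cluster runs in:

```bash
# Print each Aerospike pod name with the value of its aerospike.com/eviction-blocked
# annotation (blank if the pod has not been blocked). Namespace is assumed.
kubectl -n aerospike get pods \
  -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.aerospike\.com/eviction-blocked}{"\n"}{end}'
```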
Wait for the AerospikeCluster to reach the Completed phase before retrying the drain command:
```bash
kubectl -n NAMESPACE wait --for=jsonpath='{.status.phase}'=Completed aerospikecluster/CLUSTER_NAME --timeout=300s
```

Scheduling Policy
Network-attached storage
For clusters using network-attached storage (such as cloud provider block storage), you can migrate pods by updating scheduling policies in the CR. The pods can move freely between nodes since the storage follows them.
Setting a scheduling policy, such as affinity, taints and tolerations, or nodeSelectors, lets you migrate the pods to a different node pool so that the current node pool can be brought down.
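For instance, a nodeSelector-based sketch that targets the same upgrade-pool node pool used by the nodeAffinity example later in this section could look like this; the label and pool name are taken from that example and may differ in your environment:

```yaml
podSpec:
  # Pin Aerospike pods to the new node pool; GKE-specific label shown as an example.
  nodeSelector:
    cloud.google.com/gke-nodepool: upgrade-pool
```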
Set RollingUpdateBatchSize to expedite this process by migrating pods in batches rather than one at a time.
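A minimal sketch of this setting, assuming it is configured under spec.rackConfig in your operator version:

```yaml
rackConfig:
  # Restart or migrate up to two pods at a time during rolling operations.
  rollingUpdateBatchSize: 2
```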
For example, you can set the following nodeAffinity in the podSpec section of the Custom Resource (CR) file.
AKO performs a rolling restart of the cluster and migrates the pods based on the scheduling policies.
The following nodeAffinity ensures that pods are migrated to a node pool named upgrade-pool. AKO restarts the pods and moves them to nodes with the node label cloud.google.com/gke-nodepool: upgrade-pool.
```yaml
podSpec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cloud.google.com/gke-nodepool
                operator: In
                values:
                  - upgrade-pool
```

K8sNodeBlockList
Local-attached storage
When pods use local-attached storage, they cannot move to different Kubernetes nodes because of volume affinity, so a rolling restart with a different scheduling policy does not work on its own. However, you can use the K8sNodeBlockList feature to migrate pods off specific Kubernetes nodes even when they use local storage.
K8sNodeBlockList specifies the list of Kubernetes node names from which you want to migrate pods. AKO reads this configuration and safely migrates pods off these nodes.
If pods are using network-attached storage, AKO migrates the pods out of their Kubernetes nodes without additional configuration.
If pods are using local-attached storage, you must specify those local storage classes in the spec.Storage.LocalStorageClasses field of the CR.
AKO uses this field to delete the corresponding local volumes so that the pods can be easily migrated out of the Kubernetes nodes.
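A minimal sketch of this field, assuming the YAML path is spec.storage.localStorageClasses and using placeholder storage class names:

```yaml
storage:
  # Local-attached storage classes used by this cluster's volumes; AKO deletes the
  # corresponding local volumes so blocked pods can reschedule on other Kubernetes nodes.
  localStorageClasses:
    - ssd-local   # placeholder storage class name
    - nvme-local  # placeholder storage class name
```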
This process uses the RollingUpdateBatchSize parameter defined in your CR to migrate pods in batches for efficiency.
The following example CR includes a spec.K8sNodeBlockList section with two nodes defined:
```yaml
apiVersion: asdb.aerospike.com/v1
kind: AerospikeCluster
metadata:
  name: aerocluster
  namespace: aerospike
spec:
  k8sNodeBlockList:
    - gke-test-default-pool-b6f71594-1w85
    - gke-test-default-pool-b6f71594-9vm2
  size: 4
  image: aerospike/aerospike-server-enterprise:8.1.0.0
  rackConfig:
    namespaces:
      - test
    racks:
      - id: 1
      - id: 2
...
```
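After applying the updated CR, AKO starts migrating pods off the blocked nodes. You can wait for the migration to finish the same way as for a drain; the file name below is an assumption:

```bash
# Apply the CR containing the k8sNodeBlockList and wait for AKO to finish.
kubectl apply -f aerocluster.yaml
kubectl -n aerospike wait --for=jsonpath='{.status.phase}'=Completed \
  aerospikecluster/aerocluster --timeout=600s
```

Once node maintenance is complete, remove the node names from spec.k8sNodeBlockList so that Aerospike pods can be scheduled on those nodes again.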