Isolating Monitoring Components on Kubernetes Infra Nodes
This guide covers Prometheus and VictoriaMetrics components. For logging components (Elasticsearch, Kafka, ZooKeeper, ClickHouse, lanaya, razor), see Isolating Log Components on Kubernetes Infra Nodes.
Objectives
- Reliability: Keep monitoring pipelines stable during workload bursts.
- Observability at scale: Ensure consistent resource guarantees.
- Operational clarity: Apply uniform node selection and taints.
Prerequisites
- Follow the labeling/tainting and local-PV checks in Isolating Log Components on Kubernetes Infra Nodes.
- Ensure monitoring operators are healthy (Prometheus Operator, VM Operator).
- Planning the infra nodes by referring to the
Move Monitoring Components to Infra Nodes
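The operators reconcile the StatefulSets and Deployments they own, so a change made directly on those workloads can be reverted. Patch the custom resource (CR) where one exists, and patch the owned resource only when no CR field is available.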
Prometheus and Alertmanager (Prometheus Operator)
# Prometheus instances
kubectl patch prometheus -n cpaas-system kube-prometheus-0 \
--type='merge' \
-p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
# Optional: additional instances
kubectl patch prometheus -n cpaas-system kube-prometheus-1 \
--type='merge' \
-p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
kubectl patch prometheus -n cpaas-system kube-prometheus-2 \
--type='merge' \
-p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
# Alertmanager
kubectl patch alertmanager -n cpaas-system kube-prometheus \
--type='merge' \
-p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
# Verify
kubectl get pods -n cpaas-system -o wide | grep prometheus-kube-prometheus
kubectl get pods -n cpaas-system -o wide | grep alertmanager-kube-prometheus
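The prerequisites taint the infra nodes, and a nodeSelector by itself does not let a Pod tolerate a taint. The following is a minimal sketch of one way to set the selector and a matching toleration in a single patch. The taint key and effect shown here are assumptions, as are the helper names infra_patch and patch_infra; replace the taint values with the ones actually applied to your nodes. Note that a merge patch replaces the whole tolerations list on the CR, so include any tolerations you already rely on.

```shell
# Assumed taint on the infra nodes; replace with the key/effect you applied.
TAINT_KEY="node-role.kubernetes.io/infra"
TAINT_EFFECT="NoSchedule"

# Build a merge patch that sets the node selector and a matching toleration.
# A merge patch replaces the entire tolerations list on the CR.
infra_patch() {
  printf '{"spec":{"nodeSelector":{"%s":""},"tolerations":[{"key":"%s","operator":"Exists","effect":"%s"}]}}' \
    "node-role.kubernetes.io/infra" "$TAINT_KEY" "$TAINT_EFFECT"
}

# Apply the patch to one CR kind and any number of instance names.
patch_infra() {
  kind="$1"; shift
  for name in "$@"; do
    kubectl patch "$kind" -n cpaas-system "$name" --type=merge -p "$(infra_patch)"
  done
}

# Usage (instance names taken from the commands above):
# patch_infra prometheus kube-prometheus-0 kube-prometheus-1 kube-prometheus-2
# patch_infra alertmanager kube-prometheus
```

Keeping the patch in one function avoids repeating the JSON for each instance and keeps the selector and toleration in step.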
VictoriaMetrics (VM Operator)
# vmagents
kubectl patch vmagents.operator.victoriametrics.com agent -n cpaas-system \
--type='merge' \
-p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
# vmalerts (if deployed)
kubectl patch vmalerts.operator.victoriametrics.com alert -n cpaas-system \
--type='merge' \
-p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
# vmalertmanager (if deployed)
kubectl patch vmalertmanager alertmanager -n cpaas-system \
--type='merge' \
-p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'
# vmclusters (if deployed) - vminsert, vmselect, vmstorage
kubectl patch vmclusters.operator.victoriametrics.com cluster -n cpaas-system \
--type='merge' \
-p='{"spec":{"vminsert":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}'
kubectl patch vmclusters.operator.victoriametrics.com cluster -n cpaas-system \
--type='merge' \
-p='{"spec":{"vmselect":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}'
kubectl patch vmclusters.operator.victoriametrics.com cluster -n cpaas-system \
--type='merge' \
-p='{"spec":{"vmstorage":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}'
# Verify
kubectl get pods -n cpaas-system -o wide | grep vm
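Reading the NODE column by eye gets tedious on larger clusters. As a rough sketch, the check can be automated by comparing each Pod's node against the list of labeled infra nodes. The function names and the grep pattern for monitoring Pod names are assumptions; adjust the pattern to your naming. Pending Pods show a node of <none> and are therefore reported as well.

```shell
# Print "pod node" lines from stdin whose node is NOT listed in file "$1"
# (one node name per line).
filter_off_infra() {
  awk -v list="$1" 'FILENAME == list { infra[$1] = 1; next } !($2 in infra)' "$1" -
}

# List monitoring Pods in cpaas-system that are not running on an infra node.
list_off_infra_monitoring_pods() {
  nodes_file="$(mktemp)"
  kubectl get nodes -l node-role.kubernetes.io/infra -o name \
    | sed 's|^node/||' > "$nodes_file"
  kubectl get pods -n cpaas-system --no-headers \
    -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
    | grep -E 'prometheus|alertmanager|vm' \
    | filter_off_infra "$nodes_file"
  rm -f "$nodes_file"
}

# Usage: empty output means every matched Pod is on an infra node.
# list_off_infra_monitoring_pods
```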
Evict non-infra monitoring Pods
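If Pods that do not belong on the infra nodes are still running there (a NoSchedule taint blocks new Pods but does not evict Pods that are already running), trigger a reschedule, for example by updating an annotation on the owning workload, or adjust their selectors/affinities.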
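To see what is running on a given node and to trigger a reschedule, something like the following may help. This is a sketch, not a prescribed procedure: the function names are hypothetical, and deployment/my-app is a placeholder for whatever workload owns the Pod.

```shell
# List every Pod currently running on one node (all namespaces).
pods_on_node() {
  kubectl get pods --all-namespaces -o wide --field-selector "spec.nodeName=$1"
}

# Restart a workload so its Pods are recreated and scheduled again.
# "kubectl rollout restart" works by updating the
# kubectl.kubernetes.io/restartedAt annotation on the Pod template.
restart_workload() {
  # $1: namespace, $2: resource such as deployment/my-app (hypothetical name)
  kubectl rollout restart -n "$1" "$2"
}

# Usage:
# pods_on_node <infra-node-name>
# restart_workload <namespace> deployment/my-app
```

The recreated Pods are placed according to the workload's current selectors and tolerations, so confirm those are correct before restarting.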
Troubleshooting
- Pending Pods: Check the Pod's Events for scheduling failures caused by a missing toleration or a node selector that matches no node.
- Operator override: Some fields may be reconciled by the operator; always patch the CR when available.
- Resource pressure: Monitor usage with kubectl top nodes -l node-role.kubernetes.io/infra and right-size requests/limits.
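The checks above can be sketched as two small helpers: one that shows the events recorded for a Pod, and one that prints the taints on the infra nodes so they can be compared with the Pod's tolerations. The function names are hypothetical; the kubectl flags used are standard.

```shell
# Show events for one Pod in cpaas-system, oldest first.
# Scheduling problems typically appear with reason FailedScheduling.
pod_events() {
  kubectl get events -n cpaas-system \
    --field-selector "involvedObject.name=$1" \
    --sort-by=.lastTimestamp
}

# Show the taints on every infra node.
infra_taints() {
  kubectl get nodes -l node-role.kubernetes.io/infra \
    -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
}

# Usage:
# pod_events <pending-pod-name>
# infra_taints
```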