Isolating Monitoring Components on Kubernetes Infra Nodes

This guide covers Prometheus and VictoriaMetrics components. For logging components (Elasticsearch, Kafka, ZooKeeper, ClickHouse, lanaya, razor), see Isolating Log Components on Kubernetes Infra Nodes.

Objectives

Reliability: Keep monitoring pipelines stable during workload bursts.
Observability at scale: Ensure consistent resource guarantees.
Operational clarity: Apply uniform node selection and taints.

Prerequisites

Follow labeling/tainting and local-PV checks in Isolating Log Components on Kubernetes Infra Nodes.
Ensure monitoring operators are healthy (Prometheus Operator, VM Operator).
Planning the infra nodes by referring to the

Move Monitoring Components to Infra Nodes

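The operators reconcile the StatefulSets and Deployments they own, so direct edits to those resources may be reverted. Patch the custom resource (CR) wherever one exposes the field, and patch an owned resource only when no CR does.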

Prometheus and Alertmanager (Prometheus Operator)

# Prometheus instances
kubectl patch prometheus -n cpaas-system kube-prometheus-0 \
  --type='merge' \
  -p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

# Optional: additional instances
kubectl patch prometheus -n cpaas-system kube-prometheus-1 \
  --type='merge' \
  -p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

kubectl patch prometheus -n cpaas-system kube-prometheus-2 \
  --type='merge' \
  -p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

# Alertmanager
kubectl patch alertmanager -n cpaas-system kube-prometheus \
  --type='merge' \
  -p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

# Verify
kubectl get pods -n cpaas-system -o wide | grep prometheus-kube-prometheus
kubectl get pods -n cpaas-system -o wide | grep alertmanager-kube-prometheus
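
If the infra nodes are tainted, as the prerequisites describe, a node selector alone may leave the Pods Pending, because a selector does not permit scheduling onto a tainted node. The following is a minimal sketch of a combined patch, assuming the taint key is node-role.kubernetes.io/infra with effect NoSchedule; that taint is an assumption, not something this guide specifies, so check the actual taint on your nodes first. The helper name infra_patch is hypothetical.

```shell
#!/usr/bin/env bash
# Assumed taint; confirm with: kubectl describe node <infra-node> | grep Taints
TAINT_KEY="${TAINT_KEY:-node-role.kubernetes.io/infra}"
TAINT_EFFECT="${TAINT_EFFECT:-NoSchedule}"

# infra_patch (hypothetical helper): print a merge patch that sets both the
# infra node selector and a toleration for the assumed infra taint.
infra_patch() {
  printf '{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""},"tolerations":[{"key":"%s","operator":"Exists","effect":"%s"}]}}' \
    "$TAINT_KEY" "$TAINT_EFFECT"
}

# Print the patch so it can be reviewed before it is applied.
infra_patch; echo
```

To apply it, pass the output to the same patch command used above, for example kubectl patch prometheus -n cpaas-system kube-prometheus-0 --type='merge' -p "$(infra_patch)". Note that a merge patch replaces the whole tolerations list, so any tolerations already set on the CR would need to be included as well.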

VictoriaMetrics (VM Operator)

# vmagents
kubectl patch vmagents.operator.victoriametrics.com agent -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

# vmalerts (if deployed)
kubectl patch vmalerts.operator.victoriametrics.com alert -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

# vmalertmanager (if deployed)
kubectl patch vmalertmanager alertmanager -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}'

# vmclusters (if deployed) - vminsert, vmselect, vmstorage
kubectl patch vmclusters.operator.victoriametrics.com cluster -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"vminsert":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}'

kubectl patch vmclusters.operator.victoriametrics.com cluster -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"vmselect":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}'

kubectl patch vmclusters.operator.victoriametrics.com cluster -n cpaas-system \
  --type='merge' \
  -p='{"spec":{"vmstorage":{"nodeSelector":{"node-role.kubernetes.io/infra":""}}}}'

# Verify
kubectl get pods -n cpaas-system -o wide | grep vm
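
The grep above lists every matching Pod, so Pods on the wrong node have to be spotted by eye. As a rough sketch, the check can be narrowed to only the Pods that are not on an infra node. The helper name off_infra and the Pod and node names in the sample data are hypothetical.

```shell
#!/usr/bin/env bash
# off_infra (hypothetical helper): read "POD NODE" pairs on stdin and print
# the pairs whose node is not among the infra node names given as arguments.
off_infra() {
  awk -v nodes="$*" '
    BEGIN { n = split(nodes, a, " "); for (i = 1; i <= n; i++) infra[a[i]] = 1 }
    NF >= 2 && !($2 in infra) { print $1, $2 }
  '
}

# Demo with made-up data: only the Pod on worker-3 is reported.
printf 'vmagent-agent-0 infra-1\nvmselect-cluster-0 worker-3\n' | off_infra infra-1 infra-2
# prints: vmselect-cluster-0 worker-3

# Against a live cluster (not run here):
#   infra_nodes=$(kubectl get nodes -l node-role.kubernetes.io/infra \
#     -o jsonpath='{.items[*].metadata.name}')
#   kubectl get pods -n cpaas-system --no-headers \
#     -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName \
#     | grep vm | off_infra $infra_nodes
```

Pending Pods have no node assigned yet and show up in the output with a node of <none>, which is usually what you want to see during a migration.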

Evict non-infra monitoring Pods

If non-infra monitoring Pods continue running on infra nodes, trigger a reschedule (for example, update an annotation) or adjust selectors/affinities.
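
The annotation update mentioned above can be sketched as follows for a Prometheus CR, whose spec.podMetadata.annotations field is copied to the Pod template; changing it should make the operator roll the Pods so that they are scheduled again under the current selectors. The helper name reschedule_patch and the annotation key example.com/rescheduled-at are hypothetical, and other CRs may expose Pod metadata under a different path.

```shell
#!/usr/bin/env bash
# reschedule_patch (hypothetical helper): print a merge patch that stamps a
# Pod-template annotation on a Prometheus CR. The annotation key is made up;
# any key your cluster does not already use will do.
reschedule_patch() {
  stamp="${1:-$(date -u +%Y-%m-%dT%H:%M:%SZ)}"
  printf '{"spec":{"podMetadata":{"annotations":{"example.com/rescheduled-at":"%s"}}}}' "$stamp"
}

# Print the patch for review; the timestamp defaults to the current UTC time.
reschedule_patch; echo
```

To apply it: kubectl patch prometheus -n cpaas-system kube-prometheus-0 --type='merge' -p "$(reschedule_patch)".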

Troubleshooting

Pending Pods: Check the Pod's Events for scheduling failures caused by missing tolerations or node selectors that match no node.
Operator override: Some fields may be reconciled by the operator; always patch the CR when available.
Resource pressure: Monitor kubectl top nodes -l node-role.kubernetes.io/infra and right-size requests/limits.
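
The Pending Pods check above can be gathered into one helper. This is a sketch rather than a complete diagnostic; the function name diagnose_pending is hypothetical, and it only prints what the scheduler reported, what the Pod asks for, and which taints the infra nodes carry.

```shell
#!/usr/bin/env bash
# diagnose_pending (hypothetical helper): show why a Pod is not being scheduled.
# Usage: diagnose_pending <namespace> <pod-name>
diagnose_pending() {
  # Scheduler failures recorded for this Pod.
  kubectl get events -n "$1" \
    --field-selector "involvedObject.name=$2,reason=FailedScheduling"
  # Node selector and tolerations the Pod actually carries.
  kubectl get pod -n "$1" "$2" \
    -o jsonpath='{.spec.nodeSelector}{"\n"}{.spec.tolerations}{"\n"}'
  # Taint keys on the infra nodes, for comparison with the tolerations.
  kubectl get nodes -l node-role.kubernetes.io/infra \
    -o custom-columns='NAME:.metadata.name,TAINTS:.spec.taints[*].key'
}
```

Call it with the namespace and the name of the Pending Pod, for example diagnose_pending cpaas-system <pod-name>.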
