Overview
The Endpoint Health Checker is a cluster plugin that monitors the health of Service endpoints in a Kubernetes cluster. It automatically removes unhealthy endpoints from their Service so that traffic is routed only to healthy instances, improving overall service reliability and availability.
Key Features
- Automatic Health Monitoring: Continuously monitors the health of Service endpoints in the Kubernetes cluster
- Load Balancer Integration: Automatically removes unhealthy endpoints from the Service
- Service Availability: Ensures traffic is directed only to healthy, available endpoints
- Rapid Failover: Reduces endpoint switching time during node power outages from 40 seconds to 10 seconds
Installation
Install via Marketplace
- Navigate to Administrator > Marketplace > Cluster Plugins.
- Search for "Alauda Container Platform Endpoint Health Checker" in the plugin list.
- Click Install to open the installation configuration page.
- In the deployment configuration dialog, optionally configure the following parameters:

  | Parameter | Description |
  |---|---|
  | Node Selectors | Configure label selectors to specify which nodes the Endpoint Health Checker components should run on. Click Add to add multiple label key-value pairs. |
  | Node Tolerations | Configure tolerations to allow Endpoint Health Checker components to be scheduled on nodes with specific taints. Click Add to add multiple tolerations with Key, Value, and Type. |

- Click Install to deploy the plugin.
- Wait for the plugin status to change to "Ready".
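The Node Selectors and Node Tolerations parameters presumably map onto the standard Kubernetes nodeSelector and tolerations fields of the plugin's workloads; that mapping, and the assumption that the Type column corresponds to the Kubernetes taint effect, are not confirmed by this page. The sketch below models the standard Kubernetes matching rules in plain Python for illustration only; the classes and functions are hypothetical and are not part of the plugin.

```python
# Hypothetical model of standard Kubernetes nodeSelector and taint/toleration
# matching; it is not code from the Endpoint Health Checker plugin.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Taint:
    key: str
    value: str = ""
    effect: str = "NoSchedule"  # NoSchedule, PreferNoSchedule, or NoExecute


@dataclass
class Toleration:
    key: str = ""            # with operator "Exists", an empty key matches every taint key
    operator: str = "Equal"  # "Equal" (default) or "Exists"
    value: str = ""
    effect: str = ""         # an empty effect matches every taint effect


def matches_node_selector(node_labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    # Every key-value pair in the selector must be present on the node.
    return all(node_labels.get(k) == v for k, v in selector.items())


def tolerates(tol: Toleration, taint: Taint) -> bool:
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key == "" or tol.key == taint.key
    # "Equal": both key and value must match the taint.
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(node_labels: Dict[str, str], taints: List[Taint],
                 selector: Dict[str, str], tolerations: List[Toleration]) -> bool:
    if not matches_node_selector(node_labels, selector):
        return False
    # PreferNoSchedule is only a soft preference, so only hard taints are checked here.
    hard = [t for t in taints if t.effect in ("NoSchedule", "NoExecute")]
    return all(any(tolerates(tol, t) for tol in tolerations) for t in hard)


node = {"ingress": "true"}
taints = [Taint("dedicated", "infra", "NoSchedule")]
print(can_schedule(node, taints, {"ingress": "true"}, []))  # False: taint not tolerated
print(can_schedule(node, taints, {"ingress": "true"},
                   [Toleration("dedicated", "Equal", "infra", "NoSchedule")]))  # True
```

In short, each node selector entry narrows the set of eligible nodes, while each toleration widens it to include nodes carrying a matching taint.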
How It Works
Health Check Mechanism
The Endpoint Health Checker is a dedicated health monitoring component that ensures only healthy endpoints receive traffic. It operates by monitoring service endpoints and automatically managing their availability status.
Core Functionality
The Endpoint Health Checker works by:
- Service Discovery: Identifies services and pods configured for health monitoring in the cluster
- Pod Health Monitoring: Monitors the readiness and liveness probe status of pods backing the service endpoints
- Active Health Checks: Performs active health assessments using configurable criteria:
  - TCP connectivity checks: Establishes TCP connections to verify port accessibility
- Endpoint Management: Automatically removes unhealthy endpoints from service endpoint lists to prevent traffic routing to failed instances
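A TCP connectivity check reduces to attempting a connection and treating success within a deadline as healthy. The following is a minimal Python sketch of that idea; the function name and the 1-second default timeout are assumptions, since this page does not document the plugin's actual timeout or retry settings.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper; the timeout value is an assumption, not a plugin setting.
    """
    try:
        # create_connection resolves the address and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError subclass),
        # and unreachable hosts.
        return False
```

Only the handshake is tested here; no application data is exchanged, which is what keeps this kind of check cheap enough to run frequently.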
Health Check Process
The health checking process involves:
- Probe Integration: Uses Kubernetes readiness and liveness probe results as initial health indicators
- Network Connectivity: Opens TCP connections to the target endpoint ports to verify that they are accessible
- Response Validation: Evaluates response status, timing, and content to determine endpoint health
- Automatic Failover: Removes unresponsive or failed endpoints from service endpoint lists
  - Previous Method: Relied on kubelet heartbeat detection, with a delay of up to 40 seconds
  - Current Method: Active endpoint health checking, with a detection and switching time of 10 seconds
  - Improvement: Significantly improves service availability during node failures in ALB + MetalLB environments
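The process above can be modeled as a filter over a Service's endpoint list: probe results act as a first gate, and a TCP check confirms reachability. This is a simplified sketch under stated assumptions. The Endpoint type and healthy_endpoints function are hypothetical, a single failed check removes an endpoint (the real plugin may require several consecutive failures), and tcp_ok is any callable that reports whether an IP and port accept TCP connections.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Endpoint:
    ip: str
    port: int
    pod_ready: bool  # result of the backing pod's Kubernetes readiness probe


def healthy_endpoints(endpoints: List[Endpoint],
                      tcp_ok: Callable[[str, int], bool]) -> List[Endpoint]:
    """Keep only endpoints whose pod is Ready and whose port accepts TCP connections.

    Hypothetical model of the health check process, not the plugin's implementation.
    """
    kept = []
    for ep in endpoints:
        # Probe integration: a pod that is not Ready is dropped without a network check.
        if not ep.pod_ready:
            continue
        # Network connectivity: an unreachable port (e.g. its node lost power) is removed.
        if tcp_ok(ep.ip, ep.port):
            kept.append(ep)
    return kept


# Example: 10.0.0.2 sits on a failed node and no longer answers TCP connections.
eps = [Endpoint("10.0.0.1", 8080, True),
       Endpoint("10.0.0.2", 8080, True),
       Endpoint("10.0.0.3", 8080, False)]
print([e.ip for e in healthy_endpoints(eps, lambda ip, port: ip != "10.0.0.2")])  # ['10.0.0.1']
```

The 10.0.0.2 case illustrates the failover gain: its pod may still be reported Ready until the node heartbeat times out, but the active TCP check removes it as soon as the connection fails.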
How To Activate
Health checking can be activated through two methods:
Pod-level annotation (Recommended)
For ALB
Set the alb.cpaas.io/pod-annotations annotation on the ALB2 resource. Its value is a JSON string containing the annotations to add to the ALB pods:
```yaml
apiVersion: crd.alauda.io/v2
kind: ALB2
metadata:
  annotations:
    alb.cpaas.io/pod-annotations: '{"endpoint-health-checker.io/enabled":"true"}'
  name: demo-alb
spec:
  config:
    loadbalancerName: demo-alb
    nodeSelector:
      ingress: 'true'
    replicas: 1
  type: nginx
```
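Because the alb.cpaas.io/pod-annotations value is a JSON document embedded in a YAML string, hand-editing it is error-prone when several annotations are involved. A minimal sketch for generating the value, assuming it is decoded into a string-to-string map like ordinary Kubernetes annotations (so "true" must stay a quoted string rather than a JSON boolean); the variable names are illustrative only.

```python
import json

# Annotations that the ALB2 resource should add to its pods.
pod_annotations = {"endpoint-health-checker.io/enabled": "true"}

# Serialize compactly; the value stays the string "true", not the JSON boolean true.
value = json.dumps(pod_annotations, separators=(",", ":"))
print(value)  # {"endpoint-health-checker.io/enabled":"true"}
```

Wrap the printed value in single quotes when pasting it into the ALB2 manifest, as in the example above.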
For IngressNginx
- Install ingress-nginx.
- Set podAnnotations in .spec.controller.podAnnotations of the IngressNginx resource:
```yaml
apiVersion: ingress-nginx.alauda.io/v1
kind: IngressNginx
metadata:
  name: demo
  namespace: ingress-nginx-operator
spec:
  controller:
    replicaCount: 1
    podAnnotations:
      endpoint-health-checker.io/enabled: 'true'
```
For EnvoyGateway
- Install envoy-gateway-operator.
- Set annotations in .spec.provider.kubernetes.envoyDeployment.patch.value.spec.template.metadata.annotations of the EnvoyProxy resource referenced by the Gateway:
```yaml
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: demo
spec:
  infrastructure:
    parametersRef:
      group: gateway.envoyproxy.io
      kind: EnvoyProxy
      name: demo
  gatewayClassName: envoy-gateway-operator-cpaas-default
  listeners:
    - name: http
      port: 80
      protocol: HTTP
---
apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: demo
spec:
  provider:
    kubernetes:
      envoyDeployment:
        replicas: 1
        patch:
          type: StrategicMerge
          value:
            spec:
              template:
                metadata:
                  annotations:
                    endpoint-health-checker.io/enabled: 'true'
        container:
          imageRepository: registry.alauda.cn:60080/acp/envoyproxy/envoy
    type: Kubernetes
```
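For EnvoyGateway the annotation sits deep inside a strategic-merge patch, so it is easy to nest it at the wrong level. The sketch below is a hypothetical helper that follows a dotted field path through a manifest represented as a plain Python dict (written out by hand here to avoid assuming a YAML library), which makes it straightforward to confirm the annotation is where the plugin expects it. The same helper works for .spec.controller.podAnnotations on IngressNginx.

```python
from typing import Any, Optional


def get_path(obj: Any, path: str) -> Optional[Any]:
    """Follow a dotted path such as '.spec.provider' through nested dicts; None if missing."""
    current = obj
    for key in path.strip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# The EnvoyProxy from the example above, written as a Python dict.
envoy_proxy = {
    "kind": "EnvoyProxy",
    "spec": {
        "provider": {
            "type": "Kubernetes",
            "kubernetes": {
                "envoyDeployment": {
                    "replicas": 1,
                    "patch": {
                        "type": "StrategicMerge",
                        "value": {"spec": {"template": {"metadata": {
                            "annotations": {"endpoint-health-checker.io/enabled": "true"}}}}},
                    },
                },
            },
        },
    },
}

path = ".spec.provider.kubernetes.envoyDeployment.patch.value.spec.template.metadata.annotations"
annotations = get_path(envoy_proxy, path) or {}
print(annotations.get("endpoint-health-checker.io/enabled"))  # true
```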
For Custom Deployment
Set the annotation in .spec.template.metadata.annotations of the Deployment, that is, on the pod template rather than on the Deployment's own metadata:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
      annotations:
        endpoint-health-checker.io/enabled: 'true'
    spec:
      containers:
        - name: container
          image: your-image:latest  # replace with your application image
          ports:
            - containerPort: 8080
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
```
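A common mistake is adding the annotation to the Deployment's top-level metadata. Kubernetes does not copy Deployment annotations onto the pods it creates, so a pod-level checker would not see it there. The following hypothetical helper inspects a Deployment manifest (as a dict, for example parsed from kubectl get deployment -o json) and reports where the annotation is set; it assumes the plugin only accepts the exact string "true".

```python
ANNOTATION = "endpoint-health-checker.io/enabled"


def _annotations(meta) -> dict:
    # Tolerate missing or null metadata and annotations fields.
    return (meta or {}).get("annotations") or {}


def check_deployment(deployment: dict) -> str:
    """Return "enabled", "misplaced", or "disabled" for a Deployment manifest (hypothetical)."""
    template = (deployment.get("spec") or {}).get("template") or {}
    if _annotations(template.get("metadata")).get(ANNOTATION) == "true":
        return "enabled"
    if _annotations(deployment.get("metadata")).get(ANNOTATION) == "true":
        # Set on the Deployment itself; move it to spec.template.metadata.annotations.
        return "misplaced"
    return "disabled"
```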
Pod-level readinessGates (Legacy)
Configure readinessGates in the pod spec for older versions:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pod-legacy
  namespace: cpaas-system
spec:
  replicas: 3
  selector:
    matchLabels:
      app: pod-legacy
  template:
    metadata:
      labels:
        app: pod-legacy
    spec:
      readinessGates:
        - conditionType: 'endpointHealthCheckSuccess'
      containers:
        - name: container
          image: your-image:latest
          ports:
            - containerPort: 8080
          livenessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10
          readinessProbe:
            tcpSocket:
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
```
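With readinessGates, Kubernetes marks a pod Ready, and therefore keeps it in its Services' endpoints, only when all of its containers are ready and every listed gate condition has status "True"; a gate condition missing from status.conditions counts as False. Presumably the plugin sets the endpointHealthCheckSuccess condition according to its health check result, but that behavior is an assumption here. The sketch below models only the standard Kubernetes rule; the function name is hypothetical.

```python
from typing import Dict, List


def pod_is_ready(containers_ready: bool, conditions: Dict[str, str],
                 readiness_gates: List[str]) -> bool:
    """Kubernetes readiness rule: all containers ready AND every gate condition is "True".

    `conditions` maps condition type to status, mirroring pod.status.conditions.
    """
    if not containers_ready:
        return False
    # A gate whose condition is absent from status.conditions defaults to False.
    return all(conditions.get(gate) == "True" for gate in readiness_gates)


gates = ["endpointHealthCheckSuccess"]
print(pod_is_ready(True, {"endpointHealthCheckSuccess": "False"}, gates))  # False
```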
Note: The readinessGates configuration comes from older versions. For new deployments, use the pod annotation endpoint-health-checker.io/enabled: 'true' instead.
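When migrating workloads from the legacy method to the annotation, it can help to classify how each pod opts in. The helper below is a hypothetical sketch that works on a pod object as a dict (for example, one item from kubectl get pods -o json); it assumes the annotation value must be exactly "true" and that the legacy method is identified by the endpointHealthCheckSuccess readiness gate shown above.

```python
ANNOTATION = "endpoint-health-checker.io/enabled"
LEGACY_GATE = "endpointHealthCheckSuccess"


def activation_method(pod: dict) -> str:
    """Return "annotation", "readinessGates", or "none" for a pod (hypothetical helper)."""
    annotations = (pod.get("metadata") or {}).get("annotations") or {}
    if annotations.get(ANNOTATION) == "true":
        return "annotation"
    gates = (pod.get("spec") or {}).get("readinessGates") or []
    if any(g.get("conditionType") == LEGACY_GATE for g in gates):
        return "readinessGates"  # legacy method; consider switching to the annotation
    return "none"
```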
Uninstallation
To uninstall the Endpoint Health Checker:
- Navigate to Administrator > Marketplace > Cluster Plugins.
- Find the installed "Endpoint Health Checker" plugin.
- Click the options menu and select Uninstall.
- Confirm the uninstallation when prompted.