A Practical Guide to Kubernetes HPA with Prometheus Custom Metrics

Posted 2026-3-24 04:59:05

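Use cases & prerequisites

Item	Requirement
Use cases	Web applications, API services, and scheduled job processing with clearly fluctuating traffic
Kubernetes	1.23+ (autoscaling/v2 HPA API)
Metrics Server	0.6.0+ (provides resource metrics)
Prometheus	2.40+ (scrapes custom metrics)
Prometheus Adapter	0.11+ (exposes Prometheus metrics through the Kubernetes Custom Metrics API)
Cluster size	3+ nodes, each 4 vCPU / 8 GiB (minimum) or 8 vCPU / 16 GiB (recommended)
Network	In-cluster DNS works and Prometheus can reach every Pod
Permissions	cluster-admin, or a role allowed to create HPA and ServiceMonitor objects
Skills	Familiarity with Kubernetes, PromQL, and YAML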

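Anti-pattern warnings

⚠️ This approach is not recommended in the following situations:

Ultra-low latency requirements: financial trading and real-time bidding systems (an HPA scale-up typically takes 30-60 seconds, which cannot absorb spikes that need a millisecond-level response)
Stateful services: database clusters and message queues (scaling usually involves data migration or rebalancing that HPA does not handle, so HPA is best suited to stateless applications)
Very steady traffic: with daily fluctuation below 10%, a fixed replica count is simpler
Resource-constrained clusters: node utilization already above 80%, leaving no room to scale up
Cost-sensitive environments: with pay-as-you-go cloud billing, frequent scaling may increase cost

Alternatives:

Scenario	Recommended approach	Reason
Low latency	Fixed reserved replicas + overcommit	Avoids waiting for scale-up
Stateful services	StatefulSet + manual scaling	Preserves data consistency
Very steady traffic	Deployment with fixed replicas	Simpler operations
Cost first	Cluster Autoscaler + node pools	Scales nodes down as well to cut cost
Batch jobs	CronJob + Job	Scheduled tasks need no always-on Pods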

Environment and version matrix

Component	Version	Install method	Test status
Kubernetes	1.28.4 / 1.27.8 / 1.26.11	kubeadm / k3s / EKS	[tested]
Metrics Server	0.6.4 / 0.6.3	Helm / kubectl apply	[tested]
Prometheus	2.48.0 / 2.45.0	Prometheus Operator / Helm	[tested]
Prometheus Adapter	0.11.1 / 0.10.0	Helm	[tested]
Helm	3.12+	Official binary	[tested]
kubectl	1.28+	Official binary	[tested]

Version notes:

Before Kubernetes 1.23: HPA uses the autoscaling/v2beta2 API (not covered here)
Metrics Server 0.5 vs 0.6: 0.6 improves high availability and is recommended for production
Prometheus Adapter 0.10 vs 0.11: 0.11 supports more flexible metric mapping rules

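Reading guide

📖 Suggested reading paths:

Quick start (20 minutes): → Quick checklist → Implementation steps (Step 1-5) → Appendix (one-click script)

In depth (60 minutes): → Minimum necessary theory → Implementation steps (all steps) → Best practices → Further reading

Troubleshooting: → Common failures and troubleshooting → Minimum necessary theory (HPA decision flow)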

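Quick checklist

[ ] Preparation
- [ ] Check the Kubernetes cluster version (kubectl version)
- [ ] Deploy Metrics Server (kubectl apply -f metrics-server.yaml)
- [ ] Deploy Prometheus Operator (helm install prometheus)
- [ ] Deploy Prometheus Adapter (helm install prometheus-adapter)
[ ] Implementation
- [ ] Deploy the sample application (Nginx) and configure a ServiceMonitor
- [ ] Configure the Prometheus Adapter custom metric mapping
- [ ] Create a CPU-based HPA (verifies the basics)
- [ ] Create an HPA based on a custom metric (QPS)
- [ ] Configure the scaling behavior (stabilization windows, policies)
[ ] Verification
- [ ] Test CPU-based scale-up (kubectl run -it --rm load-generator)
- [ ] Test QPS-based scale-up (with the wrk load-testing tool)
- [ ] Verify scale-down (stop the load test and time the scale-down)
- [ ] Check the HPA events (kubectl describe hpa)
[ ] Monitoring
- [ ] Build a Grafana dashboard for HPA status
- [ ] Set up Prometheus alert rules
- [ ] Verify that metric collection is complete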

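Implementation steps

System architecture

[Kubernetes HPA autoscaling architecture]

Application layer
 ├─ Nginx Deployment (2 replicas initially)
 │   ↓
 ├─ Service (exposes port 80)
 │   ↓
 └─ Pod (exposes Prometheus metrics at /metrics)

Monitoring layer
 ├─ Prometheus
 │   ├─ Discovers Pods through a ServiceMonitor
 │   ├─ Scrapes metrics such as nginx_http_requests_total and container_cpu_usage_seconds_total
 │   └─ Stores the time series
 │
 └─ Prometheus Adapter
     ├─ Queries Prometheus
     ├─ Converts results for the Kubernetes Custom Metrics API
     └─ Serves /apis/custom.metrics.k8s.io/v1beta1

Control layer
 ├─ Metrics Server
 │   └─ Provides resource metrics: CPU/memory usage
 │
 └─ HPA Controller
     ├─ Queries metrics periodically (every 15 seconds by default)
     │   ├─ Resource metrics come from Metrics Server
     │   └─ Custom metrics come from Prometheus Adapter
     │
     ├─ Computes the desired replica count
     │   Formula: desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))
     │
     ├─ Decides whether to scale (default tolerance 0.1)
     │   ├─ Scale up: current metric > target × 1.1
     │   ├─ Scale down: current metric < target × 0.9
     │   └─ Stable: within the tolerance band, no change
     │
     └─ Updates the Deployment's replicas

Data flow
 ┌──────────────────────────────────────────────────────────┐
 │ 1. Pods expose metrics → Prometheus scrapes them         │
 │ 2. Prometheus Adapter converts them for the K8s API      │
 │ 3. HPA Controller queries metrics every 15 seconds       │
 │ 4. HPA computes desired replicas, updates the Deployment │
 │ 5. Deployment creates/deletes Pods                       │
 └──────────────────────────────────────────────────────────┘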

Step 1: Deploy Metrics Server

Goal: provide the basic resource metrics (CPU/memory)

Check whether it is already installed:

kubectl get deployment metrics-server -n kube-system
# Expected output if it is not installed: "Error from server (NotFound)"

Install Metrics Server (official YAML):

# Apply the official manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.4/components.yaml

# If the kubelets use self-signed certificates, patch the Deployment arguments
kubectl patch deployment metrics-server -n kube-system --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'

Key parameters:

--kubelet-insecure-tls: skip kubelet certificate verification (test environments only)
--metric-resolution=15s: metric scrape interval (the flag defaults to 60 seconds; 15 seconds reacts faster)
--kubelet-preferred-address-types=InternalIP: prefer the internal IP when contacting the kubelet

Verify after running:

# Wait for the Pod to become ready
kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=60s

# Confirm that metrics are available
kubectl top nodes
# Expected output: CPU and memory usage per node

kubectl top pods -A
# Expected output: resource usage of every Pod

Common errors:

# Error 1: cannot reach the kubelet
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
# Fix: check that kubelet port 10250 is reachable; add --kubelet-insecure-tls if certificate verification is the cause

# Error 2: certificate verification failed
x509: cannot validate certificate for 192.168.1.10 because it doesn't contain any IP SANs
# Fix: add --kubelet-insecure-tls (test environments) or issue kubelet certificates with the correct SANs

Step 2: Deploy Prometheus Operator

Goal: deploy Prometheus to collect the custom metrics

Install with Helm (recommended):

# Add the Prometheus community Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

# Create the namespace
kubectl create namespace monitoring

# Install kube-prometheus-stack (Prometheus Operator plus Prometheus, Grafana, Alertmanager)
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.retention=7d \
  --set prometheus.prometheusSpec.storageSpec.volumeClaimTemplate.spec.resources.requests.storage=50Gi

# Parameters:
# - serviceMonitorSelectorNilUsesHelmValues=false: select every ServiceMonitor in every namespace, not only those labeled with this Helm release
# - retention=7d: keep data for 7 days
# - storage=50Gi: 50 GiB of persistent storage

Verify after running:

# Check the Prometheus Pod
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus
# Expected output: prometheus-prometheus-kube-prometheus-prometheus-0   2/2   Running

# Check the Prometheus Service
kubectl get svc -n monitoring prometheus-kube-prometheus-prometheus
# Expected output: a ClusterIP Service on port 9090

# Reach the Prometheus UI temporarily (port forwarding)
kubectl port-forward -n monitoring svc/prometheus-kube-prometheus-prometheus 9090:9090
# Open in a browser: http://localhost:9090

Verify metric collection:

# Run these PromQL queries in the Prometheus UI
# Query 1: check the per-node kubelet targets (kube-prometheus-stack names this job "kubelet"; adjust if your job names differ)
up{job="kubelet"}
# Expected result: series for every node (value 1 means healthy)

# Query 2: check kube-state-metrics
kube_deployment_status_replicas
# Expected result: the replica count of every Deployment

Step 3: Deploy Prometheus Adapter

Goal: expose Prometheus metrics through the Kubernetes Custom Metrics API

Install with Helm:

# Install Prometheus Adapter
helm install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  --set prometheus.url=http://prometheus-kube-prometheus-prometheus.monitoring.svc \
  --set prometheus.port=9090

# Parameters:
# - prometheus.url: address of the Prometheus Service (in-cluster DNS name)
# - prometheus.port: port of the Prometheus Service

Configure the custom metric mapping (the key step):

cat > prometheus-adapter-values.yaml <<EOF
prometheus:
  url: http://prometheus-kube-prometheus-prometheus.monitoring.svc
  port: 9090

rules:
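  default: true  # keep the chart's default rules
  custom:
  # Custom rule 1: per-Pod HTTP request rate (QPS)
  # The sample app's exporter exposes nginx_http_requests_total, so the rule matches
  # that series and drops the nginx_ prefix to publish http_requests_per_second
  - seriesQuery: 'nginx_http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^nginx_(.*)_total$"
      as: "\${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'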

  # Custom rule 2: Nginx active connections
  - seriesQuery: 'nginx_connections_active{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)$"
      as: "\${1}"
    metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'

  # Custom rule 3: an application-specific business metric (average processing time)
  - seriesQuery: 'myapp_processing_duration_seconds_sum{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_sum$"
      as: "\${1}_avg"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>) / sum(rate(myapp_processing_duration_seconds_count{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
EOF

# Apply the new Prometheus Adapter configuration
helm upgrade prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  -f prometheus-adapter-values.yaml

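Key parameters:

seriesQuery: which Prometheus series the rule discovers
resources.overrides: maps Prometheus labels to Kubernetes resources (namespace/pod)
name.matches / name.as: regex-match the series name and rename it for the Custom Metrics API
metricsQuery: the PromQL template actually sent to Prometheus (rate() turns a counter into a per-second rate)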
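The renaming done by name.matches and name.as can be tried locally before touching the adapter. The sketch below only emulates the regex rewrite with sed; rename_metric is a hypothetical helper, not part of Prometheus Adapter, and sed writes the capture group as \1 where the adapter configuration writes ${1}.

```shell
# rename_metric SERIES REGEX REPLACEMENT
# Emulates the adapter's name.matches / name.as rewrite (illustration only).
# The body runs in a subshell so no variables leak into the caller.
rename_metric() (
  printf '%s\n' "$1" | sed -E "s/$2/$3/"
)

# A generic counter and the nginx exporter's counter both map to the same API name
rename_metric http_requests_total '^(.*)_total$' '\1_per_second'
rename_metric nginx_http_requests_total '^nginx_(.*)_total$' '\1_per_second'
```

Both calls print http_requests_per_second, which is the name the HPA later references. A series that does not match the pattern is printed unchanged, which in the adapter corresponds to the rule not applying.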

Verify after running:

# Check the Prometheus Adapter Pod
kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus-adapter
# Expected output: prometheus-adapter-xxxx   1/1   Running

# Confirm that the Custom Metrics API is registered
kubectl get apiservices | grep custom.metrics
# Expected output: v1beta1.custom.metrics.k8s.io   monitoring/prometheus-adapter   True

# List the available custom metrics
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'
# Expected output: includes "pods/http_requests_per_second", "pods/nginx_connections_active", etc. (once the sample app from Step 4 is being scraped)

Common errors:

# Error 1: the API Service is unavailable
Error from server (ServiceUnavailable): the server is currently unable to handle the request
# Fix: check the Prometheus Adapter Pod logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-adapter

# Error 2: the metric does not appear in the Custom Metrics API
# Cause: seriesQuery matches no series in Prometheus
# Fix: confirm in the Prometheus UI that the series exists with namespace and pod labels, then adjust seriesQuery

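Step 4: Deploy the sample application and expose metrics

Goal: deploy an Nginx application and have Prometheus scrape it

Create the sample application (with a Prometheus exporter sidecar):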

cat > nginx-deployment.yaml <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "9113"
        prometheus.io/path: "/metrics"
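    spec:
      containers:
      # Main container: Nginx
      - name: nginx
        image: nginx:1.24-alpine
        ports:
        - containerPort: 80
        # Mount the ConfigMap defined below so that /stub_status is actually served
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/conf.d
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi

      # Sidecar: Nginx Prometheus Exporter
      - name: nginx-exporter
        image: nginx/nginx-prometheus-exporter:0.11
        args:
          - '-nginx.scrape-uri=http://localhost:80/stub_status'
        ports:
        - containerPort: 9113
        resources:
          requests:
            cpu: 50m
            memory: 64Mi
          limits:
            cpu: 100m
            memory: 128Mi
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-app
  namespace: default
  labels:
    app: nginx  # the ServiceMonitor selects Services by this label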
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    name: http
  - port: 9113
    targetPort: 9113
    name: metrics
---
# Enable the Nginx stub_status module (exposes the basic counters the exporter reads)
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: default
data:
  default.conf: |
    server {
        listen 80;
        location / {
            root /usr/share/nginx/html;
            index index.html;
        }
        location /stub_status {
            stub_status on;
            access_log off;
        }
    }
EOF

# Apply the manifests
kubectl apply -f nginx-deployment.yaml

Create a ServiceMonitor (so that Prometheus discovers the target automatically):

cat > nginx-servicemonitor.yaml <<EOF
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: nginx-app
  namespace: default
  labels:
    app: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  endpoints:
  - port: metrics
    interval: 15s
    path: /metrics
EOF

kubectl apply -f nginx-servicemonitor.yaml

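Verify after running:

# Check the Pods
kubectl get pods -l app=nginx
# Expected output: 2 Pods, each with 2 containers (2/2 Running)

# Confirm that the metrics endpoint responds (port-forward blocks, so run curl in a second terminal)
kubectl port-forward svc/nginx-app 9113:9113
curl http://localhost:9113/metrics
# Expected output: includes nginx_connections_active, nginx_http_requests_total, and similar metrics

# Confirm in the Prometheus UI that the metrics are scraped
# Query: nginx_connections_active{namespace="default"}
# Expected result: connection counts for both Pods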

Step 5: Create a CPU-based HPA (basic verification)

Goal: verify that HPA works at all

Create the HPA manifest:

cat > hpa-cpu.yaml <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-cpu
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50  # target: 50% average CPU utilization
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300  # 5-minute scale-down stabilization window
      policies:
      - type: Percent
        value: 50  # remove at most 50% of the replicas per period
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0  # scale up without delay
      policies:
      - type: Percent
        value: 100  # add at most 100% per period (double)
        periodSeconds: 15
      - type: Pods
        value: 4  # or add at most 4 Pods per period
        periodSeconds: 15
      selectPolicy: Max  # use whichever policy allows the larger change
EOF

kubectl apply -f hpa-cpu.yaml

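Key parameters:

averageUtilization: 50: target CPU utilization, measured against the Pods' requests
stabilizationWindowSeconds: the window over which recommendations are smoothed, so that metric jitter does not cause constant scaling
scaleUp.policies: scale-up limits, expressed as a percentage or a fixed number of Pods; with selectPolicy: Max the policy allowing the larger change wins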
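To see how the two scaleUp policies above interact, the following sketch computes the highest replica count one period may reach. It is a simplification under stated assumptions: scale_up_limit is a hypothetical helper, and it ignores replicas already added earlier in the same period as well as maxReplicas.

```shell
# scale_up_limit CURRENT PERCENT PODS
# Upper bound for one scale-up period with a Percent and a Pods policy
# and selectPolicy: Max. Runs in a subshell so variables stay local.
scale_up_limit() (
  current=$1 percent=$2 pods=$3
  # Percent policy: ceil(current * (1 + percent/100)), in integer arithmetic
  by_percent=$(( (current * (100 + percent) + 99) / 100 ))
  # Pods policy: current + pods
  by_pods=$(( current + pods ))
  if [ "$by_percent" -gt "$by_pods" ]; then
    echo "$by_percent"
  else
    echo "$by_pods"
  fi
)

scale_up_limit 2 100 4   # prints 6: the Pods policy (2 + 4) beats doubling (4)
scale_up_limit 8 100 4   # prints 16: doubling beats 8 + 4
```

The limit is only a ceiling. The HPA still scales to the replica count the metric asks for when that is lower.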

Verify after running:

# Check the HPA status
kubectl get hpa nginx-hpa-cpu
# Expected output:
# NAME            REFERENCE              TARGETS   MINPODS   MAXPODS   REPLICAS
# nginx-hpa-cpu   Deployment/nginx-app   5%/50%    2         10        2

# Inspect the events
kubectl describe hpa nginx-hpa-cpu
# Expected output: includes "New size: 2; reason: All metrics below target"

Trigger a scale-up with load:

# Start a load-generator container
kubectl run -it --rm load-generator --image=busybox --restart=Never -- /bin/sh

# Inside the container, request Nginx in a loop
while true; do wget -q -O- http://nginx-app.default.svc.cluster.local; done

# In another terminal, watch the HPA scale up
kubectl get hpa nginx-hpa-cpu --watch
# Expected progression:
# REPLICAS: 2 → 4 → 8 (within about 1-2 minutes)

# Stop the load and watch the scale-down
# Expected progression: after the 5-minute stabilization window, REPLICAS: 8 → 4 → 2

Step 6: Create an HPA based on a custom metric (QPS)

Goal: scale automatically on a business metric (requests per second)

Create the HPA manifest:

cat > hpa-qps.yaml <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa-qps
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  # Metric 1: HTTP request rate (custom metric)
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"  # target: 100 QPS per Pod

  # Metric 2: CPU utilization (resource metric, as a fallback)
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60  # remove at most 2 Pods per minute
    scaleUp:
      stabilizationWindowSeconds: 30  # 30-second scale-up stabilization window
      policies:
      - type: Pods
        value: 5
        periodSeconds: 30  # add at most 5 Pods every 30 seconds
EOF

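# Remove the CPU-only HPA from Step 5 first: two HPAs must not target the same Deployment
kubectl delete hpa nginx-hpa-cpu --ignore-not-found
kubectl apply -f hpa-qps.yaml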

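Pre-check (confirm that the custom metric is available, ideally before applying the HPA):

# Read the current value of the custom metric
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq .
# Expected output: the QPS value of each Pod

# If there is no data, generate a few requests by hand
kubectl run -it --rm test --image=curlimages/curl --restart=Never -- \
  sh -c 'for i in $(seq 1 100); do curl http://nginx-app.default.svc.cluster.local; done'

Verify after running:

# Check the HPA status
kubectl get hpa nginx-hpa-qps
# Expected output:
# TARGETS: 5/100 (http_requests_per_second), 10%/70% (cpu)

# Trigger a scale-up with the wrk load-testing tool
kubectl run -it --rm wrk --image=skandyla/wrk --restart=Never -- \
  -t4 -c100 -d300s http://nginx-app.default.svc.cluster.local

# Watch the scale-up
kubectl get hpa nginx-hpa-qps --watch
# Expected progression:
# - Scale-up starts once QPS reaches about 150/100
# - REPLICAS: 2 → 5 → 10 (about 1 minute)
# - After the load stops, scale-down begins after 5 minutes

Common errors:

# Error 1: unable to get metric http_requests_per_second
# Cause: Prometheus Adapter is misconfigured or the metric name does not match
# Fix: check the Prometheus Adapter rules and confirm that seriesQuery matches the real series

# Error 2: the metric value shows <unknown>
# Cause: Prometheus has no data for the metric
# Fix: confirm that the ServiceMonitor is correct and that Prometheus has scraped the metric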

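Minimum necessary theory

How HPA works:

Every 15 seconds (the default, adjustable via --horizontal-pod-autoscaler-sync-period), the HPA controller runs the following logic:

[HPA decision flow]

Step 1: Query metrics
  ├─ Resource metrics (CPU/memory) → Metrics Server API
  └─ Custom metrics (QPS/latency) → Custom Metrics API (Prometheus Adapter)

Step 2: Compute the desired replica count
  Formula: desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))

  Example:
  - currentReplicas = 4
  - Current QPS = 600 in total, i.e. 150 per Pod on average
  - Target QPS = 100 per Pod
  - desiredReplicas = ceil(4 × (150 / 100)) = ceil(6) = 6

Step 3: Apply the tolerance
  Default tolerance = 0.1 (10%)

  Decision logic:
  ├─ current metric > target × 1.1 → scale up
  ├─ current metric < target × 0.9 → scale down
  └─ within the band → no change

  Example (target 100 QPS per Pod):
  - 115 QPS → scale up (above 110)
  - 85 QPS → scale down (below 90)
  - 95 QPS → hold (within 90-110)

Step 4: Apply the behavior policies
  Scale-up:
  ├─ stabilizationWindowSeconds: 30 (use the lowest recommendation from the last 30 seconds)
  ├─ policies: add at most 5 Pods every 30 seconds
  └─ selectPolicy: Max (with several policies, the one allowing the largest change wins)

  Scale-down:
  ├─ stabilizationWindowSeconds: 300 (use the highest recommendation from the last 5 minutes)
  ├─ policies: remove at most 2 Pods every 60 seconds
  └─ prevents flapping caused by frequent scale-downs

Step 5: Update the Deployment
  ├─ Set the Deployment's spec.replicas
  ├─ The ReplicaSet controller creates/deletes Pods
  └─ Wait for the Pods to become Ready (typically 30-60 seconds)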
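Steps 2 and 3 can be reproduced by hand when an HPA decision looks surprising. The function below is a minimal sketch, not the controller's implementation: desired_replicas is a hypothetical helper, the tolerance is fixed at the default 0.1, and it assumes every Pod is Ready and reports the metric (the real controller also accounts for missing metrics and not-yet-ready Pods).

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_VALUE TARGET_VALUE MIN MAX
# Applies the tolerance check, the ceil() formula, and the min/max clamp.
desired_replicas() (
  awk -v r="$1" -v cur="$2" -v tgt="$3" -v min="$4" -v max="$5" 'BEGIN {
    tol = 0.1
    ratio = cur / tgt
    diff = ratio - 1; if (diff < 0) diff = -diff
    # Inside the tolerance band: keep the current replica count
    if (diff <= tol) { print r; exit }
    want = r * ratio
    d = int(want); if (d < want) d++       # ceil()
    if (d < min) d = min                   # clamp to minReplicas
    if (d > max) d = max                   # clamp to maxReplicas
    print d
  }'
)

desired_replicas 4 150 100 2 20    # prints 6 (the example from step 2)
desired_replicas 4 95 100 2 20     # prints 4 (within tolerance, no change)
desired_replicas 4 1000 100 2 20   # prints 20 (capped by maxReplicas)
```

Because of the ceil(), a small drop does not always remove a Pod: with 4 replicas at 85 QPS per Pod the result is ceil(3.4) = 4, so the replica count stays at 4 even though the metric is outside the tolerance band.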

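Why is stabilizationWindowSeconds needed?

Suppose there is no stabilization window, the Deployment starts at 5 replicas, and the per-Pod metric swings on every 15-second sync:

T+0s: QPS=120 → scale up to ceil(5 × 1.2) = 6 replicas
T+15s: QPS=80 (load is now spread over more Pods) → scale down to ceil(6 × 0.8) = 5 replicas
T+30s: QPS=120 (load is concentrated again) → scale up to 6 replicas
Result: constant scaling, with Pods repeatedly created and deleted

With a 300-second scale-down stabilization window:

T+0s: QPS=120 → recommendation 6, scale up to 6 replicas
T+15s: QPS=80 → recommendation 5 is recorded, but the highest recommendation in the window is still 6, so nothing changes
T+30s and later: replicas only drop once every recommendation from the last 300 seconds is below 6
Result: smoother decisions and far less flapping (a scale-up window works the other way round and uses the lowest recommendation in its window)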
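As documented upstream, a scale-down stabilization window acts on the highest recommendation seen inside the window and a scale-up window on the lowest. The sketch below shows just that selection; stabilized is a hypothetical helper, and the real controller additionally combines both windows with the current replica count.

```shell
# stabilized DIRECTION REC...
# Picks the recommendation a stabilization window would act on:
#   down → highest recommendation in the window (delays scale-down)
#   up   → lowest recommendation in the window (delays scale-up)
stabilized() (
  direction=$1
  shift
  result=$1
  for rec in "$@"; do
    if [ "$direction" = down ] && [ "$rec" -gt "$result" ]; then result=$rec; fi
    if [ "$direction" = up ] && [ "$rec" -lt "$result" ]; then result=$rec; fi
  done
  echo "$result"
)

# Recommendations recorded on three consecutive syncs: 6, 5, 6
stabilized down 6 5 6   # prints 6: the dip to 5 is ignored
stabilized up 6 5 6     # prints 5: a scale-up waits until the whole window agrees
```

With stabilizationWindowSeconds: 0 the window holds only the latest recommendation, so the controller acts on it immediately.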

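Why scale up fast and scale down slowly?

Scale-up: traffic is peaking, so react quickly to avoid overload (30-second window)
Scale-down: avoid misjudging a lull, so that a traffic rebound does not force another scale-up right away (5-minute window)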

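Observability

Monitoring metrics

Core HPA status:

# Show the current state of every HPA
kubectl get hpa -A

# Show the detailed metrics of one HPA
kubectl get hpa nginx-hpa-qps -o yaml | grep -A 10 currentMetrics

Prometheus queries (for Grafana panels; kube-state-metrics v2 labels HPA series with horizontalpodautoscaler):

# Query 1: current replicas of the HPA
kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="nginx-hpa-qps"}

# Query 2: desired replicas of the HPA
kube_horizontalpodautoscaler_status_desired_replicas{horizontalpodautoscaler="nginx-hpa-qps"}

# Query 3: how often the desired replica count changed (this is a gauge, so use changes() rather than rate())
changes(kube_horizontalpodautoscaler_status_desired_replicas{horizontalpodautoscaler="nginx-hpa-qps"}[5m])

# Query 4: current metric value vs target
sum(rate(nginx_http_requests_total{namespace="default",pod=~"nginx-app-.*"}[2m])) by (pod)
# vs the target of 100

# Query 5: Pod CPU usage as a percentage of requests (both sides summed per Pod, since each Pod runs two containers)
sum by (pod) (rate(container_cpu_usage_seconds_total{namespace="default",pod=~"nginx-app-.*",container!=""}[2m]))
/ sum by (pod) (kube_pod_container_resource_requests{namespace="default",pod=~"nginx-app-.*",resource="cpu"}) * 100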
Grafana panel example (JSON fragment):

{
"panels":[
{
"title":"HPA replica count",
"targets":[
{"expr":"kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler='nginx-hpa-qps'}"},
{"expr":"kube_horizontalpodautoscaler_status_desired_replicas{horizontalpodautoscaler='nginx-hpa-qps'}"}
],
"type":"graph"
},
{
"title":"QPS vs target",
"targets":[
{"expr":"sum(rate(nginx_http_requests_total{namespace='default'}[2m]))"},
{"expr":"kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler='nginx-hpa-qps'} * 100"}
],
"type":"graph"
}
]
}

Alert rules

# prometheus-hpa-alerts.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: hpa-alerts
  namespace: monitoring
  labels:
    release: prometheus  # kube-prometheus-stack only loads rules carrying its release label by default
spec:
  groups:
  - name: hpa_alerts
    interval: 30s
    rules:
    # Alert 1: the HPA has reached its maximum replica count
    - alert: HPAMaxedOut
      expr: |
        kube_horizontalpodautoscaler_status_current_replicas
        >= kube_horizontalpodautoscaler_spec_max_replicas
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} is at its maximum replica count"
        description: "HPA {{ $labels.horizontalpodautoscaler }} is running {{ $value }} replicas, the configured maximum; maxReplicas may need to be raised"

    # Alert 2: the HPA scales too frequently
    - alert: HPAFlapping
      expr: |
        changes(kube_horizontalpodautoscaler_status_current_replicas[10m]) > 4
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} is flapping"
        description: "The replica count changed {{ $value }} times in the last 10 minutes; stabilizationWindowSeconds may need tuning"

    # Alert 3: the HPA cannot fetch its metrics
    - alert: HPAMetricsUnavailable
      expr: |
        kube_horizontalpodautoscaler_status_condition{condition="ScalingActive",status="false"} == 1
      for: 10m
      labels:
        severity: critical
      annotations:
        summary: "HPA {{ $labels.horizontalpodautoscaler }} cannot fetch metrics"
        description: "HPA {{ $labels.horizontalpodautoscaler }} is not scaling; check Metrics Server or Prometheus Adapter"

    # Alert 4: Pod CPU is throttled for a sustained period
    - alert: PodCPUThrottling
      expr: |
        sum(rate(container_cpu_cfs_throttled_periods_total[5m])) by (pod, namespace)
        / sum(rate(container_cpu_cfs_periods_total[5m])) by (pod, namespace) > 0.5
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "Pod {{ $labels.namespace }}/{{ $labels.pod }} is CPU throttled"
        description: "The Pod hits its CPU limit and is throttled in {{ $value | humanizePercentage }} of CFS periods; consider adjusting resources"

Apply the alert rules:

kubectl apply -f prometheus-hpa-alerts.yaml

# Confirm that the rules were created
kubectl get prometheusrules -n monitoring
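Common failures and troubleshooting

Symptom	Diagnostic command	Likely root cause	Quick fix	Permanent fix
HPA shows `<unknown>`	`kubectl get hpa`	1. Metrics Server is not deployed 2. Pods have no resources.requests	Deploy Metrics Server	Set requests on every Pod
Custom metric does not work	`kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1`	1. Prometheus Adapter is not running 2. The metric mapping is wrong	Check the Adapter Pod logs	Fix the rules configuration
HPA does not scale up	`kubectl describe hpa` and read the events	1. Current value is not above target × 1.1 2. maxReplicas reached	Adjust the target or maxReplicas	Tune the scale-up policy
HPA does not scale down	Wait 5 minutes and observe	1. Still inside the stabilization window 2. Current value is not below target × 0.9	Scale down manually to verify	Adjust the scale-down window or target
Pods restart frequently	`kubectl get events`	1. resources.limits too small 2. Scaling is too aggressive	Raise limits	Adjust the behavior policies
Scale-up takes too long	Time the scale-up	1. Slow image pulls 2. Not enough node resources	Pre-pull images or add nodes	Use Cluster Autoscaler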

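Systematic troubleshooting flow:

[HPA diagnosis flow]

Step 1: Check the HPA status
  Command: kubectl get hpa -A
  ↓
  Do the metrics show <unknown>?
  ├─ [yes] → Step 2 (check Metrics Server)
  └─ [no] → Step 4 (check the scaling logic)

Step 2: Check Metrics Server
  Command: kubectl top nodes
  ↓
  Is resource usage displayed?
  ├─ [no] → Step 3 (repair Metrics Server)
  └─ [yes] → Step 4

Step 3: Repair Metrics Server
  Command: kubectl logs -n kube-system -l k8s-app=metrics-server
  Common problems:
  - Certificate errors → add --kubelet-insecure-tls
  - Connection failures → check the network and firewall
  - Insufficient resources → adjust requests/limits

Step 4: Check the scaling logic
  Command: kubectl describe hpa <name>
  Read Conditions and Events
  ↓
  Do the events contain "failed to get metric" errors?
  ├─ [yes] → Step 5 (custom metric problem)
  └─ [no] → Step 6 (behavior policy problem)

Step 5: Investigate the custom metric
  Command: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq
  ↓
  Is the metric listed?
  ├─ [no] → check the Prometheus Adapter configuration
  │         Command: kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-adapter
  │         Confirm that Prometheus has the series
  └─ [yes] → Step 6

Step 6: Check the behavior policies
  Read the behavior section of the HPA YAML
  ↓
  Compute the desired replica count:
  - ceil(current value / target value × current replicas)
  - Is it outside the min/max range?
  - Is it within the tolerance band (±10%)?
  ↓
  Check the stabilization windows:
  - Scale-up: has the scale-up stabilizationWindowSeconds elapsed?
  - Scale-down: has the scale-down window (5 minutes by default) elapsed?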

Debugging commands:

# 1. Show the full HPA state
kubectl get hpa <name> -o yaml

# 2. Show the HPA events
kubectl describe hpa <name>

# 3. Show the Metrics Server logs
kubectl logs -n kube-system -l k8s-app=metrics-server --tail=50

# 4. Show the Prometheus Adapter logs
kubectl logs -n monitoring -l app.kubernetes.io/name=prometheus-adapter --tail=50

# 5. Query the custom metric by hand
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/http_requests_per_second" | jq

# 6. Show the Pods' resource requests
kubectl get pods -o json | jq '.items[] | {name: .metadata.name, requests: .spec.containers[].resources.requests}'

# 7. Reproduce the HPA calculation
# Formula: desiredReplicas = ceil(currentReplicas × (currentMetricValue / targetMetricValue))
CURRENT_REPLICAS=$(kubectl get deployment nginx-app -o jsonpath='{.spec.replicas}')
CURRENT_METRIC=150  # per-Pod value reported by the HPA
TARGET_METRIC=100
# Integer ceiling division: (a + b - 1) / b
echo "Desired replicas: $(( (CURRENT_REPLICAS * CURRENT_METRIC + TARGET_METRIC - 1) / TARGET_METRIC ))"

# 8. Touch an HPA annotation (the controller re-evaluates every sync period anyway, so this is rarely necessary)
kubectl annotate hpa nginx-hpa-qps force-sync="$(date +%s)" --overwrite

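Change and rollback playbook

Gradual rollout

Scenario: updating the HPA configuration (changing targets or behavior policies)

# 1. Back up the current HPA configuration and remember the file name
BACKUP="hpa-backup-$(date +%Y%m%d-%H%M%S).yaml"
kubectl get hpa nginx-hpa-qps -o yaml > "$BACKUP"

# 2. Validate the new configuration with a server-side dry run
kubectl apply -f hpa-new-config.yaml --dry-run=server

# 3. Apply the new configuration
kubectl apply -f hpa-new-config.yaml

# 4. Observe for 5 minutes
kubectl get hpa nginx-hpa-qps --watch

# 5. If anything looks wrong, roll back immediately
# (if apply reports a resourceVersion conflict, remove metadata.resourceVersion and status from the backup first)
kubectl apply -f "$BACKUP"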

Health check list

# Check 1: HPA status is healthy
kubectl get hpa -A | grep "<unknown>"
# Expected: no output (every HPA shows concrete metric values)

# Check 2: Metrics Server is available
kubectl top nodes
# Expected: resource usage of every node

# Check 3: Prometheus Adapter is available
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources | length'
# Expected: > 0 (the number of available metrics)

# Check 4: every Pod has resource requests
kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.containers[]; .resources.requests == null)) | .metadata.name'
# Expected: no output (every container sets requests)

# Check 5: no errors in the HPA events
kubectl get events -A --field-selector involvedObject.kind=HorizontalPodAutoscaler | grep -i error
# Expected: no output

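Rollback conditions and commands

Rollback triggers:

The HPA has scaled to maxReplicas and still cannot handle the load
The HPA scales too frequently (more than 5 times in 10 minutes)
Custom metrics have been unavailable for more than 10 minutes

Rollback steps:

# 1. Pin the replica count while keeping the HPA object: with min = max the HPA holds the Deployment at that size
kubectl patch hpa nginx-hpa-qps -p '{"spec":{"minReplicas":5,"maxReplicas":5}}'

# 2. Confirm that the Deployment settles at the pinned size
kubectl get deployment nginx-app

# 3. Restore the original HPA configuration from the backup file taken earlier
kubectl apply -f hpa-backup-<timestamp>.yaml

# 4. Verify the service (the Service DNS name only resolves inside the cluster, so curl from a Pod)
kubectl get pods -l app=nginx
kubectl run -it --rm verify --image=curlimages/curl --restart=Never -- curl -s http://nginx-app.default.svc.cluster.local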

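Best practices

Always set resources.requests (the basis of HPA)

resources:
  requests:
    cpu: 100m # baseline for the HPA CPU utilization calculation
    memory: 128Mi
  limits:
    cpu: 200m # keeps a single Pod from being overloaded
    memory: 256Mi

Combine several metrics (improves accuracy)

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70 # fallback metric
- type: Pods
  pods:
    metric:
      name: http_requests_per_second
    target:
      type: AverageValue
      averageValue: "100" # primary metric

Tune the tolerance to avoid flapping (the default 10% may be too sensitive)

# Set through a kube-controller-manager flag (applies cluster-wide)
--horizontal-pod-autoscaler-tolerance=0.2  # 20%

Set sensible min/max values
- minReplicas: enough for the baseline load, plus 1 for redundancy
- maxReplicas: total node capacity / Pod requests × 0.8 (avoids exhausting the cluster)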
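The maxReplicas rule of thumb is easy to evaluate for a concrete cluster. The sketch below is only an estimate under stated assumptions: max_replicas is a hypothetical helper, it looks at CPU only, and it ignores system-reserved capacity and other workloads on the nodes.

```shell
# max_replicas NODES CPU_PER_NODE_MILLICORES POD_REQUEST_MILLICORES
# maxReplicas ≈ total node CPU / Pod CPU request × 0.8 (integer arithmetic, rounded down)
max_replicas() (
  total=$(( $1 * $2 ))
  echo $(( total / $3 * 8 / 10 ))
)

# 3 nodes with 4 cores (4000m) each; the sample Pod requests 100m + 50m = 150m
max_replicas 3 4000 150   # prints 64
```

Memory gives a second bound in the same way. The smaller of the two is the safer value for maxReplicas.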

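Scale up fast, scale down slowly

behavior:
  scaleUp:
    stabilizationWindowSeconds: 30 # react quickly to peaks
  scaleDown:
    stabilizationWindowSeconds: 300 # avoid premature scale-down

Watch the HPA events (to spot anomalies)

kubectl get events --sort-by='.lastTimestamp' | grep HorizontalPodAutoscaler

Combine with Cluster Autoscaler (node-level elasticity)
- After HPA adds Pods, CA adds nodes when the Pods cannot be scheduled
- Set CA's --scale-down-delay-after-add=10m (wait 10 minutes after a scale-up before scaling nodes down)
Use VPA to help with tuning (automatic requests adjustment)
- VPA (Vertical Pod Autoscaler) recommends sensible requests values
- Prevents HPA from scaling up too often because requests are too small
Account for custom metric latency
- Prometheus scrape delay: 15-30 seconds
- HPA sync period: 15 seconds
- Total delay: 30-45 seconds; stabilizationWindowSeconds should be larger than this
Load-test regularly (once a quarter)
- Verify scale-up speed: the time from min to max
- Verify scale-down stability: no flapping
- Update the load-test report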

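FAQ

Q1: What is the difference between HPA, VPA, and Cluster Autoscaler?
A:

HPA: horizontal scaling (changes the number of Pods), driven by metrics
VPA: vertical scaling (changes Pod resources), requires restarting Pods
Cluster Autoscaler: node scaling (adds or removes Nodes), used together with HPA

Q2: Why does the HPA show <unknown>?
A: Common causes:

Metrics Server is not deployed or is unhealthy
Pods have no resources.requests
For custom metrics: Prometheus Adapter is not running or is misconfigured

Q3: How can HPA scale up faster?
A:

Reduce the scale-up stabilizationWindowSeconds to 0-30 seconds
Raise scaleUp.policies.value (add more Pods per period)
Pre-pull images onto the nodes (shortens Pod startup)
Use kubectl set image for rolling updates instead of recreating Pods

Q4: Can HPA use external metrics (such as SQS queue length)?
A: Yes, with a type: External metric:

metrics:
- type: External
  external:
    metric:
      name: sqs_queue_length
      selector:
        matchLabels:
          queue: my-queue
    target:
      type: AverageValue
      averageValue: "30"

This requires an External Metrics adapter (for example KEDA).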
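For an External metric with an AverageValue target, the replica count follows from dividing the total metric value by the per-Pod target. A minimal sketch of that arithmetic, with external_desired as a hypothetical helper and the tolerance and min/max clamping left out:

```shell
# external_desired TOTAL_METRIC_VALUE AVERAGE_VALUE_TARGET
# desiredReplicas = ceil(total / averageValue), computed with integer ceiling division
external_desired() (
  echo $(( ($1 + $2 - 1) / $2 ))
)

# 95 messages in the queue, 30 messages per Pod targeted
external_desired 95 30   # prints 4
```

A queue length of 0 yields 0 here, whereas a real HPA never goes below minReplicas.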

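Q5: Can HPA coexist with a fixed replica count?
A: No. HPA overrides the Deployment's spec.replicas. To pause autoscaling, delete the HPA object and then set the replica count by hand.

Q6: How can frequent scaling (flapping) be avoided?
A:

Increase stabilizationWindowSeconds (5-10 minutes is a reasonable range for scale-down)
Adjust the tolerance (--horizontal-pod-autoscaler-tolerance)
Smooth the metric itself, for example with a longer rate() window in the adapter's metricsQuery; with several metrics HPA uses the largest per-metric replica count rather than an average

Q7: Can HPA be disabled during specific time windows?
A: There is no native support. A CronJob can adjust minReplicas and maxReplicas on a schedule:

# Daytime (peak hours): minReplicas=10
kubectl patch hpa nginx-hpa-qps -p '{"spec":{"minReplicas":10}}'

# Night (off-peak hours): minReplicas=2
kubectl patch hpa nginx-hpa-qps -p '{"spec":{"minReplicas":2}}'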
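The two patch commands can be folded into one script that a CronJob or plain cron runs every hour. This is a sketch under assumptions: min_replicas_for_hour is a hypothetical helper, and the 08:00-20:00 peak window is an example rather than a recommendation.

```shell
# min_replicas_for_hour HOUR
# Returns the minReplicas value for an hour of the day (0-23).
# Assumed peak window: 08:00-19:59 → 10 replicas, otherwise 2.
min_replicas_for_hour() (
  hour=$1
  if [ "$hour" -ge 8 ] && [ "$hour" -lt 20 ]; then
    echo 10
  else
    echo 2
  fi
)

hour=$(date +%H)
# Strip one leading zero so that "08" is passed as 8
min=$(min_replicas_for_hour "${hour#0}")
echo "minReplicas for hour ${hour}: ${min}"
# To apply the value:
#   kubectl patch hpa nginx-hpa-qps -p "{\"spec\":{\"minReplicas\":${min}}}"
```

The patch command is left as a comment so the script can be dry-run safely. Running it inside the cluster additionally needs a ServiceAccount that is allowed to patch HPA objects.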

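Q8: Does HPA support GPU metrics?
A: Yes. Collect GPU metrics (for example with NVIDIA DCGM Exporter) and expose them through Prometheus Adapter.

Q9: How can the HPA scaling history be monitored?
A: Query the Prometheus metric:

changes(kube_horizontalpodautoscaler_status_current_replicas[24h])

Or read the Kubernetes Events (they are retained only briefly, so exporting them to a logging system is advisable).

Q10: Can an HPA scale across namespaces?
A: No. Each HPA controls a single Deployment/StatefulSet in its own namespace.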

Appendix: one-click deployment script and full configuration

One-click deployment script

#!/bin/bash
# File: deploy-hpa-stack.sh
# Purpose: deploy Metrics Server + Prometheus + HPA automatically

set -e

echo "[1/6] Checking the Kubernetes cluster..."
kubectl cluster-info || { echo "Error: cannot reach the Kubernetes cluster"; exit 1; }

echo "[2/6] Deploying Metrics Server..."
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.4/components.yaml
kubectl patch deployment metrics-server -n kube-system --type='json' \
  -p='[{"op": "add", "path": "/spec/template/spec/containers/0/args/-", "value": "--kubelet-insecure-tls"}]'
kubectl wait --for=condition=ready pod -l k8s-app=metrics-server -n kube-system --timeout=120s

echo "[3/6] Deploying Prometheus Operator..."
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f -
helm upgrade --install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --set prometheus.prometheusSpec.serviceMonitorSelectorNilUsesHelmValues=false \
  --set prometheus.prometheusSpec.retention=7d \
  --wait --timeout=5m

echo "[4/6] Deploying Prometheus Adapter..."
cat > /tmp/prometheus-adapter-values.yaml <<EOF
prometheus:
  url: http://prometheus-kube-prometheus-prometheus.monitoring.svc
  port: 9090
rules:
  default: true
  custom:
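  # Matches the nginx exporter's nginx_http_requests_total, as in Step 3; the metric
  # only appears once a Pod exposing that series is scraped
  - seriesQuery: 'nginx_http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^nginx_(.*)_total$"
      as: "\${1}_per_second"
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'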
EOF

helm upgrade --install prometheus-adapter prometheus-community/prometheus-adapter \
  --namespace monitoring \
  -f /tmp/prometheus-adapter-values.yaml \
  --wait --timeout=3m

echo "[5/6] Deploying the sample application..."
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.24-alpine
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-app
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
EOF

echo "[6/6] Creating the HPA..."
kubectl apply -f - <<EOF
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
EOF

echo ""
echo "==== Deployment complete ===="
echo "Verification commands:"
echo "  kubectl get hpa"
echo "  kubectl top pods"
echo "  kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1"

Usage:

chmod +x deploy-hpa-stack.sh
./deploy-hpa-stack.sh

Further reading

Official documentation:

Kubernetes HPA documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/
Metrics Server on GitHub: https://github.com/kubernetes-sigs/metrics-server
Prometheus Adapter: https://github.com/kubernetes-sigs/prometheus-adapter

In-depth material:

HPA algorithm details: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
Custom Metrics API server library: https://github.com/kubernetes-sigs/custom-metrics-apiserver

Community resources:

KEDA (Kubernetes Event Driven Autoscaling): https://keda.sh/
Kubernetes Autoscaling SIG: https://github.com/kubernetes/autoscaler
