K8s Ingress: Production-Grade Layer 7 Load Balancing with Canary Releases, TLS, and Monitoring

Posted on 2026-5-23 03:24:03 | Views: 150 | Replies: 0

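1. Overview

1.1 Background

In cloud-native architectures, traffic management is central to keeping services stable and available. As microservices spread, a single application is split into dozens or even hundreds of independent services, and traditional Layer 4 load balancing can no longer meet the resulting traffic-routing needs. Kubernetes Ingress acts as the cluster's unified traffic entry point and provides Layer 7 (HTTP/HTTPS) load balancing: hostname and path routing come from the core API, while routing on request headers, cookies, and similar attributes comes from controller-specific extensions.

Kubernetes promoted the Ingress API to stable (networking.k8s.io/v1) in version 1.19. The API itself is now feature-frozen, so newer capabilities come from controller implementations. As of 2025, Ingress support in Kubernetes 1.31 is mature, and combined with Ingress-NGINX Controller 1.11+ it can support an enterprise-grade traffic-governance setup.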

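The core advantages of Layer 7 over Layer 4 load balancing are:

Dimension	Layer 4 load balancing (L4)	Layer 7 load balancing (L7)
Operating layer	TCP/UDP transport layer	HTTP/HTTPS application layer
Routing basis	IP + port	Hostname, path, request headers, cookies
SSL termination	Typically not supported	Supported
Content awareness	None	Can parse HTTP
Session persistence	Based on source IP	Based on cookies
Traffic control	Coarse-grained	Fine-grained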

[Figure: animated diagram of Layer 7 load-balancing traffic flow]

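1.2 Technical Characteristics

Unified traffic entry point

Ingress gives all services in the cluster a single entry point: external traffic enters through the Ingress Controller and is then dispatched to backend Services according to routing rules. This simplifies the network topology and reduces operational complexity. A typical traffic path is:

Client → external load balancer → Ingress Controller → Service → Pod

Declarative configuration

Ingress resources define routing rules through a declarative API. Operators describe the desired routing state, and the Ingress Controller synchronizes the configuration automatically. This matches Kubernetes' overall design and supports GitOps workflows and infrastructure as code (IaC).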

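Multi-tenancy support

Through the IngressClass mechanism, a single cluster can run several Ingress Controller instances, so different business lines or tenants can use separate traffic entry points for resource and fault isolation. The default IngressClass is cluster-wide (marked with the ingressclass.kubernetes.io/is-default-class: "true" annotation) rather than per namespace; IngressClass parameters, however, can reference namespace-scoped resources.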

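Extensible design

The Ingress specification defines a standard routing-configuration interface; the actual implementation is left to the Ingress Controller. The community offers many controllers:

Ingress-NGINX: NGINX-based, community-maintained, feature-rich, reportedly above 60% market share
Traefik: cloud-native design, automatic service discovery, built-in dashboard
HAProxy Ingress: high performance, suited to very large clusters
Kong Ingress: API gateway capabilities, rich plugin ecosystem
Istio Gateway: service-mesh integration, advanced traffic management

Native TLS support

Ingress has built-in TLS termination. Certificates are managed through Secret resources, and SNI (Server Name Indication) allows several HTTPS hostnames to be served from a single IP. Combined with cert-manager, certificates can be issued and renewed automatically.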

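1.3 Use Cases

Scenario 1: Unified access for multiple domains

Companies typically own several domains, such as the main site www.example.com, the API service api.example.com, and the admin console admin.example.com. Ingress supports host-based routing, sending traffic for each domain to its backend service while sharing one Ingress Controller infrastructure.

Scenario 2: Multi-path routing on a single domain

In a microservice architecture, one domain may front several backend services. For example, example.com/api/users routes to the user service and example.com/api/orders routes to the order service. Ingress supports path-based routing for URL-level traffic distribution.

Scenario 3: Canary releases and A/B testing

Ingress-NGINX can split traffic by weight, request header, or cookie, which enables canary releases. When a new version goes live, send it 5% of traffic for validation first, then raise the share step by step until the full cutover.

Scenario 4: HTTPS offloading and certificate management

TLS termination is handled centrally at the Ingress layer, so backend services only handle HTTP, which simplifies application configuration. Certificates are managed in one place rather than configured in every service.

Scenario 5: Rate limiting and access control

Ingress annotations configure request rate limits, IP allowlists, Basic Auth, and other security policies, blocking malicious requests at the traffic entry point.

Scenario 6: WebSocket and long-lived connections

Ingress-NGINX supports WebSocket protocol upgrades and long-lived connections, which suits real-time communication, message push, and similar workloads.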

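1.4 Environment Requirements

Kubernetes cluster

Component	Minimum version	Recommended version	Notes
Kubernetes	1.28	1.31+	Must support the Ingress v1 API
kubectl	1.28	1.31+	Should match the cluster version
Helm	3.12	3.16+	Used to deploy the Ingress Controller

Node sizing

As the traffic entry point, the Ingress Controller needs adequate node resources:

Deployment scale	CPU	Memory	Notes
Development/testing	0.5 cores	512 MB	Single replica, QPS < 1000
Small production	2 cores	2 GB	Two replicas, QPS < 10000
Medium production	4 cores	4 GB	Three replicas, QPS < 50000
Large production	8 cores	8 GB	Multiple replicas + HPA, QPS > 50000

Network requirements

The Ingress Controller must be exposed outside the cluster through a LoadBalancer or NodePort Service
In cloud environments, use the cloud provider's LoadBalancer
On bare metal, use MetalLB or bind directly to node IPs

Storage requirements

The Ingress Controller is stateless and needs no persistent storage
If access logs must be persisted, configure a PV/PVC

Operating system

OS	Version	Kernel version
Rocky Linux	9.3+	5.14+
Ubuntu	24.04 LTS	6.8+
Debian	12	6.1+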

2. Detailed Steps

2.1 Preparation

2.1.1 Verify cluster status

Before deploying the Ingress Controller, confirm that the Kubernetes cluster is healthy:

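# Check the cluster version (the --short flag was removed in kubectl 1.28; the default output is already short)
kubectl version

# Example output (abridged)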
# Client Version: v1.31.2
# Server Version: v1.31.1

# Check node status
kubectl get nodes -o wide

# Example output
# NAME        STATUS   ROLES           AGE   VERSION   INTERNAL-IP    OS-IMAGE
# master-01   Ready    control-plane   30d   v1.31.1   192.168.1.10   Rocky Linux 9.3
# worker-01   Ready    <none>          30d   v1.31.1   192.168.1.11   Rocky Linux 9.3
# worker-02   Ready    <none>          30d   v1.31.1   192.168.1.12   Rocky Linux 9.3

# Check core component status
kubectl get pods -n kube-system

2.1.2 Install Helm

Helm is the recommended way to deploy Ingress-NGINX:

# Rocky Linux 9 / Ubuntu 24.04
curl -fsSL https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

# Verify the installation
helm version

# Example output
# version.BuildInfo{Version:"v3.16.2", GitCommit:"...", GoVersion:"go1.22.7"}

# Add the official ingress-nginx repository
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update

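2.1.3 Create the namespace

Create a dedicated namespace for the Ingress Controller to simplify resource management and access control:

kubectl create namespace ingress-nginx

# Set a resource quota (recommended for production).
# With a compute quota in place, every Pod in the namespace must declare requests and
# limits, including any chart-created Jobs, or its creation is rejected.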
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ingress-nginx-quota
  namespace: ingress-nginx
spec:
  hard:
    requests.cpu: "8"
    requests.memory: "16Gi"
    limits.cpu: "16"
    limits.memory: "32Gi"
    pods: "20"
EOF

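2.1.4 Prepare TLS certificates

Use CA-issued certificates in production; self-signed certificates are acceptable for testing:

# Create a self-signed certificate (test environments only)
# subjectAltName is included because modern clients ignore CN for hostname checks
openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout tls.key \
  -out tls.crt \
  -subj "/CN=*.example.com/O=Example Inc" \
  -addext "subjectAltName=DNS:*.example.com,DNS:example.com"

# Create the TLS Secret
kubectl create secret tls example-tls \
  --cert=tls.crt \
  --key=tls.key \
  -n default

# Verify the Secret
kubectl get secret example-tls -n default -o yaml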

2.1.5 Deploy a test application

Deploy a test application for verifying Ingress behavior:

# httpbin-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpbin
  namespace: default
spec:
  replicas: 2
  selector:
    matchLabels:
      app: httpbin
  template:
    metadata:
      labels:
        app: httpbin
    spec:
      containers:
      - name: httpbin
        image: kennethreitz/httpbin:latest
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 200m
            memory: 256Mi
        readinessProbe:
          httpGet:
            path: /status/200
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:
          httpGet:
            path: /status/200
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: httpbin
  namespace: default
spec:
  selector:
    app: httpbin
  ports:
  - port: 80
    targetPort: 80

Apply and verify:

# Deploy
kubectl apply -f httpbin-deploy.yaml

# Verify the deployment
kubectl get pods -l app=httpbin
kubectl get svc httpbin

2.2 Core Configuration

2.2.1 Deploy the Ingress Controller

Deploy the Ingress-NGINX Controller with Helm. The following configuration is intended for production:

# ingress-nginx-values.yaml
controller:
  name: controller
  # Image settings
  image:
    registry: registry.k8s.io
    image: ingress-nginx/controller
    tag: "v1.11.3"
    pullPolicy: IfNotPresent

  # Replica count (not used while autoscaling is enabled below)
  replicaCount: 2

  # Resource requests and limits
  resources:
    requests:
      cpu: 500m
      memory: 512Mi
    limits:
      cpu: 2000m
      memory: 2Gi

  # Affinity: spread controller Pods across different nodes
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app.kubernetes.io/name
              operator: In
              values:
              - ingress-nginx
          topologyKey: kubernetes.io/hostname

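  # Node selector (optional, pins the controller to specific nodes)
  # nodeSelector:
  #   node-role.kubernetes.io/ingress: "true"

  # Tolerations
  tolerations: []

  # Service settings
  service:
    enabled: true
    type: LoadBalancer
    # Cloud-provider-specific annotations
    annotations: {}
    # Pin the LoadBalancer IP (where supported)
    # loadBalancerIP: "203.0.113.10"
    # External traffic policy (Local preserves the client source IP)
    externalTrafficPolicy: Local

  # IngressClass settings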
  ingressClassResource:
    name: nginx
    enabled: true
    default: true
    controllerValue: "k8s.io/ingress-nginx"

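  # Controller ConfigMap settings
  config:
    # Real client IP handling (enable use-forwarded-headers only behind a trusted proxy that sets X-Forwarded-*)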
    use-forwarded-headers: "true"
    compute-full-forwarded-for: "true"
    use-proxy-protocol: "false"

    # Log format
    log-format-upstream: '$remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent" $request_length $request_time [$proxy_upstream_name] [$proxy_alternative_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status $req_id'

    # Performance tuning
    worker-processes: "auto"
    max-worker-connections: "65535"
    worker-cpu-affinity: "auto"

    # Timeouts (seconds)
    proxy-connect-timeout: "10"
    proxy-read-timeout: "60"
    proxy-send-timeout: "60"

    # Request body size limit
    proxy-body-size: "100m"

    # Enable gzip compression
    use-gzip: "true"
    gzip-level: "5"
    gzip-types: "application/json application/javascript application/xml text/css text/plain text/xml"

    # Hide response headers that reveal server details
    hide-headers: "X-Powered-By,Server"

    # SSL settings
    ssl-protocols: "TLSv1.2 TLSv1.3"
    ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384"
    ssl-prefer-server-ciphers: "true"

    # HSTS settings
    hsts: "true"
    hsts-max-age: "31536000"
    hsts-include-subdomains: "true"
    hsts-preload: "true"

  # Metrics settings
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      namespace: monitoring
      additionalLabels:
        release: prometheus

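  # Pod disruption budget: the chart creates a PDB when more than one replica runs;
  # the key is assumed to be controller.minAvailable (check the chart's values.yaml)
  minAvailable: 1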

  # Autoscaling
  autoscaling:
    enabled: true
    minReplicas: 2
    maxReplicas: 10
    targetCPUUtilizationPercentage: 70
    targetMemoryUtilizationPercentage: 80

# Default backend settings
defaultBackend:
  enabled: true
  image:
    registry: registry.k8s.io
    image: defaultbackend-amd64
    tag: "1.5"
  resources:
    requests:
      cpu: 10m
      memory: 20Mi
    limits:
      cpu: 20m
      memory: 40Mi

Run the deployment:

helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --values ingress-nginx-values.yaml \
  --wait

# Check deployment status (Ctrl-C to stop watching)
kubectl get pods -n ingress-nginx -w

2.2.2 Ingress rule configuration

An Ingress resource defines routing rules. A complete example follows:

# basic-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin-ingress
  namespace: default
  annotations:
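    # Select the Ingress class (deprecated annotation; prefer spec.ingressClassName below)
    # kubernetes.io/ingress.class: nginx

    # Path rewrite: left disabled here. With path "/" and no capture group,
    # rewrite-target "/" would rewrite every request (for example /get) to "/".
    # nginx.ingress.kubernetes.io/rewrite-target: /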

    # Enable CORS
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "*"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-headers: "DNT,X-CustomHeader,Keep-Alive,User-Agent,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Authorization"

    # Proxy buffer settings
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"

    # Connection timeouts (seconds)
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
spec:
  ingressClassName: nginx
  rules:
  - host: httpbin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: httpbin
            port:
              number: 80

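Path types

Kubernetes 1.31 supports three path-matching types:

Type	Description	Example	Matches	Does not match
`Exact`	Exact match	`/api`	`/api`	`/api/`, `/api/v1`
`Prefix`	Prefix match, by path element	`/api`	`/api`, `/api/`, `/api/v1`	`/apis`
`ImplementationSpecific`	Depends on the controller implementation	-	-	-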
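
To make the table concrete, here is a minimal shell sketch of the Exact and Prefix rules as the Kubernetes Ingress API describes them. match_path is a hypothetical helper written for this illustration; it is not part of kubectl or any controller, and it leaves out ImplementationSpecific because that behavior depends on the controller.

```shell
# match_path TYPE RULE_PATH REQUEST_PATH
# Hypothetical helper (not a kubectl or controller command).
# Prints "match" or "no-match".
match_path() {
  case "$1" in
    Exact)
      # Exact: the request path must equal the rule path exactly.
      if [ "$3" = "$2" ]; then echo match; return 0; fi
      ;;
    Prefix)
      # Prefix: match whole path elements; a trailing slash on the rule is ignored.
      mp_base=${2%/}
      case "$3" in
        "$mp_base"|"$mp_base"/*) echo match; return 0 ;;
      esac
      ;;
  esac
  echo no-match
}

match_path Exact  /api /api/     # prints: no-match
match_path Prefix /api /api/v1   # prints: match
match_path Prefix /api /apis     # prints: no-match
```

The Prefix branch compares against "/api" and "/api/..." rather than doing a plain string-prefix test, which is why /apis does not match.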

2.2.3 TLS configuration

Configure HTTPS access for the Ingress:

# tls-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin-tls-ingress
  namespace: default
  annotations:
    # Force HTTP-to-HTTPS redirect
    nginx.ingress.kubernetes.io/ssl-redirect: "true"

    # HSTS is a controller-wide ConfigMap setting in Ingress-NGINX (hsts, hsts-max-age,
    # hsts-include-subdomains); it is set in the Helm values, not per Ingress.

    # Backend protocol (if the backend serves HTTPS)
    # nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - httpbin.example.com
    secretName: example-tls
  rules:
  - host: httpbin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: httpbin
            port:
              number: 80
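
For reference, the HSTS options used in this guide (max age, include-subdomains, preload) end up as a single Strict-Transport-Security response header, whose syntax is defined by RFC 6797 (the preload token is a browser preload-list convention rather than part of the RFC). The sketch below only assembles that header value. hsts_header is a hypothetical helper, and the exact string a given controller version emits is an assumption worth confirming with curl -I.

```shell
# hsts_header MAX_AGE INCLUDE_SUBDOMAINS PRELOAD
# Hypothetical helper: prints the Strict-Transport-Security header value.
hsts_header() {
  hsts_value="max-age=$1"
  if [ "$2" = "true" ]; then hsts_value="${hsts_value}; includeSubDomains"; fi
  if [ "$3" = "true" ]; then hsts_value="${hsts_value}; preload"; fi
  echo "$hsts_value"
}

hsts_header 31536000 true true
# prints: max-age=31536000; includeSubDomains; preload
```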

Managing certificates automatically with cert-manager

# cert-manager-issuer.yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
    - http01:
        ingress:
          class: nginx
---
# Reference the issuer from an Ingress
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: httpbin-auto-tls
  namespace: default
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - httpbin.example.com
    secretName: httpbin-tls-auto  # created automatically by cert-manager
  rules:
  - host: httpbin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: httpbin
            port:
              number: 80

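2.3 Startup and Verification

2.3.1 Apply the Ingress configuration

# Apply the basic Ingress
kubectl apply -f basic-ingress.yaml

# Switch to the TLS Ingress. Both manifests define host httpbin.example.com with
# path /, and the Ingress-NGINX admission webhook rejects a duplicate host and
# path, so remove the basic Ingress first.
kubectl delete -f basic-ingress.yaml
kubectl apply -f tls-ingress.yaml

# Check Ingress status
kubectl get ingress

# Example output
# NAME                    CLASS   HOSTS                 ADDRESS         PORTS     AGE
# httpbin-tls-ingress     nginx   httpbin.example.com   203.0.113.10    80, 443   30s

# Show Ingress details
kubectl describe ingress httpbin-tls-ingress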

2.3.2 Get the external address

# Get the LoadBalancer external IP
kubectl get svc -n ingress-nginx ingress-nginx-controller

# Example output
# NAME                       TYPE           CLUSTER-IP     EXTERNAL-IP    PORT(S)
# ingress-nginx-controller   LoadBalancer   10.96.100.10   203.0.113.10   80:30080/TCP,443:30443/TCP

# Save the external IP in a variable (some cloud load balancers report .hostname instead of .ip)
INGRESS_IP=$(kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Ingress External IP: $INGRESS_IP"

# For NodePort mode
NODE_PORT_HTTP=$(kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.spec.ports[?(@.port==80)].nodePort}')
NODE_PORT_HTTPS=$(kubectl get svc -n ingress-nginx ingress-nginx-controller -o jsonpath='{.spec.ports[?(@.port==443)].nodePort}')
echo "HTTP NodePort: $NODE_PORT_HTTP"
echo "HTTPS NodePort: $NODE_PORT_HTTPS"

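2.3.3 Functional verification

# Add a local hosts entry (test environments)
echo "$INGRESS_IP httpbin.example.com" | sudo tee -a /etc/hosts

# Test HTTP access (with ssl-redirect enabled, expect a 308 redirect to HTTPS)
curl -v http://httpbin.example.com/get

# Test HTTPS access (-k skips verification of the self-signed certificate)
curl -vk https://httpbin.example.com/get

# Test the HTTP-to-HTTPS redirect
curl -I http://httpbin.example.com/get

# Example output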
# HTTP/1.1 308 Permanent Redirect
# Location: https://httpbin.example.com/get

# Test a POST request
curl -X POST https://httpbin.example.com/post \
  -H "Content-Type: application/json" \
  -d '{"key": "value"}' \
  -k

# Test request header pass-through
curl https://httpbin.example.com/headers \
  -H "X-Custom-Header: test-value" \
  -k

2.3.4 Check the controller logs

# List the controller Pods
kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller

# View the controller logs
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=100

# Follow the logs in real time
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f

# View the generated NGINX configuration
kubectl exec -n ingress-nginx \
  $(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}') \
  -- cat /etc/nginx/nginx.conf

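2.3.5 Verify high availability

# Check Pod distribution
kubectl get pods -n ingress-nginx -o wide

# Simulate a Pod failure
kubectl delete pod -n ingress-nginx \
  $(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}')

# Verify that the service stays available (run in a second terminal; Ctrl-C to stop).
# HTTPS is used because the TLS Ingress redirects plain HTTP with a 308.
while true; do
  curl -sk -o /dev/null -w "%{http_code}\n" https://httpbin.example.com/status/200
  sleep 1
done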
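
The loop above runs until interrupted and leaves the reader to eyeball the status codes. As a sketch, the codes can be piped into a small counter instead. summarize_codes is a hypothetical helper that treats anything other than 200 as a failure (curl prints 000 when it cannot connect).

```shell
# summarize_codes: reads one HTTP status code per line on stdin and prints
# "total=<n> failed=<m>". Hypothetical helper; anything but 200 counts as failed.
# The body runs in a subshell "( ... )" so its variables stay local.
summarize_codes() (
  total=0
  failed=0
  while read -r code; do
    [ -n "$code" ] || continue
    total=$((total + 1))
    [ "$code" = "200" ] || failed=$((failed + 1))
  done
  echo "total=${total} failed=${failed}"
)

# Offline demonstration with canned status codes:
printf '200\n200\n000\n502\n' | summarize_codes
# prints: total=4 failed=2

# Against a live cluster (assumes curl and the hosts entry from section 2.3.3):
# for i in $(seq 1 30); do
#   curl -sk -o /dev/null -w "%{http_code}\n" https://httpbin.example.com/status/200
#   sleep 1
# done | summarize_codes
```

A bounded run like the commented loop gives a single pass/fail figure that is easier to record than a scrolling terminal.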

3. Example Code and Configuration

3.1 Complete Configuration Examples

3.1.1 Complete production deployment manifests

# namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    env: production
    team: platform
---
# resource-quota.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "20"
    requests.memory: "40Gi"
    limits.cpu: "40"
    limits.memory: "80Gi"
    persistentvolumeclaims: "10"
    services.loadbalancers: "2"
---
# limit-range.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: production-limits
  namespace: production
spec:
  limits:
  - default:
      cpu: 500m
      memory: 512Mi
    defaultRequest:
      cpu: 100m
      memory: 128Mi
    type: Container

3.1.2 Multi-service deployment

# deployments.yaml
# User service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-service
  namespace: production
  labels:
    app: user-service
    version: v1.2.0
spec:
  replicas: 3
  selector:
    matchLabels:
      app: user-service
  template:
    metadata:
      labels:
        app: user-service
        version: v1.2.0
    spec:
      containers:
      - name: user-service
        image: example/user-service:v1.2.0
        ports:
        - containerPort: 8080
          name: http
        - containerPort: 9090
          name: metrics
        env:
        - name: SERVICE_NAME
          value: "user-service"
        - name: LOG_LEVEL
          value: "info"
        resources:
          requests:
            cpu: 200m
            memory: 256Mi
          limits:
            cpu: 1000m
            memory: 1Gi
        readinessProbe:
          httpGet:
            path: /health/ready
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 5
          timeoutSeconds: 3
          successThreshold: 1
          failureThreshold: 3
        livenessProbe:
          httpGet:
            path: /health/live
            port: 8080
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
          successThreshold: 1
          failureThreshold: 3
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            podAffinityTerm:
              labelSelector:
                matchLabels:
                  app: user-service
              topologyKey: kubernetes.io/hostname
---
# Order service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: order-service
  namespace: production
  labels:
    app: order-service
    version: v2.0.1
spec:
  replicas: 3
  selector:
    matchLabels:
      app: order-service
  template:
    metadata:
      labels:
        app: order-service
        version: v2.0.1
    spec:
      containers:
      - name: order-service
        image: example/order-service:v2.0.1
        ports:
        - containerPort: 8080
          name: http
        resources:
          requests:
            cpu: 300m
            memory: 512Mi
          limits:
            cpu: 1500m
            memory: 2Gi
        readinessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 5
        livenessProbe:
          httpGet:
            path: /actuator/health
            port: 8080
          initialDelaySeconds: 60
          periodSeconds: 15
---
# Frontend service
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: production
  labels:
    app: frontend
    version: v3.1.0
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
        version: v3.1.0
    spec:
      containers:
      - name: frontend
        image: example/frontend:v3.1.0
        ports:
        - containerPort: 80
        resources:
          requests:
            cpu: 100m
            memory: 128Mi
          limits:
            cpu: 500m
            memory: 512Mi

3.1.3 Service definitions

# services.yaml
apiVersion: v1
kind: Service
metadata:
  name: user-service
  namespace: production
  labels:
    app: user-service
spec:
  selector:
    app: user-service
  ports:
  - name: http
    port: 80
    targetPort: 8080
  - name: metrics
    port: 9090
    targetPort: 9090
---
apiVersion: v1
kind: Service
metadata:
  name: order-service
  namespace: production
  labels:
    app: order-service
spec:
  selector:
    app: order-service
  ports:
  - name: http
    port: 80
    targetPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: frontend
  namespace: production
  labels:
    app: frontend
spec:
  selector:
    app: frontend
  ports:
  - name: http
    port: 80
    targetPort: 80

3.1.4 Complete Ingress configuration

# production-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: production-ingress
  namespace: production
  annotations:
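    # SSL settings
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"

    # Security headers (configuration-snippet is rejected unless the controller is
    # installed with controller.allowSnippetAnnotations=true, which is off by default)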
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header X-Frame-Options "SAMEORIGIN" always;
      add_header X-Content-Type-Options "nosniff" always;
      add_header X-XSS-Protection "1; mode=block" always;
      add_header Referrer-Policy "strict-origin-when-cross-origin" always;
      add_header Content-Security-Policy "default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline';" always;

    # Rate limiting (applied per client IP)
    nginx.ingress.kubernetes.io/limit-rps: "100"
    nginx.ingress.kubernetes.io/limit-connections: "50"

    # Request body size
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"

    # Timeouts (seconds)
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "30"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "120"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "120"

    # Load balancing: consistent hashing on the request URI
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - www.example.com
    - api.example.com
    secretName: production-tls
  rules:
  # Frontend routes
  - host: www.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: frontend
            port:
              number: 80
  # API routes
  - host: api.example.com
    http:
      paths:
      - path: /users
        pathType: Prefix
        backend:
          service:
            name: user-service
            port:
              number: 80
      - path: /orders
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 80

3.2 Practical Use Cases

3.2.1 Multi-domain routing

Companies often run several brands or product lines, each on its own domain:

# multi-domain-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: multi-domain-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - www.brand-a.com
    secretName: brand-a-tls
  - hosts:
    - www.brand-b.com
    secretName: brand-b-tls
  - hosts:
    - api.brand-a.com
    - api.brand-b.com
    secretName: api-wildcard-tls
  rules:
  # Brand A main site
  - host: www.brand-a.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: brand-a-frontend
            port:
              number: 80
  # Brand B main site
  - host: www.brand-b.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: brand-b-frontend
            port:
              number: 80
  # Brand A API
  - host: api.brand-a.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: brand-a-api
            port:
              number: 80
  # Brand B API
  - host: api.brand-b.com
    http:
      paths:
      - path: /v1
        pathType: Prefix
        backend:
          service:
            name: brand-b-api
            port:
              number: 80

Wildcard host configuration

# wildcard-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wildcard-ingress
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
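    # Pass the full request Host to the backend, which derives the tenant subdomain from it
    # (requires snippet annotations to be enabled on the controller)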
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header X-Subdomain $host;
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - "*.saas.example.com"
    secretName: saas-wildcard-tls
  rules:
  - host: "*.saas.example.com"
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: saas-gateway
            port:
              number: 80

3.2.2 Path rewriting

For API version management, map external paths to internal services:

# path-rewrite-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    # Path rewrite rule: keep only the second capture group
    nginx.ingress.kubernetes.io/rewrite-target: /$2
    nginx.ingress.kubernetes.io/use-regex: "true"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - api.example.com
    secretName: api-tls
  rules:
  - host: api.example.com
    http:
      paths:
      # /v1/users/xxx -> /xxx (user service)
      - path: /v1/users(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: user-service
            port:
              number: 80
      # /v1/orders/xxx -> /xxx (order service)
      - path: /v1/orders(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: order-service
            port:
              number: 80
      # /v1/products/xxx -> /xxx (product service)
      - path: /v1/products(/|$)(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: product-service
            port:
              number: 80
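
The effect of the capture groups can be checked offline. This sketch approximates the first rule above with sed -E; NGINX itself uses PCRE and matches case-insensitively, so treat it as an approximation rather than the controller's exact behavior. rewrite_users_path is a hypothetical helper.

```shell
# rewrite_users_path REQUEST_PATH
# Hypothetical helper: applies path /v1/users(/|$)(.*) with rewrite-target /$2.
# Paths that do not match the pattern are printed unchanged.
rewrite_users_path() {
  printf '%s\n' "$1" | sed -E 's#^/v1/users(/|$)(.*)#/\2#'
}

rewrite_users_path /v1/users/123          # prints: /123
rewrite_users_path /v1/users              # prints: /
rewrite_users_path /v1/users/123/orders   # prints: /123/orders
rewrite_users_path /v1/usersabc           # prints: /v1/usersabc
```

The (/|$) group is what stops /v1/usersabc from matching: after "users" the path must either continue with a slash or end.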

Complex path rewrite example

# complex-rewrite.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: legacy-api-adapter
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /api/v2/$1
    nginx.ingress.kubernetes.io/use-regex: "true"
    # Add request headers that identify the origin
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header X-Original-URI $request_uri;
      proxy_set_header X-API-Version "v1-legacy";
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      # Map legacy API paths to the new version
      # /legacy/users -> /api/v2/users
      - path: /legacy/(.*)
        pathType: ImplementationSpecific
        backend:
          service:
            name: api-gateway
            port:
              number: 80

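3.2.3 Canary releases

Ingress-NGINX can split traffic along several dimensions. The variants below are alternatives: only one canary Ingress takes effect per Ingress rule, although a single canary Ingress may combine several canary annotations.

Weight-based canary

# canary-weight.yaml
# Stable Ingress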
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-stable
  namespace: production
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-stable
            port:
              number: 80
---
# Canary Ingress: receives 10% of traffic
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-canary
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    nginx.ingress.kubernetes.io/canary-weight: "10"
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary
            port:
              number: 80
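
As a rough guide to what a given weight means, the expected split is a simple proportion: canary requests = requests × weight / total, where total defaults to 100 (Ingress-NGINX also documents a canary-weight-total annotation to change it). Because the controller chooses per request at random, real counts fluctuate around these figures. split_traffic is a hypothetical helper.

```shell
# split_traffic REQUESTS WEIGHT [WEIGHT_TOTAL]
# Hypothetical helper: prints the expected stable/canary request counts.
# WEIGHT_TOTAL defaults to 100, the default scale of canary-weight.
split_traffic() {
  st_total=${3:-100}
  st_canary=$(( $1 * $2 / st_total ))
  st_stable=$(( $1 - st_canary ))
  echo "stable=${st_stable} canary=${st_canary}"
}

split_traffic 10000 10   # prints: stable=9000 canary=1000
split_traffic 10000 5    # prints: stable=9500 canary=500
```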

Header-based canary

# canary-header.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-canary-header
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
    # Route to the canary when the request carries the header X-Canary: always
    nginx.ingress.kubernetes.io/canary-by-header: "X-Canary"
    nginx.ingress.kubernetes.io/canary-by-header-value: "always"
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary
            port:
              number: 80

Cookie-based canary

# canary-cookie.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: app-canary-cookie
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/canary: "true"
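    # Route to the canary when the cookie canary=always is present
    # (canary=never forces the stable version; any other value is ignored)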
    nginx.ingress.kubernetes.io/canary-by-cookie: "canary"
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: app-canary
            port:
              number: 80
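
When several canary annotations are combined on one canary Ingress, Ingress-NGINX documents the precedence as canary-by-header, then canary-by-cookie, then canary-weight. The sketch below models that order for the default always / never semantics; it does not model canary-by-header-value. route_for is a hypothetical helper.

```shell
# route_for HEADER_VALUE COOKIE_VALUE
# Hypothetical helper. HEADER_VALUE is the value of the canary-by-header header,
# COOKIE_VALUE the value of the canary-by-cookie cookie ("" when absent).
# Prints "canary", "stable", or "weight" (fall through to canary-weight).
route_for() {
  case "$1" in
    always) echo canary; return 0 ;;
    never)  echo stable; return 0 ;;
  esac
  case "$2" in
    always) echo canary; return 0 ;;
    never)  echo stable; return 0 ;;
  esac
  echo weight
}

route_for never always   # prints: stable
route_for "" always      # prints: canary
route_for "" true        # prints: weight
```

The last call shows why a cookie value such as true has no effect: only always and never are recognized, so the request falls through to the weight rule.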

A complete canary release workflow

#!/bin/bash
# canary-deploy.sh - canary release automation script

set -e

NAMESPACE="production"
APP_NAME="myapp"
CANARY_WEIGHT=0

# Deploy the canary version
deploy_canary() {
  local version=$1
  echo "Deploying canary version: $version"

  kubectl set image deployment/${APP_NAME}-canary \
    ${APP_NAME}=example/${APP_NAME}:${version} \
    -n ${NAMESPACE}

  kubectl rollout status deployment/${APP_NAME}-canary -n ${NAMESPACE}
}

# Adjust the traffic weight
adjust_weight() {
  local weight=$1
  echo "Adjusting canary weight to: ${weight}%"

  kubectl annotate ingress ${APP_NAME}-canary \
    nginx.ingress.kubernetes.io/canary-weight="${weight}" \
    --overwrite \
    -n ${NAMESPACE}
}

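# Check the error rate
check_error_rate() {
  local threshold=$1
  # Query the canary error rate (percent) from Prometheus.
  # -G with --data-urlencode stops curl from globbing the {} and [] in the PromQL.
  local query="sum(rate(http_requests_total{service=\"${APP_NAME}-canary\",status=~\"5..\"}[5m]))/sum(rate(http_requests_total{service=\"${APP_NAME}-canary\"}[5m]))*100"
  error_rate=$(curl -sG "http://prometheus:9090/api/v1/query" --data-urlencode "query=${query}" | jq -r '.data.result[0].value[1] // "0"')
  # No traffic yields NaN (0/0); treat it as 0 so bc can compare
  if [ "$error_rate" = "NaN" ]; then error_rate=0; fi

  if (( $(echo "$error_rate > $threshold" | bc -l) )); then
    echo "Error rate ${error_rate}% exceeds threshold ${threshold}%"
    return 1
  fi
  echo "Error rate ${error_rate}% is acceptable"
  return 0
}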

# Run the progressive rollout
progressive_rollout() {
  local version=$1
  local weights=(5 10 25 50 75 100)

  deploy_canary $version

  for weight in "${weights[@]}"; do
    adjust_weight $weight
    echo "Waiting 5 minutes for traffic analysis..."
    sleep 300

    if ! check_error_rate 1.0; then
      echo "Rolling back due to high error rate"
      adjust_weight 0
      exit 1
    fi
  done

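  echo "Canary deployment successful, promoting to stable"
  # Update the stable version and wait for it to finish rolling out
  kubectl set image deployment/${APP_NAME}-stable \
    ${APP_NAME}=example/${APP_NAME}:${version} \
    -n ${NAMESPACE}
  kubectl rollout status deployment/${APP_NAME}-stable -n ${NAMESPACE}

  # Reset the canary weight
  adjust_weight 0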
}

# Main flow
progressive_rollout "v2.0.0"
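
The script compares floating-point values with bc, which is not installed on every image, and a Prometheus query can return no sample or NaN. One possible alternative, sketched here, validates the value first and compares with awk. exceeds_threshold is a hypothetical helper; returning 2 for "no usable data" is a design choice of this sketch, so the caller can decide whether missing data should block the rollout.

```shell
# exceeds_threshold RATE THRESHOLD
# Hypothetical helper. Exit status: 0 if RATE > THRESHOLD, 1 if not,
# 2 if RATE is not a plain number (empty, "null", "NaN").
exceeds_threshold() {
  printf '%s\n' "$1" | grep -Eq '^[0-9]+(\.[0-9]+)?([eE][-+]?[0-9]+)?$' || return 2
  awk -v r="$1" -v t="$2" 'BEGIN { exit !(r + 0 > t + 0) }'
}

if exceeds_threshold 2.5 1.0; then echo "rollback"; else echo "continue"; fi
# prints: rollback
```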

3.2.4 Session affinity

For applications that keep session state, configure cookie-based session affinity:

# session-affinity.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: stateful-app
  namespace: production
  annotations:
    # Enable session affinity
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/affinity-mode: "persistent"
    nginx.ingress.kubernetes.io/session-cookie-name: "SERVERID"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
    nginx.ingress.kubernetes.io/session-cookie-path: "/"
    nginx.ingress.kubernetes.io/session-cookie-samesite: "Strict"
    nginx.ingress.kubernetes.io/session-cookie-conditional-samesite-none: "true"
spec:
  ingressClassName: nginx
  rules:
  - host: stateful.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: stateful-app
            port:
              number: 80

3.2.5 WebSocket support

# websocket-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: websocket-app
  namespace: production
  annotations:
    # WebSocket settings: Ingress-NGINX proxies WebSocket upgrades out of the box,
    # so no snippet is needed; only the timeouts are raised so that idle
    # connections are not closed.
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
    nginx.ingress.kubernetes.io/upstream-hash-by: "$request_uri"
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - ws.example.com
    secretName: ws-tls
  rules:
  - host: ws.example.com
    http:
      paths:
      - path: /socket.io
        pathType: Prefix
        backend:
          service:
            name: socketio-server
            port:
              number: 80
      - path: /ws
        pathType: Prefix
        backend:
          service:
            name: websocket-server
            port:
              number: 80

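4. Best Practices and Caveats

4.1 Best Practices

4.1.1 Performance tuning

NGINX worker configuration

# ConfigMap performance settings (with a Helm install, set these under controller.config
# instead, otherwise the next helm upgrade overwrites the ConfigMap)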
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # Number of worker processes; auto matches the CPU core count
  worker-processes: "auto"

  # Maximum connections per worker
  max-worker-connections: "65535"

  # Worker CPU affinity
  worker-cpu-affinity: "auto"

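  # Asynchronous file writes through thread pools; aio-threads is not a documented
  # Ingress-NGINX ConfigMap key, enable-aio-write is assumed to be the intended option
  enable-aio-write: "true"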

  # Upstream keepalive connections
  upstream-keepalive-connections: "320"
  upstream-keepalive-timeout: "60"
  upstream-keepalive-requests: "10000"

  # Enable HTTP/2
  use-http2: "true"

  # Large-response buffering (the ConfigMap takes a buffer count; each buffer uses proxy-buffer-size)
  proxy-buffering: "on"
  proxy-buffer-size: "128k"
  proxy-buffers-number: "4"
  proxy-busy-buffers-size: "256k"

  # Response compression
  use-gzip: "true"
  gzip-level: "5"
  gzip-min-length: "256"
  gzip-types: "application/atom+xml application/javascript application/json application/rss+xml application/vnd.ms-fontobject application/x-font-opentype application/x-font-truetype application/x-font-ttf application/x-javascript application/xhtml+xml application/xml font/eot font/opentype font/otf font/truetype image/svg+xml image/vnd.microsoft.icon image/x-icon image/x-win-bitmap text/css text/javascript text/plain text/xml"

Backend Connection Pool Tuning

# Ingress for a high-concurrency service
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: high-traffic-service
  namespace: production
  annotations:
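    # Upstream keepalive (connection reuse) is a global ConfigMap setting
    # (upstream-keepalive-connections); there is no per-Ingress annotation for it.

    # Load-balancing algorithm
    nginx.ingress.kubernetes.io/load-balance: "ewma"

    # service-upstream proxies to the Service ClusterIP instead of individual Pod
    # endpoints. With a single upstream the algorithm above has no effect and
    # kube-proxy does the balancing, so keep it "false" when relying on ewma.
    nginx.ingress.kubernetes.io/service-upstream: "false"

    # Proxy buffering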
    nginx.ingress.kubernetes.io/proxy-buffering: "on"
    nginx.ingress.kubernetes.io/proxy-buffer-size: "128k"
    nginx.ingress.kubernetes.io/proxy-buffers-number: "4"
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80

Resource Sizing Baseline

Scenario	QPS	CPU Request	CPU Limit	Memory Request	Memory Limit	Replicas
Small site	< 1,000	200m	1000m	256Mi	1Gi	2
Medium application	1,000-10,000	500m	2000m	512Mi	2Gi	3
Large platform	10,000-50,000	1000m	4000m	1Gi	4Gi	5
Very large scale	> 50,000	2000m	8000m	2Gi	8Gi	10+
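
The sizing table can be turned into a small lookup for use in deployment scripts. The sketch below is illustrative only: `ingress_sizing` is a hypothetical helper name, and the tier boundaries are copied from the table rather than derived from any benchmark.

```shell
# ingress_sizing <expected_peak_qps>
# Hypothetical helper: prints the baseline from the sizing table as
# "<cpu_request> <cpu_limit> <mem_request> <mem_limit> <replicas>".
ingress_sizing() {
  qps="$1"
  if [ "$qps" -lt 1000 ]; then
    echo "200m 1000m 256Mi 1Gi 2"
  elif [ "$qps" -le 10000 ]; then
    echo "500m 2000m 512Mi 2Gi 3"
  elif [ "$qps" -le 50000 ]; then
    echo "1000m 4000m 1Gi 4Gi 5"
  else
    # The table says "10+"; 10 is used here as the starting point
    echo "2000m 8000m 2Gi 8Gi 10"
  fi
}

# Example: ingress_sizing 20000
```

Treat the result as a starting point and adjust it after load testing with real traffic.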

4.1.2 Security Configuration

TLS Hardening

# tls-security.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  # TLS protocol versions
  ssl-protocols: "TLSv1.2 TLSv1.3"

  # Cipher suites
  ssl-ciphers: "ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384"

  # Prefer the server's cipher order
  ssl-prefer-server-ciphers: "true"

  # OCSP Stapling
  enable-ocsp: "true"

  # Session resumption
  ssl-session-cache: "true"
  ssl-session-cache-size: "10m"
  ssl-session-timeout: "1d"
  ssl-session-tickets: "false"

  # HSTS
  hsts: "true"
  hsts-max-age: "31536000"
  hsts-include-subdomains: "true"
  hsts-preload: "true"

Request Limiting and Protection

# rate-limiting.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: protected-api
  namespace: production
  annotations:
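    # Rate limit: requests per second per client IP
    nginx.ingress.kubernetes.io/limit-rps: "50"

    # Concurrent connections per client IP
    nginx.ingress.kubernetes.io/limit-connections: "10"

    # Maximum request body size
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"

    # Client ranges exempt from rate limiting (CIDR)
    nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8,172.16.0.0/12"

    # Bandwidth throttling (not a response code): after the first 500 KB of a
    # response, send at most 100 KB/s per connection. Values are plain numbers
    # in kilobytes, and proxy buffering must be enabled for them to take effect.
    nginx.ingress.kubernetes.io/limit-rate-after: "500"
    nginx.ingress.kubernetes.io/limit-rate: "100"

    # ModSecurity WAF (the module must be enabled in the controller)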
    nginx.ingress.kubernetes.io/enable-modsecurity: "true"
    nginx.ingress.kubernetes.io/enable-owasp-core-rules: "true"
    nginx.ingress.kubernetes.io/modsecurity-snippet: |
      SecRuleEngine On
      SecRequestBodyAccess On
      SecAuditEngine RelevantOnly
      SecAuditLogParts ABIJDEFHZ
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: api-service
            port:
              number: 80
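
When sizing `limit-rps`, remember that Ingress-NGINX also allows short bursts: the burst size is the limit multiplied by a burst multiplier, which defaults to 5 and can be changed with the `limit-burst-multiplier` annotation. A minimal sketch of that arithmetic, where `effective_burst` is a hypothetical helper name:

```shell
# effective_burst <limit_rps> [burst_multiplier]
# Prints the burst size NGINX allows before rejecting requests.
# The multiplier defaults to 5, the documented Ingress-NGINX default.
effective_burst() {
  rps="$1"
  multiplier="${2:-5}"
  echo $(( rps * multiplier ))
}

# Example: with limit-rps "50" and the default multiplier,
# effective_burst 50 prints 250
```

A client can therefore briefly exceed the configured rate; lower the multiplier if the backend cannot absorb that burst.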

IP Allowlist and Denylist

# ip-access-control.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: admin-panel
  namespace: production
  annotations:
    # IP allowlist (to block ranges instead, the denylist-source-range annotation takes the same CIDR format)
    nginx.ingress.kubernetes.io/whitelist-source-range: "10.0.0.0/8,192.168.1.0/24,203.0.113.50/32"

    # Custom error pages
    nginx.ingress.kubernetes.io/custom-http-errors: "403,404,500,502,503"
    nginx.ingress.kubernetes.io/default-backend: error-pages
spec:
  ingressClassName: nginx
  rules:
  - host: admin.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: admin-panel
            port:
              number: 80

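Basic Auth

# Create the password file; it must be named "auth" so that the Secret key is "auth"
htpasswd -c auth admin
# Enter the password when prompted

# Create the Secret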
kubectl create secret generic basic-auth \
  --from-file=auth \
  -n production

# basic-auth-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: protected-area
  namespace: production
  annotations:
    nginx.ingress.kubernetes.io/auth-type: basic
    nginx.ingress.kubernetes.io/auth-secret: basic-auth
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required - Admin Area"
spec:
  ingressClassName: nginx
  rules:
  - host: internal.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: internal-app
            port:
              number: 80

External Authentication Service

# external-auth.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: oauth-protected
  namespace: production
  annotations:
    # External authentication URL
    nginx.ingress.kubernetes.io/auth-url: "https://auth.example.com/oauth2/auth"
    nginx.ingress.kubernetes.io/auth-signin: "https://auth.example.com/oauth2/start?rd=$escaped_request_uri"

    # Pass authentication response headers to the backend
    nginx.ingress.kubernetes.io/auth-response-headers: "X-Auth-Request-User,X-Auth-Request-Email,X-Auth-Request-Groups"

    # Cache authentication results
    nginx.ingress.kubernetes.io/auth-cache-key: "$cookie_session"
    nginx.ingress.kubernetes.io/auth-cache-duration: "200 202 401 5m"
spec:
  ingressClassName: nginx
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: protected-app
            port:
              number: 80

4.1.3 High-Availability Deployment

Pod Anti-Affinity Configuration

# ha-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  template:
    metadata:
      labels:
        app.kubernetes.io/name: ingress-nginx
    spec:
      affinity:
        # Pod anti-affinity: spread replicas across different nodes
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - ingress-nginx
            topologyKey: kubernetes.io/hostname
        # Node affinity: prefer scheduling onto edge nodes
        nodeAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
          - weight: 100
            preference:
              matchExpressions:
              - key: node-role.kubernetes.io/edge
                operator: Exists
      # Topology spread constraint: distribute across availability zones
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app.kubernetes.io/name: ingress-nginx

Pod Disruption Budget

# pdb.yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: ingress-nginx-pdb
  namespace: ingress-nginx
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/component: controller
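
With `minAvailable`, the number of voluntary evictions a PDB permits at any moment is the count of healthy Pods minus `minAvailable`, never below zero. The sketch below only illustrates that arithmetic; `pdb_allowed_disruptions` is a hypothetical helper, not a kubectl feature.

```shell
# pdb_allowed_disruptions <healthy_pods> <min_available>
# Prints how many Pods may be evicted voluntarily (for example by a node drain).
pdb_allowed_disruptions() {
  allowed=$(( $1 - $2 ))
  if [ "$allowed" -lt 0 ]; then
    allowed=0
  fi
  echo "$allowed"
}

# Example: 3 healthy replicas with minAvailable 2 allow 1 eviction;
# 2 healthy replicas with minAvailable 2 allow none, which blocks node drains.
```

This is why the replica floor of the controller should stay above `minAvailable`.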

Autoscaling

# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: ingress-nginx-hpa
  namespace: ingress-nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingress-nginx-controller
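  # Keep minReplicas above the PDB's minAvailable (2); otherwise a node drain
  # cannot evict any controller Pod once the HPA has scaled down to the minimum
  minReplicas: 3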
  maxReplicas: 20
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
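  # Custom per-Pod metric: this only works if a metrics adapter (for example
  # Prometheus Adapter) exposes a metric with this name; it is not built in
  - type: Pods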
    pods:
      metric:
        name: nginx_ingress_controller_requests_per_second
      target:
        type: AverageValue
        averageValue: "10000"
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 4
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
      selectPolicy: Max
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 120

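4.2 Caveats

4.2.1 Common Errors and Fixes

Error	Symptom	Likely cause	Resolution
502 Bad Gateway	Page shows a 502 error	Backend Pods not ready or refusing connections	Check Pod status and readinessProbe
503 Service Unavailable	Service unavailable	Endpoints list is empty	Confirm the Service selector matches the Pod labels
504 Gateway Timeout	Request times out	Backend responds too slowly	Raise proxy-read-timeout
404 Not Found	Path not found	No Ingress rule matched	Check path and pathType
SSL certificate error	Browser reports the site as not secure	Certificate expired or hostname mismatch	Update the TLS Secret
Redirect loop	ERR_TOO_MANY_REDIRECTS	Conflicting HTTP/HTTPS redirect settings	Check the ssl-redirect annotation
Upload fails	Request Entity Too Large (413)	Request body exceeds the limit	Raise proxy-body-size
WebSocket disconnects	Connections drop frequently	Timeouts too short	Raise proxy-read-timeout
Uneven traffic	Some Pods are overloaded	Session affinity or load-balancing algorithm	Check the affinity configuration
Health check failures	Pods restart frequently	livenessProbe too strict	Tune the probe parameters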
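4.2.2 Version Compatibility Matrix

The ranges below are approximate; confirm them against the supported-versions table in the Ingress-NGINX README before upgrading.

Ingress-NGINX version	Supported Kubernetes versions	NGINX version	Notable features
1.11.x	1.28 - 1.31	1.25.x	Full Ingress v1 support
1.10.x	1.27 - 1.30	1.25.x	Improved canary support
1.9.x	1.26 - 1.29	1.21.x	OpenTelemetry support
1.8.x	1.25 - 1.28	1.21.x	Enhanced ModSecurity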

4.2.3 Resource Naming Conventions

# Naming convention examples
# Ingress: <service>-<environment>-ingress
# e.g. api-production-ingress, frontend-staging-ingress

# Secret: <domain>-tls
# e.g. example-com-tls, api-example-com-tls

# ConfigMap: <component>-config
# e.g. ingress-nginx-config, app-config

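4.2.4 Notes on Configuration Changes

ConfigMap changes: the controller watches its ConfigMap and regenerates and reloads the NGINX configuration automatically, so restarting the Controller Pod is normally unnecessary. Only controller command-line flags, which live in the Deployment, require a rollout.
Annotation precedence: annotations on an Ingress override the corresponding global setting in the ConfigMap.
TLS Secret updates: after a certificate Secret is updated, the controller picks up the new certificate automatically; no restart is needed.
Canary Ingress count: each primary Ingress rule can have at most one corresponding canary Ingress.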
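
After a ConfigMap or Ingress change, the controller's own metrics show whether NGINX accepted the new configuration. The sketch below parses the Prometheus text output for the `nginx_ingress_controller_config_last_reload_successful` gauge; `reload_ok` is a hypothetical helper, and the metrics port 10254 in the usage comment is the one used elsewhere in this article.

```shell
# reload_ok: reads Prometheus text-format metrics on stdin.
# Succeeds only if the reload gauge is present and every sample equals 1.
reload_ok() {
  awk '
    /^nginx_ingress_controller_config_last_reload_successful[{ ]/ {
      found = 1
      if ($NF + 0 != 1) bad = 1
    }
    END { exit (found && !bad) ? 0 : 1 }'
}

# Usage (assumes CONTROLLER_POD holds the controller Pod name):
#   kubectl exec -n ingress-nginx "$CONTROLLER_POD" -- \
#     curl -s http://localhost:10254/metrics | reload_ok && echo "reload OK"
```

A failing check means NGINX is still serving the previous configuration; the controller log usually names the offending directive.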

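4.2.5 Production Checklist

## Pre-deployment checks
- [ ] Cluster version is compatible with the Ingress Controller version
- [ ] Resource quotas and limits are set correctly
- [ ] TLS certificates are valid for more than 30 days
- [ ] Backend services have a correct readinessProbe
- [ ] PodDisruptionBudget is configured
- [ ] HPA is configured with sensible parameters

## Configuration checks
- [ ] IngressClass is set as the default (if required)
- [ ] Routing rules do not conflict
- [ ] HTTPS redirect is enabled
- [ ] Security headers are configured
- [ ] Rate limiting is enabled

## Monitoring and alerting
- [ ] Prometheus metrics scraping is configured
- [ ] Grafana dashboard is imported
- [ ] Key alerting rules are created
- [ ] Log collection is configured

## Disaster recovery readiness
- [ ] Multiple replicas are spread across nodes
- [ ] Configuration is versioned (GitOps)
- [ ] Rollback procedure has been verified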
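
The "valid for more than 30 days" item is easy to automate once the certificate's expiry is available as a Unix timestamp. The sketch below does only the date arithmetic; `days_left` is a hypothetical helper, and obtaining the timestamp (for example from a certificate-expiry metric or from `openssl x509 -enddate`) is left to the caller.

```shell
# days_left <expiry_epoch_seconds> [now_epoch_seconds]
# Prints the whole days remaining until expiry (negative once expired).
days_left() {
  expiry="$1"
  now="${2:-$(date +%s)}"
  echo $(( (expiry - now) / 86400 ))
}

# Example gate for a pre-deployment script:
#   [ "$(days_left "$EXPIRY_EPOCH")" -gt 30 ] || echo "certificate expires too soon"
```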

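5. Troubleshooting and Monitoring

5.1 Troubleshooting

5.1.1 Diagnostic Flow

                           ┌──────────────────┐
                           │ Identify symptom │
                           └─────────┬────────┘
                                     │
             ┌───────────────────────┼───────────────────────┐
             │                       │                       │
             ▼                       ▼                       ▼
  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐
  │ Connection failure │  │ Abnormal response  │  │ Performance issue  │
  └──────────┬─────────┘  └──────────┬─────────┘  └──────────┬─────────┘
             │                       │                       │
             ▼                       ▼                       ▼
     Check Service           Check Ingress           Check controller
     and Endpoints           routing rules           resource usage
             │                       │                       │
             ▼                       ▼                       ▼
     Check Pod status        Check NGINX config      Check backend
     and readiness           and logs                response time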
5.1.2 Common Diagnostic Commands

#!/bin/bash
# diagnose-ingress.sh - Ingress diagnostic script
# Usage: diagnose-ingress.sh [controller-namespace] [ingress-name] [ingress-namespace]

NAMESPACE="${1:-ingress-nginx}"
INGRESS_NAME="${2:-}"
INGRESS_NAMESPACE="${3:-default}"

echo "=== Ingress Controller status ==="
kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/component=controller

echo -e "\n=== Controller events ==="
kubectl get events -n "$NAMESPACE" --sort-by='.lastTimestamp' | tail -20

echo -e "\n=== Ingress resources ==="
kubectl get ingress -A

if [ -n "$INGRESS_NAME" ]; then
  echo -e "\n=== Ingress details: $INGRESS_NAME ==="
  kubectl describe ingress "$INGRESS_NAME" -n "$INGRESS_NAMESPACE"
fi

echo -e "\n=== Controller configuration check ==="
CONTROLLER_POD=$(kubectl get pods -n "$NAMESPACE" -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}')

# Validate the NGINX configuration syntax
kubectl exec -n "$NAMESPACE" "$CONTROLLER_POD" -- nginx -t 2>&1

echo -e "\n=== Active connections ==="
kubectl exec -n "$NAMESPACE" "$CONTROLLER_POD" -- curl -s http://localhost:10254/metrics | grep nginx_ingress_controller_nginx_process_connections

echo -e "\n=== Upstream backends ==="
# Backends are managed dynamically, so list them with the /dbg tool bundled in the controller image
kubectl exec -n "$NAMESPACE" "$CONTROLLER_POD" -- /dbg backends list

echo -e "\n=== Recent error log lines ==="
kubectl logs -n "$NAMESPACE" "$CONTROLLER_POD" --tail=50 | grep -i error

5.1.3 Network Connectivity Tests

# Test the backend service from inside the Controller Pod
CONTROLLER_POD=$(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}')

# DNS resolution test
kubectl exec -n ingress-nginx $CONTROLLER_POD -- nslookup httpbin.default.svc.cluster.local

# TCP connectivity test
kubectl exec -n ingress-nginx $CONTROLLER_POD -- nc -zv httpbin.default.svc.cluster.local 80

# HTTP request test
kubectl exec -n ingress-nginx $CONTROLLER_POD -- curl -v http://httpbin.default.svc.cluster.local/status/200

# Inspect the Endpoints
kubectl get endpoints httpbin -o wide

5.1.4 NGINX Configuration Analysis

# Dump the current NGINX configuration
kubectl exec -n ingress-nginx \
  $(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}') \
  -- cat /etc/nginx/nginx.conf > nginx.conf.dump

# Show the server block for a specific host
kubectl exec -n ingress-nginx \
  $(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}') \
  -- cat /etc/nginx/nginx.conf | grep -A 50 "server_name httpbin.example.com"

# Show upstream backends. They are balanced dynamically and do not appear as
# per-service upstream blocks in nginx.conf, so list them with the bundled /dbg tool
kubectl exec -n ingress-nginx \
  $(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}') \
  -- /dbg backends list

# Validate the configuration syntax
kubectl exec -n ingress-nginx \
  $(kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller -o jsonpath='{.items[0].metadata.name}') \
  -- nginx -t

5.1.5 Log Analysis

# Follow the access log in real time
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f | grep "httpbin.example.com"

# Filter failed requests (5xx)
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=1000 | grep '" 5[0-9][0-9] '

# Find slow requests (request_time > 1s). In the default log format,
# request_time is the second field after the quoted user agent.
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=1000 | \
  awk -F'"' '{split($7, f, " "); if (f[2] + 0 > 1) print}'

# Status code distribution
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=10000 | \
  awk '{print $9}' | sort | uniq -c | sort -rn

# Parse JSON logs (if JSON logging is enabled); non-JSON lines are skipped
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=100 | \
  jq -rR 'fromjson? | select((.status | tonumber) >= 500) | "\(.time) \(.request) \(.status) \(.upstream_response_time)"'
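
For repeated use, the slow-request filter can be wrapped in a function that takes the threshold as an argument and prints only the interesting fields. This is a sketch that assumes the default Ingress-NGINX access log format (the same layout the Fluent Bit regex in section 5.2.5 parses); `slow_requests` is a hypothetical helper name.

```shell
# slow_requests [threshold_seconds]
# Reads access log lines on stdin and prints "<request_time> <status> <request>"
# for every request slower than the threshold (default 1 second).
slow_requests() {
  awk -F'"' -v t="${1:-1}" '
    NF >= 7 {
      split($3, a, " ")   # a[1] = status (first field after the quoted request)
      split($7, b, " ")   # b[2] = request_time (second field after the user agent)
      if (b[2] + 0 > t + 0) print b[2], a[1], $2
    }'
}

# Usage:
#   kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller --tail=1000 | slow_requests 0.5
```

Lines that are not access log entries, such as controller messages, have fewer quoted fields and are skipped.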

5.1.6 Troubleshooting Common Problems

Problem: 502 Bad Gateway

# Steps
# 1. Check the backend Pod status
kubectl get pods -l app=httpbin -o wide

# 2. Check the Service Endpoints
kubectl get endpoints httpbin

# 3. Check the Pod health conditions
kubectl describe pod -l app=httpbin | grep -A 5 "Conditions:"

# 4. Test the backend from the Controller
kubectl exec -n ingress-nginx $CONTROLLER_POD -- curl -v http://httpbin.default.svc.cluster.local/status/200

# 5. Check the readinessProbe
kubectl get pod -l app=httpbin -o jsonpath='{.items[0].spec.containers[0].readinessProbe}'

Problem: SSL Certificate Errors

# Check that the Secret exists
kubectl get secret example-tls -o yaml

# Verify the certificate validity period
kubectl get secret example-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -dates

# Check the certificate's names (browsers match hostnames against the subjectAltName list)
kubectl get secret example-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | openssl x509 -noout -subject -issuer -ext subjectAltName

# Check the Ingress TLS configuration
kubectl get ingress httpbin-tls-ingress -o jsonpath='{.spec.tls}'

Problem: Routing Rules Not Taking Effect

# Check the IngressClass
kubectl get ingressclass

# Check the Ingress status
kubectl describe ingress httpbin-ingress

# Check whether the Controller has picked up the Ingress
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller | grep "httpbin-ingress"

# Verify that the NGINX configuration contains the route
kubectl exec -n ingress-nginx $CONTROLLER_POD -- grep -c "httpbin.example.com" /etc/nginx/nginx.conf

5.2 Performance Monitoring

5.2.1 Prometheus Monitoring Configuration

# servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx
  namespace: monitoring
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  namespaceSelector:
    matchNames:
    - ingress-nginx
  endpoints:
  - port: metrics
    interval: 15s
    path: /metrics

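5.2.2 Key Metrics

Metric	Type	Description	Suggested alert threshold
`nginx_ingress_controller_requests`	Counter	Total number of requests	-
`nginx_ingress_controller_request_duration_seconds`	Histogram	Request latency distribution	P99 > 5s
`nginx_ingress_controller_response_size`	Histogram	Response size distribution	-
`nginx_ingress_controller_nginx_process_connections`	Gauge	Current connections by state (active, reading, writing, waiting)	> 80% of maximum connections
`nginx_ingress_controller_nginx_process_requests_total`	Counter	Total requests handled by NGINX	-
`nginx_ingress_controller_config_last_reload_successful`	Gauge	Whether the last configuration reload succeeded	= 0
`nginx_ingress_controller_config_hash`	Gauge	Hash of the running configuration	Alert on frequent changes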
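
For a quick check without Prometheus, the request counter can be summarized straight from the controller's metrics endpoint. The sketch below computes the cumulative share of 5xx responses from the text exposition format; `error_ratio` is a hypothetical helper, and because counters accumulate since the controller started, the figure is not comparable to a `rate()` over a time window.

```shell
# error_ratio: reads Prometheus text-format metrics on stdin and prints the
# cumulative percentage of 5xx responses with two decimals.
error_ratio() {
  awk '
    /^nginx_ingress_controller_requests[{]/ {
      total += $NF
      if ($0 ~ /status="5[0-9][0-9]"/) errors += $NF
    }
    END {
      if (total > 0) printf "%.2f\n", 100 * errors / total
      else print "0.00"
    }'
}

# Usage (assumes CONTROLLER_POD holds the controller Pod name):
#   kubectl exec -n ingress-nginx "$CONTROLLER_POD" -- \
#     curl -s http://localhost:10254/metrics | error_ratio
```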

5.2.3 Prometheus Alerting Rules

# prometheus-rules.yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ingress-nginx-alerts
  namespace: monitoring
spec:
  groups:
  - name: ingress-nginx
    interval: 30s
    rules:
    # High error rate
    - alert: IngressNginxHighErrorRate
      expr: |
        sum(rate(nginx_ingress_controller_requests{status=~"5.."}[5m])) by (ingress, namespace)
        /
        sum(rate(nginx_ingress_controller_requests[5m])) by (ingress, namespace)
        > 0.05
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "High error rate on Ingress {{ $labels.ingress }}"
        description: "Ingress {{ $labels.ingress }} in namespace {{ $labels.namespace }} has a 5xx error rate of {{ $value | humanizePercentage }}"

    # High latency
    - alert: IngressNginxHighLatency
      expr: |
        histogram_quantile(0.99,
          sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le, ingress, namespace)
        ) > 5
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "High latency on Ingress {{ $labels.ingress }}"
        description: "Ingress {{ $labels.ingress }} in namespace {{ $labels.namespace }} has a P99 latency of {{ printf \"%.2f\" $value }}s"

    # Controller unavailable (adjust the job label to match your scrape configuration)
    - alert: IngressNginxControllerDown
      expr: |
        absent(up{job="ingress-nginx-controller"} == 1)
      for: 1m
      labels:
        severity: critical
      annotations:
        summary: "Ingress NGINX Controller is down"
        description: "No Ingress NGINX Controller instance is reachable"

    # Configuration reload failed
    - alert: IngressNginxConfigReloadFailed
      expr: |
        nginx_ingress_controller_config_last_reload_successful == 0
      for: 5m
      labels:
        severity: critical
      annotations:
        summary: "Ingress NGINX configuration reload failed"
        description: "The Ingress NGINX Controller failed to reload its configuration; check the configuration"

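    # Upstream failures. The community controller does not expose a per-upstream
    # "up" gauge, so this rule watches gateway errors (502/503/504) per backend
    # Service instead; the threshold of 1 request/s is an assumption to tune.
    - alert: IngressNginxUpstreamDown
      expr: |
        sum(rate(nginx_ingress_controller_requests{status=~"502|503|504"}[5m])) by (namespace, ingress, service) > 1
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "Upstream service {{ $labels.service }} is failing"
        description: "Ingress {{ $labels.ingress }} is returning gateway errors for upstream service {{ $labels.service }}"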

    # Certificate expiring soon
    - alert: IngressCertificateExpiringSoon
      expr: |
        nginx_ingress_controller_ssl_expire_time_seconds - time() < 7 * 24 * 3600
      for: 1h
      labels:
        severity: warning
      annotations:
        summary: "SSL certificate expiring soon"
        description: "The SSL certificate for host {{ $labels.host }} expires within 7 days"

5.2.4 Grafana Dashboard

{
  "dashboard": {
    "title": "Ingress NGINX Monitoring Dashboard",
    "uid": "ingress-nginx",
    "panels": [
      {
        "title": "Request rate",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(nginx_ingress_controller_requests[5m])) by (ingress)",
            "legendFormat": "{{ ingress }}"
          }
        ]
      },
      {
        "title": "Error rate (%)",
        "type": "graph",
        "targets": [
          {
            "expr": "sum(rate(nginx_ingress_controller_requests{status=~\"5..\"}[5m])) by (ingress) / sum(rate(nginx_ingress_controller_requests[5m])) by (ingress) * 100",
            "legendFormat": "{{ ingress }}"
          }
        ]
      },
      {
        "title": "P50/P95/P99 latency",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.50, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le))",
            "legendFormat": "P50"
          },
          {
            "expr": "histogram_quantile(0.95, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le))",
            "legendFormat": "P95"
          },
          {
            "expr": "histogram_quantile(0.99, sum(rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])) by (le))",
            "legendFormat": "P99"
          }
        ]
      },
      {
        "title": "Active connections",
        "type": "gauge",
        "targets": [
          {
            "expr": "sum(nginx_ingress_controller_nginx_process_connections{state=\"active\"})"
          }
        ]
      }
    ]
  }
}

5.2.5 Log Aggregation

# fluent-bit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf

    [INPUT]
        Name          tail
        Tag           ingress.*
        Path          /var/log/containers/ingress-nginx-controller*.log
        multiline.parser  docker, cri
        Refresh_Interval 10
        Mem_Buf_Limit 5MB

    [FILTER]
        Name          parser
        Match         ingress.*
        Key_Name      log
        Parser        nginx-ingress
        Reserve_Data  On

    [OUTPUT]
        Name          es
        Match         ingress.*
        Host          elasticsearch.logging.svc.cluster.local
        Port          9200
        Index         ingress-nginx
        Type          _doc

  parsers.conf: |
    [PARSER]
        Name        nginx-ingress
        Format      regex
        Regex       ^(?<remote_addr>[^ ]*) - (?<remote_user>[^ ]*) \[(?<time_local>[^\]]*)\] "(?<request>[^"]*)" (?<status>[^ ]*) (?<body_bytes_sent>[^ ]*) "(?<http_referer>[^"]*)" "(?<http_user_agent>[^"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^\]]*)\] \[(?<proxy_alternative_upstream_name>[^\]]*)\] (?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<req_id>[^ ]*)$
        Time_Key    time_local
        Time_Format %d/%b/%Y:%H:%M:%S %z

5.3 Backup and Recovery

5.3.1 Configuration Backup Strategy

#!/bin/bash
# backup-ingress.sh - Ingress configuration backup script

BACKUP_DIR="/backup/ingress/$(date +%Y%m%d_%H%M%S)"
mkdir -p "$BACKUP_DIR"

# Back up all Ingress resources
echo "Backing up Ingress resources..."
kubectl get ingress -A -o yaml > "$BACKUP_DIR/ingress-all.yaml"

# Back up the TLS Secrets referenced by Ingress resources
echo "Backing up TLS Secrets..."
for ns in $(kubectl get ingress -A -o jsonpath='{.items[*].metadata.namespace}' | tr ' ' '\n' | sort -u); do
  for secret in $(kubectl get ingress -n "$ns" -o jsonpath='{.items[*].spec.tls[*].secretName}' | tr ' ' '\n' | sort -u); do
    if [ -n "$secret" ]; then
      kubectl get secret "$secret" -n "$ns" -o yaml > "$BACKUP_DIR/secret-${ns}-${secret}.yaml"
    fi
  done
done

# Back up the Ingress Controller configuration
echo "Backing up Ingress Controller configuration..."
kubectl get configmap -n ingress-nginx ingress-nginx-controller -o yaml > "$BACKUP_DIR/configmap-controller.yaml"
kubectl get deployment -n ingress-nginx ingress-nginx-controller -o yaml > "$BACKUP_DIR/deployment-controller.yaml"
kubectl get service -n ingress-nginx ingress-nginx-controller -o yaml > "$BACKUP_DIR/service-controller.yaml"

# Back up IngressClass resources
echo "Backing up IngressClass..."
kubectl get ingressclass -o yaml > "$BACKUP_DIR/ingressclass.yaml"

# Archive the backup
echo "Archiving backup files..."
tar -czvf "${BACKUP_DIR}.tar.gz" -C "$(dirname "$BACKUP_DIR")" "$(basename "$BACKUP_DIR")"
rm -rf "$BACKUP_DIR"

echo "Backup complete: ${BACKUP_DIR}.tar.gz"
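
Backups written this way accumulate indefinitely, so a retention step is usually added. The sketch below keeps the newest N archives and relies on the timestamped file names produced by the script (they sort chronologically); `prune_backups` is a hypothetical helper, and the default of 7 archives is an assumption.

```shell
# prune_backups <backup_dir> [keep]
# Deletes all but the newest <keep> *.tar.gz archives in <backup_dir>.
# Assumes names like 20260523_032403.tar.gz, which sort chronologically.
prune_backups() {
  dir="$1"
  keep="${2:-7}"
  ls -1 "$dir"/*.tar.gz 2>/dev/null | sort -r | tail -n +"$((keep + 1))" |
    while IFS= read -r archive; do
      rm -f -- "$archive"
    done
}

# Usage: prune_backups /backup/ingress 7
```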

5.3.2 Configuration Restore Procedure

#!/bin/bash
# restore-ingress.sh - Ingress configuration restore script

BACKUP_FILE="${1:-}"

if [ -z "$BACKUP_FILE" ]; then
  echo "Usage: $0 <backup-file.tar.gz>"
  exit 1
fi

RESTORE_DIR="/tmp/ingress-restore-$$"
mkdir -p "$RESTORE_DIR"

# Extract the backup
echo "Extracting backup archive..."
tar -xzvf "$BACKUP_FILE" -C "$RESTORE_DIR"

BACKUP_DIR=$(ls "$RESTORE_DIR")

# Restore IngressClass resources
echo "Restoring IngressClass..."
kubectl apply -f "$RESTORE_DIR/$BACKUP_DIR/ingressclass.yaml"

# Restore the Controller configuration
echo "Restoring Ingress Controller configuration..."
kubectl apply -f "$RESTORE_DIR/$BACKUP_DIR/configmap-controller.yaml"

# Restore TLS Secrets
echo "Restoring TLS Secrets..."
for secret_file in "$RESTORE_DIR/$BACKUP_DIR"/secret-*.yaml; do
  if [ -f "$secret_file" ]; then
    kubectl apply -f "$secret_file"
  fi
done

# Restore Ingress resources
echo "Restoring Ingress resources..."
kubectl apply -f "$RESTORE_DIR/$BACKUP_DIR/ingress-all.yaml"

# Clean up temporary files
rm -rf "$RESTORE_DIR"

echo "Restore complete"

5.3.3 Scheduled Backup CronJob

# backup-cronjob.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: ingress-backup
  namespace: ingress-nginx
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  concurrencyPolicy: Forbid
  successfulJobsHistoryLimit: 7
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: backup-sa
          containers:
          - name: backup
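            # The script below needs both kubectl and the AWS CLI; bitnami/kubectl
            # provides kubectl only, so substitute an image that bundles both tools
            image: bitnami/kubectl:1.31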
            command:
            - /bin/bash
            - -c
            - |
              BACKUP_DIR="/backup/ingress/$(date +%Y%m%d)"
              mkdir -p $BACKUP_DIR
              kubectl get ingress -A -o yaml > $BACKUP_DIR/ingress.yaml
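              # cert-manager records the certificate name as an annotation, not a label,
              # so select TLS Secrets by type instead of by label
              kubectl get secret -A --field-selector type=kubernetes.io/tls -o yaml > $BACKUP_DIR/secrets.yaml
              kubectl get configmap -n ingress-nginx -o yaml > $BACKUP_DIR/configmaps.yaml
              # Upload to object storage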
              aws s3 cp --recursive $BACKUP_DIR s3://backup-bucket/ingress/$(date +%Y%m%d)/
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            emptyDir: {}
          restartPolicy: OnFailure

5.3.4 Disaster Recovery Drill

#!/bin/bash
# dr-drill.sh - disaster recovery drill script

echo "=== Ingress disaster recovery drill ==="
echo "Warning: this script deletes and recreates the Ingress Controller"
read -p "Continue? (yes/no): " confirm
if [ "$confirm" != "yes" ]; then
  echo "Cancelled"
  exit 0
fi

# Record the current state
echo "1. Recording current state..."
kubectl get ingress -A > /tmp/dr-before-ingress.txt
kubectl get pods -n ingress-nginx > /tmp/dr-before-pods.txt

# Run the backup
echo "2. Running backup..."
./backup-ingress.sh

# Simulate a failure by deleting the Controller
echo "3. Simulating failure: deleting the Controller..."
kubectl delete deployment -n ingress-nginx ingress-nginx-controller

# Wait for the outage to take effect
echo "4. Waiting 30 seconds to simulate the outage..."
sleep 30

# Verify the outage
echo "5. Verifying service status..."
curl -s -o /dev/null -w "%{http_code}" http://httpbin.example.com/status/200 || echo "Service is down"

# Run the restore
echo "6. Running restore..."
LATEST_BACKUP=$(ls -t /backup/ingress/*.tar.gz | head -1)
./restore-ingress.sh "$LATEST_BACKUP"

# Redeploy the Controller
echo "7. Redeploying the Controller..."
helm upgrade --install ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --values ingress-nginx-values.yaml \
  --wait

# Verify the recovery
echo "8. Verifying recovery..."
kubectl get pods -n ingress-nginx
kubectl get ingress -A

# Test the service
echo "9. Testing service availability..."
for i in {1..10}; do
  status=$(curl -s -o /dev/null -w "%{http_code}" http://httpbin.example.com/status/200)
  echo "Test $i: HTTP $status"
  sleep 2
done

# Compare state before and after
echo "10. Comparing state before and after recovery..."
kubectl get ingress -A > /tmp/dr-after-ingress.txt
diff /tmp/dr-before-ingress.txt /tmp/dr-after-ingress.txt

echo "=== Disaster recovery drill complete ==="

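6. Summary

As the unified traffic entry point of a cluster, Kubernetes Ingress provides fine-grained traffic management through Layer 7 load balancing. This article covered Ingress in production, from architecture and deployment through worked examples to operations and monitoring.

Key Takeaways

Architecture: Ingress defines routing rules through a declarative API, and the Ingress Controller keeps the running configuration in sync. Ingress-NGINX, one of the most widely deployed Controller implementations, offers a rich set of annotation-based extensions.
Routing: rules can match on domain (Host) and path (Path), with pathType controlling how strictly paths match. TLS is terminated once at the Ingress layer, which simplifies backend service configuration.
Traffic management: canary releases are implemented with the canary annotations and can split traffic by weight, request header, or Cookie. Session affinity is configured with the affinity annotation to keep sessions of stateful applications on the same backend.
Security hardening: TLS settings, rate limiting, IP allowlists, and authentication combine into defense in depth. The ModSecurity WAF adds protection against application-layer attacks.
High availability: multiple replicas combined with Pod anti-affinity and a PDB keep the Controller available. The HPA scales it elastically to absorb traffic fluctuations.
Monitoring and operations: Prometheus metrics with Grafana dashboards provide end-to-end visibility. Alerting rules cover error rate, latency, and availability. Log analysis and configuration backups support troubleshooting and disaster recovery.

Version Evolution

The Kubernetes Gateway API has matured (it reached GA with v1.0 in October 2023), so new projects should evaluate it as an alternative to Ingress. Gateway API is more expressive and more extensible, and it is the future direction of Kubernetes traffic management. Existing Ingress configuration can be moved over gradually with an incremental migration strategy.

Practical Recommendations

Always enable HTTPS and configure HSTS in production
Choose a load-balancing algorithm that fits the workload
Build monitoring and alerting that covers your SLI/SLO metrics
Run disaster recovery drills regularly to validate the backup and restore procedure
Manage configuration as versioned code through GitOps

Compiled and published by 云栈社区; questions and feedback are welcome.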

Appendix

A. Common Annotations Quick Reference

Annotation	Description	Default
`nginx.ingress.kubernetes.io/ssl-redirect`	Force redirect to HTTPS	true
`nginx.ingress.kubernetes.io/proxy-body-size`	Maximum request body size	1m
`nginx.ingress.kubernetes.io/proxy-read-timeout`	Backend read timeout	60s
`nginx.ingress.kubernetes.io/proxy-send-timeout`	Backend send timeout	60s
`nginx.ingress.kubernetes.io/proxy-connect-timeout`	Backend connect timeout	5s
`nginx.ingress.kubernetes.io/rewrite-target`	Path rewrite target	-
`nginx.ingress.kubernetes.io/use-regex`	Enable regular-expression path matching	false
`nginx.ingress.kubernetes.io/limit-rps`	Requests-per-second limit	-
`nginx.ingress.kubernetes.io/limit-connections`	Concurrent connection limit	-
`nginx.ingress.kubernetes.io/whitelist-source-range`	IP allowlist	-
`nginx.ingress.kubernetes.io/affinity`	Session affinity type	-
`nginx.ingress.kubernetes.io/canary`	Enable canary release	false
`nginx.ingress.kubernetes.io/canary-weight`	Canary traffic weight	0
`nginx.ingress.kubernetes.io/backend-protocol`	Backend protocol	HTTP
`nginx.ingress.kubernetes.io/upstream-hash-by`	Load-balancing hash key	-

B. Troubleshooting Command Quick Reference

# Inspect Ingress status
kubectl get ingress -A
kubectl describe ingress <name> -n <namespace>

# Inspect Controller status
kubectl get pods -n ingress-nginx -l app.kubernetes.io/component=controller
kubectl logs -n ingress-nginx -l app.kubernetes.io/component=controller -f

# Inspect the NGINX configuration
kubectl exec -n ingress-nginx <controller-pod> -- cat /etc/nginx/nginx.conf
kubectl exec -n ingress-nginx <controller-pod> -- nginx -t

# Inspect Endpoints
kubectl get endpoints <service-name> -o wide

# Inspect events
kubectl get events -n ingress-nginx --sort-by='.lastTimestamp'

# Test connectivity
kubectl exec -n ingress-nginx <controller-pod> -- curl -v <backend-service>

# Inspect metrics
kubectl exec -n ingress-nginx <controller-pod> -- curl localhost:10254/metrics

C. References

Kubernetes Ingress documentation: https://kubernetes.io/docs/concepts/services-networking/ingress/
Ingress-NGINX documentation: https://kubernetes.github.io/ingress-nginx/
Ingress-NGINX GitHub repository: https://github.com/kubernetes/ingress-nginx
cert-manager documentation: https://cert-manager.io/docs/
Kubernetes Gateway API: https://gateway-api.sigs.k8s.io/
