一、 概述
为什么选择 ArgoCD + Tekton 替代 Jenkins
传统的 Jenkins 在云原生环境下常面临一些挑战,促使我们寻求更现代化的解决方案,其主要问题包括:
- 资源利用效率低:Jenkins master 节点通常需要分配大量内存(如16GB),但在大部分时间处于空闲状态,造成资源浪费。
- 插件管理复杂:庞大的插件生态带来版本冲突和安全漏洞的风险,维护成本高。
- 配置即代码不彻底:虽然 JCasC (Jenkins Configuration as Code) 提供了部分解决方案,但许多配置仍需通过界面手动调整,难以实现完全声明式管理。
- 扩展性受限:即使在 Kubernetes 环境中使用相关插件,其 Agent 的调度和资源管理也不够灵活高效。
- 用户体验待提升:Blue Ocean 项目已停止维护,默认界面对新手而言学习曲线较陡。
GitOps 的核心原则
GitOps 不仅仅是把配置文件存入 Git 仓库。它是一种以 Git 为单一事实来源(Single Source of Truth)的运维模式,其核心原则可概括为:
┌─────────────────────────────────────────────────────────────────────────┐
│ GitOps Principles │
├─────────────────────────────────────────────────────────────────────────┤
│ │
│ 1. Declarative - 所有系统与应用的期望状态均以声明式文件定义 │
│ │
│ 2. Versioned - Git 作为唯一事实来源,所有变更可追踪、可回滚 │
│ │
│ 3. Automated - 自动同步集群状态至 Git 中定义的期望状态 │
│ │
│ 4. Reconciled - 持续比对与调和,确保实际状态始终趋向声明状态 │
│ │
└─────────────────────────────────────────────────────────────────────────┘
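以 ArgoCD 为例,下面用一个最小化的 Application 清单直观说明上述原则(仓库地址、路径与应用名均为示例值):

# 示例:期望状态完全由 Git 仓库声明,ArgoCD 负责持续调和
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://gitlab.example.com/platform/gitops-manifests.git   # 单一事实来源(示例地址)
    targetRevision: main
    path: apps/demo-app/overlays/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: demo
  syncPolicy:
    automated:
      prune: true      # 清理 Git 中已删除的资源
      selfHeal: true    # 集群状态漂移时自动拉回声明状态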
技术栈选型
经过评估,我们采用了以下云原生技术栈构建完整的 GitOps 流水线:
| 组件 | 选择 | 备选方案 | 选择原因 |
| --- | --- | --- | --- |
| CI 引擎 | Tekton | Drone, GitHub Actions | 云原生设计、Kubernetes 原生、扩展性强 |
| CD 引擎 | ArgoCD | FluxCD, Spinnaker | 社区活跃、UI 友好、功能丰富 |
| 镜像仓库 | Harbor | Nexus, ACR | 企业级特性,支持安全扫描与镜像签名 |
| 制品管理 | Helm Charts OCI | ChartMuseum | 统一使用 OCI 格式,简化管理流程 |
| 密钥管理 | External Secrets | Vault Agent | 与云厂商 KMS 集成良好,使用简便 |
| 策略引擎 | Kyverno | OPA/Gatekeeper | 策略即 YAML,学习成本低,易集成 |
环境要求
基础设施:
- Kubernetes 集群:v1.28.4+
- 存储:支持 RWX (ReadWriteMany) 的存储方案(如 Rook-Ceph)
- 网络:Cilium + Ingress-NGINX
- 证书管理:cert-manager + Let's Encrypt
组件版本:
- Tekton Pipelines: v0.56.0
- Tekton Triggers: v0.27.0
- Tekton Dashboard: v0.43.0
- ArgoCD: v2.10.1
- Harbor: v2.10.0
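可以用下面几条命令快速核对基础设施是否满足要求(命名空间名称以实际安装为准):

# 确认 Kubernetes 版本
kubectl version
# 确认存在支持 RWX 的 StorageClass(如 rook-cephfs)
kubectl get storageclass
# 确认 cert-manager 与 Ingress-NGINX 已就绪
kubectl get pods -n cert-manager
kubectl get pods -n ingress-nginx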
二、 Tekton 部署与配置
2.1 安装 Tekton 核心组件
# 安装 Tekton Pipelines
kubectl apply --filename https://storage.googleapis.com/tekton-releases/pipeline/previous/v0.56.0/release.yaml
# 安装 Tekton Triggers
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/previous/v0.27.0/release.yaml
kubectl apply --filename https://storage.googleapis.com/tekton-releases/triggers/previous/v0.27.0/interceptors.yaml
# 安装 Tekton Dashboard
kubectl apply --filename https://storage.googleapis.com/tekton-releases/dashboard/previous/v0.43.0/release.yaml
# 等待所有组件就绪
kubectl wait --for=condition=available --timeout=300s deployment --all -n tekton-pipelines
2.2 Tekton 配置优化
通过 ConfigMap 调整参数以优化构建性能和资源管理:
apiVersion: v1
kind: ConfigMap
metadata:
name: feature-flags
namespace: tekton-pipelines
data:
# 启用 Beta 特性
enable-api-fields: "beta"
  # 增大结果数据大小限制(默认 4096 字节);更大的结果需改由 sidecar 日志承载,且受约 1.5MB 的 CRD 大小上限约束
  results-from: "sidecar-logs"
  max-result-size: "1048576"
# 启用步骤级资源需求
enable-step-actions: "true"
  # 集群中存在注入型 sidecar(如服务网格)时保持为 "true"(默认值)
running-in-environment-with-injected-sidecars: "true"
# 启用亲和性协同调度
coschedule: "pipelineruns"
---
apiVersion: v1
kind: ConfigMap
metadata:
name: config-defaults
namespace: tekton-pipelines
data:
# 默认使用的 ServiceAccount
default-service-account: "tekton-build"
# 默认任务超时时间(分钟)
default-timeout-minutes: "60"
# 默认 Pod 模板
default-pod-template: |
nodeSelector:
node-role.kubernetes.io/build: "true"
tolerations:
- key: "build"
operator: "Equal"
value: "true"
effect: "NoSchedule"
securityContext:
fsGroup: 65532
---
apiVersion: v1
kind: ConfigMap
metadata:
name: config-artifact-pvc
namespace: tekton-pipelines
data:
# 为工作空间使用 PVC 存储
size: "20Gi"
storageClassName: "rook-ceph-block"
2.3 创建构建专用 ServiceAccount
为流水线任务创建具备必要权限的 ServiceAccount,并挂载访问镜像仓库与 Git 仓库所需的凭据。
apiVersion: v1
kind: ServiceAccount
metadata:
name: tekton-build
namespace: tekton-pipelines
secrets:
- name: docker-credentials
- name: git-credentials
---
apiVersion: v1
kind: Secret
metadata:
name: docker-credentials
namespace: tekton-pipelines
annotations:
tekton.dev/docker-0: https://harbor.internal.company.com
type: kubernetes.io/basic-auth
stringData:
username: robot$tekton-builder
password: "${HARBOR_ROBOT_PASSWORD}"
---
apiVersion: v1
kind: Secret
metadata:
name: git-credentials
namespace: tekton-pipelines
annotations:
tekton.dev/git-0: https://gitlab.internal.company.com
type: kubernetes.io/basic-auth
stringData:
username: tekton-ci
password: "${GITLAB_ACCESS_TOKEN}"
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: tekton-build-admin
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: tekton-pipelines-controller-cluster-access
subjects:
- kind: ServiceAccount
name: tekton-build
namespace: tekton-pipelines
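上述 Secret 中的 ${HARBOR_ROBOT_PASSWORD}、${GITLAB_ACCESS_TOKEN} 为占位符,应用前需要先替换为真实凭据。一个简单的做法是借助 envsubst(文件名为示例):

# 用环境变量中的真实凭据替换占位符后再应用
export HARBOR_ROBOT_PASSWORD='<harbor-robot-password>'
export GITLAB_ACCESS_TOKEN='<gitlab-access-token>'
envsubst < tekton-build-serviceaccount.yaml | kubectl apply -f -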
三、 完整的 Pipeline 示例
以下是我们生产环境中使用的完整流水线定义,涵盖了从代码拉取、构建、测试、安全扫描到制品推送的全流程。
3.1 Task 定义
将流水线分解为可复用的 Task 模块。
# tasks/git-clone.yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: git-clone
namespace: tekton-pipelines
spec:
description: |
Clone a git repository to workspace
  params:
  - name: url
    description: Repository URL
    type: string
  - name: revision
    description: Branch, tag, or commit SHA
    type: string
    default: "main"
  - name: depth
    description: Clone depth (0 for full clone)
    type: string
    default: "1"
  - name: submodules
    description: Initialize submodules
    type: string
    default: "false"
  - name: userHome
    description: Home directory of the user executing the git steps
    type: string
    default: "/home/git"
workspaces:
- name: output
description: Workspace to clone into
- name: ssh-directory
optional: true
description: SSH key directory
results:
- name: commit
description: The commit SHA that was cloned
- name: url
description: The repository URL
- name: committer-date
description: The committer date
steps:
- name: clone
image: harbor.internal.company.com/library/git:2.43.0
env:
- name: HOME
value: "$(params.userHome)"
- name: PARAM_URL
value: $(params.url)
- name: PARAM_REVISION
value: $(params.revision)
- name: PARAM_DEPTH
value: $(params.depth)
- name: PARAM_SUBMODULES
value: $(params.submodules)
- name: WORKSPACE_OUTPUT_PATH
value: $(workspaces.output.path)
- name: WORKSPACE_SSH_DIRECTORY_PATH
value: $(workspaces.ssh-directory.path)
- name: WORKSPACE_SSH_DIRECTORY_BOUND
value: $(workspaces.ssh-directory.bound)
securityContext:
runAsNonRoot: true
runAsUser: 65532
    script: |
      #!/usr/bin/env sh
      set -eu

      # Configure SSH if provided
      if [ "${WORKSPACE_SSH_DIRECTORY_BOUND}" = "true" ]; then
        cp -R "${WORKSPACE_SSH_DIRECTORY_PATH}" "${HOME}/.ssh"
        chmod 700 "${HOME}/.ssh"
        chmod -R 400 "${HOME}"/.ssh/*
      fi

      # Fetch the requested revision; works for a branch, tag, or commit SHA
      mkdir -p "${WORKSPACE_OUTPUT_PATH}/source"
      cd "${WORKSPACE_OUTPUT_PATH}/source"
      git init -q
      git remote add origin "${PARAM_URL}"
      if [ "${PARAM_DEPTH}" = "0" ]; then
        # depth 为 0 表示完整克隆
        git fetch -q origin "${PARAM_REVISION}"
      else
        git fetch -q --depth="${PARAM_DEPTH}" origin "${PARAM_REVISION}"
      fi
      git checkout -q FETCH_HEAD

      # Handle submodules
      if [ "${PARAM_SUBMODULES}" = "true" ]; then
        git submodule update --init --recursive
      fi

      # Output results
      git rev-parse HEAD > $(results.commit.path)
      echo "${PARAM_URL}" > $(results.url.path)
      git log -1 --format=%cI > $(results.committer-date.path)
# tasks/kaniko-build.yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: kaniko-build
namespace: tekton-pipelines
spec:
description: |
Build and push container image using Kaniko
params:
- name: image
description: Full image reference including registry
type: string
- name: dockerfile
description: Path to Dockerfile
type: string
default: "./Dockerfile"
- name: context
description: Build context directory
type: string
default: "."
- name: build-args
description: Build arguments
type: array
default: []
- name: cache-repo
description: Cache repository for layers
type: string
default: ""
workspaces:
- name: source
description: Source code workspace
- name: dockerconfig
description: Docker config for registry auth
optional: true
results:
- name: IMAGE_DIGEST
description: Digest of the built image
- name: IMAGE_URL
description: Full URL of the built image
steps:
- name: build-and-push
image: harbor.internal.company.com/library/kaniko-executor:v1.21.1
    args:
    - --dockerfile=$(params.dockerfile)
    - --context=$(workspaces.source.path)/source/$(params.context)
    - --destination=$(params.image)
    - --digest-file=$(results.IMAGE_DIGEST.path)
    - --cache=true
    - --cache-repo=$(params.cache-repo)
    - --snapshot-mode=redo
    - --use-new-run
    - --compressed-caching=false
    - --cleanup
    # 数组类型参数需使用 [*] 展开为多个命令行参数
    - $(params.build-args[*])
env:
- name: DOCKER_CONFIG
value: /kaniko/.docker
securityContext:
runAsUser: 0
volumeMounts:
- name: docker-config
mountPath: /kaniko/.docker
- name: write-url
image: harbor.internal.company.com/library/bash:5.2
script: |
#!/bin/bash
set -e
DIGEST=$(cat $(results.IMAGE_DIGEST.path))
echo -n "$(params.image)@${DIGEST}" > $(results.IMAGE_URL.path)
volumes:
- name: docker-config
secret:
secretName: docker-credentials
items:
- key: .dockerconfigjson
path: config.json
# tasks/trivy-scan.yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: trivy-scan
namespace: tekton-pipelines
spec:
description: |
Scan container image for vulnerabilities using Trivy
params:
- name: image
description: Image to scan
type: string
- name: severity
description: Severity levels to report
type: string
default: "HIGH,CRITICAL"
- name: ignore-unfixed
description: Ignore unfixed vulnerabilities
type: string
default: "true"
- name: exit-code
description: Exit code on vulnerability found
type: string
default: "1"
results:
- name: scan-result
description: Scan summary
steps:
- name: scan
image: harbor.internal.company.com/library/trivy:0.48.3
env:
- name: TRIVY_NO_PROGRESS
value: "true"
- name: TRIVY_CACHE_DIR
value: "/tmp/trivy-cache"
- name: DOCKER_CONFIG
value: "/kaniko/.docker"
    args:
    - image
    - --severity=$(params.severity)
    - --ignore-unfixed=$(params.ignore-unfixed)
    - --exit-code=$(params.exit-code)
    - --format=table
    - --output=/tmp/reports/scan-report.txt
    - $(params.image)
    volumeMounts:
    - name: docker-config
      mountPath: /kaniko/.docker
    - name: trivy-cache
      mountPath: /tmp/trivy-cache
    # 两个 step 之间通过共享的 emptyDir 传递扫描报告
    - name: reports
      mountPath: /tmp/reports
    securityContext:
      runAsNonRoot: true
      runAsUser: 65532
  - name: report
    image: harbor.internal.company.com/library/bash:5.2
    volumeMounts:
    - name: reports
      mountPath: /tmp/reports
    script: |
      #!/bin/bash
      if [ -f /tmp/reports/scan-report.txt ]; then
        cat /tmp/reports/scan-report.txt
        echo "Scan completed" > $(results.scan-result.path)
      else
        echo "No vulnerabilities found" > $(results.scan-result.path)
      fi
  volumes:
  - name: docker-config
    secret:
      secretName: docker-credentials
      items:
      - key: .dockerconfigjson
        path: config.json
  - name: trivy-cache
    emptyDir: {}
  - name: reports
    emptyDir: {}
# tasks/helm-package.yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
name: helm-package
namespace: tekton-pipelines
spec:
description: |
Package Helm chart and push to OCI registry
params:
- name: chart-path
description: Path to Helm chart
type: string
- name: registry
description: OCI registry URL
type: string
- name: version
description: Chart version
type: string
workspaces:
- name: source
description: Source code workspace
results:
- name: chart-url
description: Full OCI URL of the packaged chart
steps:
- name: package-and-push
image: harbor.internal.company.com/library/helm:3.14.0
env:
- name: HELM_REGISTRY_CONFIG
value: /home/helm/.config/helm/registry/config.json
script: |
#!/bin/bash
set -e
CHART_PATH="$(workspaces.source.path)/source/$(params.chart-path)"
# Update chart version
sed -i "s/^version:.*/version: $(params.version)/" "${CHART_PATH}/Chart.yaml"
# Update dependencies
helm dependency update "${CHART_PATH}"
# Package chart
helm package "${CHART_PATH}" -d /tmp/charts
# Login to registry
echo "${HELM_PASSWORD}" | helm registry login $(params.registry) \
--username "${HELM_USERNAME}" \
--password-stdin
# Push to OCI registry
CHART_FILE=$(ls /tmp/charts/*.tgz)
helm push "${CHART_FILE}" oci://$(params.registry)/charts
# Output chart URL
CHART_NAME=$(basename "${CHART_FILE}" .tgz)
echo "oci://$(params.registry)/charts/${CHART_NAME}" > $(results.chart-url.path)
envFrom:
- secretRef:
name: helm-credentials
securityContext:
runAsNonRoot: true
runAsUser: 65532
3.2 Pipeline 定义
组合各个 Task,定义完整的 CI/CD 工作流。
# pipelines/build-and-deploy.yaml
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
name: build-and-deploy
namespace: tekton-pipelines
spec:
description: |
Complete CI pipeline: clone -> build -> scan -> package -> deploy
params:
- name: git-url
description: Git repository URL
type: string
- name: git-revision
description: Git revision to build
type: string
default: "main"
- name: image-name
description: Container image name
type: string
- name: image-tag
description: Container image tag
type: string
- name: chart-path
description: Path to Helm chart in repository
type: string
default: "deploy/helm"
- name: environment
description: Target environment
type: string
default: "dev"
workspaces:
- name: shared-workspace
description: Shared workspace for all tasks
- name: docker-config
description: Docker registry credentials
- name: ssh-creds
optional: true
description: SSH credentials for git
results:
- name: image-digest
description: Digest of built image
value: $(tasks.build.results.IMAGE_DIGEST)
- name: image-url
description: Full image URL with digest
value: $(tasks.build.results.IMAGE_URL)
- name: commit-sha
description: Git commit SHA
value: $(tasks.clone.results.commit)
tasks:
- name: clone
taskRef:
name: git-clone
params:
- name: url
value: $(params.git-url)
- name: revision
value: $(params.git-revision)
- name: depth
value: "0"
workspaces:
- name: output
workspace: shared-workspace
- name: ssh-directory
workspace: ssh-creds
- name: unit-test
runAfter:
- clone
taskSpec:
workspaces:
- name: source
steps:
- name: test
image: harbor.internal.company.com/library/golang:1.22
workingDir: $(workspaces.source.path)/source
script: |
#!/bin/bash
set -e
go mod download
go test -v -race -coverprofile=coverage.out ./...
go tool cover -func=coverage.out
env:
- name: GOPROXY
value: "https://goproxy.io,direct"
- name: GOFLAGS
value: "-mod=readonly"
workspaces:
- name: source
workspace: shared-workspace
- name: lint
runAfter:
- clone
taskSpec:
workspaces:
- name: source
steps:
- name: golangci-lint
image: harbor.internal.company.com/library/golangci-lint:v1.55.2
workingDir: $(workspaces.source.path)/source
script: |
#!/bin/bash
golangci-lint run --timeout 5m --out-format=colored-line-number
workspaces:
- name: source
workspace: shared-workspace
- name: build
runAfter:
- unit-test
- lint
taskRef:
name: kaniko-build
params:
- name: image
value: harbor.internal.company.com/$(params.image-name):$(params.image-tag)
- name: dockerfile
value: "./Dockerfile"
- name: context
value: "."
- name: cache-repo
value: harbor.internal.company.com/cache/$(params.image-name)
- name: build-args
value:
- "--build-arg=VERSION=$(params.image-tag)"
- "--build-arg=COMMIT=$(tasks.clone.results.commit)"
workspaces:
- name: source
workspace: shared-workspace
- name: dockerconfig
workspace: docker-config
- name: scan
runAfter:
- build
taskRef:
name: trivy-scan
params:
- name: image
value: harbor.internal.company.com/$(params.image-name):$(params.image-tag)
- name: severity
value: "HIGH,CRITICAL"
- name: exit-code
value: "1"
- name: helm-package
runAfter:
- scan
taskRef:
name: helm-package
params:
- name: chart-path
value: $(params.chart-path)
- name: registry
value: harbor.internal.company.com
- name: version
value: $(params.image-tag)
workspaces:
- name: source
workspace: shared-workspace
- name: update-manifest
runAfter:
- helm-package
taskSpec:
params:
- name: image-tag
type: string
- name: image-name
type: string
- name: environment
type: string
steps:
- name: update-values
image: harbor.internal.company.com/library/git:2.43.0
env:
- name: GIT_TOKEN
valueFrom:
secretKeyRef:
name: git-credentials
key: password
script: |
#!/bin/bash
set -e
# Clone GitOps repository
git clone https://tekton-ci:${GIT_TOKEN}@gitlab.internal.company.com/platform/gitops-manifests.git /tmp/gitops
cd /tmp/gitops
# Update image tag in values file
ENV_VALUES="environments/$(params.environment)/$(params.image-name)/values.yaml"
yq e -i ".image.tag = \"$(params.image-tag)\"" "${ENV_VALUES}"
yq e -i ".image.digest = \"$(tasks.build.results.IMAGE_DIGEST)\"" "${ENV_VALUES}"
# Commit and push
git config user.email "tekton@company.com"
git config user.name "Tekton CI"
git add .
git commit -m "chore($(params.environment)): update $(params.image-name) to $(params.image-tag)
Image: harbor.internal.company.com/$(params.image-name):$(params.image-tag)
Digest: $(tasks.build.results.IMAGE_DIGEST)
Commit: $(tasks.clone.results.commit)"
git push origin main
params:
- name: image-tag
value: $(params.image-tag)
- name: image-name
value: $(params.image-name)
- name: environment
value: $(params.environment)
finally:
- name: notify
taskSpec:
params:
- name: status
type: string
- name: image-name
type: string
- name: image-tag
type: string
steps:
- name: send-notification
image: harbor.internal.company.com/library/curl:8.5.0
script: |
#!/bin/bash
STATUS="${PIPELINE_STATUS:-unknown}"
# Send to Slack
curl -X POST "${SLACK_WEBHOOK_URL}" \
-H "Content-Type: application/json" \
-d "{
\"text\": \"Pipeline ${STATUS}\",
\"blocks\": [
{
\"type\": \"section\",
\"text\": {
\"type\": \"mrkdwn\",
\"text\": \"*Pipeline ${STATUS}*\n*Image:* $(params.image-name):$(params.image-tag)\"
}
}
]
}"
env:
- name: SLACK_WEBHOOK_URL
valueFrom:
secretKeyRef:
name: slack-webhook
key: url
- name: PIPELINE_STATUS
value: "$(tasks.status)"
params:
- name: status
value: "$(tasks.status)"
- name: image-name
value: $(params.image-name)
- name: image-tag
value: $(params.image-tag)
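在配置 Webhook 之前,可以先手动创建一个 PipelineRun 验证整条流水线能否跑通,下面是一个示意(参数值均为示例):

apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
  generateName: build-and-deploy-manual-
  namespace: tekton-pipelines
spec:
  pipelineRef:
    name: build-and-deploy
  taskRunTemplate:
    serviceAccountName: tekton-build
  params:
  - name: git-url
    value: https://gitlab.internal.company.com/apps/order-service.git   # 示例仓库
  - name: git-revision
    value: main
  - name: image-name
    value: apps/order-service
  - name: image-tag
    value: manual-test
  workspaces:
  - name: shared-workspace
    volumeClaimTemplate:
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 5Gi
        storageClassName: rook-ceph-block
  - name: docker-config
    secret:
      secretName: docker-credentials

提交后可用 tkn pipelinerun logs -f 跟踪执行情况。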
3.3 Trigger 配置
配置 Tekton Triggers,实现 GitLab Webhook 自动触发流水线。
# triggers/gitlab-push-trigger.yaml
apiVersion: triggers.tekton.dev/v1beta1
kind: TriggerTemplate
metadata:
name: gitlab-push-template
namespace: tekton-pipelines
spec:
params:
- name: git-url
description: Git repository URL
- name: git-revision
description: Git revision (commit SHA)
- name: git-branch
description: Git branch name
- name: project-name
description: GitLab project name
- name: commit-message
description: Commit message
resourcetemplates:
- apiVersion: tekton.dev/v1
kind: PipelineRun
metadata:
generateName: build-$(tt.params.project-name)-
namespace: tekton-pipelines
labels:
tekton.dev/pipeline: build-and-deploy
app.kubernetes.io/project: $(tt.params.project-name)
spec:
pipelineRef:
name: build-and-deploy
params:
- name: git-url
value: $(tt.params.git-url)
- name: git-revision
value: $(tt.params.git-revision)
- name: image-name
value: apps/$(tt.params.project-name)
- name: image-tag
value: $(tt.params.git-revision)
- name: chart-path
value: deploy/helm
- name: environment
value: dev
workspaces:
- name: shared-workspace
volumeClaimTemplate:
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 5Gi
storageClassName: rook-ceph-block
- name: docker-config
secret:
secretName: docker-credentials
taskRunSpecs:
- pipelineTaskName: build
podTemplate:
nodeSelector:
node-role.kubernetes.io/build: "true"
tolerations:
- key: "build"
operator: "Equal"
value: "true"
effect: "NoSchedule"
---
apiVersion: triggers.tekton.dev/v1beta1
kind: TriggerBinding
metadata:
name: gitlab-push-binding
namespace: tekton-pipelines
spec:
params:
- name: git-url
value: $(body.project.git_http_url)
- name: git-revision
value: $(body.checkout_sha)
- name: git-branch
value: $(body.ref)
- name: project-name
value: $(body.project.name)
- name: commit-message
value: $(body.commits[0].message)
---
apiVersion: triggers.tekton.dev/v1beta1
kind: EventListener
metadata:
name: gitlab-listener
namespace: tekton-pipelines
spec:
serviceAccountName: tekton-triggers-sa
triggers:
- name: gitlab-push
interceptors:
- ref:
name: gitlab
params:
- name: secretRef
value:
secretName: gitlab-webhook-secret
secretKey: token
- name: eventTypes
value:
- Push Hook
- ref:
name: cel
params:
- name: filter
value: |
body.ref.startsWith('refs/heads/main') ||
body.ref.startsWith('refs/heads/release/')
- name: overlays
value:
- key: branch
expression: "body.ref.split('/')[2]"
bindings:
- ref: gitlab-push-binding
template:
ref: gitlab-push-template
resources:
kubernetesResource:
spec:
template:
spec:
containers:
- resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "256Mi"
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: tekton-triggers-sa
namespace: tekton-pipelines
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: tekton-triggers-binding
namespace: tekton-pipelines
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: tekton-triggers-eventlistener-roles
subjects:
- kind: ServiceAccount
name: tekton-triggers-sa
namespace: tekton-pipelines
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: tekton-triggers-clusterbinding
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: tekton-triggers-eventlistener-clusterroles
subjects:
- kind: ServiceAccount
name: tekton-triggers-sa
namespace: tekton-pipelines
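EventListener 创建后会在集群内生成名为 el-gitlab-listener 的 Service(监听 8080 端口)。还需要创建 Webhook 校验密钥,并将监听地址暴露给 GitLab,下面是一个假设性的示例(域名与 token 为示例值):

# 创建 GitLab Webhook 校验密钥
kubectl create secret generic gitlab-webhook-secret \
  -n tekton-pipelines \
  --from-literal=token='<random-webhook-token>'

# 通过 Ingress 暴露 EventListener,GitLab 项目的 Webhook URL 指向该地址
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: gitlab-listener
  namespace: tekton-pipelines
spec:
  ingressClassName: nginx
  rules:
  - host: tekton-webhook.internal.company.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: el-gitlab-listener
            port:
              number: 8080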
四、 ArgoCD 部署与配置
4.1 安装 ArgoCD
# 创建命名空间
kubectl create namespace argocd
# 安装 ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/v2.10.1/manifests/install.yaml
# 等待所有组件就绪
kubectl wait --for=condition=available --timeout=300s deployment --all -n argocd
4.2 ArgoCD 配置优化
通过 ConfigMap 对 ArgoCD 进行个性化配置,以更好地集成到现有环境中。
# argocd-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-cm
namespace: argocd
data:
# 应用控制器设置
timeout.reconciliation: 180s
# 仓库设置
repositories: |
- url: https://gitlab.internal.company.com/platform/gitops-manifests.git
type: git
passwordSecret:
name: repo-creds
key: password
usernameSecret:
name: repo-creds
key: username
- url: harbor.internal.company.com
type: helm
name: harbor
enableOCI: "true"
passwordSecret:
name: harbor-creds
key: password
usernameSecret:
name: harbor-creds
key: username
# OIDC 配置
oidc.config: |
name: Keycloak
issuer: https://sso.internal.company.com/realms/company
clientID: argocd
clientSecret: $oidc.keycloak.clientSecret
requestedScopes:
- openid
- profile
- email
- groups
# 资源健康检查自定义
resource.customizations: |
networking.k8s.io/Ingress:
health.lua: |
hs = {}
hs.status = "Healthy"
return hs
argoproj.io/Rollout:
health.lua: |
function checkReplicasStatus(obj)
hs = {}
replicasCount = getNumberValueOrDefault(obj.spec.replicas, 1)
replicasStatus = getNumberValueOrDefault(obj.status.replicas, 0)
updatedReplicas = getNumberValueOrDefault(obj.status.updatedReplicas, 0)
availableReplicas = getNumberValueOrDefault(obj.status.availableReplicas, 0)
if updatedReplicas < replicasCount then
hs.status = "Progressing"
hs.message = "Waiting for rollout to finish: updated replicas " .. updatedReplicas .. "/" .. replicasCount
return hs
end
if availableReplicas < updatedReplicas then
hs.status = "Progressing"
hs.message = "Waiting for replicas to become available"
return hs
end
hs.status = "Healthy"
return hs
end
hs = checkReplicasStatus(obj)
return hs
# Kustomize 构建选项
kustomize.buildOptions: --enable-helm --load-restrictor LoadRestrictionsNone
# 启用应用状态徽章
statusbadge.enabled: "true"
---
# argocd-cmd-params-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-cmd-params-cm
namespace: argocd
data:
# 控制器设置
controller.status.processors: "50"
controller.operation.processors: "25"
controller.self.heal.timeout.seconds: "5"
controller.repo.server.timeout.seconds: "180"
# 仓库服务器设置
reposerver.parallelism.limit: "10"
# 服务器设置
server.insecure: "true" # TLS 在 Ingress 层终止
# Redis 设置
redis.server: argocd-redis-ha-haproxy:6379
---
# argocd-rbac-cm.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-rbac-cm
namespace: argocd
data:
policy.default: role:readonly
policy.csv: |
# 平台团队 - 管理员角色
g, platform-team, role:admin
# 开发者角色 - 可同步和查看
p, role:developer, applications, get, */*, allow
p, role:developer, applications, sync, */*, allow
p, role:developer, logs, get, */*, allow
p, role:developer, exec, create, */*, allow
g, developers, role:developer
# 查看者角色
p, role:viewer, applications, get, */*, allow
p, role:viewer, logs, get, */*, allow
g, viewers, role:viewer
scopes: '[groups, email]'
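argocd-cm 中引用的 repo-creds、harbor-creds 等 Secret 需要提前在 argocd 命名空间中创建,并包含 username、password 两个键,例如(凭据为示例值):

apiVersion: v1
kind: Secret
metadata:
  name: repo-creds
  namespace: argocd
type: Opaque
stringData:
  username: argocd-ci
  password: "<gitlab-access-token>"

harbor-creds 的结构与之相同,填入 Harbor 机器人账号的用户名与密码即可。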
4.3 高可用配置
对于生产环境,建议启用高可用配置以提升可靠性和性能。
# argocd-ha-values.yaml (Helm 安装使用)
controller:
replicas: 2
env:
- name: ARGOCD_CONTROLLER_REPLICAS
value: "2"
resources:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "2"
memory: "2Gi"
metrics:
enabled: true
serviceMonitor:
enabled: true
namespace: monitoring
server:
replicas: 3
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
targetCPUUtilizationPercentage: 80
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "1"
memory: "1Gi"
ingress:
enabled: true
ingressClassName: nginx
hosts:
- argocd.internal.company.com
tls:
- secretName: argocd-tls
hosts:
- argocd.internal.company.com
    annotations:
      # 与 argocd-cmd-params-cm 中 server.insecure: "true" 保持一致:TLS 在 Ingress 层终止,后端走 HTTP
      nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
      nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
metrics:
enabled: true
serviceMonitor:
enabled: true
namespace: monitoring
repoServer:
replicas: 3
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 10
resources:
requests:
cpu: "200m"
memory: "256Mi"
limits:
cpu: "2"
memory: "2Gi"
metrics:
enabled: true
serviceMonitor:
enabled: true
redis:
enabled: false
redis-ha:
enabled: true
haproxy:
enabled: true
replicas: 3
redis:
replicas: 3
resources:
requests:
cpu: "200m"
memory: "256Mi"
applicationSet:
replicas: 2
resources:
requests:
cpu: "100m"
memory: "128Mi"
notifications:
enabled: true
secret:
create: true
cm:
create: true
4.4 ApplicationSet 多环境部署
使用 ArgoCD ApplicationSet 批量管理跨多个环境和服务的应用部署。
# applicationsets/multi-env-apps.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: microservices
namespace: argocd
spec:
generators:
- matrix:
generators:
# 第一个生成器:服务列表
- list:
elements:
- service: order-service
path: apps/order-service
- service: inventory-service
path: apps/inventory-service
- service: payment-service
path: apps/payment-service
- service: user-service
path: apps/user-service
# 第二个生成器:环境列表
- list:
elements:
- env: dev
cluster: https://kubernetes.default.svc
namespace: dev
syncPolicy: automated
- env: staging
cluster: https://kubernetes.default.svc
namespace: staging
syncPolicy: automated
- env: production
cluster: https://prod-cluster.internal:6443
namespace: production
syncPolicy: manual
template:
metadata:
name: '{{service}}-{{env}}'
labels:
app: '{{service}}'
env: '{{env}}'
spec:
project: default
source:
repoURL: https://gitlab.internal.company.com/platform/gitops-manifests.git
targetRevision: main
path: '{{path}}/overlays/{{env}}'
destination:
server: '{{cluster}}'
namespace: '{{namespace}}'
syncPolicy:
automated:
prune: true
selfHeal: true
allowEmpty: false
syncOptions:
- CreateNamespace=true
- PrunePropagationPolicy=foreground
- PruneLast=true
retry:
limit: 5
backoff:
duration: 5s
factor: 2
maxDuration: 3m
ignoreDifferences:
- group: apps
kind: Deployment
jsonPointers:
- /spec/replicas
- group: autoscaling
kind: HorizontalPodAutoscaler
jsonPointers:
- /spec/minReplicas
- /spec/maxReplicas
---
# applicationsets/cluster-addons.yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
name: cluster-addons
namespace: argocd
spec:
generators:
- clusters:
selector:
matchLabels:
env: production
template:
metadata:
name: 'addons-{{name}}'
spec:
project: platform
source:
repoURL: https://gitlab.internal.company.com/platform/cluster-addons.git
targetRevision: main
path: addons
destination:
server: '{{server}}'
namespace: kube-system
syncPolicy:
automated:
prune: true
selfHeal: true
五、 GitOps 工作流实践
5.1 目录结构设计
一个清晰的 GitOps 仓库结构是成功实施的关键。我们推荐如下结构:
gitops-manifests/
├── apps/ # 应用定义
│ ├── order-service/
│ │ ├── base/ # 基础配置,所有环境通用
│ │ │ ├── deployment.yaml
│ │ │ ├── service.yaml
│ │ │ ├── configmap.yaml
│ │ │ └── kustomization.yaml
│ │ └── overlays/ # 环境覆盖配置
│ │ ├── dev/
│ │ │ ├── kustomization.yaml
│ │ │ ├── values.yaml
│ │ │ └── patches/
│ │ │ └── replica-count.yaml
│ │ ├── staging/
│ │ │ ├── kustomization.yaml
│ │ │ ├── values.yaml
│ │ │ └── patches/
│ │ │ └── resources.yaml
│ │ └── production/
│ │ ├── kustomization.yaml
│ │ ├── values.yaml
│ │ └── patches/
│ │ ├── resources.yaml
│ │ └── hpa.yaml
│ └── inventory-service/
│ └── ...
├── environments/ # 环境特定的值文件(供 Tekton 更新)
│ ├── dev/
│ │ └── order-service/
│ │ └── values.yaml
│ ├── staging/
│ │ └── order-service/
│ │ └── values.yaml
│ └── production/
│ └── order-service/
│ └── values.yaml
├── infrastructure/ # 集群基础设施(CNI, Ingress, 监控等)
│ ├── cert-manager/
│ ├── ingress-nginx/
│ ├── monitoring/
│ └── external-secrets/
├── policies/ # Kyverno 策略
│ ├── require-labels.yaml
│ ├── restrict-registries.yaml
│ └── require-probes.yaml
└── projects/ # ArgoCD 项目定义
├── platform.yaml
└── applications.yaml
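其中 projects/ 目录存放 ArgoCD 的 AppProject 定义,以 platform.yaml 为例,大致如下(权限范围为示意,应按需收紧):

# projects/platform.yaml(示意)
apiVersion: argoproj.io/v1alpha1
kind: AppProject
metadata:
  name: platform
  namespace: argocd
spec:
  description: 平台与基础设施组件
  sourceRepos:
  - https://gitlab.internal.company.com/platform/*
  destinations:
  - server: '*'
    namespace: '*'
  clusterResourceWhitelist:
  - group: '*'
    kind: '*'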
5.2 Kustomize 配置示例
利用 Kustomize 实现配置的复用和环境差异化。
# apps/order-service/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
- deployment.yaml
- service.yaml
- configmap.yaml
- serviceaccount.yaml
commonLabels:
app.kubernetes.io/name: order-service
app.kubernetes.io/component: backend
configMapGenerator:
- name: order-service-config
files:
- config.yaml
# apps/order-service/overlays/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
- ../../base
patches:
- path: patches/resources.yaml
- path: patches/hpa.yaml
replicas:
- name: order-service
count: 3
images:
- name: order-service
newName: harbor.internal.company.com/apps/order-service
newTag: "v1.5.2"
configMapGenerator:
- name: order-service-config
behavior: merge
literals:
- LOG_LEVEL=info
- METRICS_ENABLED=true
# apps/order-service/overlays/production/patches/resources.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: order-service
spec:
template:
spec:
containers:
- name: order-service
resources:
requests:
cpu: "500m"
memory: "512Mi"
limits:
cpu: "2"
memory: "2Gi"
# apps/order-service/overlays/production/patches/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: order-service
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: order-service
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
5.3 渐进式发布(Progressive Delivery)
集成 Argo Rollouts 实现金丝雀发布,降低生产环境发布风险。对微服务架构而言,平滑、可快速回退的发布流程尤为重要。
# apps/order-service/base/rollout.yaml
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: order-service
spec:
replicas: 3
revisionHistoryLimit: 5
selector:
matchLabels:
app: order-service
template:
metadata:
labels:
app: order-service
spec:
containers:
- name: order-service
image: harbor.internal.company.com/apps/order-service:latest
ports:
- containerPort: 8080
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 15
periodSeconds: 20
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
  strategy:
    canary:
      # 使用 nginx trafficRouting 时需显式指定稳定/金丝雀 Service(名称为示意,需在 base 中定义)
      stableService: order-service
      canaryService: order-service-canary
      maxSurge: "25%"
      maxUnavailable: 0
steps:
- setWeight: 5
- pause: { duration: 2m }
- setWeight: 20
- pause: { duration: 5m }
- setWeight: 50
- pause: { duration: 10m }
- setWeight: 80
- pause: { duration: 5m }
analysis:
templates:
- templateName: success-rate
startingStep: 1
args:
- name: service-name
value: order-service
trafficRouting:
nginx:
stableIngress: order-service-stable
annotationPrefix: nginx.ingress.kubernetes.io
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
name: success-rate
spec:
args:
- name: service-name
metrics:
- name: success-rate
interval: 1m
successCondition: result[0] >= 0.95
failureLimit: 3
provider:
prometheus:
address: http://prometheus.monitoring.svc:9090
query: |
sum(rate(http_requests_total{service="{{args.service-name}}", status=~"2.."}[5m]))
/
sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
- name: latency-p99
interval: 1m
successCondition: result[0] < 500
failureLimit: 3
provider:
prometheus:
address: http://prometheus.monitoring.svc:9090
query: |
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket{service="{{args.service-name}}"}[5m])) by (le)) * 1000
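nginx trafficRouting 还依赖预先存在的稳定版 Service/Ingress 与金丝雀 Service:Rollout 会基于稳定版 Ingress 自动生成 canary Ingress 并按步骤调整权重。下面是一个假设的示例(域名等为示例值):

# 金丝雀 Service(示意):selector 与稳定版相同,由 Rollout 接管指向新版本 Pod
apiVersion: v1
kind: Service
metadata:
  name: order-service-canary
spec:
  selector:
    app: order-service
  ports:
  - port: 8080
    targetPort: 8080
---
# 稳定版 Ingress(示意):即 strategy.canary.trafficRouting.nginx.stableIngress 所引用的对象
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: order-service-stable
spec:
  ingressClassName: nginx
  rules:
  - host: order.internal.company.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: order-service
            port:
              number: 8080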
六、 故障排查与监控
6.1 Tekton 排查命令
掌握以下命令有助于快速定位流水线问题。
# 列出所有 PipelineRun
tkn pipelinerun list
# 查看特定 PipelineRun 详情
tkn pipelinerun describe <pipelinerun-name>
# 查看 PipelineRun 日志
tkn pipelinerun logs <pipelinerun-name> -f
# 列出所有 TaskRun
tkn taskrun list
# 查看 PipelineRun 中特定任务的日志
tkn pipelinerun logs <pipelinerun-name> -t <task-name>
# 取消正在运行的 PipelineRun
tkn pipelinerun cancel <pipelinerun-name>
# 删除已完成的 PipelineRun(保留最近5个)
tkn pipelinerun delete --keep 5
# 检查 EventListener 日志
kubectl logs -l app.kubernetes.io/part-of=tekton-triggers -n tekton-pipelines
# 调试 Webhook 问题
kubectl logs -l eventlistener=gitlab-listener -n tekton-pipelines
6.2 ArgoCD 排查命令
# 获取 ArgoCD 初始管理员密码
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
# 使用 CLI 登录 ArgoCD
argocd login argocd.internal.company.com --username admin
# 列出所有应用
argocd app list
# 获取应用详情
argocd app get <app-name>
# 显示应用与 Git 的差异
argocd app diff <app-name>
# 强制同步应用
argocd app sync <app-name> --force
# 回滚到上一个版本
argocd app rollback <app-name>
# 查看应用同步历史
argocd app history <app-name>
# 检查仓库连接状态
argocd repo list
argocd repo get <repo-url>
# 调试同步问题(干跑模式)
argocd app sync <app-name> --dry-run
# 查看控制器日志
kubectl logs -l app.kubernetes.io/name=argocd-application-controller -n argocd
# 查看仓库服务器日志
kubectl logs -l app.kubernetes.io/name=argocd-repo-server -n argocd
6.3 监控配置
为 CI/CD 系统配置 Prometheus 告警规则,实现主动监控。
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: cicd-alerts
namespace: monitoring
spec:
groups:
- name: tekton
rules:
- alert: TektonPipelineRunFailed
expr: |
sum(tekton_pipelinerun_count{status="failed"}) by (pipeline, namespace) > 0
for: 1m
labels:
severity: warning
annotations:
summary: "Tekton pipeline run failed"
description: "Pipeline {{ $labels.pipeline }} in {{ $labels.namespace }} has failed"
- alert: TektonPipelineRunStuck
expr: |
time() - tekton_pipelinerun_start_time{status="running"} > 3600
for: 5m
labels:
severity: warning
annotations:
summary: "Tekton pipeline run stuck"
description: "Pipeline run has been running for more than 1 hour"
- name: argocd
rules:
- alert: ArgoCDApplicationOutOfSync
expr: |
argocd_app_info{sync_status!="Synced"} == 1
for: 30m
labels:
severity: warning
annotations:
summary: "ArgoCD application out of sync"
description: "Application {{ $labels.name }} is out of sync for 30 minutes"
- alert: ArgoCDApplicationDegraded
expr: |
argocd_app_info{health_status="Degraded"} == 1
for: 5m
labels:
severity: critical
annotations:
summary: "ArgoCD application degraded"
description: "Application {{ $labels.name }} is in degraded state"
- alert: ArgoCDSyncFailed
expr: |
increase(argocd_app_sync_total{phase="Failed"}[10m]) > 0
for: 1m
labels:
severity: warning
annotations:
summary: "ArgoCD sync failed"
description: "Application {{ $labels.name }} sync failed"
七、 最佳实践与注意事项
7.1 安全加固
使用 Kyverno 策略引擎实施安全策略。
# Kyverno policy: 限制镜像仓库来源
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: restrict-image-registries
spec:
validationFailureAction: Enforce
background: true
rules:
- name: validate-registries
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Images must come from approved registries"
      pattern:
        spec:
          containers:
          - image: "harbor.internal.company.com/*"
          # initContainers 为可选字段,使用 =() 锚点避免强制要求其存在
          =(initContainers):
          - image: "harbor.internal.company.com/*"
---
# Kyverno policy: 要求安全上下文
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
name: require-security-context
spec:
validationFailureAction: Enforce
rules:
- name: require-run-as-non-root
match:
any:
- resources:
kinds:
- Pod
validate:
message: "Containers must run as non-root"
pattern:
spec:
securityContext:
runAsNonRoot: true
containers:
- securityContext:
runAsNonRoot: true
allowPrivilegeEscalation: false
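前文目录结构中提到的 require-probes.yaml 大致如下,用于要求容器声明就绪与存活探针(此处为示意,可先以 Audit 模式观察再逐步收紧):

# Kyverno policy: 要求容器声明探针
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-probes
spec:
  validationFailureAction: Audit
  background: true
  rules:
  - name: validate-probes
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Liveness and readiness probes are required"
      pattern:
        spec:
          containers:
          - livenessProbe:
              periodSeconds: ">0"
            readinessProbe:
              periodSeconds: ">0"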
7.2 性能优化
Tekton 优化建议:
- 使用 PVC 持久化缓存 Maven、npm 等依赖,加速构建。
- 启用 Kaniko 的缓存层功能,减少镜像层重复构建。
- 为长时间运行的 Task 设置合理的超时时间。
- 使用 Matrix 策略并行执行单元测试或集成测试(示例见本节末尾)。
ArgoCD 优化建议:
- 根据仓库数量和复杂度,适当增加 repo-server 的副本数。
- 生产环境配置 Redis HA 模式,提高状态存储的可靠性。
- 使用 ApplicationSet 管理大量类似应用,减少手动配置。
- 根据应用变更频率,合理设置 sync 间隔,避免频繁无意义的同步。
# Tekton PVC cache for dependencies
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: maven-cache
namespace: tekton-pipelines
spec:
accessModes:
- ReadWriteMany
resources:
requests:
storage: 50Gi
storageClassName: rook-cephfs
---
# 在 Pipeline 中使用缓存 PVC
workspaces:
- name: maven-cache
persistentVolumeClaim:
claimName: maven-cache
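对于上文提到的 Matrix 并行测试,可以在 Pipeline 中按如下方式声明(Task 名称与参数为示意,需已启用 beta 特性):

# 在 Pipeline 中使用 Matrix 并行执行多组测试
tasks:
- name: integration-test
  taskRef:
    name: run-tests          # 假设存在的测试 Task
  matrix:
    params:
    - name: test-suite
      value:
      - api
      - ui
      - e2e
  workspaces:
  - name: source
    workspace: shared-workspace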
7.3 常见问题处理
问题1:Tekton Task 一直处于 Pending 状态
# 检查 Pod 状态
kubectl get pods -n tekton-pipelines -l tekton.dev/pipelineRun=<name>
kubectl describe pod <pod-name> -n tekton-pipelines
# 常见原因:
# 1. 没有符合 nodeSelector 的可用节点。
# 2. PVC 未绑定 (Pending)。
# 3. 镜像拉取失败。
问题2:ArgoCD 应用同步持续失败
# 检查同步状态详情
argocd app get <app-name> -o yaml
# 常见原因:
# 1. 清单文件 YAML 格式错误。
# 2. 集群中缺少所需的 CRD。
# 3. ArgoCD 服务账号权限不足。
# 本地验证清单
kustomize build apps/my-app/overlays/production | kubectl apply --dry-run=client -f -
问题3:Git Webhook 未触发流水线
# 检查 EventListener 日志
kubectl logs -l eventlistener=gitlab-listener -n tekton-pipelines
# 手动测试 Webhook
curl -X POST http://el-gitlab-listener.tekton-pipelines.svc:8080 \
-H "Content-Type: application/json" \
-H "X-Gitlab-Event: Push Hook" \
-d @sample-webhook.json
八、 总结
迁移成果
从传统的 Jenkins 迁移到基于 ArgoCD + Tekton 的云原生 GitOps 流水线后,我们获得了显著的收益:
- 构建效率:平均构建时间从 8 分钟降低至 4 分钟,这得益于更优的缓存策略和任务并行化。
- 资源利用率:从固定占用 16GB 内存的 Jenkins Master,转变为按需调度、用后即焚的 Tekton Task Pod,资源利用率大幅提升。
- 系统可靠性:故障恢复从依赖人工干预转变为基于声明式配置的自动重试与自愈。
- 审计与追踪:所有基础设施与应用变更均通过 Git 提交记录,实现了完整的、不可篡改的审计追踪。
- 部署频率:部署频率从每周 2-3 次提升至每天 10 次以上,真正实现了持续部署。
经验教训
- GitOps 仓库设计至关重要:初期将所有应用置于单一仓库,后期在权限管理和变更隔离上遇到挑战。建议按团队、项目或应用类型划分仓库。
- Tekton Task 模块化设计:避免编写庞大、复杂的 Task。应将其拆分为细粒度、可复用的单元,再通过 Pipeline 进行组合编排。
- 谨慎设置 ArgoCD 同步策略:尤其在生产环境中,慎用自动 Prune(清理)功能,以防止误删除关键资源。建议生产环境采用手动或评审后同步。
- 完善的监控告警体系:必须对 Pipeline 失败、Application 状态异常(OutOfSync, Degraded)等关键事件配置及时告警,这是保障运维/DevOps流程稳定性的基础。
- 采用渐进式迁移策略:我们花费了约3个月时间逐步迁移所有 Jenkins Job,而非一次性切换。这降低了风险,并让团队有时间适应新的工作流。
进阶学习方向
- Tekton Chains:为构建产物提供来源证明和签名,增强供应链安全。
- Crossplane:将基础设施(数据库、消息队列等)也通过声明式 API 进行管理,实现真正意义上的 Everything as Code。
- Argo Events:构建基于事件的自动化工作流,响应更广泛的外部事件。
- Backstage:建立开发者门户,统一管理服务目录、文档和部署入口,提升开发者体验。
参考资料
- Tekton 官方文档
- ArgoCD 官方文档
- CNCF GitOps 工作组最佳实践白皮书
- Argo Rollouts 官方文档
附录
命令速查表
# Tekton CLI (tkn)
tkn pipeline list
tkn pipeline start <pipeline-name>
tkn pipelinerun list
tkn pipelinerun logs <name> -f
tkn task list
tkn taskrun list
# ArgoCD CLI
argocd login <server>
argocd app list
argocd app get <app-name>
argocd app sync <app-name>
argocd app history <app-name>
argocd app rollback <app-name> <revision>
argocd repo list
argocd cluster list
# Kustomize
kustomize build <path>
kustomize build <path> | kubectl apply -f -
# Helm
helm template <chart> -f values.yaml
helm upgrade --install <release> <chart>
术语表
| 术语 | 说明 |
| --- | --- |
| Pipeline | Tekton 中定义的一系列有序 Task,构成完整的工作流。 |
| Task | Tekton 中可复用的最小执行单元,如代码克隆、镜像构建。 |
| TaskRun | Task 的一次具体执行实例。 |
| PipelineRun | Pipeline 的一次具体执行实例。 |
| Trigger | 用于自动启动 PipelineRun 的事件机制(如 Git Push)。 |
| Application | ArgoCD 中管理的一个部署单元,指向 Git 仓库中的配置清单。 |
| ApplicationSet | ArgoCD 用于批量生成和管理多个 Application 的控制器。 |
| Sync | ArgoCD 将集群的实际状态同步到 Git 中声明的期望状态的过程。 |
| Prune | 删除那些存在于集群但已不在 Git 配置中定义的资源。 |
| Self-Heal | ArgoCD 自动检测并修复集群状态与 Git 声明之间差异的能力。 |