1. Overview

Why KubeVirt

During technology selection we evaluated several options:

| Option     | Pros                              | Cons                                        | Verdict       |
|------------|-----------------------------------|---------------------------------------------|---------------|
| OpenStack  | Mature and stable, active community | Complex architecture, high operational cost | Ruled out     |
| Proxmox VE | Free and open source, friendly UI | Lacks enterprise-grade features             | Backup option |
| oVirt      | Red Hat backing                   | Shrinking community, uncertain future       | Ruled out     |
| KubeVirt   | Cloud native, unified management  | Relatively young, steep learning curve      | Final choice  |

The core reasons we ultimately chose KubeVirt:

  1. Unified stack: our container platform already runs on Kubernetes, and managing VMs and containers together significantly reduces operational complexity
  2. Incremental migration: containers and VMs can run in the same cluster, so workloads can move over gradually
  3. No vendor lock-in: a fully open source stack; we did not want to be held hostage by any single vendor again
  4. Cost control: our VMware renewal quote had tripled, a price we simply could not accept

KubeVirt Architecture

A quick look at KubeVirt's core components:

┌───────────────────────────────────────────────────────────────┐
│                      Kubernetes Cluster                       │
├───────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌─────────────────┐  ┌──────────────────┐  │
│  │ virt-api     │  │ virt-controller │  │ virt-handler     │  │
│  │ (Deployment) │  │ (Deployment)    │  │ (DaemonSet,      │  │
│  │              │  │                 │  │  one per node)   │  │
│  └──────────────┘  └─────────────────┘  └──────────────────┘  │
├───────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────────┐│
│  │                    libvirt + QEMU/KVM                     ││
│  └───────────────────────────────────────────────────────────┘│
├───────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────────┐│
│  │               Node (Linux with KVM support)               ││
│  └───────────────────────────────────────────────────────────┘│
└───────────────────────────────────────────────────────────────┘
  • virt-api: the API entry point for KubeVirt; serves requests for the VirtualMachine and related CRDs
  • virt-controller: watches VM-related resources and creates the corresponding VMI (VirtualMachineInstance) objects
  • virt-handler: runs as a DaemonSet on every node and manages the actual VM lifecycle there
  • virt-launcher: one Pod per VM, hosting the libvirt and QEMU processes

Environment Requirements

For reference, this is our production setup:

Hardware

  • Compute nodes: Dell PowerEdge R750xs, dual Intel Xeon Gold 6348 (28 cores / 56 threads each), 512GB DDR4 ECC, 2 x 1.92TB NVMe SSD (local cache)
  • Storage: Dell PowerStore 5200T, 100TB usable capacity, iSCSI
  • Network: Mellanox ConnectX-6 Dx dual-port 100GbE NICs, bonded

Software Versions

  • OS: Rocky Linux 8.9
  • Kubernetes: 1.28.4 (deployed with kubeadm)
  • KubeVirt: 1.1.1
  • CDI (Containerized Data Importer): 1.58.1
  • Storage: Longhorn 1.5.3 (later replaced with Rook-Ceph 1.12.9)
  • Networking: Multus + OVN-Kubernetes

Node Count

  • Master nodes: 3
  • Compute nodes: 42 (18 of them dedicated to VM workloads)

2. Detailed Steps

2.1 Cluster Preparation

Check Hardware Virtualization Support

Run the following on every compute node:

# Check if CPU supports virtualization
cat /proc/cpuinfo | grep -E "(vmx|svm)"

# Check if KVM module is loaded
lsmod | grep kvm

# If not loaded, load it manually
modprobe kvm
modprobe kvm_intel  # For Intel CPU
# modprobe kvm_amd  # For AMD CPU

# Make it persistent
echo "kvm" >> /etc/modules-load.d/kvm.conf
echo "kvm_intel" >> /etc/modules-load.d/kvm.conf

Node Labels and Taints

We separate VM workloads from container workloads to avoid resource contention:

# Label nodes for VM workloads
kubectl label node node-vm-{01..18} node-role.kubernetes.io/virtualization=true
kubectl label node node-vm-{01..18} kubevirt.io/schedulable=true

# Add taint to prevent regular pods from scheduling
kubectl taint node node-vm-{01..18} virtualization=true:NoSchedule

Deploy the KubeVirt Operator

# Set KubeVirt version
export KUBEVIRT_VERSION=v1.1.1

# Deploy the KubeVirt operator
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-operator.yaml

# Wait for operator to be ready
kubectl wait --for=condition=available --timeout=300s deployment/virt-operator -n kubevirt

# Create KubeVirt CR to deploy the components
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-cr.yaml

# Verify all components are running
kubectl get pods -n kubevirt

Expected output:

NAME                              READY   STATUS    RESTARTS   AGE
virt-api-7d5c9b8c8b-4x7k9         1/1     Running   0          5m
virt-api-7d5c9b8c8b-8j2m3         1/1     Running   0          5m
virt-controller-6c7d8f9b7-2k4n5   1/1     Running   0          5m
virt-controller-6c7d8f9b7-9x8m2   1/1     Running   0          5m
virt-handler-4k2j8                1/1     Running   0          4m
virt-handler-7m3n9                1/1     Running   0          4m
... (one virt-handler per node)

KubeVirt Configuration Tuning

This is the configuration we run in production, tuned for a large VM fleet:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  certificateRotateStrategy: {}
  configuration:
    developerConfiguration:
      featureGates:
      - LiveMigration
      - HotplugVolumes
      - HotplugNICs
      - Snapshot
      - VMExport
      - ExpandDisks
      - GPU
      - HostDevices
      - Macvtap
      - Passt
    migrations:
      parallelMigrationsPerCluster: 10
      parallelOutboundMigrationsPerNode: 4
      bandwidthPerMigration: 1Gi
      completionTimeoutPerGiB: 800
      progressTimeout: 300
      allowAutoConverge: true
      allowPostCopy: true
    network:
      defaultNetworkInterface: bridge
      permitBridgeInterfaceOnPodNetwork: true
      permitSlirpInterface: false
    smbios:
      manufacturer: "KubeVirt"
      product: "None"
      version: "1.1.1"
    supportedGuestAgentVersions:
    - "4.*"
    - "5.*"
    permittedHostDevices:
      pciHostDevices:
      - pciVendorSelector: "10DE:1EB8"
        resourceName: "nvidia.com/T4"
        externalResourceProvider: true
  customizeComponents:
    patches:
    - resourceType: Deployment
      resourceName: virt-controller
      patch: '{"spec":{"replicas":3}}'
      type: strategic
    - resourceType: Deployment
      resourceName: virt-api
      patch: '{"spec":{"replicas":3}}'
      type: strategic
  imagePullPolicy: IfNotPresent
  workloadUpdateStrategy:
    workloadUpdateMethods:
    - LiveMigrate

2.2 Storage Configuration

Storage was the most painful part of the whole migration. We started on Longhorn, but after two months its performance could not keep up, so we switched to Rook-Ceph.

Deploying the Rook-Ceph Cluster

# Clone Rook repository
git clone --single-branch --branch v1.12.9 https://github.com/rook/rook.git
cd rook/deploy/examples

# Create Rook operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# Wait for operator to be ready
kubectl -n rook-ceph wait --for=condition=available --timeout=600s deployment/rook-ceph-operator

Ceph cluster configuration (cluster.yaml):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.1
    allowUnsupported: false
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage
        resources:
          requests:
            storage: 50Gi
  mgr:
    count: 2
    allowMultiplePerNode: false
    modules:
    - name: pg_autoscaler
      enabled: true
    - name: rook
      enabled: true
    - name: prometheus
      enabled: true
  dashboard:
    enabled: true
    ssl: true
  crashCollector:
    disable: false
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      osdsPerDevice: "1"
      encryptedDevice: "false"
    nodes:
    - name: "storage-node-01"
      devices:
      - name: "nvme0n1"
      - name: "nvme1n1"
      - name: "nvme2n1"
      - name: "nvme3n1"
    - name: "storage-node-02"
      devices:
      - name: "nvme0n1"
      - name: "nvme1n1"
      - name: "nvme2n1"
      - name: "nvme3n1"
    - name: "storage-node-03"
      devices:
      - name: "nvme0n1"
      - name: "nvme1n1"
      - name: "nvme2n1"
      - name: "nvme3n1"
  resources:
    mgr:
      limits:
        cpu: "2"
        memory: "2Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    mon:
      limits:
        cpu: "2"
        memory: "2Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    osd:
      limits:
        cpu: "4"
        memory: "8Gi"
      requests:
        cpu: "2"
        memory: "4Gi"
  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0

Create an RBD StorageClass for the VMs:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool-vm
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
  parameters:
    compression_mode: aggressive
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-vm
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool-vm
  imageFormat: "2"
  imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate

2.3 CDI Deployment and Image Import

CDI (Containerized Data Importer) handles importing and managing VM disk images.

# Deploy CDI
export CDI_VERSION=v1.58.1
kubectl apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VERSION}/cdi-operator.yaml
kubectl apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VERSION}/cdi-cr.yaml

# Wait for CDI to be ready
kubectl wait --for=condition=available --timeout=300s deployment/cdi-deployment -n cdi

CDI configuration tuning:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    uploadProxyURLOverride: "https://cdi-uploadproxy.cdi.svc:443"
    scratchSpaceStorageClass: "rook-ceph-block-vm"
    podResourceRequirements:
      limits:
        cpu: "4"
        memory: "4Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    filesystemOverhead:
      global: "0.1"
    preallocation: true
    honorWaitForFirstConsumer: true
    importProxy:
      HTTPProxy: ""
      HTTPSProxy: ""
      noProxy: "*.cluster.local,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
  workload:
    nodeSelector:
      node-role.kubernetes.io/virtualization: "true"
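To make the filesystemOverhead knob above concrete: with a 0.1 overhead fraction, the provisioned space must be inflated so that the usable portion (after filesystem metadata) still fits the image. This is a sketch of that arithmetic under the assumption that usable space = requested × (1 - overhead), not CDI's exact code path:

```python
def padded_request_gib(virtual_size_gib: float, overhead: float = 0.1) -> float:
    """Space to request for an image of virtual_size_gib, assuming the
    provisioner loses `overhead` (a 0-1 fraction) to filesystem metadata,
    i.e. usable = requested * (1 - overhead)."""
    if not 0 <= overhead < 1:
        raise ValueError("overhead must be in [0, 1)")
    return virtual_size_gib / (1.0 - overhead)

# A 100 GiB image at 0.1 overhead needs roughly 111.1 GiB requested
print(round(padded_request_gib(100), 1))
```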

2.4 Network Configuration

Networking is the other complex piece. The VMs must keep using the VLAN networks from the old VMware environment, which is where Multus and OVN come in.

Install Multus CNI

# Deploy Multus
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/v4.0.2/deployments/multus-daemonset-thick.yml

# Verify Multus is running
kubectl get pods -n kube-system -l app=multus

Configure OVN-Kubernetes

We use OVN-Kubernetes to provide layer-2 networks for the VMs. The core configuration:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan100-production
  namespace: vm-production
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "vlan100-production",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "vm-production/vlan100-production",
      "vlanID": 100,
      "mtu": 9000,
      "subnets": "10.100.0.0/16"
    }
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan200-database
  namespace: vm-production
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "vlan200-database",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "vm-production/vlan200-database",
      "vlanID": 200,
      "mtu": 9000,
      "subnets": "10.200.0.0/16"
    }

Configure the matching bridge mappings in OVS:

# On each node, configure OVS bridge mapping
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="physnet1:br-ex,physnet-vlan100:br-vlan100,physnet-vlan200:br-vlan200"

# Create VLAN bridges
ovs-vsctl add-br br-vlan100
ovs-vsctl add-br br-vlan200

# Add VLAN subinterfaces to the bridges (the .100/.200 kernel subinterfaces
# already carry the VLAN tag, so adding an OVS tag= here would double-tag)
ovs-vsctl add-port br-vlan100 ens2f0.100
ovs-vsctl add-port br-vlan200 ens2f0.200

3. Migrating VMware VMs in Practice

This is the heart of this post. Migrating 2,000 VMs is far more than export-and-import; we built a complete set of migration tooling and processes around it.

3.1 Choosing a Migration Method

We evaluated several approaches:

| Method                       | Pros                         | Cons                           | Best for                    |
|------------------------------|------------------------------|--------------------------------|-----------------------------|
| virt-v2v                     | Official tool, full-featured | Slow, requires VM shutdown     | Small-scale migrations      |
| Manual VMDK-to-qcow2 export  | Simple and direct            | Inefficient, cannot be batched | Testing and validation      |
| MTV (Migration Toolkit)      | Highly automated             | Requires OpenShift             | OpenShift users             |
| Custom scripts               | Flexible and controllable    | Development cost               | Large-scale, custom migrations |

We ultimately settled on virt-v2v combined with our own batch scripts.

3.2 Pre-Migration Preparation

Collecting the VMware Inventory

First, collect information about every VM. I wrote a script to export it:

#!/usr/bin/env python3
# vm_inventory_export.py
# Export VMware VM inventory for migration planning

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl
import csv
import argparse
from datetime import datetime

def get_vm_info(vm):
    """Extract detailed VM information"""
    summary = vm.summary
    config = vm.config

    # Get disk information
    disks = []
    total_disk_size = 0
    for device in config.hardware.device:
        if isinstance(device, vim.vm.device.VirtualDisk):
            disk_size_gb = device.capacityInKB / 1024 / 1024
            disks.append({
                'label': device.deviceInfo.label,
                'size_gb': round(disk_size_gb, 2),
                'thin': getattr(device.backing, 'thinProvisioned', False)
            })
            total_disk_size += disk_size_gb

    # Get network information
    networks = []
    for device in config.hardware.device:
        if isinstance(device, vim.vm.device.VirtualEthernetCard):
            network_name = ""
            if hasattr(device.backing, 'network'):
                network_name = device.backing.network.name if device.backing.network else ""
            elif hasattr(device.backing, 'port'):
                network_name = device.backing.port.portgroupKey
            networks.append({
                'label': device.deviceInfo.label,
                'network': network_name,
                'mac': device.macAddress
            })

    return {
        'name': summary.config.name,
        'power_state': summary.runtime.powerState,
        'cpu': summary.config.numCpu,
        'memory_gb': summary.config.memorySizeMB / 1024,
        'guest_os': summary.config.guestFullName,
        'guest_id': summary.config.guestId,
        'vmware_tools': summary.guest.toolsStatus if summary.guest else 'unknown',
        'ip_address': summary.guest.ipAddress if summary.guest else '',
        'hostname': summary.guest.hostName if summary.guest else '',
        'total_disk_gb': round(total_disk_size, 2),
        'disks': disks,
        'networks': networks,
        'folder': get_folder_path(vm),
        'resource_pool': vm.resourcePool.name if vm.resourcePool else '',
        'cluster': get_cluster_name(vm),
        'datastore': summary.config.vmPathName.split()[0].strip('[]'),
        'uuid': summary.config.instanceUuid,
        'annotation': config.annotation if config.annotation else ''
    }

def get_folder_path(vm):
    """Get full folder path of VM"""
    path = []
    parent = vm.parent
    while parent:
        if hasattr(parent, 'name'):
            path.insert(0, parent.name)
        parent = getattr(parent, 'parent', None)
    return '/'.join(path)

def get_cluster_name(vm):
    """Get cluster name where VM resides"""
    host = vm.runtime.host
    if host and host.parent:
        return host.parent.name
    return ''

def main():
    parser = argparse.ArgumentParser(description='Export VMware VM inventory')
    parser.add_argument('--host', required=True, help='vCenter hostname')
    parser.add_argument('--user', required=True, help='vCenter username')
    parser.add_argument('--password', required=True, help='vCenter password')
    parser.add_argument('--output', default='vm_inventory.csv', help='Output CSV file')
    args = parser.parse_args()

    # Disable SSL certificate verification (for lab environments)
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    # Connect to vCenter
    si = SmartConnect(host=args.host, user=args.user, pwd=args.password, sslContext=context)

    try:
        content = si.RetrieveContent()
        container = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True
        )

        vms = []
        for vm in container.view:
            try:
                vm_info = get_vm_info(vm)
                vms.append(vm_info)
                print(f"Collected: {vm_info['name']}")
            except Exception as e:
                print(f"Error collecting {vm.name}: {str(e)}")

        container.Destroy()

        # Write to CSV
        with open(args.output, 'w', newline='', encoding='utf-8') as f:
            fieldnames = ['name', 'power_state', 'cpu', 'memory_gb', 'total_disk_gb',
                         'guest_os', 'guest_id', 'ip_address', 'hostname', 'folder',
                         'cluster', 'datastore', 'uuid', 'vmware_tools', 'networks',
                         'disks', 'annotation']
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            for vm in vms:
                vm['networks'] = str(vm['networks'])
                vm['disks'] = str(vm['disks'])
                writer.writerow(vm)

        print(f"\nExported {len(vms)} VMs to {args.output}")

    finally:
        Disconnect(si)

if __name__ == '__main__':
    main()

The resulting CSV contains details for every VM, and we used it to build the migration plan.
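As one example of how the CSV feeds the plan, a simplified classifier over the exported rows might look like the following. The thresholds here are illustrative only; the real rules in the next section also consider tags, shared disks, and SLAs. Field names match the vm_inventory_export.py output above.

```python
import csv
import io

def classify(row: dict) -> str:
    """Bucket one inventory row into a migration priority class
    (simplified: real rules also look at tags, shared disks, SLA)."""
    disk_gb = float(row["total_disk_gb"])
    if row["power_state"] == "poweredOff" and disk_gb < 100:
        return "priority_1_low_risk"
    if disk_gb <= 500:
        return "priority_3_standard"
    return "priority_5_complex"

# Tiny in-memory sample standing in for vm_inventory.csv
sample = io.StringIO(
    "name,power_state,total_disk_gb\n"
    "web-01,poweredOn,80\n"
    "db-01,poweredOn,800\n"
    "legacy-01,poweredOff,40\n"
)
counts: dict = {}
for row in csv.DictReader(sample):
    cls = classify(row)
    counts[cls] = counts.get(cls, 0) + 1
print(counts)
```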

VM Classification and Priorities

Based on the collected data, we grouped the VMs into several classes:

# migration_priority.yaml
priority_1_low_risk:
  criteria:
  - power_state: poweredOff
  - no_shared_disks: true
  - disk_size: <100GB
  estimated_count: 342
  migration_method: batch_offline

priority_2_stateless:
  criteria:
  - tags: ["web-frontend", "api-server"]
  - can_recreate: true
  estimated_count: 567
  migration_method: recreate_from_template

priority_3_standard:
  criteria:
  - power_state: poweredOn
  - disk_size: 100GB-500GB
  - sla: standard
  estimated_count: 845
  migration_method: live_migration_with_downtime

priority_4_critical:
  criteria:
  - tags: ["database", "critical-app"]
  - sla: high
  estimated_count: 198
  migration_method: carefully_planned_window

priority_5_complex:
  criteria:
  - shared_disks: true
  - raw_device_mapping: true
  - gpu_passthrough: true
  estimated_count: 48
  migration_method: manual_with_verification

3.3 Batch Migration Script

The core of the migration script we actually used:

#!/bin/bash
# vm_migration.sh
# Batch migration script for VMware to KubeVirt

set -euo pipefail

# Configuration
VCENTER_HOST="vcenter.internal.company.com"
VCENTER_USER="administrator@vsphere.local"
VCENTER_PASSWORD="${VCENTER_PASSWORD:?VCENTER_PASSWORD not set}"
ESXI_HOSTS=("esxi01.internal" "esxi02.internal" "esxi03.internal")
NFS_EXPORT_PATH="/mnt/migration-staging"
K8S_NAMESPACE="vm-production"
STORAGE_CLASS="rook-ceph-block-vm"
PARALLEL_JOBS=4
LOG_DIR="/var/log/vm-migration"

# Color output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log_info()  { echo -e "${GREEN}[INFO]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }
log_warn()  { echo -e "${YELLOW}[WARN]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }

# Create log directory
mkdir -p "${LOG_DIR}"

# Export VM from VMware using virt-v2v
export_vm() {
  local vm_name=$1
  local esxi_host=$2
  local output_dir="${NFS_EXPORT_PATH}/${vm_name}"

  log_info "Starting export of ${vm_name} from ${esxi_host}"

  mkdir -p "${output_dir}"

  # Run virt-v2v to convert the VMware VM to KVM format. Test the pipeline
  # directly in the if-condition: with set -e, checking $? afterwards would
  # never be reached on failure.
  if virt-v2v -ic "vpx://${VCENTER_USER}@${VCENTER_HOST}/${esxi_host}?no_verify=1" \
    "${vm_name}" \
    -o local -os "${output_dir}" \
    -of qcow2 \
    --password-file <(echo "${VCENTER_PASSWORD}") \
    2>&1 | tee "${LOG_DIR}/${vm_name}_export.log"; then
    log_info "Successfully exported ${vm_name}"
    return 0
  else
    log_error "Failed to export ${vm_name}"
    return 1
  fi
}

# Generate KubeVirt VM manifest
generate_vm_manifest() {
  local vm_name=$1
  local cpu=$2
  local memory_gb=$3
  local disk_path=$4
  local network_name=$5
  local mac_address=$6

  local manifest_file="${NFS_EXPORT_PATH}/${vm_name}/${vm_name}-vm.yaml"

  cat > "${manifest_file}" << EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ${vm_name}
  namespace: ${K8S_NAMESPACE}
  labels:
    app: ${vm_name}
    migration-source: vmware
    migration-date: $(date +%Y-%m-%d)
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: ${vm_name}
    spec:
      nodeSelector:
        node-role.kubernetes.io/virtualization: "true"
      tolerations:
        - key: "virtualization"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      domain:
        cpu:
          cores: ${cpu}
          sockets: 1
          threads: 1
        memory:
          guest: ${memory_gb}Gi
        resources:
          requests:
            memory: ${memory_gb}Gi
          limits:
            memory: $((memory_gb + 1))Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
              bootOrder: 1
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
            - name: production-net
              bridge: {}
              macAddress: "${mac_address}"
          networkInterfaceMultiqueue: true
          rng: {}
        machine:
          type: q35
        features:
          acpi: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: false
      networks:
        - name: default
          pod: {}
        - name: production-net
          multus:
            networkName: ${network_name}
      terminationGracePeriodSeconds: 180
      volumes:
        - name: rootdisk
          dataVolume:
            name: ${vm_name}-rootdisk
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |
              #cloud-config
              preserve_hostname: true
              manage_etc_hosts: false
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ${vm_name}-rootdisk
  namespace: ${K8S_NAMESPACE}
spec:
  source:
    upload: {}
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: $(get_disk_size "${disk_path}")
    storageClassName: ${STORAGE_CLASS}
EOF

  log_info "Generated manifest: ${manifest_file}"
}

# Get disk size from qcow2 file
get_disk_size() {
  local disk_path=$1
  local size_bytes=$(qemu-img info --output json "${disk_path}" | jq '.["virtual-size"]')
  local size_gb=$((size_bytes / 1024 / 1024 / 1024))
  # Add 10% buffer
  echo "$((size_gb * 110 / 100))Gi"
}

# Upload disk image to DataVolume
upload_disk_image() {
  local vm_name=$1
  local disk_path=$2

  log_info "Uploading disk image for ${vm_name}"

  # Wait for DataVolume to be ready for upload
  kubectl wait --for=condition=UploadReady datavolume/${vm_name}-rootdisk \
    -n ${K8S_NAMESPACE} --timeout=300s

  # Upload using virtctl (goes through the CDI upload proxy service).
  # As with export_vm, test the pipeline in the if-condition directly.
  if virtctl image-upload dv ${vm_name}-rootdisk \
    --namespace=${K8S_NAMESPACE} \
    --image-path="${disk_path}" \
    --insecure \
    --uploadproxy-url="https://cdi-uploadproxy.cdi.svc:443" \
    2>&1 | tee -a "${LOG_DIR}/${vm_name}_upload.log"; then
    log_info "Successfully uploaded disk for ${vm_name}"
    return 0
  else
    log_error "Failed to upload disk for ${vm_name}"
    return 1
  fi
}

# Full migration workflow for a single VM
migrate_single_vm() {
  local vm_name=$1
  local vm_config=$2

  # Parse VM configuration
  local cpu=$(echo "${vm_config}" | jq -r '.cpu')
  local memory=$(echo "${vm_config}" | jq -r '.memory_gb')
  local esxi_host=$(echo "${vm_config}" | jq -r '.esxi_host')
  local network=$(echo "${vm_config}" | jq -r '.network')
  local mac=$(echo "${vm_config}" | jq -r '.mac_address')

  log_info "Starting migration of ${vm_name}"

  # Step 1: Export from VMware
  if ! export_vm "${vm_name}" "${esxi_host}"; then
    return 1
  fi

  # Step 2: Find the converted disk
  local disk_path=$(find "${NFS_EXPORT_PATH}/${vm_name}" -name "*.qcow2" | head -1)
  if [ -z "${disk_path}" ]; then
    log_error "No qcow2 disk found for ${vm_name}"
    return 1
  fi

  # Step 3: Generate manifest
  generate_vm_manifest "${vm_name}" "${cpu}" "${memory}" "${disk_path}" "${network}" "${mac}"

  # Step 4: Apply manifest
  kubectl apply -f "${NFS_EXPORT_PATH}/${vm_name}/${vm_name}-vm.yaml"

  # Step 5: Upload disk
  if ! upload_disk_image "${vm_name}" "${disk_path}"; then
    return 1
  fi

  # Step 6: Start the VM (the manifest is created with running: false)
  # and wait until it is ready
  virtctl start "${vm_name}" -n "${K8S_NAMESPACE}"
  kubectl wait --for=condition=Ready vm/${vm_name} -n ${K8S_NAMESPACE} --timeout=600s

  log_info "Migration completed for ${vm_name}"
  return 0
}

# Batch migration with parallel execution
batch_migrate() {
  local vm_list_file=$1

  log_info "Starting batch migration from ${vm_list_file}"

  # Read the VM list (one JSON object per line) and run migrations in
  # parallel. Use input redirection rather than `cat | while`, so the loop
  # does not run in a subshell and `wait` below can see the background jobs.
  while read -r line; do
    vm_name=$(echo "${line}" | jq -r '.name')
    migrate_single_vm "${vm_name}" "${line}" &

    # Throttle to PARALLEL_JOBS concurrent migrations
    while [ "$(jobs -r | wc -l)" -ge "${PARALLEL_JOBS}" ]; do
      sleep 10
    done
  done < "${vm_list_file}"

  # Wait for all jobs to complete
  wait

  log_info "Batch migration completed"
}

# Main entry point
main() {
  case "${1:-}" in
    single)
      shift
      migrate_single_vm "$@"
      ;;
    batch)
      shift
      batch_migrate "$@"
      ;;
    *)
      echo "Usage: $0 {single|batch} [args...]"
      exit 1
      ;;
  esac
}

main "$@"
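The batch mode above consumes one JSON object per line. A small converter from a planning spreadsheet to that format might look like this sketch; the esxi_host, network, and mac_address columns are assumed to have been filled in during planning (the raw inventory export stores networks as a nested list):

```python
import csv
import io
import json

def csv_to_jsonl(csv_text: str) -> str:
    """Convert planning-CSV rows to the one-JSON-object-per-line format
    that vm_migration.sh batch mode reads."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({
            "name": row["name"],
            "cpu": int(row["cpu"]),
            "memory_gb": int(float(row["memory_gb"])),
            "esxi_host": row["esxi_host"],
            "network": row["network"],
            "mac_address": row["mac_address"],
        }))
    return "\n".join(lines)

print(csv_to_jsonl(
    "name,cpu,memory_gb,esxi_host,network,mac_address\n"
    "web-01,4,8,esxi01.internal,vlan100-production,00:50:56:aa:bb:cc\n"
))
```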

3.4 Special Handling for Windows VMs

Migrating Windows VMs is considerably harder than Linux, mainly because of drivers. VMware guests run the PVSCSI and VMXNET3 drivers from VMware Tools; after moving to KubeVirt they need VirtIO drivers instead.

Install the VirtIO drivers inside the Windows VM before migrating:

# Download VirtIO drivers ISO
# https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso

# Mount ISO and install drivers
# Install via Device Manager or use these PowerShell commands:

# Install VirtIO storage driver
pnputil.exe /add-driver E:\vioscsi\w10\amd64\*.inf /install

# Install VirtIO network driver
pnputil.exe /add-driver E:\NetKVM\w10\amd64\*.inf /install

# Install VirtIO balloon driver
pnputil.exe /add-driver E:\Balloon\w10\amd64\*.inf /install

# Install QEMU guest agent
msiexec.exe /i E:\guest-agent\qemu-ga-x86_64.msi /qn

# Verify drivers are installed
Get-WindowsDriver -Online | Where-Object { $_.ProviderName -like "*Red Hat*" }

The Windows VM configuration after migration:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: windows-server-2022
  namespace: vm-production
spec:
  running: true
  template:
    spec:
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
            utc: {}
        cpu:
          cores: 4
          sockets: 1
          threads: 2
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: sata
            name: rootdisk
          inputs:
          - bus: usb
            name: tablet
            type: tablet
          interfaces:
          - masquerade: {}
            model: e1000e
            name: default
          tpm: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            frequencies: {}
            ipi: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer:
              direct: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: true
        machine:
          type: q35
        memory:
          guest: 8Gi
        resources:
          requests:
            memory: 8Gi
      networks:
      - name: default
        pod: {}
      volumes:
      - dataVolume:
          name: windows-server-2022-rootdisk
        name: rootdisk

4. Best Practices and Caveats

4.1 Performance Tuning

CPU Tuning

# Enable CPU pinning for latency-sensitive workloads
spec:
  template:
    spec:
      domain:
        cpu:
          cores: 4
          sockets: 1
          threads: 1
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
          model: host-passthrough
          numa:
            guestMappingPassthrough: {}

In our tests, enabling CPU pinning reduced P99 latency on database VMs by 40%.

Memory Tuning

# Enable hugepages for memory-intensive workloads
spec:
  template:
    spec:
      domain:
        memory:
          guest: 32Gi
          hugepages:
            pageSize: 1Gi

Hugepages must be provisioned on the node beforehand:

# Configure 1GiB hugepages on the node at runtime (requires a CPU with
# 1GiB-page support, i.e. the pdpe1gb flag)
echo 34 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

# Make it persistent via kernel boot parameters. Note that vm.nr_hugepages
# in /etc/sysctl.conf only controls the default hugepage size (usually
# 2MiB), not 1GiB pages.
grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=34"

# Label nodes with hugepages
kubectl label node node-vm-01 kubevirt.io/hugepages-1Gi=true
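The 34 above is not arbitrary: it covers the 32 pages the 32Gi guest needs, plus a couple of spare pages. A trivial sizing helper, where the 2GiB headroom is our own rule of thumb rather than any KubeVirt requirement:

```python
import math

def hugepages_1gib_needed(guest_mem_gib: float, headroom_gib: int = 2) -> int:
    """Number of 1GiB hugepages to reserve on a node for one guest of
    guest_mem_gib, plus headroom (assumption: 2GiB spare; tune per
    environment)."""
    return math.ceil(guest_mem_gib) + headroom_gib

print(hugepages_1gib_needed(32))  # 34, matching nr_hugepages above
```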

Storage I/O Tuning

# Tune IO for database workloads
spec:
  template:
    spec:
      domain:
        devices:
          disks:
          - name: datadisk
            disk:
              bus: virtio
            io: native
            cache: none
            dedicatedIOThread: true

On SSD-backed storage, cache: none combined with io: native gives the best random-I/O performance. With this configuration our MySQL VMs reach 85,000 IOPS (4K random read).

4.2 High Availability

VM Anti-Affinity

spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mysql-cluster
            topologyKey: kubernetes.io/hostname

Live Migration Settings

# Global migration settings in KubeVirt CR
spec:
  configuration:
    migrations:
      parallelMigrationsPerCluster: 10
      parallelOutboundMigrationsPerNode: 4
      bandwidthPerMigration: 1Gi
      completionTimeoutPerGiB: 800
      progressTimeout: 300
      allowAutoConverge: true
      allowPostCopy: false  # Disable post-copy for safety

Tune the migration bandwidth to your network. On our 100Gbps network, the 1Gi per-migration cap lets migrations finish quickly without impacting production traffic.
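As a back-of-the-envelope check on these settings, migration duration at the configured bandwidth cap can be estimated as follows; the dirty-page factor is a rough, workload-dependent assumption, not a KubeVirt parameter:

```python
def migration_seconds(vm_mem_gib: float, bandwidth_gib_per_s: float = 1.0,
                      dirty_factor: float = 1.5) -> float:
    """Rough estimate of live-migration copy time: memory to transfer
    (inflated by dirty_factor for pages re-copied while the guest keeps
    writing -- an assumption) divided by the per-migration bandwidth cap."""
    return vm_mem_gib * dirty_factor / bandwidth_gib_per_s

# A 16GiB VM at the 1Gi/s cap above: roughly 24s of copy time before cutover
print(round(migration_seconds(16)))
```

If busy guests dirty memory faster than this, that is exactly what the allowAutoConverge setting above is for.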

4.3 Security Hardening

Enable SELinux

Make sure SELinux is in enforcing mode on every node:

# Check SELinux status
getenforce

# Set to enforcing
setenforce 1

# Make permanent
sed -i 's/SELINUX=permissive/SELINUX=enforcing/' /etc/selinux/config

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vm-production-isolation
  namespace: vm-production
spec:
  podSelector:
    matchLabels:
      kubevirt.io/domain: database-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 3306
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 3306
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53

Resource Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: vm-production-quota
  namespace: vm-production
spec:
  hard:
    requests.cpu: "200"
    requests.memory: "800Gi"
    limits.cpu: "400"
    limits.memory: "1Ti"
    persistentvolumeclaims: "100"
    requests.storage: "50Ti"

4.4 Common Errors and Fixes

Problem 1: VM reports "Guest agent is not connected"

Cause: the QEMU guest agent is not installed, or its service is not running.

Fix:

# Linux
yum install qemu-guest-agent -y
systemctl enable --now qemu-guest-agent

# Windows
# Install qemu-ga from virtio-win ISO
# Start service
sc start QEMU-GA

Problem 2: No network connectivity; the VM cannot obtain an IP

Cause: misconfigured Multus, or the VLAN is not trunked through correctly.

Troubleshooting steps:

# Check Multus pod logs
kubectl logs -n kube-system -l app=multus

# Check network attachment
kubectl get net-attach-def -n vm-production

# Enter virt-launcher pod to check
kubectl exec -it virt-launcher-myvm-xxx -n vm-production -- ip addr
kubectl exec -it virt-launcher-myvm-xxx -n vm-production -- bridge link

Problem 3: Poor disk performance

Cause: wrong cache policy or I/O mode.

Fix:

# Change from default to optimized settings
devices:
  disks:
  - name: rootdisk
    disk:
      bus: virtio
    cache: none  # Was: writethrough
    io: native   # Was: threads

Problem 4: Live migration fails

Common causes and fixes:

# 1. Check source/destination node connectivity
virtctl migrate myvm -n vm-production

# 2. Check migration status
kubectl get vmim -n vm-production

# 3. Common fixes:
# - Increase migration bandwidth
# - Enable allowAutoConverge for busy VMs
# - Check storage is accessible from both nodes
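
The bandwidth and auto-converge fixes above are configured on the KubeVirt CR. A sketch of the knobs we would tune (values are illustrative; see the parameter table in the appendix):

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    migrations:
      # Cap concurrency so migrations don't saturate the network
      parallelMigrationsPerCluster: 10
      parallelOutboundMigrationsPerNode: 4
      # Throttle each migration stream
      bandwidthPerMigration: 1Gi
      # Slow down busy guests so dirty pages can converge
      allowAutoConverge: true
      completionTimeoutPerGiB: 800
      progressTimeout: 300
```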

5. Troubleshooting and Monitoring

5.1 Viewing Logs

# KubeVirt operator logs
kubectl logs -n kubevirt -l kubevirt.io=virt-operator

# virt-controller logs (VM scheduling issues)
kubectl logs -n kubevirt -l kubevirt.io=virt-controller

# virt-handler logs (VM lifecycle on node)
kubectl logs -n kubevirt -l kubevirt.io=virt-handler --all-containers

# virt-launcher logs (specific VM)
kubectl logs -n vm-production virt-launcher-myvm-xxxxx -c compute

# libvirt logs inside virt-launcher
kubectl exec -n vm-production virt-launcher-myvm-xxxxx -- cat /var/log/libvirt/qemu/vm-production_myvm.log

5.2 Common Troubleshooting Commands

# Check VM status
kubectl get vm,vmi -n vm-production

# Describe VM for events
kubectl describe vm myvm -n vm-production

# Check VMI conditions
kubectl get vmi myvm -n vm-production -o jsonpath='{.status.conditions}'

# Enter VM console
virtctl console myvm -n vm-production

# SSH to VM (if SSH is configured)
virtctl ssh user@myvm -n vm-production

# VNC access
virtctl vnc myvm -n vm-production

# Stop/Start VM
virtctl stop myvm -n vm-production
virtctl start myvm -n vm-production

# Restart VM
virtctl restart myvm -n vm-production

# Force stop (like pulling power cord)
virtctl stop myvm -n vm-production --grace-period=0 --force

5.3 Monitoring Setup

We monitor KubeVirt with Prometheus + Grafana.

ServiceMonitor configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubevirt
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kubevirt
  selector:
    matchLabels:
      prometheus.kubevirt.io: "true"
  endpoints:
  - port: metrics
    interval: 15s
    scrapeTimeout: 10s

Key metrics to watch:

# VM CPU usage
kubevirt_vmi_cpu_system_usage_seconds_total
kubevirt_vmi_cpu_user_usage_seconds_total

# VM memory usage
kubevirt_vmi_memory_available_bytes
kubevirt_vmi_memory_used_bytes

# VM network IO
kubevirt_vmi_network_receive_bytes_total
kubevirt_vmi_network_transmit_bytes_total

# VM disk IO
kubevirt_vmi_storage_read_traffic_bytes_total
kubevirt_vmi_storage_write_traffic_bytes_total
kubevirt_vmi_storage_iops_read_total
kubevirt_vmi_storage_iops_write_total

# Migration metrics
kubevirt_vmi_migration_data_processed_bytes
kubevirt_vmi_migration_data_remaining_bytes
kubevirt_vmi_migration_phase_transition_time_seconds

I won't paste the Grafana dashboard JSON here; you can import the community dashboard directly (ID: 11748).
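
Beyond dashboards, these metrics can feed alerts. A minimal PrometheusRule sketch alerting on high guest memory usage (the threshold is ours, and the exact metric semantics should be verified against your KubeVirt version):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubevirt-vm-alerts
  namespace: monitoring
spec:
  groups:
  - name: kubevirt-vm
    rules:
    - alert: VMMemoryHigh
      # Fires when a VM's working set exceeds 95% of its usable memory for 10 minutes
      expr: |
        kubevirt_vmi_memory_used_bytes / kubevirt_vmi_memory_available_bytes > 0.95
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "VM {{ $labels.name }} in {{ $labels.namespace }} is running low on memory"
```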

5.4 Backup and Restore

Use KubeVirt's snapshot feature for backups:

# Create snapshot
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: myvm-snapshot-20241219
  namespace: vm-production
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: myvm
---
# Restore from snapshot
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: myvm-restore
  namespace: vm-production
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: myvm
  virtualMachineSnapshotName: myvm-snapshot-20241219

Scheduled backup script:

#!/bin/bash
# vm_backup.sh - Automated VM snapshot script

NAMESPACE="vm-production"
RETENTION_DAYS=7

# Create snapshots for all VMs
for vm in $(kubectl get vm -n ${NAMESPACE} -o jsonpath='{.items[*].metadata.name}'); do
  snapshot_name="${vm}-snapshot-$(date +%Y%m%d-%H%M%S)"
  cat <<EOF | kubectl apply -f -
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: ${snapshot_name}
  namespace: ${NAMESPACE}
  labels:
    backup-type: scheduled
    vm-name: ${vm}
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: ${vm}
EOF
  echo "Created snapshot: ${snapshot_name}"
done

# Clean up old snapshots
kubectl get vmsnapshot -n ${NAMESPACE} -l backup-type=scheduled \
  -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.creationTimestamp}{"\n"}{end}' | \
while read name timestamp; do
  age_days=$(( ($(date +%s) - $(date -d "${timestamp}" +%s)) / 86400 ))
  if [ ${age_days} -gt ${RETENTION_DAYS} ]; then
    kubectl delete vmsnapshot ${name} -n ${NAMESPACE}
    echo "Deleted old snapshot: ${name}"
  fi
done
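
To run the script on a schedule inside the cluster, one option is a CronJob. This sketch assumes the script is stored in a ConfigMap named vm-backup-script and a ServiceAccount vm-backup with permissions on vmsnapshots exists (both names are hypothetical):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vm-backup
  namespace: vm-production
spec:
  schedule: "0 2 * * *"               # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vm-backup    # needs create/delete on vmsnapshots
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: bitnami/kubectl:latest
            command: ["/bin/bash", "/scripts/vm_backup.sh"]
            volumeMounts:
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: scripts
            configMap:
              name: vm-backup-script       # holds vm_backup.sh
```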
6. Summary

Migration Results

Over six months, we migrated 2,000 VMs from VMware to KubeVirt:

• Completion rate: 1,987 VMs migrated successfully; 13 with special hardware requirements remain on a standalone VMware cluster
• Cost savings: annual infrastructure cost down 62% (mostly VMware licensing fees)
• Operational efficiency: VMs and containers are managed on one platform, and the ops team shrank from 12 to 8 people
• Failure recovery: mean time to recovery dropped from 45 minutes to 12 minutes

Lessons Learned

1. Storage is critical: we underestimated storage at first, and Longhorn genuinely could not keep up at large VM scale. Ceph is complex, but its stability and performance met our needs.
2. Plan the network early: VLAN trunking is tedious to configure; design the network architecture at the very start of the project.
3. Migrate in batches: we initially wanted to move an entire cluster's VMs in one go, and when something broke the blast radius was far too large. Switching to batches of 50 kept problems manageable.
4. Windows is harder than Linux: driver and activation issues on Windows VMs gave us headaches for a long time. Prepare a dedicated Windows migration template.
5. Monitoring comes first: build the monitoring stack before migrating, or you won't know where to look when something goes wrong.

Topics for Further Study

• GPU virtualization: vGPU passthrough for AI training VMs
• Nested virtualization: running Kubernetes inside KubeVirt VMs (yes, we actually have this requirement...)
• Hybrid cloud: cross-cloud VM management built on Cluster API

References

• KubeVirt official documentation
• KubeVirt GitHub repository
• virt-v2v documentation
• Rook-Ceph documentation
• OVN-Kubernetes documentation

Appendix

Command Cheat Sheet

# KubeVirt Management
virtctl start <vm>                    # Start VM
virtctl stop <vm>                     # Stop VM
virtctl restart <vm>                  # Restart VM
virtctl pause <vm>                    # Pause VM
virtctl unpause <vm>                  # Unpause VM
virtctl migrate <vm>                  # Trigger live migration
virtctl console <vm>                  # Serial console access
virtctl vnc <vm>                      # VNC access
virtctl ssh <user>@<vm>               # SSH access
virtctl guestfs <vm>                  # Access VM filesystem

# Disk Management
virtctl image-upload dv <name> --image-path=<path>   # Upload disk image
virtctl addvolume <vm> --volume-name=<name>          # Hotplug volume
virtctl removevolume <vm> --volume-name=<name>       # Hot-unplug volume

# Snapshot
kubectl get vmsnapshot                 # List snapshots
kubectl get vmrestore                  # List restores

# Troubleshooting
kubectl get vm,vmi                     # VM status overview
kubectl describe vmi <name>            # VM instance details
kubectl logs virt-launcher-<vm>-xxx    # VM launcher logs

Configuration Parameters

| Parameter | Description | Recommended value |
|---|---|---|
| parallelMigrationsPerCluster | Parallel migrations cluster-wide | 10 |
| parallelOutboundMigrationsPerNode | Parallel outbound migrations per node | 4 |
| bandwidthPerMigration | Bandwidth cap per migration | 1Gi (adjust to your network) |
| completionTimeoutPerGiB | Timeout per GiB of data migrated | 800 seconds |
| progressTimeout | Timeout when migration makes no progress | 300 seconds |

Glossary

| Term | Full name | Description |
|---|---|---|
| VM | VirtualMachine | KubeVirt VM resource definition |
| VMI | VirtualMachineInstance | A running VM instance |
| DV | DataVolume | CDI-managed persistent volume |
| CDI | Containerized Data Importer | VM image import component |
| virt-v2v | - | VMware-to-KVM conversion tool |
| VMIM | VirtualMachineInstanceMigration | VM migration task object |

I hope this write-up, distilled from our real migration experience, gives you a useful reference for moving from VMware to KubeVirt. If you run into other problems during your migration, you are welcome to discuss them on the 云栈社区 Ops/DevOps/SRE board.



