1. Overview

Why KubeVirt

During technology selection we evaluated several options:

| Option     | Pros                              | Cons                                        | Verdict       |
|------------|-----------------------------------|---------------------------------------------|---------------|
| OpenStack  | Mature and stable, active community | Complex architecture, high operational cost | Ruled out     |
| Proxmox VE | Free and open source, friendly UI | Lacks enterprise-grade features             | Backup option |
| oVirt      | Red Hat backing                   | Shrinking community, uncertain future       | Ruled out     |
| KubeVirt   | Cloud native, unified management  | Relatively young, steep learning curve      | Final choice  |

The core reasons we ultimately chose KubeVirt:

  1. Unified stack: our container platform already runs on Kubernetes, and managing VMs and containers together significantly reduces operational complexity
  2. Incremental migration: containers and VMs can run in the same cluster, so workloads can move over gradually
  3. No vendor lock-in: a fully open source stack; we did not want to be held hostage by any single vendor again
  4. Cost control: our VMware renewal quote had tripled, a price we simply could not accept

KubeVirt Architecture

A quick look at KubeVirt's core components:

┌───────────────────────────────────────────────────────────────┐
│                      Kubernetes Cluster                       │
├───────────────────────────────────────────────────────────────┤
│  ┌──────────────┐  ┌─────────────────┐  ┌──────────────────┐  │
│  │ virt-api     │  │ virt-controller │  │ virt-handler     │  │
│  │ (Deployment) │  │ (Deployment)    │  │ (DaemonSet,      │  │
│  │              │  │                 │  │  one per node)   │  │
│  └──────────────┘  └─────────────────┘  └──────────────────┘  │
├───────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────────┐│
│  │                    libvirt + QEMU/KVM                     ││
│  └───────────────────────────────────────────────────────────┘│
├───────────────────────────────────────────────────────────────┤
│  ┌───────────────────────────────────────────────────────────┐│
│  │               Node (Linux with KVM support)               ││
│  └───────────────────────────────────────────────────────────┘│
└───────────────────────────────────────────────────────────────┘
  • virt-api: the API entry point for KubeVirt; serves requests for the VirtualMachine and related CRDs
  • virt-controller: watches VM-related resources and creates the corresponding VMI (VirtualMachineInstance) objects
  • virt-handler: runs as a DaemonSet on every node and manages the actual VM lifecycle there
  • virt-launcher: one Pod per VM, hosting the libvirt and QEMU processes

Environment Requirements

For reference, this is our production setup:

Hardware

  • Compute nodes: Dell PowerEdge R750xs, dual Intel Xeon Gold 6348 (28 cores / 56 threads each), 512GB DDR4 ECC, 2 x 1.92TB NVMe SSD (local cache)
  • Storage: Dell PowerStore 5200T, 100TB usable capacity, iSCSI
  • Network: Mellanox ConnectX-6 Dx dual-port 100GbE NICs, bonded

Software Versions

  • OS: Rocky Linux 8.9
  • Kubernetes: 1.28.4 (deployed with kubeadm)
  • KubeVirt: 1.1.1
  • CDI (Containerized Data Importer): 1.58.1
  • Storage: Longhorn 1.5.3 (later replaced with Rook-Ceph 1.12.9)
  • Networking: Multus + OVN-Kubernetes

Node Count

  • Master nodes: 3
  • Compute nodes: 42 (18 of them dedicated to VM workloads)

2. Detailed Steps

2.1 Cluster Preparation

Check Hardware Virtualization Support

Run the following on every compute node:

# Check if CPU supports virtualization
cat /proc/cpuinfo | grep -E "(vmx|svm)"

# Check if KVM module is loaded
lsmod | grep kvm

# If not loaded, load it manually
modprobe kvm
modprobe kvm_intel  # For Intel CPU
# modprobe kvm_amd  # For AMD CPU

# Make it persistent
echo "kvm" >> /etc/modules-load.d/kvm.conf
echo "kvm_intel" >> /etc/modules-load.d/kvm.conf

Node Labels and Taints

We separate VM workloads from container workloads to avoid resource contention:

# Label nodes for VM workloads
kubectl label node node-vm-{01..18} node-role.kubernetes.io/virtualization=true
kubectl label node node-vm-{01..18} kubevirt.io/schedulable=true

# Add taint to prevent regular pods from scheduling
kubectl taint node node-vm-{01..18} virtualization=true:NoSchedule

Deploy the KubeVirt Operator

# Set KubeVirt version
export KUBEVIRT_VERSION=v1.1.1

# Deploy the KubeVirt operator
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-operator.yaml

# Wait for operator to be ready
kubectl wait --for=condition=available --timeout=300s deployment/virt-operator -n kubevirt

# Create KubeVirt CR to deploy the components
kubectl apply -f https://github.com/kubevirt/kubevirt/releases/download/${KUBEVIRT_VERSION}/kubevirt-cr.yaml

# Verify all components are running
kubectl get pods -n kubevirt

Expected output:

NAME                              READY   STATUS    RESTARTS   AGE
virt-api-7d5c9b8c8b-4x7k9         1/1     Running   0          5m
virt-api-7d5c9b8c8b-8j2m3         1/1     Running   0          5m
virt-controller-6c7d8f9b7-2k4n5   1/1     Running   0          5m
virt-controller-6c7d8f9b7-9x8m2   1/1     Running   0          5m
virt-handler-4k2j8                1/1     Running   0          4m
virt-handler-7m3n9                1/1     Running   0          4m
... (one virt-handler per node)

KubeVirt Configuration Tuning

This is the configuration we run in production, tuned for a large VM fleet:

apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  certificateRotateStrategy: {}
  configuration:
    developerConfiguration:
      featureGates:
      - LiveMigration
      - HotplugVolumes
      - HotplugNICs
      - Snapshot
      - VMExport
      - ExpandDisks
      - GPU
      - HostDevices
      - Macvtap
      - Passt
    migrations:
      parallelMigrationsPerCluster: 10
      parallelOutboundMigrationsPerNode: 4
      bandwidthPerMigration: 1Gi
      completionTimeoutPerGiB: 800
      progressTimeout: 300
      allowAutoConverge: true
      allowPostCopy: true
    network:
      defaultNetworkInterface: bridge
      permitBridgeInterfaceOnPodNetwork: true
      permitSlirpInterface: false
    smbios:
      manufacturer: "KubeVirt"
      product: "None"
      version: "1.1.1"
    supportedGuestAgentVersions:
    - "4.*"
    - "5.*"
    permittedHostDevices:
      pciHostDevices:
      - pciVendorSelector: "10DE:1EB8"
        resourceName: "nvidia.com/T4"
        externalResourceProvider: true
  customizeComponents:
    patches:
    - resourceType: Deployment
      resourceName: virt-controller
      patch: '{"spec":{"replicas":3}}'
      type: strategic
    - resourceType: Deployment
      resourceName: virt-api
      patch: '{"spec":{"replicas":3}}'
      type: strategic
  imagePullPolicy: IfNotPresent
  workloadUpdateStrategy:
    workloadUpdateMethods:
    - LiveMigrate

2.2 Storage Configuration

Storage was the most painful part of the whole migration. We started on Longhorn, but after two months its performance could not keep up, so we switched to Rook-Ceph.

Deploying the Rook-Ceph Cluster

# Clone Rook repository
git clone --single-branch --branch v1.12.9 https://github.com/rook/rook.git
cd rook/deploy/examples

# Create Rook operator
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# Wait for operator to be ready
kubectl -n rook-ceph wait --for=condition=available --timeout=600s deployment/rook-ceph-operator

Ceph cluster configuration (cluster.yaml):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  dataDirHostPath: /var/lib/rook
  cephVersion:
    image: quay.io/ceph/ceph:v18.2.1
    allowUnsupported: false
  mon:
    count: 3
    allowMultiplePerNode: false
    volumeClaimTemplate:
      spec:
        storageClassName: local-storage
        resources:
          requests:
            storage: 50Gi
  mgr:
    count: 2
    allowMultiplePerNode: false
    modules:
    - name: pg_autoscaler
      enabled: true
    - name: rook
      enabled: true
    - name: prometheus
      enabled: true
  dashboard:
    enabled: true
    ssl: true
  crashCollector:
    disable: false
  storage:
    useAllNodes: false
    useAllDevices: false
    config:
      osdsPerDevice: "1"
      encryptedDevice: "false"
    nodes:
    - name: "storage-node-01"
      devices:
      - name: "nvme0n1"
      - name: "nvme1n1"
      - name: "nvme2n1"
      - name: "nvme3n1"
    - name: "storage-node-02"
      devices:
      - name: "nvme0n1"
      - name: "nvme1n1"
      - name: "nvme2n1"
      - name: "nvme3n1"
    - name: "storage-node-03"
      devices:
      - name: "nvme0n1"
      - name: "nvme1n1"
      - name: "nvme2n1"
      - name: "nvme3n1"
  resources:
    mgr:
      limits:
        cpu: "2"
        memory: "2Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    mon:
      limits:
        cpu: "2"
        memory: "2Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    osd:
      limits:
        cpu: "4"
        memory: "8Gi"
      requests:
        cpu: "2"
        memory: "4Gi"
  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0

Create an RBD StorageClass for the VMs:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool-vm
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
    requireSafeReplicaSize: true
  parameters:
    compression_mode: aggressive
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block-vm
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool-vm
  imageFormat: "2"
  imageFeatures: layering,fast-diff,object-map,deep-flatten,exclusive-lock
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete
allowVolumeExpansion: true
volumeBindingMode: Immediate

2.3 CDI Deployment and Image Import

CDI (Containerized Data Importer) handles importing and managing VM disk images.

# Deploy CDI
export CDI_VERSION=v1.58.1
kubectl apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VERSION}/cdi-operator.yaml
kubectl apply -f https://github.com/kubevirt/containerized-data-importer/releases/download/${CDI_VERSION}/cdi-cr.yaml

# Wait for CDI to be ready
kubectl wait --for=condition=available --timeout=300s deployment/cdi-deployment -n cdi

CDI configuration tuning:

apiVersion: cdi.kubevirt.io/v1beta1
kind: CDI
metadata:
  name: cdi
spec:
  config:
    uploadProxyURLOverride: "https://cdi-uploadproxy.cdi.svc:443"
    scratchSpaceStorageClass: "rook-ceph-block-vm"
    podResourceRequirements:
      limits:
        cpu: "4"
        memory: "4Gi"
      requests:
        cpu: "1"
        memory: "1Gi"
    filesystemOverhead:
      global: "0.1"
    preallocation: true
    honorWaitForFirstConsumer: true
    importProxy:
      HTTPProxy: ""
      HTTPSProxy: ""
      noProxy: "*.cluster.local,10.0.0.0/8,172.16.0.0/12,192.168.0.0/16"
  workload:
    nodeSelector:
      node-role.kubernetes.io/virtualization: "true"
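To make the filesystemOverhead knob above concrete: with a 0.1 overhead fraction, the provisioned space must be inflated so that the usable portion (after filesystem metadata) still fits the image. This is a sketch of that arithmetic under the assumption that usable space = requested × (1 - overhead), not CDI's exact code path:

```python
def padded_request_gib(virtual_size_gib: float, overhead: float = 0.1) -> float:
    """Space to request for an image of virtual_size_gib, assuming the
    provisioner loses `overhead` (a 0-1 fraction) to filesystem metadata,
    i.e. usable = requested * (1 - overhead)."""
    if not 0 <= overhead < 1:
        raise ValueError("overhead must be in [0, 1)")
    return virtual_size_gib / (1.0 - overhead)

# A 100 GiB image at 0.1 overhead needs roughly 111.1 GiB requested
print(round(padded_request_gib(100), 1))
```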

2.4 Network Configuration

Networking is the other complex piece. The VMs must keep using the VLAN networks from the old VMware environment, which is where Multus and OVN come in.

Install Multus CNI

# Deploy Multus
kubectl apply -f https://raw.githubusercontent.com/k8snetworkplumbingwg/multus-cni/v4.0.2/deployments/multus-daemonset-thick.yml

# Verify Multus is running
kubectl get pods -n kube-system -l app=multus

Configure OVN-Kubernetes

We use OVN-Kubernetes to provide layer-2 networks for the VMs. The core configuration:

apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan100-production
  namespace: vm-production
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "vlan100-production",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "vm-production/vlan100-production",
      "vlanID": 100,
      "mtu": 9000,
      "subnets": "10.100.0.0/16"
    }
---
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan200-database
  namespace: vm-production
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "vlan200-database",
      "type": "ovn-k8s-cni-overlay",
      "topology": "localnet",
      "netAttachDefName": "vm-production/vlan200-database",
      "vlanID": 200,
      "mtu": 9000,
      "subnets": "10.200.0.0/16"
    }

Configure the matching bridge mappings in OVS:

# On each node, configure OVS bridge mapping
ovs-vsctl set Open_vSwitch . external-ids:ovn-bridge-mappings="physnet1:br-ex,physnet-vlan100:br-vlan100,physnet-vlan200:br-vlan200"

# Create VLAN bridges
ovs-vsctl add-br br-vlan100
ovs-vsctl add-br br-vlan200

# Add VLAN subinterfaces to the bridges (the .100/.200 kernel subinterfaces
# already carry the VLAN tag, so adding an OVS tag= here would double-tag)
ovs-vsctl add-port br-vlan100 ens2f0.100
ovs-vsctl add-port br-vlan200 ens2f0.200

3. Migrating VMware VMs in Practice

This is the heart of this post. Migrating 2,000 VMs is far more than export-and-import; we built a complete set of migration tooling and processes around it.

3.1 Choosing a Migration Method

We evaluated several approaches:

| Method                       | Pros                         | Cons                           | Best for                    |
|------------------------------|------------------------------|--------------------------------|-----------------------------|
| virt-v2v                     | Official tool, full-featured | Slow, requires VM shutdown     | Small-scale migrations      |
| Manual VMDK-to-qcow2 export  | Simple and direct            | Inefficient, cannot be batched | Testing and validation      |
| MTV (Migration Toolkit)      | Highly automated             | Requires OpenShift             | OpenShift users             |
| Custom scripts               | Flexible and controllable    | Development cost               | Large-scale, custom migrations |

We ultimately settled on virt-v2v combined with our own batch scripts.

3.2 Pre-Migration Preparation

Collecting the VMware Inventory

First, collect information about every VM. I wrote a script to export it:

#!/usr/bin/env python3
# vm_inventory_export.py
# Export VMware VM inventory for migration planning

from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim
import ssl
import csv
import argparse
from datetime import datetime

def get_vm_info(vm):
    """Extract detailed VM information"""
    summary = vm.summary
    config = vm.config

    # Get disk information
    disks = []
    total_disk_size = 0
    for device in config.hardware.device:
        if isinstance(device, vim.vm.device.VirtualDisk):
            disk_size_gb = device.capacityInKB / 1024 / 1024
            disks.append({
                'label': device.deviceInfo.label,
                'size_gb': round(disk_size_gb, 2),
                'thin': getattr(device.backing, 'thinProvisioned', False)
            })
            total_disk_size += disk_size_gb

    # Get network information
    networks = []
    for device in config.hardware.device:
        if isinstance(device, vim.vm.device.VirtualEthernetCard):
            network_name = ""
            if hasattr(device.backing, 'network'):
                network_name = device.backing.network.name if device.backing.network else ""
            elif hasattr(device.backing, 'port'):
                network_name = device.backing.port.portgroupKey
            networks.append({
                'label': device.deviceInfo.label,
                'network': network_name,
                'mac': device.macAddress
            })

    return {
        'name': summary.config.name,
        'power_state': summary.runtime.powerState,
        'cpu': summary.config.numCpu,
        'memory_gb': summary.config.memorySizeMB / 1024,
        'guest_os': summary.config.guestFullName,
        'guest_id': summary.config.guestId,
        'vmware_tools': summary.guest.toolsStatus if summary.guest else 'unknown',
        'ip_address': summary.guest.ipAddress if summary.guest else '',
        'hostname': summary.guest.hostName if summary.guest else '',
        'total_disk_gb': round(total_disk_size, 2),
        'disks': disks,
        'networks': networks,
        'folder': get_folder_path(vm),
        'resource_pool': vm.resourcePool.name if vm.resourcePool else '',
        'cluster': get_cluster_name(vm),
        'datastore': summary.config.vmPathName.split()[0].strip('[]'),
        'uuid': summary.config.instanceUuid,
        'annotation': config.annotation if config.annotation else ''
    }

def get_folder_path(vm):
    """Get full folder path of VM"""
    path = []
    parent = vm.parent
    while parent:
        if hasattr(parent, 'name'):
            path.insert(0, parent.name)
        parent = getattr(parent, 'parent', None)
    return '/'.join(path)

def get_cluster_name(vm):
    """Get cluster name where VM resides"""
    host = vm.runtime.host
    if host and host.parent:
        return host.parent.name
    return ''

def main():
    parser = argparse.ArgumentParser(description='Export VMware VM inventory')
    parser.add_argument('--host', required=True, help='vCenter hostname')
    parser.add_argument('--user', required=True, help='vCenter username')
    parser.add_argument('--password', required=True, help='vCenter password')
    parser.add_argument('--output', default='vm_inventory.csv', help='Output CSV file')
    args = parser.parse_args()

    # Disable SSL certificate verification (for lab environments)
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE

    # Connect to vCenter
    si = SmartConnect(host=args.host, user=args.user, pwd=args.password, sslContext=context)

    try:
        content = si.RetrieveContent()
        container = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True
        )

        vms = []
        for vm in container.view:
            try:
                vm_info = get_vm_info(vm)
                vms.append(vm_info)
                print(f"Collected: {vm_info['name']}")
            except Exception as e:
                print(f"Error collecting {vm.name}: {str(e)}")

        container.Destroy()

        # Write to CSV
        with open(args.output, 'w', newline='', encoding='utf-8') as f:
            fieldnames = ['name', 'power_state', 'cpu', 'memory_gb', 'total_disk_gb',
                         'guest_os', 'guest_id', 'ip_address', 'hostname', 'folder',
                         'cluster', 'datastore', 'uuid', 'vmware_tools', 'networks',
                         'disks', 'annotation']
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            for vm in vms:
                vm['networks'] = str(vm['networks'])
                vm['disks'] = str(vm['disks'])
                writer.writerow(vm)

        print(f"\nExported {len(vms)} VMs to {args.output}")

    finally:
        Disconnect(si)

if __name__ == '__main__':
    main()

The resulting CSV contains details for every VM, and we used it to build the migration plan.
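As one example of how the CSV feeds the plan, a simplified classifier over the exported rows might look like the following. The thresholds here are illustrative only; the real rules in the next section also consider tags, shared disks, and SLAs. Field names match the vm_inventory_export.py output above.

```python
import csv
import io

def classify(row: dict) -> str:
    """Bucket one inventory row into a migration priority class
    (simplified: real rules also look at tags, shared disks, SLA)."""
    disk_gb = float(row["total_disk_gb"])
    if row["power_state"] == "poweredOff" and disk_gb < 100:
        return "priority_1_low_risk"
    if disk_gb <= 500:
        return "priority_3_standard"
    return "priority_5_complex"

# Tiny in-memory sample standing in for vm_inventory.csv
sample = io.StringIO(
    "name,power_state,total_disk_gb\n"
    "web-01,poweredOn,80\n"
    "db-01,poweredOn,800\n"
    "legacy-01,poweredOff,40\n"
)
counts: dict = {}
for row in csv.DictReader(sample):
    cls = classify(row)
    counts[cls] = counts.get(cls, 0) + 1
print(counts)
```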

VM Classification and Priorities

Based on the collected data, we grouped the VMs into several classes:

# migration_priority.yaml
priority_1_low_risk:
  criteria:
  - power_state: poweredOff
  - no_shared_disks: true
  - disk_size: <100GB
  estimated_count: 342
  migration_method: batch_offline

priority_2_stateless:
  criteria:
  - tags: ["web-frontend", "api-server"]
  - can_recreate: true
  estimated_count: 567
  migration_method: recreate_from_template

priority_3_standard:
  criteria:
  - power_state: poweredOn
  - disk_size: 100GB-500GB
  - sla: standard
  estimated_count: 845
  migration_method: live_migration_with_downtime

priority_4_critical:
  criteria:
  - tags: ["database", "critical-app"]
  - sla: high
  estimated_count: 198
  migration_method: carefully_planned_window

priority_5_complex:
  criteria:
  - shared_disks: true
  - raw_device_mapping: true
  - gpu_passthrough: true
  estimated_count: 48
  migration_method: manual_with_verification

3.3 Batch Migration Script

The core of the migration script we actually used:

#!/bin/bash
# vm_migration.sh
# Batch migration script for VMware to KubeVirt

set -euo pipefail

# Configuration
VCENTER_HOST="vcenter.internal.company.com"
VCENTER_USER="administrator@vsphere.local"
VCENTER_PASSWORD="${VCENTER_PASSWORD:?VCENTER_PASSWORD not set}"
ESXI_HOSTS=("esxi01.internal" "esxi02.internal" "esxi03.internal")
NFS_EXPORT_PATH="/mnt/migration-staging"
K8S_NAMESPACE="vm-production"
STORAGE_CLASS="rook-ceph-block-vm"
PARALLEL_JOBS=4
LOG_DIR="/var/log/vm-migration"

# Color output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log_info()  { echo -e "${GREEN}[INFO]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }
log_warn()  { echo -e "${YELLOW}[WARN]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $(date '+%Y-%m-%d %H:%M:%S') $1"; }

# Create log directory
mkdir -p "${LOG_DIR}"

# Export VM from VMware using virt-v2v
export_vm() {
  local vm_name=$1
  local esxi_host=$2
  local output_dir="${NFS_EXPORT_PATH}/${vm_name}"

  log_info "Starting export of ${vm_name} from ${esxi_host}"

  mkdir -p "${output_dir}"

  # Run virt-v2v to convert the VMware VM to KVM format. Test the pipeline
  # directly in the if-condition: with set -e, checking $? afterwards would
  # never be reached on failure.
  if virt-v2v -ic "vpx://${VCENTER_USER}@${VCENTER_HOST}/${esxi_host}?no_verify=1" \
    "${vm_name}" \
    -o local -os "${output_dir}" \
    -of qcow2 \
    --password-file <(echo "${VCENTER_PASSWORD}") \
    2>&1 | tee "${LOG_DIR}/${vm_name}_export.log"; then
    log_info "Successfully exported ${vm_name}"
    return 0
  else
    log_error "Failed to export ${vm_name}"
    return 1
  fi
}

# Generate KubeVirt VM manifest
generate_vm_manifest() {
  local vm_name=$1
  local cpu=$2
  local memory_gb=$3
  local disk_path=$4
  local network_name=$5
  local mac_address=$6

  local manifest_file="${NFS_EXPORT_PATH}/${vm_name}/${vm_name}-vm.yaml"

  cat > "${manifest_file}" << EOF
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: ${vm_name}
  namespace: ${K8S_NAMESPACE}
  labels:
    app: ${vm_name}
    migration-source: vmware
    migration-date: $(date +%Y-%m-%d)
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/vm: ${vm_name}
    spec:
      nodeSelector:
        node-role.kubernetes.io/virtualization: "true"
      tolerations:
        - key: "virtualization"
          operator: "Equal"
          value: "true"
          effect: "NoSchedule"
      domain:
        cpu:
          cores: ${cpu}
          sockets: 1
          threads: 1
        memory:
          guest: ${memory_gb}Gi
        resources:
          requests:
            memory: ${memory_gb}Gi
          limits:
            memory: $((memory_gb + 1))Gi
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: virtio
              bootOrder: 1
            - name: cloudinitdisk
              disk:
                bus: virtio
          interfaces:
            - name: default
              masquerade: {}
            - name: production-net
              bridge: {}
              macAddress: "${mac_address}"
          networkInterfaceMultiqueue: true
          rng: {}
        machine:
          type: q35
        features:
          acpi: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: false
      networks:
        - name: default
          pod: {}
        - name: production-net
          multus:
            networkName: ${network_name}
      terminationGracePeriodSeconds: 180
      volumes:
        - name: rootdisk
          dataVolume:
            name: ${vm_name}-rootdisk
        - name: cloudinitdisk
          cloudInitNoCloud:
            userData: |
              #cloud-config
              preserve_hostname: true
              manage_etc_hosts: false
---
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: ${vm_name}-rootdisk
  namespace: ${K8S_NAMESPACE}
spec:
  source:
    upload: {}
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: $(get_disk_size "${disk_path}")
    storageClassName: ${STORAGE_CLASS}
EOF

  log_info "Generated manifest: ${manifest_file}"
}

# Get disk size from qcow2 file
get_disk_size() {
  local disk_path=$1
  local size_bytes=$(qemu-img info --output json "${disk_path}" | jq '.["virtual-size"]')
  local size_gb=$((size_bytes / 1024 / 1024 / 1024))
  # Add 10% buffer
  echo "$((size_gb * 110 / 100))Gi"
}

# Upload disk image to DataVolume
upload_disk_image() {
  local vm_name=$1
  local disk_path=$2

  log_info "Uploading disk image for ${vm_name}"

  # Wait for DataVolume to be ready for upload
  kubectl wait --for=condition=UploadReady datavolume/${vm_name}-rootdisk \
    -n ${K8S_NAMESPACE} --timeout=300s

  # Upload using virtctl (goes through the CDI upload proxy service).
  # As with export_vm, test the pipeline in the if-condition directly.
  if virtctl image-upload dv ${vm_name}-rootdisk \
    --namespace=${K8S_NAMESPACE} \
    --image-path="${disk_path}" \
    --insecure \
    --uploadproxy-url="https://cdi-uploadproxy.cdi.svc:443" \
    2>&1 | tee -a "${LOG_DIR}/${vm_name}_upload.log"; then
    log_info "Successfully uploaded disk for ${vm_name}"
    return 0
  else
    log_error "Failed to upload disk for ${vm_name}"
    return 1
  fi
}

# Full migration workflow for a single VM
migrate_single_vm() {
  local vm_name=$1
  local vm_config=$2

  # Parse VM configuration
  local cpu=$(echo "${vm_config}" | jq -r '.cpu')
  local memory=$(echo "${vm_config}" | jq -r '.memory_gb')
  local esxi_host=$(echo "${vm_config}" | jq -r '.esxi_host')
  local network=$(echo "${vm_config}" | jq -r '.network')
  local mac=$(echo "${vm_config}" | jq -r '.mac_address')

  log_info "Starting migration of ${vm_name}"

  # Step 1: Export from VMware
  if ! export_vm "${vm_name}" "${esxi_host}"; then
    return 1
  fi

  # Step 2: Find the converted disk
  local disk_path=$(find "${NFS_EXPORT_PATH}/${vm_name}" -name "*.qcow2" | head -1)
  if [ -z "${disk_path}" ]; then
    log_error "No qcow2 disk found for ${vm_name}"
    return 1
  fi

  # Step 3: Generate manifest
  generate_vm_manifest "${vm_name}" "${cpu}" "${memory}" "${disk_path}" "${network}" "${mac}"

  # Step 4: Apply manifest
  kubectl apply -f "${NFS_EXPORT_PATH}/${vm_name}/${vm_name}-vm.yaml"

  # Step 5: Upload disk
  if ! upload_disk_image "${vm_name}" "${disk_path}"; then
    return 1
  fi

  # Step 6: Start the VM (the manifest is created with running: false)
  # and wait until it is ready
  virtctl start "${vm_name}" -n "${K8S_NAMESPACE}"
  kubectl wait --for=condition=Ready vm/${vm_name} -n ${K8S_NAMESPACE} --timeout=600s

  log_info "Migration completed for ${vm_name}"
  return 0
}

# Batch migration with parallel execution
batch_migrate() {
  local vm_list_file=$1

  log_info "Starting batch migration from ${vm_list_file}"

  # Read the VM list (one JSON object per line) and run migrations in
  # parallel. Use input redirection rather than `cat | while`, so the loop
  # does not run in a subshell and `wait` below can see the background jobs.
  while read -r line; do
    vm_name=$(echo "${line}" | jq -r '.name')
    migrate_single_vm "${vm_name}" "${line}" &

    # Throttle to PARALLEL_JOBS concurrent migrations
    while [ "$(jobs -r | wc -l)" -ge "${PARALLEL_JOBS}" ]; do
      sleep 10
    done
  done < "${vm_list_file}"

  # Wait for all jobs to complete
  wait

  log_info "Batch migration completed"
}

# Main entry point
main() {
  case "${1:-}" in
    single)
      shift
      migrate_single_vm "$@"
      ;;
    batch)
      shift
      batch_migrate "$@"
      ;;
    *)
      echo "Usage: $0 {single|batch} [args...]"
      exit 1
      ;;
  esac
}

main "$@"
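The batch mode above consumes one JSON object per line. A small converter from a planning spreadsheet to that format might look like this sketch; the esxi_host, network, and mac_address columns are assumed to have been filled in during planning (the raw inventory export stores networks as a nested list):

```python
import csv
import io
import json

def csv_to_jsonl(csv_text: str) -> str:
    """Convert planning-CSV rows to the one-JSON-object-per-line format
    that vm_migration.sh batch mode reads."""
    lines = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lines.append(json.dumps({
            "name": row["name"],
            "cpu": int(row["cpu"]),
            "memory_gb": int(float(row["memory_gb"])),
            "esxi_host": row["esxi_host"],
            "network": row["network"],
            "mac_address": row["mac_address"],
        }))
    return "\n".join(lines)

print(csv_to_jsonl(
    "name,cpu,memory_gb,esxi_host,network,mac_address\n"
    "web-01,4,8,esxi01.internal,vlan100-production,00:50:56:aa:bb:cc\n"
))
```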

3.4 Special Handling for Windows VMs

Migrating Windows VMs is considerably harder than Linux, mainly because of drivers. VMware guests run the PVSCSI and VMXNET3 drivers from VMware Tools; after moving to KubeVirt they need VirtIO drivers instead.

Install the VirtIO drivers inside the Windows VM before migrating:

# Download VirtIO drivers ISO
# https://fedorapeople.org/groups/virt/virtio-win/direct-downloads/stable-virtio/virtio-win.iso

# Mount ISO and install drivers
# Install via Device Manager or use these PowerShell commands:

# Install VirtIO storage driver
pnputil.exe /add-driver E:\vioscsi\w10\amd64\*.inf /install

# Install VirtIO network driver
pnputil.exe /add-driver E:\NetKVM\w10\amd64\*.inf /install

# Install VirtIO balloon driver
pnputil.exe /add-driver E:\Balloon\w10\amd64\*.inf /install

# Install QEMU guest agent
msiexec.exe /i E:\guest-agent\qemu-ga-x86_64.msi /qn

# Verify drivers are installed
Get-WindowsDriver -Online | Where-Object { $_.ProviderName -like "*Red Hat*" }

The Windows VM configuration after migration:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: windows-server-2022
  namespace: vm-production
spec:
  running: true
  template:
    spec:
      domain:
        clock:
          timer:
            hpet:
              present: false
            hyperv: {}
            pit:
              tickPolicy: delay
            rtc:
              tickPolicy: catchup
            utc: {}
        cpu:
          cores: 4
          sockets: 1
          threads: 2
        devices:
          disks:
          - bootOrder: 1
            disk:
              bus: sata
            name: rootdisk
          inputs:
          - bus: usb
            name: tablet
            type: tablet
          interfaces:
          - masquerade: {}
            model: e1000e
            name: default
          tpm: {}
        features:
          acpi: {}
          apic: {}
          hyperv:
            frequencies: {}
            ipi: {}
            relaxed: {}
            reset: {}
            runtime: {}
            spinlocks:
              spinlocks: 8191
            synic: {}
            synictimer:
              direct: {}
            tlbflush: {}
            vapic: {}
            vpindex: {}
          smm:
            enabled: true
        firmware:
          bootloader:
            efi:
              secureBoot: true
        machine:
          type: q35
        memory:
          guest: 8Gi
        resources:
          requests:
            memory: 8Gi
      networks:
      - name: default
        pod: {}
      volumes:
      - dataVolume:
          name: windows-server-2022-rootdisk
        name: rootdisk

4. Best Practices and Caveats

4.1 Performance Tuning

CPU Tuning

# Enable CPU pinning for latency-sensitive workloads
spec:
  template:
    spec:
      domain:
        cpu:
          cores: 4
          sockets: 1
          threads: 1
          dedicatedCpuPlacement: true
          isolateEmulatorThread: true
          model: host-passthrough
          numa:
            guestMappingPassthrough: {}

In our tests, enabling CPU pinning reduced P99 latency on database VMs by 40%.

Memory Tuning

# Enable hugepages for memory-intensive workloads
spec:
  template:
    spec:
      domain:
        memory:
          guest: 32Gi
          hugepages:
            pageSize: 1Gi

Hugepages must be provisioned on the node beforehand:

# Configure 1GiB hugepages on the node at runtime (requires a CPU with
# 1GiB-page support, i.e. the pdpe1gb flag)
echo 34 > /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages

# Make it persistent via kernel boot parameters. Note that vm.nr_hugepages
# in /etc/sysctl.conf only controls the default hugepage size (usually
# 2MiB), not 1GiB pages.
grubby --update-kernel=ALL --args="default_hugepagesz=1G hugepagesz=1G hugepages=34"

# Label nodes with hugepages
kubectl label node node-vm-01 kubevirt.io/hugepages-1Gi=true
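The 34 above is not arbitrary: it covers the 32 pages the 32Gi guest needs, plus a couple of spare pages. A trivial sizing helper, where the 2GiB headroom is our own rule of thumb rather than any KubeVirt requirement:

```python
import math

def hugepages_1gib_needed(guest_mem_gib: float, headroom_gib: int = 2) -> int:
    """Number of 1GiB hugepages to reserve on a node for one guest of
    guest_mem_gib, plus headroom (assumption: 2GiB spare; tune per
    environment)."""
    return math.ceil(guest_mem_gib) + headroom_gib

print(hugepages_1gib_needed(32))  # 34, matching nr_hugepages above
```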

Storage I/O Tuning

# Tune IO for database workloads
spec:
  template:
    spec:
      domain:
        devices:
          disks:
          - name: datadisk
            disk:
              bus: virtio
            io: native
            cache: none
            dedicatedIOThread: true

On SSD-backed storage, cache: none combined with io: native gives the best random-I/O performance. With this configuration our MySQL VMs reach 85,000 IOPS (4K random read).

4.2 High Availability

VM Anti-Affinity

spec:
  template:
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mysql-cluster
            topologyKey: kubernetes.io/hostname

Live Migration Settings

# Global migration settings in KubeVirt CR
spec:
  configuration:
    migrations:
      parallelMigrationsPerCluster: 10
      parallelOutboundMigrationsPerNode: 4
      bandwidthPerMigration: 1Gi
      completionTimeoutPerGiB: 800
      progressTimeout: 300
      allowAutoConverge: true
      allowPostCopy: false  # Disable post-copy for safety

Tune the migration bandwidth to your network. On our 100Gbps network, the 1Gi per-migration cap lets migrations finish quickly without impacting production traffic.
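As a back-of-the-envelope check on these settings, migration duration at the configured bandwidth cap can be estimated as follows; the dirty-page factor is a rough, workload-dependent assumption, not a KubeVirt parameter:

```python
def migration_seconds(vm_mem_gib: float, bandwidth_gib_per_s: float = 1.0,
                      dirty_factor: float = 1.5) -> float:
    """Rough estimate of live-migration copy time: memory to transfer
    (inflated by dirty_factor for pages re-copied while the guest keeps
    writing -- an assumption) divided by the per-migration bandwidth cap."""
    return vm_mem_gib * dirty_factor / bandwidth_gib_per_s

# A 16GiB VM at the 1Gi/s cap above: roughly 24s of copy time before cutover
print(round(migration_seconds(16)))
```

If busy guests dirty memory faster than this, that is exactly what the allowAutoConverge setting above is for.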

4.3 Security Hardening

Enable SELinux

Make sure SELinux is in enforcing mode on every node:

# Check SELinux status
getenforce

# Set to enforcing
setenforce 1

# Make permanent
sed -i 's/SELINUX=permissive/SELINUX=enforcing/' /etc/selinux/config

Network Policies

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: vm-production-isolation
  namespace: vm-production
spec:
  podSelector:
    matchLabels:
      kubevirt.io/domain: database-server
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 3306
  egress:
  - to:
    - podSelector:
        matchLabels:
          app: backend
    ports:
    - protocol: TCP
      port: 3306
  - to:
    - namespaceSelector:
        matchLabels:
          name: kube-system
      podSelector:
        matchLabels:
          k8s-app: kube-dns
    ports:
    - protocol: UDP
      port: 53

Resource Quotas

apiVersion: v1
kind: ResourceQuota
metadata:
  name: vm-production-quota
  namespace: vm-production
spec:
  hard:
    requests.cpu: "200"
    requests.memory: "800Gi"
    limits.cpu: "400"
    limits.memory: "1Ti"
    persistentvolumeclaims: "100"
    requests.storage: "50Ti"

4.4 Common Errors and Fixes

Problem 1: VM reports "Guest agent is not connected"

Cause: the QEMU guest agent is not installed, or its service is not running.

Fix:

# Linux
yum install qemu-guest-agent -y
systemctl enable --now qemu-guest-agent

# Windows
# Install qemu-ga from virtio-win ISO
# Start service
sc start QEMU-GA

Problem 2: No network connectivity; the VM cannot obtain an IP

Cause: misconfigured Multus, or the VLAN is not trunked through correctly.

Troubleshooting steps:

# Check Multus pod logs
kubectl logs -n kube-system -l app=multus

# Check network attachment
kubectl get net-attach-def -n vm-production

# Enter virt-launcher pod to check
kubectl exec -it virt-launcher-myvm-xxx -n vm-production -- ip addr
kubectl exec -it virt-launcher-myvm-xxx -n vm-production -- bridge link

Problem 3: Poor disk performance

Cause: wrong cache policy or I/O mode.

Fix:

# Change from default to optimized settings
devices:
  disks:
  - name: rootdisk
    disk:
      bus: virtio
    cache: none  # Was: writethrough
    io: native   # Was: threads

Problem 4: Live migration fails

Common causes and fixes:

# 1. Check source/destination node connectivity
virtctl migrate myvm -n vm-production

# 2. Check migration status
kubectl get vmim -n vm-production

# 3. Common fixes:
# - Increase migration bandwidth
# - Enable allowAutoConverge for busy VMs
# - Check storage is accessible from both nodes
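
The bandwidth and auto-converge fixes above are configured on the KubeVirt CR. A sketch of the knobs we would tune (values are illustrative; see the parameter table in the appendix):

```yaml
apiVersion: kubevirt.io/v1
kind: KubeVirt
metadata:
  name: kubevirt
  namespace: kubevirt
spec:
  configuration:
    migrations:
      # Cap concurrency so migrations don't saturate the network
      parallelMigrationsPerCluster: 10
      parallelOutboundMigrationsPerNode: 4
      # Throttle each migration stream
      bandwidthPerMigration: 1Gi
      # Slow down busy guests so dirty pages can converge
      allowAutoConverge: true
      completionTimeoutPerGiB: 800
      progressTimeout: 300
```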

5. Troubleshooting and Monitoring

5.1 Viewing Logs

# KubeVirt operator logs
kubectl logs -n kubevirt -l kubevirt.io=virt-operator

# virt-controller logs (VM scheduling issues)
kubectl logs -n kubevirt -l kubevirt.io=virt-controller

# virt-handler logs (VM lifecycle on node)
kubectl logs -n kubevirt -l kubevirt.io=virt-handler --all-containers

# virt-launcher logs (specific VM)
kubectl logs -n vm-production virt-launcher-myvm-xxxxx -c compute

# libvirt logs inside virt-launcher
kubectl exec -n vm-production virt-launcher-myvm-xxxxx -- cat /var/log/libvirt/qemu/vm-production_myvm.log

5.2 Common Troubleshooting Commands

# Check VM status
kubectl get vm,vmi -n vm-production

# Describe VM for events
kubectl describe vm myvm -n vm-production

# Check VMI conditions
kubectl get vmi myvm -n vm-production -o jsonpath='{.status.conditions}'

# Enter VM console
virtctl console myvm -n vm-production

# SSH to VM (if SSH is configured)
virtctl ssh user@myvm -n vm-production

# VNC access
virtctl vnc myvm -n vm-production

# Stop/Start VM
virtctl stop myvm -n vm-production
virtctl start myvm -n vm-production

# Restart VM
virtctl restart myvm -n vm-production

# Force stop (like pulling power cord)
virtctl stop myvm -n vm-production --grace-period=0 --force

5.3 Monitoring Setup

We monitor KubeVirt with Prometheus + Grafana.

ServiceMonitor configuration:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kubevirt
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
    - kubevirt
  selector:
    matchLabels:
      prometheus.kubevirt.io: "true"
  endpoints:
  - port: metrics
    interval: 15s
    scrapeTimeout: 10s

Key metrics to watch:

# VM CPU usage
kubevirt_vmi_cpu_system_usage_seconds_total
kubevirt_vmi_cpu_user_usage_seconds_total

# VM memory usage
kubevirt_vmi_memory_available_bytes
kubevirt_vmi_memory_used_bytes

# VM network IO
kubevirt_vmi_network_receive_bytes_total
kubevirt_vmi_network_transmit_bytes_total

# VM disk IO
kubevirt_vmi_storage_read_traffic_bytes_total
kubevirt_vmi_storage_write_traffic_bytes_total
kubevirt_vmi_storage_iops_read_total
kubevirt_vmi_storage_iops_write_total

# Migration metrics
kubevirt_vmi_migration_data_processed_bytes
kubevirt_vmi_migration_data_remaining_bytes
kubevirt_vmi_migration_phase_transition_time_seconds

I won't paste the Grafana dashboard JSON here; you can import the community dashboard directly (ID: 11748).
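
Beyond dashboards, these metrics can feed alerts. A minimal PrometheusRule sketch alerting on high guest memory usage (the threshold is ours, and the exact metric semantics should be verified against your KubeVirt version):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: kubevirt-vm-alerts
  namespace: monitoring
spec:
  groups:
  - name: kubevirt-vm
    rules:
    - alert: VMMemoryHigh
      # Fires when a VM's working set exceeds 95% of its usable memory for 10 minutes
      expr: |
        kubevirt_vmi_memory_used_bytes / kubevirt_vmi_memory_available_bytes > 0.95
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "VM {{ $labels.name }} in {{ $labels.namespace }} is running low on memory"
```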

5.4 Backup and Restore

Use KubeVirt's snapshot feature for backups:

# Create snapshot
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: myvm-snapshot-20241219
  namespace: vm-production
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: myvm
---
# Restore from snapshot
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineRestore
metadata:
  name: myvm-restore
  namespace: vm-production
spec:
  target:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: myvm
  virtualMachineSnapshotName: myvm-snapshot-20241219

Scheduled backup script:

#!/bin/bash
# vm_backup.sh - Automated VM snapshot script

NAMESPACE="vm-production"
RETENTION_DAYS=7

# Create snapshots for all VMs
for vm in $(kubectl get vm -n ${NAMESPACE} -o jsonpath='{.items[*].metadata.name}'); do
  snapshot_name="${vm}-snapshot-$(date +%Y%m%d-%H%M%S)"
  cat <<EOF | kubectl apply -f -
apiVersion: snapshot.kubevirt.io/v1alpha1
kind: VirtualMachineSnapshot
metadata:
  name: ${snapshot_name}
  namespace: ${NAMESPACE}
  labels:
    backup-type: scheduled
    vm-name: ${vm}
spec:
  source:
    apiGroup: kubevirt.io
    kind: VirtualMachine
    name: ${vm}
EOF
  echo "Created snapshot: ${snapshot_name}"
done

# Clean up old snapshots
kubectl get vmsnapshot -n ${NAMESPACE} -l backup-type=scheduled \
  -o jsonpath='{range .items[*]}{.metadata.name} {.metadata.creationTimestamp}{"\n"}{end}' | \
while read name timestamp; do
  age_days=$(( ($(date +%s) - $(date -d "${timestamp}" +%s)) / 86400 ))
  if [ ${age_days} -gt ${RETENTION_DAYS} ]; then
    kubectl delete vmsnapshot ${name} -n ${NAMESPACE}
    echo "Deleted old snapshot: ${name}"
  fi
done
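
To run the script on a schedule inside the cluster, one option is a CronJob. This sketch assumes the script is stored in a ConfigMap named vm-backup-script and a ServiceAccount vm-backup with permissions on vmsnapshots exists (both names are hypothetical):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: vm-backup
  namespace: vm-production
spec:
  schedule: "0 2 * * *"               # daily at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: vm-backup    # needs create/delete on vmsnapshots
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: bitnami/kubectl:latest
            command: ["/bin/bash", "/scripts/vm_backup.sh"]
            volumeMounts:
            - name: scripts
              mountPath: /scripts
          volumes:
          - name: scripts
            configMap:
              name: vm-backup-script       # holds vm_backup.sh
```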
6. Summary

Migration Results

Over six months, we migrated 2,000 VMs from VMware to KubeVirt:

• Completion rate: 1,987 VMs migrated successfully; 13 with special hardware requirements remain on a standalone VMware cluster
• Cost savings: annual infrastructure cost down 62% (mostly VMware licensing fees)
• Operational efficiency: VMs and containers are managed on one platform, and the ops team shrank from 12 to 8 people
• Failure recovery: mean time to recovery dropped from 45 minutes to 12 minutes

Lessons Learned

1. Storage is critical: we underestimated storage at first, and Longhorn genuinely could not keep up at large VM scale. Ceph is complex, but its stability and performance met our needs.
2. Plan the network early: VLAN trunking is tedious to configure; design the network architecture at the very start of the project.
3. Migrate in batches: we initially wanted to move an entire cluster's VMs in one go, and when something broke the blast radius was far too large. Switching to batches of 50 kept problems manageable.
4. Windows is harder than Linux: driver and activation issues on Windows VMs gave us headaches for a long time. Prepare a dedicated Windows migration template.
5. Monitoring comes first: build the monitoring stack before migrating, or you won't know where to look when something goes wrong.

Topics for Further Study

• GPU virtualization: vGPU passthrough for AI training VMs
• Nested virtualization: running Kubernetes inside KubeVirt VMs (yes, we actually have this requirement...)
• Hybrid cloud: cross-cloud VM management built on Cluster API

References

• KubeVirt official documentation
• KubeVirt GitHub repository
• virt-v2v documentation
• Rook-Ceph documentation
• OVN-Kubernetes documentation

Appendix

Command Cheat Sheet

# KubeVirt Management
virtctl start <vm>                    # Start VM
virtctl stop <vm>                     # Stop VM
virtctl restart <vm>                  # Restart VM
virtctl pause <vm>                    # Pause VM
virtctl unpause <vm>                  # Unpause VM
virtctl migrate <vm>                  # Trigger live migration
virtctl console <vm>                  # Serial console access
virtctl vnc <vm>                      # VNC access
virtctl ssh <user>@<vm>               # SSH access
virtctl guestfs <vm>                  # Access VM filesystem

# Disk Management
virtctl image-upload dv <name> --image-path=<path>   # Upload disk image
virtctl addvolume <vm> --volume-name=<name>          # Hotplug volume
virtctl removevolume <vm> --volume-name=<name>       # Hot-unplug volume

# Snapshot
kubectl get vmsnapshot                 # List snapshots
kubectl get vmrestore                  # List restores

# Troubleshooting
kubectl get vm,vmi                     # VM status overview
kubectl describe vmi <name>            # VM instance details
kubectl logs virt-launcher-<vm>-xxx    # VM launcher logs

Configuration Parameters

| Parameter | Description | Recommended value |
|---|---|---|
| parallelMigrationsPerCluster | Parallel migrations cluster-wide | 10 |
| parallelOutboundMigrationsPerNode | Parallel outbound migrations per node | 4 |
| bandwidthPerMigration | Bandwidth cap per migration | 1Gi (adjust to your network) |
| completionTimeoutPerGiB | Timeout per GiB of data migrated | 800 seconds |
| progressTimeout | Timeout when migration makes no progress | 300 seconds |

Glossary

| Term | Full name | Description |
|---|---|---|
| VM | VirtualMachine | KubeVirt VM resource definition |
| VMI | VirtualMachineInstance | A running VM instance |
| DV | DataVolume | CDI-managed persistent volume |
| CDI | Containerized Data Importer | VM image import component |
| virt-v2v | - | VMware-to-KVM conversion tool |
| VMIM | VirtualMachineInstanceMigration | VM migration task object |

I hope this write-up, distilled from our real migration experience, gives you a useful reference for moving from VMware to KubeVirt. If you run into other problems during your migration, you are welcome to discuss them on the 云栈社区 Ops/DevOps/SRE board.



