1. Overview

1.1 Background

As a distributed search engine, Elasticsearch stores and serves huge volumes of data in production. However, distributed systems inevitably face network partitions, node failures, and similar problems, which can lead to a split-brain condition: the cluster is divided by network isolation into multiple independent sub-clusters, each of which elects its own master node, causing serious issues such as data inconsistency and conflicting writes. Starting from Elasticsearch's master-election mechanism, this article analyzes the causes and impact of split-brain and provides a systematic approach to prevention and recovery.

1.2 Key Techniques

  • Zen Discovery: the master-election protocol before Elasticsearch 7.x, based on a ping mechanism
  • Quorum: the discovery.zen.minimum_master_nodes setting used to prevent split-brain
  • Raft-style coordination: introduced in Elasticsearch 7.0+, providing stronger consistency guarantees
  • Master election: multi-round voting based on node ID, version, and cluster-state version
  • Fault detection: heartbeat checks on node liveness that trigger re-election

1.3 Applicable Scenarios

  • Scenario 1: large Elasticsearch clusters (more than 50 nodes) in complex network environments
  • Scenario 2: cross-datacenter deployments with network latency and instability
  • Scenario 3: finance, e-commerce, and other businesses with strict data-consistency requirements
  • Scenario 4: 24x7 services that need automatic failover and recovery

1.4 Environment Requirements

Component       Version requirement          Notes
Elasticsearch   7.17+ / 8.x                  Latest stable release recommended
OS              CentOS 7+ / Ubuntu 20.04+    Linux kernel 4.0+
JDK             OpenJDK 17                   ES 8.x requires JDK 17
Memory          32GB+                        Heap should not exceed 32GB
Network         10GbE NIC                    Low-latency network environment

2. Detailed Walkthrough

2.1 How Elasticsearch Elects a Master

◆ 2.1.1 Responsibilities of the master node

The master node handles cluster-wide management but does not process indexing or search requests:

# elasticsearch.yml - master node configuration
node.roles: [ master ]
# Master node responsibilities:
# 1. Manage cluster metadata (index creation/deletion, mapping changes)
# 2. Coordinate nodes joining and leaving the cluster
# 3. Make shard allocation decisions
# 4. Maintain and publish the cluster state

The cluster state contains (see the sketch after this list):

  • Cluster-wide settings
  • Index metadata (mappings, settings)
  • The shard routing table
  • Node information
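
These pieces of the cluster state can be inspected directly, since the cluster state API accepts a filter on which sections to return. A minimal sketch, assuming an unsecured test cluster on localhost:9200 (adjust the URL and add authentication for a secured cluster):

import requests

ES = "http://localhost:9200"  # assumption: local, unsecured test cluster

# Fetch only the parts of the cluster state listed above:
# metadata (index mappings/settings), routing_table, and nodes.
resp = requests.get(
    f"{ES}/_cluster/state/metadata,routing_table,nodes",
    params={"filter_path": "cluster_name,version,metadata.indices,routing_table,nodes"},
    timeout=10,
)
resp.raise_for_status()
state = resp.json()

print("cluster:", state.get("cluster_name"))
print("state version:", state.get("version"))
print("indices:", list(state.get("metadata", {}).get("indices", {}).keys())[:5])
print("nodes:", len(state.get("nodes", {})))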
◆ 2.1.2 Zen Discovery election flow (before ES 7.0)

Phase 1: Ping

# On startup, a node pings the configured unicast hosts
# (discovery.zen.ping.unicast.hosts in 6.x; renamed discovery.seed_hosts in 7.x)
discovery.zen.ping.unicast.hosts: ["10.0.1.101", "10.0.1.102", "10.0.1.103"]
# A ping response contains:
# 1. Node ID
# 2. Node roles
# 3. Cluster name
# 4. Cluster-state version

Phase 2: Election

Election rules (in order of priority):
1. The node with the newest cluster-state version wins
2. Ties are broken by the smallest node ID (lexicographic order)

Voting process (a small sketch of this ranking and quorum check follows at the end of this subsection):
1. A candidate master sends vote requests to the other master-eligible nodes
2. It is elected once it receives (N/2 + 1) votes, where N is the number of master-eligible nodes
3. After winning, it broadcasts itself as the new master

Phase 3: Master confirmation

1. The new master waits until at least minimum_master_nodes nodes have joined
2. It publishes a new cluster state
3. The other nodes accept the new master
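
The ranking and quorum rules above fit in a few lines. A minimal, self-contained sketch (a toy model, not Elasticsearch internals; the node names and version numbers are invented):

from dataclasses import dataclass

@dataclass
class Node:
    node_id: str
    cluster_state_version: int

def pick_candidate(nodes):
    """Rank by newest cluster-state version, then smallest node ID."""
    return sorted(nodes, key=lambda n: (-n.cluster_state_version, n.node_id))[0]

def has_quorum(votes_received: int, master_eligible: int) -> bool:
    """A candidate needs a strict majority: floor(N/2) + 1 votes."""
    return votes_received >= master_eligible // 2 + 1

nodes = [Node("node-3", 120), Node("node-1", 125), Node("node-2", 125)]
candidate = pick_candidate(nodes)
print("candidate:", candidate.node_id)          # node-1 (newest state, smallest ID)
print("elected:", has_quorum(2, len(nodes)))    # True: 2 of 3 votes is a majority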
◆ 2.1.3 Root causes of split-brain

Case 1: Network partition

Original cluster: 3 master-eligible nodes (Node1, Node2, Node3)
discovery.zen.minimum_master_nodes: 2

A network partition occurs:
Partition A: Node1, Node2 (2 nodes, satisfies minimum_master_nodes)
Partition B: Node3 (1 node, does not satisfy it)

Result:
- Partition A elects a master (Node1)
- Partition B cannot elect a master and drops into a no-master, effectively read-only state

Correct behavior: partition B should stop serving and wait for the network to recover

Case 2: Misconfiguration causing split-brain

# Misconfiguration example
# 5 master-eligible nodes
discovery.zen.minimum_master_nodes: 2 # Wrong! It should be 3

# Network partition
Partition A: Node1, Node2 (2 nodes, satisfies minimum_master_nodes=2)
Partition B: Node3, Node4, Node5 (3 nodes, also satisfies =2)

Result: both partitions can elect a master, producing a split brain

Correct formula (see the sketch below)

minimum_master_nodes = (master_eligible_nodes / 2) + 1   (integer division)
Examples:
3 nodes: (3/2) + 1 = 2
5 nodes: (5/2) + 1 = 3
7 nodes: (7/2) + 1 = 4
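
A quick way to sanity-check this formula is to confirm that, for a given quorum, no two disjoint partitions can both reach it. A minimal sketch:

def minimum_master_nodes(master_eligible: int) -> int:
    """Quorum formula: floor(N/2) + 1."""
    return master_eligible // 2 + 1

def split_brain_possible(master_eligible: int, quorum: int) -> bool:
    """True if two disjoint partitions could both reach the quorum."""
    return any(
        size >= quorum and (master_eligible - size) >= quorum
        for size in range(1, master_eligible)
    )

for n in (3, 5, 7):
    q = minimum_master_nodes(n)
    print(f"{n} master-eligible nodes -> quorum {q}, "
          f"split-brain possible: {split_brain_possible(n, q)}")

# Reproducing the misconfiguration from Case 2: 5 nodes with quorum 2
print("5 nodes, quorum 2 -> split-brain possible:", split_brain_possible(5, 2))  # True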

2.2 Improvements in Elasticsearch 7.0+

◆ 2.2.1 Automatic quorum
# Elasticsearch 7.0+ no longer needs minimum_master_nodes
# The system computes and maintains the voting configuration automatically

# Specify the initial master nodes when bootstrapping the cluster
cluster.initial_master_nodes: ["node-1", "node-2", "node-3"]

# The voting configuration is adjusted automatically as nodes join or leave
# e.g. growing a 3-node cluster to 5 nodes expands the voting configuration
# from 3 to 5 members, raising the required quorum from 2 to 3
# (a sketch for inspecting the live voting configuration follows this block)
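
To see what the cluster currently considers its voting configuration, recent 7.x/8.x versions expose it in the cluster-state metadata under metadata.cluster_coordination (field names below reflect my reading of that response and should be verified against your version). A minimal sketch, assuming an unsecured endpoint on localhost:9200:

import requests

ES = "http://localhost:9200"  # assumption: local test cluster without auth

resp = requests.get(
    f"{ES}/_cluster/state/metadata",
    params={"filter_path": "metadata.cluster_coordination"},
    timeout=10,
)
resp.raise_for_status()
coord = resp.json()["metadata"]["cluster_coordination"]

committed = coord.get("last_committed_config", [])
print("current term:", coord.get("term"))
print("voting configuration (node IDs):", committed)
print("required quorum:", len(committed) // 2 + 1)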
◆ 2.2.2 Raft-style master election
Elasticsearch 7.0+ replaced Zen Discovery with a cluster-coordination layer that behaves much like Raft. Advantages of this approach (see the toy election sketch below):
1. Log replication: master-level changes are replicated to a majority of nodes before they take effect
2. Terms: every election increments the term, preventing a stale master from interfering
3. Randomized timeouts: avoid split votes caused by simultaneous elections

Election process:
1. A follower times out without hearing the leader's heartbeat
2. It becomes a candidate, increments its term, and requests votes
3. On receiving a majority of votes it becomes the leader
4. The leader sends periodic heartbeats to retain its position
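
The term-and-majority logic in the list above can be illustrated with a toy single-round election. This is a didactic model only, not the actual Elasticsearch coordination code; node names and timeout ranges are invented:

import random

class ToyNode:
    def __init__(self, name: str):
        self.name = name
        self.term = 0
        self.voted_for = None
        # Randomized election timeout (ms) reduces the chance of split votes
        self.timeout_ms = random.randint(150, 300)

    def request_vote(self, candidate: "ToyNode") -> bool:
        """Grant a vote if the candidate's term is newer than any term we have seen."""
        if candidate.term > self.term:
            self.term = candidate.term
            self.voted_for = candidate.name
            return True
        return False

nodes = [ToyNode(f"node-{i}") for i in range(1, 4)]

# The node whose timeout fires first becomes the candidate for a new term
candidate = min(nodes, key=lambda n: n.timeout_ms)
candidate.term += 1
votes = 1 + sum(peer.request_vote(candidate) for peer in nodes if peer is not candidate)

if votes >= len(nodes) // 2 + 1:
    print(f"{candidate.name} elected leader for term {candidate.term} with {votes} votes")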
◆ 2.2.3 Cluster bootstrapping
# The initial master nodes must be specified the very first time the cluster starts
# elasticsearch.yml
cluster.initial_master_nodes:
  - node-1
  - node-2
  - node-3

# Notes
# 1. Only needed for the first startup
# 2. Remove this setting once the cluster has formed successfully
# 3. Never change this setting on a cluster that is already running

Automated deployment script

#!/bin/bash
# Bootstrap an ES cluster
MASTER_NODES=("es-master-1" "es-master-2" "es-master-3")

# Build a properly quoted, comma-separated list, e.g. ["es-master-1","es-master-2","es-master-3"]
NODE_LIST=$(printf '"%s",' "${MASTER_NODES[@]}")
NODE_LIST="[${NODE_LIST%,}]"

# Add the bootstrap setting before the first start
for node in "${MASTER_NODES[@]}"; do
    ssh "$node" "echo 'cluster.initial_master_nodes: ${NODE_LIST}' >> /etc/elasticsearch/elasticsearch.yml"
    ssh "$node" "systemctl start elasticsearch"
done

# Wait for the cluster to come up
sleep 30
# Check cluster health
curl -X GET "http://es-master-1:9200/_cluster/health?pretty"

# Remove the bootstrap setting (to avoid accidental re-bootstrapping later)
for node in "${MASTER_NODES[@]}"; do
    ssh "$node" "sed -i '/cluster.initial_master_nodes/d' /etc/elasticsearch/elasticsearch.yml"
done

echo "Cluster initialized successfully"

2.3 High-Availability Architecture Design

◆ 2.3.1 Node role separation
# Master nodes (3 of them)
node.roles: [ master ]
node.attr.box_type: hot

# Data-hot nodes (hot data, SSD)
node.roles: [ data_hot, data_content ]
node.attr.box_type: hot

# Data-warm nodes (warm data, SATA)
node.roles: [ data_warm, data_content ]
node.attr.box_type: warm

# Data-cold nodes (cold data, archive storage)
node.roles: [ data_cold, data_content ]
node.attr.box_type: cold

# Coordinating-only nodes (store no data)
node.roles: [ ]

# Ingest nodes (data pre-processing)
node.roles: [ ingest ]

Recommended architecture (a role-check sketch follows the diagram)

Load balancer
    ↓
Coordinating nodes (2+)
    ↓
Master nodes (3)
    ↓
Data-hot nodes (5+)  Data-warm nodes (3+)  Data-cold nodes (2+)
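
To verify that a running cluster actually follows this role separation, the _cat/nodes API can be summarized by role. A minimal sketch, assuming an unsecured endpoint on localhost:9200 and the node counts from the diagram above:

import requests
from collections import Counter

ES = "http://localhost:9200"  # assumption: local test cluster without auth

resp = requests.get(f"{ES}/_cat/nodes", params={"format": "json", "h": "name,node.role"}, timeout=10)
resp.raise_for_status()

roles = Counter()
for node in resp.json():
    r = node["node.role"]
    if r == "-":
        roles["coordinating-only"] += 1
    elif r == "m":
        roles["dedicated master"] += 1
    else:
        roles[r] += 1  # combined-role nodes, e.g. data+ingest

print(dict(roles))
# Sanity checks for the layout above: 3 dedicated masters, 2+ coordinating-only nodes
assert roles.get("dedicated master", 0) == 3, "expected 3 dedicated master nodes"
assert roles.get("coordinating-only", 0) >= 2, "expected at least 2 coordinating-only nodes"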
◆ 2.3.2 Cross-rack / cross-datacenter deployment
# Enable shard allocation awareness
cluster.routing.allocation.awareness.attributes: zone,rack

# Node configuration
# Nodes 1-3 (rack A)
node.attr.zone: zone-a
node.attr.rack: rack-1

# Nodes 4-6 (rack B)
node.attr.zone: zone-a
node.attr.rack: rack-2

# Nodes 7-9 (rack C)
node.attr.zone: zone-a
node.attr.rack: rack-3

# With rack awareness enabled, a primary and its replica are not placed in the same rack.
# Forced awareness (useful when a second zone, zone-b, exists but may be temporarily empty):
cluster.routing.allocation.awareness.force.zone.values: zone-a,zone-b

Cross-datacenter considerations

# 1. Network latency optimizations
transport.compress: true          # transport.tcp.compress in older 7.x releases
transport.ping_schedule: 30s

# 2. Election timeouts (can be lengthened for high-latency cross-DC links)
cluster.election.duration: 60s
cluster.election.initial_timeout: 10s

# 3. Fault-detection timeouts
cluster.fault_detection.follower_check.timeout: 60s
cluster.fault_detection.leader_check.timeout: 60s

# 4. Control copy placement across datacenters
#    (forced awareness spreads copies across DCs; use shard allocation filtering
#     instead if an index should stay in a single DC to cut cross-DC traffic)
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.awareness.attributes": "datacenter",
    "cluster.routing.allocation.awareness.force.datacenter.values": "dc1,dc2"
  }
}
◆ 2.3.3 Shard allocation strategy
// Index template configuration
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 5,
      "number_of_replicas": 1,
      // Allocation rule: keep new indices on hot nodes
      "index.routing.allocation.require.box_type": "hot",
      // Limit how many shards of this index a single node may hold
      "index.routing.allocation.total_shards_per_node": 2,
      // Delayed allocation (wait 5 minutes after a node leaves before reallocating its shards)
      "index.unassigned.node_left.delayed_timeout": "5m"
    }
  }
}

Shard sizing formula (see the sketch below)

Recommended size per shard: 20GB - 50GB
Number of shards = data volume / target shard size

Example:
Log data at 500GB/day, retained for 30 days = 15TB
Target shard size 30GB
Shards = 15000GB / 30GB = 500 primary shards

Split by index (one index per day):
500 / 30 ≈ 17 primary shards per day

When designing and operating a highly available distributed architecture, sensible shard planning is a key part of the work.
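
The sizing arithmetic above is easy to wrap in a small helper so it can be re-run with different assumptions (the daily volume, retention, and target shard size below are the example's figures):

import math

def plan_shards(daily_gb: float, retention_days: int, target_shard_gb: float, replicas: int = 1):
    """Back-of-the-envelope shard planning, following the formula above."""
    total_gb = daily_gb * retention_days
    total_primaries = math.ceil(total_gb / target_shard_gb)
    primaries_per_daily_index = math.ceil(daily_gb / target_shard_gb)
    total_copies = total_primaries * (1 + replicas)
    return {
        "total_data_gb": total_gb,
        "total_primary_shards": total_primaries,
        "primaries_per_daily_index": primaries_per_daily_index,
        "total_shard_copies": total_copies,
    }

print(plan_shards(daily_gb=500, retention_days=30, target_shard_gb=30))
# {'total_data_gb': 15000, 'total_primary_shards': 500,
#  'primaries_per_daily_index': 17, 'total_shard_copies': 1000}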

2.4 Split-Brain Prevention and Detection

◆ 2.4.1 Network isolation testing
#!/bin/bash
# Simulate a network partition (test environments only)

# Isolate node 3 from node 1
ssh node-1 "iptables -A INPUT -s 10.0.1.103 -j DROP"
ssh node-1 "iptables -A OUTPUT -d 10.0.1.103 -j DROP"

# Watch how the cluster reacts
watch -n 1 'curl -s http://10.0.1.101:9200/_cluster/health | jq'

# Restore the network
ssh node-1 "iptables -D INPUT -s 10.0.1.103 -j DROP"
ssh node-1 "iptables -D OUTPUT -d 10.0.1.103 -j DROP"
◆ 2.4.2 Monitoring split-brain indicators
import requests
import time

def detect_split_brain(es_nodes):
    """
    Detect whether the cluster reports more than one active master.
    """
    master_nodes = []
    for node in es_nodes:
        try:
            response = requests.get(f'http://{node}:9200/_cat/master?format=json')
            if response.status_code == 200:
                master_info = response.json()[0]
                master_nodes.append({
                    'queried_node': node,
                    'master_node': master_info['node'],
                    'master_id': master_info['id']
                })
        except Exception as e:
            print(f'Error querying {node}: {e}')

    # Check whether different nodes report different master IDs
    unique_masters = set(m['master_id'] for m in master_nodes)
    if len(unique_masters) > 1:
        print('[ALERT] Split-brain detected!')
        print(f'Multiple masters found: {unique_masters}')
        return True
    return False

# Continuous monitoring loop
es_nodes = ['10.0.1.101', '10.0.1.102', '10.0.1.103']
while True:
    detect_split_brain(es_nodes)
    time.sleep(10)
◆ 2.4.3 Automated health checks
#!/bin/bash
# es_health_check.sh
ES_HOST="localhost:9200"
LOG_FILE="/var/log/es_health.log"

check_cluster_health() {
    health=$(curl -s "http://$ES_HOST/_cluster/health")
    status=$(echo $health | jq -r '.status')
    echo "[$(date)] Cluster status: $status" >> $LOG_FILE
    if [ "$status" != "green" ]; then
        echo "[$(date)] WARNING: Cluster not green!" >> $LOG_FILE
        # List unassigned shards
        unassigned=$(curl -s "http://$ES_HOST/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED)
        echo "$unassigned" >> $LOG_FILE
        # Send an alert
        send_alert "ES cluster status: $status"
    fi
}

check_master_stability() {
    # Sample the master three times in a row to see whether it changes
    master1=$(curl -s "http://$ES_HOST/_cat/master?h=id")
    sleep 5
    master2=$(curl -s "http://$ES_HOST/_cat/master?h=id")
    sleep 5
    master3=$(curl -s "http://$ES_HOST/_cat/master?h=id")
    if [ "$master1" != "$master2" ] || [ "$master2" != "$master3" ]; then
        echo "[$(date)] WARNING: Master node unstable!" >> $LOG_FILE
        send_alert "ES master node changed multiple times"
    fi
}

send_alert() {
    local message=$1
    # Forward to your alerting system
    curl -X POST "http://alert.example.com/api/alert" \
         -H "Content-Type: application/json" \
         -d "{\"message\": \"$message\"}"
}

# Main loop
while true; do
    check_cluster_health
    check_master_stability
    sleep 60
done

Automated monitoring and alerting of this kind is a core part of keeping the system stable in day-to-day operations and DevOps practice.

2.5 Split-Brain Recovery

◆ 2.5.1 Manual recovery steps
# 1. Identify the correct master partition (newest data, most nodes)
curl -X GET "http://node-1:9200/_cluster/state/master_node,version?pretty"
# Example output
{
  "master_node" : "node-1-id",
  "version" : 12345,
  "state_uuid" : "abc123"
}

# 2. Stop all nodes in the wrong partition
ssh node-4 "systemctl stop elasticsearch"
ssh node-5 "systemctl stop elasticsearch"

# 3. Clean the stale data on the wrong partition's nodes (extreme caution: this deletes local shard copies)
ssh node-4 "rm -rf /var/lib/elasticsearch/nodes/*/indices/*"

# 4. Rejoin the cluster
ssh node-4 "systemctl start elasticsearch"

# 5. Verify cluster health
curl -X GET "http://node-1:9200/_cluster/health?pretty"
◆ 2.5.2 Automated recovery script
import requests
import json
import subprocess
import time

class SplitBrainRecovery:
    def __init__(self, all_nodes):
        self.all_nodes = all_nodes
        self.partitions = []

    def detect_partitions(self):
        """
        Detect network partitions by grouping nodes by the master they report.
        """
        masters = {}
        for node in self.all_nodes:
            try:
                resp = requests.get(f'http://{node}:9200/_cat/master?format=json', timeout=5)
                if resp.status_code == 200:
                    master_id = resp.json()[0]['id']
                    if master_id not in masters:
                        masters[master_id] = []
                    masters[master_id].append(node)
            except Exception:
                pass
        self.partitions = list(masters.values())
        return len(self.partitions) > 1

    def choose_primary_partition(self):
        """
        Choose the primary partition (most nodes and newest cluster state).
        """
        best_partition = None
        max_score = -1
        for partition in self.partitions:
            # Score = node count (heavily weighted) + cluster-state version
            node = partition[0]
            try:
                resp = requests.get(f'http://{node}:9200/_cluster/state/version', timeout=5)
                version = resp.json()['version']
                score = len(partition) * 1000 + version
                if score > max_score:
                    max_score = score
                    best_partition = partition
            except Exception:
                pass
        return best_partition

    def stop_secondary_partitions(self, primary_partition):
        """
        Stop the nodes in the secondary partitions.
        """
        for partition in self.partitions:
            if partition != primary_partition:
                for node in partition:
                    print(f'Stopping node {node}')
                    subprocess.run(['ssh', node, 'systemctl stop elasticsearch'])

    def recover(self):
        """
        Run the recovery procedure.
        """
        if not self.detect_partitions():
            print('No split-brain detected')
            return

        print(f'Split-brain detected! Found {len(self.partitions)} partitions')
        primary_partition = self.choose_primary_partition()
        print(f'Primary partition: {primary_partition}')

        self.stop_secondary_partitions(primary_partition)

        # Wait for the nodes to stop
        time.sleep(30)

        # Restart the secondary-partition nodes so they rejoin the primary partition
        for partition in self.partitions:
            if partition != primary_partition:
                for node in partition:
                    print(f'Restarting node {node}')
                    subprocess.run(['ssh', node, 'systemctl start elasticsearch'])

        print('Recovery completed')

# Usage example
nodes = ['10.0.1.101', '10.0.1.102', '10.0.1.103', '10.0.1.104', '10.0.1.105']
recovery = SplitBrainRecovery(nodes)
recovery.recover()

3. Example Code and Configuration

3.1 Complete Configuration Examples

◆ 3.1.1 Production configuration file
# /etc/elasticsearch/elasticsearch.yml
# Standard production configuration

# ======================== Cluster ========================
cluster.name: production-cluster
node.name: es-master-1

# ======================== Node roles ========================
node.roles: [ master ]

# ======================== Paths ========================
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch

# ======================== Network ========================
network.host: 10.0.1.101
http.port: 9200
transport.port: 9300

# ======================== Discovery ========================
discovery.seed_hosts:
  - 10.0.1.101
  - 10.0.1.102
  - 10.0.1.103

# Initial master nodes (first bootstrap only; remove once the cluster has formed)
cluster.initial_master_nodes:
  - es-master-1
  - es-master-2
  - es-master-3

# ======================== Election ========================
cluster.election.duration: 30s
cluster.election.initial_timeout: 10s
cluster.election.back_off_time: 100ms
cluster.election.max_timeout: 10s

# ======================== Fault detection ========================
cluster.fault_detection.follower_check.interval: 10s
cluster.fault_detection.follower_check.timeout: 30s
cluster.fault_detection.follower_check.retry_count: 3
cluster.fault_detection.leader_check.interval: 10s
cluster.fault_detection.leader_check.timeout: 30s
cluster.fault_detection.leader_check.retry_count: 3

# ======================== Shard allocation ========================
cluster.routing.allocation.node_concurrent_recoveries: 2
cluster.routing.allocation.disk.threshold_enabled: true
cluster.routing.allocation.disk.watermark.low: 85%
cluster.routing.allocation.disk.watermark.high: 90%
cluster.routing.allocation.disk.watermark.flood_stage: 95%

# ======================== Memory ========================
bootstrap.memory_lock: true

# ======================== JVM ========================
# Configure in jvm.options
# -Xms16g
# -Xmx16g
# -XX:+UseG1GC

# ======================== Security ========================
xpack.security.enabled: true
xpack.security.transport.ssl.enabled: true
xpack.security.transport.ssl.verification_mode: certificate
xpack.security.transport.ssl.keystore.path: certs/elastic-certificates.p12
xpack.security.transport.ssl.truststore.path: certs/elastic-certificates.p12
◆ 3.1.2 System tuning
# /etc/sysctl.conf
vm.max_map_count=262144
vm.swappiness=1
fs.file-max=65535

# /etc/security/limits.conf
elasticsearch soft nofile 65535
elasticsearch hard nofile 65535
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

# Disable swap
swapoff -a
sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Disable Transparent Huge Pages
echo never > /sys/kernel/mm/transparent_hugepage/enabled
echo never > /sys/kernel/mm/transparent_hugepage/defrag

3.2 Real-World Examples

◆ Case 1: Log analytics platform
# Index lifecycle management (ILM) policy
PUT _ilm/policy/logs_policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "50GB",
            "max_age": "1d"
          },
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "allocate": {
            "require": {
              "box_type": "warm"
            }
          },
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": {
          "allocate": {
            "require": {
              "box_type": "cold"
            }
          },
          "freeze": {},
          "set_priority": {
            "priority": 0
          }
        }
      },
      "delete": {
        "min_age": "90d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

# Index template
PUT _index_template/logs_template
{
  "index_patterns": ["logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "logs_policy",
      "index.lifecycle.rollover_alias": "logs",
      "index.routing.allocation.require.box_type": "hot"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "level": { "type": "keyword" },
        "message": { "type": "text" },
        "host": { "type": "keyword" },
        "service": { "type": "keyword" }
      }
    }
  }
}
◆ Case 2: Cross-datacenter disaster recovery
# Primary datacenter (DC1) configuration
# elasticsearch.yml
node.attr.datacenter: dc1
node.attr.zone: zone-1
cluster.routing.allocation.awareness.attributes: datacenter,zone

# Cross-cluster replication (CCR) from DC1 to DC2
PUT _ccr/auto_follow/dc1_to_dc2
{
  "remote_cluster": "dc2_cluster",
  "leader_index_patterns": ["logs-*", "metrics-*"],
  "follow_index_pattern": "{{leader_index}}-replica",
  "settings": {
    "index.number_of_replicas": 0
  },
  "max_read_request_operation_count": 5120,
  "max_outstanding_read_requests": 12,
  "max_read_request_size": "32mb",
  "max_write_request_operation_count": 5120,
  "max_write_request_size": "9mb",
  "max_outstanding_write_requests": 9,
  "max_write_buffer_count": 2147483647,
  "max_write_buffer_size": "512mb",
  "max_retry_delay": "500ms",
  "read_poll_timeout": "1m"
}

4. Best Practices and Caveats

4.1 Best Practices

◆ 4.1.1 Master node recommendations
1. Number of master-eligible nodes: 3 or 5 (an odd number)
2. Use dedicated master nodes: no data, no query load
3. Hardware: CPU matters most; 8GB-16GB of RAM is enough
4. Network: low latency, high bandwidth
5. Monitoring and alerting: master changes, election timeouts
◆ 4.1.2 Shard count planning
Shards per node: fewer than 1000
Size per shard: 20GB - 50GB
Replicas: 1 (i.e., two copies of every shard)

Worked example (re-checked in the sketch below):
Data volume: 10TB
Nodes: 20
Shard size: 30GB
Primary shards = 10TB / 30GB ≈ 334, round up to about 340
Shards per node = 340 * 2 / 20 = 34 (reasonable)
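
A small helper makes this check repeatable and flags clusters whose per-node shard density drifts out of range (the 1000-shards-per-node ceiling above is used as the limit; 10TB is treated as 10,000GB, as in the example):

import math

def shards_per_node(total_data_gb: float, shard_size_gb: float,
                    replicas: int, node_count: int,
                    per_node_limit: int = 1000) -> dict:
    """Re-check the worked example: primary count, total copies, per-node density."""
    primaries = math.ceil(total_data_gb / shard_size_gb)
    total_copies = primaries * (1 + replicas)
    density = total_copies / node_count
    return {
        "primary_shards": primaries,
        "total_shard_copies": total_copies,
        "shards_per_node": round(density, 1),
        "within_limit": density < per_node_limit,
    }

print(shards_per_node(total_data_gb=10_000, shard_size_gb=30, replicas=1, node_count=20))
# {'primary_shards': 334, 'total_shard_copies': 668, 'shards_per_node': 33.4, 'within_limit': True}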
◆ 4.1.3 Monitoring metrics
# Key monitoring endpoints
GET _cluster/health
GET _cluster/stats
GET _nodes/stats
GET _cat/master?v
GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,node.role

# Prometheus Exporter
# https://github.com/prometheus-community/elasticsearch_exporter

4.2 Caveats

◆ 4.2.1 Rolling upgrade procedure
# 1. Rolling upgrade steps
# Restrict shard allocation to primaries only
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "primaries"
  }
}

# 2. Stop one node
systemctl stop elasticsearch

# 3. Upgrade the package
yum update elasticsearch

# 4. Start the node
systemctl start elasticsearch

# 5. Wait for the node to rejoin the cluster
GET _cat/nodes?v

# 6. Re-enable shard allocation
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.enable": "all"
  }
}

# 7. Repeat steps 2-6 for every node
◆ 4.2.2 Common mistakes to avoid (a guard-rail sketch follows this list)
1. Never delete indices with wildcards (DELETE /*)
2. Avoid too many shards per index (> 100)
3. Do not let master nodes also hold data
4. Avoid mixed-version clusters (at most one major version apart, and only during a rolling upgrade)
5. Avoid restarting master nodes frequently
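
For point 1, the cluster can be told to reject wildcard and _all deletions outright via the action.destructive_requires_name setting (the default in 8.x; worth setting explicitly on 7.x). A minimal sketch, assuming an unsecured endpoint on localhost:9200:

import requests

ES = "http://localhost:9200"  # assumption: local test cluster without auth

# Require explicit index names for destructive operations,
# so DELETE /* and DELETE /_all are rejected.
resp = requests.put(
    f"{ES}/_cluster/settings",
    json={"persistent": {"action.destructive_requires_name": True}},
    timeout=10,
)
resp.raise_for_status()
print(resp.json()["persistent"])

# A wildcard delete should now fail with a 400 error.
check = requests.delete(f"{ES}/*", timeout=10)
print("wildcard delete status:", check.status_code)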

5. Troubleshooting and Monitoring

5.1 Troubleshooting

◆ 5.1.1 Abnormal cluster status
# Check cluster health
GET _cluster/health?pretty

# Find out why shards are unassigned
GET _cat/shards?h=index,shard,prirep,state,unassigned.reason&v
GET _cluster/allocation/explain

# Example output
{
  "index": "logs-2025.01.26",
  "shard": 0,
  "primary": true,
  "current_state": "unassigned",
  "unassigned_info": {
    "reason": "NODE_LEFT",
    "details": "node_left[node-4]"
  },
  "can_allocate": "no",
  "allocate_explanation": "cannot allocate because allocation is disabled"
}

# Manually allocate a replica shard
POST _cluster/reroute
{
  "commands": [
    {
      "allocate_replica": {
        "index": "logs-2025.01.26",
        "shard": 0,
        "node": "node-5"
      }
    }
  ]
}
◆ 5.1.2 Master election failures
# Follow the election-related log lines
tail -f /var/log/elasticsearch/production-cluster.log | grep "master"

# Typical log messages
[2025-01-26 10:00:00] [INFO ] master not discovered yet
[2025-01-26 10:00:30] [WARN ] timed out while waiting for initial discovery state
[2025-01-26 10:01:00] [INFO ] elected-as-master

# Troubleshooting steps (a connectivity-check sketch follows this list)
1. Check network connectivity: ping, telnet to port 9300
2. Check firewall rules
3. Check the discovery.seed_hosts configuration
4. Check that cluster.name is identical on all nodes
5. Check that node clocks are synchronized (NTP)
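
Steps 1 and 4 are easy to automate. A minimal sketch that checks transport-port reachability and cluster-name consistency across a list of nodes (the IPs are the example addresses used earlier in this article; adjust ports and add auth for a secured cluster):

import socket
import requests

NODES = ["10.0.1.101", "10.0.1.102", "10.0.1.103"]  # example addresses from this article

def transport_reachable(host: str, port: int = 9300, timeout: float = 3.0) -> bool:
    """Equivalent of `telnet host 9300`: can we open a TCP connection?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

cluster_names = {}
for host in NODES:
    print(f"{host}: transport port reachable = {transport_reachable(host)}")
    try:
        # The root endpoint reports the cluster_name this node belongs to
        info = requests.get(f"http://{host}:9200", timeout=5).json()
        cluster_names[host] = info.get("cluster_name")
    except requests.RequestException as exc:
        cluster_names[host] = f"unreachable ({exc.__class__.__name__})"

print("cluster names:", cluster_names)
reported = {v for v in cluster_names.values() if v and not str(v).startswith("unreachable")}
if len(reported) > 1:
    print("[WARN] nodes report different cluster names - check cluster.name in elasticsearch.yml")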
◆ 5.1.3 Investigating performance problems
# Check search statistics (for slow queries, also enable and read the index slow log)
GET _nodes/stats/indices/search?filter_path=**.took

# Check thread pool status
GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected

# Check hot threads
GET _nodes/hot_threads

# Check disk usage
GET _cat/allocation?v&h=node,disk.used,disk.total,disk.percent

5.2 Performance Monitoring

◆ 5.2.1 Elasticsearch Exporter
# docker-compose.yml
version: '3'
services:
  elasticsearch-exporter:
    image: quay.io/prometheuscommunity/elasticsearch-exporter:latest
    command:
      - '--es.uri=http://elasticsearch:9200'
      - '--es.all'
      - '--es.indices'
      - '--es.cluster_settings'
    ports:
      - "9114:9114"
◆ 5.2.2 Grafana Dashboard
// Import dashboard ID 266 (Elasticsearch Overview)
// Or build a custom dashboard, for example:
{
  "panels":[
    {
      "title":"Cluster Health",
      "targets":[
        {"expr":"elasticsearch_cluster_health_status{cluster='production'}"}
      ]
    },
    {
      "title":"Master Node",
      "targets":[
        {"expr":"elasticsearch_cluster_health_number_of_nodes{cluster='production'}"}
      ]
    }
  ]
}

5.3 Backup and Restore

◆ 5.3.1 Snapshot backups
# Register a snapshot repository
PUT _snapshot/backup_repo
{
  "type": "fs",
  "settings": {
    "location": "/mount/backups/elasticsearch",
    "compress": true
  }
}

# Create a snapshot
PUT _snapshot/backup_repo/snapshot_2025_01_26
{
  "indices": "logs-*,metrics-*",
  "ignore_unavailable": true,
  "include_global_state": false
}

# Check snapshot status
GET _snapshot/backup_repo/snapshot_2025_01_26/_status

# Restore from a snapshot
POST _snapshot/backup_repo/snapshot_2025_01_26/_restore
{
  "indices": "logs-2025.01.26",
  "ignore_unavailable": true,
  "include_global_state": false
}
◆ 5.3.2 Automated backup script
#!/bin/bash
# es_backup.sh
ES_HOST="localhost:9200"
REPO_NAME="backup_repo"
DATE=$(date +%Y%m%d_%H%M%S)
SNAPSHOT_NAME="snapshot_$DATE"

# Create a snapshot
curl -X PUT "http://$ES_HOST/_snapshot/$REPO_NAME/$SNAPSHOT_NAME?wait_for_completion=true" \
  -H 'Content-Type: application/json' \
  -d '{
    "indices": "logs-*,metrics-*",
    "ignore_unavailable": true,
    "include_global_state": false
  }'

# Delete snapshots taken 30 days ago (run daily so old snapshots age out)
OLD_DATE=$(date -d '30 days ago' +%Y%m%d)
curl -X GET "http://$ES_HOST/_snapshot/$REPO_NAME/_all" | \
  jq -r ".snapshots[] | select(.snapshot | startswith(\"snapshot_$OLD_DATE\")) | .snapshot" | \
  while read snapshot; do
    curl -X DELETE "http://$ES_HOST/_snapshot/$REPO_NAME/$snapshot"
  done

6. Summary

6.1 Key Takeaways

  • Split-brain is ultimately caused by network partitions combined with misconfiguration; setting minimum_master_nodes correctly, or relying on the automatic quorum mechanism in ES 7.0+, prevents it effectively
  • The master node manages cluster metadata; elections are decided by cluster-state version and node ID, and ES 7.0+ introduced Raft-style coordination for stronger consistency
  • A high-availability architecture should separate node roles, spread nodes across racks, and apply a sensible shard allocation strategy
  • Monitoring and automation are essential to cluster stability; build solid alerting and failure-recovery mechanisms

6.2 Further Study

  1. Elasticsearch internals: Lucene index structure, inverted indexes, segment merging
  2. Query performance tuning: query DSL, aggregation optimization, caching strategies
  3. Operating large clusters: managing and tuning clusters with thousands of nodes
  4. Elastic Stack integration: using Logstash, Kibana, and Beats together

6.3 References

  • Elasticsearch official documentation
  • Elasticsearch: The Definitive Guide
  • Elastic Stack 实战手册 (Elastic Stack hands-on manual)
  • Elasticsearch source code analysis

Appendix

A. Command Cheat Sheet

# Cluster management
GET _cluster/health
GET _cluster/state
GET _cluster/stats
GET _cluster/pending_tasks

# Node management
GET _cat/nodes?v
GET _cat/master?v
GET _nodes/stats

# Index management
GET _cat/indices?v
GET _cat/shards?v
DELETE /index_name

# Shard management
POST _cluster/reroute
GET _cluster/allocation/explain
PUT _cluster/settings

# Snapshot management
GET _snapshot
GET _snapshot/repo_name/_all
PUT _snapshot/repo_name/snapshot_name
POST _snapshot/repo_name/snapshot_name/_restore

B. Configuration Parameter Quick Reference

Parameter                       Description                  Recommended value
cluster.name                    Cluster name                 Unique identifier
node.name                       Node name                    Unique identifier
discovery.seed_hosts            Seed host list               All master-eligible nodes
cluster.initial_master_nodes    Initial master nodes         Set only for the first startup
cluster.election.duration       Election duration            30s-60s
network.host                    Bind address                 0.0.0.0 (or a specific interface)
http.port                       HTTP port                    9200
transport.port                  Transport port               9300

C. Troubleshooting Checklist

  • Check cluster health: GET _cluster/health
  • Check the current master: GET _cat/master
  • Check node status: GET _cat/nodes
  • Check unassigned shards: GET _cat/shards?h=index,shard,state,unassigned.reason
  • Check network connectivity: telnet node-ip 9300
  • Check the log files: tail -f /var/log/elasticsearch/cluster.log
  • Validate the configuration file syntax (e.g., with a YAML linter)
  • Check disk space: df -h
  • Check JVM heap usage: GET _nodes/stats/jvm
  • Test snapshot restore: POST _snapshot/repo/snapshot_name/_restore