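This post continues the earlier article "Hadoop 3.3.6 Distributed Cluster Optimization in Practice: HDFS Federation". Starting from the cluster that was already extended to an HDFS Federation architecture, we add High Availability (HA) to both HDFS and YARN so that the core services stay available when a single NameNode or ResourceManager fails.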

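1. Cluster Planning and Environment Overview

1.1 Hardware and Operating System

  • Operating system: Red Hat Enterprise Linux Server 7.5 (Maipo)
  • Nodes: 5 physical or virtual machines
  • User account: hadoop (home directory: /data/hadoop)
  • Network: internal gigabit / 10-gigabit interconnect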

1.2 Cluster Topology (zk = QuorumPeerMain, zkfc = DFSZKFailoverController)

IP address      Hostname  Service components
172.253.81.116  hadoop1   DataNode + zk + NameNode(ns1.nn1) + zkfc1 + JournalNode + NodeManager
172.253.81.117  hadoop2   DataNode + zk + NameNode(ns1.nn2) + zkfc1 + JournalNode + ResourceManager(rm1) + NodeManager + WebAppProxyServer
172.253.62.172  hadoop3   DataNode + zk + NameNode(ns2.nn3) + zkfc2 + JournalNode + ResourceManager(rm2) + NodeManager + JobHistoryServer
172.253.62.173  hadoop4   DataNode + NameNode(ns2.nn4) + zkfc2 + NodeManager
172.253.80.122  hadoop5   DataNode + NodeManager

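2. Software Download and Basic Preparation

2.1 Download the Required Packages

# ZooKeeper 3.8.3 download URL:
https://archive.apache.org/dist/zookeeper/zookeeper-3.8.3/apache-zookeeper-3.8.3-bin.tar.gz

2.2 Unpacking and Directory Layout

Complete the configuration on one machine, then distribute it to the other nodes with scp or rsync (the commands below use rsync).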

cd /data/hadoop/software
tar -zvxf apache-zookeeper-3.8.3-bin.tar.gz -C ../tools/

# Final directory layout (hadoop_data sits under tools/, matching the paths used in the configuration files below)
/data/hadoop/
├── software/                        # downloaded archives
└── tools/                           # unpacked software
    ├── jdk1.8.0_461/
    ├── hadoop-3.3.6/
    ├── apache-zookeeper-3.8.3-bin/
    └── hadoop_data/                 # data and logs
        ├── data_datanode            # hdfs-site.xml: dfs.datanode.data.dir, DataNode storage directory
        ├── data_journalnode         # hdfs-site.xml: dfs.journalnode.edits.dir, local directory where the JournalNode stores NameNode edit logs
        ├── data_namenode            # hdfs-site.xml: dfs.namenode.name.dir, NameNode storage directory
        ├── data_tmp                 # core-site.xml: hadoop.tmp.dir, Hadoop temporary data directory
        └── logs                     # hadoop-env.sh: HADOOP_LOG_DIR, Hadoop log directory

3. Deploying ZooKeeper

cd /data/hadoop/tools/apache-zookeeper-3.8.3-bin
mkdir data
cp conf/zoo_sample.cfg conf/zoo.cfg

# Edit conf/zoo.cfg
dataDir=/data/hadoop/tools/apache-zookeeper-3.8.3-bin/data
server.1=172.253.81.116:2888:3888
server.2=172.253.81.117:2888:3888
server.3=172.253.62.172:2888:3888

# Run on hadoop1
cd /data/hadoop/tools/
rsync -avz --delete apache-zookeeper-3.8.3-bin/ hadoop@172.253.81.117:/data/hadoop/tools/apache-zookeeper-3.8.3-bin/
rsync -avz --delete apache-zookeeper-3.8.3-bin/ hadoop@172.253.62.172:/data/hadoop/tools/apache-zookeeper-3.8.3-bin/

# Assign the ids: run the matching line on each node, so that myid equals N in that node's server.N entry
# hadoop1:
cd /data/hadoop/tools/apache-zookeeper-3.8.3-bin && echo 1 > data/myid
# hadoop2:
cd /data/hadoop/tools/apache-zookeeper-3.8.3-bin && echo 2 > data/myid
# hadoop3:
cd /data/hadoop/tools/apache-zookeeper-3.8.3-bin && echo 3 > data/myid
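
The id written to data/myid has to match the N of that host's server.N line in zoo.cfg. As a rough sketch, the id can be looked up from zoo.cfg instead of being typed by hand. The helper name myid_for_ip is made up for this post, and the sketch assumes the server.N entries use plain IPv4 addresses as they do above.

```shell
# Print the ZooKeeper id N whose "server.N=<ip>:<port>:<port>" line in the
# given zoo.cfg matches the given IPv4 address. Returns 1 if nothing matches.
# myid_for_ip is a hypothetical helper, not part of ZooKeeper.
myid_for_ip() {
  local ip="$1" cfg="$2"
  awk -F'[.=:]' -v ip="$ip" '
    /^server\.[0-9]+=/ {
      # Fields: server, N, the four IP octets, then the two ports.
      addr = $3 "." $4 "." $5 "." $6
      if (addr == ip) { print $2; found = 1; exit }
    }
    END { exit (found ? 0 : 1) }
  ' "$cfg"
}
```

Possible usage on hadoop2, from the ZooKeeper directory: id=$(myid_for_ip 172.253.81.117 conf/zoo.cfg) && echo "$id" > data/myid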

3.1 Starting ZooKeeper (QuorumPeerMain)

# Run on hadoop1-hadoop3, in any order
cd /data/hadoop/tools/apache-zookeeper-3.8.3-bin
./bin/zkServer.sh start

Startup log:

ZooKeeper JMX enabled by default
Using config: /data/hadoop/tools/apache-zookeeper-3.8.3-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

Status output:

[hadoop@172.253.81.116 ~/tools/apache-zookeeper-3.8.3-bin]$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/hadoop/tools/apache-zookeeper-3.8.3-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower

[hadoop@172.253.81.117 ~/tools/apache-zookeeper-3.8.3-bin]$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/hadoop/tools/apache-zookeeper-3.8.3-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader

[hadoop@172.253.62.172 ~/tools/apache-zookeeper-3.8.3-bin]$ ./bin/zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /data/hadoop/tools/apache-zookeeper-3.8.3-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
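
A healthy three-node ensemble shows exactly one leader and two followers, as in the output above. The sketch below pulls the mode out of the status output and checks that rule. zk_mode and has_single_leader are names invented for this example.

```shell
# Read `zkServer.sh status` output on stdin and print the text after "Mode: ".
# zk_mode and has_single_leader are hypothetical helper names.
zk_mode() {
  sed -n 's/^Mode: *//p'
}

# Succeed only if exactly one of the given modes is "leader".
has_single_leader() {
  local leaders=0 mode
  for mode in "$@"; do
    if [ "$mode" = "leader" ]; then
      leaders=$((leaders + 1))
    fi
  done
  [ "$leaders" -eq 1 ]
}
```

On each node, ./bin/zkServer.sh status | zk_mode prints leader or follower. The three results can then be passed to has_single_leader.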

4. Adding HA Configuration to HDFS and YARN

4.1 Configure hdfs-site.xml

<configuration>
 <!-- Keep 3 replicas of every block in the cluster -->
 <property>
  <name>dfs.replication</name>
  <value>3</value>
 </property>
 <!-- Disable HDFS permission checks to simplify development and debugging -->
 <property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
 </property>

 <!-- NameNode storage directory -->
 <property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/hadoop/tools/hadoop_data/data_namenode</value>
 </property>

 <!-- DataNode storage directory -->
 <property>
  <name>dfs.datanode.data.dir</name>
  <value>/data/hadoop/tools/hadoop_data/data_datanode</value>
 </property>

 <!-- Local directory where the JournalNode stores NameNode edit logs -->
 <property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/data/hadoop/tools/hadoop_data/data_journalnode</value>
 </property>

 <!-- List of nameservices -->
 <property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
 </property>
 <!-- HA: NameNodes of ns1 -->
 <property>
  <name>dfs.ha.namenodes.ns1</name>
  <value>nn1,nn2</value>
 </property>
 <!-- RPC address of NameNode ns1.nn1 -->
 <property>
  <name>dfs.namenode.rpc-address.ns1.nn1</name>
  <value>172.253.81.116:8020</value>
 </property>
 <!-- RPC address of NameNode ns1.nn2 -->
 <property>
  <name>dfs.namenode.rpc-address.ns1.nn2</name>
  <value>172.253.81.117:8020</value>
 </property>
 <!-- Web UI address of NameNode ns1.nn1 -->
 <property>
  <name>dfs.namenode.http-address.ns1.nn1</name>
  <value>172.253.81.116:9870</value>
 </property>
 <!-- Web UI address of NameNode ns1.nn2 -->
 <property>
  <name>dfs.namenode.http-address.ns1.nn2</name>
  <value>172.253.81.117:9870</value>
 </property>
 <!-- JournalNodes that hold the shared edits of ns1 -->
 <property>
  <name>dfs.namenode.shared.edits.dir.ns1</name>
  <value>qjournal://172.253.81.116:8485;172.253.81.117:8485;172.253.62.172:8485/ns1</value>
 </property>
 <!-- Enable automatic failover for ns1 -->
 <property>
  <name>dfs.ha.automatic-failover.enabled.ns1</name>
  <value>true</value>
 </property>
 <!-- Failover proxy class that clients use to reach ns1 -->
 <property>
  <name>dfs.client.failover.proxy.provider.ns1</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>
 <!-- HA: NameNodes of ns2 -->
 <property>
  <name>dfs.ha.namenodes.ns2</name>
  <value>nn3,nn4</value>
 </property>
 <!-- RPC address of NameNode ns2.nn3 -->
 <property>
  <name>dfs.namenode.rpc-address.ns2.nn3</name>
  <value>172.253.62.172:8020</value>
 </property>
 <!-- RPC address of NameNode ns2.nn4 -->
 <property>
  <name>dfs.namenode.rpc-address.ns2.nn4</name>
  <value>172.253.62.173:8020</value>
 </property>
 <!-- Web UI address of NameNode ns2.nn3 -->
 <property>
  <name>dfs.namenode.http-address.ns2.nn3</name>
  <value>172.253.62.172:9870</value>
 </property>
 <!-- Web UI address of NameNode ns2.nn4 -->
 <property>
  <name>dfs.namenode.http-address.ns2.nn4</name>
  <value>172.253.62.173:9870</value>
 </property>
 <!-- JournalNodes that hold the shared edits of ns2 -->
 <property>
  <name>dfs.namenode.shared.edits.dir.ns2</name>
  <value>qjournal://172.253.81.116:8485;172.253.81.117:8485;172.253.62.172:8485/ns2</value>
 </property>
 <!-- Enable automatic failover for ns2 -->
 <property>
  <name>dfs.ha.automatic-failover.enabled.ns2</name>
  <value>true</value>
 </property>
 <!-- Failover proxy class that clients use to reach ns2 -->
 <property>
  <name>dfs.client.failover.proxy.provider.ns2</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
 </property>

 <!-- HA: fencing method used during automatic failover, here sshfence -->
 <property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
 </property>
 <!-- HA: private key that sshfence uses to log in to the other NameNode host -->
 <property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/data/hadoop/.ssh/id_rsa</value>
 </property>
</configuration>
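
Both qjournal:// URIs above list three JournalNodes. The Quorum Journal Manager needs a majority of them to acknowledge each edit, so N JournalNodes tolerate (N - 1) / 2 failures. Here is a small sketch of that arithmetic (journal_quorum is a hypothetical helper name):

```shell
# Print "<journalnode count> <tolerated failures>" for a qjournal:// URI.
# journal_quorum is a hypothetical helper, not a Hadoop command.
journal_quorum() {
  local uri="$1" hosts n
  hosts="${uri#qjournal://}"   # strip the scheme
  hosts="${hosts%%/*}"         # strip the trailing /<journalId>
  # Count the ';'-separated host:port entries.
  n=$(printf '%s\n' "$hosts" | awk -F';' '{ print NF }')
  # Writes need a majority, so (n - 1) / 2 nodes may be down (integer division).
  echo "$n $(( (n - 1) / 2 ))"
}
```

For the ns1 URI above this prints "3 1": with three JournalNodes, one may be down without blocking NameNode writes.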

4.2 Configure yarn-site.xml

<configuration>
 <!-- Auxiliary services that run on the NodeManager -->
 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
 <!-- HA: enable ResourceManager HA -->
 <property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
 </property>
 <!-- HA: YARN cluster name, used as an identifier -->
 <property>
  <name>yarn.resourcemanager.cluster-id</name>
  <value>yarn_cluster_test</value>
 </property>
 <!-- HA: logical ids of the ResourceManager instances -->
 <property>
  <name>yarn.resourcemanager.ha.rm-ids</name>
  <value>rm1,rm2</value>
 </property>
 <!-- HA: host of rm1 -->
 <property>
  <name>yarn.resourcemanager.hostname.rm1</name>
  <value>172.253.81.117</value>
 </property>
 <!-- HA: web UI and REST address of rm1 -->
 <property>
  <name>yarn.resourcemanager.webapp.address.rm1</name>
  <value>172.253.81.117:8088</value>
 </property>
 <!-- HA: host of rm2 -->
 <property>
  <name>yarn.resourcemanager.hostname.rm2</name>
  <value>172.253.62.172</value>
 </property>
 <!-- HA: web UI and REST address of rm2 -->
 <property>
  <name>yarn.resourcemanager.webapp.address.rm2</name>
  <value>172.253.62.172:8088</value>
 </property>
 <!-- HA: ZooKeeper ensemble the ResourceManagers use for active/standby election (these are the ZooKeeper servers, not JournalNodes) -->
 <property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>172.253.81.116:2181,172.253.81.117:2181,172.253.62.172:2181</value>
 </property>
 <!-- Web application proxy address -->
 <property>
  <name>yarn.web-proxy.address</name>
  <value>172.253.81.117:8888</value>
 </property>
 <!-- Enable log aggregation -->
 <property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
 </property>
 <!-- Retention time of aggregated logs in seconds, 604800 s = 7 days -->
 <property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>604800</value>
 </property>
 <!-- Memory available to the NodeManager, 8192 MB -->
 <property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>8192</value>
 </property>
 <!-- Number of CPU vcores available to the NodeManager -->
 <property>
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>8</value>
 </property>
</configuration>

4.3 Configure core-site.xml

<configuration>
 <!-- Default file system URI, switched to ViewFS as the unified entry point -->
 <property>
  <name>fs.defaultFS</name>
  <value>viewfs:///</value>
 </property>

 <!-- ViewFS mount points: map the namespaces (/ns1, /ns2, /tmp) into one unified view.
      Note: each link targets a single NameNode address (nn1 or nn3). If that NameNode
      is standby after a failover, requests through the link reach a standby NameNode.
      Linking to the logical nameservice (hdfs://ns1/, hdfs://ns2/) lets the client
      failover proxy pick the active NameNode instead. -->
 <property>
  <name>fs.viewfs.mounttable.default.link./ns1</name>
  <value>hdfs://172.253.81.116:8020/</value>
 </property>
 <property>
  <name>fs.viewfs.mounttable.default.link./ns2</name>
  <value>hdfs://172.253.62.172:8020/</value>
 </property>
 <property>
  <name>fs.viewfs.mounttable.default.link./tmp</name>
  <value>hdfs://172.253.81.116:8020/tmp</value>
 </property>

 <!-- List of nameservices -->
 <property>
  <name>dfs.nameservices</name>
  <value>ns1,ns2</value>
 </property>
 <!-- NameNode RPC address of ns1 -->
 <property>
  <name>dfs.namenode.rpc-address.ns1</name>
  <value>172.253.81.116:8020</value>
 </property>
 <!-- NameNode RPC address of ns2 -->
 <property>
  <name>dfs.namenode.rpc-address.ns2</name>
  <value>172.253.62.172:8020</value>
 </property>

 <!-- ZooKeeper ensemble used by ZKFC -->
 <property>
  <name>ha.zookeeper.quorum</name>
  <value>172.253.81.116:2181,172.253.81.117:2181,172.253.62.172:2181</value>
 </property>

 <!-- Hadoop temporary data directory -->
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/data/hadoop/tools/hadoop_data/data_tmp</value>
 </property>
 <!-- Trash retention time in minutes, 10080 min = 7 days -->
 <property>
  <name>fs.trash.interval</name>
  <value>10080</value>
 </property>
</configuration>

4.4 Configure hadoop-env.sh

# Added setting
export HADOOP_LOG_DIR=/data/hadoop/tools/hadoop_data/logs

4.5 Distributing the Configuration

# Run on the primary node hadoop1
cd /data/hadoop/tools/
rsync -avz --delete hadoop-3.3.6/ hadoop@172.253.81.117:/data/hadoop/tools/hadoop-3.3.6/
rsync -avz --delete hadoop-3.3.6/ hadoop@172.253.62.172:/data/hadoop/tools/hadoop-3.3.6/
rsync -avz --delete hadoop-3.3.6/ hadoop@172.253.62.173:/data/hadoop/tools/hadoop-3.3.6/
rsync -avz --delete hadoop-3.3.6/ hadoop@172.253.80.122:/data/hadoop/tools/hadoop-3.3.6/
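
The four rsync calls differ only in the target host, so they could be written as a loop. The sketch below is one way to do that. TARGETS, DRY_RUN and sync_hadoop are names chosen for this example, and the function only prints the commands unless DRY_RUN=0 is set.

```shell
# Hosts that receive the configuration (hadoop2 to hadoop5 in the topology).
TARGETS="172.253.81.117 172.253.62.172 172.253.62.173 172.253.80.122"

# Print the rsync command for every target host, or run it when DRY_RUN=0.
# Assumes the current directory is /data/hadoop/tools/ as in the steps above.
sync_hadoop() {
  local host
  for host in $TARGETS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo rsync -avz --delete hadoop-3.3.6/ "hadoop@$host:/data/hadoop/tools/hadoop-3.3.6/"
    else
      rsync -avz --delete hadoop-3.3.6/ "hadoop@$host:/data/hadoop/tools/hadoop-3.3.6/"
    fi
  done
}
```

Calling sync_hadoop lists the four commands for review. DRY_RUN=0 sync_hadoop performs the copy.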

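5. Cluster Startup and Verification

Note:

  1. If the configuration files under etc were changed, stopping the cluster with the sh scripts may fail. In that case the processes can be killed directly: jps | awk '{print $1;}' | xargs kill -9 (this kills every JVM that jps lists for the current user, ZooKeeper included).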
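
If ZooKeeper should keep running, the jps output can be filtered by daemon name before anything is killed. A minimal sketch follows. hadoop_pids is a made-up helper name, and the class names are the ones that appear in the jps listings of this post.

```shell
# Read `jps` output ("<pid> <MainClass>") on stdin and print only the PIDs of
# Hadoop daemons. QuorumPeerMain (ZooKeeper) and Jps itself are left out.
# hadoop_pids is a hypothetical helper name.
hadoop_pids() {
  awk '
    $2 ~ /^(NameNode|DataNode|JournalNode|DFSZKFailoverController|ResourceManager|NodeManager|WebAppProxyServer|JobHistoryServer)$/ { print $1 }
  '
}
```

Possible usage: jps | hadoop_pids | xargs -r kill sends SIGTERM first, and kill -9 can follow for processes that do not exit (-r is the GNU xargs flag that skips the call when the list is empty).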

5.1 Initialization

# Run on hadoop1-hadoop5: clear the NameNode, DataNode, JournalNode, temporary and log data
rm -rf /data/hadoop/tools/hadoop_data/data_namenode/*
rm -rf /data/hadoop/tools/hadoop_data/data_datanode/*
rm -rf /data/hadoop/tools/hadoop_data/data_journalnode/*
rm -rf /data/hadoop/tools/hadoop_data/data_tmp/*
rm -rf /data/hadoop/tools/hadoop_data/logs/*

# Format the JournalNodes on hadoop1-hadoop3
cd /data/hadoop/tools/hadoop-3.3.6
./bin/hdfs journalnode -format -force -journalId ns1
./bin/hdfs journalnode -format -force -journalId ns2

# Start the JournalNodes on hadoop1-hadoop3, one node after another
cd /data/hadoop/tools/hadoop-3.3.6
./bin/hdfs --daemon start journalnode

# Format the NameNode on hadoop1 and hadoop3. cluster_id_test_hadoop is a user-chosen string used as the clusterId;
# both nameservices must use the same value so that they belong to one federated cluster
./bin/hdfs namenode -format -clusterId cluster_id_test_hadoop -force

5.2 Starting ZooKeeper (QuorumPeerMain)

# Run on hadoop1-hadoop3, in any order (skip if ZooKeeper is still running from 3.1)
cd /data/hadoop/tools/apache-zookeeper-3.8.3-bin
./bin/zkServer.sh start

Startup log:

ZooKeeper JMX enabled by default
Using config: /data/hadoop/tools/apache-zookeeper-3.8.3-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

5.3 Starting the JournalNodes (JournalNode)

# Run on hadoop1-hadoop3, one node after another (skip if the JournalNodes started in 5.1 are still running)
cd /data/hadoop/tools/hadoop-3.3.6
./bin/hdfs --daemon start journalnode

5.4 Starting NameNode and ZKFC (NameNode + DFSZKFailoverController)

Nameservice ns1

# Run on hadoop1
./bin/hdfs --daemon start namenode
# Run on hadoop2
./bin/hdfs namenode -bootstrapStandby
./bin/hdfs --daemon start namenode

# At this point both nn1 and nn2 of ns1 are standby; ZKFC has to be started on hadoop1 and hadoop2 to elect the active NameNode
[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns1 -getServiceState nn1
standby
[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns1 -getServiceState nn2
standby

# Run on hadoop1; formatting is only needed the first time
./bin/hdfs zkfc -formatZK
# Run on hadoop1 and hadoop2
./bin/hdfs --daemon start zkfc

# Now
[hadoop@172.253.81.117 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns1 -getServiceState nn1
active
[hadoop@172.253.81.117 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns1 -getServiceState nn2
standby

Nameservice ns2

# Run on hadoop3
./bin/hdfs --daemon start namenode
# Run on hadoop4
./bin/hdfs namenode -bootstrapStandby
./bin/hdfs --daemon start namenode

# At this point both nn3 and nn4 of ns2 are standby; ZKFC has to be started on hadoop3 and hadoop4 to elect the active NameNode
[hadoop@172.253.62.173 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns2 -getServiceState nn3
standby
[hadoop@172.253.62.173 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns2 -getServiceState nn4
standby

# Run on hadoop3; formatting is only needed the first time
./bin/hdfs zkfc -formatZK
# Run on hadoop3 and hadoop4
./bin/hdfs --daemon start zkfc

# Now
[hadoop@172.253.62.172 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns2 -getServiceState nn3
active
[hadoop@172.253.62.172 ~/tools/hadoop-3.3.6]$ ./bin/hdfs haadmin -ns ns2 -getServiceState nn4
standby
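
The four haadmin checks can be wrapped in a loop that also verifies the expected end state: exactly one active NameNode per nameservice. This is only a sketch. HDFS_BIN, nn_states and one_active_per_ns are names invented here, and the only Hadoop command it runs is the hdfs haadmin -ns <ns> -getServiceState <nn> call shown above.

```shell
# Path to the hdfs launcher (assumption: run from the Hadoop install directory).
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# For each "<ns>:<nn>" argument print "<ns>.<nn> <state>".
# nn_states and one_active_per_ns are hypothetical helper names.
nn_states() {
  local pair ns nn state
  for pair in "$@"; do
    ns="${pair%%:*}"
    nn="${pair#*:}"
    state=$("$HDFS_BIN" haadmin -ns "$ns" -getServiceState "$nn" 2>/dev/null) || state="unreachable"
    echo "$ns.$nn $state"
  done
}

# Read nn_states output on stdin; succeed only if every nameservice seen
# has exactly one NameNode in state "active". Empty input fails.
one_active_per_ns() {
  awk '
    {
      split($1, id, ".")          # id[1] is the nameservice
      seen[id[1]] = 1
      if ($2 == "active") active[id[1]]++
    }
    END {
      bad = (NR == 0)
      for (ns in seen) if (active[ns] != 1) bad = 1
      exit (bad ? 1 : 0)
    }
  '
}
```

Possible usage: nn_states ns1:nn1 ns1:nn2 ns2:nn3 ns2:nn4 | one_active_per_ns && echo "HA state OK"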

5.5 Starting the DataNodes

# Run on hadoop1-hadoop5
./bin/hdfs --daemon start datanode

5.6 Starting YARN (ResourceManager + NodeManager + WebAppProxyServer)

# Run on hadoop2
./sbin/start-yarn.sh

# Startup log
[hadoop@172.253.81.117 ~/tools/hadoop-3.3.6]$ ./sbin/start-yarn.sh 
Starting resourcemanagers on [ 172.253.81.117 172.253.62.172]
Starting nodemanagers

# The output below shows that 172.253.62.172 is active and 172.253.81.117 is standby
[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ curl -sS http://172.253.81.117:8088/ws/v1/cluster/info | python -m json.tool
{
"clusterInfo": {
"haState": "STANDBY",
"haZooKeeperConnectionState": "CONNECTED",
"hadoopBuildVersion": "3.3.6 from 1be78238728da9266a4f88195058f08fd012bf9c by ubuntu source checksum 5652179ad55f76cb287d9c633bb53bbd",
"hadoopVersion": "3.3.6",
"hadoopVersionBuiltOn": "2023-06-18T08:22Z",
"id": 1768354648295,
"resourceManagerBuildVersion": "3.3.6 from 1be78238728da9266a4f88195058f08fd012bf9c by ubuntu source checksum d42eb795a5eadb0febf5e44a7f87a9",
"resourceManagerVersion": "3.3.6",
"resourceManagerVersionBuiltOn": "2023-06-18T08:31Z",
"rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore",
"startedOn": 1768354648295,
"state": "STARTED"
 }
}
[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ curl -sS http://172.253.62.172:8088/ws/v1/cluster/info | python -m json.tool
{
"clusterInfo": {
"haState": "ACTIVE",
"haZooKeeperConnectionState": "CONNECTED",
"hadoopBuildVersion": "3.3.6 from 1be78238728da9266a4f88195058f08fd012bf9c by ubuntu source checksum 5652179ad55f76cb287d9c633bb53bbd",
"hadoopVersion": "3.3.6",
"hadoopVersionBuiltOn": "2023-06-18T08:22Z",
"id": 1768354652073,
"resourceManagerBuildVersion": "3.3.6 from 1be78238728da9266a4f88195058f08fd012bf9c by ubuntu source checksum d42eb795a5eadb0febf5e44a7f87a9",
"resourceManagerVersion": "3.3.6",
"resourceManagerVersionBuiltOn": "2023-06-18T08:31Z",
"rmStateStoreName": "org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore",
"startedOn": 1768354652073,
"state": "STARTED"
 }
}
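
Only the haState field matters for this check. A small filter can pull it out of the REST response so that both ResourceManagers can be compared on one line each. rm_ha_state is a hypothetical helper name, and it relies on a simple sed pattern rather than a real JSON parser.

```shell
# Read the JSON returned by /ws/v1/cluster/info on stdin and print the value
# of its haState field (for example ACTIVE or STANDBY).
# rm_ha_state is a hypothetical helper; it assumes uppercase state names.
rm_ha_state() {
  sed -n 's/.*"haState" *: *"\([A-Z]*\)".*/\1/p' | head -n 1
}
```

Possible usage: curl -sS http://172.253.81.117:8088/ws/v1/cluster/info | rm_ha_state. The same state is also reported by ./bin/yarn rmadmin -getServiceState rm1 (or rm2).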

5.7 Starting the JobHistory Server (JobHistoryServer)

# Run on the designated node (hadoop3)
cd /data/hadoop/tools/hadoop-3.3.6
./bin/mapred --daemon start historyserver

5.8 Verifying the Cluster State

# Check the Java processes on every node with jps

# Expected on hadoop1:
73793 QuorumPeerMain
127552 NameNode
129776 DFSZKFailoverController
127141 JournalNode
10584 NodeManager
588 DataNode

# Expected on hadoop2:
26864 NameNode
28256 DFSZKFailoverController
25857 JournalNode
28932 ResourceManager
29396 WebAppProxyServer
29783 DataNode
110797 QuorumPeerMain
29084 NodeManager

# Expected on hadoop3:
128368 JobHistoryServer
47458 JournalNode
50882 NameNode
125907 NodeManager
52662 DataNode
107030 QuorumPeerMain
52041 DFSZKFailoverController
125790 ResourceManager

# Expected on hadoop4:
60214 DataNode
58360 NameNode
59739 DFSZKFailoverController
128991 NodeManager

# Expected on hadoop5:
128321 NodeManager
127860 DataNode
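
Comparing jps output with the topology table by eye is easy to get wrong on five nodes. The sketch below prints every expected daemon that is missing from the jps output it reads. missing_daemons is a name invented for this example.

```shell
# Read `jps` output on stdin and print each daemon name given as an argument
# that does not appear in it. Prints nothing when all expected daemons run.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
  local running expected
  running=$(awk '{ print $2 }')
  for expected in "$@"; do
    # -x matches whole lines only, -F treats the name as a fixed string.
    printf '%s\n' "$running" | grep -qxF "$expected" || echo "$expected"
  done
}
```

Possible usage on hadoop4: jps | missing_daemons DataNode NameNode DFSZKFailoverController NodeManager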

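6. Web UIs

6.1 HDFS Federation HA Web UI

NameNode web UIs, at the dfs.namenode.http-address values configured in 4.1:

  • ns1.nn1: http://172.253.81.116:9870
  • ns1.nn2: http://172.253.81.117:9870
  • ns2.nn3: http://172.253.62.172:9870
  • ns2.nn4: http://172.253.62.173:9870

6.2 YARN HA Web UI

ResourceManager web UIs, at the yarn.resourcemanager.webapp.address values configured in 4.2:

  • rm1: http://172.253.81.117:8088
  • rm2: http://172.253.62.172:8088

6.3 JobHistory Web UI

The JobHistoryServer runs on hadoop3 (172.253.62.172). Its web port is 19888 unless mapreduce.jobhistory.webapp.address is set to something else.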

7. Functional Tests

7.1 Basic HDFS Operations

Test HDFS from node hadoop1.

[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs dfs -ls /
Found 3 items
-r-xr-xr-x   - hadoop hadoop          0 2026-01-09 15:34 /ns1
-r-xr-xr-x   - hadoop hadoop          0 2026-01-09 15:34 /ns2
-r-xr-xr-x   - hadoop hadoop          0 2026-01-09 15:34 /tmp

[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs dfs -ls /ns1
Found 2 items
drwxrwx---   - hadoop supergroup          0 2026-01-14 09:51 /ns1/tmp
drwx------   - hadoop supergroup          0 2026-01-14 09:55 /ns1/user

[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs dfs -ls /ns2
Found 1 items
drwx------   - hadoop supergroup          0 2026-01-14 09:55 /ns2/user

[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs dfs -put LICENSE.txt /ns1
[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs dfs -put README.txt /ns2

[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs dfs -ls /ns1
Found 3 items
-rw-r--r--   3 hadoop supergroup      15217 2026-01-14 11:08 /ns1/LICENSE.txt
drwxrwx---   - hadoop supergroup          0 2026-01-14 09:51 /ns1/tmp
drwx------   - hadoop supergroup          0 2026-01-14 09:55 /ns1/user

[hadoop@172.253.81.116 ~/tools/hadoop-3.3.6]$ ./bin/hdfs dfs -ls /ns2
Found 2 items
-rw-r--r--   3 hadoop supergroup        175 2026-01-14 11:08 /ns2/README.txt
drwx------   - hadoop supergroup          0 2026-01-14 09:55 /ns2/user

7.2 Running the MapReduce Example Jobs

# Run the WordCount example
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /ns1/LICENSE.txt /ns1/out1
./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /ns2/README.txt /ns2/out2

# View the output
./bin/hdfs dfs -cat /ns1/out1/part-r-00000
./bin/hdfs dfs -cat /ns2/out2/part-r-00000

# View the job logs
./bin/yarn logs -applicationId application_1768358074981_0003 -logFiles syslog


