Hadoop 3.2.2 Notes

Before Installing

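  1. Create a dedicated user and group for the installation. Do not install as root; otherwise you will hit many permission problems at startup.
  2. Prefer setting environment variables in /etc/profile, and keep the .bashrc in the user's home directory clean.
  3. Set up passwordless SSH login.
  4. Disable the system firewall.
  5. After installation and before the first start, run hdfs namenode -format to format the NameNode.
  6. When startup fails, read the logs, and before the next start delete the folders HDFS uses (data, tmp, and so on).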
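
Items 3 and 6 of the checklist above can be sketched as two small shell helpers. This is only a sketch under stated assumptions: the function names, the DRY_RUN switch, and the default subdirectories (tmp and data) are illustrative and not part of Hadoop; ssh-keygen and ssh-copy-id are the standard OpenSSH tools.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Item 3: create a key pair if none exists, then install the public key on each host.
# Usage: setup_passwordless_ssh <key_file> <user> <host>...
setup_passwordless_ssh() {
  ssh_key_file="$1"
  ssh_user="$2"
  shift 2
  if [ ! -f "$ssh_key_file" ]; then
    run ssh-keygen -t rsa -b 4096 -N "" -f "$ssh_key_file" || return 1
  fi
  for ssh_host in "$@"; do
    run ssh-copy-id -i "${ssh_key_file}.pub" "${ssh_user}@${ssh_host}" || return 1
  done
}

# Item 6: remove stale HDFS storage directories before the next start attempt.
# Usage: clean_hdfs_storage <storage_root> [subdir...]   (default subdirs: tmp data)
clean_hdfs_storage() {
  storage_root="$1"
  if [ -z "$storage_root" ]; then
    echo "usage: clean_hdfs_storage <storage_root> [subdir...]" >&2
    return 1
  fi
  shift
  [ "$#" -gt 0 ] || set -- tmp data
  for storage_sub in "$@"; do
    run rm -rf "${storage_root:?}/${storage_sub}"
  done
}
```

With the host names and paths used later in these notes, a call might look like setup_passwordless_ssh ~/.ssh/id_rsa hadoop master-12ecb6f03 worker-19b369d40-1, followed after a failed start by clean_hdfs_storage /home/hadoop/opt/hadoop-3.2.2/storage. Running with DRY_RUN=1 first shows what would be executed.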

Installation Steps

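  1. Download the required Hadoop release.
  2. Extract it to an installation directory of your choice.
  3. Go into etc/hadoop and edit the XML configuration files.
  4. Distribute the configured Hadoop directory to the Worker nodes.
  5. Start HDFS.
  6. Start YARN.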
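
Steps 4 to 6 above could look roughly like the following. It is a sketch rather than the exact procedure used for these notes: it assumes rsync is installed on every node, that the parent directory of the Hadoop home already exists on each worker, and that passwordless SSH works. The function names and the DRY_RUN switch are illustrative; start-dfs.sh and start-yarn.sh are the scripts shipped in Hadoop's sbin directory.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 4: copy the configured Hadoop directory to every host listed in the
# workers file, skipping blank lines, comment lines, and this host.
# Usage: distribute_hadoop <hadoop_home> <workers_file> <this_host>
distribute_hadoop() {
  dist_home="$1"
  dist_workers="$2"
  dist_self="$3"
  while IFS= read -r dist_host || [ -n "$dist_host" ]; do
    case "$dist_host" in
      "" | "#"* | "$dist_self") continue ;;
    esac
    # stdin is redirected so rsync/ssh cannot swallow the rest of the workers file
    run rsync -a "${dist_home}/" "${dist_host}:${dist_home}/" < /dev/null || return 1
  done < "$dist_workers"
}

# Steps 5 and 6: start HDFS, then YARN, then list the running Java daemons.
start_cluster() {
  run start-dfs.sh && run start-yarn.sh && run jps
}
```

On the master this might be invoked as distribute_hadoop "$HADOOP_HOME" "$HADOOP_HOME/etc/hadoop/workers" master-12ecb6f03, then start_cluster. The jps output is a quick way to see whether NameNode, DataNode, ResourceManager, and NodeManager processes came up.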

Configuration

/etc/profile

#set java environment
export JAVA_HOME=/home/hadoop/opt/amazon-corretto-8.282.08.1-linux-x64
export JRE_HOME=/home/hadoop/opt/amazon-corretto-8.282.08.1-linux-x64/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$JAVA_HOME:$PATH

#set hadoop environment
export HADOOP_HOME=/home/hadoop/opt/hadoop-3.2.2
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

#set maven
export MAVEN_HOME=/home/hadoop/opt/apache-maven-3.6.3
export PATH=$MAVEN_HOME/bin:$PATH

#required for clients that read HDFS (for example TensorFlow), see
#https://github.com/tensorflow/examples/blob/master/community/en/docs/deploy/hadoop.md
export HADOOP_HDFS_HOME=/home/hadoop/opt/hadoop-3.2.2
source ${HADOOP_HOME}/libexec/hadoop-config.sh
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${JAVA_HOME}/jre/lib/amd64/server
export CLASSPATH=$(${HADOOP_HOME}/bin/hadoop classpath --glob)
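
After editing /etc/profile and opening a new login shell, a quick sanity check along these lines can catch a mistyped path before any daemon is started. The function name check_hadoop_env is hypothetical; the check only confirms that JAVA_HOME and HADOOP_HOME are set and contain the expected executables.

```shell
# Report whether JAVA_HOME and HADOOP_HOME are set and point at usable installs.
# Returns 0 when both look fine, 1 otherwise.
check_hadoop_env() {
  env_status=0
  for env_pair in "JAVA_HOME:bin/java" "HADOOP_HOME:bin/hadoop"; do
    env_name=${env_pair%%:*}   # variable name, e.g. JAVA_HOME
    env_rel=${env_pair#*:}     # executable expected below it, e.g. bin/java
    eval "env_value=\${$env_name:-}"
    if [ -z "$env_value" ]; then
      echo "MISSING $env_name is not set"
      env_status=1
    elif [ ! -x "$env_value/$env_rel" ]; then
      echo "BROKEN $env_name=$env_value has no executable $env_rel"
      env_status=1
    else
      echo "OK $env_name=$env_value"
    fi
  done
  return "$env_status"
}
```

If both lines report OK, running java -version and hadoop version is a reasonable next check.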

core-site.xml configuration

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master-12ecb6f03:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/opt/hadoop-3.2.2/storage/tmp</value>
</property>
</configuration>
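
To double-check what a site file actually contains, a small helper like the one below can print the value of a single property. It is a rough sketch: get_site_property is a hypothetical name, and the awk script only handles the simple layout used in these notes (one name and one value per property, no comments or nested markup). On a working installation, hdfs getconf -confKey fs.defaultFS reports the value Hadoop itself resolves.

```shell
# Print the <value> of the property whose <name> equals the given key.
# Usage: get_site_property <site_xml_file> <property_name>
get_site_property() {
  awk -v key="$2" '
    /<name>/ {
      line = $0
      sub(/.*<name>[ \t]*/, "", line)
      sub(/[ \t]*<\/name>.*/, "", line)
      found = (line == key)
    }
    found && /<value>/ {
      line = $0
      sub(/.*<value>[ \t]*/, "", line)
      sub(/[ \t]*<\/value>.*/, "", line)
      print line
      exit
    }
  ' "$1"
}
```

For example, get_site_property "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS should print the hdfs:// address configured above.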

hadoop-env.sh: find the line that sets JAVA_HOME and change it to

export JAVA_HOME=/home/hadoop/opt/amazon-corretto-8.282.08.1-linux-x64

hdfs-site.xml configuration

<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/opt/hadoop-3.2.2/storage/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/opt/hadoop-3.2.2/storage/data</value>
</property>
</configuration>
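
The three storage paths configured so far (tmp, name, data) all sit under one storage directory. Hadoop generally creates them itself when the parent is writable, so pre-creating them is optional; the sketch below only makes ownership or permission problems visible before the first start. The function name prepare_storage_dirs is hypothetical.

```shell
# Create the tmp, name and data directories under a storage root and
# confirm the current user can write to each of them.
# Usage: prepare_storage_dirs <storage_root>
prepare_storage_dirs() {
  prep_root="$1"
  if [ -z "$prep_root" ]; then
    echo "usage: prepare_storage_dirs <storage_root>" >&2
    return 1
  fi
  for prep_sub in tmp name data; do
    mkdir -p "${prep_root}/${prep_sub}" || return 1
    if [ ! -w "${prep_root}/${prep_sub}" ]; then
      echo "not writable: ${prep_root}/${prep_sub}" >&2
      return 1
    fi
  done
}
```

Run it as the Hadoop user, for instance prepare_storage_dirs /home/hadoop/opt/hadoop-3.2.2/storage, on the master and on each worker.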

yarn-site.xml configuration

<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master-12ecb6f03</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

workers configuration

master-12ecb6f03
worker-19b369d40-1
worker-19b369d40-2
worker-19b369d40-3

mapred-site.xml configuration

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
</configuration>

Testing

Submit a MapReduce job

hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /input /output
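
The wordcount job needs some input in /input, and it fails if /output already exists. A fuller run might therefore look like the sketch below. The function name, the DRY_RUN switch, and the HADOOP_VERSION variable are illustrative assumptions; the hdfs dfs subcommands are the standard file system shell commands.

```shell
# Set DRY_RUN=1 to print each command instead of executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Upload a local text file, run the bundled wordcount example, print the result.
# Usage: run_wordcount_demo <local_text_file>
run_wordcount_demo() {
  demo_local_file="$1"
  demo_jar="${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-${HADOOP_VERSION:-3.2.2}.jar"
  run hdfs dfs -mkdir -p /input &&
    run hdfs dfs -put -f "$demo_local_file" /input &&
    run hdfs dfs -rm -r -f /output &&   # the job refuses to overwrite an existing output directory
    run hadoop jar "$demo_jar" wordcount /input /output &&
    run hdfs dfs -cat "/output/part-r-*"
}
```

Any plain text file works as input, for example run_wordcount_demo "$HADOOP_HOME/README.txt". Note that this sketch deletes /output on every run, so it should not be pointed at a directory that holds results worth keeping.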