[Abstract] I had been running Hadoop 1.x and planned to upgrade to 2.x, so I grabbed three test VMs and set up a cluster. I expected it to be simple, but Hadoop 2.x differs quite a bit from 1.x and it took some detours. The detailed steps are below. Following them should get a cluster running; the Hadoop parameters themselves still need proper tuning.
I. Building
Download the required packages: JDK, svn, cmake, ncurses, openssl, gcc, maven, protobuf
mkdir software
cd software
wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
wget http://mirrors.cnnic.cn/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
wget https://github.com/twitter/hadoop-lzo/archive/master.zip -O lzo.tgz
wget http://mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.4-bin.tar.gz
wget "http://download.oracle.com/otn-pub/java/jdk/7u67-b01/jdk-7u67-linux-x64.tar.gz?AuthParam=1411887643_cf59aa6f30309ae6b7447b4621e645a1"
wget --no-check-certificate "https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz"
Operating system: CentOS 6.5, 64-bit
1. Install the required packages:
yum install svn ncurses-devel autoconf automake libtool cmake openssl-devel gcc* telnet screen wget curl -y
2. Install Maven. Download and extract it:
wget http://mirrors.cnnic.cn/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
tar zxvf apache-maven-3.2.3-bin.tar.gz
mv apache-maven-3.2.3 /usr/local/
3. Install Ant. Download and extract it:
wget http://mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.4-bin.tar.gz
tar zvxf apache-ant-1.9.4-bin.tar.gz
mv apache-ant-1.9.4 /usr/local/
4. Install protobuf:
tar zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make && make install
$ protoc --version
libprotoc 2.5.0
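If protoc complains about a missing libprotoc.so.8 after make install, the usual cause on CentOS is that /usr/local/lib is not in the dynamic linker path. A minimal fix, assuming the default install prefix:

# protobuf installs into /usr/local/lib by default; tell the linker about it
echo "/usr/local/lib" > /etc/ld.so.conf.d/protobuf.conf
ldconfig
protoc --version   # should now print: libprotoc 2.5.0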
5. Configure the environment variables:
# cat /etc/profile
PATH=/usr/java/jdk/bin:/home/hadoop/hadoop/sbin:/home/hadoop/hadoop/bin:/usr/local/maven/bin:/usr/local/ant/bin:$PATH
JAVA_HOME=/usr/java/jdk
HADOOP_INSTALL=/home/hadoop/hadoop
HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop
CLASSPATH=:/usr/java/jdk/lib/:/usr/java/jdk/jre/lib
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export JAVA_HOME CLASSPATH HADOOP_INSTALL HADOOP_CONF_DIR PATH
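Note that the profile above points at /usr/java/jdk, /usr/local/maven and /usr/local/ant, while the earlier steps unpacked versioned directories. A minimal sketch of the assumed layout (the symlinks and the saved JDK tarball name are my assumptions, not shown in the original steps):

# assuming the JDK download was saved as jdk-7u67-linux-x64.tar.gz
tar zxvf jdk-7u67-linux-x64.tar.gz
mkdir -p /usr/java
mv jdk1.7.0_67 /usr/java/
ln -s /usr/java/jdk1.7.0_67 /usr/java/jdk
ln -s /usr/local/apache-maven-3.2.3 /usr/local/maven
ln -s /usr/local/apache-ant-1.9.4 /usr/local/ant
source /etc/profile   # reload the new environment in the current shell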
6. Build Hadoop from source
The official binary download is 32-bit; since the VMs are 64-bit, the source has to be downloaded and compiled manually:
tar zxvf hadoop-2.2.0-src.tar.gz
cd hadoop-2.2.0-src/
mvn package -Pdist,native -DskipTests -Dtar

The mvn build takes quite a while; inside the VM it took about 11 minutes. The tail of the build output:

[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 0.057 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:02 min
[INFO] Finished at: 2014-09-29T18:51:40+08:00
[INFO] Final Memory: 137M/335M
[INFO] ------------------------------------------------------------------------
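Before going further it is worth confirming that the native libraries really came out 64-bit; a quick check against the default build output layout:

cd hadoop-dist/target/hadoop-2.2.0
file lib/native/libhadoop.so.1.0.0
# expect something like: ELF 64-bit LSB shared object, x86-64 ...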
The compiled package:
cd hadoop-2.2.0-src/hadoop-dist/target
ls -tlr
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 antrun
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 test-dir
-rw-r--r--. 1 root root      1627 Sep 29 18:51 dist-layout-stitching.sh
drwxr-xr-x. 9 root root      4096 Sep 29 18:51 hadoop-2.2.0
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 maven-archiver
-rw-r--r--. 1 root root      2743 Sep 29 18:51 hadoop-dist-2.2.0.jar
-rw-r--r--. 1 root root       644 Sep 29 18:51 dist-tar-stitching.sh
-rw-r--r--. 1 root root  96166470 Sep 29 18:51 hadoop-2.2.0.tar.gz
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 javadoc-bundle-options
-rw-r--r--. 1 root root 192884820 Sep 29 18:51 hadoop-dist-2.2.0-javadoc.jar
What we need is hadoop-2.2.0.tar.gz, and this time it is 64-bit.
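The tarball then has to be unpacked as the hadoop user on every node of the cluster described in the next section. The commands below are only a sketch of how that could be done (the scp targets and the symlink are my assumptions; the symlink makes the $HADOOP_HOME=/home/hadoop/hadoop path from the profile resolve while the startup logs still show hadoop-2.2.0):

# from the build host, push the package to the three nodes
scp hadoop-2.2.0.tar.gz hadoop@172.16.1.35:~/
scp hadoop-2.2.0.tar.gz hadoop@172.16.1.32:~/
scp hadoop-2.2.0.tar.gz hadoop@172.16.1.31:~/
# then on each node, as the hadoop user
tar zxvf hadoop-2.2.0.tar.gz
ln -s /home/hadoop/hadoop-2.2.0 /home/hadoop/hadoop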
II. Hadoop Configuration
The three nodes:
/etc/hosts
172.16.1.32 kvm27-v02.sudops.com
172.16.1.35 kvm26-v02.sudops.com
172.16.1.31 kvm28-v01.sudops.com
HADOOP_HOME=/home/hadoop/hadoop
The configuration files no longer live in conf; in the new version they are under $HADOOP_HOME/etc/hadoop:
hdfs-site.xml
mapred-site.xml
yarn-site.xml
core-site.xml
yarn-env.sh
hadoop-env.sh
cat core-site.xml
<configuration>
<!-- fs config -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://172.16.1.35:9000</value>
</property>
<property>
<name>hadoop.http.staticuser.user</name>
<value>hadoop</value>
</property>
</configuration>
cat hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/hdfs/data/dfs.name.dir</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/hdfs/data/dfs.data.dir</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
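The name and data directories referenced above are not guaranteed to exist, so it does no harm to create them up front on every node (the name dir only matters on the NameNode, the data dir on the DataNodes); a minimal sketch, run as the hadoop user:

mkdir -p /home/hadoop/hdfs/data/dfs.name.dir
mkdir -p /home/hadoop/hdfs/data/dfs.data.dir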
cat mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
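Note that the stock 2.2.0 distribution ships only mapred-site.xml.template in etc/hadoop, so this file usually has to be created first:

cd $HADOOP_HOME/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
# then add the mapreduce.framework.name property shown above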
cat yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>172.16.1.35:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>172.16.1.35:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>172.16.1.35:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>172.16.1.35:8033</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>172.16.1.35:8088</value>
</property>
</configuration>
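Two more things belong in $HADOOP_HOME/etc/hadoop before starting: a slaves file, which start-dfs.sh/start-yarn.sh read to find the DataNode/NodeManager hosts (consistent with the startup log further down), and the same set of config files on every node. A hedged sketch; the here-document and rsync are my choices, not taken from the original setup:

# slaves: one worker per line, matching the three nodes above
cat > $HADOOP_HOME/etc/hadoop/slaves <<EOF
172.16.1.35
172.16.1.32
172.16.1.31
EOF
# push the whole config directory to the other two nodes
rsync -av $HADOOP_HOME/etc/hadoop/ hadoop@172.16.1.32:$HADOOP_HOME/etc/hadoop/
rsync -av $HADOOP_HOME/etc/hadoop/ hadoop@172.16.1.31:$HADOOP_HOME/etc/hadoop/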
III. Starting Hadoop
Format the NameNode (this only needs to be done once, on the NameNode host):
hdfs namenode -format
Start HDFS:
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh start secondarynamenode
Or simply: start-dfs.sh
Start YARN:
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
Or simply: start-yarn.sh
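The start-dfs.sh/start-yarn.sh helpers log into every host in slaves over SSH, so passwordless SSH from the master (172.16.1.35) as the hadoop user is assumed; a minimal setup sketch:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id hadoop@172.16.1.35   # the master also SSHes into itself
ssh-copy-id hadoop@172.16.1.32
ssh-copy-id hadoop@172.16.1.31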
$ start-dfs.sh
Starting namenodes on [kvm26-v02.sudops.com]
kvm26-v02.sudops.com: starting namenode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-namenode-kvm26-v02.sudops.com.out
172.16.1.32: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-datanode-kvm27-v02.sudops.com.out
172.16.1.31: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-datanode-kvm28-v01.sudops.com.out
172.16.1.35: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-datanode-kvm26-v02.sudops.com.out
Starting secondary namenodes [kvm27-v02.sudops.com]
kvm27-v02.sudops.com: starting secondarynamenode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-secondarynamenode-kvm27-v02.sudops.com.out

$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-resourcemanager-kvm26-v02.sudops.com.out
172.16.1.32: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-kvm27-v02.sudops.com.out
172.16.1.31: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-kvm28-v01.sudops.com.out
172.16.1.35: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-kvm26-v02.sudops.com.out
$ hdfs dfsadmin -report
Configured Capacity: 1032277524480 (961.38 GB)
Present Capacity: 978602246144 (911.39 GB)
DFS Remaining: 978602172416 (911.39 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 172.16.1.32:50010 (kvm27-v02.sudops.com)
Hostname: kvm27-v02.sudops.com
Decommission Status : Normal
Configured Capacity: 344092508160 (320.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 17817788416 (16.59 GB)
DFS Remaining: 326274695168 (303.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.82%
Last contact: Tue Sep 30 10:19:35 CST 2014

Name: 172.16.1.31:50010 (kvm28-v01.sudops.com)
Hostname: kvm28-v01.sudops.com
Decommission Status : Normal
Configured Capacity: 344092508160 (320.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 17817526272 (16.59 GB)
DFS Remaining: 326274957312 (303.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.82%
Last contact: Tue Sep 30 10:19:35 CST 2014

Name: 172.16.1.35:50010 (kvm26-v02.sudops.com)
Hostname: kvm26-v02.sudops.com
Decommission Status : Normal
Configured Capacity: 344092508160 (320.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 18039963648 (16.80 GB)
DFS Remaining: 326052519936 (303.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.76%
Last contact: Tue Sep 30 10:19:35 CST 2014
Reference
https://hadoop.apache.org/docs/r2.2.0/
Test
cd $HADOOP_HOME
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 20 10
Number of Maps = 20
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Starting Job
14/09/30 11:45:40 INFO client.RMProxy: Connecting to ResourceManager at /172.16.1.35:8032
14/09/30 11:45:41 INFO input.FileInputFormat: Total input paths to process : 20
14/09/30 11:45:41 INFO mapreduce.JobSubmitter: number of splits:20
14/09/30 11:45:41 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/09/30 11:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412048707137_0001
14/09/30 11:45:41 INFO impl.YarnClientImpl: Submitted application application_1412048707137_0001 to ResourceManager at /172.16.1.35:8032
14/09/30 11:45:41 INFO mapreduce.Job: The url to track the job: http://kvm26-v02.sudops.com:8088/proxy/application_1412048707137_0001/
14/09/30 11:45:41 INFO mapreduce.Job: Running job: job_1412048707137_0001
14/09/30 11:45:48 INFO mapreduce.Job: Job job_1412048707137_0001 running in uber mode : false
14/09/30 11:45:48 INFO mapreduce.Job: map 0% reduce 0%
14/09/30 11:46:07 INFO mapreduce.Job: map 30% reduce 0%
14/09/30 11:46:09 INFO mapreduce.Job: map 60% reduce 0%
14/09/30 11:46:15 INFO mapreduce.Job: map 100% reduce 0%
14/09/30 11:46:16 INFO mapreduce.Job: map 100% reduce 100%
14/09/30 11:46:16 INFO mapreduce.Job: Job job_1412048707137_0001 completed successfully
14/09/30 11:46:16 INFO mapreduce.Job: Counters: 43
File System Counters
FILE: Number of bytes read=446
FILE: Number of bytes written=1691681
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=5370
HDFS: Number of bytes written=215
HDFS: Number of read operations=83
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=20
Launched reduce tasks=1
Data-local map tasks=20
Total time spent by all maps in occupied slots (ms)=411605
Total time spent by all reduces in occupied slots (ms)=7356
Map-Reduce Framework
Map input records=20
Map output records=40
Map output bytes=360
Map output materialized bytes=560
Input split bytes=3010
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=560
Reduce input records=40
Reduce output records=0
Spilled Records=80
Shuffled Maps =20
Failed Shuffles=0
Merged Map outputs=20
GC time elapsed (ms)=3893
CPU time spent (ms)=5160
Physical memory (bytes) snapshot=4045692928
Virtual memory (bytes) snapshot=17784217600
Total committed heap usage (bytes)=2726051840
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=2360
File Output Format Counters
Bytes Written=97
Job Finished in 35.516 seconds
Estimated value of Pi is 3.12000000000000000000
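Besides the pi example, a quick HDFS round trip is another cheap sanity check (the paths here are just illustrative):

hdfs dfs -mkdir -p /tmp/test
hdfs dfs -put /etc/hosts /tmp/test/
hdfs dfs -cat /tmp/test/hosts
hdfs dfs -rm -r /tmp/test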
The web UIs:
http://172.16.1.35:50070/dfshealth.jsp
http://172.16.1.35:8088/cluster
jps status on the three nodes:
$ jps
1866 ResourceManager
1972 NodeManager
31890 DataNode
3036 Jps
31769 NameNode

$ jps
11267 NodeManager
10642 SecondaryNameNode
12202 Jps
10526 DataNode

$ jps
8344 NodeManager
9264 Jps
7599 DataNode
With that, basic MapReduce jobs run, and the setup is essentially complete.

