[Abstract] We had been running Hadoop 1.x and planned to upgrade to 2.x, so we set up a small cluster on three test VMs. It looked straightforward at first, but Hadoop 2.x differs substantially from 1.x and the setup took some trial and error. The detailed steps are below; following them should produce a working cluster, although the Hadoop parameters still need proper tuning.
I. Setup:
Download the required packages: JDK, svn, cmake, ncurses, openssl, gcc, maven, protobuf
mkdir software
cd software
wget http://mirrors.cnnic.cn/apache/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
wget http://mirrors.cnnic.cn/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
wget https://github.com/twitter/hadoop-lzo/archive/master.zip -O lzo.tgz
wget http://mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.4-bin.tar.gz
wget "http://download.oracle.com/otn-pub/java/jdk/7u67-b01/jdk-7u67-linux-x64.tar.gz?AuthParam=1411887643_cf59aa6f30309ae6b7447b4621e645a1"
wget --no-check-certificate "https://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz"
Operating system: CentOS 6.5, 64-bit
1. Install the required packages:
yum install svn ncurses-devel autoconf automake libtool cmake openssl-devel gcc* telnet screen wget curl -y
2. Install Maven: download and unpack it:
wget http://mirrors.cnnic.cn/apache/maven/maven-3/3.2.3/binaries/apache-maven-3.2.3-bin.tar.gz
tar zxvf apache-maven-3.2.3-bin.tar.gz
mv apache-maven-3.2.3 /usr/local/
3. Install Ant: download and unpack it:
wget http://mirror.bit.edu.cn/apache//ant/binaries/apache-ant-1.9.4-bin.tar.gz
tar zvxf apache-ant-1.9.4-bin.tar.gz
mv apache-ant-1.9.4 /usr/local/
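The PATH configured in step 5 points at /usr/local/maven/bin and /usr/local/ant/bin rather than the versioned directories unpacked above. A minimal way to bridge the two (the symlinks are our assumption; the original does not show this step):

ln -s /usr/local/apache-maven-3.2.3 /usr/local/maven
ln -s /usr/local/apache-ant-1.9.4 /usr/local/ant
mvn -version    # should report Apache Maven 3.2.3 once PATH is set in step 5
ant -version    # should report Apache Ant 1.9.4 once PATH is set in step 5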
4. Install protobuf:
tar zxvf protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make && make install
$ protoc --version
libprotoc 2.5.0
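If protoc complains about a missing libprotoc shared library after make install, refreshing the linker cache usually fixes it (a common gotcha, not mentioned in the original):

ldconfig            # pick up the freshly installed libprotoc.so under /usr/local/lib
protoc --version    # expected: libprotoc 2.5.0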
5. Configure environment variables:
# cat /etc/profile
PATH=/usr/java/jdk/bin:/home/hadoop/hadoop/sbin:/home/hadoop/hadoop/bin:/usr/local/maven/bin:/usr/local/ant/bin:$PATH
JAVA_HOME=/usr/java/jdk
HADOOP_INSTALL=/home/hadoop/hadoop
HADOOP_CONF_DIR=/home/hadoop/hadoop/etc/hadoop
CLASSPATH=:/usr/java/jdk/lib/:/usr/java/jdk/jre/lib
export HADOOP_HOME=/home/hadoop/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
export JAVA_HOME CLASSPATH HADOOP_INSTALL HADOOP_CONF_DIR PATH
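After editing /etc/profile, reload it and confirm the tools resolve. This assumes the downloaded JDK 7u67 was unpacked to /usr/java/jdk (implied by the PATH above but not shown in the original); the hadoop command itself only works once the tarball built in step 6 is installed under /home/hadoop/hadoop (see Part II):

source /etc/profile
java -version       # assumes the JDK was unpacked to /usr/java/jdk
mvn -version
hadoop version      # works only after the 64-bit build is installed under /home/hadoop/hadoop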
6. Build Hadoop from source
The official binary release is built for 32-bit, but the VMs are 64-bit, so the source has to be downloaded and compiled manually:
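Before kicking off the build, it is worth confirming the toolchain from the previous steps is on the PATH; Hadoop 2.2.0 is strict about protobuf 2.5.0 in particular (a quick pre-flight check, not part of the original):

which mvn ant cmake protoc
protoc --version    # must print libprotoc 2.5.0, otherwise the native build fails
echo $JAVA_HOME     # must point at a JDK, not a JRE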
tar zxvf hadoop-2.2.0-src.tar.gz
cd hadoop-2.2.0-src/
mvn package -Pdist,native -DskipTests -Dtar

The mvn build takes a while; inside the VM it ran for 11 minutes. The summary printed at the end:

[INFO] Apache Hadoop Mini-Cluster ......................... SUCCESS [ 0.057 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 11:02 min
[INFO] Finished at: 2014-09-29T18:51:40+08:00
[INFO] Final Memory: 137M/335M
[INFO] ------------------------------------------------------------------------
The packages produced by the build:
cd hadoop-2.2.0-src/hadoop-dist/target
ls -tlr
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 antrun
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 test-dir
-rw-r--r--. 1 root root      1627 Sep 29 18:51 dist-layout-stitching.sh
drwxr-xr-x. 9 root root      4096 Sep 29 18:51 hadoop-2.2.0
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 maven-archiver
-rw-r--r--. 1 root root      2743 Sep 29 18:51 hadoop-dist-2.2.0.jar
-rw-r--r--. 1 root root       644 Sep 29 18:51 dist-tar-stitching.sh
-rw-r--r--. 1 root root  96166470 Sep 29 18:51 hadoop-2.2.0.tar.gz
drwxr-xr-x. 2 root root      4096 Sep 29 18:51 javadoc-bundle-options
-rw-r--r--. 1 root root 192884820 Sep 29 18:51 hadoop-dist-2.2.0-javadoc.jar
The file we need is hadoop-2.2.0.tar.gz, this time built for 64-bit.
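To double-check that the native libraries really came out 64-bit, inspect libhadoop inside the freshly built tree (the path below is the dist layout produced by the build above):

file hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0
# expected: ELF 64-bit LSB shared object, x86-64 ...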
II. Hadoop Configuration
Three nodes:
/etc/hosts
172.16.1.32 kvm27-v02.sudops.com
172.16.1.35 kvm26-v02.sudops.com
172.16.1.31 kvm28-v01.sudops.com
$HADOOP_HOME=/home/hadoop/hadoop
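The original jumps from the freshly built tarball to $HADOOP_HOME without showing how the two are connected. The start-up logs later in this post reference /home/hadoop/hadoop-2.2.0, which suggests the tarball was unpacked there and /home/hadoop/hadoop is a symlink; a sketch of that assumption, repeated on every node:

# as root, once per node (the hadoop user itself is also an assumption, not shown in the original)
useradd hadoop
tar zxvf hadoop-2.2.0.tar.gz -C /home/hadoop/
ln -s /home/hadoop/hadoop-2.2.0 /home/hadoop/hadoop
chown -R hadoop:hadoop /home/hadoop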
The configuration files are no longer under conf/; in the new version they live in $HADOOP_HOME/etc/hadoop (the slaves file and passwordless SSH that the start scripts rely on are sketched right after this list):
hdfs-site.xml
mapred-site.xml
yarn-site.xml
core-site.xml
yarn-env.sh
hadoop-env.sh
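start-dfs.sh and start-yarn.sh below bring up daemons on all three machines, which additionally relies on a slaves file and passwordless SSH from the master as the hadoop user. Neither is shown in the original; a minimal sketch using the IPs from /etc/hosts above:

# on the master (172.16.1.35), as the hadoop user
cat > /home/hadoop/hadoop/etc/hadoop/slaves <<EOF
172.16.1.35
172.16.1.32
172.16.1.31
EOF
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
ssh-copy-id hadoop@172.16.1.32    # repeat for 172.16.1.31 and 172.16.1.35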
cat core-site.xml
<configuration>
  <!-- fs config -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.16.1.35:9000</value>
  </property>
  <property>
    <name>hadoop.http.staticuser.user</name>
    <value>hadoop</value>
  </property>
</configuration>
cat hdfs-site.xml
<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hdfs/data/dfs.name.dir</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hdfs/data/dfs.data.dir</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
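The NameNode format and the DataNodes will create dfs.name.dir and dfs.data.dir themselves, but only if the parent directory exists and is writable by the hadoop user; creating it up front on every node avoids a startup failure (our assumption, the original does not show this step):

mkdir -p /home/hadoop/hdfs/data
chown -R hadoop:hadoop /home/hadoop/hdfs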
cat mapred-site.xml
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
cat yarn-site.xml
<configuration>
  <!-- Site specific YARN configuration properties -->
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>172.16.1.35:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>172.16.1.35:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>172.16.1.35:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>172.16.1.35:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>172.16.1.35:8088</value>
  </property>
</configuration>
III. Starting Hadoop
hdfs namenode -format
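The format is run once, on the NameNode (172.16.1.35) only. A quick way to confirm it took effect is to look at the freshly created name directory from hdfs-site.xml (a sanity check, not in the original):

ls /home/hadoop/hdfs/data/dfs.name.dir/current
# expected: VERSION, seen_txid, fsimage_* ...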
Start HDFS
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode
hadoop-daemon.sh start secondarynamenode
Or simply: start-dfs.sh
Start YARN
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
Or simply: start-yarn.sh
$ start-dfs.sh
Starting namenodes on [kvm26-v02.sudops.com]
kvm26-v02.sudops.com: starting namenode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-namenode-kvm26-v02.sudops.com.out
172.16.1.32: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-datanode-kvm27-v02.sudops.com.out
172.16.1.31: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-datanode-kvm28-v01.sudops.com.out
172.16.1.35: starting datanode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-datanode-kvm26-v02.sudops.com.out
Starting secondary namenodes [kvm27-v02.sudops.com]
kvm27-v02.sudops.com: starting secondarynamenode, logging to /home/hadoop/hadoop-2.2.0/logs/hadoop-hadoop-secondarynamenode-kvm27-v02.sudops.com.out

$ start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-resourcemanager-kvm26-v02.sudops.com.out
172.16.1.32: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-kvm27-v02.sudops.com.out
172.16.1.31: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-kvm28-v01.sudops.com.out
172.16.1.35: starting nodemanager, logging to /home/hadoop/hadoop-2.2.0/logs/yarn-hadoop-nodemanager-kvm26-v02.sudops.com.out
$ hdfs dfsadmin -report
Configured Capacity: 1032277524480 (961.38 GB)
Present Capacity: 978602246144 (911.39 GB)
DFS Remaining: 978602172416 (911.39 GB)
DFS Used: 73728 (72 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 3 (3 total, 0 dead)

Live datanodes:
Name: 172.16.1.32:50010 (kvm27-v02.sudops.com)
Hostname: kvm27-v02.sudops.com
Decommission Status : Normal
Configured Capacity: 344092508160 (320.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 17817788416 (16.59 GB)
DFS Remaining: 326274695168 (303.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.82%
Last contact: Tue Sep 30 10:19:35 CST 2014

Name: 172.16.1.31:50010 (kvm28-v01.sudops.com)
Hostname: kvm28-v01.sudops.com
Decommission Status : Normal
Configured Capacity: 344092508160 (320.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 17817526272 (16.59 GB)
DFS Remaining: 326274957312 (303.87 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.82%
Last contact: Tue Sep 30 10:19:35 CST 2014

Name: 172.16.1.35:50010 (kvm26-v02.sudops.com)
Hostname: kvm26-v02.sudops.com
Decommission Status : Normal
Configured Capacity: 344092508160 (320.46 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 18039963648 (16.80 GB)
DFS Remaining: 326052519936 (303.66 GB)
DFS Used%: 0.00%
DFS Remaining%: 94.76%
Last contact: Tue Sep 30 10:19:35 CST 2014
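hdfs dfsadmin -report covers the HDFS side; the YARN side can be verified in the same spirit by asking the ResourceManager which NodeManagers have registered (an extra check, not in the original):

yarn node -list
# expected: three nodes in RUNNING state, one per host above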
References
https://hadoop.apache.org/docs/r2.2.0/
Testing
cd $HADOOP_HOME
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 20 10
Number of Maps  = 20
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Wrote input for Map #10
Wrote input for Map #11
Wrote input for Map #12
Wrote input for Map #13
Wrote input for Map #14
Wrote input for Map #15
Wrote input for Map #16
Wrote input for Map #17
Wrote input for Map #18
Wrote input for Map #19
Starting Job
14/09/30 11:45:40 INFO client.RMProxy: Connecting to ResourceManager at /172.16.1.35:8032
14/09/30 11:45:41 INFO input.FileInputFormat: Total input paths to process : 20
14/09/30 11:45:41 INFO mapreduce.JobSubmitter: number of splits:20
14/09/30 11:45:41 INFO Configuration.deprecation: user.name is deprecated. Instead, use mapreduce.job.user.name
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.reduce.tasks is deprecated. Instead, use mapreduce.job.reduces
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.value.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.job.name is deprecated. Instead, use mapreduce.job.name
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.inputformat.class is deprecated. Instead, use mapreduce.job.inputformat.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir
14/09/30 11:45:41 INFO Configuration.deprecation: mapreduce.outputformat.class is deprecated. Instead, use mapreduce.job.outputformat.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class
14/09/30 11:45:41 INFO Configuration.deprecation: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir
14/09/30 11:45:41 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1412048707137_0001
14/09/30 11:45:41 INFO impl.YarnClientImpl: Submitted application application_1412048707137_0001 to ResourceManager at /172.16.1.35:8032
14/09/30 11:45:41 INFO mapreduce.Job: The url to track the job: http://kvm26-v02.sudops.com:8088/proxy/application_1412048707137_0001/
14/09/30 11:45:41 INFO mapreduce.Job: Running job: job_1412048707137_0001
14/09/30 11:45:48 INFO mapreduce.Job: Job job_1412048707137_0001 running in uber mode : false
14/09/30 11:45:48 INFO mapreduce.Job:  map 0% reduce 0%
14/09/30 11:46:07 INFO mapreduce.Job:  map 30% reduce 0%
14/09/30 11:46:09 INFO mapreduce.Job:  map 60% reduce 0%
14/09/30 11:46:15 INFO mapreduce.Job:  map 100% reduce 0%
14/09/30 11:46:16 INFO mapreduce.Job:  map 100% reduce 100%
14/09/30 11:46:16 INFO mapreduce.Job: Job job_1412048707137_0001 completed successfully
14/09/30 11:46:16 INFO mapreduce.Job: Counters: 43
	File System Counters
		FILE: Number of bytes read=446
		FILE: Number of bytes written=1691681
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=5370
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=83
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters
		Launched map tasks=20
		Launched reduce tasks=1
		Data-local map tasks=20
		Total time spent by all maps in occupied slots (ms)=411605
		Total time spent by all reduces in occupied slots (ms)=7356
	Map-Reduce Framework
		Map input records=20
		Map output records=40
		Map output bytes=360
		Map output materialized bytes=560
		Input split bytes=3010
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=560
		Reduce input records=40
		Reduce output records=0
		Spilled Records=80
		Shuffled Maps =20
		Failed Shuffles=0
		Merged Map outputs=20
		GC time elapsed (ms)=3893
		CPU time spent (ms)=5160
		Physical memory (bytes) snapshot=4045692928
		Virtual memory (bytes) snapshot=17784217600
		Total committed heap usage (bytes)=2726051840
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=2360
	File Output Format Counters
		Bytes Written=97
Job Finished in 35.516 seconds
Estimated value of Pi is 3.12000000000000000000
Web UIs:
http://172.16.1.35:50070/dfshealth.jsp
http://172.16.1.35:8088/cluster
jps status on the three nodes:
$ jps
1866 ResourceManager
1972 NodeManager
31890 DataNode
3036 Jps
31769 NameNode

$ jps
11267 NodeManager
10642 SecondaryNameNode
12202 Jps
10526 DataNode

$ jps
8344 NodeManager
9264 Jps
7599 DataNode
With a basic MapReduce job running successfully, the setup is essentially complete.