Sunday, March 22, 2015

Hadoop - How to Install Hadoop-1.2.1

@ How to Install Hadoop @

@ Make the hadoop user (as root).
@ If you don't have a hadoop group yet, run commands like the following.
$ /usr/sbin/groupadd hadoop
$ /usr/sbin/useradd -d /home/hadoop -m hadoop -g hadoop

@ You don't need to set a password for the hadoop user,
@ but you do need passwordless SSH between the host server and the node servers on Linux.

@ Add hosts
$ vim /etc/hosts
192.168.11.23   server01 # (nameNode)
192.168.11.24   server02 # (secondaryNameNode01, dataNode01)
192.168.11.25   server03 # (dataNode02)
192.168.11.26   server04 # (dataNode03)

@ Generate an SSH key pair (as the hadoop user)
$ ssh-keygen -t rsa

@ Distribute the public key to the master and slave servers as the hadoop user, as sketched below.
@ Append the public key from id_rsa.pub to authorized_keys.
$ vim /home/hadoop/.ssh/authorized_keys
@ Change the permissions
$ chmod 644 /home/hadoop/.ssh/authorized_keys
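@ A sketch of one way to distribute the key to every server (hostnames are the
@ ones from the hosts file above; ssh-copy-id must be available):
$ for host in server01 server02 server03 server04; do
>   ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@$host
> done
@ Verify that login works without a password:
$ ssh hadoop@server02 hostname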

@ Make the name directory (as hadoop, on the nameNode server)
$ mkdir -p /home/hadoop/data/name

@ Make the secondary name directory (as hadoop, on the secondaryNameNode01 server)
$ mkdir -p /home/hadoop/data/checkpoint
$ chown -R hadoop.hadoop /home/hadoop/data/checkpoint

@ Make the data directories (as hadoop, on the dataNode01~03 servers)
$ mkdir -p /home/hadoop/data01 
$ mkdir -p /home/hadoop/data02
$ chown -R hadoop.hadoop /home/hadoop/data01
$ chown -R hadoop.hadoop /home/hadoop/data02

@@ Install Hadoop @@

@ Hadoop is available from an Apache mirror:
@ http://ftp.kddilabs.jp/infosystems/apache/hadoop/common/
$ cd /usr/local/src/
@ as root
$ wget http://ftp.kddilabs.jp/infosystems/apache/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
@ Install Hadoop (master and slave both)
$ tar xvf ./hadoop-1.2.1.tar.gz
$ mv ./hadoop-1.2.1 /usr/local/hadoop
$ chown -R hadoop.hadoop /usr/local/hadoop

@ Configure Hadoop (on the master)

## conf/hadoop-env.sh ##
#=================================
# Suppress the deprecated HADOOP_HOME warning
HADOOP_HOME_WARN_SUPPRESS=TRUE
export HADOOP_HOME_WARN_SUPPRESS
# The java implementation to use.  Required.
export JAVA_HOME=/usr/local/java
#================================

## conf/core-site.xml ##
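@ A minimal sketch for core-site.xml: the namenode address server01:9000 matches
@ the hosts file and the 9000 iptables rule below, and fs.checkpoint.dir matches
@ the checkpoint directory made above.
#=================================
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://server01:9000</value>
  </property>
  <property>
    <name>fs.checkpoint.dir</name>
    <value>/home/hadoop/data/checkpoint</value>
  </property>
</configuration>
#=================================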
## conf/hdfs-site.xml ##
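@ Likewise a minimal sketch for hdfs-site.xml, using the name and data directories
@ made above (dfs.replication=3 is an assumption for three dataNodes):
#=================================
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/data/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/data01/hdfs,/home/hadoop/data02/hdfs</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
#=================================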


@ Set the masters file
@ (this lists the secondary namenode)
$ vi conf/masters
@ Add the line below
## conf/masters ##
#=================================
server02

@ Set the slaves file
$ vi conf/slaves
@ Add the lines below
## conf/slaves ##
#=================================
server02
server03
server04

@ Set the hadoop home directory path
$ vi /etc/profile
@ Add the following at the end of the file (on both master and slaves).
# /etc/profile
# =====================================
export HADOOP_HOME=/usr/local/hadoop
# =====================================
@ Apply the change
$ source /etc/profile

@ Install Hadoop on the slaves in the same way as on the master; one way to copy it over is sketched below.
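@ A sketch of copying the installation from the master (this assumes root SSH
@ access, since /usr/local is owned by root):
$ for host in server02 server03 server04; do
>   scp -r /usr/local/hadoop root@$host:/usr/local/
>   ssh root@$host chown -R hadoop.hadoop /usr/local/hadoop
> done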


@ Make the DataNode directories (run from the nameNode server).
@ You may need to SSH to each slave once first to accept its host key; see the sketch after these commands.
$ ./bin/slaves.sh mkdir -p /home/hadoop/data01/hdfs
$ ./bin/slaves.sh mkdir -p /home/hadoop/data02/hdfs
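@ A sketch of accepting each slave's host key up front (hostnames from the
@ hosts file above):
$ for host in server02 server03 server04; do ssh hadoop@$host true; done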

@ Format the Namenode (on the master)
$ ./bin/hadoop namenode -format
15/03/22 01:41:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = server01/192.168.11.23
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 1.2.1
STARTUP_MSG: build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.2 -r 1503152;
compiled by 'mattf' on Mon Jul 22 15:23:09 PDT 2013
STARTUP_MSG: java = 1.7.0_75
************************************************************/
Re-format filesystem in /home/hadoop/data/name ? (Y or N) Y
@ If the error below occurs, check the /etc/hosts file (delete any "127.0.0.1 server01" entry).
#org.apache.hadoop.ipc.RPC: Server at server01/127.0.0.1:9000 not available yet, Zzzzz...
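@ For example, make sure server01 resolves to its real address only:
## /etc/hosts ##
127.0.0.1       localhost
192.168.11.23   server01 # (nameNode)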


@ Add the following rules so the servers can reach each other's ports.
$ vim /etc/sysconfig/iptables
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50070 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50090 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50100 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 50105 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9000 -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 9001 -j ACCEPT
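@ Reload the firewall so the new rules take effect (on CentOS, for example):
$ service iptables restart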

@ Start Hadoop
$ ./bin/start-dfs.sh
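@ To verify, run jps on each server; roughly, you would expect output like this
@ (the PIDs will differ):
$ jps # on server01
2481 NameNode
2550 Jps
$ jps # on server02
2530 SecondaryNameNode
2604 DataNode
2688 Jps
$ jps # on server03 and server04
2604 DataNode
2688 Jps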

@ If the following error occurs, just turn iptables off, like this: ($ chkconfig iptables off)
error: java.io.IOException: File /home/hadoop/data/mapred/system/jobtracker.info could only be replicated to 0 nodes, instead of 1
@ If you want to reformat ($ ./bin/hadoop namenode -format), first delete all of the old directories (name, data, data01, data02).


@ Stop Hadoop
$ ./bin/stop-dfs.sh

@ You can see the Hadoop web UI in a browser
http://192.168.11.23:50070

@ You can check the cluster status from the console
$ ./bin/hadoop dfsadmin -report


@@@ Troubleshooting @@@

@ If a dataNode doesn't start, check its directory permissions
$ ./bin/slaves.sh chmod 755 /home/hadoop/data01/hdfs
$ ./bin/slaves.sh chmod 755 /home/hadoop/data02/hdfs 
2013-05-17 13:42:37,457 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /home/hadoop/data01/hdfs, expected: rwxr-xr-x, while actual: rwxrwxr-x
2013-05-17 13:42:37,462 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in dfs.data.dir: Incorrect permission for /home/hadoop/data02/hdfs, expected: rwxr-xr-x, while actual: rwxrwxr-x
2013-05-17 13:42:37,462 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in dfs.data.dir are invalid.

@ In addition, note these directories from [mapred-site.xml]:
# The [/.../mapred/system] directory lives on HDFS.
# The [/.../mapred/local] directory lives on the local file system.
$ ./bin/slaves.sh chmod 755 /home/hadoop/data
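@ A minimal sketch of conf/mapred-site.xml itself (the jobtracker address is an
@ assumption matching the 9001 iptables rule above; mapred.system.dir matches the
@ path in the jobtracker.info error above):
## conf/mapred-site.xml ##
#=================================
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>server01:9001</value> <!-- assumed -->
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/home/hadoop/data/mapred/system</value> <!-- on HDFS -->
  </property>
  <property>
    <name>mapred.local.dir</name>
    <value>/home/hadoop/data/mapred/local</value> <!-- on the local file system -->
  </property>
</configuration>
#=================================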
#==========================================================#

I referred to these posts:
http://blog.beany.co.kr/archives/412
http://blog.beany.co.kr/archives/1373
http://www.slideshare.net/TaeYoungLee1/20141029-25-hive
Thank you

