Big Data And The Technology
Big Data Architect’s Handbook Published by Packt Publishing, 2018
1. Hadoop configuration
The configuration files that ship with the Hadoop package are set up by default to run Hadoop on a single node as a single Java process. Hadoop can also run on a single node with each daemon in a separate Java process, which is known as pseudo-distributed operation.
To set up pseudo-distributed mode, the following configurations are required.
All configuration files are located in etc/hadoop/ inside the extracted Hadoop package folder. In our case, the complete path is /home/hadoopadmin/hadoop-2.8.1/etc/hadoop/
- Edit the core-site.xml configuration file and add the following between the <configuration>..</configuration> tags:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>
- Edit the hdfs-site.xml file and add the following between the <configuration>..</configuration> tags. The value sets how many copies of each piece of data HDFS keeps; you can change it from 1 to any number, depending on how many copies you want:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
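The two edits above can also be scripted. The following is a minimal sketch rather than part of the original procedure: write_pseudo_distributed_conf is a hypothetical helper name, and it assumes you are happy to overwrite both files in the given directory, so back up any existing configuration first.

```shell
#!/bin/sh
# Sketch: write minimal core-site.xml and hdfs-site.xml for pseudo-distributed mode.
# write_pseudo_distributed_conf is a hypothetical helper, not a Hadoop command.
# $1 = configuration directory (for example .../etc/hadoop)
# $2 = replication factor (optional, defaults to 1)
# WARNING: overwrites both files in $1.
write_pseudo_distributed_conf() {
    conf_dir=$1
    replication=${2:-1}

    # core-site.xml: default file system URI used in this section
    cat > "$conf_dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

    # hdfs-site.xml: number of copies kept of the data
    cat > "$conf_dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>$replication</value>
  </property>
</configuration>
EOF
}
```

With the path used in this section, it could be called as write_pseudo_distributed_conf /home/hadoopadmin/hadoop-2.8.1/etc/hadoop 1.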
2. Hadoop Path Configuration:
We will now add the Hadoop bin and sbin directories to the PATH, so that Hadoop commands can be executed from any directory. Open your .bashrc file for editing:
$ gedit ~/.bashrc
Add the following code at the end of the file:
# Hadoop Environment
export HADOOP_HOME="/home/hadoopadmin/hadoop-2.8.1"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
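The new settings only take effect in a new terminal, or after reloading the file with source ~/.bashrc. As a quick check, the sketch below tests whether a directory appears in a PATH-style string; path_has_dir is a hypothetical helper name, not something Hadoop provides.

```shell
#!/bin/sh
# Sketch: succeed if a directory is listed in a PATH-style string.
# path_has_dir is a hypothetical helper name.
# $1 = directory to look for (a trailing slash is ignored)
# $2 = colon-separated path string (optional, defaults to the current $PATH)
path_has_dir() {
    case ":${2:-$PATH}:" in
        *":${1%/}:"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, path_has_dir "$HADOOP_HOME/bin" && hadoop version should print the installed Hadoop version once the PATH is set up.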
3. Start Hadoop cluster
3.a. Format NameNode
Before starting the cluster for the first time, format the NameNode. The NameNode holds the metadata for the directory tree of all the files stored on the DataNodes. Formatting initializes this metadata store and creates a new, empty HDFS namespace, so any existing HDFS metadata is discarded; run it only once, on a fresh installation:
$ bin/hdfs namenode -format
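Because formatting discards existing NameNode metadata, a small guard can help avoid running it twice. This is a hedged sketch: namenode_is_formatted is a hypothetical helper, and it assumes that a formatted NameNode directory contains a current/VERSION file. The directory in the usage comment assumes dfs.namenode.name.dir has not been changed from its Hadoop 2.x default; adjust it if yours differs.

```shell
#!/bin/sh
# Sketch: succeed if a NameNode metadata directory already looks formatted.
# namenode_is_formatted is a hypothetical helper; it only checks for current/VERSION.
# $1 = NameNode metadata directory (the dfs.namenode.name.dir setting)
namenode_is_formatted() {
    [ -f "$1/current/VERSION" ]
}

# Possible usage (assumed default directory, run from the Hadoop folder):
#   name_dir="/tmp/hadoop-$USER/dfs/name"
#   namenode_is_formatted "$name_dir" || bin/hdfs namenode -format
```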
3.b. To start your Hadoop server, execute the following commands from the extracted Hadoop folder:
$ sbin/start-dfs.sh
$ sbin/start-yarn.sh
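3.c. Browse Hadoop Server
In Hadoop 2.x, the NameNode web interface is served by default at http://localhost:50070/ and the YARN ResourceManager interface at http://localhost:8088/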
3.d. Check running services:
$ jps
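In pseudo-distributed mode, the jps listing should include NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager once both start scripts have run. The sketch below reports which of these are absent; missing_daemons is a hypothetical helper that reads jps-style lines ("<pid> <name>") on standard input.

```shell
#!/bin/sh
# Sketch: print the expected Hadoop daemons that are absent from jps output.
# missing_daemons is a hypothetical helper; it reads "<pid> <name>" lines on stdin.
missing_daemons() {
    # Keep only the process names (second column of each line)
    running=$(awk '{print $2}')
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # -x matches whole lines, so NameNode does not match SecondaryNameNode
        printf '%s\n' "$running" | grep -qx "$daemon" || printf '%s\n' "$daemon"
    done
}
```

Running jps | missing_daemons prints nothing when all five daemons are up, and otherwise prints one missing daemon name per line.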
3.e. To stop the Hadoop server, execute the following commands:
$ sbin/stop-dfs.sh     # stops the HDFS daemons (NameNode, DataNode)
$ sbin/stop-yarn.sh    # stops the YARN daemons (ResourceManager, NodeManager)