
Big Data And The Technology

December 1, 2016

Big Data Architect’s Handbook by Syed Muhammad Fahad Akhtar, published by Packt Publishing, 2018

  1. Hadoop configuration

Configuration files come with the Hadoop package and are set up by default to run Hadoop in standalone mode, as a single Java process on a single node. Hadoop can also be configured to run each daemon as a separate Java process, which is known as pseudo-distributed operation.

To set up pseudo-distributed mode, the following configurations are required.

All configuration files are located in etc/hadoop/ inside the extracted Hadoop package folder. In our case, the complete path is /home/hadoopadmin/hadoop-2.8.1/etc/hadoop/.

  • Edit the core-site.xml configuration file and add the following code in between the <configuration>..</configuration> tags:

<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
</property>

  • Edit the hdfs-site.xml file and add the following in between the <configuration>..</configuration> tags. The value of dfs.replication controls how many copies of each block HDFS keeps; 1 is sufficient for a single-node setup, and you can raise it on a cluster with more DataNodes:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
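After both edits, each file must remain well-formed XML: a single top-level <configuration> element wrapping the <property> blocks. The sketch below writes scratch copies of the two files as they should look after editing and checks that they parse; the scratch directory and the python3 one-liner are illustrative — point the same check at your real etc/hadoop files instead:

```shell
# Scratch copies of the two edited files, showing the full expected layout.
# In this post's setup they live in /home/hadoopadmin/hadoop-2.8.1/etc/hadoop/.
CONF_DIR=$(mktemp -d)

cat > "$CONF_DIR/core-site.xml" <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF

cat > "$CONF_DIR/hdfs-site.xml" <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF

# Parse each file; a malformed file makes python3 exit non-zero,
# which catches a missing closing tag before the daemons complain.
for f in core-site.xml hdfs-site.xml; do
  python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' \
    "$CONF_DIR/$f" && echo "$f: well-formed"
done
```

A stray or unclosed tag in either file causes the NameNode or DataNode to fail at startup with a parse error, so this ten-second check is worth doing before step 3.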

2. Hadoop path configuration

We will now add the Hadoop bin and sbin directories to the PATH. As a result, you will be able to execute Hadoop commands from any directory.

$ gedit ~/.bashrc

Add the following code at the end of the file:

# Hadoop Environment

export HADOOP_HOME="/home/hadoopadmin/hadoop-2.8.1/"

export PATH=$PATH:$HADOOP_HOME/bin

export PATH=$PATH:$HADOOP_HOME/sbin
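After saving the file, reload it with `source ~/.bashrc`. The snippet below replays the same three export lines in the current shell and lists the PATH entries they added; the directories do not need to exist yet for this check to work:

```shell
# The same three lines as in ~/.bashrc (the path is this post's install location):
export HADOOP_HOME="/home/hadoopadmin/hadoop-2.8.1/"
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

# Print the PATH entries that point at the Hadoop install,
# one per line; both bin and sbin should appear.
echo "$PATH" | tr ':' '\n' | grep "hadoop-2.8.1"
```

Once Hadoop is actually installed at that path, `hadoop version` should then work from any directory.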

3. Start Hadoop cluster

3.a. Format NameNode

Before starting the cluster for the first time, the NameNode must be formatted. The NameNode holds the metadata of HDFS: the directory tree of all files and the locations of their blocks on the DataNodes. Formatting initializes this metadata, making the full capacity of the DataNodes available and discarding any data left over from a previous installation:

$ bin/hdfs namenode -format

3.b. To start your Hadoop server, execute the following command:

$ sbin/start-dfs.sh

$ sbin/start-yarn.sh

3.c. Browse the Hadoop server

Once the daemons are running, the NameNode web UI (default port 50070 in Hadoop 2.x) is available at:

http://localhost:50070/
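A quick way to confirm the web UI is up without opening a browser is to request the page with curl (50070 is the NameNode HTTP port in Hadoop 2.x; Hadoop 3 moved it to 9870):

```shell
# Prints the HTTP status code (200 when the NameNode UI is up),
# or a fallback message when nothing is listening on the port yet.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:50070/ \
  || echo "NameNode UI not reachable yet"
```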

3.d. Check running services:

$ jps

jps lists the running Java processes for the current user; after a successful start you should see entries such as NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager.

3.e. To stop the Hadoop server, execute the following commands:

$ sbin/stop-dfs.sh   # stops the HDFS daemons: NameNode and DataNode

$ sbin/stop-yarn.sh  # stops the YARN daemons: ResourceManager and NodeManager
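The start and stop pairs can be wrapped in a small shell function for convenience. This is a hypothetical helper, not part of Hadoop; it assumes HADOOP_HOME is set as in section 2:

```shell
# Hypothetical convenience wrapper around Hadoop's start/stop scripts.
# Start brings up HDFS before YARN; stop reverses the order.
hadoop_cluster() {
  case "$1" in
    start) "$HADOOP_HOME/sbin/start-dfs.sh" && "$HADOOP_HOME/sbin/start-yarn.sh" ;;
    stop)  "$HADOOP_HOME/sbin/stop-yarn.sh"; "$HADOOP_HOME/sbin/stop-dfs.sh" ;;
    *)     echo "usage: hadoop_cluster start|stop" >&2; return 2 ;;
  esac
}
```

With this in your ~/.bashrc, `hadoop_cluster start` brings up HDFS and YARN in one command, and `hadoop_cluster stop` shuts them down in the reverse order.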

 

