################################### CS435 Cluster Installation Tutorial 2 ###################################
1. Copy the VM from the previous tutorial into a new VM. You may call this file master.
#We will make changes to this VM so that it becomes part of a cluster consisting of 1 master node and 2 slave nodes.
#########################################################################################################
2. Rename the machine. Use gedit to do the following
sudo gedit /etc/hostname
#This should open the file containing the name of your machine. Type the machine name and save the file.
#You may log out and log back in for the changes to take effect.
#########################################################################################################
3. Re-configure hadoop
#configure hadoop so it knows which hosts are workers
gedit /usr/local/hadoop/etc/hadoop/workers
#add the following to this file
hadoop1
hadoop2
hadoop3
#save the file.
<---------------------------<---------------------------<--------------------------->
#we have already configured the hadoop xml files. You need to edit the core-site.xml file.
#Change the fs.defaultFS tag to the value hdfs://hadoop1:9000 (see the sketch after step 5 below).
<---------------------------<---------------------------<--------------------------->
Save the file.
<---------------------------<---------------------------<--------------------------->
#make sure the tmp directories are clean. Run the following in a terminal
cd /usr/local/hadoop_tmp
rm -rf *
mkdir n
mkdir d
ls -al
chmod 755 n
chmod 755 d
#This will clean the tmp folders so we can work on the cluster.
#########################################################################################################
4. Let's prepare the network package and tools. Use apt to install the net-tools package.
sudo apt install net-tools
#check the ip address of the host
ifconfig
#look for your Ethernet interface. Usually it is eth0 or ens33 or similar
#we will set up our nodes with these ip addresses
#192.168.5.131 hadoop1 which is the master
#192.168.5.132 hadoop2 which is a slave
#192.168.5.133 hadoop3 which is a slave
#WARNING# The above is an Example only. You need to check the IP addresses of your network.
#########################################################################################################
########################################## I M P O R T A N T ##########################################
#########################################################################################################
5. Setting up the static IP address of your host
sudo ifconfig ens33 192.168.5.131
#WARNING# The above is an Example only. You need to check the IP addresses of your network.
<---------------------------<---------------------------<--------------------------->
#we can edit the interfaces file so the changes become permanent
sudo gedit /etc/network/interfaces
#type the following
auto lo
iface lo inet loopback
auto ens33
iface ens33 inet static
address 192.168.5.131
netmask 255.255.255.0
<---------------------------<---------------------------<--------------------------->
# we will now reset the hosts file
sudo gedit /etc/hosts
#type in the following to overwrite the existing info
192.168.5.131 hadoop1
192.168.5.132 hadoop2
192.168.5.133 hadoop3
<---------------------------<---------------------------<--------------------------->
#WARNING# The above is an Example only. You need to check the IP addresses of your network.
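<---------------------------<---------------------------<--------------------------->
#Back to step 3: a minimal sketch of what the edited property in core-site.xml should look
#like, assuming the standard Hadoop layout where fs.defaultFS names the namenode address
#(any other properties already in your file stay as they are):
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://hadoop1:9000</value>
  </property>
</configuration>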
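<---------------------------<---------------------------<--------------------------->
#A quick way to verify steps 4-5 once all three VMs are up (after steps 7-10): a small shell
#loop, a sketch assuming the example hostnames above, that pings each /etc/hosts entry once:
for h in hadoop1 hadoop2 hadoop3; do
  ping -c 1 -W 2 "$h" > /dev/null && echo "$h reachable" || echo "$h NOT reachable"
done
#All three hosts should report reachable; a NOT reachable line points to a wrong static IP
#or a stale /etc/hosts entry on the machine you are testing from.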
#########################################################################################################
6. Reboot the machine so the changes take effect
sudo reboot now
#ssh to the machine once to establish the passwordless ssh
ssh hadoop1
#########################################################################################################
7. Clean the temporary datanode directory and reboot
cd /usr/local/hadoop_tmp
rm -rf d
mkdir d
#Now shut down your VM. This VM is an Ubuntu host that serves as a Node in the hadoop cluster.
#In your host machine, make 3 copies of this Node/VM. Change the name of each of these VMs appropriately.
#########################################################################################################
#########################################################################################################
#########################################################################################################
8. The following are instructions to prepare the worker node. Repeat the same instructions for hadoop2, hadoop3 and so on.
Start the VM. Login as before, and make the following changes:
#change the machine hostname to hadoop2, where 2 is the slave number
sudo nano /etc/hostname
#The nano editor opens. Change the name to hadoop2. Use Ctrl+O to save and Ctrl+X to quit.
# check the name of your machine
hostname
#It should show hadoop2
###################################
9. For this VM, we will change the network settings:
sudo ifconfig ens33 192.168.5.132
#note, we changed the IP address to 192.168.5.132
###################################
10. Test if you can ssh to this machine
ssh hadoop2
#check the IP address
ifconfig
#The IP should be 192.168.5.132
#########################################################################################################
########## Repeat steps 8-9-10 for VMs with hostname hadoop3 and so on ##################
#########################################################################################################
11. We assume that all 3 of the VMs are running on your host machine. We will now enter hadoop1, which serves as master. We will connect to the other machines using ssh.
ssh hadoop2
#This allows you to connect to hadoop2. To go back ->
exit
#Test this for all machines: hadoop1, 2 and 3.
#########################################################################################################
12. Start up the cluster
#go to the hadoop1 master machine. format the namenode
hdfs namenode -format
#make sure there are no errors. If all is well, start the cluster
#start hdfs and yarn
start-all.sh
#Once the prompt becomes available do:
jps
# You will see a list with NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager on hadoop1 (node-master)
# Switch to any other worker VM; jps will list a DataNode and a NodeManager on each of hadoop2 and hadoop3
# (see the sketch after step 13 for checking all nodes from the master).
#########################################################################################################
13. You can see the webUI here:
#for hdfs open a browser and type
http://hadoop1:9870/
#for yarn
http://hadoop1:8088/
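<---------------------------<---------------------------<--------------------------->
#Instead of logging in to each worker to run jps by hand (step 12), a short loop over the
#nodes does it in one go. A sketch, assuming passwordless ssh is set up as above and jps is
#on the PATH of non-interactive shells (otherwise use the full path to jps under your JDK):
for h in hadoop1 hadoop2 hadoop3; do
  echo "--- $h ---"
  ssh "$h" jps
done
#hadoop1 should list NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager;
#hadoop2 and hadoop3 should each list a DataNode and a NodeManager.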
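<---------------------------<---------------------------<--------------------------->
#If you prefer the terminal to a browser, you can check that the step 13 web UIs are up with
#curl (a sketch; install curl first with sudo apt install curl if it is missing):
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop1:9870/
curl -s -o /dev/null -w "%{http_code}\n" http://hadoop1:8088/
#Each command should print 200 once the corresponding daemon is running.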
###################################
14. You are familiar with the pi program run from the first tutorial; here we run the MapReduce wordcount program.
#Let's make some folders and files in hdfs
hdfs dfs -mkdir books
hdfs dfs -ls -R /
#This will make the directory books and recursively list all files in the dfs
<---------------------------<---------------------------<--------------------------->
#download books from the Project Gutenberg website
#assuming that you downloaded these files: alice.txt holmes.txt frankenstein.txt
hdfs dfs -put alice.txt holmes.txt frankenstein.txt books
#this will copy the 3 files to the dfs in the books folder
#lets see the directory
hdfs dfs -ls books
<---------------------------<---------------------------<--------------------------->
#run the wordcount program. this will read all the files in the dfs books/ folder and write the response to the output folder
hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount "books/*" output
<---------------------------<---------------------------<--------------------------->
#download the output folder from the dfs. It will create the folder /home/hadoop1/output
hdfs dfs -get output /home/hadoop1/output
#use gedit to open the resulting file from the output folder
#(the first sketch after step 16 shows how to inspect the result directly in hdfs instead)
#########################################################################################################
15. Cluster status reports and shutdown
#get a report on your hdfs
hdfs dfsadmin -report
#check yarn cluster details
yarn node -list
#(the second sketch after step 16 shows how to filter the hdfs report)
#########################################################################################################
16. Closing the cluster.
#stop the cluster safely
stop-all.sh
###################################
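<---------------------------<---------------------------<--------------------------->
#For step 14: a sketch for inspecting the wordcount result directly in hdfs, without the
#-get step. part-r-00000 is the usual single-reducer output name; run hdfs dfs -ls output
#first if yours differs:
hdfs dfs -cat output/part-r-00000 | head
#each line is a word followed by its count; sort numerically on the count column (field 2)
#to see the ten most frequent words
hdfs dfs -cat output/part-r-00000 | sort -k2,2nr | head -10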
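<---------------------------<---------------------------<--------------------------->
#For step 15: a quick sanity check on the hdfs report using grep (a sketch; the field names
#match the usual hdfs dfsadmin -report layout but may vary between Hadoop versions):
hdfs dfsadmin -report | grep -E "Live datanodes|Name:"
#With the full cluster up you should see Live datanodes (3) and one Name: line per node.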