################################### CS435 Cluster Installation Tutorial 2 ###################################

1. Copy the VM from the previous tutorial into a new VM. You may call this copy master. We will make
changes to this VM so that it becomes part of a cluster consisting of 1 master node and 3 slave nodes.

#########################################################################################################

2. Rename the machine to hadoop1 (this will be the master). Use gedit to do the following

sudo gedit /etc/hostname

#This opens the file containing the name of your machine. Type hadoop1 and save the file.
#Log out and log back in for the change to take effect.

#########################################################################################################

3. Re-configure hadoop

#configure hadoop so it knows which hosts are workers
gedit /usr/local/hadoop/etc/hadoop/workers

#add the following to this file
hadoop1
hadoop2
hadoop3
hadoop4

#save the file.

<---------------------------<---------------------------<--------------------------->

#we have already configured the hadoop xml files. You need to edit the core-site.xml file.
#Change the value of the fs.defaultFS property to hdfs://hadoop1:9000

<---------------------------<---------------------------<--------------------------->

Save the file.

<---------------------------<---------------------------<--------------------------->

#make sure the tmp directories are clean. Run the following in a terminal
cd /usr/local/hadoop_tmp
rm -R *
mkdir n
mkdir d
ls -al
chmod 755 n
chmod 755 d

#This cleans the tmp folders so we can work on the cluster.

#########################################################################################################

4. Let's prepare the network package and tools. Use apt to install the net-tools package.

sudo apt install net-tools

#check the IP address of the host
ifconfig

#look for your ethernet interface entry. It is usually eth0 or ens33 or similar.

#we will set up our nodes with these IP addresses
#192.168.5.131   hadoop1   which is the master
#192.168.5.132   hadoop2   which is a slave
#192.168.5.133   hadoop3   which is a slave
#192.168.5.134   hadoop4   which is a slave

#########################################################################################################
##########################################  I M P O R T A N T  ##########################################
#########################################################################################################

5. Now change the IP address of your host

sudo ifconfig ens33 192.168.5.131

<---------------------------<---------------------------<--------------------------->

#we can edit the interfaces file so the changes become permanent
sudo gedit /etc/network/interfaces

#type the following
auto lo
iface lo inet loopback

auto ens33
iface ens33 inet static
address 192.168.5.131
netmask 255.255.255.0

<---------------------------<---------------------------<--------------------------->

# we will now reset the hosts file
sudo gedit /etc/hosts

#type in the following to overwrite the existing info
127.0.0.1       localhost
192.168.5.131   hadoop1
192.168.5.132   hadoop2
192.168.5.133   hadoop3
192.168.5.134   hadoop4

<---------------------------<---------------------------<--------------------------->
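#Optional: before rebooting in the next step, you can double-check the files you just edited. This is
#only a quick sanity-check sketch; it assumes the interface name ens33 from step 4 (adjust if yours differs).

cat /etc/hostname                #should contain hadoop1
getent hosts hadoop2             #should print 192.168.5.132  hadoop2 (resolved from /etc/hosts)
ifconfig ens33 | grep "inet "    #should show inet 192.168.5.131

#If any of these do not match, re-open the corresponding file above and correct it before continuing.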
#########################################################################################################

6. Reboot the machine so the changes take effect

sudo reboot now

#ssh to the machine once to confirm that passwordless ssh works
ssh hadoop1

#########################################################################################################

7. Now shut down your VM. This VM is an Ubuntu host that serves as a node in the hadoop cluster.
On your host machine, make 3 copies of this node/VM. Change the name of each of these VMs appropriately.

#########################################################################################################
#########################################################################################################
#########################################################################################################

8. The following are instructions to prepare the worker node. Repeat the same instructions for
hadoop2, hadoop3 and hadoop4. Start the VM. Log in as before, and make the following changes:

#change the machine hostname to hadoop2, where 2 is the slave number
sudo nano /etc/hostname

#The nano editor opens. Change the name to hadoop2. Press Ctrl+O (or Ctrl+S) to save, then Ctrl+X to quit.

# check the name of your machine
hostname

#It should show hadoop2

###################################

9. For this VM, we will change the network settings:

sudo ifconfig ens33 192.168.5.132

#note, we changed the IP address to 192.168.5.132

###################################

10. Test if you can ssh to this machine

ssh hadoop2

#check the IP address
ifconfig

#The IP should be 192.168.5.132

#########################################################################################################
##########  Repeat steps 8-9-10 for VM3 and VM4 with hostnames hadoop3 and hadoop4  ##################
#########################################################################################################

11. We assume that all 4 of the VMs are running on your host machine. We will now enter hadoop1,
which serves as the master. We will connect to the other machines using ssh.

ssh hadoop2

#This connects you to hadoop2. To go back -> exit
#Test this for all machines hadoop1, 2, 3 and 4.

#########################################################################################################

12. Start up the cluster

#go to the hadoop1 master machine and format the namenode
hdfs namenode -format

#make sure there are no errors. If all is well, start the cluster

#start hdfs and yarn
start-all.sh

#Once the prompt becomes available do:
jps

# You will see a list with NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager on hadoop1 (node-master)
# Switch to any other worker VM; jps will list a DataNode and a NodeManager on each of hadoop2, hadoop3 and hadoop4.

#########################################################################################################

13. You can see the web UI here:

#for hdfs, open a browser and type
http://hadoop1:9870/

#for yarn
http://hadoop1:8088/
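#Optional: besides the web UI, you can check the daemons on every node from hadoop1 in one pass instead
#of switching VMs. This is only a sketch; it assumes the passwordless ssh set up earlier. If jps is not
#found over ssh, replace it with the full path to jps under your JDK's bin directory.

for node in hadoop1 hadoop2 hadoop3 hadoop4; do
    echo "== $node =="
    ssh $node jps
done

#hadoop1 should list the five daemons from step 12; hadoop2, hadoop3 and hadoop4 should each list a
#DataNode and a NodeManager.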
###################################

14. You are familiar with the pi program run in the first tutorial; here we run the MapReduce wordcount program

#Let's make some folders and files in hdfs
hdfs dfs -mkdir books
hdfs dfs -ls -R /

#This will make the directory books and show all files therein

<---------------------------<---------------------------<--------------------------->

#download books from the Project Gutenberg website
#assuming that you downloaded these files: alice.txt holmes.txt frankenstein.txt
hdfs dfs -put alice.txt holmes.txt frankenstein.txt books

#this will copy the 3 files to the dfs in the books folder

#let's see the directory
hdfs dfs -ls books

<---------------------------<---------------------------<--------------------------->

#run the wordcount program. This will read all the files in the dfs books/ folder and write the result to the output folder
hadoop jar hadoop-mapreduce-examples-3.3.6.jar wordcount "books/*" output

<---------------------------<---------------------------<--------------------------->

#download the output folder from the dfs. It will create the folder /home/hadoop1/output
hdfs dfs -get output /home/hadoop1/output

#use gedit to open the resulting file from the output folder

#########################################################################################################

15. Cluster status reports and shutdown

#get a report on your hdfs
hdfs dfsadmin -report

#check yarn cluster details
yarn node -list

#########################################################################################################

16. Closing the cluster.

#stop the cluster safely
stop-all.sh

###################################
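#Note: when you bring the cluster back up later, do not run hdfs namenode -format again; reformatting
#generates a new cluster ID and the datanodes may refuse to register. A minimal restart sketch:

start-all.sh     #run on hadoop1; starts hdfs and yarn on every node listed in the workers file
jps              #verify the daemons as in step 12
stop-all.sh      #stop the cluster safely again when you are done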