###################################
CS435 Cluster Installation Tutorial
For Intel x64 machines, follow steps 1A to 4A. For ARM-based Macs (M1 and later), scroll down and follow steps 1B to 4B.

#############STEP 1 to 4 for INTEL based installation######################

1A. Download VMware Workstation Player
https://customerconnect.vmware.com/en/downloads/details?downloadGroup=WKST-PLAYER-1750&productId=1377&rPId=111473

###################################

2A. Download Ubuntu Linux Desktop. You need the "amd64 desktop" version:
https://ubuntu.com/download/desktop

###################################

3A. Run VMware and install Ubuntu in a VM. Use the following configuration:
# set machine-name, username and password to hadoop
# set the VM requirements as follows
# CPU: 1 vCore
# RAM: 2 GB
# Storage: 25 GB
# Network: set to NAT
#It is important that the username and password are exactly "hadoop" for this tutorial

###################################

4A. Once the VM is set up, download/install the Java SE Development Kit (JDK) for the Linux x64 platform.
Download the x64 Compressed Archive:
https://www.oracle.com/java/technologies/downloads/
#Jump to step 5.

#############STEP 1 to 4 for Mac ARM based installation######################

1B. Download VMware Fusion or Fusion Pro.
https://blogs.vmware.com/teamfusion/2024/05/fusion-pro-now-available-free-for-personal-use.html

###################################

2B. Download Ubuntu Desktop for the ARM architecture
https://cdimage.ubuntu.com/releases/24.10/release/

###################################

3B. Run VMware and install Ubuntu in a VM. Use the following configuration:
# set machine-name, username and password to hadoop
# set the VM requirements as follows
# CPU: 1 vCore
# RAM: 2 GB
# Storage: 25 GB
# Network: set to NAT
#It is important that the username and password are exactly "hadoop" for this tutorial

###################################

4B. Once the VM is set up, download/install the Java SE Development Kit (JDK) for the Linux ARM64 platform.
Download the ARM64 Compressed Archive and install Java on your VM:
https://www.oracle.com/java/technologies/downloads/

#########################################CONTINUE HERE#######################################

5. Installing Java
We assume you have downloaded the Java compressed archive. Extract it to a folder called "java" on the Desktop.
#Open a terminal and move the java folder to /usr/local/java
sudo mv Desktop/java /usr/local/java
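#Before setting the environment variables, it helps to confirm the JDK landed where the rest of this tutorial expects it. A minimal check; the versioned folder name jdk-21.0.2 below is only a hypothetical example, since the exact name depends on the JDK release you downloaded:
#the java executable should sit directly under /usr/local/java/bin
ls /usr/local/java/bin/java
#run it directly to verify it executes on this architecture
/usr/local/java/bin/java -version
#if bin/ is missing, the archive probably unpacked into a versioned subfolder; move its contents up one level, for example:
#sudo mv /usr/local/java/jdk-21.0.2/* /usr/local/java/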
#Set the environment variables; go to the home directory and open ~/.bashrc
cd
gedit ~/.bashrc
#Scroll to the bottom of the file and paste:
export JAVA_HOME=/usr/local/java
export PATH=$PATH:$JAVA_HOME/bin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_CLASSPATH=$JAVA_HOME/lib/tools.jar:/usr/local/hipi/jars/*:/usr/local/hipi/release/*
export MAP=/usr/local/hadoop/share/hadoop/mapreduce
#Save the file, then reload it in the terminal:
. ~/.bashrc
#Run java; you should see its version output.
java -version

###################################

6. Download and install Hadoop (700+ MB)
#Your VM needs to be connected to the internet. Open a browser and download the compressed archive
https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
#alternatively use wget
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
#extract the archive
tar xzf hadoop-3.3.6.tar.gz
#move the folder to a local access folder
sudo mv hadoop-3.3.6 /usr/local/hadoop
#change ownership
sudo chown -R hadoop:hadoop /usr/local/hadoop

###################################

7. Set up the network and prepare SSH for passwordless remote login. Hadoop needs this.
#The following needs to be typed in a terminal window.
#Rename the machine to hadoop1. Use gedit to edit the hostname file:
sudo gedit /etc/hostname
#Type hadoop1 and save/close the file.
#Let's prepare the network package and tools. Use apt to install the net-tools package.
sudo apt install net-tools
#Check the network interfaces
ifconfig -a
#Look for your network interface in the list. Assuming your interface is ens33, turn it on if it is not already up
sudo ifconfig ens33 up
#Obtain an IP address from DHCP. Note: your VMware network must be set to NAT.
sudo dhclient ens33
#This assigns an IP address to your VM. Display it:
ifconfig
#You should see the IP address assigned to your VM. For this tutorial, we assume it is 192.168.5.130
#install the ssh server
sudo apt install openssh-server
#Generate SSH keys
ssh-keygen -t rsa
#add the public key to the authorized keys file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
#change file permissions
chmod og-wx ~/.ssh/authorized_keys
#now ssh to the local machine; accept the host fingerprint when prompted. With the key installed, no password should be required.
ssh hadoop1
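#If ssh hadoop1 fails or keeps prompting for a password, the hostname may not resolve to your VM. A quick check, assuming the example address 192.168.5.130 from above (substitute the IP your VM actually received):
#the hostname must resolve for ssh, and for the Hadoop daemons later on
ping -c 1 hadoop1
#if it does not resolve, map it in /etc/hosts (adjust the IP to yours)
echo "192.168.5.130 hadoop1" | sudo tee -a /etc/hosts
#passwordless login should now succeed without a prompt
ssh hadoop1 exit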
###################################

8. Configure Hadoop
#set JAVA_HOME so hadoop knows the path of the java dir
gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh
#scroll to line 38 and add the following (it must match the JAVA_HOME set in ~/.bashrc):
export JAVA_HOME=/usr/local/java
#-------------------------------------------------------------
#set the HADOOP core-site configuration file
gedit /usr/local/hadoop/etc/hadoop/core-site.xml
#add the following property inside the <configuration> element:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop1:9000</value>
</property>
#-------------------------------------------------------------
#set the HADOOP hdfs-site configuration file
gedit /usr/local/hadoop/etc/hadoop/hdfs-site.xml
#add the following properties inside the <configuration> element:
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
<property>
  <name>dfs.name.dir</name>
  <value>/usr/local/hadoop_tmp/n</value>
</property>
<property>
  <name>dfs.data.dir</name>
  <value>/usr/local/hadoop_tmp/d</value>
</property>
#-------------------------------------------------------------
#set the HADOOP mapred-site configuration file
gedit /usr/local/hadoop/etc/hadoop/mapred-site.xml
#add the following properties inside the <configuration> element:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=$HADOOP_HOME</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>/usr/local/hadoop/etc/hadoop:/usr/local/hadoop/share/hadoop/common/lib/*:/usr/local/hadoop/share/hadoop/common/*:/usr/local/hadoop/share/hadoop/hdfs:/usr/local/hadoop/share/hadoop/hdfs/lib/*:/usr/local/hadoop/share/hadoop/hdfs/*:/usr/local/hadoop/share/hadoop/mapreduce/*:/usr/local/hadoop/share/hadoop/yarn:/usr/local/hadoop/share/hadoop/yarn/lib/*:/usr/local/hadoop/share/hadoop/yarn/*:/usr/local/java/lib/tools.jar</value>
</property>
#-------------------------------------------------------------
#set the HADOOP yarn-site configuration file
gedit /usr/local/hadoop/etc/hadoop/yarn-site.xml
#add the following properties inside the <configuration> element:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop1</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
#-------------------------------------------------------------
#Open the workers file in /usr/local/hadoop/etc/hadoop/
#Add the names of the worker nodes in the hadoop cluster. We assume the machine name is hadoop1.
hadoop1

###################################

9. Starting Hadoop
#create tmp folders for the namenode and datanodes to write to
sudo mkdir /usr/local/hadoop_tmp
sudo mkdir /usr/local/hadoop_tmp/n
sudo mkdir /usr/local/hadoop_tmp/d
sudo chown hadoop:hadoop /usr/local/hadoop_tmp
chmod 755 /usr/local/hadoop_tmp
sudo chown hadoop:hadoop /usr/local/hadoop_tmp/n
chmod 755 /usr/local/hadoop_tmp/n
sudo chown hadoop:hadoop /usr/local/hadoop_tmp/d
chmod 755 /usr/local/hadoop_tmp/d
#format the namespace first
hdfs namenode -format
#start the HDFS and YARN services
start-dfs.sh
start-yarn.sh
#or two in one!
start-all.sh
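#Before checking the web GUI, you can also confirm from the command line that the datanode registered with the namenode. A quick sanity check (the capacity numbers in the report will vary with your VM):
#report cluster capacity and live datanodes; expect "Live datanodes (1)"
hdfs dfsadmin -report
#list the HDFS root; it should be empty at this point, but the command should not error
hdfs dfs -ls /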
###################################

10. Open a browser to verify all is well
#run jps to see the relevant processes
jps
#You should see the following:
NameNode
DataNode
NodeManager
ResourceManager
SecondaryNameNode
#now open a browser and go to
http://localhost:9870
#you should see the Hadoop status page

###################################

11. Run a pi MapReduce program to verify the installation
#run the compute-pi example in hadoop
hadoop jar $MAP/hadoop-mapreduce-examples-3.3.6.jar pi 3 3
#the above spawns 3 map tasks to estimate the value of pi. Note that there is only one datanode created so far. Observe that the program runs slowly: for a job this small, the overhead of parallelism outweighs its benefit.
#Read more about the pi program at:
#https://hadoop.apache.org/docs/r3.2.0/api/org/apache/hadoop/examples/pi/package-summary.html

###################################

12. Run a wordcount MapReduce program to verify the installation
#let's run the word count program
#make a directory in hdfs called input
hdfs dfs -mkdir /input
#the input directory was created in hdfs; you cannot see it from the linux filesystem.
#make a textfile called localfile.txt. Open gedit and copy in a few lines of text
gedit localfile.txt
#using put, copy the textfile to the /input directory in hdfs
hdfs dfs -put localfile.txt /input
#now let's run the word count program
hadoop jar $MAP/hadoop-mapreduce-examples-3.3.6.jar wordcount /input/localfile.txt /output
#this runs the wordcount program: it reads the textfile /input/localfile.txt and writes the result to the /output directory in hdfs

###################################

13. Observe the Hadoop and YARN GUIs
#you can see the file you copied to hdfs using the browser. Open the browser and go to
http://localhost:9870/dfshealth.html#tab-overview
#click on the Utilities menu and select "Browse the file system". You should see the files and directories.
#check the /output directory; you can download its files to see the output generated by hadoop
#The YARN GUI is available at:
http://localhost:8088/

###################################

14. Shut down your cluster
#stop all daemons
stop-all.sh
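#To bring the cluster back in a later session, restart the services without reformatting; a sketch of the usual cycle (reformatting the namenode erases HDFS metadata, so it is only done once at install time):
#confirm all daemons have exited; jps should now list only Jps itself
jps
#in the next session, restart everything without running hdfs namenode -format again
start-all.sh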