I just finished building a five-node Hadoop cluster in Cloudshare on Ubuntu Linux for development and testing purposes. In the past I've been running a lot of my jobs in Amazon Elastic MapReduce, but since my data is not that large, I figured that by using Cloudshare I could keep my costs down to a reasonable $50/month.
I used this tutorial from Michael Noll and it was very helpful. However, I did a few things differently than Michael, which caused me to trip up a few times during the installation. Hopefully my notes below will help others who have a configuration similar to mine.
Below are the environment differences that caused me issues as I walked through the tutorial:
- I used Cloudshare's distribution of Ubuntu Linux 10.04 – it should be fine, but I still had to jump through a few hoops.
- I used Hadoop version 2.2.0 instead of the 1.0.3 release the guide was written against.
Issue #1: Couldn't install python-software-properties
I couldn't run the following command:
sudo apt-get install python-software-properties
The base install first needed the update command:
sudo apt-get update
After running the update, the package installed just fine. I just had to change the order of operations.
Issue #2: Update $HOME/.bashrc
I couldn't update .bashrc at first. I had switched back to sysadmin because hduser isn't a root user and sudo didn't work, but then the .bashrc I needed wasn't there, because I was in the wrong user's home directory. Switching back to hduser, I was able to update its own .bashrc just fine (no sudo needed).
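For reference, the exports I appended to hduser's $HOME/.bashrc looked roughly like this – a sketch assuming the tutorial's install locations, so adjust HADOOP_HOME and JAVA_HOME to wherever yours actually live:
export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr/lib/jvm/java-6-sun
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin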
Issue #3: Couldn't find hadoop-env.sh
None of the files the guide specifies to edit are in the locations the guide gives. For me this was my own fault, because I chose to use a newer version of Hadoop than the guide used. For anyone using version 2.2.0, the following paths are the correct translation.
Instead of "hadoop/conf" use "hadoop/etc/hadoop":
Issue #4: Listener didn't work on port 54310
I did have an issue with how core-site.xml was configured. For reasons unknown to me, I could not get the listener to run on port 54310. Instead, I had to fall back to the default configuration to get it to work:
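For the record, the property that finally worked for me looked roughly like the snippet below. The "master" hostname and the 9000 port are assumptions based on the standard Hadoop 2.x setup docs (fs.defaultFS is the 2.x name for the guide's fs.default.name), so substitute your own values:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>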
Issue #5: mapred-site.xml didn't exist
One last file-location exception I found is "conf/mapred-site.xml". Not only is it not under conf, it is also named differently. Instead, edit mapred-site.xml.template:
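If you'd rather keep the guide's filename, you can also copy the template first, which is what the Apache 2.2.0 docs do. Either way, the key property for 2.x is mapreduce.framework.name, which tells Hadoop to run MapReduce on YARN:
cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>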
Issue #6: Starting your single-node cluster
For this portion it took me a few minutes to figure out that the start-all script is NOT in bin as the guide describes. Instead, with 2.2.0 the start-all.sh file is in "sbin" and can be run as shown below. However, when I ran the script it continued to fail, until I remembered I had been switching between sysadmin and hduser and needed to switch back and ssh into localhost. If yours isn't running, try this:
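Concretely, the sequence that finally worked for me looked like this (a sketch assuming Hadoop lives at /usr/local/hadoop):
su - hduser
ssh localhost   # verify passwordless ssh works for hduser first
exit
cd /usr/local/hadoop
sbin/start-all.sh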
*note – when you run the jps command to check that the Hadoop processes are running, your output will show the ResourceManager, NodeManager, DataNode, and SecondaryNameNode PIDs instead of JobTracker, TaskTracker, etc. This is expected, as Hadoop 2.2.0 (YARN) rearchitected how jobs are processed. See below:
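Something along these lines – the PIDs here are made up, and yours will differ:
2287 NameNode
2421 DataNode
2589 SecondaryNameNode
2734 ResourceManager
2867 NodeManager
3099 Jps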
Issue #7: I created five nodes, not one slave
The guide specified just one master and one "slave". Instead, I created four slaves, a.k.a. worker nodes, as seen below:
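For reference, the /etc/hosts entries on each box ended up along these lines – the hostnames and private IPs here are placeholders, so use whatever Cloudshare assigned you:
192.168.1.10 master
192.168.1.11 slave1
192.168.1.12 slave2
192.168.1.13 slave3
192.168.1.14 slave4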
After this I was able to ssh into the master and copy over the keys:
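This was the same ssh-copy-id step from the guide, just repeated once per slave (using my placeholder hostnames):
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave1
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave2
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave3
ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave4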
And ssh into the nodes worked just fine:
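A quick check from the master, for example:
ssh hduser@slave1
exit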
Issue #8: master file didn't exist
The next step in the guide was to add references to the master by editing the masters file, and to list the slaves in the slaves file. In 2.2.0 you don't need to edit the masters file, so you can skip that step. Simply put the node names in the slaves file under hadoop/etc/hadoop/slaves:
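So my etc/hadoop/slaves file ended up looking like this (placeholder hostnames again):
master
slave1
slave2
slave3
slave4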
I left master in mine so that the namenode machine would also process data (i.e., run its own DataNode and NodeManager).
Issue #9: start-all is deprecated in 2.2.0
When starting Hadoop, don't use start-all.sh. Instead use the following commands to start the daemons on the master node:
$ sbin/hadoop-daemon.sh start namenode
$ sbin/hadoop-daemons.sh start datanode
$ sbin/yarn-daemon.sh start resourcemanager
$ sbin/yarn-daemons.sh start nodemanager
$ sbin/mr-jobhistory-daemon.sh start historyserver
After running these scripts, jps reported everything was fine on the master, but none of the slaves seemed to be running the services. After freaking out for a while, I realized I wasn't using the hduser account AGAIN, and after I switched everything was fine after all:
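Once I was on hduser, jps on each slave finally showed the worker daemons (again, illustrative PIDs):
1987 DataNode
2104 NodeManager
2255 Jps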
That was it! Not too bad, thanks to that great tutorial. Now I have a real cluster to test against with partial datasets before paying the big bucks to test in Amazon or otherwise.
Take care, and send me your comments below!
Pretty, isn't it? :)