Jan 27

Install Hadoop on Windows in 3 Easy Steps for Hortonworks Sandbox Tutorial

Did you know that you can easily install a simple single-node Hadoop cluster on your Windows PC or laptop?  Normally, Hadoop runs on Unix computers.  However, thanks to modern virtualization technology, you can run a complete sandbox version of Hadoop within a virtual Linux server on your personal computer, for free.  This installation is ideal for learning and exploring how to use Hadoop.  I will teach you how to install and run the Hortonworks sandbox version of Hadoop on your Windows computer in this tutorial.

Step 1: Install VMware Player

If you don’t already have VMware running on your computer, you’ll need to install VMware Player v5 or higher on your Windows computer.  This software is free for personal use.  I have found VMware Player v5 on my 64-bit Windows 7 computer to be reliable and trouble-free.  VMware Player can also be installed on a Linux computer.  If you have a Mac, you’ll need to purchase and install the VMware Fusion software instead.

Download and install “VMware Player for Windows 32-bit and 64-bit“.  It took me 4 minutes to download the VMware-player-5.0.1-894247.exe installer file, and 2 minutes to install the software on my Windows 7 computer, with no need to reboot.  VMware Player requires 150 MB free disk space.  Go to your Windows Start Menu and launch VMware Player (you may skip the upgrade to VMware Workstation).
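Before downloading anything, it can be worth confirming that your drive has room for both the 150 MB VMware Player install and the roughly 2 GB sandbox file used in Step 2.  The following is a minimal sketch using Python's standard library; the function name `has_room` and the constants are illustrative assumptions, and the imported virtual machine will likely need more space than the download itself.

```python
import shutil

# Approximate figures quoted in this article (assumed here in megabytes).
VMWARE_PLAYER_MB = 150        # VMware Player install footprint
SANDBOX_OVA_MB = 2 * 1024     # the roughly 2 GB sandbox download


def has_room(path, required_mb):
    """Return True if the drive holding `path` has at least `required_mb` MB free."""
    free_mb = shutil.disk_usage(path).free / (1024 * 1024)
    return free_mb >= required_mb
```

For example, `has_room("C:\\", VMWARE_PLAYER_MB + SANDBOX_OVA_MB)` checks the C: drive for the combined amount.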

See the full step-by-step instructions on installing VMware Player if you require additional details.

Step 2: Install Hortonworks Hadoop Sandbox

Download the Hortonworks Hadoop Sandbox for VMware.  The VMware Fusion and VMware Workstation versions are compatible with the VMware Player that you installed in Step 1.  It took 1 hour 47 minutes to download the 2 GB “Hortonworks+Sandbox+1.2+1-21-2012-1+vmware.ova” file from the Hortonworks Amazon S3 directory.

While waiting for the VMware OVA file to download, you can watch the sandbox overview video from Hortonworks (8 minutes 35 seconds) and read the sandbox installation instructions.
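Once the download finishes, a quick sanity check can catch an incomplete file before you try to import it.  An OVA file is a tar archive that normally holds an .ovf descriptor and at least one .vmdk disk image, so Python's built-in `tarfile` module can list its contents.  This is only a sketch: the function names are hypothetical, and passing this check does not guarantee that VMware Player will accept the file.

```python
import tarfile


def list_ova_members(ova_path):
    """Return the file names packed inside an OVA.

    Raises tarfile.ReadError if the file cannot be read as a tar archive.
    """
    with tarfile.open(ova_path, "r") as archive:
        return archive.getnames()


def looks_like_complete_ova(member_names):
    """Check for one .ovf descriptor and at least one .vmdk disk image."""
    lowered = [name.lower() for name in member_names]
    has_descriptor = any(name.endswith(".ovf") for name in lowered)
    has_disk = any(name.endswith(".vmdk") for name in lowered)
    return has_descriptor and has_disk
```

You would call `looks_like_complete_ova(list_ova_members(path))` with the path to your downloaded OVA file; a `False` result or a read error suggests downloading the file again.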

Step 3: Run Hadoop!

Within the VMware Player application that you installed and started in Step 1, either go to the Player menu and select “File/Open..”, or choose “Open a Virtual Machine” from the welcome screen.  Locate the directory where you downloaded the Hortonworks sandbox file “Hortonworks+Sandbox+1.2+1-21-2012-1+vmware.ova” and select that OVA file to open.  You’ll be prompted for the name of the new virtual server instance and for where on your host Windows machine VMware should store the instance image file; it is fine to accept the defaults.  It may take a few minutes for the new virtual machine to be imported.  FYI, from the VMware point of view, your Windows computer is the host system, whereas the CentOS 64-bit Linux system bundled free with the Hortonworks Sandbox is the virtual guest system.

VMware Player Open file menu


You’re now ready to click “Play virtual machine” to start running your new sandbox instance.


Your guest Linux system will now start up, along with all necessary software for Hadoop to run including a web server.  The beauty of using a virtual server is that 1) you don’t need to have another physical computer under your desk along with the associated power and network cables, monitor, and other accessories, to experiment with Hadoop, 2) you can install it on your laptop and run Hadoop there without needing to connect to the network, and 3) your separate virtual server won’t mess up anything you have on your main Windows computer, and can be easily uninstalled when no longer needed.


Once everything starts up, you will see instructions on how to access the Hortonworks Sandbox.  Look for the URL with the IP address, such as http://192.168.40.128, in the VMware Player window.  Note that your IP address may be different from mine.

In a web browser such as Firefox or Chrome, go to the Sandbox URL IP address.  You should see a Hortonworks Sandbox welcome screen with options to view some overview videos, to start the tutorial, or jump straight into the sandbox.  Since we’re eager to run Hadoop, let’s go straight to the third choice: click on the green Start button under “Use the Sandbox.”
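If the page does not load, as several readers report in the comments, it helps to find out whether your Windows host can reach the virtual machine at all.  The sketch below tries to open a plain TCP connection using Python's standard `socket` module.  The function name `can_connect` is hypothetical, and port 80 is an assumption based on the URL above having no port number; use whatever address and port your own sandbox displays.

```python
import socket


def can_connect(host, port=80, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or unknown hosts.
        return False
```

For example, `can_connect("192.168.40.128")` returning `False` points to a network or virtual machine problem rather than a browser problem.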

You will now see the HUE web-based environment for the sandbox.  By default you will start in the Beeswax screen.  Let’s click on “My Queries” then click on the “Sample: Top salary (copy)” query name.


You will then see the sample Hive query in the Query Editor.  If you have used relational databases before, you will notice that the Hive query looks very similar to standard SQL.  The other way to query Hadoop is Pig, which builds a query as a pipeline of steps and has a somewhat steeper learning curve than Hive for those already familiar with SQL.  Let’s stick with Hive for this initial run.  Click on the Execute button to start running the Hive query.
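To give a feel for that SQL-like style without a cluster, here is a small sketch that runs an ordinary SQL query through Python's built-in `sqlite3` module.  The `jobs` table, its columns, and its rows are invented for illustration; this is not the sandbox's actual sample query or data.  The SELECT statement itself uses only clauses that Hive's query language also accepts.

```python
import sqlite3

# Hypothetical rows, invented purely for illustration.
rows = [
    ("Engineer", 95000),
    ("Analyst", 70000),
    ("Clerk", 40000),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (description TEXT, salary INTEGER)")
conn.executemany("INSERT INTO jobs VALUES (?, ?)", rows)

# SELECT, FROM, WHERE and ORDER BY ... DESC read the same in SQL and in Hive.
TOP_SALARY_QUERY = """
SELECT description, salary
FROM jobs
WHERE salary > 50000
ORDER BY salary DESC
"""

top = conn.execute(TOP_SALARY_QUERY).fetchall()
print(top)
```

The difference with Hive lies mostly behind the scenes: the same kind of statement is turned into jobs that run on the Hadoop cluster, which is why even a small query takes a while.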

Your query will now run for a minute or so.  The HUE environment will update the log output on the screen so that you can see the progress and any error messages.  This is a good time to step away from the computer and do something healthy while your “Big Data” query is running.

After a few minutes, your query results will show up on the screen.

Congratulations!  You have now installed Hadoop and successfully run your first Hive query.  You are on your way to becoming a wizard in Hadoop!

As the next step, you can continue by following the step-by-step tutorial from the Hortonworks Sandbox welcome screen to get more hands-on practice with Hive and Pig for data processing in Hadoop.

For more information

See also alternative instructions for installing the demo Hadoop VMware images from Cloudera and MapR.

Permanent link to this article: http://www.hadoopwizard.com/install-hadoop-on-windows-in-3-easy-steps-for-hortonworks-sandbox-tutorial/

39 comments


    • Sumit on March 7, 2013 at 2:47 pm

    I went to this URL
    http://www.hadoopwizard.com/how-to-install-vmware-player-for-hadoop-tutorials/

    Here it is mentioned – after the VMPlayer install

    Now you can open the VMware image file or ISO file that others provided you. Follow the instructions that came along with those image files. You can then run your Linux virtual system cleanly without interfering with your main host Windows computer.

    Not sure – where to get the ISOs from
    Any help would be appreciated

  1. Hi Sumit, sorry, I was not clear on the other article about installing VMware Player. After installing the VMware Player, you can install a Hadoop VMware image (instead of an ISO image).

    Where to get the VMware image? You can get the Hortonworks Hadoop Sandbox VMware image from this website:
    http://hortonworks.com/products/hortonworks-sandbox/

    I have listed the steps on how to install and start up the Hadoop VMware instance in this article:
    http://www.hadoopwizard.com/install-hadoop-on-windows-in-3-easy-steps-for-hortonworks-sandbox-tutorial/

    I hope this helps!

    Let us know if you are able to install the Hortonworks Sandbox version of Hadoop and run the hands-on tutorials.

      • Narayana on March 19, 2013 at 11:09 pm

      Hi jimmy,

      I installed VMware Player and the sandbox as per your instructions. I am able to import the file, but then I get the error below. Could you please help me out?

      Error:
      This virtual machine is configured for 64-bit guest operating systems. However, 64-bit operation is not possible.

      This host supports Intel VT-x, but Intel VT-x is disabled.

      Intel VT-x might be disabled if it has been disabled in the BIOS/firmware settings or the host has not been power-cycled since changing this setting.

      (1) Verify that the BIOS/firmware settings enable Intel VT-x and disable ‘trusted execution.’

      (2) Power-cycle the host if either of these BIOS/firmware settings have been changed.

      (3) Power-cycle the host if you have not done so since installing VMware Player.

      (4) Update the host’s BIOS/firmware to the latest version.

      For more detailed information, see http://vmware.com/info?id=152.

      1. Hi Narayana,

        What is the type and age of your computer? From the VMware error message, it looks like the “host” computer (your computer) is already a 64-bit computer (good!). However, the Intel VT-x virtualization setting in your computer’s BIOS is not turned on. You will need to reboot your computer and go into the BIOS screen during start-up before Windows starts, find the virtualization support section within the BIOS menu, and enable VT-x support.

        Every computer manufacturer has a different BIOS. You can see the following web page from Red Hat Linux for more information about changing your BIOS settings.

        https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Administration_Guide/sect-Virtualization-Troubleshooting-Enabling_Intel_VT_and_AMD_V_virtualization_hardware_extensions_in_BIOS.html

        I would think VT-x support would be enabled by default for most computers. Would you know if it was previously disabled on your computer for a specific reason, such as compatibility with other software? Changing your computer’s BIOS settings is an advanced procedure, so you may want to get help from someone who is familiar with PC hardware BIOS settings if you are not comfortable making the changes yourself.

        -Jimmy

        • Shailesh on May 25, 2013 at 7:48 pm

        Hi Narayana,

        You have to enable Intel VT-x. The setting is found in the BIOS under the Security menu.
        Security -> Virtualization
        And check if all options in Virtualization are enabled. If not, please enable them. It should solve the problem.


        Shailesh

    • jani on March 20, 2013 at 6:56 am

    hi summit,
    I followed the above steps to install Hadoop. I finished all the steps given above, but at the last stage it asks for a
    login and password. How can I log in to the sandbox? Please send me the solution.

  2. Hi Jani,

    After the sandbox starts running you will see a web address that looks like http://192.168.40.128 within the VMware player (the actual IP address number may be different for your computer). This is the web address of the Hadoop server running within VMware on your local computer.

    Open up a separate web browser window from your PC to go to that web address. You will see the screens for the tutorials and how to run Hive or Pig through your web browser. I would suggest using your web browser to use the Hadoop sandbox. No username or password is required.

    It is unnecessary to go to the Unix command line for the Hadoop server because you can run Hadoop and view the results using your web browser. However, if you do want to go into the Unix command line for more advanced exploration of how Hadoop is set up and the processes running, you can either press Alt-F5 within the VMware player, or run a separate SSH program (such as the free PuTTY SSH program) to log in to your Hadoop Unix server within your virtual server. Connect to the same IP address when requested (such as 192.168.40.128 for me, but it may be different for you). The Unix username is “root” and the password is “hadoop”. However, this is more advanced and requires knowledge of Unix. You can try the Hadoop Hive and Pig languages using the web browser instead, which is what I recommend for beginners.

    Have a good time learning Hadoop, Jani!

    -Jimmy

      • jani on March 25, 2013 at 7:10 am

      hi jimmy,
      After installing the sandbox, it displays a server address. I am trying to open it in my Windows browser, but it gives an error; the page does not open in the browser.
      Please give me some suggestions.
      Thank you,

        • Sen on May 5, 2013 at 11:22 am

        Even I am facing the same issue…

        -Sen

          • Sameer on September 23, 2013 at 6:21 pm

          Hi Sen, Jani,

          Were you guys able to resolve this issue? I am also facing the same issue. VM loads up with an IP address in the end but when I open the IP in a browser it gives an error.

          Thanks,
          Sam

    • Savitha on April 8, 2013 at 12:43 am

    Hi Jimmy,
    I want to install Hadoop on Windows 7. I have downloaded Hadoop’s stable version (the tar.gz file). How should I proceed further using VMware?

    • Sudhir Kumar on May 2, 2013 at 3:00 am

    Hi Jammy,

    I want to install HADOOP on windows XP . Please provide me the detailed steps for it.

    • Dan on May 3, 2013 at 6:41 am

    Thank you so much for this link! I was on the verge of installing and configuring Hadoop myself, but now I can put that task in the pending tray, and dive straight into some tutorials.

    • Jordan on May 9, 2013 at 9:22 pm

    Mr. Wong

    I tried to install the development environment on my laptop with an Intel Core i5 and Windows 7. However, the virtual machine cannot run because the system finds that the configuration is for 64-bit, not 32-bit. Could you please tell me how to configure it for 32-bit? I downloaded “VMware Player for Windows 32-bit and 64-bit“.

    Thank you very much!

    Jordan

    • Ki on May 22, 2013 at 10:49 pm

    To complement the article, this Sandbox actually embeds the open source project Hue (http://gethue.com), which aims to become the Apache Hadoop UI.

    • Avinash on July 28, 2013 at 11:39 pm

    Thanks…I followed step by step and everything is working fine till now. Thanks for such a nice tutorial.

      • Jimmy Wong on September 2, 2013 at 10:08 pm
        Author

      I’m glad you were able to get it working and can start learning Hadoop.

    • balamanikandan on July 31, 2013 at 12:09 am

    Hi Mr. Jimmy wong .

    i have downloaded the sandbox vmware version and started the virtual machine and got the url with the ip address.

    When i tried to connect the url in the browser of the parent machine it is not connecting .

    help me with this issue.

    • Guru on August 6, 2013 at 11:38 am

    Does the Hortonworks sandbox work on Windows 8 with the same VMware Player?

      • Jimmy Wong on September 2, 2013 at 10:07 pm
        Author

      I haven’t tried Windows 8, but as far as I have read, the same VMware player should work on it as well.

    • Tarun on August 9, 2013 at 10:52 pm

    Hi Jimmy,
    After completing all the steps, it runs perfectly, thanks.
    Now I would like to know how I can store the crawl results of Apache Nutch in Hive.
    How do I connect these two?

    • Sharath on August 12, 2013 at 5:57 am

    I have a Dell Studio – with Windows 7, 32 bit OS
    RAM 3 GB

    Can this sand box setup work on my computer ?

    Thanks
    Sharath

      • Jimmy Wong on September 2, 2013 at 10:06 pm
        Author

      Yes, it should work. VMware player should be able to simulate Linux 64-bit OS from Windows 7 32-bit as far as I know. Can you try it and let us know if it works for you?

        • Sameer on September 23, 2013 at 6:30 pm

        Hi Jimmy,

        I need your help on this.
        I am done with the installation on Windows XP but when I try to open the IP in a browser it does not open.
        Any ideas?

        Thanks,
        Sam

    • praveen kumar on September 24, 2013 at 8:47 am

    Thank you Jimmy… It works fine for me..

    🙂

    • dinesh on October 8, 2013 at 12:34 am

    I am getting this error on win 7 32 bit.

    This virtual machine is configured for 64-bit guest operating systems. However, 64-bit operation is not possible.

    This host does not support Intel VT-x.

    For more detailed information, see http://vmware.com/info?id=152.

    Please assist.

    • pankaj on October 9, 2013 at 9:08 am

    Hi Jimmy,
    Can we configure the Hadoop sandbox for multiple nodes so that we can do parallel processing?
    Can you also please help me with XML parsing in Hadoop?

    Thanks,
    Pankaj

    • bobby on October 13, 2013 at 3:59 am

    hi jimmy,

    After opening the hortonworks+sandbox+1.3+vmware+RC6.ova with vmware i am receiving the following error

    “Failed to open virtual machine: unknown error, please try again. if the problem persists, please contact VMWare technical support”

    any help is appreciated. Thanks in advance.

      • aniket on December 23, 2013 at 4:40 am

      getting same error

    • Srishti Kapoor on October 13, 2013 at 6:10 am

    Hello,
    I am using Windows 8 / 64-bit with 4 GB RAM, but it shows some kind of error and I am not able to resolve it.
    Please guide me on that. One more thing: does anybody know how to learn Hadoop from the very basics online?
    If yes, please guide me on that too. I have found one such platform, but I don’t know whether it is good enough to teach me up to Cloudera certification level for $97. I am attaching the link for that too:
    http://www.wiziq.com/course/21308-hadoop-big-data-training

    Email: kapoor.srishti17@gmail.com

    Thanks & Regards,
    Srishti

    • Hai on October 21, 2013 at 9:09 am

    Hi Wong,

    Thank you for the great tutorial. I was able to install the Hadoop Sandbox and run the first HiveQL example without any incident. Where do you suggest I go next to learn how to put Hadoop to practical use? I work for a company where we develop software and sell it to clients. We use an MS SQL Server database. The largest database our application interacts with is around 400 GB, and that figure could soon reach terabytes. I am trying to find out whether there is any advantage in incorporating Hadoop into the application.

    -Thank you for your suggestion

    • Greg on November 2, 2013 at 6:40 am

    Thanks for the tutorial! It works great. I’m up and running on my laptop, and just ran a sample query.

    • Ram on November 13, 2013 at 2:11 am

    I’ve installed Sandbox 2.0 on Windows 8. The Oracle VM installation was successful. When I started the sandbox, I got a Sandbox IP address such as http://190.111.2.1:8888. I’ve tried this IP address in Google Chrome and Firefox, but the Sandbox welcome screen is NOT displayed and I get the typical error message. Is there a known issue with Sandbox 2.0 and Windows 8, or might I have missed something? Can you please suggest options?

    • madhusudan on November 20, 2013 at 6:58 pm

    HI,

    my .ova file size :2,466,504KB

    Error:
    The import failed because C:\Users\Munich\Downloads\Hortonworks+Sandbox+2.0+VirtualBox.ova did not pass OVF specification conformance or virtual hardware compliance checks.
    Click Retry to relax OVF specification and virtual hardware compliance checks and try the import again, or click Cancel to cancel the import. If you retry the import, you might not be able to use the virtual machine in VMware Player.

    thanks,

    • lavanya on December 2, 2013 at 3:30 am

    Hi Jimmy,

    Thanks for your information provided step by step.

    I have installed everything by following your steps. I opened a web browser with the IP address provided in VMware Player. The home page opened, and when I click the tutorial, it opens. But the issue is that when I click “Use Sandbox”, the page does not open. I tried in Firefox.

    It is redirecting to this IP address: http://127.0.0.1:8000/about/

    error:

    Unable to connect

    Firefox can’t establish a connection to the server at 127.0.0.1:8000.

    The site could be temporarily unavailable or too busy. Try again in a few moments.
    If you are unable to load any pages, check your computer’s network connection.
    If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.

    Could you please help me.

    • nagappa on December 21, 2013 at 2:04 am

    hi friends,
    Can we run this VM in windows 7,32 bit, intel 2 Machine ? ?. I am doubtful before trying…if any one sure please tell me

    • Parag on December 24, 2013 at 5:27 am

    Hello, Mr.Wong,

    I have successfully installed it , thank you so much..

    • Naan on January 15, 2014 at 3:33 pm

    You can update it with the latest Hue UI to get the latest apps and features: http://gethue.com

    • Tania Shwe on January 21, 2014 at 10:05 pm

    Hi Jimmy,
    I tried to run the sandbox on my PC, which has 2 GB RAM. It gives me a low host memory error along with this warning: “The VM occupies 75%(1.84 GB) of the host memory. Host memory low.” I am unable to proceed further. Please help.
    Thanks,
    Tania
