
Jan 27

Install Hadoop on Windows in 3 Easy Steps for Hortonworks Sandbox Tutorial

Did you know that you can easily install a simple single-node Hadoop cluster on your Windows PC or laptop?  Normally, Hadoop runs on Unix computers.  However, thanks to modern virtualization technology, you can run a complete sandbox version of Hadoop within a virtual Linux server on your personal computer, for free.  This installation is ideal for learning and exploring how to use Hadoop.  I will teach you how to install and run the Hortonworks sandbox version of Hadoop on your Windows computer in this tutorial.

Step 1: Install VMware Player

If you don’t already have VMware running on your computer, you’ll need to install VMware Player v5 or higher on your Windows computer.  This software is free for personal use.  I have found VMware Player v5 on my 64-bit Windows 7 computer to be reliable and trouble-free.  VMware Player can also be installed on a Linux computer.  If you have a Mac, you’ll need to purchase and install the VMware Fusion software instead.

Download and install “VMware Player for Windows 32-bit and 64-bit”.  It took me 4 minutes to download the VMware-player-5.0.1-894247.exe installer file, and 2 minutes to install the software on my Windows 7 computer, with no need to reboot.  VMware Player requires 150 MB free disk space.  Go to your Windows Start Menu and launch VMware Player (you may skip the upgrade to VMware Workstation).

See the full step-by-step instructions on installing VMware Player if you require additional details.

Step 2: Install Hortonworks Hadoop Sandbox

Download the Hortonworks Hadoop Sandbox for VMware.  The VMware Fusion and VMware Workstation versions are compatible with the VMware Player that you installed in Step 1.  It took me 1 hour 47 minutes to download the 2 GB “Hortonworks+Sandbox+1.2+1-21-2012-1+vmware.ova” file from the Hortonworks Amazon S3 directory.

While waiting for the VMware OVA file to download, you can watch the sandbox overview video from Hortonworks (8 minutes 35 seconds) and read the sandbox installation instructions.
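A download of this size can arrive truncated or corrupted, and that may only show up later as an import error in VMware Player. Since an OVA file is a tar archive that bundles an OVF descriptor with the disk image, one rough sanity check is to list its contents before importing. The sketch below uses Python's standard tarfile module; the function name `inspect_ova` is my own, and passing this check does not guarantee that VMware Player will accept the file.

```python
import tarfile


def inspect_ova(path):
    """Return the member names of an OVA file, or raise ValueError.

    An OVA is a tar archive; a usable one should contain an .ovf descriptor.
    """
    # is_tarfile() returns False when the file does not start like a tar archive.
    if not tarfile.is_tarfile(path):
        raise ValueError(f"{path} is not a readable tar archive")
    with tarfile.open(path) as archive:
        names = archive.getnames()
    # The .ovf descriptor tells the importer how the virtual machine is configured.
    if not any(name.lower().endswith(".ovf") for name in names):
        raise ValueError(f"{path} contains no .ovf descriptor")
    return names
```

Call `inspect_ova` with the full path to the downloaded .ova file and print the returned names. A badly truncated file may instead raise `tarfile.ReadError` while the archive is being read, which is also a sign that the download should be repeated.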

Step 3: Run Hadoop!

Within the VMware Player application that you installed and started in Step 1, either go to the Player menu and select “File/Open..”, or choose “Open a Virtual Machine” from the welcome screen.  Locate the directory where you downloaded the Hortonworks sandbox VMware image file “Hortonworks+Sandbox+1.2+1-21-2012-1+vmware.ova” and select that OVA file to open.  You’ll be prompted for the name of the new virtual server instance, and for where on your host Windows machine VMware should store the instance image file; it is okay to accept the defaults.  It may take a few minutes for the new virtual machine to be imported.  FYI, your Windows computer is the host system, whereas the CentOS 64-bit Linux system, bundled free with the Hortonworks Sandbox, is the virtual guest system from the VMware point of view.

VMware Player Open file menu

You’re now ready to click “Play virtual machine” to start running your new sandbox instance.

Your guest Linux system will now start up, along with all necessary software for Hadoop to run including a web server.  The beauty of using a virtual server is that 1) you don’t need to have another physical computer under your desk along with the associated power and network cables, monitor, and other accessories, to experiment with Hadoop, 2) you can install it on your laptop and run Hadoop there without needing to connect to the network, and 3) your separate virtual server won’t mess up anything you have on your main Windows computer, and can be easily uninstalled when no longer needed.

Once everything starts up, you will see instructions on how to access the Hortonworks Sandbox.  Look for the URL with the IP address, such as http://192.168.40.128, shown in the VMware Player window.  Note that your IP address may be different from mine.
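If you want to confirm that your Windows host can actually reach that address before opening a browser, a simple TCP connection test is enough. The following is a minimal sketch using only Python's standard library; the function name `sandbox_reachable` is my own, and the URL in the usage note is just the example address from above, so substitute the one your sandbox displays.

```python
import socket
from urllib.parse import urlparse


def sandbox_reachable(url, timeout=3.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parts = urlparse(url)
    # Fall back to the scheme's default port when the URL does not name one.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    try:
        with socket.create_connection((parts.hostname, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False
```

For example, `sandbox_reachable("http://192.168.40.128")` returns False if nothing is listening at that address or the host cannot route to the guest. A False result points to a networking problem between host and guest rather than to the browser.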

In a web browser such as Firefox or Chrome, go to the Sandbox URL IP address.  You should see a Hortonworks Sandbox welcome screen with options to view some overview videos, to start the tutorial, or jump straight into the sandbox.  Since we’re eager to run Hadoop, let’s go straight to the third choice: click on the green Start button under “Use the Sandbox.”

You will now see the HUE web-based environment for the sandbox.  By default you will start in the Beeswax screen.  Let’s click on “My Queries” then click on the “Sample: Top salary (copy)” query name.

You will then see the sample Hive query in the Query Editor.  If you have used relational databases before, you will notice that the Hive query looks very similar to standard SQL.  The other way to query Hadoop is Pig, which builds a query as a pipeline of processing steps; it has a somewhat steeper learning curve than Hive for those already familiar with SQL.  Let’s stick with Hive for this initial run.  Click on the Execute button to start running the Hive query.
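To give a feel for how closely this style of query follows SQL, here is a small self-contained analogue that runs an ordinary SQL “top salary” query with Python's built-in sqlite3 module. It is not the sandbox's sample query, and it runs on SQLite rather than Hive; the table name, columns, and rows are all invented for illustration. A query of this shape (SELECT, ORDER BY, LIMIT) is also valid in HiveQL.

```python
import sqlite3

# Hypothetical data: the table and its rows are invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE salaries (description TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO salaries VALUES (?, ?)",
    [("Engineer", 120000), ("Analyst", 85000), ("Manager", 140000)],
)


def top_salaries(connection, limit):
    """Return the highest-paid rows, largest salary first."""
    query = (
        "SELECT description, salary FROM salaries "
        "ORDER BY salary DESC LIMIT ?"
    )
    return connection.execute(query, (limit,)).fetchall()


print(top_salaries(conn, 2))  # [('Manager', 140000), ('Engineer', 120000)]
```

The main practical difference is what happens underneath: SQLite answers from a local file or memory, while Hive turns the query into jobs that run on the Hadoop cluster, which is why even a small sample query takes a while in the sandbox.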

Your query will now run for a minute or so.  The HUE environment will update the log output on the screen so that you can see the progress and any error messages.  This is a good time to step away from the computer and do something healthy while your “Big Data” query is running.

After a few minutes, your query results will show up on the screen.

Congratulations!  You have now installed Hadoop and successfully run your first Hive query.  You are on your way to becoming a wizard in Hadoop!

As the next step, you can continue with the step-by-step tutorial from the Hortonworks Sandbox welcome screen to get more hands-on practice with Hive and Pig for data processing in Hadoop.

For more information

See also alternative instructions for installing the demo Hadoop VMware images from Cloudera and MapR.

About the author

Jimmy Wong

Jimmy crunches massive amounts of big data using Hadoop for online advertising and marketing in a public social networking company. He enjoys helping newbies learn more about applying technology to solve business problems. He can be found in the San Francisco Bay Area. For more info, see his http://about.me/jimmy.wong page.

(The views expressed by Jimmy on this blog are his personal opinions and do not represent his employer or other organizations.)

Permanent link to this article: http://www.hadoopwizard.com/install-hadoop-on-windows-in-3-easy-steps-for-hortonworks-sandbox-tutorial/

39 comments

  1. Sumit

    I went to this URL
    http://www.hadoopwizard.com/how-to-install-vmware-player-for-hadoop-tutorials/

    Here it is mentioned – after the VMPlayer install

    Now you can open the VMware image file or ISO file that others provided you. Follow the instructions that came along with those image files. You can then run your Linux virtual system cleanly without interfering with your main host Windows computer.

    Not sure – where to get the ISOs from
    Any help would be appreciated

  2. Jimmy Wong

    Hi Sumit, sorry, I was not clear on the other article about installing VMware Player. After installing the VMware Player, you can install a Hadoop VMware image (instead of an ISO image).

    Where to get the VMware image? You can get the Hortonworks Hadoop Sandbox VMware image from this website:
    http://hortonworks.com/products/hortonworks-sandbox/

    I have listed the steps on how to install and start up the Hadoop VMware instance in this article:
    http://www.hadoopwizard.com/install-hadoop-on-windows-in-3-easy-steps-for-hortonworks-sandbox-tutorial/

    I hope this helps!

    Let us know if you are able to install the Hortonworks Sandbox version of Hadoop and run the hands-on tutorials.

    1. Narayana

      Hi jimmy,

      I installed VMware Player and the sandbox as per your instructions. I am able to import the file, but later I get the error below. Could you please help me out?

      Error:
      This virtual machine is configured for 64-bit guest operating systems. However, 64-bit operation is not possible.

      This host supports Intel VT-x, but Intel VT-x is disabled.

      Intel VT-x might be disabled if it has been disabled in the BIOS/firmware settings or the host has not been power-cycled since changing this setting.

      (1) Verify that the BIOS/firmware settings enable Intel VT-x and disable ‘trusted execution.’

      (2) Power-cycle the host if either of these BIOS/firmware settings have been changed.

      (3) Power-cycle the host if you have not done so since installing VMware Player.

      (4) Update the host’s BIOS/firmware to the latest version.

      For more detailed information, see http://vmware.com/info?id=152.

      1. Jimmy Wong

        Hi Narayana,

        What is the type and age of your computer? From the VMware error message, it looks like the “host” computer (your computer) is already a 64-bit computer (good!). However, the Intel VT-x virtualization setting in your computer’s BIOS is not turned on. You will need to reboot your computer, enter the BIOS screen during start-up before Windows starts, find the virtualization support section within the BIOS menu, and enable VT-x support.

        Every computer manufacturer has a different BIOS. You can see the following web page from Red Hat Linux for more information about changing your BIOS settings.

        https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Virtualization_Administration_Guide/sect-Virtualization-Troubleshooting-Enabling_Intel_VT_and_AMD_V_virtualization_hardware_extensions_in_BIOS.html

        I would think VT-x support would be enabled by default for most computers. Would you know if it was previously disabled on your computer for a specific reason, such as compatibility with other software? Changing your computer’s BIOS settings is an advanced procedure; you may want to get help from someone who is familiar with PC hardware BIOS settings if you are not comfortable making the settings changes yourself.

        -Jimmy

      2. Shailesh

        Hi Narayana,

        You have to enable Intel VT-x, which is found in the BIOS under the Security menu:
        Security -> Virtualization
        Check whether all the options under Virtualization are enabled. If not, please enable them. That should solve the problem.

        -Shailesh

  3. jani

    Hi Sumit,
    I followed the above steps to install Hadoop. I finished all the steps given above, but at the last stage it asks for a login and password. How can I log in to the sandbox? Please send me the solution.

  4. Jimmy Wong

    Hi Jani,

    After the sandbox starts running you will see a web address that looks like http://192.168.40.128 within the VMware player (the actual IP address number may be different for your computer). This is the web address of the Hadoop server running within VMware on your local computer.

    Open up a separate web browser window from your PC to go to that web address. You will see the screens for the tutorials and how to run Hive or Pig through your web browser. I would suggest using your web browser to use the Hadoop sandbox. No username or password is required.

    It is unnecessary to go to the Unix command line for the Hadoop server because you can run Hadoop and view the results using your web browser. However, if you do want to go into the Unix command line for more advanced exploration of how Hadoop is set up and the processes running, you can either press Alt-F5 within the VMware player, or run a separate SSH program (such as the free PuTTY SSH program) to log in to your Hadoop Unix server within your virtual server. Connect to the same IP address when requested (192.168.40.128 for me, but yours may be different). The Unix username is “root” and the password is “hadoop”. However, this is more advanced, and requires knowledge of Unix. You can try the Hadoop Hive and Pig languages using the web browser instead, which is what I recommend for beginners.

    Have a good time learning Hadoop, Jani!

    -Jimmy

    1. jani

      hi jimmy,
      After installing the sandbox, it displays a server address. I am trying to open it in my Windows browser, but it gives an error; it does not open in the browser.
      Please give me some suggestions.
      Thank you,

      1. Sen

        Even I am facing the same issue…

        -Sen

        1. Sameer

          Hi Sen, Jani,

          Were you guys able to resolve this issue? I am also facing the same issue. VM loads up with an IP address in the end but when I open the IP in a browser it gives an error.

          Thanks,
          Sam

  5. Savitha

    Hi Jimmy,
    I want to install Hadoop on Windows 7. I have downloaded Hadoop’s stable version as a tar.gz file. How should I proceed further using VMware?

  6. Sudhir Kumar

    Hi Jimmy,

    I want to install Hadoop on Windows XP. Please provide me with the detailed steps for it.

  7. Dan

    Thank you so much for this link! I was on the verge of installing and configuring Hadoop myself, but now I can put that task in the pending tray, and dive straight into some tutorials.

  8. Jordan

    Mr. Wong

    I tried to install it on my laptop with an Intel Core i5 and Windows 7. However, the virtual machine cannot run because the system finds that the configuration is for 64-bit, not 32-bit. Could you please tell me how to configure it for 32-bit? I downloaded “VMware Player for Windows 32-bit and 64-bit”.

    Thank you very much!

    Jordan

  9. Ki

    To complement the article, the Sandbox actually embeds the open source project Hue (http://gethue.com), which aims to become the Apache Hadoop UI.

  10. Avinash

    Thanks…I followed step by step and everything is working fine till now. Thanks for such a nice tutorial.

    1. Jimmy Wong

      I’m glad you were able to get it working and can start learning Hadoop.

  11. balamanikandan

    Hi Mr. Jimmy Wong,

    I have downloaded the sandbox VMware version, started the virtual machine, and got the URL with the IP address.

    When I tried to open the URL in the browser of the host machine, it did not connect.

    Please help me with this issue.

  12. Guru

    Does the Hortonworks Sandbox work on Windows 8 with the same VMware Player?

    1. Jimmy Wong

      I haven’t tried Windows 8, but as far as I have read, the same VMware player should work on it as well.

  13. Tarun

    Hi Jimmy,
    After completing all the steps, it runs perfectly, thanks.
    Now I would like to know how I can store the crawl results of Apache Nutch in Hive.
    How do I connect these two?

  14. Sharath

    I have a Dell Studio with Windows 7, 32-bit OS
    RAM 3 GB

    Can this sand box setup work on my computer ?

    Thanks
    Sharath

    1. Jimmy Wong

      Yes, it should work. As far as I know, VMware Player should be able to run a 64-bit Linux guest from 32-bit Windows 7, provided the CPU itself is 64-bit and has Intel VT-x (or AMD-V) enabled. Can you try it and let us know if it works for you?

      1. Sameer

        Hi Jimmy,

        I need your help on this.
        I am done with the installation on Windows XP but when I try to open the IP in a browser it does not open.
        Any ideas?

        Thanks,
        Sam

  15. praveen kumar

    Thank you Jimmy… It works fine for me..

    :)

  16. dinesh

    I am getting this error on win 7 32 bit.

    This virtual machine is configured for 64-bit guest operating systems. However, 64-bit operation is not possible.

    This host does not support Intel VT-x.

    For more detailed information, see http://vmware.com/info?id=152.

    Please assist.

  17. pankaj

    Hi Jimmy,
    Can we configure the Hadoop sandbox provided here for multiple nodes so that we can do parallel processing?
    Can you also please help me with XML parsing in Hadoop?

    Thanks,
    Pankaj

  18. bobby

    hi jimmy,

    After opening the hortonworks+sandbox+1.3+vmware+RC6.ova with VMware, I am receiving the following error:

    “Failed to open virtual machine: unknown error, please try again. if the problem persists, please contact VMWare technical support”

    any help is appreciated. Thanks in advance.

    1. aniket

      getting same error

  19. Srishti Kapoor

    Hello,
    I am using Windows 8 / 64-bit with 4 GB RAM, but it shows some kind of error and I am not able to resolve it.
    Please guide me on that. One more thing: does anybody know how to learn Hadoop online from the very basics?
    If yes, please guide me on that too. I have found one such platform, but I don’t know whether it is good enough to teach me up to Cloudera certification level for $97. I am attaching the link to it:
    http://www.wiziq.com/course/21308-hadoop-big-data-training

    Email: kapoor.srishti17@gmail.com

    Thanks & Regards,
    Srishti

  20. Hai

    Hi Wong,

    Thank you for the great tutorial. I was able to install the Hadoop Sandbox and run the first HiveQL example without any incident. Where do you suggest I go next in order to put Hadoop to practical use? I work for a company where we develop software and sell it to clients. We use the MS SQL Server database. The largest database our application interacts with is around 400 GB in size, and that figure could soon become terabytes. I am wondering whether there is any advantage to incorporating Hadoop into the application.

    -Thank you for your suggestion

  21. Greg

    Thanks for the tutorial! It works great. I’m up and running on my laptop, and just ran a sample query.

  22. Ram

    I’ve installed Sandbox 2.0 on Windows 8. The Oracle VM installation was successful. When I started the sandbox, I got a Sandbox IP address such as http://190.111.2.1:8888. I’ve tried this address in Google Chrome and Firefox, but the Sandbox welcome screen is NOT displayed; I get the typical error message instead. Is there any known issue with Sandbox 2.0 and Windows 8, or might I have missed something? Can you please suggest options?

  23. madhusudan

    HI,

    my .ova file size :2,466,504KB

    Error:
    The import failed because C:\Users\Munich\Downloads\Hortonworks+Sandbox+2.0+VirtualBox.ova did not pass OVF specification conformance or virtual hardware compliance checks.
    Click Retry to relax OVF specification and virtual hardware compliance checks and try the import again, or click Cancel to cancel the import. If you retry the import, you might not be able to use the virtual machine in VMware Player.

    thanks,

  24. lavanya

    Hi Jimmy,

    Thanks for your information provided step by step.

    I have installed everything by following your steps. I opened a web browser with the IP address provided in VMware Player. The home page opened, and when I click the tutorial, it opens too. But when I click “Use Sandbox”, the page does not open. I tried it in Firefox.

    It is redirecting to this IP address: http://127.0.0.1:8000/about/

    error:

    Unable to connect

    Firefox can’t establish a connection to the server at 127.0.0.1:8000.

    The site could be temporarily unavailable or too busy. Try again in a few moments.
    If you are unable to load any pages, check your computer’s network connection.
    If your computer or network is protected by a firewall or proxy, make sure that Firefox is permitted to access the Web.

    Could you please help me.

  25. nagappa

    hi friends,
    Can we run this VM on a Windows 7, 32-bit, Intel 2 machine? I am doubtful before trying. If anyone is sure, please tell me.

  26. Parag

    Hello, Mr.Wong,

    I have successfully installed it , thank you so much..

  27. Naan

    You can update it with the latest Hue UI to get the latest apps and features: http://gethue.com

  28. Tania Shwe

    Hi Jimmy,
    I tried to run the sandbox on my PC, which has 2 GB RAM. It gives me a host memory low error along with this warning: “The VM occupies 75%(1.84 GB) of the host memory. Host memory low.” I am unable to proceed further. Please help.
    Thanks,
    Tania

Comments have been disabled.