The new “Hadoop Illuminated” book arrived today! See below for how you can read it online for free, and start making progress toward becoming a Hadoop professional for the Big Data revolution. Read my review below to find out more.
- Free open access download of the Hadoop Illuminated book (no registration required)
My Review of the Hadoop Illuminated Book
Title: Hadoop Illuminated
Authors: Mark Kerzner and Sujee Maniyam
Publisher: Hadoop illuminated LLC, open source on GitHub
Publication Date: February 26, 2013 (alpha version announced on an author’s blog)
Book website: www.hadoopilluminated.com
Illuminate: to help clarify or explain (a subject or matter)
The English word “illuminate” means to light up, or to clarify a dark or mysterious subject. The authors of the Hadoop Illuminated book have set out to clarify the often mysterious subject of Hadoop. Co-authors Mark Kerzner and Sujee Maniyam have released their book in electronic format as open access on the book’s website, meaning you can read it online for free.
By the way, be careful not to mispronounce the book title as “Hadoop eliminated” as that would have a totally opposite meaning.
The book is still a work in progress, as evidenced by the incomplete Chapter 7, “Introduction to MapReduce.” This isn’t much of a problem for me. In open-access publishing, works in progress are made available online in exchange for early feedback, which lets the authors refine the book sooner and focus their time and energy on what matters most to readers. Much like agile software development, Hadoop Illuminated will continue to be iteratively refined and updated by the authors as needed. This suits technology books because it keeps the content fresh: how many of us have obsolete printed books on old versions of software on our bookshelves? With the rapid pace of development in Hadoop, a living, regularly updated online book makes sense.
The book currently comprises 17 chapters across 5 parts.
Part 1 introduces the book and the authors. Co-author Mark Kerzner from Texas has an interesting background. He is the President and CEO of SHMsoft, which makes an open-source software platform for eDiscovery. His LinkedIn profile shows that he has an educational background in Math, Computer Science, and Law. He is a member of Mensa and speaks 10 languages. He is clearly a very smart entrepreneur. Co-author Sujee Maniyam is the CTO of CoverCake, a social fan optimization platform for digital media. He has published a number of articles on Hadoop. Both Kerzner and Maniyam train people on Hadoop.
Part 2 gives a “High Level Introduction to Hadoop.” The authors write about the rise of Big Data and how Hadoop addresses the problems that come with it. They then explain the concepts and components around Hadoop, including HDFS, HBase, Pig, Hive, and ZooKeeper. They also compare Hadoop with other open-source and commercial alternatives for Big Data. Most of these chapters are very readable. However, starting in Chapter 4, the authors include Java code fragments for MapReduce. If you don’t know Java, you might be tempted to skip over those portions. The intent isn’t to teach MapReduce using Java, but to illustrate how MapReduce works for anyone with basic programming skills. The rest of this part consists of lab exercises. If you don’t have access to a Hadoop cluster, how can you run the lab exercises? No problem: you can install Hadoop on your PC to run the exercises and tutorials. Hands-on tutorials are a must for solid learning. If you don’t want hands-on practice with Hadoop, you can go back to Chapter 5, “Hadoop for Executives,” which gives a quick overview of Hadoop and how to “own” it from the point of view of typical management concerns.
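To give a flavor of what such Java fragments are meant to illustrate, here is a minimal sketch of the MapReduce idea as a word count in plain Java. It is not taken from the book and does not use the Hadoop API; the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are my own, chosen only for illustration, and everything runs in a single process rather than on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: a single-process imitation of the map, shuffle, and
// reduce phases. Real Hadoop jobs use the Hadoop MapReduce API instead.
class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one line of input.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key (the word).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: collapse the list of values for one key into a total.
    static int reduce(List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle the pairs, then reduce each group.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints {big=2, data=2, hadoop=2, processes=1, stores=1}
        System.out.println(wordCount(List.of("Hadoop stores big data", "Hadoop processes big data")));
    }
}
```

On a real cluster, the map and reduce steps would run in parallel on many machines and the framework would handle the shuffle; the sketch only shows how the three phases fit together.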
Part 3 goes into “Hadoop in Depth.” In this section, the authors go into more detail about how Hadoop works. In some portions, the authors seem to describe what happens almost down at the physical disk layer. I particularly appreciate their explanation of the various versions of Hadoop, the key features of each release, and the related timeline.
Part 4 talks about “Hadoop Administration.” However, this section is currently blank.
Part 5 covers a “Hadoop Cookbook.” It is a mixed bag of Java MapReduce code fragments, often without much explanation. The authors still need to develop this section further.
Kudos to the authors for undertaking this effort to illuminate Hadoop to the open source community. I look forward to their future contributions to expand this book.
For further reading about Hadoop, see my review of the free Hadoop for Dummies book. If you want a more comprehensive book on Hadoop or Big Data, please consider the following books. The two Dummies books by Jones and Hurwitz will be published some time in 2013, while Tom White’s Hadoop: The Definitive Guide is an established, well-regarded text on the subject.
- Hadoop For Dummies, by M. Tim Jones
- Big Data For Dummies, by Judith Hurwitz
- Hadoop: The Definitive Guide, by Tom White