I found out today that you can download the book Cloudera Impala in PDF format for free from the Cloudera website, in association with this week's Strata Conference and Hadoop World in New York. See the publisher's page for details about the book, and the Cloudera website for the free download (registration with the Cloudera website may be required).
My Review of Cloudera Impala
Title: Cloudera Impala
Author: John Russell
Publisher: O’Reilly Media, Inc.
Publication Year: 2013
The Cloudera Impala book consists of 29 pages of content organized into 8 chapters. It is intended as a quick technical overview of the Impala framework for Hadoop. Like the fleet-footed African antelope it is named after, the open-source Impala project created by Cloudera brings speed and agility to data analysis on HDFS for analysts and data scientists. Impala presents a SQL database-like interface to Hadoop and is optimized to minimize query latency by bypassing MapReduce. In my experience using Hive SQL (a predecessor of Impala) on Hadoop, writing analytical queries was quick because the SQL syntax is so familiar, but the "big data" results could still take 5 to 15 minutes or more to return. If Impala is as lightning fast as it claims to be, the extra speed and nimbleness it brings to analysis work is very welcome!
The goals of the Impala project toward real-time querying of big data are similar to those of Google Dremel and Apache Drill. However, unlike Google's proprietary Dremel, Impala is open source under the Apache license and free for anyone to download and use.
I noticed that the Cloudera Impala book does not include any diagrams, presumably because the author doesn't spend much time on the nitty-gritty details of the technical architecture. However, the book does have numerous SQL code examples that are easy to understand for anyone with some familiarity with relational databases. The SQL examples remind me of MySQL, particularly the ASCII-drawn boxes surrounding the query results.
Ultimately, the book is aimed at three groups: analysts familiar with SQL relational databases, Unix system administrators, and users with experience in other Hadoop tools such as Hive, HBase, and Pig Latin. The author explains the purpose of Impala and gives a sense of how to use it, all without a single line of Java code!
You can read through this 29-page book in a few minutes for a quick overview of Impala. Read it to evaluate Impala against the Apache Drill and Apache Spark/Shark alternatives. If you want a more comprehensive book on Hadoop or big data, consider the following books.
Oracle Big Data Handbook
by Tom Plunkett et al.
by Edward Capriolo
Hadoop: The Definitive Guide
by Tom White