The Next Wave of SQL: Building Analytic Applications for Big Data with Impala
Apache Hadoop has become an attractive platform for exabyte-capacity data storage. In the analytic applications area in particular, Hadoop now increasingly serves as complementary technology for cost-efficient data loading and cleaning, supporting advanced analysis and reporting on relational data. Furthermore, thanks to recent advances in the Hadoop ecosystem, it is now also possible for Hadoop users to build analytic applications natively on Big Data for data exploration and discovery.
In this session, attendees will get an architect-level view of this solution (comprising HDFS, Impala, and the Parquet columnar storage format) and explore an example configuration and benchmark numbers that demonstrate how it offers a high level of performance, functionality, and ability to handle a multi-user workload, while retaining Hadoop's traditional strengths of flexibility and ease of scaling.
Marcel Kornacker is the architect of Impala, and its tech lead at Cloudera. He has a PhD in databases from UC Berkeley.
Prior to Cloudera, Marcel worked at Google, where he was the tech lead for the distributed query engine component of Google's F1 project.