Learn How to Manage & Process Big Data Using Open Source Tools
This session gives an overview of big data processing platforms: how to use them, how they are similar and how they differ, how programming in ECL works, and how to integrate with external code written in C++ or Java or exposed via Web Services. We will also discuss how to load data, perform ETL, profile and index data, and access the final result from a web-based front end.
Drea Leed specializes in big data analytics, entity resolution and data correlation using massively parallel processing systems. Drea works at Lexis Nexis Special Services, designing and developing large-scale supercomputing applications for government customers using the Lexis Nexis HPCC architecture.
These applications involve implementing cutting-edge search algorithms for data retrieval across enormous (petabyte+) datasets, and using heuristics, fuzzy matching, natural language processing and pattern recognition for data analysis and extrapolation across those datasets.
Drea's first role at Lexis Nexis was in 1997, when she worked on the ScienceDirect project, a web-based searchable database of over 13,000,000 scientific articles, performing extensive data analytics on both Oracle and SQL Server.