GPUs for Graph and Predictive Analytics
Apache Spark and GPUs are two of the biggest stories in data science and analytics in the past year. Apache Spark makes it easy to build analytics over large-scale data sets. NVIDIA GPUs provide the computational power for machine learning and deep learning challenges.
However, it is non-trivial to exploit the power of GPUs and scale applications onto multi-core, parallel architectures. GPU algorithms require not only significant expertise to develop but also intimate knowledge of the CPU and GPU memory systems and detailed knowledge of the Compute Unified Device Architecture (CUDA). Writing fast, efficient data analytics for graph and machine learning on GPUs can be hard because of the complexities of CUDA and the difficulty of achieving effective parallelism.
DASL and SPARQL are high-level languages: DASL for graph and machine learning algorithms, and SPARQL for graph pattern matching. When executed on the Blazegraph platform, they provide speedups of up to 1,000x over native Spark and up to 300x over leading graph databases.
We will present Blazegraph GPU benchmarking results against our SPARQL-enabled, non-GPU Blazegraph platform over the Lehigh University Benchmark (LUBM) and the Berlin SPARQL Benchmark (BSBM), demonstrating a 200-300x speedup. Users of the RDF/SPARQL API can achieve this acceleration simply by switching to the GPU-enabled platform, with no changes to the underlying code or application.
We will present Blazegraph DASL (pronounced 'dazzle') for graph and predictive analytics, which combines the ease of Spark with the speed of GPUs. It has shown 1,000x acceleration for large graphs when compared to in-memory processing with GraphX. DASL is a Scala-based language that provides graph and machine learning algorithms over linear algebra primitives. DASL programs are translated into task graphs that expose the available parallelism. The underlying Blazegraph DASL runtime integrates closely with Apache Spark. It provides the ability to write and execute DASL programs in Apache Spark and delivers a distributed, scalable architecture for machine learning and graph algorithms on GPUs and GPU clusters within the Spark environment.
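To illustrate what "graph algorithms over linear algebra primitives" can mean in general, here is a minimal sketch in plain Python. It is not DASL code and does not reflect the DASL API or runtime. It assumes a graph stored as a sparse adjacency matrix and expresses breadth-first search as repeated sparse vector-matrix products over the Boolean semiring. The names `SparseMatrix`, `vxm`, and `bfs_levels` are hypothetical and chosen only for this example.

```python
from typing import Dict, Set

# Hypothetical sparse adjacency matrix: row u maps to the set of columns v
# for which there is an edge u -> v (a nonzero entry at A[u][v]).
SparseMatrix = Dict[int, Set[int]]


def vxm(frontier: Set[int], adj: SparseMatrix) -> Set[int]:
    """Sparse vector-times-matrix product over the Boolean (OR, AND) semiring.

    The frontier is a sparse Boolean vector, stored as the set of its
    nonzero indices. The result holds every vertex reachable in one hop.
    """
    out: Set[int] = set()
    for u in frontier:
        out |= adj.get(u, set())
    return out


def bfs_levels(adj: SparseMatrix, source: int) -> Dict[int, int]:
    """Breadth-first search expressed as repeated vxm calls.

    Returns a map from each reachable vertex to its hop distance from source.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # One linear algebra step, then mask out vertices already visited.
        frontier = {v for v in vxm(frontier, adj) if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: {4}}
    print(bfs_levels(graph, 0))
```

In this formulation each row's contribution to the product can be computed independently of the others, which is the kind of parallelism a GPU can exploit. The sequential loop above only shows the structure of the computation.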
Brad is the CEO of Blazegraph, where he leads efforts to deliver graphs at scale with Blazegraph products. An expert in graphs and large-scale analytics, he has a diverse background in software development, telecommunications, and information retrieval.