Wednesday, July 10 • 2:20pm - 2:40pm
Apache Nemo: A Framework for Building Distributed Dataflow Optimization Policies

Optimizing scheduling and communication of distributed data processing for resource and data characteristics is crucial for achieving high performance. Existing approaches to such optimizations largely fall into two categories. First, distributed runtimes provide low-level policy interfaces to apply the optimizations, but do not ensure the maintenance of correct application semantics and thus often require significant effort to use. Second, policy interfaces that extend a high-level application programming model ensure correctness, but do not provide sufficient fine control. We describe Apache Nemo, an optimization framework for distributed dataflow processing that provides fine control for high performance, and also ensures correctness for ease of use. We combine several techniques to achieve this, including an intermediate representation, optimization passes, and runtime extensions. Our evaluation results show that Nemo enables composable and reusable optimizations that bring performance improvements on par with existing specialized runtimes tailored for a specific deployment scenario.

Speakers
Youngseok Yang, Seoul National University
Jeongyoon Eo, Seoul National University
Geon-Woo Kim, Viva Republica
Joo Yeon Kim, Samsung Electronics
Sanha Lee, Naver Corp.
Jangho Seo, Seoul National University
Won Wook Song, Seoul National University
Byung-Gon Chun, Seoul National University


Wednesday July 10, 2019 2:20pm - 2:40pm PDT
USENIX ATC Track II: Grand Ballroom VII–IX