Monday, July 8 • 2:00pm - 2:15pm
DLion: Decentralized Distributed Deep Learning in Micro-Clouds


Deep learning is a popular technique for building inference models and classifiers from large quantities of input data for applications in many domains. With the proliferation of edge devices such as sensors and mobile devices, large volumes of data are generated at a rapid pace all over the world. Migrating large amounts of data into centralized data centers over WAN environments is often infeasible for cost, performance, or privacy reasons. Moreover, there is an increasing need for incremental or online deep learning over newly generated data in real time. These trends require rethinking the traditional training approach to deep learning. To handle computation on distributed input data, micro-clouds (small-scale clouds deployed near edge devices in many different locations) provide an attractive alternative for data locality reasons. However, existing distributed deep learning systems do not support training in micro-clouds because of the unique characteristics and challenges of this environment. In this paper, we examine the key challenges of deep learning in micro-clouds: computation and network resource heterogeneity at the inter- and intra-micro-cloud levels, and their scale. We present DLion, a decentralized distributed deep learning system for such environments. It employs techniques specifically designed to address these challenges in order to reduce training time, enhance model accuracy, and provide system scalability. We have implemented a prototype of DLion in TensorFlow, and our preliminary experiments show promising results towards achieving accurate and efficient distributed deep learning in micro-clouds.


Rankyung Hong

University of Minnesota

Abhishek Chandra

University of Minnesota

Monday July 8, 2019 2:00pm - 2:15pm PDT
HotCloud: Grand Ballroom VII–IX