A systematic evaluation of machine learning on serverless infrastructure

Publication type: Journal Article
Publication date: 2023-09-20
Scimago: Q1
WoS: Q2
White List level: БС1
SJR: 1.176
CiteScore: 8.3
Impact factor: 3.8
ISSN: 1066-8888, 0949-877X
Hardware and Architecture
Information Systems
Abstract
Recently, the serverless paradigm of computing has inspired research on its applicability to data-intensive tasks such as ETL, database query processing, and machine learning (ML) model training. Recent efforts have proposed multiple systems for training large-scale ML models in a distributed manner on top of serverless infrastructures (e.g., AWS Lambda). Yet, there is so far no consensus on the design space for such systems when compared with systems built on top of classical “serverful” infrastructures. Indeed, a variety of factors could impact the performance of training ML models in a distributed environment, such as the optimization algorithm used and the synchronization protocol followed by parallel executors, which must be carefully considered when designing serverless ML systems. To clarify contradictory observations from previous work, in this paper we present a systematic comparative study of serverless and serverful systems for distributed ML training. We present a design space that covers design choices made by previous systems on aspects such as optimization algorithms and synchronization protocols. We then implement a platform, LambdaML, that enables a fair comparison between serverless and serverful systems by navigating the aforementioned design space. We further improve LambdaML toward automatic support by designing a hyper-parameter tuning framework that leverages the ability of serverless infrastructure. We present empirical evaluation results using LambdaML on both single training jobs and multi-tenant workloads. Our results reveal that there is no “one size fits all” serverless solution given the current state of the art—one must choose different designs for different ML workloads. We also develop an analytic model based on the empirical observations to capture the cost/performance tradeoffs that one has to consider when deciding between serverless and serverful designs for distributed ML training.

Top-30

Journals

1. Computing (Vienna/New York): 2 publications, 50%
2. Cluster Computing: 1 publication, 25%

Publishers

1. Springer Nature: 3 publications, 75%
2. Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 25%

  • Publications without a DOI are not counted.
  • Publication statistics are updated weekly.

Metrics
4
GOST
Jiang J. et al. A systematic evaluation of machine learning on serverless infrastructure // VLDB Journal. 2023.
GOST with all authors (up to 50)
Jiang J., Gan S., Du B., Alonso G., Klimovic A., Singla A., Wu W., Wang S., Zhang C. A systematic evaluation of machine learning on serverless infrastructure // VLDB Journal. 2023.
RIS
TY - JOUR
DO - 10.1007/s00778-023-00813-0
UR - https://doi.org/10.1007/s00778-023-00813-0
TI - A systematic evaluation of machine learning on serverless infrastructure
T2 - VLDB Journal
AU - Jiang, Jiawei
AU - Gan, Shaoduo
AU - Du, Bo
AU - Alonso, Gustavo
AU - Klimovic, Ana
AU - Singla, Ankit
AU - Wu, Wentao
AU - Wang, Sheng
AU - Zhang, Ce
PY - 2023
DA - 2023/09/20
PB - Springer Nature
SN - 1066-8888
SN - 0949-877X
ER -
BibTeX (up to 50 authors)
@article{2023_Jiang,
author = {Jiawei Jiang and Shaoduo Gan and Bo Du and Gustavo Alonso and Ana Klimovic and Ankit Singla and Wentao Wu and Sheng Wang and Ce Zhang},
title = {A systematic evaluation of machine learning on serverless infrastructure},
journal = {VLDB Journal},
year = {2023},
publisher = {Springer Nature},
month = {sep},
url = {https://doi.org/10.1007/s00778-023-00813-0},
doi = {10.1007/s00778-023-00813-0}
}