A systematic evaluation of machine learning on serverless infrastructure

Publication type: Journal Article
Publication date: 2023-09-20
Scimago: Q1
WoS: Q2
White List level: БС1
SJR: 1.176
CiteScore: 8.3
Impact factor: 3.8
ISSN: 1066-8888, 0949-877X
Hardware and Architecture
Information Systems
Abstract
Recently, the serverless paradigm of computing has inspired research on its applicability to data-intensive tasks such as ETL, database query processing, and machine learning (ML) model training. Recent efforts have proposed multiple systems for training large-scale ML models in a distributed manner on top of serverless infrastructures (e.g., AWS Lambda). Yet, there is so far no consensus on the design space for such systems when compared with systems built on top of classical “serverful” infrastructures. Indeed, a variety of factors could impact the performance of training ML models in a distributed environment, such as the optimization algorithm used and the synchronization protocol followed by parallel executors, which must be carefully considered when designing serverless ML systems. To clarify contradictory observations from previous work, in this paper we present a systematic comparative study of serverless and serverful systems for distributed ML training. We present a design space that covers design choices made by previous systems on aspects such as optimization algorithms and synchronization protocols. We then implement a platform, LambdaML, that enables a fair comparison between serverless and serverful systems by navigating the aforementioned design space. We further improve LambdaML toward automatic support by designing a hyper-parameter tuning framework that leverages the ability of serverless infrastructure. We present empirical evaluation results using LambdaML on both single training jobs and multi-tenant workloads. Our results reveal that there is no “one size fits all” serverless solution given the current state of the art—one must choose different designs for different ML workloads. We also develop an analytic model based on the empirical observations to capture the cost/performance tradeoffs that one has to consider when deciding between serverless and serverful designs for distributed ML training.

Top-30

Journals

1. Computing (Vienna/New York): 2 publications, 50%
2. Cluster Computing: 1 publication, 25%

Publishers

1. Springer Nature: 3 publications, 75%
2. Institute of Electrical and Electronics Engineers (IEEE): 1 publication, 25%

  • Publications without a DOI are not counted.
  • Publication statistics are updated weekly.

Metrics
4
GOST
Jiang J. et al. A systematic evaluation of machine learning on serverless infrastructure // VLDB Journal. 2023.
GOST with all authors (up to 50)
Jiang J., Gan S., Du B., Alonso G., Klimovic A., Singla A., Wu W., Wang S., Zhang C. A systematic evaluation of machine learning on serverless infrastructure // VLDB Journal. 2023.
RIS
TY - JOUR
DO - 10.1007/s00778-023-00813-0
UR - https://doi.org/10.1007/s00778-023-00813-0
TI - A systematic evaluation of machine learning on serverless infrastructure
T2 - VLDB Journal
AU - Jiang, Jiawei
AU - Gan, Shaoduo
AU - Du, Bo
AU - Alonso, Gustavo
AU - Klimovic, Ana
AU - Singla, Ankit
AU - Wu, Wentao
AU - Wang, Sheng
AU - Zhang, Ce
PY - 2023
DA - 2023/09/20
PB - Springer Nature
SN - 1066-8888
SN - 0949-877X
ER -
BibTeX (up to 50 authors)
@article{2023_Jiang,
author = {Jiawei Jiang and Shaoduo Gan and Bo Du and Gustavo Alonso and Ana Klimovic and Ankit Singla and Wentao Wu and Sheng Wang and Ce Zhang},
title = {A systematic evaluation of machine learning on serverless infrastructure},
journal = {VLDB Journal},
year = {2023},
publisher = {Springer Nature},
month = {sep},
url = {https://doi.org/10.1007/s00778-023-00813-0},
doi = {10.1007/s00778-023-00813-0}
}