Open Access
Lecture Notes in Computer Science, pages 35-44

How to Mitigate Node Failures in Hybrid Parallel Applications

Publication type: Book Chapter
Publication date: 2016-03-31
Quartile: Q2
SJR: 0.606
CiteScore: 2.6
ISSN: 0302-9743, 1611-3349, 1861-2075, 1861-2083
Abstract
This paper describes an approach to distributed node failure detection and communicator recovery in MPI applications with dynamic resource allocation. Failure detection is based on a recent proposal for user-level mitigation. The aim of this paper is to identify a distributed and scalable approach to node failure detection and mitigation. Recovery of failed MPI communication is realized with an experimental implementation of MPI-level resource allocation. Resources are re-allocated to replace the failed node, so that the application can continue at full performance. Experimental results and the performance of the proposed techniques are discussed for schematic application scenarios.

Cite this
GOST
Szpindler M. How to Mitigate Node Failures in Hybrid Parallel Applications // Lecture Notes in Computer Science. 2016. pp. 35-44.
RIS
TY - GENERIC
DO - 10.1007/978-3-319-32152-3_4
UR - https://doi.org/10.1007/978-3-319-32152-3_4
TI - How to Mitigate Node Failures in Hybrid Parallel Applications
T2 - Lecture Notes in Computer Science
AU - Szpindler, Maciej
PY - 2016
DA - 2016/03/31
PB - Springer Nature
SP - 35-44
SN - 0302-9743
SN - 1611-3349
SN - 1861-2075
SN - 1861-2083
ER -
BibTeX
@incollection{2016_Szpindler,
author = {Maciej Szpindler},
title = {How to Mitigate Node Failures in Hybrid Parallel Applications},
booktitle = {Lecture Notes in Computer Science},
publisher = {Springer Nature},
year = {2016},
pages = {35--44},
month = {mar},
doi = {10.1007/978-3-319-32152-3_4}
}