Redundancy Does Not Imply Fault Tolerance
We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous problems related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. We also find that the above outcomes arise due to fundamental problems in file-system fault handling that are common across many systems. Our results have implications for the design of next-generation fault-tolerant distributed and cloud storage systems.
Top-30
Journals
- Communications in Computer and Information Science: 2 publications, 9.09%
- ACM Transactions on Modeling and Performance Evaluation of Computing Systems: 1 publication, 4.55%
- ACM Transactions on Storage: 1 publication, 4.55%
- Journal of Physics: Conference Series: 1 publication, 4.55%
- IEEE Transactions on Reliability: 1 publication, 4.55%
- IEEE Transactions on Parallel and Distributed Systems: 1 publication, 4.55%
- Studies in Computational Intelligence: 1 publication, 4.55%
- IEEE Transactions on Software Engineering: 1 publication, 4.55%
- IEEE Transactions on Computers: 1 publication, 4.55%
- Proceedings of the VLDB Endowment: 1 publication, 4.55%
Publishers
- Institute of Electrical and Electronics Engineers (IEEE): 10 publications, 45.45%
- Association for Computing Machinery (ACM): 6 publications, 27.27%
- Springer Nature: 3 publications, 13.64%
- IOP Publishing: 1 publication, 4.55%
- Proceedings of the VLDB Endowment: 1 publication, 4.55%
- IntechOpen: 1 publication, 4.55%
- We do not take into account publications without a DOI.
- Statistics recalculated weekly.