ACM Transactions on Modeling and Performance Evaluation of Computing Systems, volume 5, issue 3, pages 1-27

Toward Efficient Block Replication Management in Distributed Storage

Jianwei Liao 1
Zhibing Sha 1
Zhigang Cai 1
Zhiming Liu 1
Kenli Li 2
Wei-keng Liao 3
Alok N. Choudhary 3
Yutaka Ishikawa 4
Publication type: Journal Article
Publication date: 2020-09-30
scimago Q2
SJR: 0.525
CiteScore: 2.1
Impact factor: 0.7
ISSN: 2376-3639, 2376-3647
Computer Science (miscellaneous)
Hardware and Architecture
Information Systems
Computer Networks and Communications
Software
Safety, Risk, Reliability and Quality
Media Technology
Abstract

Distributed/parallel file systems commonly suffer from load imbalance and resource contention due to the bursty access characteristics exhibited by scientific applications. This article presents an adaptive scheme supporting dynamic block data replication, together with an efficient replica placement policy, to improve the I/O performance of a distributed file system. Our goal is to achieve not only balanced data replication among storage servers but also a high degree of data access parallelism for the applications. We first present mathematical cost models that formulate the cost of block replication by considering both the replication overhead and the reduced access time to the replicated data. To verify the validity and feasibility of the proposed cost model, we implement our proposal in a prototype distributed file system and evaluate it using a set of representative database-relevant application benchmarks. Our results demonstrate that the proposed approach can boost the usage efficiency of data replicas with acceptable replication management overhead. Consequently, the overall data throughput of the storage system can be noticeably improved. In summary, the proposed replication management scheme works well, especially for database-relevant applications that exhibit uneven access frequencies and patterns across different parts of files.
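The replication decision described in the abstract can be illustrated with a small sketch. The following is a hedged, simplified cost/benefit comparison, not the paper's actual cost model; all parameter names (access_rate, t_contended, copy_cost, and so on) are illustrative assumptions.

```python
# Hedged sketch (not the paper's exact model): decide whether creating an
# extra replica of a block pays off, by comparing the expected reduction in
# access time against the one-time replication overhead.

def replication_gain(access_rate: float,     # expected accesses per second to the block
                     t_contended: float,     # avg access latency on the current (hot) server
                     t_replica: float,       # avg access latency once a replica shares the load
                     window: float,          # time window the replica is expected to be useful (s)
                     copy_cost: float,       # time to copy the block to the new server (s)
                     mgmt_cost: float        # bookkeeping/consistency overhead (s)
                     ) -> float:
    """Return the estimated net benefit in seconds saved; replicate if positive."""
    saved_per_access = max(t_contended - t_replica, 0.0)
    benefit = access_rate * window * saved_per_access
    overhead = copy_cost + mgmt_cost
    return benefit - overhead


def should_replicate(**kwargs) -> bool:
    return replication_gain(**kwargs) > 0.0


if __name__ == "__main__":
    # Example: a hot block accessed 50 times/s; an extra replica halves its latency.
    print(should_replicate(access_rate=50, t_contended=0.004, t_replica=0.002,
                           window=60, copy_cost=1.5, mgmt_cost=0.5))
```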

Shwe T., Aritsugi M.
2018-11-29 citations by CoLab: 4 Abstract  
With the expansion of storage components in cloud data centers, component failures have become prevalent. Although data replication can protect against data loss, each time storage components fail, the burden incurred by the data block restoration process is not negligible. Re-replication should be performed carefully to avoid creating a load imbalance on the remaining storage datanodes while maintaining the reliability level. In this paper, we propose PRTuner, which forecasts resource utilization for the whole cluster and tunes the re-replication rate dynamically and proactively in order to minimize the performance impact on regular cluster jobs while ensuring system reliability. PRTuner also augments proactive re-replication with a reactive feature that minimizes performance degradation in the case of inaccurate prediction. Simulation results demonstrate that PRTuner minimizes the performance impact on regular cluster jobs for both highly and lightly utilized clusters while maintaining the system's reliability.
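As a rough illustration of the throttling idea described above (forecast cluster utilization, then set the re-replication rate proactively with a reactive fallback), the sketch below uses a simple moving-average forecaster and illustrative thresholds; it is not PRTuner's actual algorithm.

```python
# Hedged sketch of proactive re-replication throttling. The moving-average
# forecaster, the 0.2 mismatch threshold, and the rate cap are assumptions.

from collections import deque

class ReReplicationTuner:
    def __init__(self, max_rate_mbps: float = 200.0, history: int = 12):
        self.max_rate_mbps = max_rate_mbps
        self.samples = deque(maxlen=history)   # recent cluster utilization samples in [0, 1]

    def observe(self, utilization: float) -> None:
        self.samples.append(min(max(utilization, 0.0), 1.0))

    def forecast_utilization(self) -> float:
        # Simple moving average as a stand-in for a real forecaster.
        return sum(self.samples) / len(self.samples) if self.samples else 0.5

    def rate(self, current_utilization: float) -> float:
        """Re-replication bandwidth (MB/s) to allow in the next interval."""
        predicted = self.forecast_utilization()
        proactive = self.max_rate_mbps * max(1.0 - predicted, 0.0)
        # Reactive guard: if the cluster is much busier than predicted, throttle harder.
        if current_utilization > predicted + 0.2:
            proactive *= 0.5
        return proactive
```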
Mansouri N., Javidi M.M.
Journal of Systems and Software scimago Q1 wos Q1
2018-10-01 citations by CoLab: 41 Abstract  
Data replication is an effective technique that decreases retrieval time and thus reduces energy consumption in the cloud. When required files are not locally available, they must be fetched from remote locations, which is a highly time-consuming process. It is therefore preferable to pre-replicate popular files. Although a few previous works have considered prediction-based replication strategies, their predictions are imprecise in many situations and occupy additional storage. To address these challenges, a new dynamic replication strategy called Prefetching-aware Data Replication (PDR) is proposed, which determines the correlation among data files from the file access history and pre-fetches the most popular files, so that the next time a site requires a file, it is already locally available. In addition, because storage space is limited, the replica replacement strategy plays a vital role. PDR determines the importance of replicas using a fuzzy inference system with four input parameters (number of accesses, cost of the replica, time of last access, and data availability). Extensive experiments with CloudSim show that PDR achieves high data availability, a high hit ratio, and low storage and bandwidth consumption. On average, PDR reduces response time by over 35% compared with the other algorithms.
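PDR ranks replicas with a fuzzy inference system over the four inputs named above. The sketch below substitutes a much simpler crisp weighted score over the same four inputs, purely to illustrate how such a replacement value might be combined; the weights and normalizations are illustrative assumptions, not PDR's rules.

```python
# Hedged stand-in for a replica-value function: higher scores mean the replica
# is more worth keeping when storage is reclaimed.

import time

def replica_value(num_accesses: int, replica_cost: float,
                  last_access_ts: float, availability: float,
                  now: float = None) -> float:
    now = time.time() if now is None else now
    recency = 1.0 / (1.0 + (now - last_access_ts) / 3600.0)   # decays with hours since last access
    popularity = min(num_accesses / 100.0, 1.0)               # saturate at 100 accesses
    scarcity = 1.0 - min(availability, 1.0)                   # fewer copies -> more valuable
    cost = min(replica_cost, 1.0)                             # normalized re-creation cost
    return 0.35 * popularity + 0.25 * recency + 0.25 * scarcity + 0.15 * cost

# Evict the lowest-valued replica when space is needed, e.g.:
# victim = min(replicas, key=lambda r: replica_value(**r))
```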
He S., Sun X.
IEEE Transactions on Computers scimago Q1 wos Q2
2018-10-01 citations by CoLab: 11 Abstract  
As the data volumes of high-performance computing applications continuously increase, low I/O performance becomes a critical bottleneck for these data-intensive applications. Data replication is a promising approach to improve parallel I/O performance. However, most existing strategies are designed on the assumption that contiguous requests are served more efficiently than non-contiguous requests, which is not necessarily true in a parallel I/O system: the multiple-server data distribution makes it indeterminate whether contiguous or non-contiguous requests are favored. In this study, we propose CEDA, a cost-effective distribution-aware data replication scheme to better support parallel I/O systems. Because logical file access information is insufficient for making replication decisions in a parallel environment, CEDA considers physical data accesses on servers in both data selection and data placement during the parallel replication process. Specifically, CEDA first proposes a distribution-aware cost model to evaluate file request time under a given data layout, and then carries out cost-effective data replication based on a replication benefit analysis. We have implemented CEDA as part of the MPI I/O library, for high portability, on top of the OrangeFS file system. By replaying representative benchmarks and a real application, we collected comprehensive experimental results on both HDD- and SSD-based servers and conclude that CEDA can significantly improve parallel I/O system performance.
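The core of a distribution-aware cost model is that a striped parallel request completes only when its slowest server completes. The sketch below illustrates that idea under assumed linear per-server cost parameters; it is not CEDA's actual model.

```python
# Hedged sketch: estimate a parallel request's completion time as the maximum
# per-server service time under a given data layout. Cost parameters are
# illustrative (e.g., HDD vs. SSD servers get different per-byte costs).

def request_time(bytes_per_server: dict,     # server_id -> bytes this request reads from it
                 startup_cost: float,        # per-request fixed cost on a server (s)
                 per_byte_cost: dict         # server_id -> seconds per byte
                 ) -> float:
    return max(startup_cost + n * per_byte_cost[s]
               for s, n in bytes_per_server.items())

# Example: a 4 MB request split unevenly over an HDD server and an SSD server.
layout = {"s0": 3 * 2**20, "s1": 1 * 2**20}
print(request_time(layout, startup_cost=0.001,
                   per_byte_cost={"s0": 2e-8, "s1": 5e-9}))
```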
Cui L., Zhang J., Yue L., Shi Y., Li H., Yuan D.
2018-07-01 citations by CoLab: 63 Abstract  
Cloud computing is a promising distributed computing platform for big data applications, e.g., scientific applications, since abundant resources can be obtained from cloud services for processing and storing both existing and generated application datasets. However, when tasks process big data stored in distributed data centers, the inevitable data movements cause substantial bandwidth cost and execution delay. In this paper, we construct a tripartite-graph-based model to formulate the data replica placement problem and propose a genetic-algorithm-based data replica placement strategy for scientific applications to reduce data transmissions in the cloud. Our approach can reduce 1) the size of moved data, 2) the time of data movement, and 3) the number of movements. We conduct experiments comparing the proposed strategy with the random placement strategy used in the Hadoop Distributed File System (HDFS), which demonstrate that our strategy performs better for scientific applications in clouds.
Liao J., Cai Z., Trahay F., Zhou J., Xiao G.
2018-06-30 citations by CoLab: 1 Abstract  
Many problems in science and engineering are modeled as a set of mutually interacting models, resulting in a coupled or multiphysics application. These component models present challenges originating from their interdisciplinary nature and from their computational and algorithmic complexity. In general, the models are independently developed and maintained, so they commonly exchange their data through the global file system in the coupled application. To effectively use the local file cache on the compute node for exchanging data among the processes of such applications, and consequently boost I/O performance, this article presents a novel mechanism for migrating a process from one compute node to another on the basis of block I/O dependency. In the proposed mechanism, the block I/O dependency between two processes running on different nodes is profiled as block access similarity using Cohen's kappa statistic. A process is then dynamically migrated from its source node to a destination node hosting another process with which it has heavy block I/O dependency. As a result, both processes can exchange their data through the local file cache instead of the global file system, reducing I/O time. The experimental results demonstrate that I/O performance can be significantly improved and that the application execution time decreases accordingly.
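Cohen's kappa is a standard agreement statistic, so the block-access-similarity profiling can be sketched directly. The encoding below (binary accessed/not-accessed indicators over a shared block universe) is an illustrative assumption rather than the paper's exact profiling procedure.

```python
# Hedged sketch: measure block I/O dependency between two processes as the
# agreement of their per-block access indicators, using Cohen's kappa.

def cohens_kappa(blocks_a: set, blocks_b: set, universe: set) -> float:
    """Kappa over binary 'accessed / not accessed' labels for each block."""
    n = len(universe)
    if n == 0:
        return 0.0
    both = len(blocks_a & blocks_b)
    neither = len(universe - blocks_a - blocks_b)
    p_o = (both + neither) / n                      # observed agreement
    pa, pb = len(blocks_a) / n, len(blocks_b) / n
    p_e = pa * pb + (1 - pa) * (1 - pb)             # agreement expected by chance
    return 0.0 if p_e == 1.0 else (p_o - p_e) / (1 - p_e)

# A migration target could then be the node whose resident process yields the
# highest kappa with the candidate process, e.g.:
# target = max(candidates, key=lambda proc: cohens_kappa(my_blocks, proc.blocks, universe))
```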
Liao J., Cai Z., Trahay F., Peng X.
IEEE Access scimago Q1 wos Q2 Open Access
2018-06-29 citations by CoLab: 10 Abstract  
This paper proposes a new data placement policy for allocating data blocks across the storage servers of distributed/parallel file systems to yield an even distribution of block access workload. To this end, we first analyze the history of the block access sequence of a specific application and then introduce a k-partition algorithm that divides the data blocks into multiple groups according to their access frequency. After partitioning, each group has almost the same access workload, and we can distribute these block groups onto the storage servers of the distributed file system to achieve the goal of uniformly assigning data blocks when running the application. In summary, this newly proposed data placement policy yields not only an even data distribution but also balanced block access. The experimental results show that the proposed scheme can greatly reduce I/O time and improve the utilization of storage servers when running database-relevant applications, compared with the commonly used block data placement strategy, i.e., the round-robin placement policy.
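The balancing goal of the k-partition step can be illustrated with the classic greedy longest-processing-time heuristic: assign blocks, in decreasing order of access frequency, to the currently least-loaded group. This is a hedged stand-in, not necessarily the paper's exact algorithm.

```python
# Hedged sketch: split blocks into k groups with approximately equal total
# access frequency; each group is then placed on one storage server.

import heapq

def k_partition(block_freq: dict, k: int) -> list:
    """block_freq: block_id -> observed access frequency. Returns k lists of block ids."""
    groups = [[] for _ in range(k)]
    heap = [(0.0, i) for i in range(k)]             # (current group load, group index)
    heapq.heapify(heap)
    for block, freq in sorted(block_freq.items(), key=lambda kv: -kv[1]):
        load, i = heapq.heappop(heap)               # least-loaded group so far
        groups[i].append(block)
        heapq.heappush(heap, (load + freq, i))
    return groups

# Example: two servers end up with loads 120 and 120.
print(k_partition({"b0": 90, "b1": 60, "b2": 40, "b3": 30, "b4": 20}, k=2))
```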
Aral A., Ovatman T.
2018-06-01 citations by CoLab: 99 Abstract  
As the devices that make up the Internet become more powerful, algorithms that orchestrate cloud systems are on the verge of putting more responsibility for computation and storage on these devices. In our current age of Big Data, the dissemination and storage of data across end cloud devices is becoming a prominent problem under this expansion. In this paper, we propose a distributed data dissemination approach that relies on the dynamic creation, replacement, and removal of replicas, guided by continuous monitoring of data requests coming from edge nodes of the underlying network. Our algorithm exploits the geographical locality of data during the dissemination process, owing to the abundance of common data requests that stem from clients in close proximity. Our results using both real-world and synthetic data demonstrate that a decentralized replica placement approach provides significant cost benefits compared to the client-side caching that is widely used in traditional distributed systems.
Huang D., Han D., Wang J., Yin J., Chen X., Zhang X., Zhou J., Ye M.
IEEE Transactions on Computers scimago Q1 wos Q2
2018-03-01 citations by CoLab: 27 Abstract  
The distributed file system HDFS is widely deployed as the bedrock for many parallel big data analyses. However, when multiple parallel applications run over the shared file system, the data requests from different processes/executors are unfortunately served in a surprisingly imbalanced fashion on the distributed storage servers. These imbalanced access patterns among storage nodes arise because (a) unlike conventional parallel file systems, which use striping policies to evenly distribute data among storage nodes, data-intensive file systems such as HDFS store each data unit, referred to as a chunk file, as several copies placed by a relatively random policy, which can result in an uneven data distribution among storage nodes; and (b) under the data retrieval policy in HDFS, the more data a storage node contains, the higher the probability that the node is selected to serve the data. Therefore, on nodes serving multiple chunk files, the data requests from different processes/executors compete for shared resources such as the hard disk head and network bandwidth, resulting in degraded I/O performance. In this paper, we first conduct a complete analysis of how remote and imbalanced read/write patterns occur and how they are affected by the size of the cluster. We then propose novel methods, referred to as Opass, to optimize parallel data reads and to reduce the imbalance of parallel writes on distributed file systems. Our proposed methods can benefit parallel data-intensive analysis with various parallel data access strategies. Opass adopts new matching-based algorithms to match processes to data so as to achieve the maximum degree of data locality and balanced data access. Furthermore, to reduce the imbalance of parallel writes, Opass employs a heatmap for monitoring the I/O status of storage nodes and applies an HM-LRU policy to select a locally optimal storage node for serving write requests. Experiments conducted on PRObE's Marmot 128-node cluster testbed, with both benchmarks and well-known parallel applications, show the performance benefits and scalability of Opass.
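The write-balancing idea (a heatmap of per-node I/O load used to pick a locally optimal write target) can be sketched as follows. The decay factor and byte counters are illustrative assumptions, and this is not Opass's actual HM-LRU implementation.

```python
# Hedged sketch: track recent I/O load per storage node and direct each new
# write to the coolest node among the locally reachable candidates.

class IOHeatmap:
    def __init__(self, nodes, decay: float = 0.9):
        self.load = {n: 0.0 for n in nodes}
        self.decay = decay

    def record(self, node: str, bytes_served: int) -> None:
        self.load[node] += bytes_served

    def tick(self) -> None:
        # Periodically age old observations so the heatmap tracks recent load.
        for n in self.load:
            self.load[n] *= self.decay

    def pick_write_target(self, candidates) -> str:
        # Choose the locally optimal (coolest) node for the next write.
        return min(candidates, key=lambda n: self.load[n])
```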
Guerrero C., Lera I., Juiz C.
Journal of Grid Computing scimago Q1 wos Q1
2018-02-14 citations by CoLab: 26 Abstract  
This work addresses the optimization of file locality, file availability, and replica migration cost in a Hadoop architecture. Our optimization algorithm is based on the Non-dominated Sorting Genetic Algorithm-II and simultaneously determines file block placement, with a variable replication factor, and MapReduce job scheduling. Our proposal has been tested with experiments that considered three data center sizes (8, 16, and 32 nodes) with the same workload and number of files (150 files and 3519 file blocks). In general terms, the use of a placement policy with a variable replication factor yields greater improvements in our three optimization objectives. In contrast, the use of a job scheduling policy improves these objectives only when it is used along with a variable replication factor. The results also show that the migration cost is a suitable optimization objective, as significant improvements of up to 34% were observed across the experiments.
Rao Chandakanna V.
2018-02-01 citations by CoLab: 11 Abstract  
The Hadoop Distributed File System (HDFS) (Shvachko et al., 2010) is a highly scalable and fault-tolerant distributed file system that can be deployed on low-cost hardware. The content stored in HDFS is partitioned into blocks, and the blocks are replicated on multiple Data Nodes. Different block placement strategies can be used to make it highly fault-tolerant and to improve throughput and access time. HDFS allows the user to (i) specify the block size used for partitioning a given file and (ii) issue only sequential read and append operations; it does not allow the user to perform random read and random write operations. This paper proposes an enhanced HDFS (REHDFS) that (i) explores different block placement and block read strategies and (ii) implements random read and random write operations. The proposed architecture is implemented and evaluated. The proposed load-based block access strategy performed better than the other block retrieval strategies. The random read feature is implemented and evaluated, and pessimistic and optimistic models for implementing random writes are proposed and evaluated.
Wang J., Zhang X., Zhang J., Yin J., Han D., Wang R., Huang D.
2017-10-01 citations by CoLab: 5 Abstract  
During the last few decades, data-intensive file systems (DiFS), such as the Google File System (GFS) and the Hadoop Distributed File System (HDFS), have become the key storage architectures for big data processing. These storage systems usually divide files into fixed-sized blocks (or chunks). Each block is replicated (usually three-way) and distributed pseudo-randomly across the cluster. The master node (namenode) uses a huge table to record the locations of each block and its replicas. However, with increasing data size, the block location table and its maintenance can occupy more than half of the memory space and 30% of the processing capacity of the master node, which severely limits the scalability and performance of the master node. We argue that physical data distribution and maintenance should be separated from metadata management and performed by each storage node autonomously. In this paper, we propose Deister, a novel block management scheme built on an invertible deterministic declustering distribution method called Intersected Shifted Declustering (ISD). Deister is amenable to current research on scaling namespace management in the master node. In Deister, the huge table for maintaining block locations in the master node is eliminated, and the maintenance of the block-node mapping is performed autonomously on each data node. Results show that, compared with the default HDFS configuration, Deister achieves identical performance while saving about half of the RAM space and 30% of the processing capacity in the master node, and it is expected to scale to double the size of the current single-namenode HDFS cluster, pushing the scalability bottleneck of the master node back to namespace management.
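The key property of table-free placement is that replica locations are derived deterministically from the block identifier, so any node can recompute them instead of consulting a master-node table. The sketch below shows that property with a simple shifted-modulo layout, which is an illustrative assumption and not the actual Intersected Shifted Declustering method.

```python
# Hedged sketch of deterministic, recomputable block placement.

def replica_nodes(block_id: int, num_nodes: int, replicas: int = 3, shift: int = 7) -> list:
    """Return the node indices holding each replica of the block."""
    primary = block_id % num_nodes
    # Place subsequent replicas a fixed shift apart (co-prime with num_nodes)
    # so replicas of the same block land on distinct nodes and load spreads out.
    return [(primary + r * shift) % num_nodes for r in range(replicas)]

# Any client or datanode can recompute this without a block location table:
print(replica_nodes(block_id=123456, num_nodes=64))
```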
Ganesan A., Alagappan R., Arpaci-Dusseau A.C., Arpaci-Dusseau R.H.
ACM Transactions on Storage scimago Q2 wos Q2
2017-08-31 citations by CoLab: 19 Abstract  
We analyze how modern distributed storage systems behave in the presence of file-system faults such as data corruption and read and write errors. We characterize eight popular distributed storage systems and uncover numerous problems related to file-system fault tolerance. We find that modern distributed systems do not consistently use redundancy to recover from file-system faults: a single file-system fault can cause catastrophic outcomes such as data loss, corruption, and unavailability. We also find that the above outcomes arise due to fundamental problems in file-system fault handling that are common across many systems. Our results have implications for the design of next-generation fault-tolerant distributed and cloud storage systems.
Lin Y., Shen H.
2017-04-01 citations by CoLab: 15 Abstract  
In data-intensive clusters, a large number of files are stored, processed, and transferred simultaneously. To increase data availability, some file systems create and store three replicas of each file in randomly selected servers across different racks. However, they neglect file heterogeneity and server heterogeneity, which can be leveraged to further enhance data availability and file system efficiency. Because files have heterogeneous popularities, a rigid number of three replicas may not provide an immediate response to an excessive number of read requests to hot files, and it wastes resources (including energy) on replicas of cold files that receive few read requests. Also, because servers are heterogeneous in network bandwidth, hardware configuration, and capacity (i.e., the maximal number of service requests that can be supported simultaneously), it is crucial to select replica servers that ensure low replication delay and low request response delay. In this paper, we propose an Energy-Efficient Adaptive File Replication System (EAFR), which incorporates three components. It adapts to time-varying file popularities to achieve a good tradeoff between data availability and efficiency: higher popularity of a file leads to more replicas, and vice versa. To achieve energy efficiency, servers are classified into hot servers and cold servers with different energy consumption, and cold files are stored on cold servers. EAFR then selects a server with sufficient capacity (including network bandwidth and storage capacity) to hold a replica. To further improve the performance of EAFR, we propose a dynamic transmission rate adjustment strategy to prevent potential incast congestion when replicating a file to a server, a network-aware data node selection strategy to reduce file read latency, and a load-aware replica maintenance strategy to quickly recreate file replicas upon replica node failures. Experimental results on a real-world cluster show the effectiveness of EAFR and the proposed strategies in reducing file read latency, replication time, and power consumption in large clusters.
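The popularity-adaptive part of such a scheme can be sketched as a mapping from recent read rate to a target replica count, bounded below for durability and above for storage/energy cost. The thresholds below are illustrative assumptions, not EAFR's actual policy.

```python
# Hedged sketch: the replica count of a file grows with its recent read rate.

def target_replicas(reads_per_hour: float,
                    min_replicas: int = 2,      # durability floor for cold files
                    max_replicas: int = 6,      # cap to bound storage and energy cost
                    reads_per_extra_replica: float = 500.0) -> int:
    extra = int(reads_per_hour // reads_per_extra_replica)
    return max(min_replicas, min(min_replicas + extra, max_replicas))

# Cold files drop toward min_replicas (and can live on low-power "cold" servers),
# while hot files gain replicas to absorb read bursts.
print([target_replicas(r) for r in (10, 600, 2600, 10000)])
```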
Stavrinides G.L., Duro F.R., Karatza H.D., Blas J.G., Carretero J.
2017-01-01 citations by CoLab: 28 Abstract  
As large-scale distributed systems gain momentum, the scheduling of workflow applications with multiple requirements in such computing platforms has become a crucial area of research. In this paper, we investigate the workflow scheduling problem in large-scale distributed systems, from the Quality of Service (QoS) and data locality perspectives. We present a scheduling approach, considering two models of synchronization for the tasks in a workflow application: (a) communication through the network and (b) communication through temporary files. Specifically, we investigate via simulation the performance of a heterogeneous distributed system, where multiple soft real-time workflow applications arrive dynamically. The applications are scheduled under various tardiness bounds, taking into account the communication cost in the first case study and the I/O cost and data locality in the second. The simulation results provide useful insights into the impact of tardiness bound and data locality on the system performance.
Chen L., Qiu M., Song J., Xiong Z., Hassan H.
Journal of Supercomputing scimago Q2 wos Q2
2016-08-27 citations by CoLab: 20 Abstract  
In cloud storage, replication technologies are essential for fault tolerance and high availability of data. While achieving high availability, replication adds extra active servers to the storage system, and extra active servers mean extra power consumption and capital expenditure. Furthermore, the lack of data classification fixes the replication scheme at the very beginning. This paper proposes an elastic and efficient file storage system called E2FS for big data applications. E2FS can dynamically scale the storage system in and out based on the real-time demands of big data applications. We adopt a novel replication scheme based on data blocks, which provides fine-grained maintenance of the data in the storage system. E2FS analyzes data features and makes dynamic replication decisions to balance the cost and performance of cloud storage. To evaluate the proposed work, we implement a prototype of E2FS and compare it with HDFS. Our experiments show that E2FS can outperform HDFS in elasticity while achieving guaranteed performance for big data applications.
Sundara Kumar M.R., Mohan H.S.
2024-04-18 citations by CoLab: 10 Abstract  
Big Data Analytics (BDA) is an unavoidable technique in today's digital world for dealing with massive amounts of digital data generated by online and Internet sources. The data are kept in repositories and processed by cluster nodes distributed throughout a wider network. Because of its magnitude and real-time creation, big data processing faces challenges in latency and throughput. Modern systems such as Hadoop and Spark manage large amounts of data with HDFS, MapReduce, and in-memory analytics approaches, but the migration cost is higher than usual. Genetic Algorithm-based Optimization (GABO), MapReduce Scheduling (MRS), and data replication have provided answers to this challenge: the multi-objective solutions produced by the genetic algorithm improve resource utilization and node availability, and thereby processing performance, in big data environments. This work develops a novel strategy for enhancing data processing performance in big data analytics, called MapReduce Scheduling Based Non-Dominated Sorting Genetic Algorithm (MRSNSGA). The Hadoop MapReduce paradigm handles the placement of data in distributed blocks as chunks and their scheduling among the cluster nodes in a wider network. Best-fit solutions with respect to latency and access time are extracted from the set of multi-objective solutions. Experiments were carried out as simulations with several inputs of varied location node data and cluster racks. Finally, the results show that the speed of data processing in big data analytics was enhanced by 30–35% over previous methodologies, and the optimization approach located the best solutions among the multi-objective solutions at a rate of 24–30% across cluster nodes.
