IEEE Transactions on Parallel and Distributed Systems, volume 29, issue 12, pages 2882-2895

A Survey of Desktop Grid Scheduling

Publication type: Journal Article
Publication date: 2018-12-01
scimago Q1
SJR: 2.340
CiteScore: 11.0
Impact factor: 5.6
ISSN: 1045-9219, 1558-2183, 2161-9883
Hardware and Architecture
Computational Theory and Mathematics
Signal Processing
Abstract
The paper surveys the state of the art of task scheduling in Desktop Grid computing systems. We describe the general architecture of a Desktop Grid system and the computing model adopted by the BOINC middleware. We summarize the research literature to bring together and examine the optimization criteria and methods proposed so far for improving Desktop Grid task scheduling (the assignment of tasks to computing nodes by a server). In addition, we review related papers that address non-regular architectures, such as hierarchical and peer-to-peer Desktop Grids, as well as Desktop Grids designed for solving interdependent tasks.
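For orientation only, the scheduling problem under survey can be sketched as a server-side loop that hands queued tasks to the next idle node. This hypothetical fragment is not taken from the paper or from BOINC; node speeds and unit-size tasks are assumptions:

```python
# Minimal master-worker assignment sketch (hypothetical illustration,
# not code from the surveyed paper or from BOINC itself).
from collections import deque

def assign_tasks(tasks, nodes, speed):
    """Greedily hand each idle node the next queued task.

    tasks -- iterable of task ids
    nodes -- list of node ids
    speed -- dict: node id -> relative speed (tasks per unit time)
    Returns dict: node id -> list of assigned task ids.
    """
    queue = deque(tasks)
    plan = {n: [] for n in nodes}
    busy_until = {n: 0.0 for n in nodes}  # time each node becomes free
    while queue:
        node = min(nodes, key=lambda n: busy_until[n])  # next idle node
        plan[node].append(queue.popleft())
        busy_until[node] += 1.0 / speed[node]  # unit-size task assumption
    return plan

print(assign_tasks(range(6), ["a", "b"], {"a": 2.0, "b": 1.0}))
```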
Jung D., Jeong H., Kim J., Lee D., Kim M.
Cluster Computing scimago Q1 wos Q1
2017-12-30 citations by CoLab: 2 Abstract  
The latest scientific applications and simulations require powerful computing resources. To obtain sufficient resources, more and more computing experts are building high-end supercomputers, clusters, or cloud systems. Although most corporations are starting to use high-performance computing by organizing clusters, they rarely add extra resources dynamically when computing power falls short; renting the missing resources from a cloud provider is also expensive. To overcome resource limitations when conducting large computations, we exploit underutilized personal desktops by organizing them into a heterogeneous cluster environment. An open-source scheduler, Son of Grid Engine (SGE), is used in the integrated environment to manage and control the extra computing resources. Nodes are classified into two groups according to each individual's degree of utilization: a full-time group that provides resources at any time, and a custom-time group that provides resources only at designated times. We developed a resource manager that can turn the SGE execution daemon on and off and schedule it; this manager acts as the core engine. The resource running-time manager efficiently allocates and controls the affiliated computing resources.
Kolokoltsev Y., Ivashko E., Gershenson C.
Open Engineering scimago Q3 wos Q2 Open Access
2017-12-29 citations by CoLab: 3 PDF Abstract  
A regular Desktop Grid bag-of-tasks project can take a long time to complete its computations. An important part of the process is the tail of the computation: the stage at which the number of remaining tasks becomes smaller than the number of computing nodes. At this stage, dynamic replication can be used to reduce the time needed to complete the computations. In this paper, we propose a mathematical model and a strategy of dynamic replication at the tail stage. The results of numerical experiments are given.
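A hypothetical sketch of the tail-stage idea (not the authors' model): once the remaining tasks are fewer than the idle nodes, spare nodes receive replicas and the first finished copy wins:

```python
# Tail-stage replication sketch (illustration of the idea only).
def tail_replicas(remaining_tasks, idle_nodes):
    """Return a list of (task, node) assignments including replicas."""
    assignments = []
    if not remaining_tasks:
        return assignments
    # Primary copies first: one task per node while tasks last.
    for task, node in zip(remaining_tasks, idle_nodes):
        assignments.append((task, node))
    # Tail stage: spare nodes exist, so replicate tasks round-robin.
    spare = idle_nodes[len(remaining_tasks):]
    for i, node in enumerate(spare):
        assignments.append((remaining_tasks[i % len(remaining_tasks)], node))
    return assignments

print(tail_replicas(["t1", "t2"], ["n1", "n2", "n3", "n4", "n5"]))
```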
Nikitina N., Ivashko E., Tchernykh A.
2017-12-20 citations by CoLab: 8 Abstract  
In virtual drug screening, the chemical diversity of hits is an important factor, along with their predicted activity. Moreover, interim results are of interest for directing further research, and their diversity is also desirable. In this paper, we consider the problem of obtaining a diverse set of virtual screening hits in a short time. To this end, we propose a mathematical model of task scheduling for virtual drug screening in high-performance computing systems as a congestion game between computational nodes, seeking equilibrium solutions that best balance the number of interim hits with their chemical diversity. The model considers a heterogeneous environment with workload uncertainty, processing-time uncertainty, and limited knowledge of the input dataset structure. We perform computational experiments and evaluate the performance of the developed approach on the organic molecule database GDB-9. This set of molecules is rich enough to demonstrate the feasibility and practicality of the proposed solutions. We compare the algorithm with two known heuristics used in practice and observe that game-based scheduling outperforms them in hit discovery rate and in chemical diversity at earlier steps. Based on these results, we use a social utility metric to assess the efficiency of our equilibrium solutions and show that they reach the greatest values.
Shang Y., Shang L.
2017-10-05 citations by CoLab: 1 Abstract  
Resource volatility is a major challenge on desktop grid platforms, whose characteristics primarily depend on human computer-usage behavior. This study proposes a trust model based on Dempster–Shafer theory that predicts the relative reliability of nodes using information on daily computer-usage behavior drawn from the historical records of a desktop grid platform. In the proposed trust model, called TM-DG, a degree of trustworthiness is introduced to quantify each node's reliability. Dempster's rule of combination is applied to exploit two bodies of independent evidence: 1) current node availability, as actively probed by a special test node, and 2) the proportion of allocated tasks successfully completed. Simulations performed on a lightweight desktop grid platform illustrate how TM-DG can fully utilize the most reliable nodes for a given computation, leading to a reduction in communication overhead and an improvement in the computing power of the platform.
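Dempster's rule of combination itself is standard and small enough to sketch. The mass assignments below are invented; only the frame {reliable, unreliable} mirrors the setting of TM-DG:

```python
# Dempster's rule over the frame {R, U} (reliable / unreliable),
# combining the two evidence bodies the paper names: probe-based
# availability and the task-completion ratio. Mass values are made up;
# only the combination rule is standard.
from itertools import product

R, U = frozenset("R"), frozenset("U")
BOTH = R | U  # total ignorance

def combine(m1, m2):
    """Dempster's rule: normalized conjunctive combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to the empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

probe_evidence = {R: 0.7, U: 0.1, BOTH: 0.2}       # from active probing
completion_evidence = {R: 0.6, U: 0.2, BOTH: 0.2}  # from task history
print(combine(probe_evidence, completion_evidence))
```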
Alonso-Monsalve S., García-Carballeira F., Calderón A.
2017-09-01 citations by CoLab: 13 Abstract  
Volunteer Computing is a type of distributed computing in which ordinary people donate their idle computer time to science projects like SETI@Home, Climateprediction.net and many others. In a similar way, Desktop Grid Computing is a form of distributed computing in which an organization uses its existing computers to handle its own long-running computational tasks. BOINC is the main middleware providing a software platform for Volunteer Computing and Desktop Grid Computing, and it has become a general platform for distributed applications in areas as diverse as mathematics, medicine, molecular biology, climatology, environmental science, and astrophysics. In this paper we present a complete simulator of BOINC infrastructures, called ComBoS. Although there are other BOINC simulators, none of them simulate the complete BOINC infrastructure. Our goal was to create a simulator that, unlike the existing ones, can simulate realistic scenarios that take the whole BOINC infrastructure into account: projects, servers, network, redundant computing, scheduling, and volunteer nodes. The outputs of the simulations allow us to analyze a wide range of statistical results, such as the throughput of each project, the number of jobs executed by the clients, the total credit granted, and the average occupation of the BOINC servers. The paper describes the design of ComBoS and the results of the validation performed. This validation compares the results obtained with ComBoS against the real ones of three different BOINC projects (Einstein@Home, SETI@Home and LHC@Home). In addition, we analyze the performance of the simulator in terms of memory usage and execution time. The paper also shows that our simulator can guide the design of BOINC projects, describing some case studies using ComBoS that could help designers verify the feasibility of BOINC projects.
Kianpisheh S., Kargahi M., Charkari N.M.
2017-08-01 citations by CoLab: 8 Abstract  
Large-scale distributed systems employ thousands of resources, which inevitably suffer from unavailability. This can cause serious side effects, such as unexpected delays or failures in application execution, with potentially catastrophic consequences for real-time applications or penalties for service providers. Better prediction of resource unavailability helps diminish these undesired outcomes. This paper proposes a resource availability prediction algorithm for this purpose. The variation of resource availability is modeled as a stochastic process. By analyzing the availability information of NDU resources and of both physical and virtual machines of PlanetLab, we found that the transition probabilities among availability levels are non-stationary. To cope with this characteristic, we introduce Availability Transition Patterns (ATPs); the ATPs are constructed dynamically, and the transitions between them are modeled by a Markov chain. The future ATP is then predicted from the constructed Markov chain, from which the resource availability level is predicted. Experimental results confirm the efficiency of the proposed prediction algorithm.
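The underlying mechanics of fitting a Markov chain to an availability trace and predicting the next level can be sketched as follows (a simplified stand-in for the paper's ATP construction; the trace is invented):

```python
# Fit a Markov chain to an availability-level trace and predict the
# most likely next level (simplified; the paper models transitions
# between dynamically constructed ATPs, not raw levels).
from collections import Counter, defaultdict

def fit_transitions(trace):
    """Return P[s][t]: empirical probability of moving from s to t."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(trace, trace[1:]):
        counts[cur][nxt] += 1
    probs = {}
    for state, row in counts.items():
        total = sum(row.values())
        probs[state] = {t: c / total for t, c in row.items()}
    return probs

def predict_next(probs, state):
    """Most likely next availability level given the current one."""
    return max(probs[state], key=probs[state].get)

trace = ["high", "high", "low", "high", "low", "low", "high", "high"]
P = fit_transitions(trace)
print(predict_next(P, "low"))  # "low" -> "high" is most frequent here
```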
Montes D., Añel J.A., Pena T.F., Uhe P., Wallom D.C.
Geoscientific Model Development scimago Q1 wos Q1 Open Access
2017-02-21 citations by CoLab: 9 Abstract  
Volunteer or crowd computing is becoming increasingly popular for solving complex research problems from an increasingly diverse range of areas. The majority of these projects have been built using the Berkeley Open Infrastructure for Network Computing (BOINC) platform, which provides a range of services to manage all computational aspects of a project. The BOINC system is ideal in those cases where the research community involved not only needs low-cost access to massive computing resources but where there is also significant public interest in the research being done. We discuss the way in which cloud services can help BOINC-based projects deliver results in a fast, on-demand manner. This is difficult to achieve using volunteers alone, and at the same time, using scalable cloud resources for short on-demand projects can optimize the use of the available resources. We show how this design can be used as an efficient distributed computing platform within the cloud, and outline new approaches that could open up new possibilities in this field, using Climateprediction.net (http://www.climateprediction.net/) as a case study.
Hwang E., Kim S., Yoo T., Kim J., Hwang S., Choi Y.
2016-08-01 citations by CoLab: 17 Abstract  
High-Throughput Computing (HTC) and Many-Task Computing (MTC) paradigms employ loosely coupled applications consisting of a large number, from tens of thousands to even billions, of independent tasks. To support such large-scale applications, a heterogeneous computing system composed of multiple computing platforms of different types, such as supercomputers, grids, and clouds, can be used. When allocating the heterogeneous resources of such a system to multiple users, there are three important aspects to consider: fairness among users, efficiency for maximizing system throughput, and user satisfaction for reducing the average user response time. In this paper, we present three resource allocation policies for multi-user and multi-application workloads in a heterogeneous computing system: a fairness policy, a greedy efficiency policy, and a fair efficiency policy. We evaluate and compare the performance of the three policies over various configurations of a heterogeneous computing system and loosely coupled applications, using simulation based on traces from real experiments. Our simulation results show that the fair efficiency policy provides competitive efficiency, with a balanced level of fairness and user satisfaction, compared to the other two resource allocation policies.
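Two of the three policy families can be caricatured in a few lines (a hypothetical simplification, not the paper's algorithms): a fairness policy spreads slots round-robin across users, while a greedy efficiency policy gives each slot to the user who gets the most throughput out of it:

```python
# Toy fairness vs. greedy-efficiency slot allocation (invented rates).
def fair_allocation(users, n_slots):
    """Round-robin: slot s goes to user s mod len(users)."""
    return {u: [s for s in range(n_slots) if s % len(users) == i]
            for i, u in enumerate(users)}

def greedy_allocation(throughput, n_slots):
    """throughput[u][s]: tasks/hour user u achieves on slot s."""
    alloc = {u: [] for u in throughput}
    for s in range(n_slots):
        best = max(throughput, key=lambda u: throughput[u][s])
        alloc[best].append(s)  # slot goes to the fastest user on it
    return alloc

rates = {"u1": [4, 1, 3], "u2": [2, 5, 3]}
print(fair_allocation(["u1", "u2"], 3))
print(greedy_allocation(rates, 3))
```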
Rahman M.T., Nguyen H., Subhlok J., Pandurangan G.
2016-05-01 citations by CoLab: 6 Abstract  
Volunteer computing is being used successfully for large-scale scientific computations. This research is in the context of Volpex, a programming framework that supports communicating parallel processes in a volunteer environment. Redundancy and checkpointing are combined to ensure consistent forward progress with Volpex in this unique execution environment, characterized by heterogeneous failure-prone nodes and interdependent replicated processes. An important parameter for optimizing performance with Volpex is the frequency of checkpointing. The paper presents a mathematical model that minimizes the completion time of interdependent parallel processes running in a volunteer environment by finding a suitable checkpoint interval. Validation is performed with a real-world application running on a pool of distributed volunteer nodes. The results indicate that the performance with our predicted checkpoint interval is fairly close to the best performance obtained empirically by varying the checkpoint interval.
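The paper's model itself is not reproduced here, but the classic first-order result for a single process, Young's approximation, shows the shape of the trade-off between checkpoint cost and failure rate:

```python
# Young's (1974) first-order checkpoint-interval approximation, shown
# only for orientation; the paper derives its own model for
# interdependent replicated processes, which this does not reproduce.
import math

def young_interval(checkpoint_cost_s, mtbf_s):
    """Near-optimal interval ~ sqrt(2 * C * MTBF) between checkpoints."""
    return math.sqrt(2.0 * checkpoint_cost_s * mtbf_s)

# E.g. a 30 s checkpoint on nodes failing every 6 hours on average:
print(young_interval(30.0, 6 * 3600))  # ~1138 s, i.e. about 19 minutes
```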
Yasuda S., Nogami Y., Fukushi M.
2015-12-01 citations by CoLab: 3 Abstract  
This paper proposes a dynamic job scheduling method for reliable and high-performance volunteer computing. In volunteer computing, each job is replicated and allocated to multiple participants (workers) to remove incorrect results through a voting mechanism. Hence, the number of workers needed to complete a job is an important factor in system performance; however, this is not well considered in existing methods. The proposed method defines an expected probability of completion for each job based on the workers' secession probability. By allocating each job so that this expected probability always stays above a specified value, the proposed method avoids excess job allocation, which leads to higher performance. The performance of the proposed method is evaluated by computer simulation under two scenarios: workers with uniform processing speeds and workers with different processing speeds. The performance of the proposed method is found to be higher than that of the existing method, especially under the more practical latter scenario.
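The core rule, adding replicas until the expected completion probability clears a threshold, can be sketched as follows (assuming independent workers with a common secession probability, which may differ from the paper's exact definition):

```python
# Smallest replica count so that at least one worker is expected to
# finish (independence and a common secession probability are
# simplifying assumptions, not the paper's exact model).
def replicas_needed(secession_prob, target):
    """Smallest k with 1 - secession_prob**k >= target."""
    k, p_all_fail = 1, secession_prob
    while 1.0 - p_all_fail < target:
        k += 1
        p_all_fail *= secession_prob
    return k

print(replicas_needed(0.4, 0.95))  # 4 replicas: 1 - 0.4**4 = 0.9744
```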
Cordasco G., De Chiara R., Rosenberg A.L.
2015-08-01 citations by CoLab: 12 Abstract  
Many modern computing platforms, notably clouds and desktop grids, exhibit dynamic heterogeneity: the availability and computing power of their constituent resources can change unexpectedly and dynamically, even in the midst of a computation. We introduce a new quality metric, AREA, for schedules that execute computations having interdependent constituent chores (jobs, tasks, etc.) on such platforms. AREA measures the average number of chores that a schedule renders eligible for execution at each step of a computation. Even though the definition of AREA does not mention any properties of host platforms (such as volatility), intuition suggests that rendering chores eligible at a faster rate will have a benign impact on the performance of volatile platforms. We report on simulation experiments that support this intuition. Earlier work has derived the basic properties of the AREA metric and has shown how to efficiently craft AREA-maximizing (A-M) schedules for several classes of significant computations. Even though A-M schedules always exist for every computation, it is not always known how to derive such schedules efficiently. In response, the current study develops an efficient algorithm that produces AREA-Oriented (A-O) schedules, which aim to efficiently approximate the AREAs of A-M schedules on arbitrary computations. The simulation experiments reported on here suggest that, in common with A-M schedules, A-O schedules complete computations on volatile heterogeneous platforms faster than a variety of heuristics that range from lightweight ones to computationally intensive ones, albeit not to the same degree as A-M schedules do. Our experiments suggest that schedules having larger AREAs have smaller completion times, but no proof of that yet exists.
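As a rough sketch of how AREA can be computed for a concrete schedule (an invented diamond-shaped DAG; this illustrates the metric's definition as stated above, and is not the authors' code):

```python
# AREA of a schedule on a task DAG: at each step, count the tasks that
# are eligible (all predecessors done, task itself not yet run), then
# average over the steps of the schedule.
def area(preds, schedule):
    """preds: dict task -> set of predecessor tasks;
    schedule: list of all tasks in execution order."""
    done, total = set(), 0
    for task in schedule:
        done.add(task)
        eligible = [t for t in preds
                    if t not in done and preds[t] <= done]
        total += len(eligible)
    return total / len(schedule)

# Diamond DAG: a precedes b and c; b and c both precede d.
preds = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
print(area(preds, ["a", "b", "c", "d"]))  # (2 + 1 + 1 + 0) / 4 = 1.0
```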
Chernov I., Nikitina N.
2015-07-24 citations by CoLab: 7 Abstract  
We propose a mathematical model of a desktop grid computing system that solves tasks with two possible answers. Replication is used to reduce the risk of error: wrong answers are returned with some known probabilities, and a penalty is added to the calculation cost in case of an error. We solve optimization problems to determine the optimal quorum for tasks of varying duration. Besides the general case, we consider the case where answers of one kind are reliable. We apply the model to the problem of virtual screening and show how replication reduces the average cost. We also demonstrate that when penalties are close to, but lower than, the critical values, taking the varying duration of tasks into account significantly reduces the penalty threat at very low additional cost.
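A simplified stand-in for the quorum optimization (majority voting with independent, identical error probabilities; the paper's model is more general) picks the replica count minimizing expected cost:

```python
# Expected-cost view of replication with voting: each replica costs c,
# errs independently with probability e, and a wrong majority incurs a
# penalty. Parameters below are invented.
from math import comb

def expected_cost(r, c, e, penalty):
    """r replicas, majority vote; cost = r*c + penalty*P(majority wrong)."""
    p_wrong = sum(comb(r, k) * e**k * (1 - e)**(r - k)
                  for k in range(r // 2 + 1, r + 1))
    return r * c + penalty * p_wrong

# Scan odd replica counts and pick the cheapest quorum.
best = min(range(1, 10, 2), key=lambda r: expected_cost(r, 1.0, 0.1, 1000.0))
print(best, expected_cost(best, 1.0, 0.1, 1000.0))  # 7 replicas here
```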
Ivashko E., Nikitina N.
2025-01-30 citations by CoLab: 0 Abstract  
Scalability is one of the most valuable features of the Desktop Grid concept. The "master-worker" computing model makes it possible to gather huge computing resources with a single server. Still, with quality of service in mind, there are limits on the number of computing nodes that can be supported. In this paper, we consider a Desktop Grid as a queueing system. We construct a mathematical model and list the main quality-of-service characteristics. Using the log data of the volunteer computing project SiDock@home, we evaluate the steady-state performance characteristics of this real-world project and estimate its peak performance and quality of service.
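For orientation, a much simpler queueing view than the paper's model: treating the project server as an M/M/1 queue gives closed-form quality-of-service characteristics. The rates below are invented:

```python
# M/M/1 steady-state metrics for a project server (a deliberately
# simpler model than the paper's): lam = rate of work requests,
# mu = rate at which the server can handle them.
def mm1_metrics(lam, mu):
    rho = lam / mu  # server utilization, must be < 1 for stability
    if rho >= 1.0:
        raise ValueError("unstable: requests arrive faster than served")
    return {
        "utilization": rho,
        "mean_requests_in_system": rho / (1 - rho),
        "mean_response_time": 1.0 / (mu - lam),
    }

# E.g. 80 node requests/s against a server that handles 100/s:
print(mm1_metrics(80.0, 100.0))
```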
Ma P., Garg S., Barika M.
2024-05-01 citations by CoLab: 2 Abstract  
The rise of mobile devices and the Internet of Things has generated vast amounts of data that require efficient processing methods. Volunteer Computing (VC) is a form of distributed computing that utilises idle resources from diverse devices to complete tasks. VC offers a cost-effective and scalable source of computation resources. Mobile Volunteer Computing (MVC) capitalises on the abundance of mobile devices as participants. However, managing a large number of participants in the network makes resource scheduling challenging. Various resource allocation algorithms and MVC platforms have been developed, but survey papers summarising these systems and algorithms are lacking. This paper aims to bridge the gap by delivering a comprehensive survey of MVC, including related technologies, MVC architecture, and the major findings organised in a taxonomy of resource allocation in MVC.
Nikitina N., Ivashko E.
2023-01-01 citations by CoLab: 0 Abstract  
The HiTViSc service is a new advanced tool for high-performance virtual screening that combines cloud computing with Desktop Grid distributed computing, following the Desktop Grid as a Service concept. The paper describes the three main workflows available to the HiTViSc user. The main workflow covers the setup, the selection of input data, and the launch of a virtual screening, performed by the user through a special "experiment wizard". Auxiliary workflows concern the analysis of virtual screening results using external programs and the management of computing resources.
Nikitina N., Ivashko E.
2022-12-15 citations by CoLab: 1 Abstract  
This paper presents an analysis of the BOINC-based volunteer computing project SiDock@home. The project implements virtual drug screening. We analyse the employed workflow, describing the processes of task generation and result creation, validation, and assimilation. Based on this analysis, we propose an optimized workflow aimed at minimizing computing intensity and scaling up the granularity of the results.
Djigal H., Xu J., Liu L., Zhang Y.
2022-08-17 citations by CoLab: 56 Abstract  
With the rapid development of Internet-of-Things (IoT) devices and mobile communication technologies, Multi-access Edge Computing (MEC) has emerged as a promising paradigm that extends cloud computing and storage capabilities to the edge of cellular networks, close to IoT devices. MEC enables IoT devices with limited battery capacity and computation/storage capabilities to execute their computation-intensive and latency-sensitive applications at the edge of the network. However, to efficiently execute these applications in MEC systems, each task must be properly offloaded and scheduled onto the MEC servers. Additionally, the MEC servers may intelligently balance and share their computing resources to satisfy the application QoS and QoE. Therefore, effective resource allocation (RA) mechanisms in MEC are vital for ensuring its foreseen advantages. Recently, Machine Learning (ML) and Deep Learning (DL) have emerged as key methods for many challenging aspects of MEC. In particular, ML and DL play a crucial role in addressing the challenges of RA in MEC. This paper presents a comprehensive survey of ML/DL-based RA mechanisms in MEC. We first present tutorials that demonstrate the advantages of applying ML and DL in MEC. Then, we present enabling technologies for quickly running ML/DL training and inference in MEC. Afterward, we provide an in-depth survey of recent works that used ML/DL methods for RA in MEC from three aspects: (1) ML/DL-based methods for task offloading; (2) ML/DL-based methods for task scheduling; and (3) ML/DL-based methods for joint resource allocation. Finally, we discuss key challenges and future research directions for applying ML/DL to resource allocation in MEC networks.
Ivashko E., Chernov I.
2022-04-25 citations by CoLab: 0 Abstract  
Scientific computing includes "needle in a haystack" problems: one searches for a unique target by independently examining objects from a finite set. If an object's examination may give an erroneous result, one needs a strategy for re-examining the objects. High-throughput computing systems such as Desktop Grids commonly use so-called replication to increase the reliability and throughput of the computing process. In this paper we present, and analytically prove optimal, a strategy of re-examination, and derive an explicit formula for the expected number of examinations needed to find the target. We also show the connection between replication in Desktop Grids and the re-examination strategy in "needle in a haystack" problems.
Ivashko E., Chernov I.
2021-12-01 citations by CoLab: 1 Abstract  
In scientific computing, blind search problems are quite common: objects from a finite set are independently studied one after another until a unique target is found. Since recognition may fail, so that the target may be missed, objects must be re-studied according to some strategy. In Desktop Grid computing (Enterprise Desktop Grid, Volunteer Computing), replication is commonly used to increase the reliability of the computing system. Doing the work twice or more sacrifices performance for reliability, and the optimal trade-off is not trivial. In this paper, we construct a mathematical model of this problem and obtain the optimal policy, which turns out to be "no replication": objects are re-studied only after all of them have been studied once and no target has been found. We present an explicit formula for the expected number of examinations needed to find the target. Numerical experiments illustrate the result and compare the optimal strategy with other ones. Finally, we discuss the consequences of these results for Desktop Grid computing.
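The expected number of examinations under the pass-based "no replication" strategy can be checked with a small Monte Carlo sketch (parameters invented; the paper's closed formula is not reproduced here):

```python
# Monte Carlo check of the "no replication" pass strategy: sweep all N
# objects, and only if the target was missed (probability miss_p when
# it is examined) start another full pass. Counts examinations used.
import random

def examinations_pass_strategy(n, miss_p, rng):
    target = rng.randrange(n)
    count = 0
    while True:
        for obj in range(n):  # one full pass over the objects
            count += 1
            if obj == target and rng.random() > miss_p:
                return count  # target recognized

random.seed(1)
runs = [examinations_pass_strategy(100, 0.2, random) for _ in range(10_000)]
print(sum(runs) / len(runs))  # compare with the paper's closed formula
```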
Nikitina N., Manzyuk M., Podlipnik Č., Jukić M.
2021-09-06 citations by CoLab: 3 Abstract  
This paper addresses the performance evaluation of a heterogeneous distributed computing environment (Desktop Grid) for large-scale medicinal chemistry experiments in silico. Dynamic changes in the set of computational nodes, as well as their heterogeneity and unreliability, complicate task scheduling and algorithm scaling. We analyze the performance and provide efficiency metrics, statistics, and an analysis of the volunteer computing project SiDock@home.
Vatutin E., Zaikin O., Manzyuk M., Nikitina N.
2021-01-01 citations by CoLab: 1 Abstract  
This study focuses on searching for pairs of orthogonal diagonal Latin squares of order 10. Consider a cell mapping according to which one diagonal Latin square is mapped onto another. Given a certain cell mapping schema, the problem is to find a pair of orthogonal diagonal Latin squares of order 10 that matches the schema (or to prove that no such pair exists). The problem is reduced to the Boolean satisfiability problem (SAT). Three mapping schemas are considered, and for each of them a SAT instance is constructed. If a satisfying assignment is found for an instance, the corresponding pair of orthogonal Latin squares can easily be extracted from it. The Cube-and-Conquer approach is used to solve the instances. The cubing phase is performed with a sequential look-ahead SAT solver, while the conquer phase is run as an experiment in a BOINC-based volunteer computing project. In the experiment, orthogonal pairs are found for two of the three schemas.
Chernov I., Ivashko E.
2020-12-05 citations by CoLab: 2 Abstract  
A common problem solved by high-performance computing is the search problem, in which a unique object needs to be found among other objects. With a huge number of objects to examine and a computationally hard examination of each, the search problem requires a lot of computing resources. The problem becomes even harder if an examination may give a wrong result with some probability. Such problems appear in unreliable high-throughput computing environments like Desktop Grids. In this paper, we present a mathematical model of such search problems and derive the optimal task-assignment strategy, which minimizes the expected cost of examinations and thus reduces the consumption of computing resources. We show that in a rather general case the optimal strategy is the "no-replication" one: all objects should be examined once, then a second time if no target has been obtained, and so on. We also reveal the cases in which this strategy is not optimal. Finally, the expected costs of finding the target are obtained for the practical case of object-dependent cost.
Ivashko E., Nikitina N., Rumyantsev A.
2020-12-05 citations by CoLab: 3 Abstract  
The paper describes a discrete event simulation model of a Desktop Grid system. First, we present a stochastic model of a volunteer computing project. We then employ the event simulation approach based on generalized semi-Markov processes to develop a discrete event simulation model. Finally, using the simulation model, we pose a performance optimization problem: optimizing the project runtime as a function of the task size under performance constraints.
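A minimal heap-based event loop conveys the flavor of such a discrete event simulation (far simpler than the generalized semi-Markov process model of the paper; node speeds and task sizes are invented):

```python
# Skeleton of a discrete event simulation of a Desktop Grid: an event
# queue holds (time a node becomes free, node id); each task goes to
# the earliest-free node, and we track the makespan.
import heapq

def simulate_makespan(task_sizes, node_speeds):
    """Assign tasks to the earliest-free node; return total runtime."""
    events = [(0.0, node) for node in range(len(node_speeds))]
    heapq.heapify(events)
    finish = 0.0
    for size in task_sizes:
        free_at, node = heapq.heappop(events)
        done_at = free_at + size / node_speeds[node]
        finish = max(finish, done_at)
        heapq.heappush(events, (done_at, node))
    return finish

print(simulate_makespan([4, 4, 2, 2, 1], [1.0, 2.0]))  # makespan 5.0
```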
Mamonov A., Varlamov R., Salpagarov S.
2020-01-01 citations by CoLab: 0 Abstract  
Modern computational problems arise that cannot be solved by increasing the number and quality of computers alone. In this work, we develop distributed computing methods. Applying knowledge from different areas of science, such as queueing theory and the theory of computation, we create our own distributed computing system. The main ideas behind our approach are treating hierarchy and decentralization as equals and generalizing the calculations to cover more high-performance computing areas. Using a peer-to-peer architecture requires thoughtful design but offers several advantages, which are discussed in this work. To compose our system, we set up a mathematical model of the target situation, draw attention to its weaknesses, and develop software principles to resolve them. After solving several design issues, we create simulation models to analyze the effectiveness of the future system, and based on this analysis we select the next steps to improve and finalize the system.
Ivashko E., Nikitina N.
2019-12-09 citations by CoLab: 0 Abstract  
In this paper, we propose an efficient computational scheme for virtual drug screening using a Desktop Grid. The scheme is based on a game-theoretic mathematical model, more precisely a crowding game. The proposed scheme provides the needed balance between result quality and search scope.
Zaikin O.
2019-12-09 citations by CoLab: 0 Abstract  
Volunteer computing is a powerful tool for solving hard problems by the divide-and-conquer approach. During the last decade, several hard cryptanalysis problems were solved in the volunteer computing project SAT@home. In this study, the preliminary stage of these experiments is described: how SAT-based cryptanalysis problems are chosen; how these problems are studied on a computing cluster using state-of-the-art multithreaded SAT solvers; how decompositions of the chosen SAT problems are constructed using a Monte Carlo method; and how server and client software are prepared for the corresponding experiments in SAT@home. These issues are described as applied to several stream ciphers for which experiments in SAT@home are planned.
