10 research outputs found

    GeoLoc: Robust Resource Allocation Method for Query Optimization in Data Grid Systems

    Resource allocation (RA) is one of the key stages of distributed query processing in the Data Grid environment. Over the last decade a number of works dealing with different aspects of this problem have been published. We believe that those studies paid insufficient attention to two important aspects: the definition of the allocation space and the criterion for determining the degree of parallelism. In this paper we propose an RA method that extends existing solutions on these two points and solves the problem under the specific conditions of the large-scale, heterogeneous environment of Data Grids. Firstly, we propose to use the geographical proximity of nodes to data sources to define the Allocation Space (AS). Secondly, we present the principle of execution-time parity between scan and join (build and probe) operations to determine the degree of parallelism and to generate load-balanced query execution plans. We conducted an experiment that showed the superiority of our GeoLoc method, in terms of response time, over the RA method chosen for comparison. The present study also provides a brief description of existing methods and their qualitative comparison with the proposed method.
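    The parity principle can be illustrated with a minimal sketch: pick the smallest join degree whose estimated build-and-probe time does not exceed the scan time, so neither stage becomes a bottleneck. The cost model, rates and function names below are invented for illustration and are not the paper's algorithm.

```python
# Illustrative sketch of execution-time parity between scan and join stages.
# All cost figures (scan_rate_mb_s, join_rate_mb_s) are assumed values.

def scan_time(relation_size_mb: float, nodes: int, scan_rate_mb_s: float = 50.0) -> float:
    """Estimated time for `nodes` nodes to scan a relation in parallel."""
    return relation_size_mb / (nodes * scan_rate_mb_s)

def join_time(build_mb: float, probe_mb: float, nodes: int, join_rate_mb_s: float = 20.0) -> float:
    """Estimated build + probe time when the join is split over `nodes` nodes."""
    return (build_mb + probe_mb) / (nodes * join_rate_mb_s)

def parallelism_degree(build_mb: float, probe_mb: float,
                       scan_nodes: int, max_join_nodes: int) -> int:
    """Smallest join degree whose join time does not exceed the scan time."""
    target = scan_time(build_mb + probe_mb, scan_nodes)
    for degree in range(1, max_join_nodes + 1):
        if join_time(build_mb, probe_mb, degree) <= target:
            return degree
    return max_join_nodes

if __name__ == "__main__":
    # With these assumed sizes, 10 join nodes match the scan time of 4 scan nodes.
    print(parallelism_degree(build_mb=400, probe_mb=1600, scan_nodes=4, max_join_nodes=32))
```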

    Data and Task Scheduling in Distributed Computing Environments, Journal of Telecommunications and Information Technology, 2014, nr 4

    Data and task scheduling in distributed computing environments has become a major research and engineering issue. Data Grids (DGs), Data Clouds (DCs) and Data Centers are designed to support the processing and analysis of massive data, which can be generated by distributed users, devices and computing centers. Data scheduling must be considered jointly with the application scheduling process. This generates a wide family of global optimization problems with new scheduling criteria, including data transmission time, data access and processing times, reliability of the data servers, and security in the data processing and data access processes. In this paper, a new version of the Expected Time to Compute Matrix (ETC Matrix) model is defined for independent batch scheduling in the physical network of DG and DC environments. In this model, the completion times of the computing nodes are estimated based on the standard ETC Matrix and data transmission times. The proposed model has been empirically evaluated on a static grid scheduling benchmark using simple genetic-based schedulers. A comparison of the achieved results for two basic scheduling metrics, namely makespan and average flowtime, with the results generated when the data scheduling phase is ignored shows the significant impact of the data processing model on the schedule execution times.
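    The core idea, a completion-time estimate that adds data transmission time to the ETC entry, can be sketched as follows. The data shapes, bandwidth model and names are assumptions; the paper evaluates genetic-based schedulers, whereas the sketch drives the estimate with a plain Min-Min loop only to show how it would be used.

```python
# Minimal sketch (not the authors' code) of a data-aware completion-time estimate:
# ready time + expected compute time (ETC entry) + estimated data transfer time.

def completion_time(task, machine, etc, ready, data_size_mb, bandwidth_mb_s):
    """Estimated finish time of `task` on `machine`, including data transfer."""
    transfer = data_size_mb[task] / bandwidth_mb_s[machine]
    return ready[machine] + etc[task][machine] + transfer

def min_min_schedule(tasks, machines, etc, data_size_mb, bandwidth_mb_s):
    """Classic Min-Min batch scheduler using the data-aware completion times."""
    ready = {m: 0.0 for m in machines}
    schedule = {}
    unscheduled = set(tasks)
    while unscheduled:
        # Commit the (task, machine) pair with the globally smallest completion time.
        task, machine, finish = min(
            ((t, m, completion_time(t, m, etc, ready, data_size_mb, bandwidth_mb_s))
             for t in unscheduled for m in machines),
            key=lambda x: x[2],
        )
        schedule[task] = machine
        ready[machine] = finish
        unscheduled.remove(task)
    return schedule

if __name__ == "__main__":
    etc = {"t1": {"m1": 4.0, "m2": 6.0}, "t2": {"m1": 3.0, "m2": 2.0}}
    sizes = {"t1": 500.0, "t2": 100.0}
    bw = {"m1": 100.0, "m2": 25.0}
    print(min_min_schedule(["t1", "t2"], ["m1", "m2"], etc, sizes, bw))
```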

    An SCP-based Heuristic Approach for Scheduling Distributed Data-Intensive Applications on Global Grids

    Data-intensive Grid applications need access to large datasets that may each be replicated on different resources. Minimizing the overhead of transferring these datasets to the resources where the applications are executed requires that appropriate computational and data resources be selected. In this paper, we consider the problem of scheduling an application composed of a set of independent tasks, each of which requires multiple datasets that are each replicated on multiple resources. We break this problem into two parts: first, matching each task (or job) to one compute resource for executing the job and to one storage resource for each dataset the job requires; and second, assigning the set of tasks to the selected resources. We model the first part as an instance of the well-known Set Covering Problem (SCP) and apply a known heuristic for SCP to match jobs to resources. The second part is tackled by extending the existing MinMin and Sufferage algorithms to schedule the set of distributed data-intensive tasks. Through simulation, we experimentally compare the SCP-based matching heuristic to others in conjunction with the task scheduling algorithms and present the results.
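    The matching step can be pictured with the textbook greedy set-cover heuristic: the universe is the set of datasets a job needs, each candidate storage resource covers the datasets it replicates, and we repeatedly pick the resource that covers the most still-uncovered datasets per unit cost. This is a generic sketch under an assumed cost model, not the specific heuristic used in the paper.

```python
# Greedy set-cover sketch of matching a job's required datasets to storage resources.
# Assumes every required dataset has at least one replica somewhere.

def greedy_set_cover(required, replicas, cost):
    """
    required: set of dataset ids the job needs
    replicas: {resource: set of dataset ids stored there}
    cost:     {resource: assumed access/transfer cost estimate}
    Returns the list of resources chosen to cover all required datasets.
    """
    uncovered = set(required)
    chosen = []
    while uncovered:
        # Pick the resource with the best cost per newly covered dataset.
        resource = min(
            (r for r in replicas if replicas[r] & uncovered),
            key=lambda r: cost[r] / len(replicas[r] & uncovered),
        )
        chosen.append(resource)
        uncovered -= replicas[resource]
    return chosen

if __name__ == "__main__":
    print(greedy_set_cover(
        {"d1", "d2", "d3"},
        {"s1": {"d1", "d2"}, "s2": {"d2", "d3"}, "s3": {"d3"}},
        {"s1": 2.0, "s2": 2.5, "s3": 1.0},
    ))
```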

    Data Placement And Task Mapping Optimization For Big Data Workflows In The Cloud

    Data-centric workflows naturally process and analyze huge volumes of datasets. In this new era of Big Data there is a growing need to enable data-centric workflows to perform computations at a scale far exceeding a single workstation's capabilities. Therefore, this type of application can benefit from distributed high performance computing (HPC) infrastructures like cluster, grid or cloud computing. Although data-centric workflows have been applied extensively to structure complex scientific data analysis processes, they fail to address the big data challenges and to leverage the capability of dynamic resource provisioning in the Cloud. The concept of “big data workflows” is proposed by our research group as the next generation of data-centric workflow technologies to address the limitations of existing workflow technologies in meeting big data challenges. Executing big data workflows in the Cloud is a challenging problem, as workflow tasks and data must be partitioned, distributed and assigned to the cloud execution sites (multiple virtual machines). When running such big data workflows in a cloud distributed across several physical locations, the workflow execution time and the cloud resource utilization efficiency depend strongly on the initial placement and distribution of the workflow tasks and datasets across the multiple virtual machines in the Cloud. Several workflow management systems have been developed to facilitate the use of workflows by scientists; however, the data and workflow task placement issue has not yet been sufficiently addressed. In this dissertation, I propose the BDAP strategy (Big Data Placement strategy) for data placement and TPS (Task Placement Strategy) for task placement, which improve workflow performance by minimizing data movement across multiple virtual machines in the Cloud during workflow execution. In addition, I propose CATS (Cultural Algorithm Task Scheduling) for workflow scheduling, which improves workflow performance by minimizing workflow execution cost. In this dissertation, I 1) formalize the data and task placement problems in workflows, 2) propose a data placement algorithm that considers both the initial input datasets and the intermediate datasets obtained during the workflow run, 3) propose a task placement algorithm that considers the placement of workflow tasks before the workflow run, 4) propose a workflow scheduling strategy to minimize the workflow execution cost once the deadline is provided by the user, and 5) perform extensive experiments in a distributed environment to validate that the proposed strategies provide an effective data and task placement solution to distribute and place big datasets and tasks onto the appropriate virtual machines in the Cloud within reasonable time.
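    The general locality idea behind such placement strategies can be sketched with a simple greedy rule: assign each task to the virtual machine that already stores the largest volume of its input data, so less data has to move across VMs at runtime. This is an illustrative sketch only; it is not the BDAP, TPS or CATS algorithms, and the names, sizes and locality metric are assumptions.

```python
# Illustrative greedy task placement driven by data locality (not BDAP/TPS/CATS).

def place_tasks(tasks, dataset_sizes, initial_placement, vms):
    """
    tasks:             {task: set of dataset ids it reads}
    dataset_sizes:     {dataset: size in MB}
    initial_placement: {dataset: vm currently holding it}
    vms:               list of vm ids
    Returns {task: vm}, greedily maximizing locally available input data.
    """
    assignment = {}
    for task, inputs in tasks.items():
        def local_bytes(vm):
            # Volume of this task's inputs already resident on `vm`.
            return sum(dataset_sizes[d] for d in inputs
                       if initial_placement.get(d) == vm)
        # Everything not on the chosen VM would have to be transferred.
        assignment[task] = max(vms, key=local_bytes)
    return assignment

if __name__ == "__main__":
    print(place_tasks(
        {"t1": {"d1", "d2"}, "t2": {"d2", "d3"}},
        {"d1": 100, "d2": 500, "d3": 50},
        {"d1": "vm1", "d2": "vm2", "d3": "vm1"},
        ["vm1", "vm2"],
    ))
```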

    Resource Allocation for Query Optimization in Data Grid Systems

    Data grid systems are used more and more thanks to their storage and computing capacities. One of the important problems of these systems is resource allocation for SQL query optimization. Recently, the scientific community has published numerous approaches and methods of resource allocation, striving to take into account the different peculiarities of data grid systems: heterogeneity, system instability and large scale. A centralized management structure predominates in the proposed methods, in spite of the risks this solution incurs in large-scale systems. In this thesis we adopt a hybrid approach to resource allocation for query optimization: we first perform a static resource allocation at query compile time, and then reallocate the resources dynamically at query runtime. As opposed to the previously proposed methods, we use a decentralized management structure. The static part of our method is the strategy of initial resource allocation by a query 'broker'. As for the dynamic part, we propose a strategy that uses cooperation between autonomous mobile relational operations and stationary node coordinators in order to decentralize the process of dynamic resource reallocation. The key elements of our method are: (i) limiting the search space to address the problems caused by the large scale, (ii) the principle of distributing resources among the operations of a query to determine the degree of parallelism of the operations and to balance the load dynamically, and (iii) decentralizing the dynamic allocation process. The results of the performance evaluation show the efficiency of our proposals. Our initial resource allocation strategy gives results superior to the reference method used for comparison. The dynamic resource reallocation strategy considerably reduces the response time in the presence of system instability and load imbalance.

    Greedy Single User and Fair Multiple Users Replica Selection Decision in Data Grid

    Replication in data grids increases data availability, accessibility and reliability. Replicas of datasets are usually distributed to different sites, and the choice of replica locations has a significant impact. Replica selection algorithms decide the best replica locations based on some criteria. To this end, a family of efficient replica selection systems, RsDGrid, has been proposed. The problem addressed in this thesis is how to select the replica location that achieves shorter access time, higher QoS, consistency with users' preferences and nearly equal user satisfaction. RsDGrid consists of three systems: the A-system, the D-system, and the M-system. Each of them has its own scope and specifications. RsDGrid switches among these systems according to the decision maker.
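    A greedy single-user selection rule of this kind can be sketched as picking, among the sites holding a replica, the one with the smallest estimated delivery time. The cost model (size / bandwidth + latency) and all names below are illustrative assumptions, not the A-/D-/M-system decision logic of RsDGrid.

```python
# Minimal sketch of greedy single-user replica selection under an assumed cost model.

def select_replica(replica_sites, file_size_mb, bandwidth_mb_s, latency_s):
    """Return the site with the lowest estimated time to deliver the file."""
    def estimated_time(site):
        return file_size_mb / bandwidth_mb_s[site] + latency_s[site]
    return min(replica_sites, key=estimated_time)

if __name__ == "__main__":
    print(select_replica(
        ["siteA", "siteB", "siteC"],
        file_size_mb=2048,
        bandwidth_mb_s={"siteA": 40, "siteB": 120, "siteC": 80},
        latency_s={"siteA": 0.5, "siteB": 2.0, "siteC": 1.0},
    ))
```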