2,094 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. We map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems so as to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. Finally, we hope that the proposed taxonomy and mapping give new practitioners an accessible way into this complex area of research. (Comment: 46 pages, 16 figures, Technical Report)

    Scalable dimensioning of resilient Lambda Grids


    Hybrid ant colony system and genetic algorithm approach for scheduling of jobs in computational grid

    Metaheuristic algorithms have been used to solve scheduling problems in grid computing. However, stand-alone metaheuristic algorithms do not always perform well on every problem instance. This study proposes a high-level hybrid approach between ant colony system and genetic algorithm for job scheduling in grid computing. The proposed hybrid approach is evaluated using static benchmark problems known as the ETC (expected time to compute) matrix. Experimental results show that the proposed hybridization of the two algorithms outperforms the stand-alone algorithms in terms of best and average makespan values.
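    To make the two-stage idea concrete, here is a minimal, hypothetical Python sketch (not the authors' code): an ant-colony-style constructive stage seeds a population of job-to-machine schedules over an ETC matrix, and a genetic algorithm then refines that population against the makespan objective. The ETC values, population size, and rates are invented for illustration, and the pheromone-guided choice is simplified to a greedy/random rule.

```python
import random

ETC = [[3.0, 5.0], [2.0, 4.0], [6.0, 1.0]]  # ETC[job][machine], invented values
N_JOBS, N_MACHINES = len(ETC), len(ETC[0])

def makespan(schedule):
    """Makespan = finishing time of the busiest machine."""
    load = [0.0] * N_MACHINES
    for job, machine in enumerate(schedule):
        load[machine] += ETC[job][machine]
    return max(load)

def acs_seed(n, greediness=0.8):
    """Stage 1: ant-colony-style construction, simplified here to a
    greedy/random rule standing in for pheromone-guided choice."""
    seeds = []
    for _ in range(n):
        sched, load = [], [0.0] * N_MACHINES
        for job in range(N_JOBS):
            if random.random() < greediness:
                m = min(range(N_MACHINES), key=lambda k: load[k] + ETC[job][k])
            else:
                m = random.randrange(N_MACHINES)
            sched.append(m)
            load[m] += ETC[job][m]
        seeds.append(sched)
    return seeds

def ga_refine(pop, generations=50):
    """Stage 2: GA refines the ACS-seeded population (one-point
    crossover, point mutation, elitist selection)."""
    for _ in range(generations):
        pop.sort(key=makespan)
        parents = pop[: len(pop) // 2]   # keep the fitter half
        children = []
        while len(children) < len(pop) - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, N_JOBS)
            child = a[:cut] + b[cut:]    # one-point crossover
            if random.random() < 0.2:    # point mutation
                child[random.randrange(N_JOBS)] = random.randrange(N_MACHINES)
            children.append(child)
        pop = parents + children
    return min(pop, key=makespan)

best = ga_refine(acs_seed(20))
print("schedule:", best, "makespan:", makespan(best))
```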

    On the Integrated Job Scheduling and Constrained Network Routing Problem

    This paper examines the NP-hard problem of scheduling a number of jobs on a finite set of machines such that the overall profit of executed jobs is maximized. Each job demands a number of resources, which must be sent to the executing machine via constrained paths. Furthermore, two resource demand transmissions cannot use the same edge in the same time period. An exact solution approach based on Dantzig-Wolfe decomposition is proposed, along with several heuristics. The methods are computationally evaluated on test instances arising from telecommunications, with up to 500 jobs and 500 machines. Results show that solving the problem to optimality is very difficult. The proposed heuristics perform well, with an average solution value gap of 3% and very small running times.
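    As a hedged illustration of the problem structure only (the abstract does not describe the authors' heuristics, so this is a plain greedy stand-in rather than the Dantzig-Wolfe method): jobs are taken in descending profit order, and each job's resource transmission is routed over a path whose edges are unused in the chosen time period. The graph, jobs, profits, and time horizon below are all invented.

```python
from collections import deque

# Undirected network edges; resources originate at source "s".
EDGES = {("s", "a"), ("a", "m1"), ("s", "b"), ("b", "m1"), ("b", "m2")}

def neighbors(node):
    for u, v in EDGES:
        if u == node:
            yield v, (u, v)
        if v == node:
            yield u, (u, v)

def free_path(src, dst, used):
    """BFS for a path from src to dst avoiding edges already
    occupied in this time period."""
    queue, seen = deque([(src, [])]), {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for nxt, edge in neighbors(node):
            if edge not in used and nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [edge]))
    return None

jobs = [("j1", 9, "m1"), ("j2", 7, "m1"), ("j3", 5, "m2")]  # (job, profit, machine)
used_per_slot = {}   # time period -> set of occupied edges
total_profit = 0
for name, profit, machine in sorted(jobs, key=lambda j: -j[1]):
    for slot in range(3):  # small horizon of time periods
        used = used_per_slot.setdefault(slot, set())
        path = free_path("s", machine, used)
        if path:
            used.update(path)   # edges become busy for this period
            total_profit += profit
            print(f"{name} -> {machine} in period {slot} via {path}")
            break
print("profit:", total_profit)
```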

    Parallel I/O scheduling in the presence of data duplication on multiprogrammed cluster computing systems

    The widespread adoption of cluster computing as a high performance computing platform has seen the growth of data intensive scientific, engineering and commercial applications such as digital libraries, climate modeling, computational chemistry, computational fluid dynamics and image repositories. However, I/O subsystem performance has not been keeping pace with processor and memory performance, and is fast becoming the dominant factor in overall system performance. Thus, parallel I/O has become a necessity in the face of performance improvements in other areas of computing systems. This paper addresses the problem of parallel I/O scheduling on cluster computing systems in the presence of data replication. We propose two new I/O scheduling algorithms and evaluate the relative performance of the proposed policies against two existing approaches. Simulation results show that the proposed policies perform substantially better than the baseline policies.
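    The abstract does not spell out the two proposed policies, so the following sketch only illustrates the general idea that replication gives an I/O scheduler freedom: a replica-aware policy sends each read request to the least-loaded server holding a copy, while a replication-oblivious baseline always uses the primary copy. File placement and service times are hypothetical.

```python
# Map each file to the I/O servers holding a replica (primary first).
replicas = {"fileA": ["io1", "io2"], "fileB": ["io2", "io3"]}
pending = {"io1": 0.0, "io2": 0.0, "io3": 0.0}  # queued service time

def schedule(requests, replica_aware=True):
    """Assign each (file, service_time) request to a server; return
    the I/O makespan (finish time of the busiest server)."""
    load = dict(pending)
    for fname, service_time in requests:
        servers = replicas[fname] if replica_aware else replicas[fname][:1]
        target = min(servers, key=load.__getitem__)  # least pending work
        load[target] += service_time
    return max(load.values())

reqs = [("fileA", 2.0), ("fileA", 2.0), ("fileB", 3.0), ("fileB", 1.0)]
print("replica-aware:", schedule(reqs, True))    # spreads load -> 3.0
print("primary-only :", schedule(reqs, False))   # hotspots io1/io2 -> 4.0
```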

    Survey on job scheduling mechanisms in grid environment

    Grid systems provide geographically distributed resources for both computation-intensive and data-intensive applications. These applications generate large data sets. However, the high latency imposed by the underlying technologies on which the grid system is built (such as the Internet and WWW) impedes effective access to such huge and widely distributed data. To minimize this impediment, jobs need to be scheduled across grid environments to achieve efficient data access. Scheduling multiple data requests submitted by grid users onto the grid environment is NP-hard; thus, there is no single best scheduling algorithm that cuts across all grid computing environments. Job scheduling is one of the key research areas in grid computing, and in the recent past many researchers have proposed different mechanisms to help schedule user jobs in grid systems. Characteristic features of the grid components, such as machine types and the nature of the jobs at hand, mean that an appropriate scheduling algorithm must be chosen to match a given grid environment. The aim of scheduling is to achieve the maximum possible system throughput and to match application needs with the available computing resources. This paper is motivated by the need to explore the various job scheduling techniques alongside their areas of implementation. It systematically analyzes the strengths and weaknesses of selected approaches to grid job scheduling. This helps researchers better understand the concept of scheduling, can contribute to developing more efficient and practical scheduling algorithms, and will benefit interested researchers who wish to carry out further work in this dynamic research area.

    QoS-aware predictive workflow scheduling

    This research lays the basis of QoS-aware predictive workflow scheduling. Its novel contributions open up prospects for future research on handling complex, big workflow applications with high uncertainty and dynamism. Results from the proposed workflow scheduling algorithm show significant improvement in the performance and reliability of the workflow applications.

    Replica Creation Algorithm for Data Grids

    A data grid system is a data management infrastructure that facilitates reliable access to and sharing of large amounts of data, storage resources, and data transfer services that can be scaled across distributed locations. This thesis presents a new replication algorithm that improves data access performance in data grids by distributing relevant data copies around the grid. The new Data Replica Creation Algorithm (DRCM) improves the performance of data grid systems by reducing job execution time and making the best use of data grid resources (network bandwidth and storage space). Current algorithms focus on the number of accesses when deciding which files to replicate and where to place them, which ignores resources' capabilities. DRCM differs by considering both user and resource perspectives, strategically placing replicas at the locations that provide the lowest transfer cost. The proposed algorithm uses three strategies: a Replica Creation and Deletion Strategy (RCDS), a Replica Placement Strategy (RPS), and a Replica Replacement Strategy (RRS). DRCM was evaluated using network simulation (OptorSim) with selected performance metrics (mean job execution time, effective network usage, average storage usage, and computing element usage), scenarios, and topologies. Results revealed better job execution time with lower resource consumption than existing approaches. This research contributes replication strategies embodied in one algorithm that enhances data grid performance and is capable of deciding to create or delete more than one file within the same decision. Furthermore, a dependency-level-between-files criterion was utilized and integrated with an exponential growth/decay model to give an accurate file evaluation.
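    A hedged sketch of two ideas named in the abstract, with invented costs and a generic decay formula standing in for DRCM's actual rules: replica placement at the candidate site with the lowest total transfer cost to its requesters, and file evaluation with an exponential decay so that recent accesses weigh more than old ones.

```python
import math

# cost[a][b]: transfer cost between sites (e.g. proportional to
# 1/bandwidth); values are invented for illustration.
cost = {
    "s1": {"s1": 0, "s2": 4, "s3": 1},
    "s2": {"s1": 4, "s2": 0, "s3": 2},
    "s3": {"s1": 1, "s2": 2, "s3": 0},
}

def best_placement(requesters, candidates):
    """Pick the candidate site minimizing the total transfer cost to
    the sites that request the file."""
    return min(candidates, key=lambda c: sum(cost[c][r] for r in requesters))

def file_value(access_times, now, decay=0.1):
    """Exponentially decayed access count: recent accesses count for
    more, old ones fade toward zero."""
    return sum(math.exp(-decay * (now - t)) for t in access_times)

print(best_placement(requesters=["s1", "s2"], candidates=["s2", "s3"]))  # -> s3
print(round(file_value([1.0, 50.0, 99.0], now=100.0), 3))                # ~0.912
```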