300 research outputs found

    Heterogeneous Strong Computation Migration

    Full text link
    The continuous increase in performance requirements, for both scientific computation and industry, motivates the need of a powerful computing infrastructure. The Grid appeared as a solution for inexpensive execution of heavy applications in a parallel and distributed manner. It allows combining resources independently of their physical location and architecture to form a global resource pool available to all grid users. However, grid environments are highly unstable and unpredictable. Adaptability is a crucial issue in this context, in order to guarantee an appropriate quality of service to users. Migration is a technique frequently used for achieving adaptation. The objective of this report is to survey the problem of strong migration in heterogeneous environments like the grids', the related implementation issues and the current solutions.Comment: This is the pre-peer reviewed version of the following article: Milan\'es, A., Rodriguez, N. and Schulze, B. (2008), State of the art in heterogeneous strong migration of computations. Concurrency and Computation: Practice and Experience, 20: 1485-1508, which has been published in final form at http://onlinelibrary.wiley.com/doi/10.1002/cpe.1287/abstrac

    A Preemption-Based Meta-Scheduling System for Distributed Computing

    Get PDF
    This research aims at designing and building a scheduling framework for distributed computing systems with the primary objectives of providing fast response times to the users, delivering high system throughput and accommodating maximum number of applications into the systems. The author claims that the above mentioned objectives are the most important objectives for scheduling in recent distributed computing systems, especially Grid computing environments. In order to achieve the objectives of the scheduling framework, the scheduler employs arbitration of application-level schedules and preemption of executing jobs under certain conditions. In application-level scheduling, the user develops a schedule for his application using an execution model that simulates the execution behavior of the application. Since application-level scheduling can seriously impede the performance of the system, the scheduling framework developed in this research arbitrates between different application-level schedules corresponding to different applications to provide fair system usage for all applications and balance the interests of different applications. In this sense, the scheduling framework is not a classical scheduling system, but a meta-scheduling system that interacts with the application-level schedulers. Due to the large system dynamics involved in Grid computing systems, the ability to preempt executing jobs becomes a necessity. The meta-scheduler described in this dissertation employs well defined scheduling policies to preempt and migrate executing applications. In order to provide the users with the capability to make their applications preemptible, a user-level check-pointing library called SRS (Stop-Restart Software) was also developed by this research. The SRS library is different from many user-level check-pointing libraries since it allows reconfiguration of applications between migrations. This reconfiguration can be achieved by changing the processor configuration and/or data distribution. The experimental results provided in this dissertation demonstrates the utility of the metascheduling framework for distributed computing systems. And lastly, the metascheduling framework was put to practical use by building a Grid computing system called GradSolve. GradSolve is a flexible system and it allows the application library writers to upload applications with different capabilities into the system. GradSolve is also unique with respect to maintaining traces of the execution of the applications and using the traces for subsequent executions of the application

    Trace-Driven Simulation for Energy Consumption in High Throughput Computing Systems

    Get PDF
    High Throughput Computing (HTC) is a powerful paradigm allowing vast quantities of independent work to be performed simultaneously. However, until recently little evaluation has been performed on the energy impact of HTC. Many organisations now seek to minimise energy consumption across their IT infrastructure though it is unclear how this will affect the usability of HTC systems. We present here HTC-Sim, a simulation system which allows the evaluation of different energy reduction policies across an HTC system comprising a collection of computational resources dedicated to HTC work and resources provided through cycle scavenging -- a Desktop Grid. We demonstrate that our simulation software scales linearly with increasing HTC workload

    Design and implementation of a multi-agent opportunistic grid computing platform

    Get PDF
    Opportunistic Grid Computing involves joining idle computing resources in enterprises into a converged high performance commodity infrastructure. The research described in this dissertation investigates the viability of public resource computing in offering a plethora of possibilities through seamless access to shared compute and storage resources. The research proposes and conceptualizes the Multi-Agent Opportunistic Grid (MAOG) solution in an Information and Communication Technologies for Development (ICT4D) initiative to address some limitations prevalent in traditional distributed system implementations. Proof-of-concept software components based on JADE (Java Agent Development Framework) validated Multi-Agent Systems (MAS) as an important tool for provisioning of Opportunistic Grid Computing platforms. Exploration of agent technologies within the research context identified two key components which improve access to extended computer capabilities. The first component is a Mobile Agent (MA) compute component in which a group of agents interact to pool shared processor cycles. The compute component integrates dynamic resource identification and allocation strategies by incorporating the Contract Net Protocol (CNP) and rule based reasoning concepts. The second service is a MAS based storage component realized through disk mirroring and Google file-system’s chunking with atomic append storage techniques. This research provides a candidate Opportunistic Grid Computing platform design and implementation through the use of MAS. Experiments conducted validated the design and implementation of the compute and storage services. From results, support for processing user applications; resource identification and allocation; and rule based reasoning validated the MA compute component. A MAS based file-system that implements chunking optimizations was considered to be optimum based on evaluations. The findings from the undertaken experiments also validated the functional adequacy of the implementation, and show the suitability of MAS for provisioning of robust, autonomous, and intelligent platforms. The context of this research, ICT4D, provides a solution to optimizing and increasing the utilization of computing resources that are usually idle in these contexts

    Operating policies for energy efficient large scale computing

    Get PDF
    PhD ThesisEnergy costs now dominate IT infrastructure total cost of ownership, with datacentre operators predicted to spend more on energy than hardware infrastructure in the next five years. With Western European datacentre power consumption estimated at 56 TWh/year in 2007 and projected to double by 2020, improvements in energy efficiency of IT operations is imperative. The issue is further compounded by social and political factors and strict environmental legislation governing organisations. One such example of large IT systems includes high-throughput cycle stealing distributed systems such as HTCondor and BOINC, which allow organisations to leverage spare capacity on existing infrastructure to undertake valuable computation. As a consequence of increased scrutiny of the energy impact of these systems, aggressive power management policies are often employed to reduce the energy impact of institutional clusters, but in doing so these policies severely restrict the computational resources available for high-throughput systems. These policies are often configured to quickly transition servers and end-user cluster machines into low power states after only short idle periods, further compounding the issue of reliability. In this thesis, we evaluate operating policies for energy efficiency in large-scale computing environments by means of trace-driven discrete event simulation, leveraging real-world workload traces collected within Newcastle University. The major contributions of this thesis are as follows: i) Evaluation of novel energy efficient management policies for a decentralised peer-to-peer (P2P) BitTorrent environment. ii) Introduce a novel simulation environment for the evaluation of energy efficiency of large scale high-throughput computing systems, and propose a generalisable model of energy consumption in high-throughput computing systems. iii iii) Proposal and evaluation of resource allocation strategies for energy consumption in high-throughput computing systems for a real workload. iv) Proposal and evaluation for a realworkload ofmechanisms to reduce wasted task execution within high-throughput computing systems to reduce energy consumption. v) Evaluation of the impact of fault tolerance mechanisms on energy consumption

    ASCR/HEP Exascale Requirements Review Report

    Full text link
    This draft report summarizes and details the findings, results, and recommendations derived from the ASCR/HEP Exascale Requirements Review meeting held in June, 2015. The main conclusions are as follows. 1) Larger, more capable computing and data facilities are needed to support HEP science goals in all three frontiers: Energy, Intensity, and Cosmic. The expected scale of the demand at the 2025 timescale is at least two orders of magnitude -- and in some cases greater -- than that available currently. 2) The growth rate of data produced by simulations is overwhelming the current ability, of both facilities and researchers, to store and analyze it. Additional resources and new techniques for data analysis are urgently needed. 3) Data rates and volumes from HEP experimental facilities are also straining the ability to store and analyze large and complex data volumes. Appropriately configured leadership-class facilities can play a transformational role in enabling scientific discovery from these datasets. 4) A close integration of HPC simulation and data analysis will aid greatly in interpreting results from HEP experiments. Such an integration will minimize data movement and facilitate interdependent workflows. 5) Long-range planning between HEP and ASCR will be required to meet HEP's research needs. To best use ASCR HPC resources the experimental HEP program needs a) an established long-term plan for access to ASCR computational and data resources, b) an ability to map workflows onto HPC resources, c) the ability for ASCR facilities to accommodate workflows run by collaborations that can have thousands of individual members, d) to transition codes to the next-generation HPC platforms that will be available at ASCR facilities, e) to build up and train a workforce capable of developing and using simulations and analysis to support HEP scientific research on next-generation systems.Comment: 77 pages, 13 Figures; draft report, subject to further revisio

    Contributions to Desktop Grid Computing : From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures

    Get PDF
    Since the mid 90’s, Desktop Grid Computing - i.e the idea of using a large number of remote PCs distributed on the Internet to execute large parallel applications - has proved to be an efficient paradigm to provide a large computational power at the fraction of the cost of a dedicated computing infrastructure.This document presents my contributions over the last decade to broaden the scope of Desktop Grid Computing. My research has followed three different directions. The first direction has established new methods to observe and characterize Desktop Grid resources and developed experimental platforms to test and validate our approach in conditions close to reality. The second line of research has focused on integrating Desk- top Grids in e-science Grid infrastructure (e.g. EGI), which requires to address many challenges such as security, scheduling, quality of service, and more. The third direction has investigated how to support large-scale data management and data intensive applica- tions on such infrastructures, including support for the new and emerging data-oriented programming models.This manuscript not only reports on the scientific achievements and the technologies developed to support our objectives, but also on the international collaborations and projects I have been involved in, as well as the scientific mentoring which motivates my candidature for the Habilitation `a Diriger les Recherches

    Big Data and Large-scale Data Analytics: Efficiency of Sustainable Scalability and Security of Centralized Clouds and Edge Deployment Architectures

    Get PDF
    One of the significant shifts of the next-generation computing technologies will certainly be in the development of Big Data (BD) deployment architectures. Apache Hadoop, the BD landmark, evolved as a widely deployed BD operating system. Its new features include federation structure and many associated frameworks, which provide Hadoop 3.x with the maturity to serve different markets. This dissertation addresses two leading issues involved in exploiting BD and large-scale data analytics realm using the Hadoop platform. Namely, (i)Scalability that directly affects the system performance and overall throughput using portable Docker containers. (ii) Security that spread the adoption of data protection practices among practitioners using access controls. An Enhanced Mapreduce Environment (EME), OPportunistic and Elastic Resource Allocation (OPERA) scheduler, BD Federation Access Broker (BDFAB), and a Secure Intelligent Transportation System (SITS) of multi-tiers architecture for data streaming to the cloud computing are the main contribution of this thesis study
    • …
    corecore