
    Contributions to Desktop Grid Computing: From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures

    Since the mid-1990s, Desktop Grid Computing, i.e., the idea of using a large number of remote PCs distributed over the Internet to execute large parallel applications, has proved to be an efficient paradigm for providing large computational power at a fraction of the cost of a dedicated computing infrastructure. This document presents my contributions over the last decade to broadening the scope of Desktop Grid Computing. My research has followed three directions. The first established new methods to observe and characterize Desktop Grid resources and developed experimental platforms to test and validate our approach under conditions close to reality. The second focused on integrating Desktop Grids into e-science Grid infrastructures (e.g., EGI), which requires addressing many challenges such as security, scheduling, and quality of service. The third investigated how to support large-scale data management and data-intensive applications on such infrastructures, including support for new and emerging data-oriented programming models. This manuscript reports not only on the scientific achievements and the technologies developed to support our objectives, but also on the international collaborations and projects I have been involved in, as well as the scientific mentoring that motivates my candidature for the Habilitation à Diriger des Recherches.
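
    As an illustration of the paradigm, the following is a minimal sketch of the work-pull model on which Desktop Grid systems are built, using an in-process task queue as a stand-in for real middleware; all names are hypothetical and not taken from BOINC, XtremWeb, or any other system.

```python
# Minimal sketch of the work-pull model behind Desktop Grid Computing:
# volunteer workers repeatedly fetch independent tasks and return results.
# The in-process queue stands in for a real middleware server.
import queue
import threading

tasks = queue.Queue()
results = queue.Queue()

def volunteer_worker(worker_id: int) -> None:
    """One remote PC: pull a task, compute, push the result, repeat."""
    while True:
        try:
            x = tasks.get(timeout=1)        # fetch the next independent work unit
        except queue.Empty:
            return                          # no work left; the volunteer goes idle
        results.put((worker_id, x, x * x))  # stand-in for a real computation
        tasks.task_done()

for x in range(100):                        # a bag of 100 independent tasks
    tasks.put(x)

workers = [threading.Thread(target=volunteer_worker, args=(i,)) for i in range(8)]
for w in workers:
    w.start()
tasks.join()                                # wait until every task is processed
print(f"collected {results.qsize()} results from {len(workers)} volunteers")
```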

    High-Performance Techniques Applied to the Design of Complex Railway Infrastructures

    In this work we focus on the overhead air switch design problem. The design of railway infrastructures is an important problem in the railway world; non-optimal designs limit train speed and, more importantly, cause malfunctions and breakages. Most railway companies have regulations for the design of these elements. Those regulations have been defined through experience, but, as far as we know, there are no computerized software tools that assist with the task of designing and testing optimal solutions for overhead switches. The aim of this thesis is the design, implementation, and evaluation of a simulator that facilitates the exploration of the entire solution space, looking for the set of optimal solutions in the shortest time and at the lowest possible cost. Simulators are frequently used in the world of rail infrastructure. Many of them focus only on simulating scenarios predefined by the users, analyzing the feasibility of the proposed design. Throughout this thesis, we propose a framework for designing a complete simulator that is able to propose, simulate, and evaluate multiple solutions. This framework is based on four pillars: a compromise between simulation accuracy and complexity; automatic generation of possible solutions (automatic exploration of the solution space); consideration of all the actors involved in the design process (standards, additional restrictions, etc.); and, finally, the expert's knowledge and the integration of optimization metrics. Once the framework is defined, different deployment approaches are presented: one to be run on a single node, and one on a distributed system. In the first paradigm, one thread is launched per CPU available in the system, and the simulator is designed entirely around this parallelism paradigm. The second approach is designed to be deployed on a cluster with several nodes, using MPI. Finally, after implementing each approach, we evaluate the performance of each of them through a comparison of time and cost, using two examples of real scenarios.
    [Translated from Spanish:] The design of overhead air switches is a rather complex and critical problem within the railway system design process. A non-optimal design can limit service, for instance by lowering transit speed, and, more importantly, can be a leading cause of accidents and breakdowns. Most railway companies have regulations for the correct design of these overhead switches. All these regulations have been defined over decades of experience, but, as far as we know, there are no software applications that assist in the task of designing and testing optimal solutions. This is the focus of this thesis: the design, implementation, and evaluation of a simulator capable of exploring the entire solution space, searching for the set of optimal solutions in the shortest time and at the lowest possible cost. Simulators are frequently used in the world of railway infrastructure; many of them focus only on simulating scenarios predefined by the user, analyzing whether the proposed design is feasible. Throughout this thesis, a framework is proposed that enables the final simulator to propose, simulate, and evaluate multiple solutions. The framework is based on four fundamental pillars: a compromise between simulation accuracy and simulator complexity; automatic generation of possible solutions (automatic exploration of the solution space); consideration of all the agents involved in the design process (standards, additional restrictions, etc.); and, finally, expert knowledge and the integration of optimization metrics. Once the framework is defined, several implementation options for the simulator are presented. In the first, a pure multithreaded version is designed and implemented, launching one thread per CPU available in the system; the whole simulator is designed around this parallelism paradigm. In a second simulator, a paradigm much better suited to deployment on a cluster rather than a single node is applied, using MPI; with this version the simulator can be adapted to the cluster on which it will run. Finally, a cloud computing paradigm is employed, in which more or fewer virtual machines are used depending on the needs of the scenario to be simulated. After implementing each of the simulators, the performance of each is evaluated through a comparison of time and cost, using two examples of real scenarios. Official Doctoral Programme in Computer Science and Technology. Committee: José Daniel García Sánchez (chair), Antonio García Dopico (secretary), Juan Carlos Díaz Martí (member).
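
    The parallel exploration described above can be illustrated with a short sketch: it enumerates candidate designs and evaluates them with one worker per available CPU. The design parameters (span, stagger) and the cost function are invented placeholders rather than the thesis' actual railway model, and processes are used instead of threads since CPython threads do not parallelize CPU-bound work.

```python
# Sketch of per-core exploration of a discretized solution space: generate
# candidate switch designs, evaluate each in parallel, and keep the best.
import os
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def evaluate(design):
    """Placeholder cost: penalize deviation from a nominal geometry."""
    span, stagger = design
    return (span - 55.0) ** 2 + (stagger - 0.2) ** 2

if __name__ == "__main__":
    # Automatic exploration of the (discretized) solution space.
    spans = [50.0 + 0.5 * i for i in range(40)]
    staggers = [0.1 + 0.01 * j for j in range(30)]
    candidates = list(product(spans, staggers))

    workers = os.cpu_count()  # one worker per available CPU, as in the thesis
    with ProcessPoolExecutor(max_workers=workers) as pool:
        costs = list(pool.map(evaluate, candidates, chunksize=64))

    best_cost, best_design = min(zip(costs, candidates))
    print(f"best design {best_design} with cost {best_cost:.4f}")
```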

    Overview and Evaluation of Conceptual Strategies for Accessing CPU-Dependent Execution Resources in Grid Infrastructures

    The emergence of many-core and massively parallel computational accelerators (e.g., GPGPUs) has led to user demand for such resources in grid infrastructures. A widely adopted approach for discovering and accessing such resources has, however, yet to emerge. GPGPUs are an example of a larger class of computational resources characterized in part by their dependence on an allocated CPU; this paper terms such resources "CPU-Dependent Execution Resources" (CDERs). Five conceptual strategies for discovering and accessing CDERs are described and evaluated against key criteria; all five strategies are compliant with GLUE 1.3, GLUE 2.0, or both. From this evaluation, two of the presented strategies clearly emerge as providing the greatest flexibility for publishing both static and dynamic CDER information and for identifying CDERs that satisfy specific job requirements. Furthermore, a two-phase approach to job submission is proposed for jobs requiring access to CDERs; the approach is compatible with existing grid services. Examples are provided to illustrate job submission under each strategy.
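
    A hedged sketch of the two-phase idea follows: phase one filters sites on published static and dynamic CDER information, and phase two submits the job to a matching site. The attribute names below are invented stand-ins, not actual GLUE 1.3/2.0 schema keys.

```python
# Illustrative two-phase CDER access: (1) discovery against published
# attributes, (2) submission to a matching site. All names are hypothetical.
published = [
    {"site": "siteA", "cder_type": "GPGPU", "free_slots": 4, "memory_gb": 12},
    {"site": "siteB", "cder_type": "GPGPU", "free_slots": 0, "memory_gb": 24},
]

def discover(requirements):
    """Phase 1: filter sites on static and dynamic CDER information."""
    return [
        r["site"]
        for r in published
        if r["cder_type"] == requirements["cder_type"]
        and r["free_slots"] >= requirements["slots"]          # dynamic info
        and r["memory_gb"] >= requirements["min_memory_gb"]   # static info
    ]

job = {"cder_type": "GPGPU", "slots": 1, "min_memory_gb": 8}
matches = discover(job)
if matches:
    print(f"phase 2: submit job to {matches[0]}")  # stand-in for real submission
```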

    Economic-based Distributed Resource Management and Scheduling for Grid Computing

    Computational Grids, emerging as an infrastructure for next-generation computing, enable the sharing, selection, and aggregation of geographically distributed resources for solving large-scale problems in science, engineering, and commerce. The resources in the Grid are heterogeneous and geographically distributed, with varying availability, a variety of usage and cost policies for diverse users at different times, and priorities and goals that vary with time. The management of resources and application scheduling in such a large and distributed environment is a complex task. This thesis proposes a distributed computational economy as an effective metaphor for the management of resources and application scheduling. It proposes an architectural framework that supports resource trading and quality-of-service-based scheduling. It enables the regulation of supply and demand for resources, provides an incentive for resource owners to participate in the Grid, and motivates users to trade off between the deadline, budget, and required level of quality of service. The thesis demonstrates the capability of economic-based systems for peer-to-peer distributed computing by developing scheduling strategies and algorithms driven by users' quality-of-service requirements, and demonstrates their effectiveness by performing scheduling experiments on the World-Wide Grid for solving parameter sweep applications.
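
    The deadline/budget trade-off can be made concrete with a small sketch of a cost-minimizing scheduling loop in the spirit of a computational economy; the resource prices and speeds are made-up numbers, and this is a sketch rather than the thesis' actual algorithm.

```python
# Deadline- and budget-constrained scheduling sketch: greedily assign each
# job to the cheapest resource that can still meet the deadline and budget.
from dataclasses import dataclass

@dataclass
class Resource:
    name: str
    cost_per_job: float   # price charged by the resource owner
    time_per_job: float   # seconds per job on this resource
    busy_until: float = 0.0

def schedule(n_jobs, resources, deadline, budget):
    """Cost-optimization strategy: cheapest feasible resource first."""
    plan, spent = [], 0.0
    for job in range(n_jobs):
        feasible = [r for r in resources
                    if r.busy_until + r.time_per_job <= deadline
                    and spent + r.cost_per_job <= budget]
        if not feasible:
            raise RuntimeError(f"job {job}: deadline/budget cannot be met")
        r = min(feasible, key=lambda r: r.cost_per_job)
        r.busy_until += r.time_per_job
        spent += r.cost_per_job
        plan.append((job, r.name))
    return plan, spent

pool = [Resource("cheap_slow", 1.0, 30.0), Resource("dear_fast", 4.0, 5.0)]
plan, cost = schedule(n_jobs=10, resources=pool, deadline=120.0, budget=30.0)
print(f"scheduled {len(plan)} jobs for a total cost of {cost}")
```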

    A Policy-Based Resource Brokering Environment for Computational Grids

    With the advances in networking infrastructure in general, and the Internet in particular, we can build grid environments that allow users to utilize a diverse set of distributed and heterogeneous resources. Since the focus of such environments is the efficient usage of the underlying resources, a critical component is the resource brokering environment that mediates the discovery, access, and usage of these resources. Given the consumer's constraints, the provider's rules, distributed heterogeneous resources, and the large number of scheduling choices, the resource brokering environment needs to decide where to place the user's jobs and when to start their execution in a way that yields the best performance for the user and the best utilization for the resource provider. As brokering and scheduling are very complicated tasks, most current resource brokering environments are either specific to a particular grid environment or have limited features, which makes them unsuitable for large applications with heterogeneous requirements. In addition, most of these resource brokering environments lack flexibility: policies at the resource, application, and system levels cannot be specified and enforced to provide commitment to a guaranteed level of allocation that could help attract grid users and contribute to establishing credibility for existing grid environments. In this thesis, we propose and prototype a flexible and extensible Policy-based Resource Brokering Environment (PROBE) that can be utilized by various grid systems. In designing PROBE, we follow a policy-based approach that gives PROBE the intelligence not only to match the user's request with the right set of resources, but also to assure the guaranteed level of allocation. PROBE treats task allocation as a Service Level Agreement (SLA) that needs to be enforced between the resource provider and the resource consumer. The policy-based framework is useful in a typical grid environment where resources, most of the time, are not dedicated. In implementing PROBE, we have utilized a layered architecture and façade design patterns. These, along with a well-defined API, make the framework independent of any architecture and allow for the incorporation of different types of scheduling algorithms, applications, and platform adaptors as the underlying environment requires. We have utilized XML as a base for all specification needs, which provides a flexible mechanism to specify the heterogeneous resources and users' requests along with their allocation constraints. We have developed XML-based specifications by which the high-level internal structures of resources, jobs, and policies can be specified. This provides interoperability, in which a grid system can utilize PROBE to discover and use resources controlled by other grid systems. We have implemented a prototype of PROBE to demonstrate its feasibility. We also describe a test bed environment and the evaluation experiments that we have conducted to demonstrate the usefulness and effectiveness of our approach.
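
    As a rough illustration of the XML-based specification idea, the sketch below parses a hypothetical policy document and checks a request against the provider's limits before an SLA would be committed; the element and attribute names are invented, not PROBE's actual schema.

```python
# Parse a hypothetical XML policy and enforce the provider's rules.
import xml.etree.ElementTree as ET

POLICY_XML = """
<policy resource="clusterA">
  <allocation max-cpus="64" max-hours="48"/>
  <sla guaranteed-slots="8"/>
</policy>
"""

def admit(request_cpus: int, request_hours: int) -> bool:
    """Check a request against the provider's allocation limits."""
    policy = ET.fromstring(POLICY_XML)
    alloc = policy.find("allocation")
    return (request_cpus <= int(alloc.get("max-cpus"))
            and request_hours <= int(alloc.get("max-hours")))

print(admit(32, 24))   # True: within the provider's limits
print(admit(128, 24))  # False: exceeds max-cpus, allocation cannot be guaranteed
```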

    Enabling Scalability: Graph Hierarchies and Fault Tolerance

    In this dissertation, we explore approaches to two techniques for building scalable algorithms. First, we look at different graph problems and show how to exploit the input graph's inherent hierarchy to obtain scalable graph algorithms. The second technique takes a step back from concrete algorithmic problems: here, we consider the case of node failures in large distributed systems and present techniques to quickly recover from them. In the first part of the dissertation, we investigate how hierarchies in graphs can be used to scale algorithms to large inputs. We develop algorithms for three graph problems based on two approaches to building hierarchies. The first approach reduces instance sizes for NP-hard problems by applying so-called reduction rules. These rules can be applied in polynomial time; they either find parts of the input that can be solved in polynomial time, or they identify structures that can be contracted (reduced) into smaller structures without loss of information for the specific problem. After solving the reduced instance using an exponential-time algorithm, the previously contracted structures can be uncontracted to obtain an exact solution for the original input. In addition to serving as a simple preprocessing procedure, reduction rules can also be used in branch-and-reduce algorithms, where they are applied successively after each branching step to build a hierarchy of problem kernels of increasing computational hardness; a minimal example of such a rule is sketched below. We develop reduction-based algorithms for the classical NP-hard problems Maximum Independent Set and Maximum Cut. The second approach is used for route planning in road networks, where we build a hierarchy of road segments based on their importance for long-distance shortest paths. By only considering important road segments when far away from the source and destination, we can substantially speed up shortest-path queries. In the second part of this dissertation, we take a step back from concrete graph problems and look at more general problems in high-performance computing (HPC). Due to the ever-increasing size and complexity of HPC clusters, we expect hardware and software failures to become more common in massively parallel computations. We present two techniques that allow applications to recover from failures and resume computation. Both techniques are based on in-memory storage of redundant information and a data distribution that enables fast recovery. The first technique can be used in general-purpose distributed processing frameworks: we identify data that is redundantly available on multiple machines and introduce additional work only for the remaining data that is available on a single machine. The second technique is a checkpointing library engineered for fast recovery, using a data distribution method that achieves balanced communication loads. Both techniques work in settings where computation after a failure continues with fewer machines than before; this is in contrast to many previous approaches, in particular for checkpointing, that focus on systems which keep spare resources available to replace failed machines. Overall, we present different techniques that enable scalable algorithms. While some of these techniques are specific to graph problems, we also present tools for fault-tolerant algorithms and applications in a distributed setting. To show that these can be helpful in many different domains, we evaluate them on graph problems and on other applications such as phylogenetic tree inference.
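
    The sketch below shows a classic reduction rule of the kind described above, not the dissertation's full branch-and-reduce machinery: for Maximum Independent Set, a degree-1 vertex can always be taken into the solution and its neighbor deleted, shrinking the instance without losing optimality.

```python
# Degree-1 reduction for Maximum Independent Set: a vertex with exactly one
# neighbor joins some optimal solution, so force it in and delete both.
def reduce_degree_one(adj):
    """Repeatedly apply the degree-1 rule; return (forced vertices, kernel)."""
    adj = {v: set(ns) for v, ns in adj.items()}
    forced = []
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in adj and len(adj[v]) == 1:
                (u,) = adj[v]
                forced.append(v)            # v joins the solution
                for w in (u, v):            # delete v and its neighbor u
                    for x in adj.pop(w):
                        adj[x].discard(w)
                changed = True
    return forced, adj  # the remaining kernel is solved exactly later

graph = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}  # a path on four vertices
forced, kernel = reduce_degree_one(graph)
print(forced, kernel)  # [1, 3] and an empty kernel: the rule solved it all
```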

    Advances in Grid Computing

    This book approaches grid computing with a perspective on the latest achievements in the field, providing insight into current research trends and advances and presenting a large range of innovative research papers. The topics covered include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence or genetic algorithms, as well as quantum encryption, are considered in order to address the two main aspects of grid computing: resource management and data management. The book also covers aspects of grid computing that concern architecture and development, and includes a diverse range of grid computing applications, including a possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning, and complex water systems.

    Monitoring and Optimization of ATLAS Tier 2 Center GoeGrid

    The demand on computational and storage resources is growing along with the amount of information that needs to be processed and preserved. To ease the provisioning of digital services to the growing number of consumers, more and more distributed computing systems and platforms are actively developed and employed. The building blocks of such distributed computing infrastructures are individual computing centers, such as GoeGrid, a Tier 2 center in the Worldwide LHC Computing Grid. The main motivation of this thesis was the optimization of GoeGrid performance through efficient monitoring, achieved by analyzing GoeGrid monitoring information. The data analysis approach was based on the adaptive-network-based fuzzy inference system (ANFIS) and a machine learning algorithm, the linear Support Vector Machine (SVM). The main object of the research was the digital service, since the availability, reliability, and serviceability of the computing platform can be measured by the constant and stable provisioning of its services. Given the widely used concept of service-oriented architecture (SOA) for large computing facilities, knowing the service state in advance, as well as quickly and accurately detecting its unavailability, enables proactive management of the computing facility. Proactive management is considered a core component of computing facility management automation concepts such as Autonomic Computing. Thus, timely, anticipatory, and accurate identification of the provided service status can be considered a contribution to computing facility management automation, which is directly related to the provisioning of stable and reliable computing resources. Based on case studies performed on the GoeGrid monitoring data, it is reasonable to consider these approaches as generalized methods for the accurate and fast identification and prediction of service status. Their simplicity and low consumption of computing resources make the methods suitable in the scope of an Autonomic Computing component.
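
    The SVM side of the analysis can be illustrated with a short sketch that trains a linear SVM on synthetic monitoring metrics to label a service as available or degraded; the features and data are invented (the thesis uses real GoeGrid monitoring data), and the ANFIS component is not reproduced here.

```python
# Train a linear SVM to classify service state from monitoring metrics.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
n = 200
# feature 0: CPU load, feature 1: request error rate (synthetic stand-ins)
healthy = rng.normal(loc=[0.4, 0.01], scale=[0.1, 0.005], size=(n, 2))
degraded = rng.normal(loc=[0.9, 0.20], scale=[0.1, 0.050], size=(n, 2))
X = np.vstack([healthy, degraded])
y = np.array([1] * n + [0] * n)  # 1 = service available, 0 = degraded

clf = LinearSVC(C=1.0).fit(X, y)
print(clf.predict([[0.35, 0.01], [0.95, 0.25]]))  # expect [1 0]
```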