
    January-March 2008


    Contributions to Desktop Grid Computing : From High Throughput Computing to Data-Intensive Sciences on Hybrid Distributed Computing Infrastructures

    Since the mid-1990s, Desktop Grid Computing, i.e., the idea of using a large number of remote PCs distributed on the Internet to execute large parallel applications, has proved to be an efficient paradigm for providing large computational power at a fraction of the cost of a dedicated computing infrastructure. This document presents my contributions over the last decade to broaden the scope of Desktop Grid Computing. My research has followed three different directions. The first direction has established new methods to observe and characterize Desktop Grid resources and developed experimental platforms to test and validate our approach in conditions close to reality. The second line of research has focused on integrating Desktop Grids into e-science Grid infrastructures (e.g., EGI), which requires addressing many challenges such as security, scheduling, and quality of service. The third direction has investigated how to support large-scale data management and data-intensive applications on such infrastructures, including support for new and emerging data-oriented programming models. This manuscript reports not only on the scientific achievements and the technologies developed to support our objectives, but also on the international collaborations and projects I have been involved in, as well as the scientific mentoring that motivates my candidature for the Habilitation à Diriger les Recherches.

    Reference Exascale Architecture (Extended Version)

    While political commitments to building exascale systems have been made, turning these systems into platforms for a wide range of exascale applications faces several technical, organisational and skills-related challenges. The key technical challenges are related to the availability of data. While the first exascale machines are likely to be built within a single site, the input data is in many cases impossible to store within a single site. Alongside handling extremely large amounts of data, the exascale system has to process data from different sources, support accelerated computing, handle a high volume of requests per day, minimize the size of data flows, and be extensible with respect to both continuously growing data and a growing number of parallel requests. These technical challenges are addressed by the general reference exascale architecture. It is divided into three main blocks: a virtualization layer, a distributed virtual file system, and a manager of computing resources. Its main property is modularity, which is achieved by containerization at two levels: 1) application containers (containerization of scientific workflows) and 2) micro-infrastructure (containerization of the service-oriented infrastructure for extremely large data). The paper also presents an instantiation of the reference architecture, the architecture of the PROCESS project (PROviding Computing solutions for ExaScale ChallengeS), and discusses its relation to the reference exascale architecture. The PROCESS architecture has been used as an exascale platform within various exascale pilot applications. The paper also presents performance modelling of the exascale platform together with its validation.

    Advances in Grid Computing

    This book approaches grid computing with a perspective on the latest achievements in the field, providing an insight into current research trends and advances, and presenting a large range of innovative research papers. The topics covered in this book include resource and data management, grid architectures and development, and grid-enabled applications. New ideas employing heuristic methods from swarm intelligence or genetic algorithms, as well as quantum encryption, are considered in order to explain two main aspects of grid computing: resource management and data management. The book also addresses some aspects of grid computing that concern architecture and development, and includes a diverse range of applications for grid computing, including a possible human grid computing system, simulation of the fusion reaction, ubiquitous healthcare service provisioning and complex water systems.

    Financial Derivatives Market for Grid Computing

    This Master thesis studies the feasibility and properties of a financial derivatives market on Grid computing, a service for sharing computing resources over a network such as the Internet. For the European Organization for Nuclear Research (CERN) to perform research with the world's largest and most complex machine, the Large Hadron Collider (LHC), Grid computing was developed to handle the information created. In accordance with the mandate of the CERN Technology Transfer (TT) group, this thesis is a part of CERN's dissemination of the Grid technology. The thesis gives a brief overview of the use of the Grid technology and where it is heading. IT trend analysts and large-scale IT vendors see this technology as key in transforming the world of IT. They predict that in a matter of years, IT will be bought as a service instead of a good. Commoditization of IT, delivered as a service, is a paradigm shift that will have a broad impact on all parts of the IT market, as well as on society as a whole. Political, economic and physical factors advocate a market for standardized computing resources supplied by multiple professional providers, benefiting from economies of scale. We argue for the trade of Virtual Servers as the standardized bundle of computer resources. Continuous trade of homogeneous resources allows for scheduling, market efficiency and liquidity, but may entail a risk of erratic, unpredictable prices. We therefore construct a complete, coherent Grid economy, consisting of both a spot market and a derivatives market. While the spot market is the trading place for the computer resources, the derivatives market aims to disperse the risk among those who are willing to invest in it. Because the Virtual Servers are non-storable assets, normal arbitrage theory cannot be used to price derivatives contracts. We propose to solve this issue by creating storable swap contracts priced by an auction-based market, where we argue that the price process follows a geometric Brownian motion. Taking into account the absence of arbitrage in the swap market and the requirement for a complete market, we offer a theoretical framework for martingale pricing and hedging of derivatives written on swaps.

    A Process Model for the Integrated Reasoning about Quantitative IT Infrastructure Attributes

    IT infrastructures can be quantitatively described by attributes such as performance or energy efficiency. Ever-changing user demands and economic pressures require varying short-term and long-term decisions on aligning an IT infrastructure, and particularly its attributes, with this dynamic environment. Potentially conflicting attribute goals and the central role of IT infrastructures call for decision making based upon reasoning, the process of forming inferences from facts or premises. The focus on specific IT infrastructure parts or on a fixed (small) attribute set disqualifies existing reasoning approaches for this purpose, as they neither cover the (complex) interplay of all IT infrastructure components simultaneously, nor address inter- and intra-attribute correlations sufficiently. This thesis presents a process model for the integrated reasoning about quantitative IT infrastructure attributes. The process model's main idea is to formalize the compilation of an individual reasoning function, a mathematical mapping of parametric influencing factors and modifications onto an attribute vector. Compilation is based on model integration, in order to benefit from the multitude of existing specialized, elaborated, and well-established attribute models. The resulting reasoning function consumes an individual tuple of IT infrastructure components, attributes, and external influencing factors, which gives it broad applicability. The process model formalizes a reasoning intent in three phases. First, reasoning goals and parameters are collected in a reasoning suite and formalized in a reasoning function skeleton. Second, the skeleton is iteratively refined, guided by the reasoning suite. Third, the resulting reasoning function is employed for what-if analyses, optimization, or descriptive statistics to conduct the concrete reasoning. The process model provides five template classes that collectively formalize all phases in order to foster reproducibility and to reduce error-proneness. Process model validation is threefold. A controlled experiment reasons about a Raspberry Pi cluster's performance and energy efficiency to illustrate feasibility. In addition, a requirements analysis on a world-class supercomputer and on the Europe-wide execution of hydro-meteorology simulations, together with an examination of related work, establishes the process model's level of innovation. Potential future work employs the prepared automation capabilities, integrates human factors, and uses reasoning results for the automatic generation of modification recommendations.

    Efficient multilevel scheduling in grids and clouds with dynamic provisioning

    Thesis of the Universidad Complutense de Madrid, Facultad de Informática, Departamento de Arquitectura de Computadores y Automática, defended on 12-01-2016. The consolidation of large Distributed Computing infrastructures has resulted in a High-Throughput Computing platform that is ready for high loads, whose best exponents are the current grid federations. On the other hand, Cloud Computing promises to be more flexible, usable, available and simple than Grid Computing, while also covering many more computational needs than those required to carry out distributed calculations. In any case, because of the dynamism and heterogeneity present in grids and clouds, calculating the best match between computational tasks and resources in an effectively characterised infrastructure is, by definition, an NP-complete problem, and only sub-optimal solutions (schedules) can be found for these environments. Nevertheless, the characterisation of the resources of both kinds of infrastructures is far from being achieved. The available information systems do not provide accurate data about the status of the resources, which prevents the advanced scheduling required by the different needs of distributed applications. The issue was not solved during the last decade for grids, and the recently established cloud infrastructures have the same problem. In this framework, brokers can only improve the throughput of very long calculations, but do not provide estimations of their duration. Complex scheduling was traditionally tackled by other tools such as workflow managers, self-schedulers and the production management systems of certain research communities. Nevertheless, the low performance achieved by these early-binding methods is noticeable. Moreover, the diversity of cloud providers and, mainly, their lack of standardised programming interfaces and brokering tools to distribute the workload, hinder the massive portability of legacy applications to cloud environments...
