41 research outputs found

    A Taxonomy of Data Grids for Distributed Data Sharing, Management and Processing

    Data Grids have been adopted as the platform for scientific communities that need to share, access, transport, process and manage large data collections distributed worldwide. They combine high-end computing technologies with high-performance networking and wide-area storage management techniques. In this paper, we discuss the key concepts behind Data Grids and compare them with other data sharing and distribution paradigms such as content delivery networks, peer-to-peer networks and distributed databases. We then provide comprehensive taxonomies that cover various aspects of architecture, data transportation, data replication, and resource allocation and scheduling. Finally, we map the proposed taxonomy to various Data Grid systems, not only to validate the taxonomy but also to identify areas for future exploration. Through this taxonomy, we aim to categorise existing systems so as to better understand their goals and methodology, which helps evaluate their applicability to similar problems. The taxonomy also provides a "gap analysis" of the area, through which researchers can identify new issues for investigation. We hope that the proposed taxonomy and mapping also give new practitioners an accessible way into this complex area of research. (Comment: 46 pages, 16 figures, Technical Report)

    JOR: A Journal-guided Reconstruction Optimization for RAID-Structured Storage Systems

    This paper proposes a simple and practical RAID reconstruction optimization scheme, called JOurnal-guided Reconstruction (JOR). JOR exploits the fact that significant portions of the data blocks in typical disk arrays are unused. It monitors storage-space utilization at the block level and uses this journal to guide the reconstruction process, so that only failed data on used stripes is recovered to the spare disk. Data consistency is ensured by requiring that all blocks in the disk array be initialized to zero (written with the value zero) during synchronization, while all blocks in the spare disk are likewise zeroed in the background. JOR can easily be incorporated into any existing reconstruction approach, since it is independent of and orthogonal to such approaches. Experimental results from our prototype implementation demonstrate that JOR reduces the reconstruction times of two state-of-the-art reconstruction schemes by an amount approximately proportional to the percentage of unused storage space, while ensuring data consistency.
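    The journal-guided idea can be sketched in a few lines. This is our illustration, not the authors' code: `used_bitmap` and `rebuild_stripe` are hypothetical names, and the bitmap stands in for JOR's block-level utilization journal.

```python
# Sketch of journal-guided reconstruction: rebuild only used stripes.
def reconstruct(total_stripes, used_bitmap, rebuild_stripe):
    """Rebuild only stripes marked as used; unused ones stay zero on the
    pre-zeroed spare disk, so parity remains consistent."""
    rebuilt = 0
    for stripe in range(total_stripes):
        if used_bitmap[stripe]:
            rebuild_stripe(stripe)   # expensive: read survivors + XOR
            rebuilt += 1
    return rebuilt

# With 30% utilisation, only 30% of stripes are touched.
used = [i % 10 < 3 for i in range(1000)]
work = reconstruct(1000, used, lambda s: None)
```

    Because the spare disk is pre-zeroed in the background, a skipped stripe is already consistent (all-zero data with all-zero parity), which is why only used stripes need rebuilding.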

    A lightweight Web interface to Grid scheduling systems

    Grid computing is often out of reach for the very scientists who need these resources because of the complexity of popular middleware suites. Some effort has gone into abstracting away these complexities with graphical user interfaces, some of them Web-based. This paper presents a lightweight and portable interface for Grid management, made possible by recent advances in dynamic technologies for Web applications. Case studies demonstrate that this interface is both usable and useful, and an analysis of usage highlights some positive and negative aspects of the approach.

    NetIbis: An Efficient and Dynamic Communication System for Heterogeneous Grids

    Grids are more heterogeneous and dynamic than traditional parallel or distributed systems, in terms of both processors and interconnects. A grid communication system must therefore handle several issues: first, it must run on networks that are not yet determined when the application is launched, including user-space interconnects; second, it must transparently run on different networks at the same time; third, it should yield performance close to that of specialized communication systems. In this paper, we present NetIbis, a new Java communication system that provides a uniform interface over any underlying inter-cluster or intra-cluster network. NetIbis solves the heterogeneity issues posed by Grid computing by dynamically constructing network protocol stacks out of drivers: self-contained building blocks, each with limited functionality, that can be flexibly combined. We describe the design and implementation of the major NetIbis drivers for serialization, multicast, reliability, and various underlying networks, as well as several performance optimizations, such as layer collapsing for the GM driver. We evaluate the performance of NetIbis on several platforms, including a European grid.
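    A minimal sketch of the driver-stack idea (our illustration in Python rather than Java; the class names and wire header are invented): each driver is a small layer that wraps the one below it, and the stack is assembled at run time from a list of driver names.

```python
# Composable protocol drivers, assembled dynamically into a stack.
class Driver:
    def __init__(self, below=None):
        self.below = below
    def send(self, data):
        return self.below.send(data) if self.below else data

class Serialization(Driver):
    def send(self, data):
        return super().send(repr(data).encode())  # object -> bytes

class Reliability(Driver):
    def send(self, data):
        return super().send(b"SEQ0|" + data)      # add a sequence header

class TcpTransport(Driver):
    def send(self, data):
        return data                               # would write to a socket

def build_stack(names, registry):
    stack = None
    for name in reversed(names):   # build bottom-up: transport first
        stack = registry[name](stack)
    return stack

registry = {"serial": Serialization, "reliable": Reliability,
            "tcp": TcpTransport}
stack = build_stack(["serial", "reliable", "tcp"], registry)
wire = stack.send([1, 2])   # serialized, then framed, then "transmitted"
```

    Because each driver implements a single concern, a stack for a different network is obtained by swapping one registry entry, and an optimization like layer collapsing corresponds to fusing adjacent drivers into one.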

    Técnicas de altas prestaciones aplicadas al diseño de infraestructuras ferroviarias complejas (High-performance techniques applied to the design of complex railway infrastructures)

    In this work we focus on the overhead air switch design problem. The design of railway infrastructure is an important problem in the railway world; non-optimal designs limit train speed and, more importantly, cause malfunctions and breakages. Most railway companies have regulations for the design of these elements. Those regulations have been shaped by decades of experience, but, as far as we know, there are no software tools that assist with designing and testing optimal solutions for overhead switches. The aim of this thesis is the design, implementation, and evaluation of a simulator that facilitates the exploration of the entire solution space, looking for the set of optimal solutions in the shortest time and at the lowest possible cost. Simulators are frequently used in the world of rail infrastructure, but many of them only simulate scenarios predefined by the user, analysing whether the proposed design is feasible. Throughout this thesis, we propose a framework for a complete simulator that is able to propose, simulate and evaluate multiple solutions. This framework rests on four pillars: a compromise between simulation accuracy and simulator complexity; automatic generation of possible solutions (automatic exploration of the solution space); consideration of all the actors involved in the design process (standards, additional restrictions, etc.); and, finally, the expert's knowledge and the integration of optimisation metrics. Once the framework is defined, several deployment proposals are presented: a single-node version that launches one thread per CPU available in the system, with the whole simulator designed around this parallelism paradigm; a distributed version designed for deployment on a multi-node cluster using MPI, allowing the simulator to adapt to the cluster on which it runs; and a cloud-based version that provisions more or fewer virtual machines according to the needs of the scenario being simulated. Finally, after implementing each approach, we evaluate the performance of each of them, comparing time and cost on two real-world scenarios.
Programa Oficial de Doctorado en Ciencia y Tecnología Informática. Presidente: José Daniel García Sánchez. Secretario: Antonio García Dopico. Vocal: Juan Carlos Díaz Martí
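    The single-node deployment can be sketched as follows. This is an illustration under our own assumptions: the cost function is a stand-in for the real switch simulation, and `explore` launches one worker per available CPU.

```python
# One worker per CPU, mirroring the thread-per-CPU design.
from concurrent.futures import ThreadPoolExecutor
import os

def evaluate(candidate):
    # Hypothetical cost function; the real simulator would evaluate an
    # overhead air switch geometry against regulations and physics.
    return candidate, (candidate - 42) ** 2

def explore(candidates):
    with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        results = list(pool.map(evaluate, candidates))
    return min(results, key=lambda r: r[1])  # best (candidate, cost) pair

best = explore(range(100))  # → (42, 0)
```

    The MPI variant would instead partition `candidates` across ranks and reduce the per-rank minima; the structure of `evaluate` is unchanged.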

    Workflow models for heterogeneous distributed systems

    The role of data in modern scientific workflows is becoming more and more crucial. The unprecedented amount of data available in the digital era, combined with recent advancements in Machine Learning and High-Performance Computing (HPC), has let computers surpass human performance in a wide range of fields, such as Computer Vision, Natural Language Processing and Bioinformatics. A solid data-management strategy is therefore essential for key aspects like performance optimisation, privacy preservation and security. Most modern programming paradigms for Big Data analysis adhere to the principle of data locality: moving computation closer to the data to remove transfer-related overheads and risks. Still, there are scenarios in which it is worthwhile, or even unavoidable, to transfer data between different steps of a complex workflow. The contribution of this dissertation is twofold. First, it defines a novel methodology for distributed modular applications, allowing topology-aware scheduling and data management while separating business logic, data dependencies, parallel patterns and execution environments. Second, it introduces computational notebooks as a high-level and user-friendly interface to this new kind of workflow, aiming to flatten the learning curve and improve the adoption of the methodology. Each of these contributions is accompanied by a full-fledged Open Source implementation, which has been used for evaluation and allows the interested reader to experience the related methodology first-hand. The validity of the proposed approaches has been demonstrated on five real scientific applications in the domains of Deep Learning, Bioinformatics and Molecular Dynamics simulation, executed on large-scale mixed cloud-HPC infrastructures.
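    The topology-aware placement described above can be illustrated with a toy scheduler (our sketch, not the dissertation's API; the dataset and node names are invented): each step is placed on the node that already holds most of its input bytes, following the data-locality principle.

```python
# Toy topology-aware scheduler: place each step near its data.
def schedule(step_inputs, data_location):
    """For each step, pick the node holding the most input bytes."""
    placement = {}
    for step, inputs in step_inputs.items():
        bytes_on = {}
        for dataset in inputs:
            node, size = data_location[dataset]
            bytes_on[node] = bytes_on.get(node, 0) + size
        placement[step] = max(bytes_on, key=bytes_on.get)
    return placement

data_location = {"genome": ("hpc-node", 500), "model": ("cloud-vm", 20)}
plan = schedule({"align": ["genome", "model"]}, data_location)
# "align" lands on "hpc-node", where 500 of its 520 input bytes live.
```

    A real scheduler would also weigh link bandwidth and node capabilities, but the locality heuristic above is the core of why mixed cloud-HPC placement matters.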

    μ-DSU: A Micro-Language Based Approach to Dynamic Software Updating

    Today, software systems play a critical role in society's infrastructure, and many are required to provide uninterrupted service in constantly changing environments. As the problem domain and the operational context of such software change, the software itself must be updated accordingly. In this paper we propose to support dynamic software updating through language semantic adaptation, using micro-languages that confine the effect of an introduced change to specific application features. Micro-languages provide a logical layer over a programming language and associate an application feature with the portion of the programming language used to implement it. They thus permit updating an application feature by updating the underlying programming constructs, without affecting the behaviour of other application features. This linguistic approach makes it easy to add or remove application features (with a special focus on non-functional features) in a running application by separating the implementation of the new feature from the original application, allowing the application to remain unaware of any extensions. The feasibility of the approach is demonstrated with two studies; its benefits and drawbacks are also analysed.
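    The feature-to-construct binding can be caricatured in a few lines. This is a hypothetical sketch, not the paper's mechanism: the `MicroLanguage` class and the logging feature are invented for illustration.

```python
# A feature is bound to the construct implementing it; updating the
# feature rebinds the construct in the running program.
class MicroLanguage:
    def __init__(self):
        self.features = {}
    def bind(self, feature, impl):
        self.features[feature] = impl
    def call(self, feature, *args):
        return self.features[feature](*args)

app = MicroLanguage()
app.bind("log", lambda msg: f"plain:{msg}")
before = app.call("log", "boot")          # "plain:boot"

# Dynamic update: only the "log" feature changes; every other
# binding, and code routed through it, is untouched.
app.bind("log", lambda msg: f"encrypted:{msg}")
after = app.call("log", "boot")           # "encrypted:boot"
```

    The key property is isolation: rebinding "log" changes only calls routed through that feature, which is the confinement the micro-language layer is meant to provide.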

    EXPRESS: Resource-oriented and RESTful Semantic Web services

    This thesis investigates an approach that simplifies the development of Semantic Web services (SWS) by removing the need for additional semantic descriptions. The most actively researched approaches to Semantic Web services introduce explicit semantic descriptions of services on top of the existing semantic descriptions of the service domains, which increases their complexity and design overhead. The need to semantically describe the services in such approaches stems from their foundations in service-oriented computing, i.e. the extension of already existing service descriptions. This thesis demonstrates that adopting a resource-oriented approach based on REST will, in contrast to service-oriented approaches, eliminate the need for explicit semantic service descriptions and service vocabularies, reducing development effort while retaining significant functional capability. The approach proposed in this thesis, called EXPRESS (Expressing RESTful Semantic Services), utilises the similarities between REST and the Semantic Web, such as resource realisation, self-describing representations, and uniform interfaces. The semantics of a service is elicited from a resource's semantic description in the domain ontology and from the semantics of the uniform interface, hence eliminating the need for additional semantic descriptions. Moreover, stub generation is a by-product of the mapping between entities in the domain ontology and resources. EXPRESS was developed to test the feasibility of eliminating explicit service descriptions and service vocabularies or ontologies, to explore the restrictions this places on domain ontologies, to investigate the impact on the semantic quality of the description, and to explore the benefits and costs to developers. To achieve this, an online demonstrator that allows users to generate stubs has been developed.
In addition, a matchmaking experiment showed that the resulting service descriptions are comparable to OWL-S in their ability to be discovered, while improving the efficiency of discovery. Finally, an expert review provided evidence of EXPRESS's simplicity and practicality when developing SWS from scratch.
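    The stub-generation by-product can be sketched as a mapping from ontology classes to resources (assumed names and URL scheme, not EXPRESS's actual output): each class becomes a resource template, and the uniform HTTP interface supplies the operations, so no separate service description is needed.

```python
# Derive REST stubs from domain-ontology classes alone.
def generate_stubs(ontology_classes):
    """Map each ontology class to a resource URL template plus the
    uniform-interface verbs that operate on it."""
    stubs = {}
    for cls in ontology_classes:
        path = f"/{cls.lower()}s/{{id}}"
        stubs[cls] = {verb: f"{verb} {path}"
                      for verb in ("GET", "PUT", "DELETE", "POST")}
    return stubs

stubs = generate_stubs(["Book", "Author"])
# stubs["Book"]["GET"] == "GET /books/{id}"
```

    Because the operations come from the uniform interface and the resources from the domain ontology, the service's semantics is recoverable without a service-specific vocabulary, which is the central claim of the thesis.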