
    A shared-disk parallel cluster file system

    Dissertation presented in fulfilment of the degree of Doctor in Informatics at the Universidade Nova de Lisboa, Faculdade de Ciências e Tecnologia. Today, clusters are the de facto cost-effective platform both for high-performance computing (HPC) and for IT environments. HPC and IT are quite different environments, and their differences include, among others, their choices of file systems and storage: HPC favours parallel file systems geared towards maximum I/O bandwidth, which are not fully POSIX-compliant and were devised to run on top of (fault-prone) partitioned storage; conversely, IT data centres favour both external disk arrays (to provide highly available storage) and POSIX-compliant file systems, either general-purpose or shared-disk cluster file systems (CFSs). These specialised file systems perform very well in their target environments provided that applications do not require certain lateral features: parallel file systems offer no file locking, and CFSs offer no high-performance writes over cluster-wide shared files. In brief, none of the above approaches solves the problem of providing high levels of reliability and performance to both worlds. Our pCFS proposal is a contribution towards changing this situation: the rationale is to take advantage of the best of both, the reliability of cluster file systems and the high performance of parallel file systems. We don't claim to provide the absolute best of each, but we aim at full POSIX compliance, a rich feature set, and levels of reliability and performance good enough for broad usage, e.g., traditional as well as HPC applications, support of clustered DBMS engines that may run over regular files, and video streaming. pCFS' main ideas include:
    · Cooperative caching, a technique that has been used in file systems for distributed disks but, as far as we know, was never used either in SAN-based cluster file systems or in parallel file systems. As a result, pCFS may use all infrastructures (LAN and SAN) to move data.
    · Fine-grain locking, whereby processes running across distinct nodes may define non-overlapping byte-range regions in a file (instead of locking the whole file) and access them in parallel, reading and writing over those regions at the infrastructure's full speed (provided that no major metadata changes are required).
    A prototype was built on top of GFS (a Red Hat shared-disk CFS): GFS' kernel code was slightly modified, and two kernel modules and a user-level daemon were added. In the prototype, fine-grain locking is fully implemented and a cluster-wide coherent cache is maintained through data (page fragments) movement over the LAN. Our benchmarks for non-overlapping writers over a single file shared among processes running on different nodes show that pCFS' bandwidth is twice that of NFS while being comparable to that of the Parallel Virtual File System (PVFS), both of which require about 10 times more CPU. pCFS' bandwidth also surpasses GFS' (by a factor of 600 for small record sizes, e.g., 4 KB, decreasing to a factor of 2 for large record sizes, e.g., 4 MB) at about the same CPU usage.
    Funding: Lusitania, Companhia de Seguros S.A.; Programa IBM Shared University Research (SUR).
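
    The fine-grain locking described above targets the standard POSIX byte-range interface, so the benchmark workload (non-overlapping writers sharing one file across nodes) can be sketched with plain fcntl() locks. This is a minimal sketch of that access pattern, not code from the thesis; the mount path, region size, and rank argument are illustrative assumptions.

```c
/* Sketch: each process locks and writes its own non-overlapping
 * byte range of one shared file -- the access pattern pCFS is
 * designed to serve at full infrastructure speed.
 * File name, region size, and rank argument are hypothetical. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define REGION_SIZE (4 * 1024 * 1024)   /* 4 MB region per process */

int main(int argc, char **argv)
{
    int rank = (argc > 1) ? atoi(argv[1]) : 0;   /* this node's id */
    int fd = open("/mnt/pcfs/shared.dat", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* Lock only this process's byte range, not the whole file. */
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = (off_t)rank * REGION_SIZE,
        .l_len    = REGION_SIZE,
    };
    if (fcntl(fd, F_SETLKW, &fl) < 0) { perror("fcntl"); return 1; }

    /* Write within the locked region; other ranks proceed in parallel. */
    char *buf = malloc(REGION_SIZE);
    memset(buf, rank, REGION_SIZE);
    pwrite(fd, buf, REGION_SIZE, fl.l_start);

    fl.l_type = F_UNLCK;                 /* release the region */
    fcntl(fd, F_SETLK, &fl);
    free(buf);
    close(fd);
    return 0;
}
```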

    Adaptive Mid-term and Short-term Scheduling of Mixed-criticality Systems

    A mixed-criticality real-time system is a real-time system with multiple tasks classified according to their criticality. Research on mixed-criticality systems arose to provide an effective and cost-efficient a priori verification process for safety-critical systems. The higher the criticality of a task within a system, the more strongly the system must guarantee its required level of service. However, such a model poses new challenges for scheduling and fault tolerance within real-time systems. Current mixed-criticality scheduling protocols severely degrade lower-criticality tasks in case of resource shortage in order to provide the required level of service for the most critical ones. The open research challenge in this field is to devise robust scheduling protocols that minimise the impact on less critical tasks. This dissertation introduces two approaches, one short-term and the other medium-term, to appropriately allocate computing resources to tasks within mixed-criticality systems on both uniprocessor and multiprocessor systems. The short-term strategy is a protocol named Lazy Bailout Protocol (LBP) for scheduling mixed-criticality task sets on single-core architectures. Scheduling decisions are made about tasks that are active in the ready queue and have to be dispatched to the CPU. LBP minimises service degradation for lower-criticality tasks by granting them background execution during system idle time. I then refined LBP with variants that further increase the service level provided to lower-criticality tasks, though at an increased cost of either offline analysis or runtime complexity. The second approach, named Adaptive Tolerance-based Mixed-criticality Protocol (ATMP), decides at runtime which tasks to allocate to the active cores according to the available resources. ATMP optimises overall system utility by tuning the system workload when computing capacity runs short at runtime. Unlike the majority of current mixed-criticality approaches, ATMP can also smoothly degrade higher-criticality tasks in order to keep lower-criticality ones allocated.
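
    The abstract does not spell out LBP's internal rules (e.g., its bailout accounting), but the core dispatching idea it describes, serving higher-criticality jobs first and letting lower-criticality jobs run as background work during idle time, can be sketched as below. This is an illustrative stand-in under that reading, not the published protocol; all names are ours.

```c
/* Hedged sketch of criticality-aware dispatching: HI-criticality
 * ready jobs are served first (earliest deadline among them), and
 * LO-criticality jobs run only when no HI job is pending, i.e. as
 * background execution in idle time. NOT the LBP algorithm itself. */
#include <stddef.h>

enum crit { LO = 0, HI = 1 };

struct job {
    enum crit   criticality;
    long        deadline;      /* absolute deadline; EDF within a level */
    struct job *next;
};

/* Pick the next job to dispatch from the ready queue. */
struct job *pick_next(struct job *ready)
{
    struct job *best_hi = NULL, *best_lo = NULL;
    for (struct job *j = ready; j; j = j->next) {
        struct job **best = (j->criticality == HI) ? &best_hi : &best_lo;
        if (!*best || j->deadline < (*best)->deadline)
            *best = j;
    }
    /* LO jobs get the CPU only when the HI queue is idle. */
    return best_hi ? best_hi : best_lo;
}
```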

    Fault tolerant software technology for distributed computing system

    Issued as Monthly reports [nos. 1-23], Interim technical report, Technical guide books [nos. 1-2], and Final report, Project no. G-36-64

    The exploitation of parallelism on shared memory multiprocessors

    PhD thesis. With the arrival of many general-purpose shared-memory multiple-processor (multiprocessor) computers in the commercial arena during the mid-1980s, a rift opened between the raw processing power offered by the emerging hardware and the relative inability of its operating software to deliver this power effectively to potential users. This rift stems from the fact that, currently, no computational model capable of elegantly expressing parallel activity is mature enough to be universally accepted and used as the basis for programming languages that exploit the parallelism multiprocessors offer. Added to this, there is a lack of software tools to assist programmers in designing and debugging parallel programs. Although much research has been done in the field of programming languages, no undisputed candidate for the most appropriate language for programming shared-memory multiprocessors has yet been found. This thesis examines why this state of affairs has arisen and proposes programming language constructs, together with a programming methodology and environment, to close the ever-widening hardware-to-software gap. The novel programming constructs described in this thesis are intended for use in imperative languages, even though they draw on the synchronisation inherent in the dataflow model by applying the semantics of single assignment to operations on shared data, giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of each.
    Funding: The Science and Engineering Research Council.
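
    A shared value, on the reading above, is a write-once cell with dataflow-style synchronisation: readers block until the single assignment has happened, then proceed in parallel. Here is a minimal sketch of that semantics using POSIX threads; the type and function names are ours, not the thesis's notation.

```c
/* Minimal sketch of a single-assignment "shared value": readers block
 * until the value has been written exactly once. Illustrative only. */
#include <pthread.h>

struct shared_value {
    pthread_mutex_t mu;
    pthread_cond_t  ready;
    int             set;     /* 0 until the single assignment happens */
    long            value;
};

#define SHARED_VALUE_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, 0 }

/* The first (and only legal) assignment wakes all waiting readers. */
int sv_write(struct shared_value *sv, long v)
{
    pthread_mutex_lock(&sv->mu);
    if (sv->set) {                      /* re-assignment is an error */
        pthread_mutex_unlock(&sv->mu);
        return -1;
    }
    sv->value = v;
    sv->set = 1;
    pthread_cond_broadcast(&sv->ready);
    pthread_mutex_unlock(&sv->mu);
    return 0;
}

/* Readers block until the value exists; thereafter all read freely. */
long sv_read(struct shared_value *sv)
{
    pthread_mutex_lock(&sv->mu);
    while (!sv->set)
        pthread_cond_wait(&sv->ready, &sv->mu);
    long v = sv->value;
    pthread_mutex_unlock(&sv->mu);
    return v;
}
```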

    Locating Objects in a Wide-area System

    Steen, M.R. van [Promotor]; Tanenbaum, A.S. [Promotor]

    Second CLIPS Conference Proceedings, volume 1

    Topics covered at the 2nd CLIPS Conference, held at the Johnson Space Center, September 23-25, 1991, are given. Topics include rule groupings, fault detection using expert systems, decision making using expert systems, knowledge representation, computer-aided design, and debugging expert systems.

    Organization based multiagent architecture for distributed environments

    Distributed environments represent a complex field in which applied solutions should be flexible and include significant adaptation capabilities. These environments involve problems where multiple users and devices may interact, and where simple, local solutions may generate good results yet fall short in terms of usability and interaction. Many techniques can be employed to address this kind of problem, from CORBA to multi-agent systems, by way of web services and SOA, among others. All of those methodologies have advantages and disadvantages, which are analysed in this document before presenting the new architecture proposed as a solution for distributed-environment problems. The new architecture is called OBaMADE: Organization Based Multiagent Architecture for Distributed Environments. It is a multiagent architecture based on the organizations-of-agents paradigm, in which the agents are structured into organizations to improve their organizational capabilities. The reasoning power of the architecture rests on the case-based reasoning methodology, implemented in an internal organization whose agents create services that answer external requests made by users. The OBaMADE architecture has been successfully applied to two different case studies in which its prediction capabilities were validated. Those case studies showed promising results and, being complex systems, demonstrated the abstraction and generalization capabilities of the architecture. Nevertheless, OBaMADE is intended to solve many other kinds of problems in distributed-environment scenarios; it should be applied to a wider variety of situations and other knowledge fields to fully develop its potential.
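
    As a concrete illustration of the reasoning core described above, here is a minimal sketch of the "retrieve" step of case-based reasoning. The abstract does not specify OBaMADE's similarity measure, so plain Euclidean nearest-neighbour over numeric features is used as an assumed stand-in; all names are ours.

```c
/* Hedged sketch of CBR retrieval: find the stored case most similar
 * to a new problem. Euclidean distance is an assumption, not a detail
 * taken from the OBaMADE architecture. */
#include <math.h>
#include <stddef.h>

#define N_FEATURES 4

struct cbr_case {
    double features[N_FEATURES];
    int    solution;            /* index/label of the stored solution */
};

static double distance(const double *a, const double *b)
{
    double d = 0.0;
    for (int i = 0; i < N_FEATURES; i++)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return sqrt(d);
}

/* Retrieve: return the stored case closest to the new problem. */
const struct cbr_case *retrieve(const struct cbr_case *base, size_t n,
                                const double *problem)
{
    const struct cbr_case *best = NULL;
    double best_d = INFINITY;
    for (size_t i = 0; i < n; i++) {
        double d = distance(base[i].features, problem);
        if (d < best_d) { best_d = d; best = &base[i]; }
    }
    return best;   /* the reuse/revise/retain steps would follow */
}
```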

    Software test and evaluation study phase I and II: survey and analysis

    Issued as Final report, Project no. G-36-661 (continues G-36-636; includes A-2568)