304 research outputs found

    On Designing Multicore-aware Simulators for Biological Systems

    Full text link
    The stochastic simulation of biological systems is an increasingly popular technique in bioinformatics. It often is an enlightening technique, which may however result in being computational expensive. We discuss the main opportunities to speed it up on multi-core platforms, which pose new challenges for parallelisation techniques. These opportunities are developed in two general families of solutions involving both the single simulation and a bulk of independent simulations (either replicas of derived from parameter sweep). Proposed solutions are tested on the parallelisation of the CWC simulator (Calculus of Wrapped Compartments) that is carried out according to proposed solutions by way of the FastFlow programming framework making possible fast development and efficient execution on multi-cores.Comment: 19 pages + cover pag

    Building a RTOS for MPSoC Dataflow Programming

    No full text
    International audienceMultiprocessor Systems-on-Chip (MPSoC) are becoming the standard high performance Digital Signal Processing (DSP) systems. Hardware complexity abstraction is needed to enable efficient MPSoC programming. A major challenge of MPSoC programming is efficiently handling the combination of new features necessary in a MPSoC operating system: load balancing and efficient use of the parallel resources, with the more traditional features of Real-Time Operating Systems (RTOS): resource sharing between applications, task priorities and reactivity to events. This paper presents a method to combine dataflow methods and RTOS features. The resulting system prototypes an RTOS for symmetric multiprocessing MPSoCs whose inputs are dataflow graphs of applications. The prototype is built on the uC/OS-II RTOS. Experimental results are given on a 3GPP Long Term Evolution algorithm executed on a 4-core MPSoC

    Integration of generic operating systems in partitioned architectures

    Get PDF
    Tese de mestrado, Engenharia Informática (Arquitectura, Sistemas e Redes de Computadores), Universidade de Lisboa, Faculdade de Ciências, 2009The Integrated Modular Avionics (IMA) specification defines a partitioned environment hosting multiple avionics functions of different criticalities on a shared computing platform. ARINC 653, one of the specifications related to the IMA concept, defines a standard interface between the software applications and the underlying operating system. Both these specifications come from the world of civil aviation, but they are getting interest from space industry partners, who have identified common requirements to those of aeronautic applications. Within the scope of this interest, the AIR architecture was defined, under a contract from the European Space Agency (ESA). AIR provides temporal and spatial segregation, and foresees the use of different operating systems in each partition. Temporal segregation is achieved through the fixed cyclic scheduling of computing resources to partitions. The present work extends the foreseen partition operating system (POS) heterogeneity to generic non-real-time operating systems. This was motivated by documented difficulties in porting applications to RTOSs, and by the notion that proper integration of a non-real-time POS will not compromise the timeliness of critical real-time functions. For this purpose, Linux is used as a case study. An embedded variant of Linux is built and evaluated regarding its adequacy as a POS in the AIR architecture. To guarantee safe integration, a solution based on the Linux paravirtualization interface, paravirt-ops, is proposed. In the course of these activities, the AIR architecture definition was also subject to improvements. The most significant one, motivated by the intended increased POS heterogeneity, was the introduction of a new component, the AIR Partition OS Adaptation Layer (PAL). The AIR PAL provides greater POS-independence to the major components of the AIR architecture, easing their independent certification efforts. Other improvements provide enhanced timeliness mechanisms, such as mode-based schedules and process deadline violation monitoring.A especificação Integrated Modular Avionics (IMA) define um ambiente compartimentado com funções de aviónica de diferentes criticalidades a coexistir numa plataforma computacional. A especificação relacionada ARINC 653 define uma interface padrão entre as aplicações e o sistema operativo subjacente. Ambas as especificações provêm do mundo da aviónica, mas estão a ganhar o interesse de parceiros da indústria espacial, que identificaram requisitos em comum entre as aplicações aeronáuticas e espaciais. No âmbito deste interesse, foi definida a arquitectura AIR, sob contrato da Agência Espacial Europeia (ESA). Esta arquitectura fornece segregação temporale espacial, e prevê o uso de diferentes sistemas operativos em cada partição. A segregação temporal é obtida através do escalonamento fixo e cíclico dos recursos às partições. Este trabalho estende a heterogeneidade prevista entre os sistemas operativos das partições (POS). Tal foi motivado pelas dificuldades documentadas em portar aplicações para sistemas operativos de tempo-real, e pela noção de que a integração apropriada de um POS não-tempo-real não comprometerá a pontualidade das funções críticas de tempo-real. Para este efeito, o Linux foi utilizado como caso de estudo. Uma variante embedida de Linux é construída e avaliada quanto à sua adequação como POS na arquitectura AIR. Para garantir uma integração segura, é proposta uma solução baseada na interface de paravirtualização do Linux, paravirt-ops. No decurso destas actividades, foram também feitas melhorias à definição da arquitectura AIR. O mais significante, motivado pelo pretendido aumento da heterogeneidade entre POSs, foi a introdução de um novo componente, AIR Partition OS Adaptation Layer (PAL). Este componente proporciona aos principais componentes da arquitectura AIR maior independência face ao POS, facilitando os esforços para a sua certificação independente. Outros melhoramentos fornecem mecanismos avançados de pontualidade, como mode-based schedules e monitorização de incumprimento de metas temporais de processos.ESA/ITI - European Space Agency Innovation Triangular Initiative (through ESTEC Contract 21217/07/NL/CB-Project AIR-II) and FCT - Fundação para a Ciência e Tecnologia (through the Multiannual Funding Programme

    On Designing Multicore-Aware Simulators for Systems Biology Endowed with OnLine Statistics

    Get PDF
    The paper arguments are on enabling methodologies for the design of a fully parallel, online, interactive tool aiming to support the bioinformatics scientists .In particular, the features of these methodologies, supported by the FastFlow parallel programming framework, are shown on a simulation tool to perform the modeling, the tuning, and the sensitivity analysis of stochastic biological models. A stochastic simulation needs thousands of independent simulation trajectories turning into big data that should be analysed by statistic and data mining tools. In the considered approach the two stages are pipelined in such a way that the simulation stage streams out the partial results of all simulation trajectories to the analysis stage that immediately produces a partial result. The simulation-analysis workflow is validated for performance and effectiveness of the online analysis in capturing biological systems behavior on a multicore platform and representative proof-of-concept biological systems. The exploited methodologies include pattern-based parallel programming and data streaming that provide key features to the software designers such as performance portability and efficient in-memory (big) data management and movement. Two paradigmatic classes of biological systems exhibiting multistable and oscillatory behavior are used as a testbed

    Porting sloth system to FreeRTOS for ARM Multicore

    Get PDF
    Dissertação de mestrado integrado em Engenharia Eletrónica Industrial e ComputadoresThe microprocessor industry is in the midst of a dramatic transformation. Up until recently, to boost microprocessors’ performance it was solely relied on increasing clock frequency. Nowadays, however, the power consumption requirements, coupled with the growing consumer demand, made the industry shift their focus from singlecore to multicore solutions, which offer an increase in performance, without a proportional increase in power consumption. The embedded systems field is no exception and the trend to use multicore solutions has been rising substantially in the last few years. Managing control flow is one of the core responsibilities of an operating system. Bearing this in mind, operating systems suffer from the existence of a bifid priority space, dictated by the co-existence of synchronous threads, managed by kernel scheduler, and asynchronous interrupt handlers, scheduled by hardware. This induces a well-identified problem, termed rate-monotonic priority inversion. Regarding safety-critical real-time systems, where time and determinism play a critical role, the inherent possibility of delayed execution of real-time threads by hardware interrupts with semantically lower priority can have catastrophic consequences to human life. Within this context, this dissertation presents the extension of a previous ’inhouse’ project, by proposing the implementation of a unified priority space approach (Sloth) in a multicore environment. To accomplish this, it is proposed the offloading of the scheduling decisions and synchronization mechanisms to a Commercial Off-The-Shelf (COTS) hardware interrupt controller (removing the need for a software scheduler) on an ARM Cortex-A9 MPCore platform.A indústria de microprocessores está envolta numa transformação dramática. Até recentemente, para impulsionar a performance, a indústria dependia somente do aumento gradual da frequência de relógio. Atualmente, os requisitos de consumo energético, conjugados com as crescentes exigências do consumidor, levaram a indústria a mudar o seu foco de soluções singlecore para soluções multicore. Estas oferecem um aumento substancial de performance, sem o proporcional aumento de consumo energético, característico das arquiteturas singlecore. Os sistemas embebidos não são excepção e a tendência para a utilização de soluções multicore tem aumentado substancialmente nos últimos anos. Uma das principais responsabilidades de um sistema operativo é a gestão do fluxo de controlo. Neste contexto, os sistemas operativos sofrem da existência de um espaço de prioridades bifurcado, caracterizado pela existência de tarefas, geridas pelo escalonador do kernel (software) e de interrupções, escalonadas por hardware. Introduz-se, assim, um problema bem identificado na comunidade científica, denominado rate-monotonic priority inversion. Em sistemas de tempo real, em que a segurança assume um papel fulcral e onde a performance e o determinismo são essenciais, a possibilidade da execução de tarefas de elevada prioridade ser atrasada, por interrupções de hardware com prioridade semântica inferior, pode ter consequências catastróficas para a vida humana. Neste sentido, esta dissertação apresenta a extensão de um trabalho anterior, propondo a implementação de um espaço de prioridades unificado (Sloth), num ambiente multicore. Assim sendo, é proposto o offloading do escalonador e mecanismos de sincronização para o controlador de interrupções (hardware) numa plataforma ARM Cortex-A9 MPCore

    StarPU: a Runtime System for Scheduling Tasks over Accelerator-Based Multicore Machines

    Get PDF
    Multicore machines equipped with accelerators are becoming increasingly popular. The TOP500-leading RoadRunner machine is probably the most famous example of a parallel computer mixing IBM Cell Broadband Engines and AMD opteron processors. Other architectures, featuring GPU accelerators, are expected to appear in the near future. To fully tap into the potential of these hybrid machines, pure offloading approaches, in which the main core of the application runs on regular processors and offloads specific parts on accelerators, are not sufficient. The real challenge is to build systems where the application would permanently spread across the entire machine, that is, where parallel tasks would be dynamically scheduled over the full set of available processing units. To face this challenge, we propose a new runtime system capable of scheduling tasks over heterogeneous, accelerator-based machines. Our system features a software virtual shared memory that provides a weak consistency model. The system keeps track of data copies within accelerator embedded-memories and features a data-prefetching engine. Such facilities, together with a database of self-tuned per-task performance models, can be used to greatly improve the quality of scheduling policies in this context. We demonstrate the relevance of our approach by benchmarking various parallel numerical kernel implementations over our runtime system. We obtain significant speedups and a very high efficiency on various typical workloads over multicore machines equipped with multiple accelerators

    Accelerating Pattern Matching in Neuromorphic Text Recognition System Using Intel Xeon Phi Coprocessor

    Get PDF
    Neuromorphic computing systems refer to the computing architecture inspired by the working mechanism of human brains. The rapidly reducing cost and increasing performance of state-of-the-art computing hardware allows large-scale implementation of machine intelligence models with neuromorphic architectures and opens the opportunity for new applications. One such computing hardware is Intel Xeon Phi coprocessor, which delivers over a TeraFLOP of computing power with 61 integrated processing cores. How to efficiently harness such computing power to achieve real time decision and cognition is one of the key design considerations. This work presents an optimized implementation of Brain-State-in-a-Box (BSB) neural network model on the Xeon Phi coprocessor for pattern matching in the context of intelligent text recognition of noisy document images. From a scalability standpoint on a High Performance Computing (HPC) platform we show that efficient workload partitioning and resource management can double the performance of this many-core architecture for neuromorphic applications

    MURAC: A unified machine model for heterogeneous computers

    Get PDF
    Includes bibliographical referencesHeterogeneous computing enables the performance and energy advantages of multiple distinct processing architectures to be efficiently exploited within a single machine. These systems are capable of delivering large performance increases by matching the applications to architectures that are most suited to them. The Multiple Runtime-reconfigurable Architecture Computer (MURAC) model has been proposed to tackle the problems commonly found in the design and usage of these machines. This model presents a system-level approach that creates a clear separation of concerns between the system implementer and the application developer. The three key concepts that make up the MURAC model are a unified machine model, a unified instruction stream and a unified memory space. A simple programming model built upon these abstractions provides a consistent interface for interacting with the underlying machine to the user application. This programming model simplifies application partitioning between hardware and software and allows the easy integration of different execution models within the single control ow of a mixed-architecture application. The theoretical and practical trade-offs of the proposed model have been explored through the design of several systems. An instruction-accurate system simulator has been developed that supports the simulated execution of mixed-architecture applications. An embedded System-on-Chip implementation has been used to measure the overhead in hardware resources required to support the model, which was found to be minimal. An implementation of the model within an operating system on a tightly-coupled reconfigurable processor platform has been created. This implementation is used to extend the software scheduler to allow for the full support of mixed-architecture applications in a multitasking environment. Different scheduling strategies have been tested using this scheduler for mixed-architecture applications. The design and implementation of these systems has shown that a unified abstraction model for heterogeneous computers provides important usability benefits to system and application designers. These benefits are achieved through a consistent view of the multiple different architectures to the operating system and user applications. This allows them to focus on achieving their performance and efficiency goals by gaining the benefits of different execution models during runtime without the complex implementation details of the system-level synchronisation and coordination
    corecore