40 research outputs found

    QoS-Driven Reconfigurable Parallel Computing for NoC-Based Clustered MPSoCs

    Get PDF
    Reconfigurable parallel computing is required to provide high-performance embedded computing, hide hardware complexity, boost software development, and manage multiple workloads when multiple applications are running simultaneously on the emerging network-on-chip (NoC)-based multiprocessor systems-on-chip (MPSoCs) platforms. In these type of systems, the overall system performance may be affected due to congestion, and therefore parallel programming stacks must be assisted by quality-of-service (QoS) support to meet application requirements and to deal with application dynamism. In this paper, we present a hardware-software QoS-driven reconfigurable parallel computing framework, i.e., the NoC services, the runtime QoS middleware API and our ocMPI library and its tracing support which has been tailored for a distributed-shared memory ARM clustered NoC-based MPSoC platform. The experimental results show the efficiency of our software stack under a broad range of parallel kernels and benchmarks, in terms of low-latency interprocess communication, good application scalability, and most important, they demonstrate the ability to enable runtime reconfiguration to manage workloads in message-passing parallel applications

    Improving time predictability of shared hardware resources in real-time multicore systems : emphasis on the space domain

    Get PDF
    Critical Real-Time Embedded Systems (CRTES) follow a verification and validation process on the timing and functional correctness. This process includes the timing analysis that provides Worst-Case Execution Time (WCET) estimates to provide evidence that the execution time of the system, or parts of it, remain within the deadlines. A key design principle for CRTES is the incremental qualification, whereby each software component can be subject to verification and validation independently of any other component, with obvious benefits for cost. At timing level, this requires time composability, such that the timing behavior of a function is not affected by other functions. CRTES are experiencing an unprecedented growth with rising performance demands that have motivated the use of multicore architectures. Multicores can provide the performance required and bring the potential of integrating several software functions onto the same hardware. However, multicore contention in the access to shared hardware resources creates a dependence of the execution time of a task with the rest of the tasks running simultaneously. This dependence threatens time predictability and jeopardizes time composability. In this thesis we analyze and propose hardware solutions to be applied on current multicore designs for CRTES to improve time predictability and time composability, focusing on the on-chip bus and the memory controller. At hardware level, we propose new bus and memory controller designs that control and mitigate contention between different cores and allow to have time composability by design, also in the context of mixed-criticality systems. At analysis level, we propose contention prediction models that factor the impact of contenders and don¿t need modifications to the hardware. We also propose a set of Performance Monitoring Counters (PMC) that provide evidence about the contention. We give an special emphasis on the Space domain focusing on the Cobham Gaisler NGMP multicore processor, which is currently assessed by the European Space Agency for its future missions.Los Sistemas Críticos Empotrados de Tiempo Real (CRTES) siguen un proceso de verificación y validación para su correctitud funcional y temporal. Este proceso incluye el análisis temporal que proporciona estimaciones de el peor caso del tiempo de ejecución (WCET) para dar evidencia de que el tiempo de ejecución del sistema, o partes de él, permanecen dentro de los límites temporales. Un principio de diseño clave para los CRTES es la cualificación incremental, por la que cada componente de software puede ser verificado y validado independientemente del resto de componentes, con beneficios obvios para el coste. A nivel temporal, esto requiere composabilidad temporal, por la que el comportamiento temporal de una función no se ve afectado por otras funciones. CRTES están experimentando un crecimiento sin precedentes con crecientes demandas de rendimiento que han motivado el uso the arquitecturas multi-núcleo (multicore). Los procesadores multi-núcleo pueden proporcionar el rendimiento requerido y tienen el potencial de integrar varias funcionalidades software en el mismo hardware. A pesar de ello, la interferencia entre los diferentes núcleos que aparece en los recursos compartidos de os procesadores multi núcleo crea una dependencia del tiempo de ejecución de una tarea con el resto de tareas ejecutándose simultáneamente en el procesador. Esta dependencia amenaza la predictabilidad temporal y compromete la composabilidad temporal. En esta tésis analizamos y proponemos soluciones hardware para ser aplicadas en los diseños multi núcleo actuales para CRTES que mejoran la predictabilidad y composabilidad temporal, centrándose en el bus y el controlador de memoria internos al chip. A nivel de hardware, proponemos nuevos diseños de buses y controladores de memoria que controlan y mitigan la interferencia entre los diferentes núcleos y permiten tener composabilidad temporal por diseño, también en el contexto de sistemas de criticalidad mixta. A nivel de análisis, proponemos modelos de predicción de la interferencia que factorizan el impacto de los núcleos y no necesitan modificaciones hardware. También proponemos un conjunto de Contadores de Control del Rendimiento (PMC) que proporcionoan evidencia de la interferencia. En esta tésis, damós especial importancia al dominio espacial, centrándonos en el procesador mutli núcleo Cobham Gaisler NGMP, que está siendo actualmente evaluado por la Agencia Espacial Europea para sus futuras misiones

    Automatic Layer-Based Generation of System-On-Chip Bus Communication Models

    Full text link

    Integration and validation of embedded flight software on space-qualified multicore architectures

    Get PDF
    In the recent decades, the importance of software on space missions has notably increased, reflecting the need to integrate advanced on-board functionalities. With multicore processors being lately introduced to host critical high-performance applications, the complexity to validate software has significantly raised with respect to single core architectures. While there has been a big step forward in avionics after the publication of the CAST-32A paper, the ECSS-E-ST-40C software engineering standard used by the European Space Agency (ESA) is still not providing validation support for multicore processors. Hence, it is expected that standardising guidelines to develop software on such platforms will become a recurring topic in the industry to match the demands of future space exploration missions

    Improving the Scalability of High Performance Computer Systems

    Full text link
    Improving the performance of future computing systems will be based upon the ability of increasing the scalability of current technology. New paths need to be explored, as operating principles that were applied up to now are becoming irrelevant for upcoming computer architectures. It appears that scaling the number of cores, processors and nodes within an system represents the only feasible alternative to achieve Exascale performance. To accomplish this goal, we propose three novel techniques addressing different layers of computer systems. The Tightly Coupled Cluster technique significantly improves the communication for inter node communication within compute clusters. By improving the latency by an order of magnitude over existing solutions the cost of communication is considerably reduced. This enables to exploit fine grain parallelism within applications, thereby, extending the scalability considerably. The mechanism virtually moves the network interconnect into the processor, bypassing the latency of the I/O interface and rendering protocol conversions unnecessary. The technique is implemented entirely through firmware and kernel layer software utilizing off-the-shelf AMD processors. We present a proof-of-concept implementation and real world benchmarks to demonstrate the superior performance of our technique. In particular, our approach achieves a software-to-software communication latency of 240 ns between two remote compute nodes. The second part of the dissertation introduces a new framework for scalable Networks-on-Chip. A novel rapid prototyping methodology is proposed, that accelerates the design and implementation substantially. Due to its flexibility and modularity a large application space is covered ranging from Systems-on-chip, to high performance many-core processors. The Network-on-Chip compiler enables to generate complex networks in the form of synthesizable register transfer level code from an abstract design description. Our engine supports different target technologies including Field Programmable Gate Arrays and Application Specific Integrated Circuits. The framework enables to build large designs while minimizing development and verification efforts. Many topologies and routing algorithms are supported by partitioning the tasks into several layers and by the introduction of a protocol agnostic architecture. We provide a thorough evaluation of the design that shows excellent results regarding performance and scalability. The third part of the dissertation addresses the Processor-Memory Interface within computer architectures. The increasing compute power of many-core processors, leads to an equally growing demand for more memory bandwidth and capacity. Current processor designs exhibit physical limitations that restrict the scalability of main memory. To address this issue we propose a memory extension technique that attaches large amounts of DRAM memory to the processor via a low pin count interface using high speed serial transceivers. Our technique transparently integrates the extension memory into the system architecture by providing full cache coherency. Therefore, applications can utilize the memory extension by applying regular shared memory programming techniques. By supporting daisy chained memory extension devices and by introducing the asymmetric probing approach, the proposed mechanism ensures high scalability. We furthermore propose a DMA offloading technique to improve the performance of the processor memory interface. The design has been implemented in a Field Programmable Gate Array based prototype. Driver software and firmware modifications have been developed to bring up the prototype in a Linux based system. We show microbenchmarks that prove the feasibility of our design

    Embedded System Design

    Get PDF
    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues

    A sensor node soC architecture for extremely autonomous wireless sensor networks

    Get PDF
    Tese de Doutoramento em Engenharia Eletrónica e de Computadores (PDEEC) (especialidade em Informática Industrial e Sistemas Embebidos)The Internet of Things (IoT) is revolutionizing the Internet of the future and the way new smart objects and people are being connected into the world. Its pervasive computing and communication technologies connect myriads of smart devices, presented at our everyday things and surrounding objects. Big players in the industry forecast, by 2020, around 50 billion of smart devices connected in a multitude of scenarios and heterogeneous applications, sharing data over a true worldwide network. This will represent a trillion dollar market that everyone wants to take a share. In a world where everything is being connected, device security and device interoperability are a paramount. From the sensor to the cloud, this triggers several technological issues towards connectivity, interoperability and security requirements on IoT devices. However, fulfilling such requirements is not straightforward. While the connectivity exposes the device to the Internet, which also raises several security issues, deploying a standardized communication stack on the endpoint device in the network edge, highly increases the data exchanged over the network. Moreover, handling such ever-growing amount of data on resource-constrained devices, truly affects the performance and the energy consumption. Addressing such issues requires new technological and architectural approaches to help find solutions to leverage an accelerated, secure and energy-aware IoT end-device communication. Throughout this thesis, the developed artifacts triggered the achievement of important findings that demonstrate: (1) how heterogeneous architectures are nowadays a perfect solution to deploy endpoint devices in scenarios where not only (heavy processing) application-specific operations are required, but also network-related capabilities are major concerns; (2) how accelerating network-related tasks result in a more efficient device resources utilization, which combining better performance and increased availability, contributed to an improved overall energy utilization; (3) how device and data security can benefit from modern heterogeneous architectures that rely on secure hardware platforms, which are also able to provide security-related acceleration hardware; (4) how a domain-specific language eases the co-design and customization of a secure and accelerated IoT endpoint device at the network edge.Internet of Things (IoT) é o conceito que está a revolucionar a Internet do futuro e a forma como coisas, processos e pessoas se conectam e se relacionam numa infraestrutura de rede global que interligará, num futuro próximo, um vasto número de dispositivos inteligentes e de utilização diária. Com uma grande aposta no mercado IoT por parte dos grandes líderes na industria, algumas visões otimistas preveem para 2020 mais de 50 mil milhões de dispositivos ligados na periferia da rede, partilhando grandes volumes de dados importantes através da Internet, representando um mercado multimilionário com imensas oportunidades de negócio. Num mundo interligado de dispositivos, a interoperabilidade e a segurança é uma preocupação crescente. Tal preocupação exige inúmeros esforços na exploração de novas soluções, quer a nível tecnológico quer a nível arquitetural, que visem impulsionar o desenvolvimento de dispositivos embebidos com maiores capacidades de desempenho, segurança e eficiência energética, não só apenas do dispositivo em si, mas também das camadas e protocolos de rede associados. Apesar da integração de pilhas de comunicação e de protocolos standard das camadas de rede solucionar problemas associados à conectividade e a interoperabilidade, adiciona a sobrecarga inerente dos protocolos de comunicação e do crescente volume de dados partilhados entre os dispositivos e a Internet, afetando severamente o desempenho e a disponibilidade do mesmo, refletindo-se num maior consumo energético global. As soluções apresentadas nesta tese permitiram obter resultados que demonstram: (1) a viabilidade de soluções heterogéneas no desenvolvimento de dispositivos IoT, onde não só tarefas inerentes à aplicação podem ser aceleradas, mas também tarefas relacionadas com a comunicação do dispositivo; (2) os benefícios da aceleração de tarefas e protocolos da pilha de rede, que se traduz num melhor desempenho do dispositivo e aumento da disponibilidade do mesmo, contribuindo para uma melhor eficiência energética; (3) que plataformas de hardware modernas oferecem mecanismos de segurança que podem ser utilizados não apenas em prol da segurança do dispositivo, mas também nas capacidades de comunicação do mesmo; (4) que o desenvolvimento de uma linguagem de domínio específico permite de forma mais eficaz e eficiente o desenvolvimento e configuração de dispositivos IoT inteligentes.This thesis was supported by a PhD scholarship from Fundação para a Ciência e Tecnologia, SFRH/BD/90162/201

    Embedded System Design

    Get PDF
    A unique feature of this open access textbook is to provide a comprehensive introduction to the fundamental knowledge in embedded systems, with applications in cyber-physical systems and the Internet of things. It starts with an introduction to the field and a survey of specification models and languages for embedded and cyber-physical systems. It provides a brief overview of hardware devices used for such systems and presents the essentials of system software for embedded systems, including real-time operating systems. The author also discusses evaluation and validation techniques for embedded systems and provides an overview of techniques for mapping applications to execution platforms, including multi-core platforms. Embedded systems have to operate under tight constraints and, hence, the book also contains a selected set of optimization techniques, including software optimization techniques. The book closes with a brief survey on testing. This fourth edition has been updated and revised to reflect new trends and technologies, such as the importance of cyber-physical systems (CPS) and the Internet of things (IoT), the evolution of single-core processors to multi-core processors, and the increased importance of energy efficiency and thermal issues
    corecore