117 research outputs found

    Design and Programming Methods for Reconfigurable Multi-Core Architectures using a Network-on-Chip-Centric Approach

    Get PDF
    A current trend in the semiconductor industry is the use of Multi-Processor Systems-on-Chip (MPSoCs) for a wide variety of applications such as image processing, automotive, multimedia, and robotic systems. Most applications gain performance advantages by executing parallel tasks on multiple processors due to the inherent parallelism. Moreover, heterogeneous structures provide high performance/energy efficiency, since application-specific processing elements (PEs) can be exploited. The increasing number of heterogeneous PEs leads to challenging communication requirements. To overcome this challenge, Networks-on-Chip (NoCs) have emerged as scalable on-chip interconnect. Nevertheless, NoCs have to deal with many design parameters such as virtual channels, routing algorithms and buffering techniques to fulfill the system requirements. This thesis highly contributes to the state-of-the-art of FPGA-based MPSoCs and NoCs. In the following, the three major contributions are introduced. As a first major contribution, a novel router concept is presented that efficiently utilizes communication times by performing sequences of arithmetic operations on the data that is transferred. The internal input buffers of the routers are exchanged with processing units that are capable of executing operations. Two different architectures of such processing units are presented. The first architecture provides multiply and accumulate operations which are often used in signal processing applications. The second architecture introduced as Application-Specific Instruction Set Routers (ASIRs) contains a processing unit capable of executing any operation and hence, it is not limited to multiply and accumulate operations. An internal processing core located in ASIRs can be developed in C/C++ using high-level synthesis. The second major contribution comprises application and performance explorations of the novel router concept. Models that approximate the achievable speedup and the end-to-end latency of ASIRs are derived and discussed to show the benefits in terms of performance. Furthermore, two applications using an ASIR-based MPSoC are implemented and evaluated on a Xilinx Zynq SoC. The first application is an image processing algorithm consisting of a Sobel filter, an RGB-to-Grayscale conversion, and a threshold operation. The second application is a system that helps visually impaired people by navigating them through unknown indoor environments. A Light Detection and Ranging (LIDAR) sensor scans the environment, while Inertial Measurement Units (IMUs) measure the orientation of the user to generate an audio signal that makes the distance as well as the orientation of obstacles audible. This application consists of multiple parallel tasks that are mapped to an ASIR-based MPSoC. Both applications show the performance advantages of ASIRs compared to a conventional NoC-based MPSoC. Furthermore, dynamic partial reconfiguration in terms of relocation and security aspects are investigated. The third major contribution refers to development and programming methodologies of NoC-based MPSoCs. A software-defined approach is presented that combines the design and programming of heterogeneous MPSoCs. In addition, a Kahn-Process-Network (KPN) –based model is designed to describe parallel applications for MPSoCs using ASIRs. The KPN-based model is extended to support not only the mapping of tasks to NoC-based MPSoCs but also the mapping to ASIR-based MPSoCs. A static mapping methodology is presented that assigns tasks to ASIRs and processors for a given KPN-model. The impact of external hardware components such as sensors, actuators and accelerators connected to the processors is also discussed which makes the approach of high interest for embedded systems

    Network-on-Chip

    Get PDF
    Limitations of bus-based interconnections related to scalability, latency, bandwidth, and power consumption for supporting the related huge number of on-chip resources result in a communication bottleneck. These challenges can be efficiently addressed with the implementation of a network-on-chip (NoC) system. This book gives a detailed analysis of various on-chip communication architectures and covers different areas of NoCs such as potentials, architecture, technical challenges, optimization, design explorations, and research directions. In addition, it discusses current and future trends that could make an impactful and meaningful contribution to the research and design of on-chip communications and NoC systems

    Cost effective technology applied to domotics and smart home energy management systems

    Get PDF
    Premio extraordinario de Trabajo Fin de Máster curso 2019/2020. Máster en Energías Renovables DistribuidasIn this document is presented the state of art for domotics cost effective technologies available on market nowadays, and how to apply them in Smart Home Energy Management Systems (SHEMS) allowing peaks shaving, renewable management and home appliance controls, always in cost effective context in order to be massively applied. Additionally, beyond of SHEMS context, it will be also analysed how to apply this technology in order to increase homes energy efficiency and monitoring of home appliances. Energy management is one of the milestones for distributed renewable energy spread; since renewable energy sources are not time-schedulable, are required control systems capable of the management for exchanging energy between conventional sources (power grid), renewable sources and energy storage sources. With the proposed approach, there is a first block dedicated to show an overview of Smart Home Energy Management Systems (SMHEMS) classical architecture and functional modules of SHEMS; next step is to analyse principles which has allowed some devices to become a cost-effective technology. Once the technology has been analysed, it will be reviewed some specific resources (hardware and software) available on marked for allowing low cost SHEMS. Knowing the “tools” available; it will be shown how to adapt classical SHEMS to cost effective technology. Such way, this document will show some specific applications of SHEMS. Firstly, in a general point of view, comparing the proposed low-cost technology with one of the main existing commercial proposals; and secondly, developing the solution for a specific real case.En este documento se aborda el estado actual de la domótica de bajo coste disponible en el mercado actualmente y cómo aplicarlo en los sistemas inteligentes de gestión energética en la vivienda (SHEMS) permitiendo el recorte de las puntas de demanda, gestión de energías renovables y control de electrodomésticos, siempre en el contexto del bajo coste, con el objetivo de lograr la máxima difusión de los SHEMS. Adicionalmente, más allá del contexto de la tecnología SHEMS, se analizará cómo aplicar esta tecnología para aumentar la eficiencia energética de los hogares y para la supervisión de los electrodomésticos. La gestión energética es uno de los factores principales para lograr la difusión de las energías renovables distribuidas; debido a que las fuentes de energía renovable no pueden ser planificadas, se requieren sistemas de control capaces de gestionar el intercambio de energía entre las fuentes convencionales (red eléctrica de distribución), energías renovables y dispositivos de almacenamiento energético. Bajo esta perspectiva, este documento presenta un primer bloque en el que se exponen las bases de la arquitectura y módulos funcionales de los sistemas inteligentes de gestión energética en la vivienda (SHEMS); el siguiente paso será analizar los principios que han permitido a ciertos dispositivos convertirse en dispositivos de bajo coste. Una vez analizada la tecnología, nos centraremos en los recursos (hardware y software) existentes que permitirán la realización de un SHEMS a bajo coste. Conocidas las “herramientas” a nuestra disposición, se mostrará como adaptar un esquema SHEMS clásico a la tecnología de bajo coste. Primeramente, comparando de modo genérico la tecnología de bajo coste con una de las principales propuestas comerciales de SHEMS, para seguidamente desarrollar la solución de bajo coste a un caso específico real

    Simulador para arquitecturas multiprocesador utilizadas en el sector espacial para apoyar el desarrollo de mecanismos software de tolerancia a fallos

    Get PDF
    La colonización de la Luna y Marte o la minería espacial son ideas que la humanidad ha ido desarrollando durante décadas pero que hoy en día parecen estar al alcance de la mano gracias al avance tecnológico, la nueva carrera espacial en la que China es cada vez más relevante, y al aumento de interés por parte de gobiernos, agencias y empresas. Parte de estos avances tecnológicos radican en el uso de hardware y software espacial cada vez más potente y versátil. En cualquier misión espacial, el software es un componente fundamental que permite configurar de distintas maneras al sistema y manejar las situaciones excepcionales que se puedan dar. Además, en el entorno espacial tanto el hardware como el software necesitan satisfacer unos requisitos de tolerancia a fallos para evitar en mayor medida los errores producidos por la radiación. La verificación de estos requisitos en sistemas críticos consume cada vez una mayor cantidad de recursos dedicados al desarrollo de los sistemas, sobre todo en sistemas multinúcleo. En este contexto es necesario el uso de nuevas herramientas para el desarrollo y verificación tempranos del software empotrado. Entre estas herramientas se puede incluir la propuesta de este trabajo de tesis, que afronta el problema mediante el uso de técnicas de simulación e inyección de fallos. Dadas las restricciones temporales en el desarrollo de los sistemas embarcados y los estrictos requisitos de robustez del software espacial, es necesario realizar esta verificación en etapas muy tempranas del desarrollo. En un caso ideal, estas tareas serian desempeñadas de forma paralela al desarrollo hardware, permitiendo anticipar discrepancias y problemas en la especificación del sistema. Cabe resaltar que la verificación de los mecanismos de tolerancia a fallos software puede ser difícil o imposible de realizar en el hardware, dado que las técnicas de inyección de fallos hardware limitan la reproducibilidad de escenarios concretos asi como que los escenarios sean observables y con inyección de fallos controlable y transparente. En esta tesis se describe la investigación, desarrollo y uso de la plataforma virtual ``LeonViP-MC'', con capacidad de inyección de fallos y que utiliza traducción dinámica binaria (Dynamic Binary Translation) mediante LLVM y corrutinas para la simulación de sistemas multinúcleo. Gracias a esta plataforma se puede ejecutar el mismo binario a ejecutar en el hardware real, pero en un entorno controlado y determinista. Esto ha permitido la realización de distintas campanas de inyección de fallos que no serían viables de otra manera. Su utilización ha permitido demostrar la fiabilidad de las técnicas de tolerancia a fallos implementadas tanto en el software de arranque de la Unidad de Control del Instrumento (ICU) del Detector de Partículas Energéticas (EPD) embarcado en Solar Orbiter asi como de una aplicación basada en un canal de comunicaciones ARINC 653 que forma parte del desarrollo de un futuro hipervisor para el sistema multinúcleo GR740

    Computer Aided Verification

    Get PDF
    This open access two-volume set LNCS 10980 and 10981 constitutes the refereed proceedings of the 30th International Conference on Computer Aided Verification, CAV 2018, held in Oxford, UK, in July 2018. The 52 full and 13 tool papers presented together with 3 invited papers and 2 tutorials were carefully reviewed and selected from 215 submissions. The papers cover a wide range of topics and techniques, from algorithmic and logical foundations of verification to practical applications in distributed, networked, cyber-physical, and autonomous systems. They are organized in topical sections on model checking, program analysis using polyhedra, synthesis, learning, runtime verification, hybrid and timed systems, tools, probabilistic systems, static analysis, theory and security, SAT, SMT and decisions procedures, concurrency, and CPS, hardware, industrial applications

    CIRCUITS AND ARCHITECTURE FOR BIO-INSPIRED AI ACCELERATORS

    Get PDF
    Technological advances in microelectronics envisioned through Moore’s law have led to powerful processors that can handle complex and computationally intensive tasks. Nonetheless, these advancements through technology scaling have come at an unfavorable cost of significantly larger power consumption, which has posed challenges for data processing centers and computers at scale. Moreover, with the emergence of mobile computing platforms constrained by power and bandwidth for distributed computing, the necessity for more energy-efficient scalable local processing has become more significant. Unconventional Compute-in-Memory architectures such as the analog winner-takes-all associative-memory and the Charge-Injection Device processor have been proposed as alternatives. Unconventional charge-based computation has been employed for neural network accelerators in the past, where impressive energy efficiency per operation has been attained in 1-bit vector-vector multiplications, and in recent work, multi-bit vector-vector multiplications. In the latter, computation was carried out by counting quanta of charge at the thermal noise limit, using packets of about 1000 electrons. These systems are neither analog nor digital in the traditional sense but employ mixed-signal circuits to count the packets of charge and hence we call them Quasi-Digital. By amortizing the energy costs of the mixed-signal encoding/decoding over compute-vectors with many elements, high energy efficiencies can be achieved. In this dissertation, I present a design framework for AI accelerators using scalable compute-in-memory architectures. On the device level, two primitive elements are designed and characterized as target computational technologies: (i) a multilevel non-volatile cell and (ii) a pseudo Dynamic Random-Access Memory (pseudo-DRAM) bit-cell. At the level of circuit description, compute-in-memory crossbars and mixed-signal circuits were designed, allowing seamless connectivity to digital controllers. At the level of data representation, both binary and stochastic-unary coding are used to compute Vector-Vector Multiplications (VMMs) at the array level. Finally, on the architectural level, two AI accelerator for data-center processing and edge computing are discussed. Both designs are scalable multi-core Systems-on-Chip (SoCs), where vector-processor arrays are tiled on a 2-layer Network-on-Chip (NoC), enabling neighbor communication and flexible compute vs. memory trade-off. General purpose Arm/RISCV co-processors provide adequate bootstrapping and system-housekeeping and a high-speed interface fabric facilitates Input/Output to main memory

    SpiNNaker - A Spiking Neural Network Architecture

    Get PDF
    20 years in conception and 15 in construction, the SpiNNaker project has delivered the world’s largest neuromorphic computing platform incorporating over a million ARM mobile phone processors and capable of modelling spiking neural networks of the scale of a mouse brain in biological real time. This machine, hosted at the University of Manchester in the UK, is freely available under the auspices of the EU Flagship Human Brain Project. This book tells the story of the origins of the machine, its development and its deployment, and the immense software development effort that has gone into making it openly available and accessible to researchers and students the world over. It also presents exemplar applications from ‘Talk’, a SpiNNaker-controlled robotic exhibit at the Manchester Art Gallery as part of ‘The Imitation Game’, a set of works commissioned in 2016 in honour of Alan Turing, through to a way to solve hard computing problems using stochastic neural networks. The book concludes with a look to the future, and the SpiNNaker-2 machine which is yet to come

    Network-on-Chip Topologies: Potentials, Technical Challenges, Recent Advances and Research Direction

    Get PDF
    Integration technology advancement has impacted the System-on-Chip (SoC) in which heterogeneous cores are supported on a single chip. Based on the huge amount of supported heterogeneous cores, efficient communication between the associated processors has to be considered at all levels of the system design to ensure global interconnection. This can be achieved through a design-friendly, flexible, scalable, and high-performance interconnection architecture. It is noteworthy that the interconnections between multiple cores on a chip present a considerable influence on the performance and communication of the chip design regarding the throughput, end-to-end delay, and packets loss ratio. Although hierarchical architectures have addressed the majority of the associated challenges of the traditional interconnection techniques, the main limiting factor is scalability. Network-on-Chip (NoC) has been presented as a scalable and well-structured alternative solution that is capable of addressing communication issues in the on-chip systems. In this context, several NoC topologies have been presented to support various routing techniques and attend to different chip architectural requirements. This book chapter reviews some of the existing NoC topologies and their associated characteristics. Also, application mapping algorithms and some key challenges of NoC are considered

    Low-Power Embedded Design Solutions and Low-Latency On-Chip Interconnect Architecture for System-On-Chip Design

    Get PDF
    This dissertation presents three design solutions to support several key system-on-chip (SoC) issues to achieve low-power and high performance. These are: 1) joint source and channel decoding (JSCD) schemes for low-power SoCs used in portable multimedia systems, 2) efficient on-chip interconnect architecture for massive multimedia data streaming on multiprocessor SoCs (MPSoCs), and 3) data processing architecture for low-power SoCs in distributed sensor network (DSS) systems and its implementation. The first part includes a low-power embedded low density parity check code (LDPC) - H.264 joint decoding architecture to lower the baseband energy consumption of a channel decoder using joint source decoding and dynamic voltage and frequency scaling (DVFS). A low-power multiple-input multiple-output (MIMO) and H.264 video joint detector/decoder design that minimizes energy for portable, wireless embedded systems is also designed. In the second part, a link-level quality of service (QoS) scheme using unequal error protection (UEP) for low-power network-on-chip (NoC) and low latency on-chip network designs for MPSoCs is proposed. This part contains WaveSync, a low-latency focused network-on-chip architecture for globally-asynchronous locally-synchronous (GALS) designs and a simultaneous dual-path routing (SDPR) scheme utilizing path diversity present in typical mesh topology network-on-chips. SDPR is akin to having a higher link width but without the significant hardware overhead associated with simple bus width scaling. The last part shows data processing unit designs for embedded SoCs. We propose a data processing and control logic design for a new radiation detection sensor system generating data at or above Peta-bits-per-second level. Implementation results show that the intended clock rate is achieved within the power target of less than 200mW. We also present a digital signal processing (DSP) accelerator supporting configurable MAC, FFT, FIR, and 3-D cross product operations for embedded SoCs. It consumes 12.35mW along with 0.167mm2 area at 333MHz

    Leveraging Hardware QoS to Control Contention in the Xilinx Zynq UltraScale+ MPSoC

    Get PDF
    The interference co-running tasks generate on each other’s timing behavior continues to be one of the main challenges to be addressed before Multi-Processor System-on-Chip (MPSoCs) are fully embraced in critical systems like those deployed in avionics and automotive domains. Modern MPSoCs like the Xilinx Zynq UltraScale+ incorporate hardware Quality of Service (QoS) mechanisms that can help controlling contention among tasks. Given the distributed nature of modern MPSoCs, the route a request follows from its source (usually a compute element like a CPU) to its target (usually a memory) crosses several QoS points, each one potentially implementing a different QoS mechanism. Mastering QoS mechanisms individually, as well as their combined operation, is pivotal to obtain the expected benefits from the QoS support. In this work, we perform, to our knowledge, the first qualitative and quantitative analysis of the distributed QoS mechanisms in the Xilinx UltraScale+ MPSoC. We empirically derive QoS information not covered by the technical documentation, and show limitations and benefits of the available QoS support. To that end, we use a case study building on neural network kernels commonly used in autonomous systems in different real-time domains.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant PID2019-107255GB; the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 878752 (MASTECS) and the European Research Council (ERC) grant agreement No. 772773 (SuPerCom).Peer ReviewedPostprint (published version
    corecore