444 research outputs found

    Effective power saving method by on-chip traffic compression in noc-based embedded systems

    With advances in technology, multiprocessor systems-on-chip (MPSoCs) grow in number of components, relying on an efficient on-chip network (network-on-chip; NoC). As the size of the system increases, NoC performance and power consumption become central issues. In this project, we design compression strategies at the NoC level that reduce the number of transmitted flits and, consequently, the energy consumed. The proposed mechanism relies on the abundance of memory data blocks containing long runs of zeros in the analysed applications, which makes them easily compressible with a zero-elimination strategy. We provide a hardware implementation of both the compression and decompression end points at a generic network interface (NI). The mechanisms have been designed in isolation in order to make them modular and easily adaptable to any NI protocol. Results show the effectiveness of the compression and decompression mechanisms and the low overhead they introduce. The traffic reduction achieved by the compression strategy (a factor of 3) justifies the added resources. This work reflects part of the main research directions we tackle in the wider PhD framework. In particular, we propose a method for power-efficient memory traffic management. The work presented here represents the initial research directions in simulator development, traffic pattern characterization and initial solution development.
    Soler Heredia, M. (2013). Effective power saving method by on-chip traffic compression in noc-based embedded systems. http://hdl.handle.net/10251/43774
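The zero-elimination idea described in this abstract can be illustrated with a small software sketch. It is not the author's hardware design: the 8-byte flit width, the one-bit-per-flit header mask, and the function names `compress` and `decompress` are all assumptions made for illustration. The sender drops every all-zero flit of a data block and transmits a bitmask plus the remaining flits; the receiver uses the mask to reinsert the zero flits.

```python
from typing import List, Tuple

FLIT_BYTES = 8  # assumed flit payload width; the abstract does not state one


def compress(block: bytes, flit_bytes: int = FLIT_BYTES) -> Tuple[int, List[bytes]]:
    """Split a block into flits and keep only the non-zero ones.

    Returns (mask, payload): bit i of mask is set when flit i is non-zero,
    and payload holds the non-zero flits in their original order.
    """
    if len(block) % flit_bytes != 0:
        raise ValueError("block length must be a multiple of the flit size")
    mask = 0
    payload: List[bytes] = []
    for index in range(len(block) // flit_bytes):
        flit = block[index * flit_bytes:(index + 1) * flit_bytes]
        if any(flit):  # True when at least one byte is non-zero
            mask |= 1 << index
            payload.append(flit)
    return mask, payload


def decompress(mask: int, payload: List[bytes], n_flits: int,
               flit_bytes: int = FLIT_BYTES) -> bytes:
    """Rebuild the original block by reinserting the eliminated zero flits."""
    zero_flit = bytes(flit_bytes)
    remaining = iter(payload)
    flits = [next(remaining) if (mask >> i) & 1 else zero_flit
             for i in range(n_flits)]
    return b"".join(flits)


if __name__ == "__main__":
    # A 64-byte block (8 flits) in which only flits 1 and 5 carry data.
    block = bytearray(64)
    block[8:16] = b"\x01\x02\x03\x04\x05\x06\x07\x08"
    block[40:48] = b"\xff" * 8
    mask, payload = compress(bytes(block))
    assert decompress(mask, payload, n_flits=8) == bytes(block)
    # One header flit for the mask is an assumption of this sketch.
    print(f"flits sent: {1 + len(payload)} of {len(block) // FLIT_BYTES}")
```

In this toy case two data flits plus an assumed header flit replace eight, but the actual saving depends entirely on how many zero flits the traffic contains; the factor of 3 reported in the abstract comes from the author's measurements, not from this sketch.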

    DeSyRe: on-Demand System Reliability

    The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect-free and fault-free system would heavily, even prohibitively, impact the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. To reduce the overheads of fault tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints.

    Efficient Interconnection Network Design for Heterogeneous Architectures

    The onset of big data and deep learning applications, mixed with conventional general-purpose programs, has driven computer architecture to embrace heterogeneity with specialization. With the ever-increasing number of interconnected chip components, future architectures are required to operate under a stricter power budget and to process emerging big data applications efficiently. The interconnection network, as the communication backbone, thus faces the grand challenges of a limited power envelope, data movement, and performance scaling. This dissertation provides interconnect solutions that are specialized to application requirements, towards power-/energy-efficient and high-performance computing for heterogeneous architectures. It first examines the challenges of network-on-chip router power-gating techniques that save static power for general-purpose workloads. A voting approach is proposed as an adaptive power-gating policy that considers both local and global traffic status through router voting. In addition, low-latency routing algorithms are designed to guarantee performance in irregular power-gated networks. This holistic solution not only saves power but also avoids performance overhead. This research also introduces emerging computation paradigms to interconnects for big data applications to mitigate the pressure of data movement. An approximate network-on-chip is proposed to achieve high-throughput communication by means of lossy compression. Then, near-data processing is combined with in-network computing to further improve performance while reducing data movement. Both schemes are general enough to serve as plug-ins for different network topologies and routing algorithms. To tackle the challenging computational requirements of deep learning workloads, this dissertation investigates the compelling opportunities of communication algorithm-architecture co-design to accelerate distributed deep learning. A MultiTree allreduce algorithm is proposed that couples message scheduling with the network topology to achieve faster and contention-free communication. In addition, the interconnect hardware and flow control are also specialized to exploit deep learning communication characteristics and fulfill the algorithm's needs, thereby effectively improving performance and scalability. By considering application and algorithm characteristics, this research shows that the interconnection network can be tailored to improve power-/energy-efficiency and performance and so satisfy heterogeneous computation and communication requirements.

    Homogeneous and heterogeneous MPSoC architectures with network-on-chip connectivity for low-power and real-time multimedia signal processing

    Two multiprocessor system-on-chip (MPSoC) architectures are proposed and compared in the paper with reference to audio and video processing applications. One architecture exploits a homogeneous topology; it consists of 8 identical tiles, each made of a 32-bit RISC core enhanced by a 64-bit DSP coprocessor with local memory. The other MPSoC architecture exploits a heterogeneous-tile topology with on-chip distributed memory resources; the tiles act as application-specific processors, each supporting a different class of algorithms. In both architectures, the multiple tiles are interconnected by a network-on-chip (NoC) infrastructure, through network interfaces and routers, which allows the tiles to operate in parallel. The functional performance and the implementation complexity of the NoC-based MPSoC architectures are assessed by synthesis results in submicron CMOS technology. Among the large set of supported algorithms, two case studies are considered: the real-time implementation of an H.264/MPEG AVC video codec and of a low-distortion digital audio amplifier. The heterogeneous architecture ensures higher power efficiency and smaller area occupation and is better suited to low-power multimedia processing, such as in mobile devices. The homogeneous scheme allows for higher flexibility and easier system scalability and is better suited to general-purpose DSP tasks in power-supplied devices.