    SIMD based multicore processor for image and video processing

    The MANGO FET-HPC Project: an overview

    In this paper, we provide an overview of the MANGO project and its goals. The MANGO project aims at addressing power, performance and predictability (the PPP space) in future High-Performance Computing systems. It starts from the fundamental intuition that effective techniques for all three goals ultimately rely on customization to adapt the computing resources to reach the desired Quality of Service (QoS). From this starting point, MANGO will explore different but interrelated mechanisms at various architectural levels, as well as at the level of the system software. In particular, to explore a new positioning across the PPP space, MANGO will investigate system-wide, holistic, proactive thermal and power management aimed at extreme-scale energy efficiency. The MANGO project starts in October 2015 and is funded by the European Commission under the Horizon 2020 FET-HPC program. This project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 671668.

    Interconnect design for the edge computing system-on-chip

    Nowadays, the majority of systems-on-chip are designed by placing various IP blocks such as CPUs, memories and accelerators on the same chip. With the advancement of silicon manufacturing technologies, it has become possible to place hundreds of CPU cores and other design blocks on the same chip. The communication system that transfers data between chip components strongly affects overall chip performance, computational speed and response time to external events. Firstly, this thesis studies the main on-chip interconnect design paradigms. According to the presented research, various architectures may be chosen for an interconnect design depending on the required complexity and number of subsystems. Shared and hybrid bus interconnects are among the oldest means of on-chip communication. They are efficient for small systems with no more than ten IP blocks. Crossbar or bus matrix interconnects can be used to build on-chip communication systems that efficiently interconnect dozens of system-on-chip modules. Networks-on-chip can provide a communication solution for large-scale chip designs with hundreds of IP blocks. The second part of this thesis focuses on the novel Ballast chip implementation and its interconnect design. Ballast is a heterogeneous multiprocessor chip designed for edge computing and general-purpose computing applications. In this thesis, the Ballast interconnect was designed from scratch using a cascaded crossbar approach that connects three open-source AXI protocol bus matrices. The designed interconnect efficiently connects 6 bus masters with 9 slaves and provides up to 9.6 GB/s of bandwidth for the most productive CPU subsystem.

    ACACES 2012 Poster Abstracts

    Parallel Architectures for Many-Core Systems-On-Chip in Deep Sub-Micron Technology

    Despite the several issues faced in the past, the evolutionary trend of silicon has kept its constant pace. Today an ever-increasing number of cores is integrated onto the same die. Unfortunately, the extraordinary performance achievable by the many-core paradigm is limited by several factors. Memory bandwidth limitation, combined with inefficient synchronization mechanisms, can severely undermine the potential computational capabilities. Moreover, the huge HW/SW design space requires accurate and flexible tools to perform architectural explorations and validation of design choices. In this thesis we focus on the aforementioned aspects: a flexible and accurate Virtual Platform has been developed, targeting a reference many-core architecture. This tool has been used to perform architectural explorations, focusing on the instruction caching architecture and on hybrid HW/SW synchronization mechanisms. Besides architectural implications, another issue of embedded systems is considered: energy efficiency. Near Threshold Computing (NTC) is a key research area in the Ultra-Low-Power domain, as it promises a tenfold improvement in energy efficiency compared to super-threshold operation and it mitigates thermal bottlenecks. The physical implications of modern deep sub-micron technology are severely limiting the performance and reliability of modern designs. Reliability becomes a major obstacle when operating in NTC; in particular, memory operation becomes unreliable and can compromise system correctness. In the present work a novel hybrid memory architecture is devised to overcome reliability issues and at the same time improve energy efficiency by means of aggressive voltage scaling when allowed by workload requirements. Variability is another great drawback of near-threshold operation. The greatly increased sensitivity to threshold voltage variations is today a major concern for electronic devices. We introduce a variation-tolerant extension of the baseline many-core architecture. By means of micro-architectural knobs and a lightweight runtime control unit, the baseline architecture becomes dynamically tolerant to variations.