58,802 research outputs found
A DVFS Cycle Accurate Simulation Framework with Asynchronous NoC Design for Power-Performance Optimizations
Network-on-Chip (NoC) is a flexible and scalable solution to interconnect multi-cores, with a strong influence on the performance of the whole chip. On-chip network affects also the overall power consumption, thus requiring accurate early-stage estimation and optimization methodologies. In this scenario, the Dynamic Voltage Frequency Scaling (DVFS) technique have been proposed both for CPUs and NoCs. The promise is to be a flexible and scalable way to jointly optimize power-performance, addressing both static and dynamic power sources. Being simulation a de-facto prime solution to explore novel multi-core architectures, a reliable full system analysis requires to integrate in the toolchain accurate timing and power models for the DVFS block and for the resynchronization logic between different Voltage and Frequency Islands (VFIs). In such a way, a more accurate validation of novel optimization methodologies which exploit such actuator is possible, since both architectural and actuator overheads are considered at the same time. This work proposes a complete cycle accurate framework for multi-core design supporting Global Asynchronous Local Synchronous (GALS) NoC design and DVFS actuators for the NoC. Furthermore, static and dynamic frequency assignment is possible with or without the use of the voltage regulator. The proposed framework sits on accurate analytical timing model and SPICE-based power measures, providing accurate estimates of both timing and power overheads of the power control mechanisms
FDMA Enabled Phase-based Wireless Network-on-Chip using Graphene-based THz-band Antennas
The future growth in System-on-chip design is moving in the direction of multicore systems. Design of efficient interconnects between cores are crucial for improving the performance of a multicore processor. Such trends are seen due to the benefits the multicore systems provide in terms of power reduction and scalability. Network-on-chips (NoC) are viewed as an emerging solution in the design of interconnects in multicore systems. However, Traditional Network-on-chip architectures are no longer able to satisfy the performance requirements due to long distance communication over multi-hop wireline paths. Multi-hop communication leads to higher energy consumption, increase in latency and reduction in bandwidth. Research in recent years has explored emerging technologies such as 3D integration, photonic and radio frequency based Network-on-chips. The use of wireless interconnects using mm-wave antennas are able to alleviate the performance issues in a wireline interconnect system. However, to satisfy the increasing demand for higher bandwidth and lower energy consumption, Wireless Network-on-Chip enabled with high speed direct links operating in THz band between distant cores is desired. Recent research has brought to light highly efficient graphene-based antennas operating in THz band. These antennas can provide high data rate and are found to consume less power with low area overheads.
In this thesis, an innovative approach using novel devices based on graphene structures is proposed to provide a high-performance on-chip interconnection. This novel approach combines the regular NoC structure with the proposed wireless infrastructure to exploit the performance benefits. An architecture with wireless interfaces on every core is explored in this work. Simultaneous multiple communications in a network can be achieved by adopting Frequency Division Multiple access (FDMA). However, in a system where all cores are equipped with a wireless interface, FDMA requires more number of frequency bands. This becomes difficult to achieve as the system scales and the number of cores increase. Therefore, a FDMA protocol along with a 4-phased repetitive multi-band architecture is envisioned in this work. The phase-based protocol allows multiple wireless links to be active at a time, the phase-based protocol along with the FDMA protocol provides a reliable data transfer between cores with lesser number of frequency bands. In this thesis, an architecture with a combination of FDMA and phase-based protocol using point-to-point graphene-based wireless links is proposed. The proposed architecture is also extended for a multichip system. With cycle accurate system-level simulations, it is shown that the proposed architecture provides huge gains in performance and energy-efficiency in data transfer both in NoC based multicore and multichip systems
DeSyRe: on-Demand System Reliability
The DeSyRe project builds on-demand adaptive and reliable Systems-on-Chips (SoCs). As fabrication technology scales down, chips are becoming less reliable, thereby incurring increased power and performance costs for fault tolerance. To make matters worse, power density is becoming a significant limiting factor in SoC design, in general. In the face of such changes in the technological landscape, current solutions for fault tolerance are expected to introduce excessive overheads in future systems. Moreover, attempting to design and manufacture a totally defect and fault-free system, would impact heavily, even prohibitively, the design, manufacturing, and testing costs, as well as the system performance and power consumption. In this context, DeSyRe delivers a new generation of systems that are reliable by design at well-balanced power, performance, and design costs. In our attempt to reduce the overheads of fault-tolerance, only a small fraction of the chip is built to be fault-free. This fault-free part is then employed to manage the remaining fault-prone resources of the SoC. The DeSyRe framework is applied to two medical systems with high safety requirements (measured using the IEC 61508 functional safety standard) and tight power and performance constraints
Reliable and Energy Efficient MLC STT-RAM Buffer for CNN Accelerators
We propose a lightweight scheme where the formation of a data block is changed in such a way that it can tolerate soft errors significantly better than the baseline. The key insight behind our work is that CNN weights are normalized between -1 and 1 after each convolutional layer, and this leaves one bit unused in half-precision floating-point representation. By taking advantage of the unused bit, we create a backup for the most significant bit to protect it against the soft errors. Also, considering the fact that in MLC STT-RAMs the cost of memory operations (read and write), and reliability of a cell are content-dependent (some patterns take larger current and longer time, while they are more susceptible to soft error), we rearrange the data block to minimize the number of costly bit patterns. Combining these two techniques provides the same level of accuracy compared to an error-free baseline while improving the read and write energy by 9% and 6%, respectively
GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation
In this paper, we describe the architecture and performance of the GRAPE-6
system, a massively-parallel special-purpose computer for astrophysical
-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed
in 1995 and achieved the theoretical peak speed of 1.08 Tflops. As was the case
with GRAPE-4, the primary application of GRAPE-6 is simulation of collisional
systems, though it can be used for collisionless systems. The main differences
between GRAPE-4 and GRAPE-6 are (a) The processor chip of GRAPE-6 integrates 6
force-calculation pipelines, compared to one pipeline of GRAPE-4 (which needed
3 clock cycles to calculate one interaction), (b) the clock speed is increased
from 32 to 90 MHz, and (c) the total number of processor chips is increased
from 1728 to 2048. These improvements resulted in the peak speed of 64 Tflops.
We also discuss the design of the successor of GRAPE-6.Comment: Accepted for publication in PASJ, scheduled to appear in Vol. 55, No.
Photonic integration enabling new multiplexing concepts in optical board-to-board and rack-to-rack interconnects
New broadband applications are causing the datacenters to proliferate, raising the bar for higher interconnection speeds. So far, optical board-to-board and rack-to-rack interconnects relied primarily on low-cost commodity optical components assembled in a single package. Although this concept proved successful in the first generations of optical-interconnect modules, scalability is a daunting issue as signaling rates extend beyond 25 Gb/s. In this paper we present our work towards the development of two technology platforms for migration beyond Infiniband enhanced data rate (EDR), introducing new concepts in board-to-board and rack-to-rack interconnects.
The first platform is developed in the framework of MIRAGE European project and relies on proven VCSEL technology, exploiting the inherent cost, yield, reliability and power consumption advantages of VCSELs. Wavelength multiplexing, PAM-4 modulation and multi-core fiber (MCF) multiplexing are introduced by combining VCSELs with integrated Si and glass photonics as well as BiCMOS electronics. An in-plane MCF-to-SOI interface is demonstrated, allowing coupling from the MCF cores to 340x400 nm Si waveguides. Development of a low-power VCSEL driver with integrated feed-forward equalizer is reported, allowing PAM-4 modulation of a bandwidth-limited VCSEL beyond 25 Gbaud.
The second platform, developed within the frames of the European project PHOXTROT, considers the use of modulation formats of increased complexity in the context of optical interconnects. Powered by the evolution of DSP technology and towards an integration path between inter and intra datacenter traffic, this platform investigates optical interconnection system concepts capable to support 16QAM 40GBd data traffic, exploiting the advancements of silicon and polymer technologies
On-Line Dependability Enhancement of Multiprocessor SoCs by Resource Management
This paper describes a new approach towards dependable design of homogeneous multi-processor SoCs in an example satellite-navigation application. First, the NoC dependability is functionally verified via embedded software. Then the Xentium processor tiles are periodically verified via on-line self-testing techniques, by using a new IIP Dependability Manager. Based on the Dependability Manager results, faulty tiles are electronically excluded and replaced by fault-free spare tiles via on-line resource management. This integrated approach enables fast electronic fault detection/diagnosis and repair, and hence a high system availability. The dependability application runs in parallel with the actual application, resulting in a very dependable system. All parts have been verified by simulation
Memory and information processing in neuromorphic systems
A striking difference between brain-inspired neuromorphic processors and
current von Neumann processors architectures is the way in which memory and
processing is organized. As Information and Communication Technologies continue
to address the need for increased computational power through the increase of
cores within a digital processor, neuromorphic engineers and scientists can
complement this need by building processor architectures where memory is
distributed with the processing. In this paper we present a survey of
brain-inspired processor architectures that support models of cortical networks
and deep neural networks. These architectures range from serial clocked
implementations of multi-neuron systems to massively parallel asynchronous ones
and from purely digital systems to mixed analog/digital systems which implement
more biological-like models of neurons and synapses together with a suite of
adaptation and learning mechanisms analogous to the ones found in biological
nervous systems. We describe the advantages of the different approaches being
pursued and present the challenges that need to be addressed for building
artificial neural processing systems that can display the richness of behaviors
seen in biological systems.Comment: Submitted to Proceedings of IEEE, review of recently proposed
neuromorphic computing platforms and system
- …