Search CORE

1,206 research outputs found

Modeling of thermally induced skew variations in clock distribution network

Author: Calimera Andrea
Liu Wei
Macii Alberto
Macii Enrico
Poncino Massimo
Sassone Alessandro
Publication venue: IEEE
Publication date: 01/01/2011
Field of study

Clock distribution network is sensitive to large thermal gradients on the die as the performance of both clock buffers and interconnects are affected by temperature. A robust clock network design relies on the accurate analysis of clock skew subject to temperature variations. In this work, we address the problem of thermally induced clock skew modeling in nanometer CMOS technologies. The complex thermal behavior of both buffers and interconnects are taken into account. In addition, our characterization of the temperature effect on buffers and interconnects provides valuable insight to designers about the potential impact of thermal variations on clock networks. The use of industrial standard data format in the interface allows our tool to be easily integrated into existing design flow

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Resource virtualization for mapping multiple applications on the same platform

Author: van Hintum M.M.W.
Publication venue
Publication date: 01/01/2007
Field of study

Repository TU/e

Pure OAI Repository

Low latency optical switch for high performance computing with minimized processor energy load [Invited]

Author: Cheng Q
Liu S
Madarbux MR
Penty RV
Watts PM
White IH
Wonfor A
Publication venue: Journal of Optical Communications and Networking
Publication date: 01/01/2015
Field of study

Power density and cooling issues are limiting the performance of high performance chip multiprocessors (CMPs), and off-chip communications currently consume more than 20% of power for memory, coherence, PCI, and Ethernet links. Photonic transceivers integrated with CMPs are being developed to overcome these issues, potentially allowing low hop count switched connections between chips or data center servers. However, latency in setting up optical connections is critically important in all computing applications, and having transceivers integrated on the processor chip also pushes other network functions and their associated power consumption onto the chip. In this paper, we propose a low latency optical switch architecture that minimizes the power consumed on the processor chip for two scenarios: multiple-socket shared memory coherence networks and optical top-of-rack switches for data centers. The switch architecture reduces power consumed on the CMP using a control plane with a simplified send and forget server interface and the use of a hybrid Mach–Zehnder interferometer and semiconductor optical amplifier integrated optical switch with electronic buffering. Results show that the proposed architecture offers a 42% reduction in head latency at low loads compared with a conventional scheduled optical switch as well as offering increased performance for streaming and incast traffic patterns. Power dissipated on the server chip is shown to be reduced by more than 60% compared with a scheduled optical switch architecture with ring resonator switching.This work was supported by the UK Engineering and Physical Sciences Research Council (EPSRC) INTERNET program grant and an EPSRC Fellowship grant to Philip Watts. Both University College London and the University of Cambridge are members of GreenTouch.This paper was published in the Journal of Optical Communications and Networking and is made available as an electronic reprint with the permission of OSA. The paper can be found at the following URL on the OSA website: http://www.opticsinfobase.org/jocn/abstract.cfm?uri=jocn-7-3-A498. Systematic or multiple reproduction or distribution to multiple locations via electronic or other means is prohibited and is subject to penalties under law. This is the accepted manuscript of a paper published in the Journal of Optical Communications and Networking, Vol. 7, Issue 3, pp. A498-A510 (2015) http://dx.doi.org/10.1364/JOCN.7.00A49

UCL Discovery

Apollo (Cambridge)

Towards Compelling Cases for the Viability of Silicon-Nanophotonic Technology in Future Many-core Systems

Author: Ramini Luca
Publication venue
Publication date: 01/01/2014
Field of study

Many crossbenchmarking results reported in the open literature raise optimistic expectations on the use of optical networks-on-chip (ONoCs) for high-performance and low-power on-chip communications in future Manycore Systems. However, these works ultimately fail to make a compelling case for the viability of silicon-nanophotonic technology for two fundamental reasons: (1)Lack of aggressive electrical baselines (ENoCs). (2) Inaccuracy in physical- and architecture-layer analysis of the ONoC. This thesis aims at providing the guidelines and minimum requirements so that nanophotonic emerging technology may become of practical relevance. The key enabler for this study is a cross-layer design methodology of the optical transport medium, ranging from the consideration of the predictability gap between ONoC logic schemes and their physical implementations, up to architecture-level design issues such as the network interface and its co-design requirements with the memory hierarchy. In order to increase the practical relevance of the study, we consider a consolidated electrical NoC counterpart with an optimized architecture from a performance and power viewpoint. The quality metrics of this latter are derived from synthesis and place&route on an industrial 40nm low-power technology library. Building on this methodology, we are able to provide a realistic energy efficiency comparison between ONoC and ENoC both at the level of the system interconnect and of the system as a whole, pointing out the sensitivity of the results to the maturity of the underlying silicon nanophotonic technology, and at the same time paving the way towards compelling cases for the viability of such technology in next generation many-cores systems

EprintsUnife

Archivio istituzionale della ricerca - Università di Ferrara

Driving the Network-on-Chip Revolution to Remove the Interconnect Bottleneck in Nanoscale Multi-Processor Systems-on-Chip

Author: Medardoni Simone
Publication venue
Publication date: 01/01/2009
Field of study

The sustained demand for faster, more powerful chips has been met by the availability of chip manufacturing processes allowing for the integration of increasing numbers of computation units onto a single die. The resulting outcome, especially in the embedded domain, has often been called SYSTEM-ON-CHIP (SoC) or MULTI-PROCESSOR SYSTEM-ON-CHIP (MP-SoC). MPSoC design brings to the foreground a large number of challenges, one of the most prominent of which is the design of the chip interconnection. With a number of on-chip blocks presently ranging in the tens, and quickly approaching the hundreds, the novel issue of how to best provide on-chip communication resources is clearly felt. NETWORKS-ON-CHIPS (NoCs) are the most comprehensive and scalable answer to this design concern. By bringing large-scale networking concepts to the on-chip domain, they guarantee a structured answer to present and future communication requirements. The point-to-point connection and packet switching paradigms they involve are also of great help in minimizing wiring overhead and physical routing issues. However, as with any technology of recent inception, NoC design is still an evolving discipline. Several main areas of interest require deep investigation for NoCs to become viable solutions: • The design of the NoC architecture needs to strike the best tradeoff among performance, features and the tight area and power constraints of the onchip domain. • Simulation and verification infrastructure must be put in place to explore, validate and optimize the NoC performance. • NoCs offer a huge design space, thanks to their extreme customizability in terms of topology and architectural parameters. Design tools are needed to prune this space and pick the best solutions. • Even more so given their global, distributed nature, it is essential to evaluate the physical implementation of NoCs to evaluate their suitability for next-generation designs and their area and power costs. This dissertation performs a design space exploration of network-on-chip architectures, in order to point-out the trade-offs associated with the design of each individual network building blocks and with the design of network topology overall. The design space exploration is preceded by a comparative analysis of state-of-the-art interconnect fabrics with themselves and with early networkon- chip prototypes. The ultimate objective is to point out the key advantages that NoC realizations provide with respect to state-of-the-art communication infrastructures and to point out the challenges that lie ahead in order to make this new interconnect technology come true. Among these latter, technologyrelated challenges are emerging that call for dedicated design techniques at all levels of the design hierarchy. In particular, leakage power dissipation, containment of process variations and of their effects. The achievement of the above objectives was enabled by means of a NoC simulation environment for cycleaccurate modelling and simulation and by means of a back-end facility for the study of NoC physical implementation effects. Overall, all the results provided by this work have been validated on actual silicon layout

EprintsUnife

Archivio istituzionale della ricerca - Università di Ferrara

Vega: A Ten-Core SoC for IoT Endnodes with DNN Acceleration and Cognitive Wake-Up from MRAM-Based State-Retentive Sleep Mode

Author: Benini L.
Chen J.
Conti F.
Eggiman M.
Flamand E.
Guermandi M.
Loi I.
Mach S.
Mauro A. D.
Pullini A.
Rossi D.
Tagliavini G.
Publication venue
Publication date: 01/01/2022
Field of study

The Internet-of-Things (IoT) requires endnodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT endnode system on chip (SoC) capable of scaling from a 1.7- μW fully retentive cognitive sleep mode up to 32.2-GOPS (at 49.4 mW) peak performance on NSAAs, including mobile deep neural network (DNN) inference, exploiting 1.6 MB of state-retentive SRAM, and 4 MB of non-volatile magnetoresistive random access memory (MRAM). To meet the performance and flexibility requirements of NSAAs, the SoC features ten RISC-V cores: one core for SoC and IO management and a nine-core cluster supporting multi-precision single instruction multiple data (SIMD) integer and floating-point (FP) computation. Vega achieves the state-of-the-art (SoA)-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3 TOPS/W for 8-bit DNN inference with hardware acceleration). On FP computation, it achieves the SoA-leading efficiency of 79 and 129 GFLOPS/W on 32- and 16-bit FP, respectively. Two programmable machine learning (ML) accelerators boost energy efficiency in cognitive sleep and active states

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Addressing Manufacturing Challenges in NoC-based ULSI Designs

Author: Hernández Luz Carles
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 19/07/2012
Field of study

Hernández Luz, C. (2012). Addressing Manufacturing Challenges in NoC-based ULSI Designs [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1669

RiuNet

Floorplan-Aware High Performance NoC Design

Author: Roca Pérez Antoni
Publication venue: 'Universitat Politecnica de Valencia'
Publication date: 20/11/2012
Field of study

Las actuales arquitecturas de m�ltiples n�cleos como los chip multiprocesadores (CMP) y soluciones multiprocesador para sistemas dentro del chip (MPSoCs) han adoptado a las redes dentro del chip (NoC) como elemento -ptimo para la inter-conexi-n de los diversos elementos de dichos sistemas. En este sentido, fabricantes de CMPs y MPSoCs han adoptado NoCs sencillas, generalmente con una topolog'a en malla o anillo, ya que son suficientes para satisfacer las necesidades de los sistemas actuales. Sin embargo a medida que los requerimientos del sistema -- baja latencia y alto rendimiento -- se hacen m�s exigentes, estas redes tan simples dejan de ser una soluci-n real. As', la comunidad investigadora ha propuesto y analizado NoCs m�s complejas. No obstante, estas soluciones son m�s dif'ciles de implementar -- especialmente los enlaces largos -- haciendo que este tipo de topolog'as complejas sean demasiado costosas o incluso inviables. En esta tesis, presentamos una metodolog'a de dise-o que minimiza la p�rdida de prestaciones de la red debido a su implementaci-n real. Los principales problemas que se encuentran al implementar una NoC son los conmutadores y los enlaces largos. En esta tesis, el conmutador se ha hecho modular, es decir, formado como uni-n de m-dulos m�s peque-os. En nuestro caso, los m-dulos son id�nticos, donde cada m-dulo es capaz de arbitrar, conmutar, y almacenar los mensajes que le llegan. Posteriormente, flexibilizamos la colocaci-n de estos m-dulos en el chip, permitiendo que m-dulos de un mismo conmutador est�n distribuidos por el chip. Esta metodolog'a de dise-o la hemos aplicado a diferentes escenarios. Primeramente, hemos introducido nuestro conmutador modular en NoCs con topolog'as conocidas como la malla 2D. Los resultados muestran como la modularidad y la distribuci-n del conmutador reducen la latencia y el consumo de potencia de la red. En segundo lugar, hemos utilizado nuestra metodolog'a de dise-o para implementar un crossbar distribuidRoca Pérez, A. (2012). Floorplan-Aware High Performance NoC Design [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/17844Palanci

Crossref

RiuNet