Inducing Thermal-Awareness in Multicore Systems Using Networks-on-Chip
Technology scaling imposes ever-increasing temperature stress on digital circuit design due to rising transistor density, especially in highly integrated systems such as Multi-Processor Systems-on-Chip (MPSoCs). Temperature-aware design is therefore mandatory and should be performed at the early design stages of MPSoCs to avoid iterations and delays in the deployment of final consumer products. In this paper we present a novel hardware infrastructure for the thermal control of MPSoC architectures. It exploits the NoC interconnect of the baseline system as an active component that communicates with and coordinates the temperature sensors scattered around the chip, in order to globally monitor the actual temperature of the system. A thermal management unit and clock frequency controllers are then included as part of the active NoC-based thermal control infrastructure to adjust the frequency and voltage of the processing elements at runtime according to the temperature requirements of each MPSoC design. We show experimental results of applying the proposed infrastructure to implement effective global temperature control policies for a real-life 4-core MPSoC running real-life video processing benchmarks, emulated on an FPGA-based thermal emulation framework. Owing to the better thermal balancing achieved by the proposed active NoC-based thermal control, MPSoC performance improves by almost 40% and energy consumption drops by 45% with respect to local DVFS thermal control approaches.
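To make the idea of a global (rather than per-core) thermal policy concrete, here is a minimal sketch under stated assumptions: a hypothetical thermal management unit receives one temperature reading per core over the NoC and picks a DVFS level for every core from a shared table. The frequency table, the 80/70 °C thresholds, the readings, and the name `choose_levels` are all illustrative assumptions, not the paper's actual design or policy.

```python
from typing import Dict, List

# Hypothetical DVFS operating points in MHz, fastest first (assumed values).
FREQ_LEVELS_MHZ: List[int] = [500, 400, 300, 200, 100]


def choose_levels(
    temps_c: Dict[int, float],
    current: Dict[int, int],
    t_high: float = 80.0,
    t_low: float = 70.0,
) -> Dict[int, int]:
    """Pick a new DVFS level index per core from chip-wide sensor readings.

    Because the unit sees every core's temperature at once, it can slow
    down only the hot cores and let cores that are cooler than both the
    threshold and the chip average speed back up.
    """
    mean_t = sum(temps_c.values()) / len(temps_c)
    new: Dict[int, int] = {}
    for core, t in temps_c.items():
        level = current[core]
        if t > t_high:
            # Too hot: move one step toward the slowest level.
            level = min(level + 1, len(FREQ_LEVELS_MHZ) - 1)
        elif t < t_low and t <= mean_t:
            # Cool relative to threshold and chip average: one step faster.
            level = max(level - 1, 0)
        new[core] = level
    return new


# Example: four cores, core 0 running hot (readings are made up).
temps = {0: 85.0, 1: 72.0, 2: 65.0, 3: 68.0}
levels = {0: 0, 1: 1, 2: 2, 3: 1}
new_levels = choose_levels(temps, levels)
freqs_mhz = {core: FREQ_LEVELS_MHZ[lvl] for core, lvl in new_levels.items()}
print(freqs_mhz)  # {0: 400, 1: 400, 2: 400, 3: 500}
```

Comparing each core against the chip-wide mean is what a purely local DVFS controller cannot do; it is one simple way a global view can rebalance heat, chosen here only for illustration.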
Cost Effective Routing Implementations for On-chip Networks
Today's multi-core architectures, such as chip multiprocessors (CMPs) and multiprocessor systems-on-chip (MPSoCs), rely on the efficiency of networks-on-chip (NoCs) for communication among their cores. An efficient NoC design must be scalable while keeping area, latency, and power consumption tightly bounded. General-purpose NoC designs usually adopt 2D mesh topologies because they match the chip layout. However, designers must now address new challenges. A higher probability of manufacturing defects, the need to optimize resource usage to increase application-level parallelism, or the need for effective power-saving techniques can all introduce irregular patterns into the topology. In addition, support for collective communication is a desirable feature for efficiently meeting the communication needs of cache coherence protocols. Under these conditions, efficient message routing becomes a challenge to overcome.
The goal of this thesis is to lay the foundations of a new logic-based distributed routing architecture that can adapt to any irregular topology derived from a 2D mesh, thus providing full coverage of every case arising from the challenges mentioned above. To that end, the thesis starts from a baseline, then analyzes the evolution of several mechanisms, and finally arrives at an implementation, made up of several modules, that achieves this goal. This final implementation is named eLBDR (effective Logic-Based Distributed Routing). The work covers everything from the first mechanism, LBDR, to the mechanisms that progressively emerged after it.
Rodrigo Mocholí, S. (2010). Cost Effective Routing Implementations for On-chip Networks [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8962
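As a rough illustration of how LBDR replaces routing tables with a few configuration bits per switch, the sketch below models the basic LBDR logic as it is commonly described: one connectivity bit per output port plus eight routing bits, where a bit such as `NE` says whether a packet leaving north may later turn east. It assumes x grows east and y grows north; the class and function names are hypothetical, and none of the eLBDR extensions from the thesis are modeled.

```python
from dataclasses import dataclass
from typing import Dict, Set


@dataclass
class LbdrSwitch:
    """Per-switch LBDR configuration (sketch).

    c: connectivity bits per output port: 'N', 'E', 'W', 'S'.
    r: routing bits 'NE', 'NW', 'EN', 'ES', 'WN', 'WS', 'SE', 'SW';
       r['NE'] = 1 means a packet leaving north may later turn east.
    """
    x: int
    y: int
    c: Dict[str, int]
    r: Dict[str, int]


def lbdr_route(sw: LbdrSwitch, xd: int, yd: int) -> Set[str]:
    """Return the output ports allowed toward destination (xd, yd)."""
    # Relative position of the destination (x grows east, y grows north).
    n, s = yd > sw.y, yd < sw.y
    e, w = xd > sw.x, xd < sw.x
    if not (n or s or e or w):
        return {"L"}  # destination reached: eject to the local port
    out: Set[str] = set()
    # A port is usable if it is connected and either leads straight to the
    # destination or its routing bit allows the turn still needed later.
    if sw.c["N"] and n and ((not e and not w) or (e and sw.r["NE"]) or (w and sw.r["NW"])):
        out.add("N")
    if sw.c["E"] and e and ((not n and not s) or (n and sw.r["EN"]) or (s and sw.r["ES"])):
        out.add("E")
    if sw.c["W"] and w and ((not n and not s) or (n and sw.r["WN"]) or (s and sw.r["WS"])):
        out.add("W")
    if sw.c["S"] and s and ((not e and not w) or (e and sw.r["SE"]) or (w and sw.r["SW"])):
        out.add("S")
    return out


# Bits that reproduce XY routing on a full mesh: X first, turns to Y allowed.
XY_BITS = {"NE": 0, "NW": 0, "SE": 0, "SW": 0, "EN": 1, "ES": 1, "WN": 1, "WS": 1}
sw = LbdrSwitch(x=1, y=1, c={"N": 1, "E": 1, "W": 1, "S": 1}, r=XY_BITS)
print(lbdr_route(sw, 3, 2))  # {'E'}: travel in X first
print(lbdr_route(sw, 1, 3))  # {'N'}: already aligned in X
```

Irregular topologies are then handled by clearing connectivity bits for missing links and recomputing the routing bits, which is the kind of coverage the thesis extends further with eLBDR.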
Design Space Exploration and Resource Management of Multi/Many-Core Systems
The increasing demand to process a growing number of applications and their data on computing platforms has led to reliance on multi-/many-core chips, since they facilitate parallel processing. However, these platforms are also expected to be energy-efficient and reliable, and to perform secure computations in the interest of the whole community. This book provides perspectives on these aspects from leading researchers, covering state-of-the-art contributions and upcoming trends.
Toward Reliable, Secure, and Energy-Efficient Multi-Core System Design
Computer hardware researchers have perennially focused on improving computer performance while keeping energy consumption within a strict budget. While many innovations over the years have led to high-performance and energy-efficient computers, new challenges have also emerged as a fallout. For example, the smaller transistors in modern multi-core systems are afflicted with several reliability and security concerns that were inconceivable even a decade ago. Tackling these bottlenecks tends to hurt the power and performance of computers. This dissertation explores novel techniques to gracefully address some of the pressing challenges of modern computer design. Specifically, the proposed techniques improve the reliability of the on-chip communication fabric under high power supply noise, increase the energy efficiency of low-power graphics processing units, and, through rigorous hardware-based experiments, demonstrate an unprecedented security loophole in the low-power computing paradigm.
Silicon Photonics for All-Optical Processing and High-Bandwidth-Density Interconnects
Silicon photonics has emerged in recent years as one of the leading technologies poised to bring optical communications deeper and more intimately into computing systems than ever before. The potential to integrate power-efficient WDM links at the first-level package, or even deeper, has been a strong driver of the rapid development this field has seen in recent years. Integrating photonic communication modules with very high bandwidth densities and virtually no bandwidth-distance limitations in the short-reach regime of high-performance computers and data centers could alleviate many of the bandwidth bottlenecks currently faced at the board, rack, and facility levels. While networks-on-chip for chip multiprocessors (CMPs) were initially deemed the target application of silicon photonic components, it has become evident in recent years that the lower-hanging fruit is the CMP's I/O links to memory and to other CMPs. The first chapter of the thesis motivates in more detail the integration of silicon photonic modules into compute systems and surveys recent developments in the field. The second chapter presents a technical case study of the scalability and power efficiency of silicon photonic microring-based WDM links for chip I/O applications that could be developed in the intermediate future. The analysis, originally initiated for a workshop on optical and electrical board- and rack-level interconnects, develops a detailed model of the optical power budget for such a link, capturing both single-channel aspects and WDM-operation considerations that are specific to the physical characteristics of microrings.
The holistic analysis of the full link captures the characteristics that depend on wavelength-channel spacing, provides methodologies for device design in the WDM-operation context, and gives performance predictions based on current best-in-class silicon photonic devices. The key results of the analysis are upper bounds on the aggregate achievable communication bandwidth per link, the identification of design trade-offs between bandwidth and power efficiency, and the need for continued improvements in both laser and photodetector technologies to allow such systems to operate with acceptable power efficiency. The third chapter, continuing the theme of silicon photonic high-bandwidth-density links, details the first experimental demonstration and characterization of an on-chip spatial division multiplexing (SDM) scheme that uses microrings for the multiplexing and demultiplexing functions. In the context of more forward-looking optical network-on-chip environments, SDM-enabled WDM photonic interconnects can potentially achieve higher bandwidth densities per waveguide than WDM-only photonic interconnects. The microring-based implementation allows dynamic tuning of the system's multiplexing and demultiplexing characteristics, which enables operation on a WDM grid as well as device tuning to combat intra-channel crosstalk. The characterization focuses on the first reported power penalty measurements for an on-chip silicon photonic SDM link, showing that minimal penalties are achievable with 3 spatial modes operating concurrently on a single waveguide, each carrying 10-Gb/s data. The chapter also details the first demonstration of WDM combined with SDM, with six separate wavelength-and-spatial 10-Gb/s channels operating error-free with low power penalties.
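The link power-budget analysis of chapter two, which yields upper bounds on aggregate bandwidth per link, can be illustrated with a deliberately simplified sketch. It assumes a fixed total optical launch power split evenly across channels, a flat list of insertion losses, a fixed receiver sensitivity, and a channel count capped by how many channels fit in one microring free spectral range (FSR). All numbers and function names are hypothetical, and the spacing-dependent crosstalk penalties that the thesis models are ignored here.

```python
import math
from typing import List


def per_channel_margin_db(
    total_power_dbm: float, n_channels: int, losses_db: List[float], sensitivity_dbm: float
) -> float:
    """Margin (dB) of one channel when the total launch power is split evenly."""
    per_channel_dbm = total_power_dbm - 10 * math.log10(n_channels)
    received_dbm = per_channel_dbm - sum(losses_db)
    return received_dbm - sensitivity_dbm


def max_aggregate_bandwidth_gbps(
    total_power_dbm: float,
    losses_db: List[float],
    sensitivity_dbm: float,
    fsr_nm: float,
    spacing_nm: float,
    rate_gbps: float,
) -> float:
    """Upper bound on link bandwidth: channels limited by both FSR and power budget."""
    n_max_spacing = int(fsr_nm // spacing_nm)  # channels that fit in one FSR
    n = 0
    for candidate in range(1, n_max_spacing + 1):
        # Margin shrinks as channels are added, so keep the largest that closes.
        if per_channel_margin_db(total_power_dbm, candidate, losses_db, sensitivity_dbm) >= 0:
            n = candidate
    return n * rate_gbps


# Hypothetical link: coupling, modulator, waveguide, demux, coupling losses (dB).
LOSSES_DB = [1.0, 3.0, 1.0, 2.0, 1.0]
bw = max_aggregate_bandwidth_gbps(
    total_power_dbm=10.0, losses_db=LOSSES_DB, sensitivity_dbm=-12.0,
    fsr_nm=32.0, spacing_nm=1.0, rate_gbps=10.0,
)
print(bw)  # 250.0: 25 channels, limited by power rather than by the FSR
```

With these made-up values the power budget, not the FSR, sets the bound, which mirrors the qualitative point that better lasers and photodetectors are needed to push aggregate bandwidth up.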
The fourth, fifth, and sixth chapters shift from applying silicon photonics to communication links to the evolving use of silicon waveguides for nonlinear all-optical processing. Tight mode confinement in sub-micron cross-sections, combined with the high nonlinear response of silicon, has motivated the development of silicon devices for four-wave mixing (FWM)-based processing. The key feature of the silicon platform for nonlinear processing is the ability to control the dispersive properties of optical structures finely and uniformly, so that the material dispersion can be completely offset and the dispersion profiles required for effective parametric interaction of waves can be achieved. Chapter four introduces and motivates nonlinear processing in communication applications and focuses on recent achievements in non-silicon and silicon FWM platforms. Chapter five describes the author's contributions on parametric processing of high-speed data in silicon nonlinear devices, including first-of-a-kind demonstrations of wavelength conversion of 160-Gb/s optically time-division-multiplexed (OTDM) data and wavelength multicasting of a 320-Gb/s OTDM stream. The chapter then details a methodical characterization and demonstration of several record wavelength conversion experiments in silicon: 40-Gb/s data wavelength-converted across more than 100 nm with a power penalty of only 1.4 dB, and wavelength and format conversion of 10-Gb/s data across up to 168 nm, achieved with an improved experimental setup, with a sensitivity gain of about 2 dB from the format conversion and a residual conversion penalty of only 0.1 dB. Both experiments highlight the uniform performance of the conversion process over a wide range of probe-idler detuning settings, showcasing the silicon platform's unique broadband phase-matching properties.
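Where the idler lands in FWM wavelength conversion follows from photon energy conservation. For degenerate FWM with one pump, $2\omega_p = \omega_s + \omega_i$, and since $\omega \propto 1/\lambda$ this gives $1/\lambda_i = 2/\lambda_p - 1/\lambda_s$. The short sketch below applies that standard relation; the 1550 nm pump and 1500 nm probe are hypothetical settings, not the ones used in the thesis experiments.

```python
def fwm_idler_nm(pump_nm: float, probe_nm: float) -> float:
    """Idler wavelength for degenerate FWM (one pump, one probe).

    Energy conservation gives 2*w_pump = w_probe + w_idler, and since
    w is proportional to 1/lambda: 1/l_idler = 2/l_pump - 1/l_probe.
    """
    inv = 2.0 / pump_nm - 1.0 / probe_nm
    if inv <= 0:
        raise ValueError("no physical idler for this pump/probe pair")
    return 1.0 / inv


# Hypothetical settings: pump at 1550 nm, probe at 1500 nm.
idler = fwm_idler_nm(1550.0, 1500.0)
print(f"idler = {idler:.1f} nm, probe-idler detuning = {idler - 1500.0:.1f} nm")
```

A 50 nm pump-probe offset already yields a probe-idler detuning of roughly 103 nm, which is why broadband phase matching (set by the engineered dispersion) is what limits how far data can be converted.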
The sixth chapter shifts the motivation for parametric processing slightly, from traditional telecom-wavelength applications to functionalities targeting mid-IR operation. Parametric processing in silicon at long wavelengths holds large potential for performance improvements, both because two-photon absorption in silicon is eliminated at long wavelengths and because silicon's dispersion engineering capabilities uniquely position the platform for effective phase matching of widely detuned waves. FWM-based signal generation and reception at mid-IR wavelengths are attractive candidates for tunable, flexible operation with modulation and detection speeds that are currently available only at telecom wavelengths. With this vision in mind, the chapter presents several contributions that extend FWM functionalities in silicon to wavelengths close to 2 μm, with performance equivalent to measurements at much smaller detuning settings. These include the first silicon optical processing functionalities demonstrated at such long wavelengths: wavelength conversion and unicast of 10-Gb/s signals with up to 700 nm of probe-idler detuning; a combined two-stage 10-Gb/s FWM link in which both data generation and detection at 1900 nm are enabled by parametric processing in silicon with only 2.1 dB of overall penalty; the first 40-Gb/s receiver at 1900 nm based on an FWM stage for simultaneous temporal demultiplexing and wavelength conversion; and, lastly, a 40-Gb/s FWM link operating with only 3.6 dB of penalty. The chapter concludes with a short discussion of possible extensions that would enable silicon parametric processing at even longer wavelengths, targeting the 3-5 μm mid-IR spectral transmission window.
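The reason two-photon absorption (TPA) disappears at long wavelengths can be checked with a one-line calculation: two photons of energy $hc/\lambda$ can only bridge the silicon band gap $E_g$ when $2hc/\lambda \ge E_g$, so TPA vanishes for $\lambda > 2hc/E_g$. The sketch below assumes the commonly quoted room-temperature band gap of about 1.12 eV; it is a back-of-the-envelope estimate, not a model of the TPA coefficient itself.

```python
HC_EV_UM = 1.23984    # Planck constant times speed of light, in eV * um
SI_BANDGAP_EV = 1.12  # approximate silicon band gap at room temperature (assumed)


def tpa_edge_um(bandgap_ev: float) -> float:
    """Longest wavelength at which two photons together can bridge the gap.

    Two photons of energy E = hc/lambda bridge the gap when 2E >= Eg,
    i.e. lambda <= 2 * hc / Eg.
    """
    return 2 * HC_EV_UM / bandgap_ev


edge = tpa_edge_um(SI_BANDGAP_EV)
print(f"TPA edge ~ {edge:.2f} um")  # TPA edge ~ 2.21 um
for wl_um in (1.55, 1.9, 3.0, 5.0):
    print(wl_um, "TPA possible" if wl_um < edge else "TPA-free")
```

On this estimate the 1900 nm experiments still sit just below the edge, while the 3-5 μm window targeted at the end of the chapter lies well beyond it.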
Towards Compelling Cases for the Viability of Silicon-Nanophotonic Technology in Future Many-core Systems
Many cross-benchmarking results reported in the open literature raise optimistic expectations for the use of optical networks-on-chip (ONoCs) for high-performance, low-power on-chip communication in future many-core systems. However, these works ultimately fail to make a compelling case for the viability of silicon-nanophotonic technology, for two fundamental reasons:
(1) Lack of aggressive electrical NoC (ENoC) baselines.
(2) Inaccuracy in physical- and architecture-layer analysis of the ONoC.
This thesis aims to provide the guidelines and minimum requirements for emerging nanophotonic technology to become practically relevant. The key enabler for this study is a cross-layer design methodology for the optical transport medium, ranging from the predictability gap between ONoC logic schemes and their physical implementations up to architecture-level design issues such as the network interface and its co-design requirements with the memory hierarchy. To increase the practical relevance of the study, we consider a consolidated electrical NoC counterpart with an architecture optimized for performance and power. Its quality metrics are derived from synthesis and place-and-route on an industrial 40 nm low-power technology library. Building on this methodology, we provide a realistic energy efficiency comparison between ONoC and ENoC, both at the level of the system interconnect and of the system as a whole, pointing out the sensitivity of the results to the maturity of the underlying silicon nanophotonic technology and paving the way toward compelling cases for the viability of such technology in next-generation many-core systems.
Power Bounded Computing on Current & Emerging HPC Systems
Power has become a critical constraint on the evolution of large-scale High Performance Computing (HPC) systems and commercial data centers. This constraint spans almost every level of computing technology, from IC chips all the way up to data centers, for physical, technical, and economic reasons. To cope with this reality, it is necessary to understand how available or permissible power affects the design and performance of emerging computer systems. For this reason, we propose power bounded computing and corresponding technologies to optimize performance on HPC systems with limited power budgets.
This dissertation has multiple research objectives, centered on understanding the interaction between performance, power bounds, and a hierarchical power management strategy. First, we develop heuristics and application-aware power allocation methods to improve application performance on a single node. Second, we develop algorithms that coordinate power across the nodes and components of a cluster based on application characteristics and the power budget. Third, we investigate performance interference induced by hardware and power contention, and propose contention-aware job scheduling that maximizes system throughput under given power budgets on node-sharing systems. Fourth, we extend the work to GPU-accelerated systems and workloads and develop an online, dynamic performance and power approach that meets performance requirements while maintaining power efficiency.
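To illustrate what coordinating power across nodes under a cluster budget can look like, here is a generic water-filling style heuristic, offered only as a sketch and not as the dissertation's algorithm. Each node is assumed to have a minimum and maximum power cap and a weight (for example, how much its application benefits from extra power); every node first receives its minimum, the remainder is shared in proportion to weight, and nodes that hit their cap return their surplus to the others. The node names, caps, and weights are hypothetical.

```python
from typing import Dict, Tuple


def allocate_power(
    budget_w: float, nodes: Dict[str, Tuple[float, float, float]]
) -> Dict[str, float]:
    """Split a cluster power budget across nodes (water-filling sketch).

    nodes maps name -> (min_w, max_w, weight).
    """
    alloc = {name: lo for name, (lo, _hi, _w) in nodes.items()}
    remaining = budget_w - sum(alloc.values())
    if remaining < 0:
        raise ValueError("budget below the sum of node minimums")
    active = {name for name, (lo, hi, w) in nodes.items() if hi > lo and w > 0}
    while remaining > 1e-9 and active:
        total_w = sum(nodes[name][2] for name in active)
        shares = {name: remaining * nodes[name][2] / total_w for name in active}
        # Nodes whose proportional share exceeds their headroom get capped.
        capped = {name for name in active if shares[name] >= nodes[name][1] - alloc[name]}
        if not capped:
            for name in active:
                alloc[name] += shares[name]
            remaining = 0.0
            break
        for name in capped:
            remaining -= nodes[name][1] - alloc[name]
            alloc[name] = nodes[name][1]
        active -= capped  # redistribute the surplus among uncapped nodes
    return alloc


# Hypothetical 1000 W budget over three nodes: (min W, max W, weight).
alloc = allocate_power(1000.0, {"a": (100.0, 300.0, 1.0),
                                "b": (100.0, 500.0, 2.0),
                                "c": (100.0, 250.0, 1.0)})
print({name: round(p, 1) for name, p in alloc.items()})
# {'a': 283.3, 'b': 466.7, 'c': 250.0}
```

Node `c` saturates at its cap in the first round, and its unused share is split between `a` and `b` by weight, so the whole budget is used without exceeding any node's limit.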
Power bounded computing improves performance scalability and power efficiency and lowers the operating costs of HPC systems and data centers. This dissertation opens several new avenues of research in power bounded computing for addressing the power challenges of HPC systems. The proposed power and resource management techniques provide new directions and guidelines for green exascale computing and other computing systems.