
3 research outputs found

Dynamic Thermal Management Of Vertically Stacked Heterogeneous Processors

Author: Sharma Ajay
Publication venue: eGrove
Publication date: 01/01/2016

eGrove (Univ. of Mississippi)

Process Variation in Silicon Photonic Devices

Author: Chen Xi
Publication venue: University of Colorado Boulder
Publication date: 01/01/2013

The high index contrast of the silicon/silicon dioxide material system allows for dense integration of optical waveguide devices. Possible applications include intra-chip, inter-chip and fiber optic interconnection systems. Optical intra-chip interconnections become more desirable as complementary metal-oxide-semiconductor (CMOS) circuit density puts ever tighter constraints on on-chip interconnection performance. Board-level, rack-level and rack-to-rack data center interconnections are increasingly constrained by space and bandwidth, for which silicon photonic modules may offer an improvement. As fiber optic systems serve smaller and smaller area systems, integrated switching systems enabled by silicon photonic devices involving wavelength division multiplexing (WDM) become more desirable. In this thesis, we first briefly review the development history of information technology, optical communication and silicon photonics. Second, we examine the optical performance of an array of photonic devices which are the basic building blocks for silicon photonic circuits. Third, we turn our attention to fabrication-related issues. Silicon photonic circuits are prone to thermal and fabrication-induced process variations. We find that the process variation exhibits a “random walk” pattern with spatial extent at wafer scale. Fourth, we propose a simple method to extract fundamental parameters from fabricated silicon photonic devices. Based on the systemic wafer-scale measurement results, our method combines the advantages of both numerical simulation and simple analytical modeling techniques. Lastly, we propose a variation-aware on-chip interconnect design for multi-core processors. This design adapts to on-chip thermal and process variation effects, pointing to improvements in wafer-scale fabrication yield and interconnect network communication throughput.

CU Scholar Institutional Repository

Three-Dimensional Processing-In-Memory-Architectures: A Holistic Tool For Modeling And Simulation

Author: Siegl Patrick Daniel Marcus
Publication date: 01/01/2018

The steadily widening performance gap between processor and memory architectures, commonly known as the Memory Wall, requires novel concepts to achieve further scaling in processing performance. As memories were identified as the limitation within a von Neumann architecture, this work addresses that constraint. Although three-dimensional memories alleviate the effects of the Memory Wall, using such memories alone would be insufficient. Because of its higher efficiency, the integration of processing capacity into memories (so-called Processing-In-Memory, PIM) is a promising alternative. However, PIM simulation models are still lacking. Consequently, a flexible simulation tool for three-dimensional stacked memories was created and then extended to model three-dimensional PIM architectures. This tool can simulate stacked memories such as Hybrid Memory Cube in a standard-compliant manner and simultaneously offers high accuracy by modeling on elementary data packets (FLITs) in combination with the hardware-validated BOBSim simulator. In addition, a specifically designed simulation clock tree enables rapid simulation execution. A 100x speedup in simulation execution is measured in functional mode, whereas a 2x speedup is achieved in clock-cycle-accurate mode.
With the aid of a specifically implemented, binary-compatible GPU accelerator and the established tool, the modeling of a holistic three-dimensional PIM architecture is demonstrated within this work. The maximum hardware resources were constrained by a PIM accelerator from the literature. A representative, memory-bound geophysical imaging algorithm was used to evaluate the GPU model on its own as well as the compound PIM simulation model. The standalone GPU simulation model shows significantly improved simulation performance with a deviation of 6% compared to a Verilator model. Subsequently, various PIM accelerator configurations with the integrated GPU model were evaluated. Depending on the chosen PIM configuration, the algorithm achieves up to 140 GFLOPS of actual processing performance or a maximum computing efficiency of 30% (synthetic) or 24.5% (real). The latter is a 2x improvement over the state of the art. A subsequent discussion examines the results in depth.

Digitale Bibliothek Braunschweig