338 research outputs found

    Exploring the design space for 3D clustered architectures

    Get PDF
    Journal Article3D die-stacked chips are emerging as intriguing prospects for the future because of their ability to reduce on-chip wire delays and power consumption. However, they will likely cause an increase in chip operating temperature, which is already a major bottleneck in modern microprocessor design. We believe that 3D will provide the highest performance benefit for high-ILP cores, where wire delays for 2D designs can be substantial. A clustered microarchitecture is an example of a complexity-effective implementation of a high-ILP core. In this paper, we consider 3D organizations of a single-threaded clustered microarchitecture to understand how floorplanning impacts performance and temperature. We first show that delays between the data cache and ALUs are most critical to performance. We then present a novel 3D layout that provides the best balance between temperature and performance. The best-performing 3D layout has 12% higher performance than the best-performing 2D layout

    Understanding the impact of 3D stacked layouts on ILP

    Get PDF
    Journal Article3D die-stacked chips can alleviate the penalties imposed by long wires within micro-processor circuits. Many recent studies have attempted to partition each microprocessor structure across three dimensions to reduce their access times. In this paper, we implement each microprocessor structure on a single 2D die and leverage 3D to reduce the lengths of wires that communicate data between microprocessor structures within a single core. We begin with a criticality analysis of inter-structure wire delays and show that for most tra- ditional simple superscalar cores, 2D floorplans are already very efficient at minimizing critical wire delays. For an aggressive wire-constrained clustered superscalar architecture, an exploration of the design space reveals that 3D can yield higher benefit. However, this benefit may be negated by the higher power density and temperature entailed by 3D integration. Overall, we report a negative result and argue against leveraging 3D for higher ILP

    3D Execution Monitor (3D-EM): Using 3D Circuits to Detect Hardware Malicious Inclusions in General Purpose Processors

    Get PDF
    Best PhD Paper, Proceedings of the International Conference on Information Warfare and Security (ICIW), Washington, DC, USA, March 2011, Pages 289-298. [Paper] [Slides] [Abstract] [Conference

    Data processing and information classification— an in-memory approach

    Get PDF
    9noTo live in the information society means to be surrounded by billions of electronic devices full of sensors that constantly acquire data. This enormous amount of data must be processed and classified. A solution commonly adopted is to send these data to server farms to be remotely elaborated. The drawback is a huge battery drain due to high amount of information that must be exchanged. To compensate this problem data must be processed locally, near the sensor itself. But this solution requires huge computational capabilities. While microprocessors, even mobile ones, nowadays have enough computational power, their performance are severely limited by the Memory Wall problem. Memories are too slow, so microprocessors cannot fetch enough data from them, greatly limiting their performance. A solution is the Processing-In-Memory (PIM) approach. New memories are designed that can elaborate data inside them eliminating the Memory Wall problem. In this work we present an example of such a system, using as a case of study the Bitmap Indexing algorithm. Such algorithm is used to classify data coming from many sources in parallel. We propose a hardware accelerator designed around the Processing-In-Memory approach, that is capable of implementing this algorithm and that can also be reconfigured to do other tasks or to work as standard memory. The architecture has been synthesized using CMOS technology. The results that we have obtained highlights that, not only it is possible to process and classify huge amount of data locally, but also that it is possible to obtain this result with a very low power consumption.openopenAndrighetti, M. .; Turvani, G.; Santoro, G.; Vacca, M.; Marchesin, A.; Ottati, F.; Roch, M.R.; Graziano, M.; Zamboni, M.Andrighetti, M.; Turvani, G.; Santoro, G.; Vacca, M.; Marchesin, A.; Ottati, F.; Roch, M. R.; Graziano, M.; Zamboni, M

    Asynchronous 3D (Async3D): Design Methodology and Analysis of 3D Asynchronous Circuits

    Get PDF
    This dissertation focuses on the application of 3D integrated circuit (IC) technology on asynchronous logic paradigms, mainly NULL Convention Logic (NCL) and Multi-Threshold NCL (MTNCL). It presents the Async3D tool flow and library for NCL and MTNCL 3D ICs. It also analyzes NCL and MTNCL circuits in 3D IC. Several FIR filter designs were implement in NCL, MTNCL, and synchronous architecture to compare synchronous and asynchronous circuits in 2D and 3D ICs. The designs were normalized based on performance and several metrics were measured for comparison. Area, interconnect length, power consumption, and power density were compared among NCL, MTNCL, and synchronous designs. The NCL and MTNCL designs showed improvements in all metrics when moving from 2D to 3D. The 3D NCL and MTNCL designs also showed a balanced power distribution in post-layout analysis. This could alleviate the hotspot problem prevalently found in most 3D ICs. NCL and MTNCL have the potential to synergize well with 3D IC technology

    Design for pre-bond testability in 3D integrated circuits

    Get PDF
    In this dissertation we propose several DFT techniques specific to 3D stacked IC systems. The goal has explicitly been to create techniques that integrate easily with existing IC test systems. Specifically, this means utilizing scan- and wrapper-based techniques, two foundations of the digital IC test industry. First, we describe a general test architecture for 3D ICs. In this architecture, each tier of a 3D design is wrapped in test control logic that both manages tier test pre-bond and integrates the tier into the large test architecture post-bond. We describe a new kind of boundary scan to provide the necessary test control and observation of the partial circuits, and we propose a new design methodology for test hardcore that ensures both pre-bond functionality and post-bond optimality. We present the application of these techniques to the 3D-MAPS test vehicle, which has proven their effectiveness. Second, we extend these DFT techniques to circuit-partitioned designs. We find that boundary scan design is generally sufficient, but that some 3D designs require special DFT treatment. Most importantly, we demonstrate that the functional partitioning inherent in 3D design can potentially decrease the total test cost of verifying a circuit. Third, we present a new CAD algorithm for designing 3D test wrappers. This algorithm co-designs the pre-bond and post-bond wrappers to simultaneously minimize test time and routing cost. On average, our algorithm utilizes over 90% of the wires in both the pre-bond and post-bond wrappers. Finally, we look at the 3D vias themselves to develop a low-cost, high-volume pre-bond test methodology appropriate for production-level test. We describe the shorting probes methodology, wherein large test probes are used to contact multiple small 3D vias. This technique is an all-digital test method that integrates seamlessly into existing test flows. Our experimental results demonstrate two key facts: neither the large capacitance of the probe tips nor the process variation in the 3D vias and the probe tips significantly hinders the testability of the circuits. Taken together, this body of work defines a complete test methodology for testing 3D ICs pre-bond, eliminating one of the key hurdles to the commercialization of 3D technology.PhDCommittee Chair: Lee, Hsien-Hsin; Committee Member: Bakir, Muhannad; Committee Member: Lim, Sung Kyu; Committee Member: Vuduc, Richard; Committee Member: Yalamanchili, Sudhaka

    Implementing a hybrid SRAM / eDRAM NUCA architecture

    Get PDF
    In this paper, we propose a hybrid cache architecture that exploits the main features of both memory technologies, speed of SRAM and high density of eDRAM. We demonstrate, that due to the high locality found in emerging applications, a high percentage of data that enters to the on-chip last-level cache are not accessed again before they are replacedPreprin
    • …
    corecore