3 research outputs found

    Low-Power High-Performance Ternary Content Addressable Memory Circuits

    Ternary content addressable memories (TCAMs) are hardware-based parallel lookup tables with bit-level masking capability. They are attractive for applications such as packet forwarding and classification in network routers. Despite the attractive features of TCAMs, high power consumption is one of the most critical challenges faced by TCAM designers. This work proposes circuit techniques for reducing TCAM power consumption. The main contribution of this work is divided into two parts: (i) reduction in match line (ML) sensing energy, and (ii) static-power reduction techniques. The ML sensing energy is reduced by employing (i) positive-feedback ML sense amplifiers (MLSAs), (ii) low-capacitance comparison logic, and (iii) low-power ML-segmentation techniques. The positive-feedback MLSAs include both resistive and active feedback to reduce the ML sensing energy. A body-bias technique can further improve the feedback action at the expense of additional area and ML capacitance. The measurement results of the active-feedback MLSA show a 50-56% reduction in ML sensing energy. The measurement results of the proposed low-capacitance comparison logic show 25% and 42% reductions in ML sensing energy and time, respectively, which can be further improved by careful layout. The low-power ML-segmentation techniques include dual ML TCAM and charge-shared ML. Simulation results of the dual ML TCAM, which connects the two sides of the comparison logic to two ML segments for sequential sensing, show 43% power savings for a small (4%) trade-off in search speed. The charge-shared ML scheme achieves power savings by partially recycling the charge stored in the first ML segment. Chip measurement results show that the charge-shared ML scheme results in 11% and 9% reductions in ML sensing time and energy, respectively, which can be improved to 19-25% by using a digitally controlled charge-sharing time window and a slightly modified MLSA.
The static power reduction is achieved by a dual-VDD technique and low-leakage TCAM cells. The dual-VDD technique trades off the excess noise margin of the MLSA for smaller cell leakage by applying a smaller VDD to the TCAM cells and a larger VDD to the peripheral circuits. The low-leakage TCAM cells trade off the speed of READ and WRITE operations for smaller cell area and leakage. Finally, the design and testing of a complete TCAM chip are presented and compared with other published designs.
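
To make the "lookup table with bit-level masking" idea in the abstract above concrete, the following is a minimal software sketch of a ternary match. It is not taken from the thesis: the names `tcam_search` and `prefix_entry`, the 32-bit key width, and the example routing prefixes are assumptions chosen for illustration. A real TCAM compares the key against all entries in parallel on the match lines; the loop here only models the result, returning the first (highest-priority) matching entry.

```python
from typing import List, Optional, Tuple

# Hypothetical entry layout: (value, care_mask).
# A 1 bit in care_mask must match the key; a 0 bit is a "don't care" (X).
Entry = Tuple[int, int]


def tcam_search(entries: List[Entry], key: int) -> Optional[int]:
    """Return the index of the first matching entry, or None if none match.

    Hardware evaluates every entry at once; this sequential loop is only
    a functional model of that parallel comparison plus priority encoding.
    """
    for index, (value, care_mask) in enumerate(entries):
        # XOR marks differing bits; the mask discards don't-care positions.
        if ((key ^ value) & care_mask) == 0:
            return index
    return None


def prefix_entry(prefix: int, prefix_len: int, width: int = 32) -> Entry:
    """Build an entry that matches the top prefix_len bits of a width-bit key."""
    care_mask = ((1 << prefix_len) - 1) << (width - prefix_len)
    return (prefix & care_mask, care_mask)


# Example table, ordered longest prefix first (illustrative values only).
table = [
    prefix_entry(0xC0A80100, 24),  # 192.168.1.0/24
    prefix_entry(0xC0A80000, 16),  # 192.168.0.0/16
    prefix_entry(0x00000000, 0),   # default route, all bits don't-care
]
```

With this table, a key such as `0xC0A80105` (192.168.1.5) matches the first entry, while a key outside both prefixes falls through to the all-don't-care default entry.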

    Evaluation of an Associative Memory and FPGA-based System for the Track Trigger of the CMS-Detector

    In 2025, the luminosity of the particle beam at the Large Hadron Collider (LHC), the world's largest and highest-energy particle accelerator, will be increased further. As a result, even more particles will collide simultaneously at the center of the Compact Muon Solenoid (CMS) experiment. To deliver usable data under these new conditions, a track trigger is being developed for CMS for the first time. It processes the data from the outer tracker and delivers the parameters of the particle tracks to the first trigger level of CMS. Because the technical requirements for such a track trigger system are enormous, a track trigger has never before been used at the first trigger level of a particle physics experiment. The data rate at the input of the CMS track trigger will be almost 100 Tbit/s, and the processing time must not exceed 4 μs. Meeting these extraordinary requirements calls for a unique, heterogeneous embedded system. This dissertation presents a newly designed system-level simulation environment for the CMS track trigger. The simulation environment makes it possible to evaluate the CMS track trigger electronics as a whole: from the modules with the silicon detectors to the components that execute the track-finding algorithms. The simulation provides the system designer with three functions. First, system properties such as latency, bandwidth, and required buffer sizes can be estimated. Second, different system architectures can be compared with one another. Third, the simulation environment serves as a test environment for algorithms and code to be implemented in field-programmable gate arrays (FPGAs). To obtain realistic results, data from a simulation of the CMS experiment are used as input to the simulation environment.
One of the concepts investigated for the CMS track trigger consists of up to 48 large crates holding hundreds of boards. FPGAs and associative memories developed specifically for finding particle tracks are used to process the data. Prototypes of a board with FPGAs and associative memory chips were produced and tested at the Karlsruher Institut für Technologie. In addition, an essential part of the CMS track trigger was simulated using the new simulation environment. This implementation demonstrated that it is possible to simulate such a large system in the simulation environment. Within the simulation, many elements of the CMS track trigger are instantiated many times. These elements are often arranged in regular structures such as two- or three-dimensional grids. A SystemC library was developed to simplify the modeling and configuration of such structures. Furthermore, an independent cost estimate of the CMS track trigger was carried out. It shows that the budgeted 11.9 million euros are sufficient to build the associative-memory-based CMS track trigger. If the values are extrapolated to the year 2022 on the basis of technological progress, considerably lower costs can even be expected.

    Energy efficient core designs for upcoming process technologies

    Energy efficiency has been a first-order constraint in the design of microprocessors for the last decade. As Moore's law sunsets, new technologies are being actively explored to extend the march in increasing computational power and efficiency. It is essential for computer architects to understand the opportunities and challenges in utilizing the upcoming process technology trends in order to design the most efficient processors. In this work, we consider three process technology trends and propose core designs that are best suited for each of the technologies. The process technologies are expected to become viable over different timelines. We first consider the most popular method currently available to improve energy efficiency, i.e., lowering the operating voltage. We make key observations regarding the limiting factors in scaling down the operating voltage for general-purpose high-performance processors. We then propose our novel core design, ScalCore, which can work in high-performance mode at nominal Vdd and in a very energy-efficient mode at low Vdd. The resulting core design can operate at much lower voltages, providing higher parallel performance while consuming less energy. While lowering Vdd improves energy efficiency, CMOS devices are fundamentally limited in their low-voltage operation. Therefore, we next consider an upcoming device technology -- Tunneling Field-Effect Transistors (TFETs), which are expected to supplement CMOS device technology in the near future. TFETs can attain much higher energy efficiency than CMOS at low voltages. However, their performance saturates at high voltages and, therefore, they cannot entirely replace CMOS when high performance is needed. Ideally, we desire a core that is as energy-efficient as TFET and provides as much performance as CMOS. To reach this goal, we characterize the TFET device behavior for core design and judiciously integrate TFET units and CMOS units in a single core.
The resulting core, called HetCore, can provide very high energy efficiency while limiting the slowdown when compared to a CMOS core. Finally, we analyze Monolithic 3D (M3D) integration technology, which is widely considered to be the only way to integrate more transistors on a chip. We present the first analysis of the architectural implications of using M3D for core design and show how to partition the core across different layers. We also address one of the key challenges in realizing the technology, namely, the top-layer performance degradation. We propose critical-path-based partitioning for logic stages and asymmetric bit/port partitioning for storage stages. The result is a core that performs nearly as well as a core without any top-layer slowdown. When compared to a 2D baseline design, an M3D core not only provides much higher performance but also reduces energy consumption. In summary, this thesis addresses one of the fundamental challenges in computer architecture -- overcoming the fact that CMOS is not scaling anymore. As we increase the computing power on a single chip, our ability to power the entire chip keeps decreasing. This thesis proposes three solutions aimed at solving this problem over different timelines. Across all our solutions, we improve energy efficiency without compromising the performance of the core. As a result, we are able to operate twice as many cores within the same power budget as regular cores, significantly alleviating the problem of dark silicon.