73 research outputs found
Technology Mapping for Circuit Optimization Using Content-Addressable Memory
The growing complexity of Field Programmable Gate Arrays (FPGAs) is leading to architectures with high input cardinality look-up tables (LUTs). This thesis describes a methodology for area-minimizing technology mapping for combinational logic, specifically designed for such FPGA architectures. This methodology, called LURU, leverages the parallel search capabilities of Content-Addressable Memories (CAMs) to outperform traditional mapping algorithms in both execution time and quality of results. LURU differs fundamentally from other technology mapping techniques in that it uses textual string representations of circuit topology to efficiently store and search for circuit patterns in a CAM. A circuit is mapped to the target LUT technology using both exact and inexact string matching techniques. Common subcircuit expressions (CSEs) are also identified and used for architectural optimization; a small set of CSEs is shown to effectively cover an average of 96% of the test circuits. LURU was tested with the ISCAS'85 suite of combinational benchmark circuits and compared with the mapping algorithms FlowMap and CutMap. LURU's area results are, on average, 20% better than those of FlowMap and CutMap. The asymptotic runtime complexity of LURU is shown to be better than that of both FlowMap and CutMap.
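The idea of storing circuit patterns as text strings and finding them by exact match can be sketched as follows. This is only an illustration and not the LURU encoding: the `Gate` type, the prefix-notation string format, the pattern names, and the use of a Python dictionary as a stand-in for a CAM lookup are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union


@dataclass(frozen=True)
class Gate:
    """A gate in a combinational logic cone (hypothetical type for this sketch)."""
    op: str
    inputs: Tuple["Node", ...]


# A node is either a gate or the name of a primary input.
Node = Union[Gate, str]


def serialize(node: Node) -> str:
    """Return a canonical prefix-notation string for the cone rooted at node.

    Primary inputs are anonymized as 'x' and the children of each gate are
    sorted, so two cones with the same topology yield the same string.
    """
    if isinstance(node, str):
        return "x"
    parts = sorted(serialize(child) for child in node.inputs)
    return node.op + "(" + ",".join(parts) + ")"


# Pattern library keyed by topology string. A dictionary lookup stands in for
# the exact-match search that a CAM would perform in parallel.
PATTERNS = {
    serialize(Gate("AND", (Gate("NOT", ("p",)), Gate("OR", ("q", "r"))))): "pattern_A",
    serialize(Gate("OR", ("p", "q"))): "pattern_B",
}


def find_pattern(node: Node) -> Optional[str]:
    """Exact string match of a cone against the pattern library."""
    return PATTERNS.get(serialize(node))


circuit = Gate("AND", (Gate("OR", ("a", "b")), Gate("NOT", ("c",))))
print(serialize(circuit))      # AND(NOT(x),OR(x,x))
print(find_pattern(circuit))   # pattern_A
```

Sorting the child strings makes the match independent of input ordering. Inexact matching, which the abstract also mentions, is not covered by this sketch.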
Energy-Efficient Digital Circuit Design using Threshold Logic Gates
Improving energy efficiency has always been a prime objective of custom and automated digital circuit design techniques. As a result, a multitude of methods to reduce power without sacrificing performance have been proposed. However, as the field of design automation has matured over the last few decades, no new automated design techniques have emerged that provide considerable improvements in circuit power, leakage, and area. Although emerging nano-devices are expected to replace existing MOSFET devices, they are far from being as mature as semiconductor devices, and their full potential is many years away from being practical.
The research described in this dissertation consists of four main parts. First, a new circuit architecture of a differential threshold logic flipflop, called PNAND, is described. The PNAND gate is an edge-triggered multi-input sequential cell whose next-state function is a threshold function of its inputs. Second, a new approach called hybridization, which replaces flipflops and parts of their logic cones with PNAND cells, is described. The resulting hybrid circuit, which consists of conventional logic cells and PNANDs, is shown to have significantly lower power consumption, smaller area, lower standby power, and less power variation.
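A threshold function outputs 1 when the weighted sum of its binary inputs reaches a threshold. The sketch below is a purely behavioral model with assumed names (`threshold_function`, `ThresholdFlipFlop`) and illustrative weights; it says nothing about the differential circuit implementation of the PNAND cell.

```python
from typing import Sequence


def threshold_function(inputs: Sequence[int], weights: Sequence[int], threshold: int) -> int:
    """Return 1 if the weighted sum of the binary inputs is >= threshold, else 0."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    total = sum(w * x for w, x in zip(weights, inputs))
    return int(total >= threshold)


class ThresholdFlipFlop:
    """Behavioral model of an edge-triggered cell whose next state is a
    threshold function of its inputs (hypothetical class for illustration)."""

    def __init__(self, weights: Sequence[int], threshold: int) -> None:
        self.weights = tuple(weights)
        self.threshold = threshold
        self.q = 0  # stored state

    def clock_edge(self, inputs: Sequence[int]) -> int:
        """Sample the inputs on a clock edge and update the stored state."""
        self.q = threshold_function(inputs, self.weights, self.threshold)
        return self.q


# With unit weights, the threshold alone selects the function:
# threshold 3 gives a 3-input AND, 1 gives OR, and 2 gives majority.
majority = ThresholdFlipFlop(weights=(1, 1, 1), threshold=2)
print(majority.clock_edge((1, 0, 1)))  # 1
print(majority.clock_edge((0, 0, 1)))  # 0
```

Because one cell computes the threshold function and stores the result, it can absorb part of the logic cone that would otherwise feed a conventional flipflop, which is the intuition behind the hybridization step described above.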
Third, a new architecture of a field programmable array, called the field programmable threshold logic array (FPTLA), is described, in which the standard lookup table (LUT) is replaced by a PNAND. Using the well-known FPGA modeling tool VPR, the FPTLA is shown to have as much as 50% lower energy-delay product than a conventional FPGA.
Fourth, a novel clock skewing technique that makes use of the completion detection feature of differential mode flipflops is described. This clock skewing method improves the area and power of ASIC circuits by increasing slack on timing paths. An additional advantage of this method is the elimination of hold-time violations on given short paths.
Several circuit design methodologies, such as retiming and asynchronous circuit design, can use the proposed threshold logic gate effectively. Therefore, the use of threshold logic flipflops in conventional design methodologies opens new avenues of research towards more energy-efficient circuits.
Dissertation/Thesis: Doctoral Dissertation, Computer Science 201
A system for microarchitecture and logic optimization
This thesis spans two levels of the design process by examining optimization at both the register-transfer level and the logic level. More specifically, this thesis addresses the following two problems: 1) performing logic synthesis for custom layout rather than the traditional approach that focuses on synthesis for standard cells, and 2) performing optimization for custom layout from register-transfer level netlists. Thus optimization is performed on the microarchitecture design and at a lower level for individual microarchitecture components. First, techniques are introduced for generating gate-level netlists that take advantage of custom layout capabilities. Such techniques include limiting serial/parallel transistor chains, transistor sizes, and capacitive loads in forming complex gates. These considerations have not been incorporated in previous logic synthesis systems. Second, techniques are introduced for improving the microarchitecture structure and using estimates from lower-level optimization tools to guide microarchitecture design optimizations that attempt to meet user-specified area and time constraints. These techniques include the capability for mixing layout styles, such as custom layout for random-logic components and bit-slicing for regularly structured components. In this manner the entire design, control logic and datapath, can be optimized at the same time. Further, this paper presents a new methodology for microarchitecture-level optimization that greatly reduces the amount of technology-specific knowledge necessary to perform the optimizations.
Bandgap Voltage Reference and Serial Communication Module for a Low-Power 10-Bit SAR ADC for Biomedical Applications
This document presents two designs: a bandgap voltage reference and a serial communication module for a 10-bit SAR ADC for low-power applications. The designs were implemented in TSMC 0.18 µm CMOS technology with a 1.8 V supply voltage. The bandgap voltage reference was designed to provide a reference voltage of 900 mV ±500 µV. It was tested in simulation under different temperature conditions to ensure a constant output over a temperature range from –40 °C to 85 °C. The serial communication module is written in the hardware description language Verilog. It receives the 10-bit parallel output of the SAR ADC and retransmits the conversion result serially using the SPI format. The module was tested in a simulator, where multiple test cases were applied to stimulate it in different ways. Both circuits were designed to meet the SAR ADC requirements: the bandgap supplies the reference voltage to the capacitor array in the SAR ADC, and the serial module sends the data values after the conversion is finished.
ITESO, A. C
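The parallel-to-serial step performed by such a module can be modeled in a few lines. The sketch below is a behavioral illustration only: the MSB-first bit order, the 10-bit frame length, and the function names are assumptions, since the abstract does not state the frame format, and the actual module is written in Verilog.

```python
from typing import List


def serialize_msb_first(sample: int, width: int = 10) -> List[int]:
    """Shift a parallel conversion result out as a list of bits, MSB first."""
    if not 0 <= sample < (1 << width):
        raise ValueError(f"sample does not fit in {width} bits")
    return [(sample >> i) & 1 for i in range(width - 1, -1, -1)]


def deserialize_msb_first(bits: List[int]) -> int:
    """Rebuild the sample on the receiving side, one bit per serial clock."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


adc_result = 0b1000000001          # example 10-bit conversion result
frame = serialize_msb_first(adc_result)
print(frame)                       # [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(deserialize_msb_first(frame) == adc_result)  # True
```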
On Multicast in Asynchronous Networks-on-Chip: Techniques, Architectures, and FPGA Implementation
In this era of exascale computing, conventional synchronous design techniques are facing unprecedented challenges. The consumer electronics market is replete with many-core systems ranging from 16 cores to thousands of cores on chip, integrating billions of transistors. However, with this ever-increasing complexity, traditional design approaches are facing key issues such as increasing chip power, process variability, aging, thermal problems, and scalability. An alternative paradigm that has gained significant interest in the last decade is asynchronous design. Asynchronous designs have several potential advantages: they are naturally energy proportional, burning power only when active; they do not require complex clock distribution; they are robust to different forms of variability; and they provide ease of composability for heterogeneous platforms. The network-on-chip (NoC) is an interconnect paradigm that has been introduced to deal with the ever-increasing system complexity. NoCs provide a distributed, scalable, and efficient interconnect solution for today’s many-core systems. Moreover, NoCs are a natural match with asynchronous design techniques, as they separate communication infrastructure and timing from the computational elements. To this end, globally-asynchronous locally-synchronous (GALS) systems, which interconnect multiple processing cores operating at different clock speeds using an asynchronous NoC, have gained significant interest. While asynchronous NoCs have several advantages, they also face a key challenge of supporting new types of traffic patterns. One such pattern is multicast communication, where a source sends packets to an arbitrary number of destinations. Multicast is common not only in parallel computing, such as for cache coherency, but also in emerging areas such as neuromorphic computing. This important capability has been largely missing from asynchronous NoCs.
This thesis introduces several efficient multicast solutions for these interconnects. In particular, techniques and network architectures are introduced to support high-performance and low-power multicast. Two leading network topologies are the focus: a variant mesh-of-trees (MoT) and a 2D mesh. In addition, for a more realistic implementation and analysis, as well as to significantly advance the field of asynchronous NoCs, this thesis also targets synthesis of these NoCs on commercial FPGAs. While there have been significant advances in FPGA technologies, there has been only limited research on implementing asynchronous NoCs on FPGAs. To this end, a systematic computer-aided design (CAD) methodology has been introduced to efficiently and safely map asynchronous NoCs on FPGAs. Overall, this thesis makes the following three contributions. The first contribution is a multicast solution for a variant MoT network topology. This topology consists of simple low-radix switches and has been used in high-performance computing platforms. A novel local speculation technique is introduced, in which a subset of the network’s switches are speculative and always broadcast every packet. These switches are very simple and have high performance. Speculative switches are surrounded by non-speculative ones, which route packets based on their destinations and also throttle any redundant copies created by the former. This hybrid network architecture achieved significant performance and power benefits over other multicast approaches. The second contribution is a multicast solution for a 2D-mesh topology, which is more complex, with higher-radix switches, and is also more commonly used. A novel continuous-time replication strategy is introduced to optimize the critical multi-way forking operation of a multicast transmission.
In this technique, a multicast packet is first stored in an input port of a switch, from where it is sent through distinct output ports towards different destinations concurrently, at each output’s own rate and in continuous time. This strategy is shown to have significant latency and energy benefits over an approach that performs multicast as a series of separate unicasts, one to each destination. Finally, a systematic CAD methodology is introduced to synthesize asynchronous NoCs on commercial FPGAs. A two-fold goal is targeted: correctness and high performance. For ease of implementation, only existing FPGA synthesis tools are used. Moreover, since asynchronous NoCs involve special asynchronous components, a comprehensive guide is introduced to map these elements correctly and efficiently. Two asynchronous NoC switches are synthesized using the proposed approach on a leading Xilinx FPGA in 28 nm: one that only handles unicast, and the other that also supports multicast. Both showed significant energy benefits with some performance gains over a state-of-the-art synchronous switch.
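The cost difference between serial unicasts and in-network replication on a 2D mesh can be illustrated by counting link traversals. The sketch below is a simple counting model under stated assumptions: dimension-ordered XY routing, coordinates as `(x, y)` tuples, and the function names are all chosen for this example. It does not model the asynchronous, continuous-time behavior of the switches described in the thesis.

```python
from typing import Iterable, List, Tuple

Coord = Tuple[int, int]
Link = Tuple[Coord, Coord]


def xy_path(src: Coord, dst: Coord) -> List[Link]:
    """Directed links used by XY routing: travel along x first, then along y."""
    x, y = src
    dx, dy = dst
    links: List[Link] = []
    while x != dx:
        nx = x + (1 if dx > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dy:
        ny = y + (1 if dy > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def serial_unicast_traversals(src: Coord, dests: Iterable[Coord]) -> int:
    """Each destination gets its own packet, so shared links are crossed repeatedly."""
    return sum(len(xy_path(src, d)) for d in dests)


def replicated_multicast_traversals(src: Coord, dests: Iterable[Coord]) -> int:
    """One packet is forked inside the switches, so each link is crossed once."""
    used = set()
    for d in dests:
        used.update(xy_path(src, d))
    return len(used)


source = (0, 0)
destinations = [(2, 0), (2, 1), (2, 2)]
print(serial_unicast_traversals(source, destinations))        # 9
print(replicated_multicast_traversals(source, destinations))  # 4
```

In this example the three unicast paths share their first links, so forking the packet inside the network crosses 4 links instead of 9. Link count is only a rough proxy for the latency and energy benefits the thesis reports.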
Digital Serializer Design for a SerDes Chip in 130nm CMOS Technology
This project builds on the work of previous generations of the System on Chip Design Specialty Program at ITESO, who pioneered the creation of a serializer-deserializer (SerDes) device for high-speed communications in CMOS technology, aiming for a small and efficient device. The digital serializer module of the SerDes system consists of an 8b10b encoder followed by a parallel-to-serial converter; together they reach a maximum frequency of 239 MHz in a typical cmrf8sf (130 nm) manufacturing process, implemented with Cadence tools. The RTL and testbench were taken from the work of Efrain Arrambide. A register was added to store the current disparity value, and primitive blocks were added to improve the behavior of the serializer module and the validation process, which generates a summary for every run. The system-on-chip flow is followed by choosing the variables that best fit the design, and a layout with no design violations is generated during physical synthesis. The individual module layouts were completed successfully in terms of behavior and violations, while the integration of the mixed-signal device showed errors that were not resolved in time for manufacturing.
This project builds on the work of previous generations of the integrated circuit design specialty at ITESO, who pioneered the creation of a device for high-speed communications in CMOS technology, with the goal of obtaining a small and efficient final product. The design flow and improvements implemented in the digital serializer module of the SerDes system, which consists of an 8b10b encoder followed by a parallel-to-serial data converter, reach a maximum frequency of 239 MHz when fabricated and operated under typical conditions in the cmrf8sf (130 nm) technology, and are implemented with the tools provided by Cadence. The hardware description code and testbench were originally taken from those delivered by Efrain Arrambide; a register was added to store the disparity value of the transmitted data, basic blocks were added to improve behavior, and the Verilog code was simplified. The validation process was improved so that blocks are tested separately, and each iteration automatically generates a transaction log and a final summary of results. The system-on-chip design flow was followed in full, choosing the variables that best suit the system's response and specifications and seeking to produce no violations in the physical design. The individual blocks of the serializer-deserializer system were designed and verified successfully; however, the integration of the mixed-signal system was not completed because of errors that could not be resolved in time to meet the fabrication date.
ITESO, A. C.
Consejo Nacional de Ciencia y Tecnologí
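The role of a register that stores the current disparity can be illustrated with a small behavioral model of running-disparity tracking in 8b/10b coding. This is a sketch under stated assumptions, not the Verilog from the project: the function names are hypothetical, and only three well-known code words (D21.5 and the two encodings of K28.5) are used rather than a full encoding table.

```python
def disparity(code: int, width: int = 10) -> int:
    """Number of ones minus number of zeros in a code word of the given width."""
    if not 0 <= code < (1 << width):
        raise ValueError(f"code word does not fit in {width} bits")
    ones = bin(code).count("1")
    return ones - (width - ones)


def next_running_disparity(rd: int, code: int) -> int:
    """Update the running disparity (-1 or +1) after sending one 10-bit code word.

    A balanced word leaves the running disparity unchanged. An unbalanced word
    must push it in the opposite direction, otherwise it is a disparity error.
    """
    if rd not in (-1, 1):
        raise ValueError("running disparity must be -1 or +1")
    d = disparity(code)
    if d == 0:
        return rd
    if d not in (-2, 2):
        raise ValueError("not a valid 8b/10b code word")
    if (rd == -1 and d != 2) or (rd == 1 and d != -2):
        raise ValueError("running disparity error")
    return -rd


K28_5_RD_NEG = int("0011111010", 2)  # K28.5 as sent when running disparity is -1
K28_5_RD_POS = int("1100000101", 2)  # K28.5 as sent when running disparity is +1
D21_5 = int("1010101010", 2)         # balanced data word

rd = -1                              # assumed initial running disparity
for word in (K28_5_RD_NEG, D21_5, K28_5_RD_POS):
    rd = next_running_disparity(rd, word)
print(rd)  # -1
```

An encoder would use the stored value to choose between the two encodings of each unbalanced symbol, which keeps the serial stream DC-balanced.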
- …