6,716 research outputs found

    Technology mapping of timed circuits

    Get PDF
    Journal ArticleAbstract This paper presents an automated procedure for the technology mapping of timed circuits to practical gate libraries. Timed circuits are a class of asynchronous circuits that incorporate explicit timing information in the specification which is used throughout the design process to optimize the implementation. Our procedure begins with a timed specification and a delay-annotated gate library description which must include 2-input AND gates, OR gates, and C-elements, but optionally can include higher-fanin gates, AND-OR-INVERT blocks, and generalized C-elements. Our procedure first generates a technology-independent timed circuit netlist composed of possibly high-fanin AND gates, OR gates, and 2-input Celements. The procedure then investigates simultaneous decompositions of all high-fanin gates by adding state variables to the the specfication and performing resyn-thesis. Although multiple decompositions are explored, timing information is utilized to significantly reduce their number. Once all gates are sufficiently decomposed, the netlist can be mapped to the given gate library, taking advantage of any compact complex gates available. The decomposition and resyn-thesis steps have been fully automated within the synthesis tool ATACS and we present results for several examples

    Practical advances in asynchronous design and in asynchronous/synchronous interfaces

    Get PDF
    Journal ArticleAsynchronous systems are being viewed as an increasingly viable alternative to purely synchronous systems. This paper gives an overview of the current state of the art in practical asynchronous circuit and system design in four areas: controllers, datapaths, processors, and the design of asynchronous/synchronous interfaces

    A new look at the conditions for the synthesis of speed-independent circuits

    Get PDF
    This paper presents a set of sufficient conditions for the gate-level synthesis of speed-independent circuits when constrained to a given class of gate library. Existing synthesis methodologies are restricted to architectures that use simple AND-gates, and do not exploit the advantages offered by the existence of complex gates. The use of complex gates increases the speed and reduces the area of the circuits. These improvements are achieved because of (1) the elimination of the distributivity, signal persistency and unique minimal state requirements imposed by other techniques; (2) the reduction in the number of internal signals necessary to guarantee the synthesis; and finally (3) the utilization of optimization techniques to reduce the fan-in of the involved gates and the number of required memory elements.Peer ReviewedPostprint (published version

    Achieving fast and exact hazard-free logic minimization of extended burst-mode gC finite state machines

    Get PDF
    Journal ArticleAbstract This paper presents a new approach to two-level hazard-free logic minimization in the context of extended burst-mode finite state machine synthesis targeting generalized C-elements (gC). No currently available minimizers for literal-exact two-level hazard-free logic minimization of extended burst-mode gC controllers can handle large circuits without synthesis times ranging up over thousands of seconds. Even existing heuristic approaches take too much time when iterative exploration over a large design space is required and do not yield minimum results. The logic minimization approach presented in this paper is based on state graph exploration in conjunction with single-cube cover algorithms, an approach that has not been considered for minimization of extended burst-mode finite state machines previously. Our algorithm achieves very fast logic minimization by introducing compacted state graphs and cover tables and an efficient single-cube cover algorithm for single-output minimization. Our exact logic minimizer finds minimal number of literal solutions to all currently available benchmarks, in less than one second on a 333 MHz microprocessor - more than three orders of magnitude faster than existing literal exact methods, and over an order of magnitude faster than existing heuristic methods for the largest benchmarks. This includes a benchmark that has never been possible to solve exactly in number of literals before

    High-level asynchronous system design using the ACK framework

    Get PDF
    Journal ArticleDesigning asynchronous circuits is becoming easier as a number of design styles are making the transition from research projects to real, usable tools. However, designing asynchronous "systems" is still a difficult problem. We define asynchronous systems to be medium to large digital systems whose descriptions include both datapath and control, that may involve non-trivial interface requirements, and whose control is too large to be synthesized in one large controller. ACK is a framework for designing high performance asynchronous systems of this type. In ACK we advocate an approach that begins with procedural level descriptions of control and datapath and results in a hybrid system that mixes a variety of hardware implementation styles including burst-mode AFSMs, macromodule circuits, and programmable control. We present our views on what makes asynchronous high level system design different from lower level circuit design, motivate our ACK approach, and demonstrate using an example system design

    System-on-chip Computing and Interconnection Architectures for Telecommunications and Signal Processing

    Get PDF
    This dissertation proposes novel architectures and design techniques targeting SoC building blocks for telecommunications and signal processing applications. Hardware implementation of Low-Density Parity-Check decoders is approached at both the algorithmic and the architecture level. Low-Density Parity-Check codes are a promising coding scheme for future communication standards due to their outstanding error correction performance. This work proposes a methodology for analyzing effects of finite precision arithmetic on error correction performance and hardware complexity. The methodology is throughout employed for co-designing the decoder. First, a low-complexity check node based on the P-output decoding principle is designed and characterized on a CMOS standard-cells library. Results demonstrate implementation loss below 0.2 dB down to BER of 10^{-8} and a saving in complexity up to 59% with respect to other works in recent literature. High-throughput and low-latency issues are addressed with modified single-phase decoding schedules. A new "memory-aware" schedule is proposed requiring down to 20% of memory with respect to the traditional two-phase flooding decoding. Additionally, throughput is doubled and logic complexity reduced of 12%. These advantages are traded-off with error correction performance, thus making the solution attractive only for long codes, as those adopted in the DVB-S2 standard. The "layered decoding" principle is extended to those codes not specifically conceived for this technique. Proposed architectures exhibit complexity savings in the order of 40% for both area and power consumption figures, while implementation loss is smaller than 0.05 dB. Most modern communication standards employ Orthogonal Frequency Division Multiplexing as part of their physical layer. The core of OFDM is the Fast Fourier Transform and its inverse in charge of symbols (de)modulation. Requirements on throughput and energy efficiency call for FFT hardware implementation, while ubiquity of FFT suggests the design of parametric, re-configurable and re-usable IP hardware macrocells. In this context, this thesis describes an FFT/IFFT core compiler particularly suited for implementation of OFDM communication systems. The tool employs an accuracy-driven configuration engine which automatically profiles the internal arithmetic and generates a core with minimum operands bit-width and thus minimum circuit complexity. The engine performs a closed-loop optimization over three different internal arithmetic models (fixed-point, block floating-point and convergent block floating-point) using the numerical accuracy budget given by the user as a reference point. The flexibility and re-usability of the proposed macrocell are illustrated through several case studies which encompass all current state-of-the-art OFDM communications standards (WLAN, WMAN, xDSL, DVB-T/H, DAB and UWB). Implementations results are presented for two deep sub-micron standard-cells libraries (65 and 90 nm) and commercially available FPGA devices. Compared with other FFT core compilers, the proposed environment produces macrocells with lower circuit complexity and same system level performance (throughput, transform size and numerical accuracy). The final part of this dissertation focuses on the Network-on-Chip design paradigm whose goal is building scalable communication infrastructures connecting hundreds of core. A low-complexity link architecture for mesochronous on-chip communication is discussed. The link enables skew constraint looseness in the clock tree synthesis, frequency speed-up, power consumption reduction and faster back-end turnarounds. The proposed architecture reaches a maximum clock frequency of 1 GHz on 65 nm low-leakage CMOS standard-cells library. In a complex test case with a full-blown NoC infrastructure, the link overhead is only 3% of chip area and 0.5% of leakage power consumption. Finally, a new methodology, named metacoding, is proposed. Metacoding generates correct-by-construction technology independent RTL codebases for NoC building blocks. The RTL coding phase is abstracted and modeled with an Object Oriented framework, integrated within a commercial tool for IP packaging (Synopsys CoreTools suite). Compared with traditional coding styles based on pre-processor directives, metacoding produces 65% smaller codebases and reduces the configurations to verify up to three orders of magnitude

    Computer aided design

    Get PDF
    technical reportThe report is based on the proposal submitted to the National Science Foundation in September 1981, as part of the Coordinated Experimental Computer Science Research Program. The sections covering the budget and biographical data on the senior research personnel have not been included. Also, the section describing the department facilities at the time of the proposal submission is not included, because it would be only of historical interest

    Average-case optimized technology mapping of one-hot domino circuits*

    Get PDF
    Journal ArticleThis paper presents a technology mapping technique for optimizing the average-case delay of asynchronous combinational circuits implemented using domino logic and one-hot encoded outputs. The technique minimizes the critical path for common input patterns at the possible expense of making less common critical paths longer. To demonstrate the application of this technique, we present a case study of a combinational length decoding block, an integral component of an Asynchronous Instruction Length Decoder (AILD) which can be used in PentiumR processors. The experimental results demonstrate that the average-case delay of our mapped circuits can be dramatically lower than the worst-case delay of the circuits obtained using conventional worst-case mapping techniques

    An integrated hardware/software design methodology for signal processing systems

    Get PDF
    This paper presents a new methodology for design and implementation of signal processing systems on system-on-chip (SoC) platforms. The methodology is centered on the use of lightweight application programming interfaces for applying principles of dataflow design at different layers of abstraction. The development processes integrated in our approach are software implementation, hardware implementation, hardware-software co-design, and optimized application mapping. The proposed methodology facilitates development and integration of signal processing hardware and software modules that involve heterogeneous programming languages and platforms. As a demonstration of the proposed design framework, we present a dataflow-based deep neural network (DNN) implementation for vehicle classification that is streamlined for real-time operation on embedded SoC devices. Using the proposed methodology, we apply and integrate a variety of dataflow graph optimizations that are important for efficient mapping of the DNN system into a resource constrained implementation that involves cooperating multicore CPUs and field-programmable gate array subsystems. Through experiments, we demonstrate the flexibility and effectiveness with which different design transformations can be applied and integrated across multiple scales of the targeted computing system
    • …
    corecore