613 research outputs found
Desynchronization: Synthesis of asynchronous circuits from synchronous specifications
Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur
Evaluating Architectural, Redundancy, and Implementation Strategies for Radiation Hardening of FinFET Integrated Circuits
In this article, authors explore radiation hardening techniques through the design of a test chip implemented in 16-nm FinFET technology, along with architectural and redundancy design space exploration of its modules. Nine variants of matrix multiplication were taped out and irradiated with neutrons. The results obtained from the neutron campaign revealed that the radiation-hardened variants present superior resiliency when either local or global triple modular redundancy (TMR) schemes are employed. Furthermore, simulation-based fault injection was utilized to validate the measurements and to explore the effects of different implementation strategies on failure rates. We further show that the interplay between these different implementation strategies is not trivial to capture and that synthesis optimizations can effectively break assumptions about the effectiveness of redundancy schemes
Fast algorithms for retiming large digital circuits
The increasing complexity of VLSI systems and shrinking time to market requirements demand good optimization tools capable of handling large circuits. Retiming is a powerful transformation that preserves functionality, and can be used to optimize sequential circuits for a wide range of objective functions by judiciously relocating the memory elements. Leiserson and Saxe, who introduced the concept, presented algorithms for period optimization (minperiod retiming) and area optimization (minarea retiming). The ASTRA algorithm proposed an alternative view of retiming using the equivalence between retiming and clock skew optimization;The first part of this thesis defines the relationship between the Leiserson-Saxe and the ASTRA approaches and utilizes it for efficient minarea retiming of large circuits. The new algorithm, Minaret, uses the same linear program formulation as the Leiserson-Saxe approach. The underlying philosophy of the ASTRA approach is incorporated to reduce the number of variables and constraints in this linear program. This allows minarea retiming of circuits with over 56,000 gates in under fifteen minutes;The movement of flip-flops in control logic changes the state encoding of finite state machines, requiring the preservation of initial (reset) states. In the next part of this work the problem of minimizing the number of flip-flops in control logic subject to a specified clock period and with the guarantee of an equivalent initial state, is formulated as a mixed integer linear program. Bounds on the retiming variables are used to guarantee an equivalent initial state in the retimed circuit. These bounds lead to a simple method for calculating an equivalent initial state for the retimed circuit;The transparent nature of level sensitive latches enables level-clocked circuits to operate faster and require less area. However, this transparency makes the operation of level-clocked circuits very complex, and optimization of level-clocked circuits is a difficult task. This thesis also presents efficient algorithms for retiming large level-clocked circuits. The relationship between retiming and clock skew optimization for level-clocked circuits is defined and utilized to develop efficient retiming algorithms for period and area optimization. Using these algorithms a circuit with 56,000 gates could be retimed for minimum period in under twenty seconds and for minimum area in under 1.5 hours
Elastic systems
Elastic systems provide tolerance to the variations in computation and communication delays. The incorporation of elasticity opens new opportunities for optimization using new correct-by-construction transformations that cannot be applied to rigid non-elastic systems. The basics of synchronous and asynchronous elastic systems will be reviewed. A set of behavior-preserving transformations will be presented: retiming, recycling, early evaluation, variable-latency units and speculative execution. The application of these transformations for performance and power optimization will be discussed. Finally, a novel framework for microarchitectural exploration will be introduced, showing that the optimal pipelining of a circuit can be automatically obtained by using the previous transformations.Peer ReviewedPostprint (published version
Logic Design of Neural Networks for High-Throughput and Low-Power Applications
Neural networks (NNs) have been successfully deployed in various fields. In
NNs, a large number of multiplyaccumulate (MAC) operations need to be
performed. Most existing digital hardware platforms rely on parallel MAC units
to accelerate these MAC operations. However, under a given area constraint, the
number of MAC units in such platforms is limited, so MAC units have to be
reused to perform MAC operations in a neural network. Accordingly, the
throughput in generating classification results is not high, which prevents the
application of traditional hardware platforms in extreme-throughput scenarios.
Besides, the power consumption of such platforms is also high, mainly due to
data movement. To overcome this challenge, in this paper, we propose to flatten
and implement all the operations at neurons, e.g., MAC and ReLU, in a neural
network with their corresponding logic circuits. To improve the throughput and
reduce the power consumption of such logic designs, the weight values are
embedded into the MAC units to simplify the logic, which can reduce the delay
of the MAC units and the power consumption incurred by weight movement. The
retiming technique is further used to improve the throughput of the logic
circuits for neural networks. In addition, we propose a hardware-aware training
method to reduce the area of logic designs of neural networks. Experimental
results demonstrate that the proposed logic designs can achieve high throughput
and low power consumption for several high-throughput applications.Comment: accepted by ASPDAC 202
- …