78 research outputs found

    Behavior synthesis for high speed 3D color interpolation using VHDL

    Get PDF
    The purpose of this thesis is to study the methodology of behavioral synthesis and evaluate its usefulness compared to Register Transfer Level (RTL) synthesis. Custom IC design uses high-powered synthesis tools. Engineers have traditionally used RTL level descriptions of their circuits as input to these synthesis tools. As new Behavioral Synthesis tools are becoming more powerful, the option to describe their circuitry in a higher and more abstract level is becoming a more feasible option. Describing circuitry at a higher level has many advantages. It is easier to make architecture changes and higher level descriptions generally have significantly less lines of code and faster development times. To study behavioral synthesis a tri-linear interpolation algorithm is used. An RTL style and two different behavioral styles are used. Each are compared for area, power consumption, synthesis time, code length and throughput. The design is simulated before and after synthesis to verify the accuracy of the design using VHDL. Behavioral Compiler from Synopsys will be used to synthesize the design from VHDL to the gate level. It was found that behavioral synthesis can produce results nearly as good as an RTL described circuit. The results were generally 20% - 30% worse for this implementation using behavioral synthesis

    Desynchronization: Synthesis of asynchronous circuits from synchronous specifications

    Get PDF
    Asynchronous implementation techniques, which measure logic delays at run time and activate registers accordingly, are inherently more robust than their synchronous counterparts, which estimate worst-case delays at design time, and constrain the clock cycle accordingly. De-synchronization is a new paradigm to automate the design of asynchronous circuits from synchronous specifications, thus permitting widespread adoption of asynchronicity, without requiring special design skills or tools. In this paper, we first of all study different protocols for de-synchronization and formally prove their correctness, using techniques originally developed for distributed deployment of synchronous language specifications. We also provide a taxonomy of existing protocols for asynchronous latch controllers, covering in particular the four-phase handshake protocols devised in the literature for micro-pipelines. We then propose a new controller which exhibits provably maximal concurrency, and analyze the performance of desynchronized circuits with respect to the original synchronous optimized implementation. We finally prove the feasibility and effectiveness of our approach, by showing its application to a set of real designs, including a complete implementation of the DLX microprocessor architectur

    Fast algorithms for retiming large digital circuits

    Get PDF
    The increasing complexity of VLSI systems and shrinking time to market requirements demand good optimization tools capable of handling large circuits. Retiming is a powerful transformation that preserves functionality, and can be used to optimize sequential circuits for a wide range of objective functions by judiciously relocating the memory elements. Leiserson and Saxe, who introduced the concept, presented algorithms for period optimization (minperiod retiming) and area optimization (minarea retiming). The ASTRA algorithm proposed an alternative view of retiming using the equivalence between retiming and clock skew optimization;The first part of this thesis defines the relationship between the Leiserson-Saxe and the ASTRA approaches and utilizes it for efficient minarea retiming of large circuits. The new algorithm, Minaret, uses the same linear program formulation as the Leiserson-Saxe approach. The underlying philosophy of the ASTRA approach is incorporated to reduce the number of variables and constraints in this linear program. This allows minarea retiming of circuits with over 56,000 gates in under fifteen minutes;The movement of flip-flops in control logic changes the state encoding of finite state machines, requiring the preservation of initial (reset) states. In the next part of this work the problem of minimizing the number of flip-flops in control logic subject to a specified clock period and with the guarantee of an equivalent initial state, is formulated as a mixed integer linear program. Bounds on the retiming variables are used to guarantee an equivalent initial state in the retimed circuit. These bounds lead to a simple method for calculating an equivalent initial state for the retimed circuit;The transparent nature of level sensitive latches enables level-clocked circuits to operate faster and require less area. However, this transparency makes the operation of level-clocked circuits very complex, and optimization of level-clocked circuits is a difficult task. This thesis also presents efficient algorithms for retiming large level-clocked circuits. The relationship between retiming and clock skew optimization for level-clocked circuits is defined and utilized to develop efficient retiming algorithms for period and area optimization. Using these algorithms a circuit with 56,000 gates could be retimed for minimum period in under twenty seconds and for minimum area in under 1.5 hours

    Fine Grain Incremental Rescheduling via Architectural Retiming".

    Get PDF
    Abstract With the decreasing feature sizes during VLSI fabrication and the dominance of interconnect delay over that of gates, control logic and wiring no longer have a negligible impact on delay and area. The need thus arises for developing techniques and tools to redesign incrementally to eliminate performance b ottlenecks. Such a redesign e ort corresponds to incrementally modifying an existing schedule obtained via high-level synthesis. In this paper we demonstrate that applying architectural retiming, a technique for pipelining latencyconstrained circuits, results in incrementally modifying an existing schedule. Architectural retiming reschedules ne grain operations ones that have a delay equal to or less than one clock cycle to occur in earlier time steps, while modifying the design to preserve its correctness

    Retiming and Resynthesis with Sweep Are Complete for Sequential Transformation

    Get PDF
    Abstract-There is a long history of investigations and debates on whether a sequence of retiming and resynthesis is complete for all sequential transformations (on steady states). It has been shown that the sweep operation, which adds or removes registers not used by any output, is necessary for some sequential transformations. However, it is an open question whether retiming and resynthesis with sweep are complete. This paper proves that the operations are complete, but with one caveat: at least one resynthesis operation needs to look through the register boundary into the logic of previous cycle. We showed that this one-cycle reachability is required for retiming and resynthesis to be complete for re-encodings with different code length. This requirement comes from the fact that Boolean circuit is used for a discrete function thus its range needs to be computed by a traversal of the circuit. In theory, five operations in the order of sweep, resynthesis, retiming, resynthesis, and sweep are already complete. However, some practical limitations on resynthesis must be considered. The complexity of retiming and resynthesis verification is also discussed

    Advanced Timing and Synchronization Methodologies for Digital VLSI Integrated Circuits

    Get PDF
    This dissertation addresses timing and synchronization methodologies that are critical to the design, analysis and optimization of high-performance, integrated digital VLSI systems. As process sizes shrink and design complexities increase, achieving timing closure for digital VLSI circuits becomes a significant bottleneck in the integrated circuit design flow. Circuit designers are motivated to investigate and employ alternative methods to satisfy the timing and physical design performance targets. Such novel methods for the timing and synchronization of complex circuitry are developed in this dissertation and analyzed for performance and applicability.Mainstream integrated circuit design flow is normally tuned for zero clock skew, edge-triggered circuit design. Non-zero clock skew or multi-phase clock synchronization is seldom used because the lack of design automation tools increases the length and cost of the design cycle. For similar reasons, level-sensitive registers have not become an industry standard despite their superior size, speed and power consumption characteristics compared to conventional edge-triggered flip-flops.In this dissertation, novel design and analysis techniques that fully automate the design and analysis of non-zero clock skew circuits are presented. Clock skew scheduling of both edge-triggered and level-sensitive circuits are investigated in order to exploit maximum circuit performances. The effects of multi-phase clocking on non-zero clock skew, level-sensitive circuits are investigated leading to advanced synchronization methodologies. Improvements in the scalability of the computational timing analysis process with clock skew scheduling are explored through partitioning and parallelization.The integration of the proposed design and analysis methods to the physical design flow of integrated circuits synchronized with a next-generation clocking technology-resonant rotary clocking technology-is also presented. Based on the design and analysis methods presented in this dissertation, a computer-aided design tool for the design of rotary clock synchronized integrated circuits is developed

    Parameterized Implementation of K-means Clustering on Reconfigurable Systems

    Get PDF
    Processing power of pattern classification algorithms on conventional platforms has not been able to keep up with exponentially growing datasets. However, algorithms such as k-means clustering include significant potential parallelism that could be exploited to enhance processing speed on conventional platforms. A better and effective solution to speed-up the algorithm performance is the use of a hardware assist since parallel kernels can be partitioned and concurrently run on hardware as opposed to the sequential software flow. A parameterized hardware implementation of k-means clustering is presented as a proof of concept on the Pilchard Reconfigurable computing system. The hardware implementation is shown to have speedups of about 500 over conventional implementations on a general-purpose processor. A scalability analysis is done to provide a future direction to take the current implementation of 3 classes and scale it to over N classes

    Active hardware metering for intellectual property protection and security

    Get PDF
    Abstract We introduce the first active hardware metering scheme that aims to protect integrated circuits (IC) intellectual property (IP) against piracy and runtime tampering. The novel metering method simultaneously employs inherent unclonable variability in modern manufacturing technology, and functionality preserving alternations of the structural IC specifications. Active metering works by enabling the designers to lock each IC and to remotely disable it. The objectives are realized by adding new states and transitions to the original finite state machine (FSM) to create boosted finite state machines(BFSM) of the pertinent design. A unique and unpredictable ID generated by an IC is utilized to place an BFSM into the power-up state upon activation. The designer, knowing the transition table, is the only one who can generate input sequences required to bring the BFSM into the functional initial (reset) state. To facilitate remote disabling of ICs, black hole states are integrated within the BFSM. We introduce nine types of potential attacks against the proposed active metering method. We further describe a number of countermeasures that must be taken to preserve the security of active metering against the potential attacks. The implementation details of the method with the objectives of being low-overhead, unclonable, obfuscated, stable, while having a diverse set of keys is presented. The active metering method was implemented, synthesized and mapped on the standard benchmark circuits. Experimental evaluations illustrate that the method has a low-overhead in terms of power, delay, and area, while it is extremely resilient against the considered attacks

    Energy Efficient Hardware Design for Securing the Internet-of-Things

    Full text link
    The Internet of Things (IoT) is a rapidly growing field that holds potential to transform our everyday lives by placing tiny devices and sensors everywhere. The ubiquity and scale of IoT devices require them to be extremely energy efficient. Given the physical exposure to malicious agents, security is a critical challenge within the constrained resources. This dissertation presents energy-efficient hardware designs for IoT security. First, this dissertation presents a lightweight Advanced Encryption Standard (AES) accelerator design. By analyzing the algorithm, a novel method to manipulate two internal steps to eliminate storage registers and replace flip-flops with latches to save area is discovered. The proposed AES accelerator achieves state-of-art area and energy efficiency. Second, the inflexibility and high Non-Recurring Engineering (NRE) costs of Application-Specific-Integrated-Circuits (ASICs) motivate a more flexible solution. This dissertation presents a reconfigurable cryptographic processor, called Recryptor, which achieves performance and energy improvements for a wide range of security algorithms across public key/secret key cryptography and hash functions. The proposed design employs circuit techniques in-memory and near-memory computing and is more resilient to power analysis attack. In addition, a simulator for in-memory computation is proposed. It is of high cost to design and evaluate new-architecture like in-memory computing in Register-transfer level (RTL). A C-based simulator is designed to enable fast design space exploration and large workload simulations. Elliptic curve arithmetic and Galois counter mode are evaluated in this work. Lastly, an error resilient register circuit, called iRazor, is designed to tolerate unpredictable variations in manufacturing process operating temperature and voltage of VLSI systems. When integrated into an ARM processor, this adaptive approach outperforms competing industrial techniques such as frequency binning and canary circuits in performance and energy.PHDElectrical EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttps://deepblue.lib.umich.edu/bitstream/2027.42/147546/1/zhyiqun_1.pd
    • …
    corecore