126 research outputs found

    Automated Exploration of the ASIC Design Space for Minimum Power-Delay-Area Product at the Register Transfer Level

    Get PDF
    Exploring the integrated circuit design space for minimum power-delay-area (PDA) product can be time-consuming and tedious, especially when the target standard-cell library has hundreds of options. In this dissertation, heuristic algorithms that automate this process have been developed, implemented and validated at the reg- ister transfer level. In some cases, the PDA product was 1.9 times better than the initial baseline solution. The parallel search algorithm exhibited 9x speed up when executed on 10 machines simultaneously. These two new methods also characterize the design space for the given RTL code by generating power-delay-area points in addition to the minimum PDA point in case the designer wishes to select a different solution that is a tradeoff among these metrics. As a final step, these two search algorithms are integrated into a fully automated ASIC design flow

    3D Printed Magnets for use in an Artificial Muscle Actuator

    Get PDF
    The purpose of this project was to iterate and modify a student-built 3D printer with the goal of manufacturing a radially oriented magnet for use in a novel biomimetic magnetic actuator. The actuator is intended to simulate a sarcomere and to allow for unpowered stability, a low environmental impact, and easy interface with computers and programming. An artificial muscle with an actuator stimulated by a magnet would be favorable over a pneumatic actuating muscle because the bulky air tubes currently used in these muscles could be replaced with smaller, more lightweight batteries and microcontrollers. The ability to 3D print NdFeB magnets will allow for the creation of novel shapes and fields for ultimate design freedom

    Embedding Logic and Non-volatile Devices in CMOS Digital Circuits for Improving Energy Efficiency

    Get PDF
    abstract: Static CMOS logic has remained the dominant design style of digital systems for more than four decades due to its robustness and near zero standby current. Static CMOS logic circuits consist of a network of combinational logic cells and clocked sequential elements, such as latches and flip-flops that are used for sequencing computations over time. The majority of the digital design techniques to reduce power, area, and leakage over the past four decades have focused almost entirely on optimizing the combinational logic. This work explores alternate architectures for the flip-flops for improving the overall circuit performance, power and area. It consists of three main sections. First, is the design of a multi-input configurable flip-flop structure with embedded logic. A conventional D-type flip-flop may be viewed as realizing an identity function, in which the output is simply the value of the input sampled at the clock edge. In contrast, the proposed multi-input flip-flop, named PNAND, can be configured to realize one of a family of Boolean functions called threshold functions. In essence, the PNAND is a circuit implementation of the well-known binary perceptron. Unlike other reconfigurable circuits, a PNAND can be configured by simply changing the assignment of signals to its inputs. Using a standard cell library of such gates, a technology mapping algorithm can be applied to transform a given netlist into one with an optimal mixture of conventional logic gates and threshold gates. This approach was used to fabricate a 32-bit Wallace Tree multiplier and a 32-bit booth multiplier in 65nm LP technology. Simulation and chip measurements show more than 30% improvement in dynamic power and more than 20% reduction in core area. The functional yield of the PNAND reduces with geometry and voltage scaling. The second part of this research investigates the use of two mechanisms to improve the robustness of the PNAND circuit architecture. One is the use of forward and reverse body biases to change the device threshold and the other is the use of RRAM devices for low voltage operation. The third part of this research focused on the design of flip-flops with non-volatile storage. Spin-transfer torque magnetic tunnel junctions (STT-MTJ) are integrated with both conventional D-flipflop and the PNAND circuits to implement non-volatile logic (NVL). These non-volatile storage enhanced flip-flops are able to save the state of system locally when a power interruption occurs. However, manufacturing variations in the STT-MTJs and in the CMOS transistors significantly reduce the yield, leading to an overly pessimistic design and consequently, higher energy consumption. A detailed analysis of the design trade-offs in the driver circuitry for performing backup and restore, and a novel method to design the energy optimal driver for a given yield is presented. Efficient designs of two nonvolatile flip-flop (NVFF) circuits are presented, in which the backup time is determined on a per-chip basis, resulting in minimizing the energy wastage and satisfying the yield constraint. To achieve a yield of 98%, the conventional approach would have to expend nearly 5X more energy than the minimum required, whereas the proposed tunable approach expends only 26% more energy than the minimum. A non-volatile threshold gate architecture NV-TLFF are designed with the same backup and restore circuitry in 65nm technology. The embedded logic in NV-TLFF compensates performance overhead of NVL. This leads to the possibility of zero-overhead non-volatile datapath circuits. An 8-bit multiply-and- accumulate (MAC) unit is designed to demonstrate the performance benefits of the proposed architecture. Based on the results of HSPICE simulations, the MAC circuit with the proposed NV-TLFF cells is shown to consume at least 20% less power and area as compared to the circuit designed with conventional DFFs, without sacrificing any performance.Dissertation/ThesisDoctoral Dissertation Electrical Engineering 201

    A Partially Randomized Approach to Trajectory Planning and Optimization for Mobile Robots with Flat Dynamics

    Get PDF
    Motion planning problems are characterized by huge search spaces and complex obstacle structures with no concise mathematical expression. The fixed-wing airplane application considered in this thesis adds differential constraints and point-wise bounds, i. e. an infinite number of equality and inequality constraints. An optimal trajectory planning approach is presented, based on the randomized Rapidly-exploring Random Trees framework (RRT*). The local planner relies on differential flatness of the equations of motion to obtain tree branch candidates that automatically satisfy the differential constraints. Flat output trajectories, in this case equivalent to the airplane's flight path, are designed using BĂ©zier curves. Segment feasibility in terms of point-wise inequality constraints is tested by an indicator integral, which is evaluated alongside the segment cost functional. Although the RRT* guarantees optimality in the limit of infinite planning time, it is argued by intuition and experimentation that convergence is not approached at a practically useful rate. Therefore, the randomized planner is augmented by a deterministic variational optimization technique. To this end, the optimal planning task is formulated as a semi-infinite optimization problem, using the intermediate result of the RRT(*) as an initial guess. The proposed optimization algorithm follows the feasible flavor of the primal-dual interior point paradigm. Discretization of functional (infinite) constraints is deferred to the linear subproblems, where it is realized implicitly by numeric quadrature. An inherent numerical ill-conditioning of the method is circumvented by a reduction-like approach, which tracks active constraint locations by introducing new problem variables. Obstacle avoidance is achieved by extending the line search procedure and dynamically adding obstacle-awareness constraints to the problem formulation. Experimental evaluation confirms that the hybrid approach is practically feasible and does indeed outperform RRT*'s built-in optimization mechanism, but the computational burden is still significant.Bewegungsplanungsaufgaben sind typischerweise gekennzeichnet durch umfangreiche SuchrĂ€ume, deren vollstĂ€ndige Exploration nicht praktikabel ist, sowie durch unstrukturierte Hindernisse, fĂŒr die nur selten eine geschlossene mathematische Beschreibung existiert. Bei der in dieser Arbeit betrachteten Anwendung auf FlĂ€chenflugzeuge kommen differentielle Randbedingungen und beschrĂ€nkte SystemgrĂ¶ĂŸen erschwerend hinzu. Der vorgestellte Ansatz zur optimalen Trajektorienplanung basiert auf dem Rapidly-exploring Random Trees-Algorithmus (RRT*), welcher die SuchraumkomplexitĂ€t durch Randomisierung beherrschbar macht. Der spezifische Beitrag ist eine Realisierung des lokalen Planers zur Generierung der Äste des Suchbaums. Dieser erfordert ein flaches Bewegungsmodell, sodass differentielle Randbedingungen automatisch erfĂŒllt sind. Die Trajektorien des flachen Ausgangs, welche im betrachteten Beispiel der Flugbahn entsprechen, werden mittels BĂ©zier-Kurven entworfen. Die Einhaltung der Ungleichungsnebenbedingungen wird durch ein Indikator-Integral ĂŒberprĂŒft, welches sich mit wenig Zusatzaufwand parallel zum Kostenfunktional berechnen lĂ€sst. Zwar konvergiert der RRT*-Algorithmus (im probabilistischen Sinne) zu einer optimalen Lösung, jedoch ist die Konvergenzrate aus praktischer Sicht unbrauchbar langsam. Es ist daher naheliegend, den Planer durch ein gradientenbasiertes lokales Optimierungsverfahren mit besseren Konvergenzeigenschaften zu unterstĂŒtzen. Hierzu wird die aktuelle Zwischenlösung des Planers als InitialschĂ€tzung fĂŒr ein kompatibles semi-infinites Optimierungsproblem verwendet. Der vorgeschlagene Optimierungsalgorithmus erweitert das verbreitete innere-Punkte-Konzept (primal dual interior point method) auf semi-infinite Probleme. Eine explizite Diskretisierung der funktionalen Ungleichungsnebenbedingungen ist nicht erforderlich, denn diese erfolgt implizit durch eine numerische Integralauswertung im Rahmen der linearen Teilprobleme. Da die Methode an Stellen aktiver Nebenbedingungen nicht wohldefiniert ist, kommt zusĂ€tzlich eine Variante des Reduktions-Ansatzes zum Einsatz, bei welcher der Vektor der Optimierungsvariablen um die (endliche) Menge der aktiven Indizes erweitert wird. Weiterhin wurde eine Kollisionsvermeidung integriert, die in den Teilschritt der Liniensuche eingreift und die Problemformulierung dynamisch um Randbedingungen zur lokalen BerĂŒcksichtigung von Hindernissen erweitert. Experimentelle Untersuchungen bestĂ€tigen, dass die Ergebnisse des hybriden Ansatzes aus RRT(*) und numerischem Optimierungsverfahren der klassischen RRT*-basierten Trajektorienoptimierung ĂŒberlegen sind. Der erforderliche Rechenaufwand ist zwar betrĂ€chtlich, aber unter realistischen Bedingungen praktisch beherrschbar

    Fast structured design of VLSI circuits

    Get PDF
    technical reportWe believe that a structured, user-friendly, cost-effective tool for rapid implementation of VLSI circuits which encourages students to participate directly in research projects are the key components in digital integrated circuit (IC) education. In this paper, we introduce our VLSI education activities, with t h e emphasis on t h e presentation of Path Programmable Logic (PPL) design methodology, in addition to a short description of a representative student project. Students using PPL are able to implement MOS or GaAs VLSI circuits with several thousands to over 100,000 transistors in a few weeks. They have designed and built numerous VLSI architectures and computer systems which play an influential role in various research areas. Our educational activities and the Utah Annual Student VLSI Design Contest supported by over a dozen leading American firms have attracted multiple university involvement in recent years

    Optimising and evaluating designs for reconfigurable hardware

    No full text
    Growing demand for computational performance, and the rising cost for chip design and manufacturing make reconfigurable hardware increasingly attractive for digital system implementation. Reconfigurable hardware, such as field-programmable gate arrays (FPGAs), can deliver performance through parallelism while also providing flexibility to enable application builders to reconfigure them. However, reconfigurable systems, particularly those involving run-time reconfiguration, are often developed in an ad-hoc manner. Such an approach usually results in low designer productivity and can lead to inefficient designs. This thesis covers three main achievements that address this situation. The first achievement is a model that captures design parameters of reconfigurable hardware and performance parameters of a given application domain. This model supports optimisations for several design metrics such as performance, area, and power consumption. The second achievement is a technique that enhances the relocatability of bitstreams for reconfigurable devices, taking into account heterogeneous resources. This method increases the flexibility of modules represented by these bitstreams while reducing configuration storage size and design compilation time. The third achievement is a technique to characterise the power consumption of FPGAs in different activity modes. This technique includes the evaluation of standby power and dedicated low-power modes, which are crucial in meeting the requirements for battery-based mobile devices

    Formal Analysis of Arithmetic Circuits using Computer Algebra - Verification, Abstraction and Reverse Engineering

    Get PDF
    Despite a considerable progress in verification and abstraction of random and control logic, advances in formal verification of arithmetic designs have been lagging. This can be attributed mostly to the difficulty in an efficient modeling of arithmetic circuits and datapaths without resorting to computationally expensive Boolean methods, such as Binary Decision Diagrams (BDDs) and Boolean Satisfiability (SAT), that require “bit blasting”, i.e., flattening the design to a bit-level netlist. Approaches that rely on computer algebra and Satisfiability Modulo Theories (SMT) methods are either too abstract to handle the bit-level nature of arithmetic designs or require solving computationally expensive decision or satisfiability problems. The work proposed in this thesis aims at overcoming the limitations of analyzing arithmetic circuits, specifically at the post-synthesized phase. It addresses the verification, abstraction and reverse engineering problems of arithmetic circuits at an algebraic level, treating an arithmetic circuit and its specification as a properly constructed algebraic system. The proposed technique solves these problems by function extraction, i.e., by deriving arithmetic function computed by the circuit from its low-level circuit implementation using computer algebraic rewriting technique. The proposed techniques work on large integer arithmetic circuits and finite field arithmetic circuits, up to 512-bit wide containing millions of logic gates

    Hybrid FPGA: Architecture and Interface

    No full text
    Hybrid FPGAs (Field Programmable Gate Arrays) are composed of general-purpose logic resources with different granularities, together with domain-specific coarse-grained units. This thesis proposes a novel hybrid FPGA architecture with embedded coarse-grained Floating Point Units (FPUs) to improve the floating point capability of FPGAs. Based on the proposed hybrid FPGA architecture, we examine three aspects to optimise the speed and area for domain-specific applications. First, we examine the interface between large coarse-grained embedded blocks (EBs) and fine-grained elements in hybrid FPGAs. The interface includes parameters for varying: (1) aspect ratio of EBs, (2) position of the EBs in the FPGA, (3) I/O pins arrangement of EBs, (4) interconnect flexibility of EBs, and (5) location of additional embedded elements such as memory. Second, we examine the interconnect structure for hybrid FPGAs. We investigate how large and highdensity EBs affect the routing demand for hybrid FPGAs over a set of domain-specific applications. We then propose three routing optimisation methods to meet the additional routing demand introduced by large EBs: (1) identifying the best separation distance between EBs, (2) adding routing switches on EBs to increase routing flexibility, and (3) introducing wider channel width near the edge of EBs. We study and compare the trade-offs in delay, area and routability of these three optimisation methods. Finally, we employ common subgraph extraction to determine the number of floating point adders/subtractors, multipliers and wordblocks in the FPUs. The wordblocks include registers and can implement fixed point operations. We study the area, speed and utilisation trade-offs of the selected FPU subgraphs in a set of floating point benchmark circuits. We develop an optimised coarse-grained FPU, taking into account both architectural and system-level issues. Furthermore, we investigate the trade-offs between granularities and performance by composing small FPUs into a large FPU. The results of this thesis would help design a domain-specific hybrid FPGA to meet user requirements, by optimising for speed, area or a combination of speed and area

    Synaptic rewiring in neuromorphic VLSI for topographic map formation

    Get PDF
    A generalised model of biological topographic map development is presented which combines both weight plasticity and the formation and elimination of synapses (synaptic rewiring) as well as both activity-dependent and -independent processes. The question of whether an activity-dependent process can refine a mapping created by an activity-independent process is investigated using a statistical approach to analysingmapping quality. The model is then implemented in custom mixed-signal VLSI. Novel aspects of this implementation include: (1) a distributed and locally reprogrammable address-event receiver, with which large axonal fan-out does not reduce channel capacity; (2) an analogue current-mode circuit for Euclidean distance calculation which is suitable for operation across multiple chips; (3) slow probabilistic synaptic rewiring driven by (pseudo-)random noise; (4) the application of a very-low-current design technique to improving the stability of weights stored on capacitors; (5) exploiting transistor non-ideality to implement partially weightdependent spike-timing-dependent plasticity; (6) the use of the non-linear capacitance of MOSCAP devices to compensate for other non-linearities. The performance of the chip is characterised and it is shown that the fabricated chips are capable of implementing the model, resulting in biologically relevant behaviours such as activity-dependent reduction of the spatial variance of receptive fields. Complementing a fast synaptic weight change mechanism with a slow synapse rewiring mechanism is suggested as a method of increasing the stability of learned patterns
    • 

    corecore