    Advances in Functional Decomposition: Theory and Applications

    Functional decomposition aims at finding efficient representations for Boolean functions. It is used in many applications, including multi-level logic synthesis, formal verification, and testing. This dissertation presents novel heuristic algorithms for functional decomposition. These algorithms take advantage of suitable representations of the Boolean functions in order to be efficient. The first two algorithms compute simple-disjoint and disjoint-support decompositions. They are based on representing the target function by a Reduced Ordered Binary Decision Diagram (BDD). Unlike other BDD-based algorithms, the presented ones can deal with larger target functions and produce more decompositions without requiring expensive manipulations of the representation, particularly BDD reordering. The third algorithm also finds disjoint-support decompositions, but it is based on a technique which integrates circuit graph analysis and BDD-based decomposition. The combination of the two approaches results in an algorithm which is more robust than a purely BDD-based one and which improves both the quality of the results and the running time. The fourth algorithm uses circuit graph analysis to obtain non-disjoint decompositions. We show that the problem of computing non-disjoint decompositions can be reduced to the problem of computing multiple-vertex dominators. We also prove that multiple-vertex dominators can be found in polynomial time. This result is important because there is no known polynomial-time algorithm for computing all non-disjoint decompositions of a Boolean function. The fifth algorithm provides an efficient means to decompose a function at the circuit graph level, by using information derived from a BDD representation. This is done without the expensive circuit re-synthesis normally associated with BDD-based decomposition approaches. Finally, we present two publications that resulted from the many detours we have taken along the winding path of our research.
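
    As a rough illustration of the kind of decomposition targeted here (not the dissertation's BDD-based procedures), the sketch below tests Ashenhurst's classic truth-table criterion: f(X) admits a simple disjoint decomposition f = h(g(A), B) for a partition A/B exactly when its decomposition chart has at most two distinct column patterns. The function names and the toy example are illustrative assumptions.

    from itertools import product

    def has_simple_disjoint_decomposition(f, bound_vars, free_vars):
        """Check whether f(X) can be written as h(g(A), B) for the given
        partition A (bound_vars) / B (free_vars), using the classic
        Ashenhurst criterion: the decomposition chart has at most two
        distinct column patterns.
        f is a callable taking a dict {var_name: 0/1} and returning 0/1.
        """
        columns = set()
        for a in product((0, 1), repeat=len(bound_vars)):
            assignment_a = dict(zip(bound_vars, a))
            # One "column" of the chart: f's values over all free-variable points.
            col = tuple(
                f({**assignment_a, **dict(zip(free_vars, b))})
                for b in product((0, 1), repeat=len(free_vars))
            )
            columns.add(col)
            if len(columns) > 2:
                return False
        return True

    # Example: f = (x1 AND x2) OR x3 decomposes as h(g(x1, x2), x3) with g = AND, h = OR.
    f = lambda v: (v["x1"] & v["x2"]) | v["x3"]
    print(has_simple_disjoint_decomposition(f, ["x1", "x2"], ["x3"]))  # True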

    Subject Index

    Synthesis and Verification of Digital Circuits using Functional Simulation and Boolean Satisfiability.

    The semiconductor industry has long relied on the steady trend of transistor scaling, that is, the shrinking of the dimensions of silicon transistor devices, as a way to improve the cost and performance of electronic devices. However, several design challenges have emerged as transistors have become smaller. For instance, wires are not scaling as fast as transistors, and delay associated with wires is becoming more significant. Moreover, in the design flow for integrated circuits, accurate modeling of wire-related delay is available only toward the end of the design process, when the physical placement of logic units is known. Consequently, one can only know whether timing performance objectives are satisfied, i.e., if timing closure is achieved, after several design optimizations. Unless timing closure is achieved, time-consuming design-flow iterations are required. Given the challenges arising from increasingly complex designs, failing to quickly achieve timing closure threatens the ability of designers to produce high-performance chips that can match continually growing consumer demands. In this dissertation, we introduce powerful constraint-guided synthesis optimizations that take into account upcoming timing closure challenges and eliminate expensive design iterations. In particular, we use logic simulation to approximate the behavior of increasingly complex designs leveraging a recently proposed concept, called bit signatures, which allows us to represent a large fraction of a complex circuit's behavior in a compact data structure. By manipulating these signatures, we can efficiently discover a greater set of valid logic transformations than was previously possible and, as a result, enhance timing optimization. Based on the abstractions enabled through signatures, we propose a comprehensive suite of novel techniques: (1) a fast computation of circuit don't-cares that increases restructuring opportunities, (2) a verification methodology to prove the correctness of speculative optimizations that efficiently utilizes the computational power of modern multi-core systems, and (3) a physical synthesis strategy using signatures that re-implements sections of a critical path while minimizing perturbations to the existing placement. Our results indicate that logic simulation is effective in approximating the behavior of complex designs and enables a broader family of optimizations than previous synthesis approaches.
    Ph.D., Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/61793/1/splaza_1.pd
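
    The dissertation's signature framework is not reproduced here; the sketch below only illustrates the underlying idea of bit-parallel functional simulation: every signal carries a machine word whose bits record its responses to random input vectors, differing signatures prove non-equivalence, and matching signatures merely nominate candidate transformations that still require verification. The gate set, function names, and example netlist are assumptions for illustration.

    import random

    def simulate_signatures(gates, primary_inputs, num_vectors=64, seed=0):
        """Bit-parallel logic simulation: each signal gets a num_vectors-bit
        signature, one bit per random input vector.  `gates` is a list of
        (output, op, inputs) triples in topological order, with op in
        {"AND", "OR", "NOT"}.
        """
        rng = random.Random(seed)
        mask = (1 << num_vectors) - 1
        sig = {pi: rng.getrandbits(num_vectors) for pi in primary_inputs}
        for out, op, ins in gates:
            if op == "AND":
                v = mask
                for i in ins:
                    v &= sig[i]
            elif op == "OR":
                v = 0
                for i in ins:
                    v |= sig[i]
            elif op == "NOT":
                v = ~sig[ins[0]] & mask
            else:
                raise ValueError(f"unknown gate type {op}")
            sig[out] = v
        return sig

    # Two structurally different but equivalent nodes receive equal signatures.
    gates = [
        ("n1", "AND", ["a", "b"]),
        ("n2", "NOT", ["a"]),
        ("n3", "NOT", ["b"]),
        ("n4", "OR",  ["n2", "n3"]),
        ("n5", "NOT", ["n4"]),       # n5 == a AND b by De Morgan
    ]
    sig = simulate_signatures(gates, ["a", "b"])
    print(sig["n1"] == sig["n5"])    # True: candidate equivalence to verify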

    Pipelined Asynchronous High Level Synthesis for General Programs

    High-level synthesis (HLS) translates algorithms from a software programming language into hardware. We use the dataflow HLS methodology to translate programs into asynchronous circuits by implementing programs with asynchronous dataflow elements as hardware building blocks. We extend prior work in dataflow synthesis in the following aspects: (i) we propose Fluid to synthesize pipelined dataflow circuits for real-world programs with complex control flows, which are not supported in previous work; (ii) we propose PipeLink to permit pipelined access to shared resources in the dataflow circuit. A dataflow circuit results in distributed control and an implicitly pipelined implementation; however, resource sharing in the presence of pipelining is challenging in this context due to the absence of a global scheduler. Traditional solutions to this problem impose restrictions on pipelining to guarantee mutually exclusive access to the shared resource, but PipeLink removes such restrictions and can generate pipelined asynchronous dataflow circuits for shared function calls, pipelined memory accesses, and function pointers; (iii) we apply several dataflow optimizations to improve the quality of the synthesized dataflow circuits; (iv) we implement our system (Fluid + PipeLink) on the LLVM compiler framework, which allows us to take advantage of the optimization efforts of the compiler community; (v) we compare our system with a widely used academic HLS tool and two commercial HLS tools. Compared to the commercial (academic) HLS tools, our system yields a 12X (20X) reduction in energy, a 1.29X (1.64X) improvement in throughput, and a 1.27X (1.61X) improvement in latency, at the cost of a 2.4X (1.61X) increase in area.
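
    Fluid and PipeLink are not reproduced here; as a loose, purely illustrative model of why dataflow circuits are implicitly pipelined, the toy simulation below gives each stage a single buffer slot and lets it fire as soon as its successor is free, so several inputs are in flight concurrently without any global scheduler. All names and the three-stage example are hypothetical.

    def simulate_pipeline(stages, inputs):
        """Token-based sketch of a pipelined dataflow line: each stage holds
        at most one token and fires only when its successor slot is free,
        a crude stand-in for asynchronous handshaking between elements.
        stages: list of unary functions applied in order.
        """
        slots = [None] * len(stages)    # one buffer slot per stage
        pending = list(inputs)
        outputs, trace = [], []
        while pending or any(s is not None for s in slots):
            if slots[-1] is not None:   # drain the last stage first
                outputs.append(slots[-1])
                slots[-1] = None
            for i in range(len(stages) - 2, -1, -1):
                if slots[i] is not None and slots[i + 1] is None:
                    slots[i + 1] = stages[i + 1](slots[i])
                    slots[i] = None
            if pending and slots[0] is None:
                slots[0] = stages[0](pending.pop(0))
            trace.append([s is not None for s in slots])
        return outputs, trace

    stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    outs, trace = simulate_pipeline(stages, [1, 2, 3, 4])
    print(outs)    # [1, 3, 5, 7], i.e. ((x + 1) * 2) - 3 for each input
    # trace shows several tokens in flight at once (pipelined execution)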

    Data Aggregation Scheduling in Wireless Networks

    Data aggregation is one of the most essential data gathering operations in wireless networks. It is an efficient strategy to reduce energy consumption and alleviate medium access contention. In this dissertation, the data aggregation scheduling problem in different wireless networks is investigated. Since Wireless Sensor Networks (WSNs) are one of the most important types of wireless networks and data aggregation plays a vital role in WSNs, the minimum-latency data aggregation scheduling problem for multi-regional queries in WSNs is studied first. A scheduling algorithm is proposed, with comprehensive theoretical and simulation analysis of its time efficiency. Second, with the increasing popularity of Cognitive Radio Networks (CRNs), data aggregation scheduling in CRNs is studied. Considering the scarce spectrum opportunities in CRNs, a routing hierarchy, which allows a secondary user to seek a transmission opportunity among a group of receivers, is introduced. Several scheduling algorithms are proposed for both the Unit Disk Graph (UDG) interference model and the Physical Interference Model (PhIM), followed by performance evaluation through simulations. Third, the data aggregation scheduling problem in wireless networks with cognitive radio capability is investigated. Under the defined network model, besides a default working spectrum, users can access extra available spectrum through a cognitive radio. The problem is first formalized as an Integer Linear Programming (ILP) problem and solved with an optimization method. The simulation results show that the ILP-based method performs well; however, it is difficult to evaluate the solution theoretically. A heuristic scheduling algorithm with a guaranteed latency bound is then presented in our further investigation. Finally, we investigate how to make use of cognitive radio capability to accelerate data aggregation in probabilistic wireless networks with lossy links. A two-phase scheduling algorithm is proposed, and the effectiveness of the algorithm is verified through both theoretical analysis and numerical simulations.
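
    The dissertation's algorithms are not shown here; the sketch below is only a generic, greedy illustration of latency-aware aggregation scheduling: build a BFS aggregation tree toward the sink and give every node the earliest slot that follows its children's slots and is conflict-free under a simple protocol-interference rule. The conflict rule, function name, and example topology are assumptions.

    from collections import deque

    def aggregation_schedule(adj, sink):
        """Greedy sketch: BFS aggregation tree rooted at the sink, then each
        node gets the earliest slot that (a) is later than all of its
        children's slots and (b) avoids conflicts: two concurrent
        transmissions conflict if they share a receiver or one sender is
        adjacent to the other transmission's receiver.
        adj: dict node -> set of neighbouring nodes (undirected graph).
        Returns dict node -> slot of its single aggregated transmission.
        """
        parent, order = {sink: None}, [sink]
        q = deque([sink])
        while q:                         # BFS tree construction
            u = q.popleft()
            for v in adj[u]:
                if v not in parent:
                    parent[v] = u
                    order.append(v)
                    q.append(v)
        children = {u: [v for v in parent if parent[v] == u] for u in parent}
        slot, in_slot = {}, {}           # node -> slot, slot -> [(sender, receiver)]
        for u in reversed(order):        # leaves first, sink last
            if u == sink:
                continue
            t = 1 + max((slot[c] for c in children[u]), default=0)
            while any(r == parent[u] or r in adj[u] or parent[u] in adj[s]
                      for s, r in in_slot.get(t, [])):
                t += 1
            slot[u] = t
            in_slot.setdefault(t, []).append((u, parent[u]))
        return slot

    # Path a-b-sink plus a leaf c attached to b: a and c get different slots
    # (they share receiver b), and b transmits after both.
    adj = {"sink": {"b"}, "b": {"sink", "a", "c"}, "a": {"b"}, "c": {"b"}}
    print(aggregation_schedule(adj, "sink"))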

    Loop transformations for clustered VLIW architectures

    With increasing demands for performance by embedded systems, especially by digital signal processing (DSP) applications, embedded processors must increase available instruction-level parallelism (ILP) within significant constraints on power consumption and chip cost. Unfortunately, supporting a large amount of ILP on a processor while maintaining a single register file increases chip cost and potentially decreases overall performance due to increased cycle time. To address this problem, some modern embedded processors partition the register file into multiple low-ported register files, each directly connected with one or more functional units. These functional unit/register file groups are called clusters. Clustered VLIW (very long instruction word) architectures need extra copy operations or delays to transfer values among clusters. To take advantage of clustered architectures, the compiler must expose parallelism for maximal functional-unit utilization, and schedule instructions to reduce intercluster communication overhead. High-level loop transformations offer an excellent opportunity to enhance the abilities of low-level optimizers to generate code for clustered architectures. This dissertation investigates the effects of three loop transformations, i.e., loop fusion, loop unrolling, and unroll-and-jam, on clustered VLIW architectures. The objective is to achieve high performance with low communication overhead. This dissertation discusses the following techniques. Loop fusion: this research examines the impact of loop fusion on clustered architectures; a metric based upon communication costs for guiding loop fusion is developed and tested on DSP benchmarks. Unroll-and-jam and loop unrolling: a new method that integrates a communication cost model with an integer-optimization problem is developed to determine unroll amounts for loop unrolling and unroll-and-jam automatically for a specific loop on a specific architecture. These techniques have been implemented and tested using DSP benchmarks on simulated, clustered VLIW architectures and a real clustered embedded processor, the TI TMS320C64X. The results show that the new techniques achieve an average speedup of 1.72-1.89 on five different clustered architectures.
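
    The dissertation's cost models are not reproduced here; the snippet below merely illustrates what unroll-and-jam does to a loop nest: the outer loop is unrolled and the copies of the inner loop are fused ("jammed"), so each inner iteration carries independent operations that a clustered scheduler could place on different clusters. The multiply-accumulate example and unroll factor of 2 are illustrative choices.

    # Original nest: accumulate a[i] += b[i][j] * c[j] over all i, j.
    def mac_original(a, b, c):
        for i in range(len(a)):
            for j in range(len(c)):
                a[i] += b[i][j] * c[j]

    # Unroll-and-jam by 2: unroll the outer i-loop and fuse the two inner
    # loop bodies, exposing two independent accumulations per j-iteration.
    def mac_unroll_and_jam(a, b, c):
        n = len(a)
        for i in range(0, n - n % 2, 2):
            for j in range(len(c)):
                a[i] += b[i][j] * c[j]
                a[i + 1] += b[i + 1][j] * c[j]   # independent of the line above
        if n % 2:                                 # remainder iteration
            for j in range(len(c)):
                a[n - 1] += b[n - 1][j] * c[j]

    a = [0.0] * 5
    b = [[1] * 3 for _ in range(5)]
    c = [2, 3, 4]
    mac_unroll_and_jam(a, b, c)
    print(a)   # [9.0, 9.0, 9.0, 9.0, 9.0], same result as the original nest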

    Efficient alternative wiring techniques and applications.

    Sze, Chin Ngai. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 80-84) and index. Abstracts in English and Chinese. Contents:
    Abstract; Acknowledgments; Curriculum Vitae; List of Figures; List of Tables
    Chapter 1 Introduction: 1.1 Motivation and Aims; 1.2 Contribution; 1.3 Organization of Dissertation
    Chapter 2 Definitions and Notations
    Chapter 3 Literature Review: 3.1 Logic Reconstruction (3.1.1 SIS: A System for Sequential and Combinational Logic Synthesis); 3.2 ATPG-based Alternative Wiring (3.2.1 Redundancy Addition and Removal for Logic Optimization; 3.2.2 Perturb and Simplify Logic Optimization; 3.2.3 REWIRE; 3.2.4 Implication-tree Based Alternative Wiring Logic Transformation); 3.3 Graph-based Alternative Wiring
    Chapter 4 Implication Based Alternative Wiring Logic Transformation: 4.1 Source Node Implication (4.1.1 Introduction; 4.1.2 Implication Relationship and Implication-tree; 4.1.3 Selection of Alternative Wire Based on Implication-tree; 4.1.4 Implication-tree Based Logic Transformation); 4.2 Destination Node Implication (4.2.1 Introduction; 4.2.2 Destination Node Relationship; 4.2.3 Destination Node Implication-tree; 4.2.4 Selection of Alternative Wire); 4.3 The Algorithm (4.3.1 IBAW Implementation; 4.3.2 Experimental Results); 4.4 Conclusion
    Chapter 5 Graph Based Alternative Wiring Logic Transformation: 5.1 Introduction; 5.2 Notations and Definitions; 5.3 Alternative Wire Patterns; 5.4 Construction of Minimal Patterns (5.4.1 Minimality of Patterns; 5.4.2 Minimal Pattern Formation; 5.4.3 Pattern Extraction); 5.5 Experimental Results; 5.6 Conclusion
    Chapter 6 Logic Optimization by GBAW: 6.1 Introduction; 6.2 Logic Simplification (6.2.1 Single-Addition-Multiple-Removal by Pattern Feature; 6.2.2 Single-Addition-Multiple-Removal by Combination of Patterns; 6.2.3 Single-Addition-Single-Removal); 6.3 Incremental Perturbation Heuristic; 6.4 GBAW Optimization Algorithm; 6.5 Experimental Results; 6.6 Conclusion
    Chapter 7 Conclusion
    Bibliography
    Chapter A VLSI Design Cycle
    Chapter B Alternative Wire Patterns in [WLF00]: B.1 0-local Pattern; B.2 1-local Pattern; B.3 2-local Pattern; B.4 Fanout-reconvergent Pattern
    Chapter C New Alternative Wire Patterns: C.1 Pattern Cluster C1 (C.1.1 NAND-NAND-AND/NAND;AND/NAND; C.1.2 NOR-NOR-OR/NOR;AND/NAND; C.1.3 AND-NOR-OR/NOR;OR/NOR; C.1.4 OR-NAND-AND/NAND;AND/NAND); C.2 Pattern Cluster C2; C.3 Pattern Cluster C3; C.4 Pattern Cluster C4; C.5 Pattern Cluster C5
    Glossary; Index

    Skyline on sliding window data stream: a parallel approach

    In this thesis we apply high-performance Parallel Data Stream Processing methodologies to the problem of computing the skyline over a stream of d-dimensional points. Since the stream is possibly unbounded, we adopt a sliding-window specification in order to maintain the skyline over the most recently received points. We propose a parallel implementation of a module that, given a stream of points as input, produces skyline updates.
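
    The parallel implementation itself is not shown here; the sequential sketch below only illustrates the sliding-window skyline semantics: expire points that fall out of the window, then report the buffered points that no other buffered point dominates. Smaller-is-better domination and the toy stream are assumptions; a real implementation would maintain the skyline incrementally rather than recomputing it on every arrival.

    from collections import deque

    def dominates(p, q):
        """p dominates q if p is no worse in every dimension and strictly
        better in at least one (here: smaller is better)."""
        return all(x <= y for x, y in zip(p, q)) and any(x < y for x, y in zip(p, q))

    def sliding_window_skyline(stream, window):
        """Maintain the skyline of the last `window` points of a
        d-dimensional stream and yield the skyline after each arrival."""
        buf = deque()                        # (arrival index, point)
        for t, p in enumerate(stream):
            buf.append((t, p))
            while buf and buf[0][0] <= t - window:
                buf.popleft()                # expire points outside the window
            skyline = [q for _, q in buf
                       if not any(dominates(r, q) for _, r in buf if r != q)]
            yield skyline

    stream = [(3, 4), (1, 5), (2, 2), (4, 1), (2, 3)]
    for sky in sliding_window_skyline(stream, window=3):
        print(sky)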

    Simulation-based Performance Evaluation of MANET Backbone Formation Algorithms

    As a result of recent advances in the computing and communications industries, computing devices with wireless communication capabilities are now ubiquitous. Even though these devices are introduced to satisfy the user's mobile computing needs, they still cannot provide full mobile computing functionality, because they confine user mobility to regions served by fixed network access points. Mobile ad hoc networks (MANETs) have been introduced as a technology that can potentially turn today's vision of mobile computing into a tangible reality. MANETs are created by the mobile computing devices themselves on an ad hoc basis, without any support or administration from a fixed or pre-installed communications infrastructure. Along with their appealing autonomy and fast deployment properties, MANETs exhibit other properties that make their realization a very challenging task. Topology dynamism and bandwidth limitations of the communication channel adversely affect the performance of routing protocols designed for MANETs, especially as the number of mobile hosts and/or mobility rates increase. The Connected Dominating Set (CDS), a.k.a. virtual backbone or spine, has been proposed to facilitate routing, broadcasting, and establishing a dynamic infrastructure for distributed location databases. Minimizing the CDS produces a simpler abstracted topology of the MANET and allows shorter routes between any pair of hosts. Since finding a minimum connected dominating set (MCDS) is NP-hard, researchers have resorted to approximation algorithms and heuristics to tackle this problem. The literature is rich in CDS approximation algorithms that compete in terms of CDS size, running time, and signaling overhead. It has been reported that localized CDS creation algorithms are the fastest and the lightest in terms of signaling overhead among all techniques. Examples of these localized CDS algorithms are the Wu and Li algorithm and its Stojmenovic variant, the MPR algorithm, and the Alzoubi algorithm. The designers of each of these algorithms claim that their algorithm exhibits the highest degree of localization and hence incurs the lowest cost in the CDS creation phase. However, these claims are not supported by any physical or even simulation-based evidence. Moreover, the cost of maintaining the CDS (in terms of the change in CDS size, running time, and signaling overhead) in the presence of unpredictable and frequent topology changes is an important factor that has to be taken into account, a cost that is overlooked most of the time. A simulation-based comparative study of the performance of these algorithms will be conducted using the ns2 network simulator. This study will focus on the total costs incurred by these algorithms in terms of CDS size, running time, and signaling overhead generated during the CDS creation and maintenance phases. Moreover, the effects of mobility rates, network size, and mobility models on the performance of each algorithm will be investigated. Conclusions regarding the pros and cons of each algorithm will be drawn, and directions for future research will be recommended.
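
    The ns2 study itself cannot be condensed into a snippet, but the flavour of a localized CDS heuristic can be shown with the basic marking step commonly attributed to Wu and Li: a node joins the initial CDS if it has two neighbours that are not directly connected. The pruning rules of the full algorithm, and the MPR and Alzoubi variants, are omitted; the graph and function name below are illustrative.

    from itertools import combinations

    def wu_li_marking(adj):
        """Basic marking step of the Wu-Li connected-dominating-set
        heuristic: a node joins the initial CDS if two of its neighbours
        are not neighbours of each other.  The published algorithm then
        prunes redundant marked nodes; that step is omitted here.
        adj: dict node -> set of neighbours (undirected graph).
        """
        cds = set()
        for u, neigh in adj.items():
            if any(v not in adj[w] for v, w in combinations(neigh, 2)):
                cds.add(u)
        return cds

    # Example: a 4-node path a-b-c-d; only the interior nodes are marked.
    adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
    print(sorted(wu_li_marking(adj)))   # ['b', 'c']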