10 research outputs found
Peephole optimization of asynchronous macromodule networks
Journal ArticleAbstract- Most high-level synthesis tools for asynchronous circuits take descriptions in concurrent hardware description languages and generate networks of macromodules or handshake components. In this paper, we propose a peephole optimizer for these networks. Our peephole optimizer first deduces an equivalent blackbox behavior for the network using Dill's tracetheoretic parallel composition operator. It then applies a new procedure called burst-mode reduction to obtain burst-mode machines from the deduced behavior. In a significant number of examples, our optimizer achieves gate-count improvements by a factor of five, and speed (cycle-time) improvements by a factor of two. Burst-mode reduction can be applied to any macromodule network that is delay insensitive as well as deterministic. A significant number of asynchronous circuits, especially those generated by asynchronous high-level synthesis tools, fall into this class, thus making our procedure widely applicable
Peephole optimization of asynchronous networks through process composition and burst-mode machine generation
Journal ArticleIn this paper, we discuss the problem of improving the efficiency of macromodule networks generated through asynchronous high level synthesis. We compose the behaviors of the modules in the sub-network being optimized using Dill's trace-theoretic operators to get a single behavioral description for the whole sub-network. From the composite trace structures so obtained, we obtain interface state graphs (ISG) (as described by Sutherland, Sproull, and Molnar), encode the ISGs to obtain encoded ISGs (EISGs), and then apply a procedure we have developed called Burst-mode machine reduction (BM-reduction) to obtain burstmode machines from EISGs. We then synthesize burst-mode machine circuits (currently) using the tool of Ken Yun (Stanford). We can report significant area- and time-improvements on a number of examples, as a result of our optimization method
Peephole optimization of asynchronous macromodule networks
Journal ArticleMost high level synthesis tools for asynchronous circuits take descriptions in concurrent hardware description languages and generate networks of macromodules or handshake components. In this paper we describe a peephole optimizer for such macromodule networks that often effects area and/or time improvements. Our optimizer first deduces an equivalent black-box behavior for the given network of macrmodules using Dill's trace-theoretic parallel composition operator. It then applies a new procedure culled Burst-mode reduction to obtain burst-mode machines, which can be synthesized into gate networks using available tools. Since burst-mode reduction can be applied to any macromodule network that is delay-insensitive as well as deterministic, our optimizer covers a significant number of asynchronous circuits especially those generated by asynchronous high level synthesis tools
Recommended from our members
Contributions to the Design of Asynchronous Macromodular Systems
In this thesis, I advocate the use of macromodules to design and build robust and performance-competitive asynchronous systems. The contributions of the work relate to different aspects of the design of asynchronous macromodular systems. First, an architectural optimization for 4-phase systems is introduced. The goal of the optimization is to increase the performance of a system by increasing the level of concurrent activity in the sequencing of data processing stages. In particular, three new asynchronous sequencers are designed, which increase the throughput of the system. Existing asynchronous data paths do not operate correctly at this increased level of concurrency: data hazards may result. Interlock mechanisms are introduced to insure correct operation. The technique can also be regarded as a low-power optimization: The increased throughput can be traded for a significant reduction in the power consumption of the entire system. SPICE simulation results show that the new sequencers allow roughly twice the throughput of non-concurrent sequencers. The simulations also show that, after voltage scaling, energy dissipation is reduced by a factor of 2.5. Second, the use of pulses for efficient inter-module synchronization is introduced. The idea is complemented with the definition of a pulse-mode handshake protocol and the characterization of Pulse-Burst Operation (PBO), an important extension to traditional pulse-mode operation. Also, a basic set of macromodules, that efficiently implement control operations such as sequencing, selection, iteration, concurrency control, resource sharing, and arbitration is presented. Modules for interfacing pulse-mode circuits with traditional 2-phase and 4-phase circuits are also included in the set. Finally, the design of a packet switch is used to demonstrate the viability of pulse-mode macromodules to implement complex, high performance systems. The switch organization, its asynchronous operation, and the low control overhead introduced by pulse-mode macromodules result in a design that can handle 2.4 times the target throughput of 155 Mbits/Sec. Also, the switch is characterized by very low input-to-output latency. These results suggest that pulse-mode macromodules can keep control overhead low without introducing complex, unsafe timing considerations, two necessary conditions to achieve robust, performance-competitive systems
Master of Science
thesisIntegrated circuits often consist of multiple processing elements that are regularly tiled across the two-dimensional surface of a die. This work presents the design and integration of high speed relative timed routers for asynchronous network-on-chip. It researches NoC's efficiency through simplicity by directly translating simple T-router, source-routing, single-flit packet to higher radix routers. This work is intended to study performance and power trade-offs adding higher radix routers, 3D topologies, Virtual Channels, Accurate NoC modeling, and Transmission line communication links. Routers with and without virtual channels are designed and integrated to arrayed communication networks. Furthermore, the work investigates 3D networks with diffusive RC wires and transmission lines on long wrap interconnects
Pond: A Robust, scalable, massively parallel computer architecture
A new computer architecture, intended for implementation in late and post silicon technologies, is proposed. The architecture is a fine-grained, inherently parallel system consisting of a large grid of thousands or millions of simple atomic processors (APs) employing a simple instruction set. Each AP is configured as either a program instruction or data storage element. These elements are organized into logical entities, analogous to traditional programming functions/methods and data structures. Programming work is underway to compile and run programs from traditional sequential code where parallelism is automatically discovered at the high level on both instruction level and function level, and integrated into the object code that is then sent to the processor. The result is a massively parallel architecture that fully exploits instruction and thread-level parallelism. The architecture design is presented, in-progress work involving conversion of existing code is discussed, and examples are shown to indicate the speedup potential that exists in this new architecture when compared to current architectures