Search CORE

3,022 research outputs found

Stochastic Optimization Over Correlated Data Set: A Case Study on VLSI Decoupling Capacitance Budgeting

Author: Jinjun Xiong
Lei He
Yiyu Shi
Publication venue: 'IntechOpen'
Publication date: 28/02/2011
Field of study

Lagrangian relaxation-based multi-threaded discrete gate sizer

Author: Sharma Ankur
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2018
Field of study

In integrated circuit design gate sizing is one of the key optimization techniques which is repeatedly invoked to trade-off delays for area and/or power of the gates during logic design and physical design stages. With increasing design sizes of a million gates and larger, discrete gate sizes and non-convex delay models the gate sizing algorithms that were designed for continuous sizes and convex delay models are slow and timing inaccurate. Of the several published discrete gate sizing algorithms, recent works have shown that Lagrangian relaxation based gate sizers have produced designs with the lowest power on average with high timing accuracy. But they are also very slow due to a large number of expensive timing updates spread across hundreds of iterations of solving the Lagrangian sub-problem. In this thesis we present a Lagrangian relaxation based multi-threaded discrete gate sizer for fast timing and power reduction by swapping the gate sizes and the threshold voltages. We developed two parallelization enabling techniques to reduce the runtime of Lagrangian sub-problem solver, namely, mutual exclusion edge (MEE) assignment and directed acyclic graph (DAG) based netlist traversal. MEEs are dummy edges assigned to reduce computational dependencies among gates sharing one or more common fan-ins. DAG based netlist traversal facilitates simultaneous resizing of gates belonging to different topological levels. We designed a Lagrange multiplier update framework that enables rapid convergence of the timing recovery and power recovery algorithms. To reduce the runtime of timing updates, we proposed a simple and fast-to-compute effective capacitance model and several mechanisms to calibrate the timing models to improve their accuracy. Compared to the state-of-the-art gate sizer, our proposed gate sizer is on average 15x faster and the optimized designs have only 1.7\% higher power. In digital synchronous designs simultaneous gate sizing and clock skew scheduling provides significantly more power saving. We extend the gate sizer to simultaneously schedule the clock skew. It can achieve an average of 18.8\% more reduction in power with only 20\% increase in the runtime

Digital Repository @ Iowa State University (ISU)

Elasticity and Petri nets

Author: A. Dasdan
A. Peeters
A. Schrijver
A. Yakovlev
A.J. Martin
C.E. Leiserson
C.V. Ramamoorthy
D. Misunas
D.H. Linder
G. Chiola
I.E. Sutherland
J. Campos
J. Cortadella
J. Cortadella
J. O’Leary
J.D.C. Little
L.P. Carloni
L.P. Carloni
L.Y. Rosenblum
M. Yoeli
M.R. Garey
R. Karp
R. Manohar
R.B. Reese
R.W. Wolff
T. Chelcea
T.E. Williams
T.H. Cormen
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2008
Field of study

Digital electronic systems typically use synchronous clocks and primarily assume fixed duration of their operations to simplify the design process. Time elastic systems can be constructed either by replacing the clock with communication handshakes (asynchronous version) or by augmenting the clock with a synchronous version of a handshake (synchronous version). Time elastic systems can tolerate static and dynamic changes in delays (asynchronous case) or latencies (synchronous case) of operations that can be used for modularity, ease of reuse and better power-delay trade-off. This paper describes methods for the modeling, performance analysis and optimization of elastic systems using Marked Graphs and their extensions capable of describing behavior with early evaluation. The paper uses synchronous elastic systems (aka latency-tolerant systems) for illustrating the use of Petri nets, however, most of the methods can be applied without changes (except changing the delay model associated with events of the system) to asynchronous elastic systems.Peer ReviewedPostprint (author's final draft

CiteSeerX

Crossref

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

ARM2GC: Succinct Garbled Processor for Secure Computation

Author: Hussain Siam U.
Koushanfar Farinaz
Riazi M. Sadegh
Sadeghi Ahmad-Reza
Songhori Ebrahim M.
Publication venue
Publication date: 06/02/2019
Field of study

We present ARM2GC, a novel secure computation framework based on Yao's Garbled Circuit (GC) protocol and the ARM processor. It allows users to develop privacy-preserving applications using standard high-level programming languages (e.g., C) and compile them using off-the-shelf ARM compilers (e.g., gcc-arm). The main enabler of this framework is the introduction of SkipGate, an algorithm that dynamically omits the communication and encryption cost of the gates whose outputs are independent of the private data. SkipGate greatly enhances the performance of ARM2GC by omitting costs of the gates associated with the instructions of the compiled binary, which is known by both parties involved in the computation. Our evaluation on benchmark functions demonstrates that ARM2GC not only outperforms the current GC frameworks that support high-level languages, it also achieves efficiency comparable to the best prior solutions based on hardware description languages. Moreover, in contrast to previous high-level frameworks with domain-specific languages and customized compilers, ARM2GC relies on standard ARM compiler which is rigorously verified and supports programs written in the standard syntax.Comment: 13 page

arXiv.org e-Print Archive

TUbiblio

Cryptology ePrint Archive

Doctor of Philosophy

Author: Vij Vikas S.
Publication venue: University of Utah
Publication date: 01/12/2013
Field of study

dissertationAsynchronous design has a very promising potential even though it has largely received a cold reception from industry. Part of this reluctance has been due to the necessity of custom design languages and computer aided design (CAD) flows to design, optimize, and validate asynchronous modules and systems. Next generation asynchronous flows should support modern programming languages (e.g., Verilog) and application specific integrated circuits (ASIC) CAD tools. They also have to support multifrequency designs with mixed synchronous (clocked) and asynchronous (unclocked) designs. This work presents a novel relative timing (RT) based methodology for generating multifrequency designs using synchronous CAD tools and flows. Synchronous CAD tools must be constrained for them to work with asynchronous circuits. Identification of these constraints and characterization flow to automatically derive the constraints is presented. The effect of the constraints on the designs and the way they are handled by the synchronous CAD tools are analyzed and reported in this work. The automation of the generation of asynchronous design templates and also the constraint generation is an important problem. Algorithms for automation of reset addition to asynchronous circuits and power and/or performance optimizations applied to the circuits using logical effort are explored thus filling an important hole in the automation flow. Constraints representing cyclic asynchronous circuits as directed acyclic graphs (DAGs) to the CAD tools is necessary for applying synchronous CAD optimizations like sizing, path delay optimizations and also using static timing analysis (STA) on these circuits. A thorough investigation for the requirements of cycle cutting while preserving timing paths is presented with an algorithm to automate the process of generating them. A large set of designs for 4 phase handshake protocol circuit implementations with early and late data validity are characterized for area, power and performance. Benchmark circuits with automated scripts to generate various configurations for better understanding of the designs are proposed and analyzed. Extension to the methodology like addition of scan insertion using automatic test pattern generation (ATPG) tools to add testability of datapath in bundled data asynchronous circuit implementations and timing closure approaches are also described. Energy, area, and performance of purely asynchronous circuits and circuits with mixed synchronous and asynchronous blocks are explored. Results indicate the benefits that can be derived by generating circuits with asynchronous components using this methodology

The University of Utah: J. Willard Marriott Digital Library

Recommended from our members

Topology-Based Performance Analysis and Optimization of Latency-Insensitive Systems

Author: Carloni Luca
Collins Rebecca L.
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2008
Field of study

Latency-insensitive protocols allow system-on-chip engineers to decouple the design of the computing cores from the design of the inter-core communication channels while following the synchronous design paradigm. In a latency-insensitive system (LIS) each core is encapsulated within a shell, a synthesized interface module that dynamically controls its operation. At each clock period, if new data has not arrived on an input channel or a stalling request has arrived on an output channel, the shell stalls the core and buffers other incoming valid data for future processing. The combination of finite buffers and backpressure from stalling can cause throughput degradation. Previous works addressed this problem by increasing buffer space to reduce the backpressure requests or inserting extra buffering to balance the channel latency around a LIS. We explore the theoretical complexity of these approaches and propose a heuristic algorithm for efficient queue sizing. We also practically characterize several LIS topologies and how the topology of a LIS can impact not only how much throughput degradation will occur, but also the difficulty of finding optimal queue sizing solutions

Columbia University Academic Commons

Optimization techniques for high-performance digital circuits

Author: Chandu Visweswariah
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/1997
Field of study

The relentless push for high performance in custom dig-ital circuits has led to renewed emphasis on circuit opti-mization or tuning. The parameters of the optimization are typically transistor and interconnect sizes. The de-sign metrics are not just delay, transition times, power and area, but also signal integrity and manufacturability. This tutorial paper discusses some of the recently pro-posed methods of circuit optimization, with an emphasis on practical application and methodology impact. Circuit optimization techniques fall into three broad categories. The rst is dynamic tuning, based on time-domain simulation of the underlying circuit, typically combined with adjoint sensitivity computation. These methods are accurate but require the specication of in-put signals, and are best applied to small data- ow cir-cuits and \cross-sections " of larger circuits. Ecient sensitivity computation renders feasible the tuning of cir-cuits with a few thousand transistors. Second, static tuners employ static timing analysis to evaluate the per-formance of the circuit. All paths through the logic are simultaneously tuned, and no input vectors are required. Large control macros are best tuned by these methods. However, in the context of deep submicron custom de-sign, the inaccuracy of the delay models employed by these methods often limits their utility. Aggressive dy-namic or static tuning can push a circuit into a precip-itous corner of the manufacturing process space, which is a problem addressed by the third class of circuit op-timization tools, statistical tuners. Statistical techniques are used to enhance manufacturability or maximize yield. In addition to surveying the above techniques, topics such as the use of state-of-the-art nonlinear optimization methods and special considerations for interconnect siz-ing, clock tree optimization and noise-aware tuning will be brie y considered.

CiteSeerX

Crossref

Parallel VLSI Circuit Analysis and Optimization

Author: Ye Xiaoji
Publication venue
Publication date
Field of study

The prevalence of multi-core processors in recent years has introduced new opportunities and challenges to Electronic Design Automation (EDA) research and development. In this dissertation, a few parallel Very Large Scale Integration (VLSI) circuit analysis and optimization methods which utilize the multi-core computing platform to tackle some of the most difficult contemporary Computer-Aided Design (CAD) problems are presented. The first CAD application that is addressed in this dissertation is analyzing and optimizing mesh-based clock distribution network. Mesh-based clock distribution network (also known as clock mesh) is used in high-performance microprocessor designs as a reliable way of distributing clock signals to the entire chip. The second CAD application addressed in this dissertation is the Simulation Program with Integrated Circuit Emphasis (SPICE) like circuit simulation. SPICE simulation is often regarded as the bottleneck of the design flow. Recently, parallel circuit simulation has attracted a lot of attention. The first part of the dissertation discusses circuit analysis techniques. First, a combination of clock network specific model order reduction algorithm and a port sliding scheme is presented to tackle the challenges in analyzing large clock meshes with a large number of clock drivers. Our techniques run much faster than the standard SPICE simulation and existing model order reduction techniques. They also provide a basis for the clock mesh optimization. Then, a hierarchical multi-algorithm parallel circuit simulation (HMAPS) framework is presented as an novel technique of parallel circuit simulation. The inter-algorithm parallelism approach in HMAPS is completely different from the existing intra-algorithm parallel circuit simulation techniques and achieves superlinear speedup in practice. The second part of the dissertation talks about parallel circuit optimization. A modified asynchronous parallel pattern search (APPS) based method which utilizes the efficient clock mesh simulation techniques for the clock driver size optimization problem is presented. Our modified APPS method runs much faster than a continuous optimization method and effectively reduces the clock skew for all test circuits. The third part of the dissertation describes parallel performance modeling and optimization of the HMAPS framework. The performance models and runtime optimization scheme improve the speed of HMAPS further more. The dynamically adapted HMAPS becomes a complete solution for parallel circuit simulation

Texas A&M Repository

Effective network grid synthesis and optimization for high performance very large scale integration system design

Author: Yang Yun
Publication venue
Publication date: 01/02/2008
Field of study

制度:新 ; 文部省報告番号:甲2642号 ; 学位の種類:博士(工学) ; 授与年月日:2008/3/15 ; 早大学位記番号:新480

Waseda University Repository