8,514 research outputs found
Configuration Sharing Optimized Placement and Routing
Reconfigurable systems have been shown to achieve very high computational performance. However, the overhead associated with reconfiguration of hardware remains a critical factor in overall system performance. This paper discusses the development and evaluation of a technique to minimize the delay associated with reconfiguration based upon optimized sharing of configuration bit streams between design contexts. This is achieved through modified placement and routing algorithms
Recommended from our members
Interconnect optimizations for nanometer VLSI design
textAs the semiconductor technology scales into deeper sub-micron domain, billions of transistors can be used on a single system-on-chip (SOC) makes interconnection optimization more important roughly for two reasons. First, congestion, power, timing in routing and buffering requirements make inter- connection optimization more and more challenging. Second, gate delay get- ting shorter while the RC delay gets longer due to scaling. Study of interconnection construction and optimization algorithms in real industry flows and designs ends up with interesting findings. One used to be overlooked but very important and practical problem is how to utilize over- the-block routing resources intelligently. Routing over large IP blocks needs special attention as there is almost no way to insert buffers inside hard IP blocks, which can lead to unsolvable slew/timing violations. In current design flows we have seen, the routing resources over the IP blocks were either dealt as routing blockages leading to a significant waste, or simply treated in the same way as outside-the-block routing resources, which would violate the slew constraints and thus fail buffering. To handle that, this work proposes a novel buffering-aware over-the- block rectilinear Steiner minimum tree (BOB-RSMT) algorithm which helps reclaim the āwastedā over-the-block routing resources while meeting user-specified slew constraints. Proposed algorithm incrementally and efficiently migrates initial tree structures with buffering-awareness to meet slew constraints while minimizing wire-length. Moreover, due to the fact that timing optimization is important for the VLSI design, in this work, timing-driven over-the-block rectilinear Steiner tree (TOB-RST) is also studied to optimize critical paths. This proposed TOB-RST algorithm can be used in routing or post-routing stage to provide high-quality topologies to help close timing. Then a follow-up problem emerges: how to accomplish the whole routing with over-the-block routing resources used properly. Utilizing over-the- block routing resources could dramatically improve the routing solution, yet require special attention, since the slew, affected by different RC on different metal layers, must be constrained by buffering and is easily violated. Moreover, even of all nets are slew-legalized, the routing solution could still suffer from heavy congestion problem. A new global router, BOB-Router, is to solve the over-the-block global routing problem through minimizing overflows, wire-length and via count simultaneously without violating slew constraints. Based on my completed works, BOB-RSMT and BOB-Router tremendously improve the overall routing and buffering quality. Experimental results show that proposed over-the-block rectilinear Steiner tree construction and routing completely satisfies the slew constraints and significantly outperforms the obstacle-avoiding rectilinear Steiner tree construction and routing in terms of wire-length, via count and overflows.Electrical and Computer Engineerin
Throughput-driven floorplanning with wire pipelining
The size of future high-performance SoC is such that the time-of-flight of wires connecting distant pins in the layout can be much higher than the clock period. In order to keep the frequency as high as possible, the wires may be pipelined. However, the insertion of flip-flops may alter the throughput of the system due to the presence of loops in the logic netlist. In this paper, we address the problem of floorplanning a large design where long interconnects are pipelined by inserting the throughput in the cost function of a tool based on simulated annealing. The results obtained on a series of benchmarks are then validated using a simple router that breaks long interconnects by suitably placing flip-flops along the wires
Mapping constrained optimization problems to quantum annealing with application to fault diagnosis
Current quantum annealing (QA) hardware suffers from practical limitations
such as finite temperature, sparse connectivity, small qubit numbers, and
control error. We propose new algorithms for mapping boolean constraint
satisfaction problems (CSPs) onto QA hardware mitigating these limitations. In
particular we develop a new embedding algorithm for mapping a CSP onto a
hardware Ising model with a fixed sparse set of interactions, and propose two
new decomposition algorithms for solving problems too large to map directly
into hardware.
The mapping technique is locally-structured, as hardware compatible Ising
models are generated for each problem constraint, and variables appearing in
different constraints are chained together using ferromagnetic couplings. In
contrast, global embedding techniques generate a hardware independent Ising
model for all the constraints, and then use a minor-embedding algorithm to
generate a hardware compatible Ising model. We give an example of a class of
CSPs for which the scaling performance of D-Wave's QA hardware using the local
mapping technique is significantly better than global embedding.
We validate the approach by applying D-Wave's hardware to circuit-based
fault-diagnosis. For circuits that embed directly, we find that the hardware is
typically able to find all solutions from a min-fault diagnosis set of size N
using 1000N samples, using an annealing rate that is 25 times faster than a
leading SAT-based sampling method. Further, we apply decomposition algorithms
to find min-cardinality faults for circuits that are up to 5 times larger than
can be solved directly on current hardware.Comment: 22 pages, 4 figure
Distributed Traffic Signal Control for Maximum Network Throughput
We propose a distributed algorithm for controlling traffic signals. Our
algorithm is adapted from backpressure routing, which has been mainly applied
to communication and power networks. We formally prove that our algorithm
ensures global optimality as it leads to maximum network throughput even though
the controller is constructed and implemented in a completely distributed
manner. Simulation results show that our algorithm significantly outperforms
SCATS, an adaptive traffic signal control system that is being used in many
cities
Performance and power optimization in VLSI physical design
As VLSI technology enters the nanoscale regime, a great amount of efforts have
been made to reduce interconnect delay. Among them, buffer insertion stands out
as an effective technique for timing optimization. A dramatic rise in on-chip buffer
density has been witnessed. For example, in two recent IBM ASIC designs, 25% gates
are buffers.
In this thesis, three buffer insertion algorithms are presented for the procedure
of performance and power optimization. The second chapter focuses on improving circuit performance under inductance effect. The new algorithm works under
the dynamic programming framework and runs in provably linear time for multiple
buffer types due to two novel techniques: restrictive cost bucketing and efficient delay
update. The experimental results demonstrate that our linear time algorithm consistently outperforms all known RLC buffering algorithms in terms of both solution
quality and runtime. That is, the new algorithm uses fewer buffers, runs in shorter
time and the buffered tree has better timing.
The third chapter presents a method to guarantee a high fidelity signal transmission in global bus. It proposes a new redundant via insertion technique to reduce
via variation and signal distortion in twisted differential line. In addition, a new
buffer insertion technique is proposed to synchronize the transmitted signals, thus
further improving the effectiveness of the twisted differential line. Experimental results demonstrate a 6GHz signal can be transmitted with high fidelity using the new
approaches. In contrast, only a 100MHz signal can be reliably transmitted using a
single-end bus with power/ground shielding. Compared to conventional twisted differential line structure, our new techniques can reduce the magnitude of noise by 45%
as witnessed in our simulation.
The fourth chapter proposes a buffer insertion and gate sizing algorithm for
million plus gates. The algorithm takes a combinational circuit as input instead of
individual nets and greatly reduces the buffer and gate cost of the entire circuit.
The algorithm has two main features: 1) A circuit partition technique based on the
criticality of the primary inputs, which provides the scalability for the algorithm, and
2) A linear programming formulation of non-linear delay versus cost tradeoff, which
formulates the simultaneous buffer insertion and gate sizing into linear programming
problem. Experimental results on ISCAS85 circuits show that even without the circuit
partition technique, the new algorithm achieves 17X speedup compared with path
based algorithm. In the meantime, the new algorithm saves 16.0% buffer cost, 4.9%
gate cost, 5.8% total cost and results in less circuit delay
- ā¦