1,517 research outputs found
Dynamic Resource Allocation
Computer systems are subject to continuously increasing performance demands. However, energy consumption has become a critical issue, both for high-end large-scale parallel systems [12], as well as for portable devices [34]. In other words, more work needs to be done in less time, preferably with the same or smaller energy budget. Future performance and efficiency goals of computer systems can only be reached with large-scale, heterogeneous architectures [6]. Due to their distributed nature, control software is required to coordinate the parallel execution of applications on such platforms. Abstraction, arbitration and multi-objective optimization are only a subset of the tasks this software has to fulfill [6, 31]. The essential problem in all this is the allocation of platform resources to satisfy the needs of an application.\ud
\ud
This work considers the dynamic resource allocation problem, also known as the run-time mapping problem. This problem consists of task assignment to (processing) elements and communication routing through the interconnect between the elements. In mathematical terms, the combined problem is defined as the multi-resource quadratic assignment and routing problem (MRQARP). An integer linear programming formulation is provided, as well as complexity proofs on the N P-hardness of the problem.\ud
\ud
This work builds upon state-of-the-art work of Yagiura et al. [39, 40, 42] on metaheuristics for various generalizations of the generalized assignment problem. Specifically, we focus on the guided local search (GLS) approach for the multi-resource quadratic assignment problem (MRQAP). The quadratic assignment problem defines a cost relation between tasks and between elements. We generalize the multi-resource quadratic assignment problem with the addition of a capacitated interconnect and a communication topology between tasks. Numerical experiments show that the performance of the approach is comparable with commercial solvers. The footprint, the time versus quality trade-off and available metadata make guided local search a suitable candidate for run-time mapping
Recommended from our members
Nanometer VLSI placement and optimization for multi-objective design closure
In a VLSI physical synthesis flow, placement directly defines the interconnection,
which affects many other design objectives, such as timing, power consumption,
congestion, and thermal issues. With the scaling of technology, the relative interconnect
delay increases dramatically. As a result, placement has become a bottleneck
in deep sub-micron physical synthesis. In this dissertation, I propose several
optimization algorithms from global placement, placement migration, timing driven
placements, to incremental power optimizations for multi-objective VLSI design
closure. The first work is DPlace, a new global placement algorithm that scales
well to the modern large-scale circuit placement problems. DPlace simulates the
natural diffusion process to spread cells smoothly over the placement region, and
uses both analytical and discrete techniques to improve the wire length. However,
global placement is never sufficient for multi-objective design closure, a variety of
design objectives have to be improved incrementally, such as timing, routing congestion,
signal integrity, and heat distribution. Placement migration is a critical step
to address the cell overlaps appearing during incremental optimizations. To achieve
high placement stability, I propose a computational geometry based placement migration
flow to cope with placement changes, and a new stability metric to measure
the “similarity” between two placements accurately. Our placement migration algorithm
has clear advantage over conventional legalization algorithms such that the
neighborhood characteristics of the original placement are preserved. For timing
closure in high performance designs, I present a linear programming based incremental
timing driven placement to improve the timing on critical paths directly.
I further present an efficient timing driven placement algorithm (Pyramids). Two
formulations of Pyramids are proposed, which are suitable for different optimization
stages in a physical synthesis flow. Both approaches find the optimal location
for timing of a cell in constant time, through computational geometry based approaches.
For fast convergence of design closure, placement should be integrated
with other optimization techniques. I propose to combine placement, gate sizing
and Vt swapping techniques to reduce the total power consumption, especially the
leakage power, which is becoming increasingly critical for nanometer VLSI design
closure.Electrical and Computer Engineerin
An integrated placement and routing approach
As the feature size continues scaling down, interconnects become the major contributor of signal delay. Since interconnects are mainly determined by placement and routing, these two stages play key roles to achieve high performance. Historically, they are divided into two separate stages to make the problem tractable. Therefore, the routing information is not available during the placement process. Net models such as HPWL, are employed to approximate the routing to simplify the placement problem. However, the good placement in terms of these objectives may not be routable at all in the routing stage because different objectives are optimized in placement and routing stages. This inconsistancy makes the results obtained by the two-step optimization method far from optimal;In order to achieve high-quality placement solution and ensure the following routing, we propose an integrated placement and routing approach. In this approach, we integrate placement and routing into the same framework so that the objective optimized in placement is the same as that in routing. Since both placement and routing are very hard problems (NP-hard), we need to have very efficient algorithms so that integrating them together will not lead to intractable complexity;In this dissertation, we first develop a highly efficient placer - FastPlace 3.0 for large-scale mixed-size placement problem. Then, an efficient and effective detailed placer - FastDP is proposed to improve global placement by moving standard cells in designs. For high-degree nets in designs, we propose a novel performance-driven topology design algorithm to generate good topologies to achieve very strict timing requirement. In the routing phase, we develop two global routers, FastRoute and FastRoute 2.0. Compared to traditional global routers, they can generate better solutions and are two orders of magnitude faster. Finally, based on these efficient and high-quality placement and routing algorithms, we propose a new flow which integrates placement and routing together closely. In this flow, global routing is extensively applied to obtain the interconnect information and direct the placement process. In this way, we can get very good placement solutions with guaranteed routability
On The Engineering of a Stable Force-Directed Placer
Analytic and force-directed placement methods that simultaneously minimize wire length and spread cells are receiving renewed attention from both academia and industry. However, these methods are by no means trivial to implement---to date, published works have failed to provide sufficient engineering details to replicate results.
This dissertation addresses the implementation of a generic force-directed placer entitled FDP. Specifically, this thesis provides (1) a description of efficient force computation for spreading cells, (2) an illustration of numerical instability in this method and a means to avoid the instability, (3) metrics for measuring cell distribution throughout the placement area, and (4) a complementary technique that aids in minimizing wire length. FDP is compared to Kraftwerk and other leading academic tools including Capo, Dragon, and mPG for both standard cell and mixed-size circuits. Wire lengths produced by FDP are found to be, on average, up to 9% and 3% better than Kraftwerk and Capo, respectively. All told, this thesis confirms the validity and applicability of the approach, and provides clarifying details of the intricacies surrounding the implementation of a force-directed global placer
Recommended from our members
Simulation for Reliability, Hardware Security, and Ising Computing in VLSI Chip Design
The continued scaling of VLSI circuits has provided a wealth of opportunities andchallenges to the VLSI circuit design area. Both these challenges and opportunities, however,require new simulation tools that can enable their solution or exploitation as classicalmethods typically dealt with problem domains with smaller scales or less complexity. Inthis dissertation, simulation methods are presented to address the emerging VLSI designtopics of Electromigration induced aging and Ising computing and are then applied to theapplication areas of hardware security and graph partitioning respectively.The Electromigration aging effect in VLSI circuits is a long-term reliability issueaffecting current carrying metal wires leading to IR drop degradation. Typically, simpleanalytical equations can determine a wire’s effective age or if it will be affected by the EMaging effect at all. However, these classical methods are overly conservative and can lead toover design or unnecessary design iterations. Furthermore, it is expected that the EM agingeffect will become more severe in future Integrated Cirucits (ICs) due to increasing currentdensities and the prevalance of polycrystaline copper atom structures seen at small wiredimensions. For this reason, more comprehensive simulation techniques that can efficientlysimulate the EM effect with less conservative results can help mitigate overdesign andincrease design margins while reducing design iterations.The area of Hardware Security is becoming increasingly important as the chipsupply chain becomes more globalized and the integrity of chips becomes more diffiuclt toverify. Utilizing the accurate simulation techniques for EM, we can utilize this reliabilityeffect to demonstrate how a reliability based attack could be perpatrated. Furthermore, wecan utilize this aging effect as a defense mechanism to help us validate the integrity of anIC and detect counterfeit chips in the component supply chain market.Ising computing is an emerging method of solving combinatorial optimization problemsby simulating the interactions of so-called spin glasses and their interactions. Borrowingconcepts from quantum computing, this methods mimics the quantum interaction betweenspin glasses in such a way that finding a ground state of these spin glass models leadsto the solution of a particular problem. In this dissertation, effective methods of simulatingthe spin glass interactions using General Purpose Graphics Processing Units (GPGPUs)and finding their ground state are developed.In addition to the GPU based Ising model simulations, important combinatorialproblems can be mapped to the Ising model. In this dissertation the Ising solver is appliedto graph partitioning which can be utilized in VLSI design and many other domains as well.Specifically, solvers for the maxcut problem and the balanced min-cut partitioning problemare developed
Timing Closure in Chip Design
Achieving timing closure is a major challenge to the physical design of a computer chip. Its task is to find a physical realization fulfilling the speed specifications. In this thesis, we propose new algorithms for the key tasks of performance optimization, namely repeater tree construction; circuit sizing; clock skew scheduling; threshold voltage optimization and plane assignment. Furthermore, a new program flow for timing closure is developed that integrates these algorithms with placement and clocktree construction. For repeater tree construction a new algorithm for computing topologies, which are later filled with repeaters, is presented. To this end, we propose a new delay model for topologies that not only accounts for the path lengths, as existing approaches do, but also for the number of bifurcations on a path, which introduce extra capacitance and thereby delay. In the extreme cases of pure power optimization and pure delay optimization the optimum topologies regarding our delay model are minimum Steiner trees and alphabetic code trees with the shortest possible path lengths. We presented a new, extremely fast algorithm that scales seamlessly between the two opposite objectives. For special cases, we prove the optimality of our algorithm. The efficiency and effectiveness in practice is demonstrated by comprehensive experimental results. The task of circuit sizing is to assign millions of small elementary logic circuits to elements from a discrete set of logically equivalent, predefined physical layouts such that power consumption is minimized and all signal paths are sufficiently fast. In this thesis we develop a fast heuristic approach for global circuit sizing, followed by a local search into a local optimum. Our algorithms use, in contrast to existing approaches, the available discrete layout choices and accurate delay models with slew propagation. The global approach iteratively assigns slew targets to all source pins of the chip and chooses a discrete layout of minimum size preserving the slew targets. In comprehensive experiments on real instances, we demonstrate that the worst path delay is within 7% of its lower bound on average after a few iterations. The subsequent local search reduces this gap to 2% on average. Combining global and local sizing we are able to size more than 5.7 million circuits within 3 hours. For the clock skew scheduling problem we develop the first algorithm with a strongly polynomial running time for the cycle time minimization in the presence of different cycle times and multi-cycle paths. In practice, an iterative local search method is much more efficient. We prove that this iterative method maximizes the worst slack, even when restricting the feasible schedule to certain time intervals. Furthermore, we enhance the iterative local approach to determine a lexicographically optimum slack distribution. The clock skew scheduling problem is then generalized to allow for simultaneous data path optimization. In fact, this is a time-cost tradeoff problem. We developed the first combinatorial algorithm for computing time-cost tradeoff curves in graphs that may contain cycles. Starting from the lowest-cost solution, the algorithm iteratively computes a descent direction by a minimum cost flow computation. The maximum feasible step length is then determined by a minimum ratio cycle computation. This approach can be used in chip design for several optimization tasks, e.g. threshold voltage optimization or plane assignment. Finally, the optimization routines are combined into a timing closure flow. Here, the global placement is alternated with global performance optimization. Netweights are used to penalize the length of critical nets during placement. After the global phase, the performance is improved further by applying more comprehensive optimization routines on the most critical paths. In the end, the clock schedule is optimized and clocktrees are inserted. Computational results of the design flow are obtained on real-world computer chips
On the Generation of Large Passive Macromodels for Complex Interconnect Structures
This paper addresses some issues related to the passivity of interconnect macromodels computed from measured or simulated port responses. The generation of such macromodels is usually performed via suitable least squares fitting algorithms. When the number of ports and the dynamic order of the macromodel is large, the inclusion of passivity constraints in the fitting process is cumbersome and results in excessive computational and storage requirements. Therefore, we consider in this work a post-processing approach for passivity enforcement, aimed at the detection and compensation of passivity violations without compromising the model accuracy. Two complementary issues are addressed. First, we consider the enforcement of asymptotic passivity at high frequencies based on the perturbation of the direct coupling term in the transfer matrix. We show how potential problems may arise when off-band poles are present in the model. Second, the enforcement of uniform passivity throughout the entire frequency axis is performed via an iterative perturbation scheme on the purely imaginary eigenvalues of associated Hamiltonian matrices. A special formulation of this spectral perturbation using possibly large but sparse matrices allows the passivity compensation to be performed at a cost which scales only linearly with the order of the system. This formulation involves a restarted Arnoldi iteration combined with a complex frequency hopping algorithm for the selective computation of the imaginary eigenvalues to be perturbed. Some examples of interconnect models are used to illustrate the performance of the proposed technique
- …