10,117 research outputs found
Recent Advances in Graph Partitioning
We survey recent trends in practical algorithms for balanced graph
partitioning together with applications and future research directions
Tight Continuous Relaxation of the Balanced -Cut Problem
Spectral Clustering as a relaxation of the normalized/ratio cut has become
one of the standard graph-based clustering methods. Existing methods for the
computation of multiple clusters, corresponding to a balanced -cut of the
graph, are either based on greedy techniques or heuristics which have weak
connection to the original motivation of minimizing the normalized cut. In this
paper we propose a new tight continuous relaxation for any balanced -cut
problem and show that a related recently proposed relaxation is in most cases
loose leading to poor performance in practice. For the optimization of our
tight continuous relaxation we propose a new algorithm for the difficult
sum-of-ratios minimization problem which achieves monotonic descent. Extensive
comparisons show that our method outperforms all existing approaches for ratio
cut and other balanced -cut criteria.Comment: Long version of paper accepted at NIPS 201
Optimal dynamic remapping of parallel computations
A large class of computations are characterized by a sequence of phases, with phase changes occurring unpredictably. The decision problem was considered regarding the remapping of workload to processors in a parallel computation when the utility of remapping and the future behavior of the workload is uncertain, and phases exhibit stable execution requirements during a given phase, but requirements may change radically between phases. For these problems a workload assignment generated for one phase may hinder performance during the next phase. This problem is treated formally for a probabilistic model of computation with at most two phases. The fundamental problem of balancing the expected remapping performance gain against the delay cost was addressed. Stochastic dynamic programming is used to show that the remapping decision policy minimizing the expected running time of the computation has an extremely simple structure. Because the gain may not be predictable, the performance of a heuristic policy that does not require estimnation of the gain is examined. The heuristic method's feasibility is demonstrated by its use on an adaptive fluid dynamics code on a multiprocessor. The results suggest that except in extreme cases, the remapping decision problem is essentially that of dynamically determining whether gain can be achieved by remapping after a phase change. The results also suggest that this heuristic is applicable to computations with more than two phases
Parallel symbolic state-space exploration is difficult, but what is the alternative?
State-space exploration is an essential step in many modeling and analysis
problems. Its goal is to find the states reachable from the initial state of a
discrete-state model described. The state space can used to answer important
questions, e.g., "Is there a dead state?" and "Can N become negative?", or as a
starting point for sophisticated investigations expressed in temporal logic.
Unfortunately, the state space is often so large that ordinary explicit data
structures and sequential algorithms cannot cope, prompting the exploration of
(1) parallel approaches using multiple processors, from simple workstation
networks to shared-memory supercomputers, to satisfy large memory and runtime
requirements and (2) symbolic approaches using decision diagrams to encode the
large structured sets and relations manipulated during state-space generation.
Both approaches have merits and limitations. Parallel explicit state-space
generation is challenging, but almost linear speedup can be achieved; however,
the analysis is ultimately limited by the memory and processors available.
Symbolic methods are a heuristic that can efficiently encode many, but not all,
functions over a structured and exponentially large domain; here the pitfalls
are subtler: their performance varies widely depending on the class of decision
diagram chosen, the state variable order, and obscure algorithmic parameters.
As symbolic approaches are often much more efficient than explicit ones for
many practical models, we argue for the need to parallelize symbolic
state-space generation algorithms, so that we can realize the advantage of both
approaches. This is a challenging endeavor, as the most efficient symbolic
algorithm, Saturation, is inherently sequential. We conclude by discussing
challenges, efforts, and promising directions toward this goal
Inherently workload-balanced clustered microarchitecture
The performance of clustered microarchitectures relies on steering schemes that try to find the best trade-off between workload balance and inter-cluster communication penalties. In previously proposed clustered processors, reducing communication penalties and balancing the workload are opposite targets, since improving one usually implies a detriment in the other. In this paper we propose a new clustered microarchitecture that can minimize communication penalties without compromising workload balance. The key idea is to arrange the clusters in a ring topology in such a way that results of one cluster can be forwarded to the neighbor cluster with a very short latency. In this way, minimizing communication penalties is favored when the producer of a value and its consumer are placed in adjacent clusters, which also favors workload balance. The proposed microarchitecture is shown to outperform a state-of-the-art clustered processor. For instance, for an 8-cluster configuration and just one fully pipelined unidirectional bus, 15% speedup is achieved on average for FP programs.Peer ReviewedPostprint (published version
Magic-State Functional Units: Mapping and Scheduling Multi-Level Distillation Circuits for Fault-Tolerant Quantum Architectures
Quantum computers have recently made great strides and are on a long-term
path towards useful fault-tolerant computation. A dominant overhead in
fault-tolerant quantum computation is the production of high-fidelity encoded
qubits, called magic states, which enable reliable error-corrected computation.
We present the first detailed designs of hardware functional units that
implement space-time optimized magic-state factories for surface code
error-corrected machines. Interactions among distant qubits require surface
code braids (physical pathways on chip) which must be routed. Magic-state
factories are circuits comprised of a complex set of braids that is more
difficult to route than quantum circuits considered in previous work [1]. This
paper explores the impact of scheduling techniques, such as gate reordering and
qubit renaming, and we propose two novel mapping techniques: braid repulsion
and dipole moment braid rotation. We combine these techniques with graph
partitioning and community detection algorithms, and further introduce a
stitching algorithm for mapping subgraphs onto a physical machine. Our results
show a factor of 5.64 reduction in space-time volume compared to the best-known
previous designs for magic-state factories.Comment: 13 pages, 10 figure
Low Power Processor Architectures and Contemporary Techniques for Power Optimization – A Review
The technological evolution has increased the number of transistors for a given die area significantly and increased the switching speed from few MHz to GHz range. Such inversely proportional decline in size and boost in performance consequently demands shrinking of supply voltage and effective power dissipation in chips with millions of transistors. This has triggered substantial amount of research in power reduction techniques into almost every aspect of the chip and particularly the processor cores contained in the chip. This paper presents an overview of techniques for achieving the power efficiency mainly at the processor core level but also visits related domains such as buses and memories. There are various processor parameters and features such as supply voltage, clock frequency, cache and pipelining which can be optimized to reduce the power consumption of the processor. This paper discusses various ways in which these parameters can be optimized. Also, emerging power efficient processor architectures are overviewed and research activities are discussed which should help reader identify how these factors in a processor contribute to power consumption. Some of these concepts have been already established whereas others are still active research areas. © 2009 ACADEMY PUBLISHER
An optimal repartitioning decision policy
A central problem to parallel processing is the determination of an effective partitioning of workload to processors. The effectiveness of any given partition is dependent on the stochastic nature of the workload. The problem of determining when and if the stochastic behavior of the workload has changed enough to warrant the calculation of a new partition is treated. The problem is modeled as a Markov decision process, and an optimal decision policy is derived. Quantification of this policy is usually intractable. A heuristic policy which performs nearly optimally is investigated empirically. The results suggest that the detection of change is the predominant issue in this problem
- …