180 research outputs found
On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution
Technology trends are making the cost of data movement increasingly dominant,
both in terms of energy and time, over the cost of performing arithmetic
operations in computer systems. The fundamental ratio of aggregate data
movement bandwidth to total computational power (also referred to as the
machine balance parameter) in parallel computer systems is decreasing. It is
therefore of considerable importance to characterize the inherent data
movement requirements of parallel algorithms, so that the minimal architectural
balance parameters required to support them on future systems can be well
understood. In this paper, we develop an extension of the well-known red-blue
pebble game to derive lower bounds on the data movement complexity for the
parallel execution of computational directed acyclic graphs (CDAGs) on parallel
systems. We model multi-node multi-core parallel systems, with the total
physical memory distributed across the nodes (that are connected through some
interconnection network) and in a multi-level shared cache hierarchy for
processors within a node. We also develop new techniques for lower bound
characterization of non-homogeneous CDAGs. We demonstrate the use of the
methodology by analyzing the CDAGs of several numerical algorithms to derive
lower bounds on data movement for their parallel execution.
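The sequential game underlying this model can be made concrete with a small simulation. The sketch below (a hypothetical helper; LRU eviction, free pebbling of source nodes, and no store counting are simplifications of the actual game) counts the loads incurred when a CDAG is evaluated in a given topological order with s red pebbles. It illustrates the classical sequential red-blue pebble game only, not the parallel extension developed in the paper.

```python
from collections import OrderedDict

def io_cost(dag, order, s):
    """Count loads for a red-blue pebbling of `dag` (node -> list of
    predecessors) evaluated along the topological `order`, with at most
    `s` red (fast-memory) pebbles and LRU eviction."""
    red = OrderedDict()                 # LRU set of red-pebbled nodes
    loads = 0

    def place(v):
        red[v] = True
        red.move_to_end(v)              # mark v as most recently used
        while len(red) > s:
            red.popitem(last=False)     # evict the least recently used pebble

    for v in order:
        for p in dag[v]:
            if p not in red:            # operand must be (re)loaded: one I/O
                loads += 1
            place(p)
        place(v)                        # computing v puts a red pebble on it
    return loads
```

On the diamond CDAG a -> {b, c} -> d, three red pebbles suffice to avoid any reload, while two pebbles force one value to be fetched again.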
All-Pairs LCA in DAGs: Breaking through the O(n^2.5) barrier
Let G be an n-vertex directed acyclic graph (DAG). A lowest common
ancestor (LCA) of two vertices u and v is a common ancestor w of u and v
such that no descendant of w has the same property. In this paper, we
consider the problem of computing an LCA, if any, for all pairs of vertices in
a DAG. The fastest known algorithms for this problem exploit fast matrix
multiplication subroutines, with running times strictly above n^2.5
[Bender et al. SODA'01; Kowaluk and Lingas ICALP'05; Czumaj et al. TCS'07].
Somewhat surprisingly, all those bounds would remain Ω(n^2.5) even if matrix
multiplication could be solved optimally (i.e., ω = 2). This appears to be an
inherent barrier for
all the currently known approaches, which raises the natural question of
whether one could break through the O(n^2.5) barrier for this problem.
In this paper, we answer this question affirmatively: in particular, we
present an Õ(n^2.447)-time algorithm (Õ(n^(7/3)) for ω = 2)
for finding an LCA for all pairs of vertices in a DAG, which represents the
first improvement on the running times for this problem in the last 13 years. A
key tool in our approach is a fast algorithm to partition the vertex set of the
transitive closure of G into a collection of chains and
antichains, for a given parameter ℓ. As usual, a chain is a path while an
antichain is an independent set. We then find, for all pairs of vertices, a
"candidate" LCA among the chain and antichain vertices, separately. The
first set is obtained via a reduction to min-max matrix multiplication. The
computation of the second set can be reduced to Boolean matrix multiplication
similarly to previous results on this problem. We finally combine the two
solutions together in a careful (non-obvious) manner.
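For contrast with the fast-matrix-multiplication machinery, a brute-force all-pairs LCA computation can be written directly from the definition above. The sketch below (hypothetical function name; ancestorship taken to be reflexive, an assumption the source does not spell out) builds the transitive closure by DFS and then, for each pair, keeps a common ancestor none of whose proper descendants is also a common ancestor:

```python
def all_pairs_lca(n, edges):
    # Build adjacency lists and the reflexive transitive closure:
    # desc[w] = set of vertices reachable from w (including w itself).
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
    desc = [set() for _ in range(n)]

    def dfs(w, v):
        if v in desc[w]:
            return
        desc[w].add(v)
        for nxt in adj[v]:
            dfs(w, nxt)

    for w in range(n):
        dfs(w, w)          # a vertex counts as its own ancestor/descendant

    lca = {}
    for u in range(n):
        for v in range(n):
            common = [w for w in range(n)
                      if u in desc[w] and v in desc[w]]
            # An LCA is a common ancestor with no *proper* descendant
            # that is also a common ancestor.
            lowest = [w for w in common
                      if not any(x != w and x in desc[w] for x in common)]
            lca[(u, v)] = lowest[0] if lowest else None
    return lca
```

This takes roughly cubic time per pair in the worst case, far from the subquadratic-in-n^2.5 bounds discussed above, but it pins down exactly what the fast algorithm must compute.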
A constructive commutative quantum Lovasz Local Lemma, and beyond
The recently proven Quantum Lovasz Local Lemma generalises the well-known
Lovasz Local Lemma. It states that, if a collection of subspace constraints are
"weakly dependent", there necessarily exists a state satisfying all
constraints. It implies, e.g., that certain instances of the k-QSAT quantum
satisfiability problem are necessarily satisfiable, or that many-body systems
with "not too many" interactions are always frustration-free.
However, the QLLL only asserts existence; it says nothing about how to find
the state. Inspired by Moser's breakthrough classical results, we present a
constructive version of the QLLL in the setting of commuting constraints,
proving that a simple quantum algorithm converges efficiently to the required
state. In fact, we provide two different proofs, one using a novel quantum
coupling argument, the other a more explicit combinatorial analysis. Both
proofs are independent of the QLLL. So these results also provide independent,
constructive proofs of the commutative QLLL itself, but strengthen it
significantly by giving an efficient algorithm for finding the state whose
existence is asserted by the QLLL. We give an application of the constructive
commutative QLLL to convergence of CP maps.
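For readers unfamiliar with Moser's classical result, his resampling algorithm for the classical Lovász Local Lemma in the SAT setting is easy to state; the quantum algorithm analysed here is its commuting-constraints analogue. A minimal classical sketch (hypothetical names; the step bound is added for safety and is not part of Moser's analysis):

```python
import random

def moser_resample(clauses, n_vars, rng=random.Random(0), max_steps=10**6):
    # clauses: list of tuples of signed literals, e.g. (1, -2) means x1 OR NOT x2.
    # Moser's algorithm: start from a uniformly random assignment; while some
    # clause is violated, resample that clause's variables uniformly at random.
    assign = [rng.random() < 0.5 for _ in range(n_vars + 1)]  # 1-indexed

    def violated(c):
        # A clause is violated iff every one of its literals is false.
        return all(assign[abs(l)] != (l > 0) for l in c)

    for _ in range(max_steps):
        bad = [c for c in clauses if violated(c)]
        if not bad:
            return assign[1:]          # satisfying assignment found
        for l in bad[0]:               # resample one violated clause
            assign[abs(l)] = rng.random() < 0.5
    return None                        # did not converge within max_steps
```

Under the LLL's weak-dependence condition, this converges in expected polynomial time; the paper's contribution is that an analogous dissipative process converges in the commuting quantum setting.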
We also extend these results to the non-commutative setting. However, our
proof of the general constructive QLLL relies on a conjecture which we are only
able to prove in special cases.
Comment: 43 pages, 2 conjectures, no figures; unresolved gap in the proof; see
arXiv:1311.6474 or arXiv:1310.7766 for correct proofs of the symmetric case.
Faster algorithms for minimum path cover by graph decomposition
Minimum-cost minimum path cover is a graph-theoretic problem with applications to gene sequencing problems in bioinformatics. This thesis studies decomposing graphs as a preprocessing step for solving the minimum-cost minimum path cover problem. By decomposing graphs, we mean splitting graphs into smaller pieces. When the graph is split along its maximum antichains, the solution to the minimum-cost minimum path cover problem can be computed independently in the small pieces. In the end, all the partial solutions are joined together to form the solution for the original graph. As part of our decomposition pipeline, we introduce a novel way to solve the unweighted minimum path cover problem, and with that algorithm we also obtain a new time/space tradeoff for reachability queries in directed acyclic graphs. This thesis also includes an experimental section, where an example implementation of the decomposition is tested on randomly generated graphs. On the test graphs, the decomposition does not yield a speedup compared to solving the same instances without it. However, the experiments give some insight into the parameters that affect the decomposition's performance and into how the implementation could be improved.
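One standard way to solve the unweighted minimum path cover subproblem mentioned above is the classical reduction to bipartite matching for vertex-disjoint paths (this is the textbook method, not necessarily the thesis's novel algorithm). A minimal sketch with hypothetical names:

```python
def min_path_cover_size(n, edges):
    # Minimum number of vertex-disjoint paths covering a DAG on vertices
    # 0..n-1: equals n minus a maximum matching in the bipartite graph that
    # has an edge (u, v) for every DAG edge u -> v (Fulkerson-style reduction).
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
    match = [-1] * n                   # match[v] = left vertex matched to v

    def augment(u, seen):
        # Hungarian-style augmenting-path search from left vertex u.
        for v in adj[u]:
            if v in seen:
                continue
            seen.add(v)
            if match[v] == -1 or augment(match[v], seen):
                match[v] = u
                return True
        return False

    matching = sum(augment(u, set()) for u in range(n))
    return n - matching
```

For a single path 0 -> 1 -> 2 the cover size is 1; for a star 0 -> 1, 0 -> 2 it is 2, since the two branches cannot share vertex 0.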
On Characterizing the Data Access Complexity of Programs
Technology trends will cause data movement to account for the majority of
energy expenditure and execution time on emerging computers. Therefore,
computational complexity will no longer be a sufficient metric for comparing
algorithms, and a fundamental characterization of data access complexity will
be increasingly important. The problem of developing lower bounds for data
access complexity has been modeled using the formalism of Hong & Kung's
red/blue pebble game for computational directed acyclic graphs (CDAGs).
However, previously developed approaches to lower bounds analysis for the
red/blue pebble game are very limited in effectiveness when applied to CDAGs of
real programs, with computations comprised of multiple sub-computations with
differing DAG structure. We address this problem by developing an approach for
effectively composing lower bounds based on graph decomposition. We also
develop a static analysis algorithm to derive the asymptotic data-access lower
bounds of programs, as a function of the problem size and cache size.
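A classic data point for such analyses is Hong & Kung's original lower bound for standard (non-Strassen) n-by-n matrix multiplication with a fast memory of size S, which any sound composition-based method should reproduce:

```latex
% I/O (data access) lower bound for standard n x n matrix multiplication
% in the red/blue pebble game with S red pebbles (Hong & Kung, 1981):
Q \;=\; \Omega\!\left(\frac{n^{3}}{\sqrt{S}}\right)
```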
Flow Decomposition With Subpath Constraints
Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. But despite acknowledging their utility in practice, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPT algorithm. Experiments on RNA transcript datasets show that for instances with larger solution path sets, the addition of subpath constraints finds 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more ground truth solutions when minimal decompositions are found heuristically.
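The unconstrained problem that subpath constraints extend can be illustrated with a simple greedy baseline: repeatedly peel a source-to-sink path off the flow and assign it the bottleneck weight along that path. This is only an illustrative sketch (hypothetical names; a DAG and flow conservation are assumed), not the paper's heuristic or FPT algorithm, and greedy peeling is not guaranteed to find a minimum decomposition:

```python
def greedy_flow_decomposition(flow):
    # flow: dict (u, v) -> positive flow value on the edges of a DAG.
    # Returns a list of (path, weight) pairs whose superposition equals flow.
    flow = {e: w for e, w in flow.items() if w > 0}
    paths = []
    while flow:
        indeg = {v for _, v in flow}                     # vertices with inflow
        u = next(a for a, _ in flow if a not in indeg)   # pick a current source
        path = [u]
        while True:                                      # walk to a sink
            outs = [(a, b) for (a, b) in flow if a == path[-1]]
            if not outs:
                break
            path.append(outs[0][1])                      # follow any out-edge
        w = min(flow[e] for e in zip(path, path[1:]))    # bottleneck weight
        for e in zip(path, path[1:]):                    # subtract the path
            flow[e] -= w
            if flow[e] == 0:
                del flow[e]
        paths.append((path, w))
    return paths
```

Subpath constraints would restrict which peeled paths are admissible; here the flow built from paths 0-1-3 (weight 2) and 0-2-3 (weight 3) is recovered exactly.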