180 research outputs found

    On Characterizing the Data Movement Complexity of Computational DAGs for Parallel Execution

    Technology trends are making the cost of data movement increasingly dominant, both in terms of energy and time, over the cost of performing arithmetic operations in computer systems. The fundamental ratio of aggregate data movement bandwidth to the total computational power (also referred to as the machine balance parameter) in parallel computer systems is decreasing. It is therefore of considerable importance to characterize the inherent data movement requirements of parallel algorithms, so that the minimal architectural balance parameters required to support them on future systems can be well understood. In this paper, we extend the well-known red-blue pebble game to develop lower bounds on the data movement complexity for the parallel execution of computational directed acyclic graphs (CDAGs) on parallel systems. We model multi-node multi-core parallel systems, with the total physical memory distributed across the nodes (that are connected through some interconnection network) and in a multi-level shared cache hierarchy for processors within a node. We also develop new techniques for lower bound characterization of non-homogeneous CDAGs. We demonstrate the use of the methodology by analyzing the CDAGs of several numerical algorithms, to develop lower bounds on data movement for their parallel execution.

    All-Pairs LCA in DAGs: Breaking through the $O(n^{2.5})$ barrier

    Let $G=(V,E)$ be an $n$-vertex directed acyclic graph (DAG). A lowest common ancestor (LCA) of two vertices $u$ and $v$ is a common ancestor $w$ of $u$ and $v$ such that no descendant of $w$ has the same property. In this paper, we consider the problem of computing an LCA, if any, for all pairs of vertices in a DAG. The fastest known algorithms for this problem exploit fast matrix multiplication subroutines and have running times ranging from $O(n^{2.687})$ [Bender et al., SODA'01] down to $O(n^{2.615})$ [Kowaluk and Lingas, ICALP'05] and $O(n^{2.569})$ [Czumaj et al., TCS'07]. Somewhat surprisingly, all those bounds would still be $\Omega(n^{2.5})$ even if matrix multiplication could be solved optimally (i.e., $\omega=2$). This appears to be an inherent barrier for all the currently known approaches, which raises the natural question of whether one could break through the $O(n^{2.5})$ barrier for this problem. In this paper, we answer this question affirmatively: in particular, we present an $\tilde O(n^{2.447})$ ($\tilde O(n^{7/3})$ for $\omega=2$) algorithm for finding an LCA for all pairs of vertices in a DAG, which represents the first improvement on the running times for this problem in the last 13 years. A key tool in our approach is a fast algorithm to partition the vertex set of the transitive closure of $G$ into a collection of $O(\ell)$ chains and $O(n/\ell)$ antichains, for a given parameter $\ell$. As usual, a chain is a path while an antichain is an independent set. We then find, for all pairs of vertices, a candidate LCA among the chain and antichain vertices, separately. The first set is obtained via a reduction to min-max matrix multiplication. The computation of the second set can be reduced to Boolean matrix multiplication similarly to previous results on this problem. We finally combine the two solutions together in a careful (non-obvious) manner.
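
    The LCA definition in this abstract can be made concrete with a brute-force sketch. This is not the paper's algorithm: it simply computes the transitive closure by depth-first search and checks the definition for every pair, which is far slower than the bounds quoted above. The function name `all_pairs_lca` and the conventions that every vertex counts as its own ancestor and that ties are broken by smallest index are assumptions made for illustration.

```python
def all_pairs_lca(n, edges):
    """Brute-force all-pairs LCA in a DAG with vertices 0..n-1.

    Assumptions (not from the abstract): a vertex is its own ancestor,
    and when several LCAs exist the smallest index is reported.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)

    # reach[v] = set of vertices reachable from v, including v itself.
    reach = [None] * n

    def dfs(v):
        if reach[v] is None:
            reach[v] = {v}
            for w in adj[v]:
                reach[v] |= dfs(w)
        return reach[v]

    for v in range(n):
        dfs(v)

    # anc[v] = set of ancestors of v, including v itself.
    anc = [{u for u in range(n) if v in reach[u]} for v in range(n)]

    lca = {}
    for u in range(n):
        for v in range(u, n):
            common = anc[u] & anc[v]
            # w is lowest if no *other* common ancestor is a descendant of w.
            lowest = [w for w in sorted(common)
                      if not ((reach[w] & common) - {w})]
            lca[(u, v)] = lowest[0] if lowest else None
    return lca


# Small DAG: 0 and 1 both point to 2 and 3, which both point to 4.
example_edges = [(0, 2), (1, 2), (0, 3), (1, 3), (2, 4), (3, 4)]
example_lca = all_pairs_lca(5, example_edges)
```

    In this example vertices 2 and 3 have two incomparable lowest common ancestors (0 and 1), which is why the problem asks for "an LCA, if any" rather than a unique one.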

    A constructive commutative quantum Lovasz Local Lemma, and beyond

    The recently proven Quantum Lovasz Local Lemma generalises the well-known Lovasz Local Lemma. It states that, if a collection of subspace constraints are "weakly dependent", there necessarily exists a state satisfying all constraints. It implies e.g. that certain instances of the kQSAT quantum satisfiability problem are necessarily satisfiable, or that many-body systems with "not too many" interactions are always frustration-free. However, the QLLL only asserts existence; it says nothing about how to find the state. Inspired by Moser's breakthrough classical results, we present a constructive version of the QLLL in the setting of commuting constraints, proving that a simple quantum algorithm converges efficiently to the required state. In fact, we provide two different proofs, one using a novel quantum coupling argument, the other a more explicit combinatorial analysis. Both proofs are independent of the QLLL. So these results also provide independent, constructive proofs of the commutative QLLL itself, but strengthen it significantly by giving an efficient algorithm for finding the state whose existence is asserted by the QLLL. We give an application of the constructive commutative QLLL to convergence of CP maps. We also extend these results to the non-commutative setting. However, our proof of the general constructive QLLL relies on a conjecture which we are only able to prove in special cases. Comment: 43 pages, 2 conjectures, no figures; unresolved gap in the proof; see arXiv:1311.6474 or arXiv:1310.7766 for correct proofs of the symmetric case.

    Faster algorithms for minimum path cover by graph decomposition

    Minimum-cost minimum path cover is a graph-theoretic problem with an application in gene sequencing problems in bioinformatics. This thesis studies decomposing graphs as a preprocessing step for solving the minimum-cost minimum path cover problem. By decomposing graphs, we mean splitting graphs into smaller pieces. When the graph is split along the maximum anti-chains of the graph, the solution for the minimum-cost minimum path cover problem can be computed independently in the small pieces. In the end, all the partial solutions are joined together to form the solution for the original graph. As a part of our decomposition pipeline, we introduce a novel way to solve the unweighted minimum path cover problem, and with that algorithm we also obtain a new time/space tradeoff for reachability queries in directed acyclic graphs. This thesis also includes an experimental section, where an example implementation of the decomposition is tested on randomly generated graphs. On the test graphs, the decomposition does not yield a speedup compared to solving the same instances without it. However, the experiments give some insight into the parameters that affect the decomposition's performance and how the implementation could be improved.
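
    For orientation, the unweighted minimum path cover problem can be sketched with the textbook reduction to bipartite matching. This is not the thesis's novel algorithm. The sketch assumes that paths in the cover may share vertices, so it first takes the transitive closure and then runs a simple augmenting-path matching; by Dilworth's theorem the result also equals the size of a maximum antichain. The name `min_path_cover_size` is hypothetical.

```python
def min_path_cover_size(n, edges):
    """Size of a minimum path cover of a DAG with vertices 0..n-1.

    Assumption: paths may share vertices, so the answer is n minus a
    maximum matching in the bipartite graph of the transitive closure.
    """
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)

    # reach[v] = proper descendants of v (transitive closure).
    reach = [None] * n

    def dfs(v):
        if reach[v] is None:
            reach[v] = set()
            for w in adj[v]:
                reach[v].add(w)
                reach[v] |= dfs(w)
        return reach[v]

    for v in range(n):
        dfs(v)

    # match_right[v] = left vertex currently matched to right copy of v.
    match_right = [-1] * n

    def try_augment(u, seen):
        # Kuhn's augmenting-path search from left vertex u.
        for v in reach[u]:
            if v not in seen:
                seen.add(v)
                if match_right[v] == -1 or try_augment(match_right[v], seen):
                    match_right[v] = u
                    return True
        return False

    matched = sum(1 for u in range(n) if try_augment(u, set()))
    return n - matched


# Vertex 2 is shared: paths 0->2->3 and 1->2->4 cover all five vertices.
shared_cover = min_path_cover_size(5, [(0, 2), (1, 2), (2, 3), (2, 4)])
```

    The example shows why vertex sharing matters: a vertex-disjoint cover of the same graph would need three paths, while two suffice here.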

    On Characterizing the Data Access Complexity of Programs

    Technology trends will cause data movement to account for the majority of energy expenditure and execution time on emerging computers. Therefore, computational complexity will no longer be a sufficient metric for comparing algorithms, and a fundamental characterization of data access complexity will be increasingly important. The problem of developing lower bounds for data access complexity has been modeled using the formalism of Hong & Kung's red/blue pebble game for computational directed acyclic graphs (CDAGs). However, previously developed approaches to lower bounds analysis for the red/blue pebble game are very limited in effectiveness when applied to CDAGs of real programs, with computations composed of multiple sub-computations with differing DAG structure. We address this problem by developing an approach for effectively composing lower bounds based on graph decomposition. We also develop a static analysis algorithm to derive the asymptotic data-access lower bounds of programs, as a function of the problem size and cache size.

    Flow Decomposition With Subpath Constraints

    Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. But despite acknowledging their utility in practice, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPT algorithm. Experiments on RNA transcript datasets show that for instances with larger solution path sets, the addition of subpath constraints finds 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more ground truth solutions when minimal decompositions are found heuristically. Peer reviewed.
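
    As a point of reference for the plain problem (without subpath constraints), a common baseline is the greedy-width heuristic: repeatedly extract the source-to-sink path with the largest bottleneck and subtract it from the flow. The sketch below is that generic baseline, not the heuristic or FPT algorithm proposed in the paper. The names `widest_path` and `greedy_width`, and the edge-dictionary representation of the flow, are assumptions for illustration.

```python
import heapq


def widest_path(flow, s, t):
    """Return (path, bottleneck) for the widest s-t path, or (None, 0)."""
    adj = {}
    for (u, v), f in flow.items():
        if f > 0:
            adj.setdefault(u, []).append((v, f))

    best = {s: float("inf")}   # best[v] = largest bottleneck found so far
    parent = {}
    heap = [(-best[s], s)]     # max-heap via negated widths
    while heap:
        neg_w, u = heapq.heappop(heap)
        if -neg_w < best.get(u, 0):
            continue           # stale heap entry
        for v, f in adj.get(u, []):
            w = min(-neg_w, f)
            if w > best.get(v, 0):
                best[v] = w
                parent[v] = u
                heapq.heappush(heap, (-w, v))

    if t not in best:
        return None, 0
    path = [t]
    while path[-1] != s:
        path.append(parent[path[-1]])
    path.reverse()
    return path, best[t]


def greedy_width(flow, s, t):
    """Decompose a flow (dict edge -> value) into weighted s-t paths."""
    residual = dict(flow)
    paths = []
    while True:
        path, w = widest_path(residual, s, t)
        if path is None:
            break
        for u, v in zip(path, path[1:]):
            residual[(u, v)] -= w
        paths.append((path, w))
    return paths


# Flow obtained by superimposing s-a-t (weight 3) and s-b-t (weight 2).
example_flow = {("s", "a"): 3, ("a", "t"): 3, ("s", "b"): 2, ("b", "t"): 2}
example_paths = greedy_width(example_flow, "s", "t")
```

    Each round removes all remaining flow from at least one edge, so the loop terminates. The heuristic does not guarantee a minimum number of paths, which is part of what motivates exact and constrained formulations.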