1,323 research outputs found

    A practical fpt algorithm for Flow Decomposition and transcript assembly

    Full text link
    The Flow Decomposition problem, which asks for the smallest set of weighted paths that "covers" a flow on a DAG, has recently been used as an important computational step in transcript assembly. We prove the problem is in FPT when parameterized by the number of paths by giving a practical linear fpt algorithm. Further, we implement and engineer a Flow Decomposition solver based on this algorithm, and evaluate its performance on RNA-sequence data. Crucially, our solver finds exact solutions while achieving runtimes competitive with a state-of-the-art heuristic. Finally, we contextualize our design choices with two hardness results related to preprocessing and weight recovery. Specifically, kk-Flow Decomposition does not admit polynomial kernels under standard complexity assumptions, and the related problem of assigning (known) weights to a given set of paths is NP-hard.Comment: Introduces software package Toboggan: Version 1.0. http://dx.doi.org/10.5281/zenodo.82163

    Efficient Minimum Flow Decomposition via Integer Linear Programming

    Get PDF
    Extended version of RECOMB 2022 paperMinimum flow decomposition (MFD) is an NP-hard problem asking to decompose a network flow into a minimum set of paths (together with associated weights). Variants of it are powerful models in multiassembly problems in Bioinformatics, such as RNA assembly. Owing to its hardness, practical multiassembly tools either use heuristics or solve simpler, polynomial time-solvable versions of the problem, which may yield solutions that are not minimal or do not perfectly decompose the flow. Here, we provide the first fast and exact solver for MFD on acyclic flow networks, based on Integer Linear Programming (ILP). Key to our approach is an encoding of all the exponentially many solution paths using only a quadratic number of variables. We also extend our ILP formulation to many practical variants, such as incorporating longer or paired-end reads, or minimizing flow errors. On both simulated and real-flow splicing graphs, our approach solves any instance inPeer reviewe

    Improving RNA Assembly via Safety and Completeness in Flow Decompositions

    Get PDF
    Extended version of RECOMB 2022 paperDecomposing a network flow into weighted paths is a problem with numerous applications, ranging from networking, transportation planning, to bioinformatics. In some applications we look for a decomposition that is optimal with respect to some property, such as the number of paths used, robustness to edge deletion, or length of the longest path. However, in many bioinformatic applications, we seek a specific decomposition where the paths correspond to some underlying data that generated the flow. In these cases, no optimization criteria guarantee the identification of the correct decomposition. Therefore, we propose to instead report the safe paths, which are subpaths of at least one path in every flow decomposition. In this work, we give the first local characterization of safe paths for flow decompositions in directed acyclic graphs, leading to a practical algorithm for finding the complete set of safe paths. In addition, we evaluate our algorithm on RNA transcript data sets against a trivial safe algorithm (extended unitigs), the recently proposed safe paths for path covers (TCBB 2021) and the popular heuristic greedy-width. On the one hand, we found that besides maintaining perfect precision, our safe and complete algorithm reports a significantly higher coverage (≈50 compared with the other safe algorithms. On the other hand, the greedy-width algorithm although reporting a better coverage, it also reports a significantly lower precision on complex graphs (for genes expressing a large number of transcripts). Overall, our safe and complete algorithm outperforms (by ≈20 greedy-width on a unified metric (F-score) considering both coverage and precision when the evaluated data set has a significant number of complex graphs. Moreover, it also has a superior time (4−5×) and space performance (1.2−2.2×), resulting in a better and more practical approach for bioinformatic applications of flow decomposition.Peer reviewe

    Flow Decomposition With Subpath Constraints

    Get PDF
    Flow network decomposition is a natural model for problems where we are given a flow network arising from superimposing a set of weighted paths and would like to recover the underlying data, i.e., decompose the flow into the original paths and their weights. Thus, variations on flow decomposition are often used as subroutines in multiassembly problems such as RNA transcript assembly. In practice, we frequently have access to information beyond flow values in the form of subpaths, and many tools incorporate these heuristically. But despite acknowledging their utility in practice, previous work has not formally addressed the effect of subpath constraints on the accuracy of flow network decomposition approaches. We formalize the flow decomposition with subpath constraints problem, give the first algorithms for it, and study its usefulness for recovering ground truth decompositions. For finding a minimum decomposition, we propose both a heuristic and an FPTalgorithm. Experiments on RNA transcript datasets show that for instances with larger solution path sets, the addition of subpath constraints finds 13% more ground truth solutions when minimal decompositions are found exactly, and 30% more ground truth solutions when minimal decompositions are found heuristically.Peer reviewe

    Networked Computing in Wireless Sensor Networks for Structural Health Monitoring

    Full text link
    This paper studies the problem of distributed computation over a network of wireless sensors. While this problem applies to many emerging applications, to keep our discussion concrete we will focus on sensor networks used for structural health monitoring. Within this context, the heaviest computation is to determine the singular value decomposition (SVD) to extract mode shapes (eigenvectors) of a structure. Compared to collecting raw vibration data and performing SVD at a central location, computing SVD within the network can result in significantly lower energy consumption and delay. Using recent results on decomposing SVD, a well-known centralized operation, into components, we seek to determine a near-optimal communication structure that enables the distribution of this computation and the reassembly of the final results, with the objective of minimizing energy consumption subject to a computational delay constraint. We show that this reduces to a generalized clustering problem; a cluster forms a unit on which a component of the overall computation is performed. We establish that this problem is NP-hard. By relaxing the delay constraint, we derive a lower bound to this problem. We then propose an integer linear program (ILP) to solve the constrained problem exactly as well as an approximate algorithm with a proven approximation ratio. We further present a distributed version of the approximate algorithm. We present both simulation and experimentation results to demonstrate the effectiveness of these algorithms

    Work-Optimal Parallel Minimum Cuts for Non-Sparse Graphs

    Full text link
    We present the first work-optimal polylogarithmic-depth parallel algorithm for the minimum cut problem on non-sparse graphs. For mn1+ϵm\geq n^{1+\epsilon} for any constant ϵ>0\epsilon>0, our algorithm requires O(mlogn)O(m \log n) work and O(log3n)O(\log^3 n) depth and succeeds with high probability. Its work matches the best O(mlogn)O(m \log n) runtime for sequential algorithms [MN STOC 2020, GMW SOSA 2021]. This improves the previous best work by Geissmann and Gianinazzi [SPAA 2018] by O(log3n)O(\log^3 n) factor, while matching the depth of their algorithm. To do this, we design a work-efficient approximation algorithm and parallelize the recent sequential algorithms [MN STOC 2020; GMW SOSA 2021] that exploit a connection between 2-respecting minimum cuts and 2-dimensional orthogonal range searching.Comment: Updates on this version: Minor corrections for the previous and our resul
    corecore