38 research outputs found

    A new graph-based method for pairwise global network alignment

    On Tree-Constrained Matchings and Generalizations

    We consider the following Tree-Constrained Bipartite Matching problem: Given two rooted trees $T_1=(V_1,E_1)$, $T_2=(V_2,E_2)$ and a weight function $w: V_1\times V_2 \mapsto \mathbb{R}_+$, find a maximum weight matching $\mathcal{M}$ between nodes of the two trees, such that none of the matched nodes is an ancestor of another matched node in either of the trees. This generalization of the classical bipartite matching problem appears, for example, in the computational analysis of live cell video data. We show that the problem is $\mathcal{APX}$-hard and thus, unless $\mathcal{P} = \mathcal{NP}$, disprove a previous claim that it is solvable in polynomial time. Furthermore, we give a $2$-approximation algorithm based on a combination of the local ratio technique and a careful use of the structure of basic feasible solutions of a natural LP-relaxation, which we also show to have an integrality gap of $2-o(1)$. In the second part of the paper, we consider a natural generalization of the problem, where trees are replaced by partially ordered sets (posets). We show that the local ratio technique gives a $2k\rho$-approximation for the $k$-dimensional matching generalization of the problem, in which the maximum number of incomparable elements below (or above) any given element in each poset is bounded by $\rho$. We finally give an almost matching integrality gap example, and an inapproximability result showing that the dependence on $\rho$ is most likely unavoidable.

    On tree-constrained matchings and generalizations

    The copy-number tree mixture deconvolution problem and applications to multi-sample bulk sequencing tumor data

    Cancer is an evolutionary process driven by somatic mutation. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the mutational complexity of cancer and the fact that nearly all cancer sequencing is of bulk tissue, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest CNAs that explain the copy number data from multiple samples of a tumor. CNTMD generalizes two approaches that have been researched intensively in recent years: deconvolution/factorization algorithms that aim to infer the number and proportions of clones in a mixed tumor sample; and phylogenetic models of copy number evolution that model the dependencies between copy number events that affect the same genomic loci. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that perform either deconvolution or phylogenetic tree construction under the assumption of a single tumor clone per sample. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher-resolution view of copy number evolution of this cancer than published analyses.

    Genome sequence analysis with MonetDB - A case study on Ebola virus diversity

    Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing genomes takes little time and cost, but yields terabytes of data to be stored and analyzed. Biologists are often exposed to excessively time consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.

    Genome sequence analysis with MonetDB: a case study on Ebola virus diversity

    Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing genomes takes little time and cost, but results in terabytes of data to be stored and analysed. Biologists are often exposed to excessively time consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes