On Tree-Constrained Matchings and Generalizations
We consider the following Tree-Constrained Bipartite Matching problem: given two rooted trees and a weight function, find a maximum-weight matching between nodes of the two trees, such that none of the matched nodes is an ancestor of another matched node in either of the trees. This generalization of the classical bipartite matching problem appears, for example, in the computational analysis of live cell video data. We show that the problem is -hard and thus, unless P = NP, disprove a previous claim that it is solvable in polynomial time. Furthermore, we give a -approximation algorithm based on a combination of the local ratio technique and a careful use of the structure of basic feasible solutions of a natural LP-relaxation, which we also show to have an integrality gap of .
In the second part of the paper, we consider a natural generalization of the problem, where trees are replaced by partially ordered sets (posets). We show that the local ratio technique gives a -approximation for the -dimensional matching generalization of the problem, in which the maximum number of incomparable elements below (or above) any given element in each poset is bounded by . Finally, we give an almost matching integrality gap example, and an inapproximability result showing that the dependence on is most likely unavoidable.
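The feasibility condition in the problem statement above can be illustrated with a small sketch. This is not the approximation algorithm from the paper: it is an exhaustive search, usable only on toy instances, and the names `ancestors`, `is_feasible`, and `best_matching`, the parent-pointer tree encoding, and the example weights are all assumptions made for illustration.

```python
from itertools import combinations


def ancestors(parent, node):
    """Return the proper ancestors of `node`, following parent pointers to the root."""
    result = set()
    while parent[node] is not None:
        node = parent[node]
        result.add(node)
    return result


def is_feasible(matching, parent1, parent2):
    """Check the tree constraint: each node is matched at most once, and no
    matched node is an ancestor of another matched node in its own tree."""
    left = [u for u, _ in matching]
    right = [v for _, v in matching]
    if len(set(left)) != len(left) or len(set(right)) != len(right):
        return False
    for nodes, parent in ((left, parent1), (right, parent2)):
        chosen = set(nodes)
        if any(ancestors(parent, n) & chosen for n in nodes):
            return False
    return True


def best_matching(weights, parent1, parent2):
    """Try every subset of weighted pairs (exponential time; toy instances only)."""
    pairs = list(weights)
    best, best_weight = [], 0
    for size in range(1, len(pairs) + 1):
        for subset in combinations(pairs, size):
            if is_feasible(subset, parent1, parent2):
                total = sum(weights[p] for p in subset)
                if total > best_weight:
                    best, best_weight = list(subset), total
    return best, best_weight


# Hypothetical instance: each tree is a root with two leaf children,
# encoded as a child -> parent map (the root maps to None).
parent1 = {"a": None, "b": "a", "c": "a"}
parent2 = {"x": None, "y": "x", "z": "x"}
weights = {("a", "x"): 5, ("b", "y"): 3, ("c", "z"): 3, ("b", "x"): 4}

print(best_matching(weights, parent1, parent2))
```

In this instance, matching the two roots (weight 5) blocks every other pair, so the two leaf pairs together (weight 6) do better. Interactions of this kind are what separate the problem from ordinary bipartite matching.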
The copy-number tree mixture deconvolution problem and applications to multi-sample bulk sequencing tumor data
Cancer is an evolutionary process driven by somatic mutation. This process can be represented as a phylogenetic tree. Constructing such a phylogenetic tree from genome sequencing data is a challenging task due to the mutational complexity of cancer and the fact that nearly all cancer sequencing is of bulk tissue, measuring a superposition of somatic mutations present in different cells. We study the problem of reconstructing tumor phylogenies from copy number aberrations (CNAs) measured in bulk-sequencing data. We introduce the Copy-Number Tree Mixture Deconvolution (CNTMD) problem, which aims to find the phylogenetic tree with the fewest CNAs that explains the copy number data from multiple samples of a tumor. CNTMD generalizes two approaches that have been researched intensively in recent years: deconvolution/factorization algorithms that aim to infer the number and proportions of clones in a mixed tumor sample; and phylogenetic models of copy number evolution that model the dependencies between copy number events that affect the same genomic loci. We design an algorithm for solving the CNTMD problem and apply the algorithm to both simulated and real data. On simulated data, we find that our algorithm outperforms existing approaches that perform either deconvolution or phylogenetic tree construction under the assumption of a single tumor clone per sample. On real data, we analyze multiple samples from a prostate cancer patient, identifying clones within these samples and a phylogenetic tree that relates these clones and their differing proportions across samples. This phylogenetic tree provides a higher-resolution view of copy number evolution of this cancer than published analyses.
Genome sequence analysis with MonetDB - A case study on Ebola virus diversity
Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing genomes takes little time and cost, but yields terabytes of data to be stored and analyzed. Biologists are often exposed to excessively time consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.
Genome sequence analysis with MonetDB: a case study on Ebola virus diversity
Next-generation sequencing (NGS) technology has led the life sciences into the big data era. Today, sequencing genomes takes little time and cost, but results in terabytes of data to be stored and analysed. Biologists are often exposed to excessively time-consuming and error-prone data management and analysis hurdles. In this paper, we propose a database management system (DBMS) based approach to accelerate and substantially simplify genome sequence analysis. We have extended MonetDB, an open-source column-based DBMS, with a BAM module, which enables easy, flexible, and rapid management and analysis of sequence alignment data stored as Sequence Alignment/Map (SAM/BAM) files. We describe the main features of MonetDB/BAM using a case study on Ebola virus genomes.