8 research outputs found

    Algorithmic and Hardness Results for the Colorful Components Problems

    In this paper we investigate the colorful components framework, motivated by applications emerging from comparative genomics. The general goal is to remove a collection of edges from an undirected vertex-colored graph $G$ such that in the resulting graph $G'$ all the connected components are colorful (i.e., any two vertices of the same color belong to different connected components). We want $G'$ to optimize an objective function, the selection of this function being specific to each problem in the framework. We analyze three objective functions, and thus, three different problems, which are believed to be relevant for the biological applications: minimizing the number of singleton vertices, maximizing the number of edges in the transitive closure, and minimizing the number of connected components. Our main result is a polynomial time algorithm for the first problem. This result disproves the conjecture of Zheng et al. that the problem is NP-hard (assuming $P \neq NP$). Then, we show that the second problem is APX-hard, thus proving and strengthening the conjecture of Zheng et al. that the problem is NP-hard. Finally, we show that the third problem does not admit polynomial time approximation within a factor of $|V|^{1/14 - \epsilon}$ for any $\epsilon > 0$, assuming $P \neq NP$ (or within a factor of $|V|^{1/2 - \epsilon}$, assuming $ZPP \neq NP$). Comment: 18 pages, 3 figures
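
    The definition of a colorful component in the abstract above can be illustrated with a minimal sketch. It is not the paper's algorithm: it only checks whether a given edge set leaves every connected component colorful and counts singleton vertices. The function names and the dictionary-based graph representation are assumptions made for this illustration.

```python
from collections import defaultdict


def connected_components(vertices, edges):
    """Return the connected components (as sets) of an undirected graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen = set()
    components = []
    for start in vertices:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        comp = set()
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        components.append(comp)
    return components


def all_components_colorful(color, edges):
    """True if no connected component contains two vertices of the same color.

    `color` maps each vertex to its color (hypothetical representation).
    """
    for comp in connected_components(color, edges):
        colors = [color[v] for v in comp]
        if len(colors) != len(set(colors)):
            return False
    return True


def count_singletons(color, edges):
    """Number of components consisting of a single vertex."""
    return sum(1 for comp in connected_components(color, edges) if len(comp) == 1)


# A path a-b-c-d colored red, blue, red, blue: one component, not colorful.
color = {"a": "red", "b": "blue", "c": "red", "d": "blue"}
path = [("a", "b"), ("b", "c"), ("c", "d")]
# Removing the middle edge leaves two colorful components and no singletons.
after_removal = [("a", "b"), ("c", "d")]
```

    In this toy instance, deleting the single edge (b, c) already makes every component colorful; the three problems in the abstract differ in which such edge deletions they consider best.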

    Connected Tropical Subgraphs in Vertex-Colored Graphs

    A subgraph of a vertex-colored graph is said to be tropical whenever it contains each color of the graph. In this work we study the problem of finding a minimal connected tropical subgraph. We first show that this problem is NP-hard for trees, interval graphs and split graphs, but polynomial when the number of colors is logarithmic in terms of the order of the graph (i.e. FPT). We then provide upper bounds for the order of the minimal connected tropical subgraph under various conditions. We finally study the problem of finding a connected tropical subgraph in a randomly vertex-colored random graph.
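
    As a rough illustration of the definition in the abstract above, the sketch below tests whether a chosen vertex set induces a connected tropical subgraph. It is only a verifier, not a method for finding a minimal one, and it assumes the subgraph is the one induced by the vertex set. The function name and the graph representation are hypothetical.

```python
def is_connected_tropical(color, edges, subset):
    """Check whether `subset` induces a connected subgraph containing every color.

    `color` maps each vertex of the whole graph to its color; `edges` is a
    list of undirected edges (both are assumptions of this sketch).
    """
    subset = set(subset)
    if not subset:
        return False
    # Tropical: every color used in the graph appears in the subset.
    if {color[v] for v in subset} != set(color.values()):
        return False
    # Adjacency restricted to the induced subgraph.
    adj = {v: set() for v in subset}
    for u, v in edges:
        if u in subset and v in subset:
            adj[u].add(v)
            adj[v].add(u)
    # Depth-first search from an arbitrary vertex of the subset.
    start = next(iter(subset))
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for nb in adj[node]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == subset


# Path 1-2-3-4 with colors r, g, r, b.
color = {1: "r", 2: "g", 3: "r", 4: "b"}
edges = [(1, 2), (2, 3), (3, 4)]
```

    Here {2, 3, 4} is connected and tropical, {1, 2, 4} has all three colors but is disconnected, and {1, 2, 3} is connected but misses color b.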

    Gene order in rosid phylogeny, inferred from pairwise syntenies among extant genomes

    BACKGROUND: Ancestral gene order reconstruction for flowering plants has lagged behind developments in yeasts, insects and higher animals, because of the recency of widespread plant genome sequencing, sequencers' embargoes on public data use, paralogies due to whole genome duplication (WGD) and fractionation of undeleted duplicates, extensive paralogy from other sources, and the computational cost of existing methods. RESULTS: We address these problems, using the gene order of four core eudicot genomes (cacao, castor bean, papaya and grapevine) that have escaped any recent WGD events, and two others (poplar and cucumber) that descend from independent WGDs, in inferring the ancestral gene order of the rosid clade and those of its main subgroups, the fabids and malvids. We improve and adapt techniques including the OMG method for extracting large, paralogy-free, multiple orthologies from conflated pairwise synteny data among the six genomes and the PATHGROUPS approach for ancestral gene order reconstruction in a given phylogeny, where some genomes may be descendants of WGD events. We use the gene order evidence to evaluate the hypothesis that the order Malpighiales belongs to the malvids rather than as traditionally assigned to the fabids. CONCLUSIONS: Gene orders of ancestral eudicot species, involving 10,000 or more genes can be reconstructed in an efficient, parsimonious and consistent way, despite paralogies due to WGD and other processes. Pairwise genomic syntenies provide appropriate input to a parameter-free procedure of multiple ortholog identification followed by gene-order reconstruction in solving instances of the "small phylogeny" problem

    A Multilingual Study of Multi-Sentence Compression using Word Vertex-Labeled Graphs and Integer Linear Programming

    Multi-Sentence Compression (MSC) aims to generate a short sentence with the key information from a cluster of similar sentences. MSC enables summarization and question-answering systems to generate outputs combining fully formed sentences from one or several documents. This paper describes an Integer Linear Programming method for MSC using a vertex-labeled graph to select different keywords, with the goal of generating more informative sentences while maintaining their grammaticality. Our system is of good quality and outperforms the state of the art for evaluations led on news datasets in three languages: French, Portuguese and Spanish. We led both automatic and manual evaluations to determine the informativeness and the grammaticality of compressions for each dataset. In additional tests, which take advantage of the fact that the length of compressions can be modulated, we still improve ROUGE scores with shorter output sentences. Comment: Preprint version

    Polyploidy underlies co-option and diversification of biosynthetic triterpene pathways in the apple tribe

    Whole-genome duplication (WGD) plays important roles in plant evolution and function, yet little is known about how WGD underlies metabolic diversification of natural products that bear significant medicinal properties, especially in nonmodel trees. Here, we reveal how WGD laid the foundation for co-option and differentiation of medicinally important ursane triterpene pathway duplicates, generating distinct chemotypes between species and between developmental stages in the apple tribe. After generating chromosome-level assemblies of a widely cultivated loquat variety and Gillenia trifoliata, we define differentially evolved, duplicated gene pathways and date the WGD in the apple tribe at 13.5 to 27.1 Mya, much more recent than previously thought. We then functionally characterize contrasting metabolic pathways responsible for major triterpene biosynthesis in G. trifoliata and loquat, which pre- and postdate the Maleae WGD, respectively. Our work mechanistically details the metabolic diversity that arose post-WGD and provides insights into the genomic basis of medicinal properties of loquat, which has been used in both traditional and modern medicines