721 research outputs found

    MRL and SuperFine+MRL: new supertree methods

    Background: Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0, 1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.
    Results: We compared MRP and SuperFine+MRP with MRL and SuperFine+MRL on simulated and biological datasets. We examined the MRP and MRL scores of each method on a wide range of datasets, as well as the resulting topological accuracy of the trees. Our experimental results show that MRL, coupled with a very good ML heuristic such as RAxML, produced more accurate trees than MRP, and MRL scores were more strongly correlated with topological accuracy than MRP scores.
    Conclusions: SuperFine+MRP, when based upon a good MP heuristic, such as TNT, produces among the best scores for both MRP and MRL, and is generally faster and more topologically accurate than other supertree methods we tested.
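The encoding step described in this abstract can be illustrated with a minimal sketch. It assumes rooted source trees written as nested Python tuples of taxon names, and it uses the common clade-based encoding (1 for taxa inside a clade, 0 for taxa in the same source tree but outside the clade, ? for taxa missing from that tree). The function names `clades` and `mrp_matrix` are hypothetical, and the paper's own pipeline may differ in details such as the handling of unrooted trees.

```python
def clades(tree):
    """Return (leaf set, list of clades) for a rooted tree of nested tuples."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)  # the clade rooted at this internal node
    return leaves, found


def mrp_matrix(source_trees):
    """Build one character string over {0, 1, ?} per taxon."""
    taxa = sorted(set().union(*(clades(t)[0] for t in source_trees)))
    rows = {taxon: [] for taxon in taxa}
    for tree in source_trees:
        tree_taxa, tree_clades = clades(tree)
        for clade in tree_clades:
            # Skip uninformative clades: fewer than two taxa, or the whole tree.
            if len(clade) < 2 or clade == tree_taxa:
                continue
            for taxon in taxa:
                if taxon not in tree_taxa:
                    rows[taxon].append("?")  # taxon absent from this source tree
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


if __name__ == "__main__":
    trees = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
    for taxon, row in mrp_matrix(trees).items():
        print(taxon, row)
```

The resulting matrix is what a parsimony heuristic (for MRP) or a 2-state likelihood heuristic (for MRL) would then analyze; that search step is not shown here.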

    Using Sequence Similarity Networks for Visualization of Relationships Across Diverse Protein Superfamilies

    The dramatic increase in heterogeneous types of biological data—in particular, the abundance of new protein sequences—requires fast and user-friendly methods for organizing this information in a way that enables functional inference. The most widely used strategy to link sequence or structure to function, homology-based function prediction, relies on the fundamental assumption that sequence or structural similarity implies functional similarity. New tools that extend this approach are still urgently needed to associate sequence data with biological information in ways that accommodate the real complexity of the problem, while being accessible to experimental as well as computational biologists. To address this, we have examined the application of sequence similarity networks for visualizing functional trends across protein superfamilies from the context of sequence similarity. Using three large groups of homologous proteins of varying types of structural and functional diversity—GPCRs and kinases from humans, and the crotonase superfamily of enzymes—we show that overlaying networks with orthogonal information is a powerful approach for observing functional themes and revealing outliers. In comparison to other primary methods, networks provide both a good representation of group-wise sequence similarity relationships and a strong visual and quantitative correlation with phylogenetic trees, while enabling analysis and visualization of much larger sets of sequences than trees or multiple sequence alignments can easily accommodate. We also define important limitations and caveats in the application of these networks. As a broadly accessible and effective tool for the exploration of protein superfamilies, sequence similarity networks show great potential for generating testable hypotheses about protein structure-function relationships

    Formulations and algorithms for the optimum communication spanning tree problem

    The Optimum Communication Spanning Tree problem (OCT) has applications in many fields of study such as logistics, telecommunications and bioinformatics. The problem takes as input an undirected graph with weighted edges and a requirement value for each pair of nodes, and seeks a spanning tree that minimizes the communication cost, given by the sum over all pairs of nodes of the pair's requirement times the distance separating the two nodes in the tree. In this work we design a new integer formulation for OCT as well as four different evolutionary algorithm strategies and a combined strategy with simulated annealing. We give public access to our implementations. We test our approaches on instances from the literature and from real-world data sets. The experiments show that our best strategies obtained very accurate solutions, getting close to the best known value for all tested instances and improving on the results of previous metaheuristics from the literature.
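The objective described in this abstract can be made concrete with a small sketch that evaluates the communication cost of one candidate spanning tree. The representation (edges as `(u, v, weight)` triples, requirements as a dictionary keyed by node pairs) and the function names are assumptions made for illustration; this only scores a given tree and is not the authors' formulation or search algorithm.

```python
from collections import defaultdict


def tree_distances(tree_edges, source):
    """Distances from source to every node along the unique tree paths."""
    adjacency = defaultdict(list)
    for u, v, weight in tree_edges:
        adjacency[u].append((v, weight))
        adjacency[v].append((u, weight))
    distance = {source: 0}
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbour, weight in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + weight
                stack.append(neighbour)
    return distance


def communication_cost(tree_edges, requirements):
    """Sum of requirement(u, v) times the tree distance between u and v."""
    cost = 0
    for (u, v), requirement in requirements.items():
        cost += requirement * tree_distances(tree_edges, u)[v]
    return cost


if __name__ == "__main__":
    requirements = {("a", "b"): 1, ("a", "c"): 2, ("b", "c"): 4}
    path_tree = [("a", "b", 2), ("b", "c", 3)]
    star_tree = [("a", "b", 2), ("a", "c", 4)]
    print(communication_cost(path_tree, requirements))
    print(communication_cost(star_tree, requirements))
```

Comparing the two candidate trees shows why the choice of tree matters: the heavily demanded pair `("b", "c")` is adjacent in the first tree and two edges apart in the second.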

    The Transitive Minimum Manhattan Subnetwork Problem in 3 Dimensions

    We consider the Minimum Manhattan Subnetwork (MMSN) Problem which generalizes the already known Minimum Manhattan Network (MMN) Problem: Given a set $P$ of $n$ points in the plane, find shortest rectilinear paths between all pairs of points. These paths form a network, the total length of which has to be minimized. From a graph theoretical point of view, a MMN is a 1-spanner with respect to the $L_1$ metric. In contrast to the MMN problem, a solution to the MMSN problem does not demand $L_1$-shortest paths for all point pairs, but only for a given set $R \subseteq P \times P$ of pairs. The complexity status of the MMN problem is still unsolved in $\geq 2$ dimensions, whereas the MMSN was shown to be NP-complete considering general relations $R$ in the plane. We restrict the MMSN problem to transitive relations $R_T$ (Transitive Minimum Manhattan Subnetwork (TMMSN) Problem) and show that the TMMSN problem is Max-SNP-complete with $\epsilon < \frac{1}{8}$ in 3 dimensions
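The feasibility condition in this abstract (an L_1-shortest path in the network for every required pair) can be checked with a short sketch. It assumes the network is given as axis-parallel segments between integer points, already split at every junction so that segment endpoints are the only vertices; the names `l1`, `network_distance` and `satisfies_relation` are hypothetical. It verifies a given network and does not attempt to minimize its length.

```python
import heapq
from collections import defaultdict


def l1(p, q):
    """Manhattan (L_1) distance between two points of equal dimension."""
    return sum(abs(a - b) for a, b in zip(p, q))


def network_distance(segments, source, target):
    """Shortest path length inside the network (Dijkstra's algorithm)."""
    adjacency = defaultdict(list)
    for p, q in segments:
        length = l1(p, q)  # assumes each segment is axis-parallel
        adjacency[p].append((q, length))
        adjacency[q].append((p, length))
    best = {source: 0}
    heap = [(0, source)]
    while heap:
        dist, node = heapq.heappop(heap)
        if node == target:
            return dist
        if dist > best.get(node, float("inf")):
            continue  # stale heap entry
        for neighbour, length in adjacency[node]:
            candidate = dist + length
            if candidate < best.get(neighbour, float("inf")):
                best[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return float("inf")


def satisfies_relation(segments, relation):
    """True if every required pair is joined by an L_1-shortest path."""
    return all(network_distance(segments, p, q) == l1(p, q) for p, q in relation)


if __name__ == "__main__":
    segments = [((0, 0, 0), (1, 0, 0)), ((1, 0, 0), (1, 1, 0)), ((1, 0, 0), (2, 0, 0))]
    relation = [((0, 0, 0), (1, 1, 0)), ((1, 1, 0), (2, 0, 0))]
    print(satisfies_relation(segments, relation))
```

The total network length that the optimization problem minimizes would be the sum of `l1(p, q)` over all segments.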

    Molecular phylogeny of brachiopods and phoronids based on nuclear-encoded small subunit ribosomal RNA gene sequences

    Brachiopod and phoronid phylogeny is inferred from SSU rDNA sequences of 28 articulate and nine inarticulate brachiopods, three phoronids, two ectoprocts and various outgroups, using gene trees reconstructed by weighted parsimony, distance and maximum likelihood methods. Of these sequences, 33 from brachiopods, two from phoronids and one each from an ectoproct and a priapulan are newly determined. The brachiopod sequences belong to 31 different genera and thus survey about 10% of extant genus-level diversity. Sequences determined in different laboratories and those from closely related taxa agree well, but evidence is presented suggesting that one published phoronid sequence (GenBank accession UO12648) is a brachiopod-phoronid chimaera, and this sequence is excluded from the analyses. The chiton, Acanthopleura, is identified as the phenetically proximal outgroup; other selected outgroups were chosen to allow comparison with recent, non-molecular analyses of brachiopod phylogeny. The different outgroups and methods of phylogenetic reconstruction lead to similar results, with differences mainly in the resolution of weakly supported ancient and recent nodes, including the divergence of inarticulate brachiopod sub-phyla, the position of the rhynchonellids in relation to long- and short-looped articulate brachiopod clades and the relationships of some articulate brachiopod genera and species. Attention is drawn to the problem presented by nodes that are strongly supported by non-molecular evidence but receive only low bootstrap resampling support. Overall, the gene trees agree with morphology-based brachiopod taxonomy, but novel relationships are tentatively suggested for thecideidine and megathyrid brachiopods. Articulate brachiopods are found to be monophyletic in all reconstructions, but monophyly of inarticulate brachiopods and the possible inclusion of phoronids in the inarticulate brachiopod clade are less strongly established. 
    Phoronids are clearly excluded from a sister-group relationship with articulate brachiopods, this proposed relationship being due to the rejected, chimaeric sequence (GenBank UO12648). Lineage relative rate tests show no heterogeneity of evolutionary rate among articulate brachiopod sequences, but indicate that inarticulate brachiopod plus phoronid sequences evolve somewhat more slowly. Both brachiopods and phoronids evolve slowly by comparison with other invertebrates. A number of palaeontologically dated times of earliest appearance are used to make upper and lower estimates of the global rate of brachiopod SSU rDNA evolution, and these estimates are used to infer the likely divergence times of other nodes in the gene tree. There is reasonable agreement between most inferred molecular and palaeontological ages. The estimated rates of SSU rDNA sequence evolution suggest that the last common ancestor of brachiopods, chitons and other protostome invertebrates (Lophotrochozoa and Ecdysozoa) lived deep in Precambrian time. Results of this first DNA-based, taxonomically representative analysis of brachiopod phylogeny are in broad agreement with current morphology-based classification and systematics and are largely consistent with the hypothesis that brachiopod shell ontogeny and morphology are a good guide to phylogeny

    Culture Enriched Molecular Profiling of the Cystic Fibrosis Airway Microbiome

    The microbiome of the respiratory tract, including the nasopharyngeal and oropharyngeal microbiota, is a dynamic and highly diverse community of microorganisms. The cystic fibrosis (CF) airway microbiome refers to the polymicrobial communities present in the lower airways of CF patients. It comprises chronic opportunistic pathogens (such as Pseudomonas aeruginosa) and a variety of organisms derived mostly from the normal microbiota of the upper respiratory tract. The complexity of these communities has been inferred primarily from culture-independent molecular profiling. As with most microbial communities, it is generally assumed that most of the organisms present are not readily cultured. Our culture collection, generated using more extensive cultivation approaches, reveals a more complex microbial community than that obtained by conventional CF culture methods. To directly evaluate the cultivability of the airway microbiome, we examined six samples in depth using culture-enriched molecular profiling, which combines culture-based methods with the molecular profiling methods of terminal restriction fragment length polymorphisms and 16S rRNA gene sequencing. We demonstrate that combining culture-dependent and culture-independent approaches gives greater sensitivity than either approach alone. Our techniques were able to cultivate 43 of the 48 families detected by deep sequencing; the five families recovered solely by culture-independent approaches were all present at very low abundance (<0.002% total reads). Of the molecular signatures detected by culture from the six patients, 46% were identified only in an anaerobic environment, suggesting that a large proportion of the cultured airway community is composed of obligate anaerobes. Most significantly, using 20 growth conditions per specimen, half of which included anaerobic cultivation and extended incubation times, we demonstrate that the majority of bacteria present can be cultured.

    A Lagrangian relaxation approach for the multiple sequence alignment problem

    We present a branch-and-bound (bb) algorithm for the multiple sequence alignment problem (MSA), one of the most important problems in computational biology. The upper bound at each bb node is based on a Lagrangian relaxation of an integer linear programming formulation for MSA. Dualizing certain inequalities, the Lagrangian subproblem becomes a pairwise alignment problem, which can be solved efficiently by a dynamic programming approach. Due to a reformulation w.r.t. additionally introduced variables prior to relaxation we improve the convergence rate dramatically while at the same time being able to solve the Lagrangian problem efficiently. Our experiments show that our implementation, although preliminary, outperforms all exact algorithms for the multiple sequence alignment problem
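The dynamic programming step mentioned in this abstract can be illustrated with a minimal sketch of plain global pairwise alignment scoring. The scoring parameters (`match`, `mismatch`, `gap`) and the function name are hypothetical; the Lagrangian subproblem in the paper presumably uses scores adjusted by the multipliers of the dualized inequalities, which this sketch does not model.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best global alignment score of strings a and b with linear gap costs."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, cols):
        score[0][j] = j * gap  # b[:j] aligned entirely against gaps
    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

The table has `(len(a) + 1) * (len(b) + 1)` cells and each is filled in constant time, which is why the pairwise problem is cheap compared with exact multiple alignment.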