    Learning mutational graphs of individual tumour evolution from single-cell and multi-region sequencing data

    Background. A large number of algorithms are being developed to reconstruct evolutionary models of individual tumours from genome sequencing data. Most methods can analyze multiple samples collected either through bulk multi-region sequencing experiments or the sequencing of individual cancer cells. However, the same method can rarely support both data types. Results. We introduce TRaIT, a computational framework to infer mutational graphs that model the accumulation of multiple types of somatic alterations driving tumour evolution. Compared to other tools, TRaIT supports multi-region and single-cell sequencing data within the same statistical framework, and delivers expressive models that capture many complex evolutionary phenomena. TRaIT improves accuracy, robustness to data-specific errors and computational complexity compared to competing methods. Conclusions. We show that the application of TRaIT to single-cell and multi-region cancer datasets can produce accurate and reliable models of single-tumour evolution, quantify the extent of intra-tumour heterogeneity and generate new testable experimental hypotheses.

    Algebraic comparison of metabolic networks, phylogenetic inference, and metabolic innovation

    BACKGROUND: Comparison of metabolic networks is typically performed based on the organisms' enzyme contents. This approach disregards functional replacements as well as orthologies that are misannotated. Direct comparison of the structure of metabolic networks can circumvent these problems. RESULTS: Metabolic networks are naturally represented as directed hypergraphs in such a way that metabolites are nodes and enzyme-catalyzed reactions form (hyper)edges. The familiar operations from set algebra (union, intersection, and difference) form a natural basis for both the pairwise comparison of networks and the identification of distinct metabolic features of a set of organisms. We report here on an implementation of this approach and its application to the procaryotes. CONCLUSION: We demonstrate that metabolic networks contain valuable phylogenetic information by comparing phylogenies obtained from network comparisons with 16S rRNA phylogenies. The algebraic approach to metabolic networks is suitable for studying metabolic innovations in two sets of organisms, free-living microbes and Pyrococci, as well as obligate intracellular pathogens.
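
    The set-algebra view described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Reaction` encoding, the helper names, the toy reactions, and the Jaccard-style distance are all assumptions chosen for illustration.

```python
from typing import FrozenSet, Set, Tuple

# A reaction is a directed hyperedge: (substrates, products).
# This encoding is an assumption made for the sketch.
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]
Network = Set[Reaction]


def reaction(substrates, products) -> Reaction:
    """Build a hashable hyperedge from two iterables of metabolite names."""
    return (frozenset(substrates), frozenset(products))


def metabolites(network: Network) -> Set[str]:
    """Nodes of the hypergraph: every metabolite touched by some reaction."""
    nodes: Set[str] = set()
    for substrates, products in network:
        nodes |= substrates | products
    return nodes


def jaccard_distance(a: Network, b: Network) -> float:
    """Illustrative pairwise distance: 1 - |A & B| / |A | B| (an assumption)."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


# Toy networks with hypothetical reactions, for illustration only.
org_a: Network = {
    reaction(["glucose", "ATP"], ["G6P", "ADP"]),
    reaction(["G6P"], ["F6P"]),
}
org_b: Network = {
    reaction(["G6P"], ["F6P"]),
    reaction(["F6P", "ATP"], ["FBP", "ADP"]),
}

shared = org_a & org_b      # reactions present in both networks
combined = org_a | org_b    # reactions present in either network
only_in_a = org_a - org_b   # reactions specific to org_a
```

    Because each hyperedge is hashable, Python's built-in set operators act directly on whole networks, and a matrix of pairwise distances of this kind could in principle feed a standard distance-based tree-building method.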

    Group-theoretic models of the inversion process in bacterial genomes

    The variation in genome arrangements among bacterial taxa is largely due to the process of inversion. Recent studies indicate that not all inversions are equally probable, suggesting, for instance, that shorter inversions are more frequent than longer ones, and that those that move the terminus of replication are less probable than those that do not. Current methods for establishing the inversion distance between two bacterial genomes are unable to incorporate such information. In this paper we suggest a group-theoretic framework that in principle can take these constraints into account. In particular, we show that by lifting the problem from circular permutations to the affine symmetric group, the inversion distance can be found in polynomial time for a model in which inversions are restricted to acting on two regions. This requires the proof of new results in group theory, and suggests a vein of new combinatorial problems concerning permutation groups on which group theorists will be needed to collaborate with biologists. We apply the new method to inferring distances and phylogenies for published Yersinia pestis data. Comment: 19 pages, 7 figures, in press, Journal of Mathematical Biolog
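
    To make the notion of an inversion on a circular genome concrete, here is a small Python sketch. It only applies one inversion to a circular arrangement. It does not compute the inversion distance and does not reproduce the paper's group-theoretic method. The function name `invert`, the list encoding, and the optional orientation flag are assumptions.

```python
from typing import List


def invert(genome: List[int], start: int, length: int,
           flip_orientation: bool = False) -> List[int]:
    """Reverse `length` consecutive regions of a circular genome.

    The segment starts at index `start` and wraps around the end of the
    list. If `flip_orientation` is True, each region's sign is negated as
    well (a signed model, included here only as an option).
    """
    n = len(genome)
    if not 0 < length <= n:
        raise ValueError("length must be between 1 and len(genome)")
    positions = [(start + k) % n for k in range(length)]
    segment = [genome[p] for p in reversed(positions)]
    if flip_orientation:
        segment = [-g for g in segment]
    result = list(genome)
    for p, region in zip(positions, segment):
        result[p] = region
    return result


# A two-region inversion swaps two adjacent regions.
swapped = invert([1, 2, 3, 4, 5], start=1, length=2)
# The same operation across the wrap-around point of the circle.
wrapped = invert([1, 2, 3, 4, 5], start=4, length=2)
```

    Restricting `length` to 2 gives the two-region model mentioned in the abstract, in which every move exchanges a pair of neighbouring regions on the circle.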

    A Fast-Graph Approach to Modeling Similarity of Whole Genomes

    As increasing numbers of closely related genomic sequences become available, the need to develop methods for detecting fine differences among them also grows apparent. Several calls have been made for improved algorithms to exploit the wealth of pathogenic viral and bacterial sequence data that are rapidly becoming available to researchers. The first stage of our research addresses the computational limitations associated with whole-genome comparisons of large numbers of subspecies sequences. We investigate the potential for the use of fast, word-based comparative measures to approximate computationally expensive, full alignment comparison methods. Recent advances in next generation sequencing are providing a number of large whole-genome sequence datasets stemming from globally distributed disease occurrences. This offers an unprecedented opportunity for epidemiological studies and the development of computationally efficient, robust tools for such studies. In the second stage of our research, we present an approach that enables a quick, effective, and robust epidemiological analysis of large whole-genome datasets. We then apply our method to a complex dataset of over 4,200 globally sampled Influenza A virus isolates from multiple host types, subtypes and years. These sequences are compared using an alignment-free method that runs in linear time. These comparisons enable us to build 2-dimensional graphs that represent the relationships between sequences, where sequences are viewed as vertices and a high degree of sequence similarity as edges. These graphs prove useful, as they are able to model potential disease transmission paths when applied to viral sequences. Mixing patterns are then used to study the occurrence and patterns of edges between different types of sequence groups, such as the host type and year of collection, to better understand the potential of genotypic transfer between sequence groups.
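
    A word-based, alignment-free comparison of the kind described here can be sketched in a few lines of Python. The abstract does not name the specific measure used, so the k-mer count profiles, the cosine similarity, the threshold, and all function names below are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from typing import Dict, List, Tuple


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping word of length k (one linear pass over seq)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two k-mer count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def similarity_graph(seqs: Dict[str, str], k: int,
                     threshold: float) -> List[Tuple[str, str]]:
    """Return edges (u, v) between sequences whose similarity >= threshold."""
    profiles = {name: kmer_counts(s, k) for name, s in seqs.items()}
    return [
        (u, v)
        for u, v in combinations(sorted(profiles), 2)
        if cosine_similarity(profiles[u], profiles[v]) >= threshold
    ]


# Toy sequences (hypothetical, not real isolates).
toy = {"a": "ACGTACGT", "b": "ACGTACGT", "c": "TTTTTTTT"}
edges = similarity_graph(toy, k=3, threshold=0.9)
```

    In this sketch, building each profile is linear in the sequence length, while comparing all pairs is still quadratic in the number of sequences. The resulting edge list has sequences as vertices and high similarity as edges, as in the graphs the abstract describes.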

    Spatial models of speciation (Modelos espaciais de especiação)

    Advisor: Marcus Aloizio Martinez de Aguiar. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Biologia. Abstract: The impressive diversity observed in nature makes us wonder what processes could be responsible for such great variety. Answering this question has been the goal of many evolutionary biologists, who have tried to discover the processes by looking at the patterns they would generate. The development of theoretical models, particularly individual-based models, is indispensable for addressing this question, because only with models can we isolate specific processes in a controlled environment, which is not completely possible in natural experiments, and within a feasible time. In this thesis I investigated the patterns generated by an individual-based model of speciation in which only neutral processes and space regulate the population dynamics. The population evolved under the combined influences of sexual reproduction, mutation and dispersal. In the first chapter, we developed an algorithm that records the ancestor-descendant relationships between each pair of individuals of the final community, and an algorithm that records the exact speciation and extinction times of species. With both pieces of information it was possible to construct genealogies and phylogenies, from which macroevolutionary patterns were derived, offering a neutral reference for evolution. The second chapter used this new phylogenetic information from the model to investigate whether different geographical contexts of speciation (parapatric and sympatric) leave distinct signatures in the macroevolutionary patterns of diversification, such as tree symmetry and the speed of diversification. The simulation results were compared with empirical data on evolutionary radiations. The third chapter, lastly, incorporated spatial barriers into the previous model to look for possible signatures left by allopatric speciation, with barriers varying in size and allowing individuals to cross depending on the individual size. The model was adapted to the particular system of Platyrrhini monkeys, with space modeled to fit the shape of South America and spatial barriers representing the main rivers of the region. The number of generations was adapted to different subfamilies and genera of Platyrrhini monkeys, with the aim of examining the Riverine Hypothesis with a modeling approach. The results from the three chapters showed that space plays a fundamental role in speciation when neutral processes are the only ones acting upon populations, with the geographic context of speciation leaving signatures in the emergent macroevolutionary patterns. Incorporating non-neutral processes and investigating the role of extinction in shaping the patterns are possible next steps for this research. Doctorate. Ecology. Doctor in Ecology. CAPE

    Algebraic comparison of metabolic networks, phylogenetic inference, and metabolic innovation

    Metabolic networks are naturally represented as directed hypergraphs in such a way that metabolites are nodes and enzyme-catalyzed reactions form (hyper)edges. The familiar operations from set algebra (union, intersection, and difference) form a natural basis for both the pairwise comparison of networks and the identification of distinct metabolic features of a set of organisms. We report here on an implementation of this approach and its application to the procaryotes. We demonstrate that metabolic networks contain valuable phylogenetic information by comparing phylogenies obtained from network comparisons with 16S rRNA phylogenies. We then used the same software to study metabolic innovations in two sets of organisms, free-living microbes and Pyrococci, as well as obligate intracellular pathogens.

    Consistency and convergence rate of phylogenetic inference via regularization

    It is common in phylogenetics to have some, perhaps partial, information about the overall evolutionary tree of a group of organisms and to wish to find an evolutionary tree of a specific gene for those organisms. There may not be enough information in the gene sequences alone to accurately reconstruct the correct "gene tree." Although the gene tree may deviate from the "species tree" due to a variety of genetic processes, in the absence of evidence to the contrary it is parsimonious to assume that they agree. A common statistical approach in these situations is to develop a likelihood penalty to incorporate such additional information. Recent studies using simulation and empirical data suggest that a likelihood penalty quantifying concordance with a species tree can significantly improve the accuracy of gene tree reconstruction compared to using sequence data alone. However, the consistency of such an approach has not yet been established, nor have convergence rates been bounded. Because phylogenetics is a non-standard inference problem, the standard theory does not apply. In this paper, we propose a penalized maximum likelihood estimator for gene tree reconstruction, where the penalty is the square of the Billera-Holmes-Vogtmann geodesic distance from the gene tree to the species tree. We prove that this method is consistent, and derive its convergence rate for estimating the discrete gene tree structure and continuous edge lengths (representing the amount of evolution that has occurred on that branch) simultaneously. We find that the regularized estimator is "adaptive fast converging," meaning that it can reconstruct all edges of length greater than any given threshold from gene sequences of polynomial length. Our method does not require the species tree to be known exactly; in fact, our asymptotic theory holds for any such guide tree. Comment: 34 pages, 5 figures. To appear in The Annals of Statistic
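
    The penalized estimator described in this abstract can be written schematically as follows. The abstract gives no notation, so the symbols, the 1/n normalization of the log-likelihood, and the form of the penalty weight are assumptions. Here \ell_n is the log-likelihood of n aligned sites, T^{\ast} is the guide (species) tree, d_{\mathrm{BHV}} is the Billera-Holmes-Vogtmann geodesic distance, and \lambda_n > 0 is a regularization weight. The `\operatorname*` command assumes the amsmath package.

```latex
\[
  \hat{T}_n \;=\; \operatorname*{arg\,max}_{T}
  \left\{ \frac{1}{n}\,\ell_n(T) \;-\; \lambda_n \, d_{\mathrm{BHV}}(T, T^{\ast})^{2} \right\}
\]
```

    The maximization runs over gene trees T with both topology and edge lengths free, so a single objective covers the discrete structure and the continuous branch lengths together.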