692 research outputs found

    Cophylogenetic analysis of dated trees

    Parasites and the associations they form with their hosts are an important area of research because of the health risks that parasites pose to the human population. These associations are responsible for a number of the worst emerging diseases impacting global health today, including Ebola, HIV, and malaria. Macro-scale coevolutionary research aims to analyse these associations to provide further insights into these deadly diseases. This approach, first considered by Fahrenholz in 1913, has been applied to hundreds of coevolutionary systems and remains the most robust means of inferring the underlying relationships that form between coevolving species. While reconciling the coevolutionary relationships between a pair of evolutionary systems is NP-hard, it has been shown that a polynomial-time solution exists if dating information is available. These solutions, however, are computationally expensive and are quickly becoming infeasible due to the rapid growth of phylogenetic data. If the rate of growth continues in line with the last three decades, the current means of analysing dated systems will become computationally infeasible. Within this thesis a collection of algorithms is introduced which aims to address this problem. This includes the most efficient solution for analysing dated coevolutionary systems optimally, along with two linear-time heuristics which may be applied where traditional algorithms are no longer feasible, while still offering a high degree of accuracy (91%). Finally, this work integrates these incremental results into a single model which is able to handle widespread parasitism, the case where parasites infect multiple hosts. This proposed model reconciles two competing theories of widespread parasitism, while also providing an accuracy improvement of 21%, one of the largest single improvements provided in this field to date. As such, the set of algorithms introduced within this thesis offers another step toward a unified coevolutionary analysis framework, consistent with Fahrenholz's original coevolutionary analysis model.

    Efficient Change Management of XML Documents

    XML-based documents play a major role in modern information architectures and their corresponding workflows. In this context, the ability to identify and represent differences between two versions of a document is essential. A second important aspect is the merging of document versions, which becomes crucial in parallel editing processes. Many different approaches exist that meet these challenges. Most rely on operational transformation or document annotation. In both approaches, the operations leading to changes are tracked, which requires corresponding editing applications. In the context of software development, however, a state-based approach is common. Here, document versions are compared and merged using external tools, called diff and patch. This allows users to edit documents freely without being tied to special tools. Approaches exist that are able to compare XML documents, but a corresponding merge capability is still not available. In this thesis, I present a comprehensive framework that allows XML documents to be compared and merged using a state-based approach. Its design is based on an analysis of XML documents and their modification patterns. The heart of the framework is a context-oriented delta model. I present a diff algorithm that appears to be highly efficient in terms of speed and delta quality. The patch algorithm is able to merge document versions efficiently and reliably. The efficiency and the reliability of my approach are verified using a competitive test scenario.

    Gene Family Histories: Theory and Algorithms

    Detailed gene family histories and reconciliations with species trees are a prerequisite for studying associations between genetic and phenotypic innovations. Even though the true evolutionary scenarios are usually unknown, they impose certain constraints on the mathematical structure of data obtained from simple yes/no questions in pairwise comparisons of gene sequences. Recent advances in this field have led to the development of methods for reconstructing (aspects of) the scenarios on the basis of such relation data, which can most naturally be represented by graphs on the set of considered genes. We provide here novel characterizations of best match graphs (BMGs), which capture the notion of (reciprocal) best hits based on sequence similarities. BMGs provide the basis for the detection of orthologous genes (genes that diverged after a speciation event). There are two main sources of error in pipelines for orthology inference based on BMGs. The first is that measurement errors in the estimation of best matches from sequence similarity in general lead to violations of the characteristic properties of BMGs. The second concerns the reconstruction of the orthology relation from a BMG. We show how to correct estimated BMGs to mathematically valid ones and how much information about orthologs is contained in BMGs. We then discuss implicit methods for horizontal gene transfer (HGT) inference that focus on pairs of genes that have diverged only after the divergence of the two species in which the genes reside. This situation defines the edge set of an undirected graph, the later-divergence-time (LDT) graph. We explore the mathematical structure of LDT graphs and show how much information about all HGT events is contained in such LDT graphs.
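    As a rough illustration of the "(reciprocal) best hits" notion mentioned in this abstract, the sketch below computes best matches and reciprocal best hits from pairwise similarity scores. It is not the thesis's method: BMGs are characterized there mathematically, whereas this is only the common similarity-based heuristic. The function names, the `scores` and `species` dictionaries, and the toy values are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical toy data: which species each gene belongs to, and
# directed similarity scores (higher means more similar).
species = {"a1": "A", "a2": "A", "b1": "B", "b2": "B"}
scores = {
    ("a1", "b1"): 0.9, ("b1", "a1"): 0.9,
    ("a1", "b2"): 0.4, ("b2", "a1"): 0.4,
    ("a2", "b1"): 0.7, ("b1", "a2"): 0.7,
    ("a2", "b2"): 0.3, ("b2", "a2"): 0.3,
}

def best_matches(scores, species):
    """Directed edges (x, y): y scores highest for x among genes of y's species."""
    top = {}                   # (gene, other species) -> best score seen so far
    best = defaultdict(list)   # (gene, other species) -> genes achieving that score
    for (x, y), s in scores.items():
        if species[x] == species[y]:
            continue           # only compare genes from different species
        key = (x, species[y])
        if key not in top or s > top[key]:
            top[key] = s
            best[key] = [y]
        elif s == top[key]:
            best[key].append(y)  # keep ties as co-optimal best matches
    return {(x, y) for (x, _), ys in best.items() for y in ys}

def reciprocal_best_hits(scores, species):
    """Unordered pairs {x, y} where x and y are best matches of each other."""
    edges = best_matches(scores, species)
    return {frozenset((x, y)) for (x, y) in edges if (y, x) in edges}

rbh = reciprocal_best_hits(scores, species)
```

    In this toy data, `a2` has `b1` as its best match in species B, but `b1` prefers `a1`, so only the pair `a1`/`b1` is reciprocal. Directed best matches and reciprocal ones can differ in exactly this way.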

    Towards Accurate Reconstruction of Phylogenetic Networks

    Since Darwin proposed that all species on Earth have evolved from a common ancestor, evolution has played an important role in understanding biology. While the evolutionary histories of genes are represented using trees, the genomic evolutionary history may not be adequately captured by a tree, as some evolutionary events, such as horizontal gene transfer (HGT), do not fit within the branches of a tree. In this case, phylogenetic networks are more appropriate for modeling evolutionary histories. In this dissertation, we present computational algorithms to reconstruct phylogenetic networks from different types of data. Under the assumption that species have single copies of genes, and that HGT and speciation are the only events in the course of evolution, gene sequences can be sampled one copy per species for HGT detection. Given the alignments of the sequences, we propose systematic methods that estimate the significance of detected HGT events under maximum parsimony (MP) and maximum likelihood (ML). The estimated significance aims to address the overestimation that affects both optimization criteria in the search for phylogenetic networks, and helps the search identify networks with the "right" number of HGT edges. We study their performance on both synthetic and biological data sets. While the studies show very promising results in identifying HGT edges, they also highlight the issues that are challenging for each criterion. We also develop algorithms that estimate the number of HGT events and reconstruct phylogenetic networks from a collection of trees by utilizing the pairwise Subtree-Prune-Regraft (SPR) operation. The methods generally produce good results in quickly estimating the minimum number of HGT events required to reconcile a set of trees. Further, we identify conditions under which the methods do not work well, in order to help the development of new methods in this area. Finally, we extend the assumptions about the genetic evolutionary process to allow for duplication and loss. Under this assumption, we analyze gene family trees of proteobacterial strains using a parsimony-based approach to detect evolutionary events. We also discuss the current issues of parsimony-based approaches in biological data analysis and propose a way to retrieve significant estimates. The evolutionary history of species is complex, involving various evolutionary events. As HGT contributes largely to this complexity, accurately identifying HGT will help untangle evolutionary histories and solve important questions. As our algorithms identify significant HGT events in the data and reconstruct accurate phylogenetic networks from them, they can be used to address questions arising in large-scale biological data analyses.

    The Evolution of Function in the Rab family of Small GTPases

    Dissertation presented to obtain the PhD degree in Computational Biology. The question of how protein function evolves is a fundamental problem with profound implications for both functional and evolutionary studies on proteins. Here, we review some of the work that has addressed or contributed to this question. We identify and comment on three different levels relevant for the evolution of protein function. First, biochemistry. This is the focus of our discussion, as protein function itself commonly receives least attention in studies on protein evolution.(...

    Méthodes et algorithmes pour l’amélioration de l’inférence de l’histoire évolutive des génomes

    Gene trees offer an ideal framework for comparative genomics. Not only do they incorporate species evolution through speciation events, but they also capture gene family expansion and contraction through gene gains and losses. Determining the order and nature of these events amounts to inferring the evolutionary history of gene families and is a prerequisite for many comparative genomics analyses. In particular, it is needed to determine orthology relationships between genes, which matter for predicting protein structure and function and for phylogenetic analyses. Methods for inferring gene family histories assume that the gene trees considered are free of errors. However, gene trees, often reconstructed from amino acid or nucleotide sequences, are only an estimate of the true gene tree and are subject to errors from varied but well-documented sources. Gene trees constructed with these methods will usually introduce biases into the inferred gene family histories. In this thesis, we present new methods that aim to improve the quality of gene trees and thereby the accuracy of the evolutionary histories inferred for the corresponding gene families. We start by studying genetic code deviations, one source of annotation errors that then propagate into phylogenies. Our methodology analyses coding sequences and tRNAs to infer deviations from the standard genetic code, and is centred on a codon reassignment prediction algorithm called CoreTracker. We first show the effectiveness of the method, then use it to demonstrate the evolution of the genetic code in the mitochondrial genomes of green algae. The second part of this thesis focuses on the development of efficient species-tree-aware methods for gene tree correction and construction. The first method, ProfileNJ, is deterministic and very fast; it corrects gene trees by targeting only subtrees with weak statistical support. Applied to the gene families of the Ensembl Compara database, ProfileNJ improves tree quality compared with the trees provided by the database. The second method, GATC, uses a genetic algorithm and allows both construction and correction of gene trees. It treats the problem as a multi-objective optimisation of gene tree topology, seeking trees that are optimal for both the sequence data and the evolution of the gene family through gene gain and loss. We show that this approach yields accurate trees and is suitable for constructing reference datasets to benchmark other methods.

    Lattice-based Key Sharing Schemes - A Survey

    Public key cryptography is an indispensable component of almost all of our present-day digital infrastructure. However, most if not all of it is built upon hardness guarantees of number-theoretic problems that can be broken by large-scale quantum computers in the future. Sensing the imminent threat from continued advances in quantum computing, NIST has recently initiated a global standardization process for quantum-resistant public-key cryptographic primitives such as public key encryption, digital signatures and key encapsulation mechanisms. While the process received proposals from various categories of post-quantum cryptography, lattice-based cryptography features most prominently among the submissions. Lattice-based cryptography offers a very attractive alternative to traditional public-key cryptography, mainly due to the variety of lattice-based schemes offering varying flavors of security and efficiency guarantees. In this paper, we survey the evolution of lattice-based key sharing schemes (public key encryption and key encapsulation schemes) and cover aspects ranging from theoretical security guarantees, general algorithmic frameworks and practical implementation to security against physical attacks, with special focus on the lattice-based key sharing schemes competing in NIST's standardization process. Please note that our work is focussed on the results available from the second round of NIST's standardization process, while the process had progressed to the third and final round at the time of publishing this document.