9 research outputs found

    An Evolutionary Analysis of Lateral Gene Transfer in Thymidylate Synthase Enzymes

    Thymidylate synthases (Thy) are key enzymes in the synthesis of deoxythymidylate, 1 of the 4 building blocks of DNA. As such, they are essential for all DNA-based forms of life and are therefore implicated in the hypothesized transition from RNA genomes to DNA genomes. Two evolutionarily unrelated Thy enzymes, ThyA and ThyX, are known to catalyze the same biochemical reaction. Both enzymes are sporadically distributed within each of the 3 domains of life in a pattern that suggests multiple nonhomologous lateral gene transfer (LGT) events. We present a phylogenetic analysis of the evolution of the 2 enzymes, aimed at unraveling their entangled evolutionary history and tracing their origin back to early life. A novel probabilistic evolutionary model was developed, which allowed us to compute the posterior probabilities and the posterior expectation of the number of LGT events. Simulation studies were performed to validate the model's ability to accurately detect LGT events that have occurred throughout a large phylogeny. Applying the model to the Thy data revealed widespread nonhomologous LGT between and within all 3 domains of life. By reconstructing the ThyA and ThyX gene trees, the most likely donor of each LGT event was inferred. Finally, the role of viruses in LGT of Thy is discussed.

    Contextual Word Similarity and Estimation from Sparse Data

    In recent years there has been much interest in word cooccurrence relations, such as n-grams, verb-object combinations, or cooccurrence within a limited context. This paper discusses how to estimate the likelihood of cooccurrences that do not occur in the training data. We present a method that makes local analogies between each specific unobserved cooccurrence and other cooccurrences that contain similar words. These analogies are based on the assumption that similar word cooccurrences have similar values of mutual information. Accordingly, the word similarity metric captures similarities between vectors of mutual information values. Our evaluation suggests that this method performs better than existing frequency-based smoothing methods, and may provide an alternative to class-based models. A background survey is included, covering issues of lexical cooccurrence, data sparseness and smoothing, word similarity and clustering, and mutual information.
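
    The idea in this abstract, estimating an unobserved cooccurrence from the mutual information of cooccurrences that contain similar words, can be illustrated with a minimal sketch. This is not the paper's actual formulation: the min/max similarity over positive mutual information values, the similarity-weighted average, and the function names `mutual_information`, `similarity`, and `estimate_unseen` are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(pair_counts):
    """Pointwise mutual information (base 2) for every observed (w1, w2) pair."""
    total = sum(pair_counts.values())
    left, right = Counter(), Counter()
    for (w1, w2), c in pair_counts.items():
        left[w1] += c
        right[w2] += c
    mi = {
        (w1, w2): math.log2(c * total / (left[w1] * right[w2]))
        for (w1, w2), c in pair_counts.items()
    }
    return mi, left, right, total


def similarity(mi, a, b):
    """Assumed metric: sum of min over sum of max of positive MI values."""
    ctx_a = {w2: v for (w1, w2), v in mi.items() if w1 == a and v > 0}
    ctx_b = {w2: v for (w1, w2), v in mi.items() if w1 == b and v > 0}
    shared = ctx_a.keys() & ctx_b.keys()
    num = sum(min(ctx_a[c], ctx_b[c]) for c in shared)
    den = sum(max(ctx_a.get(c, 0.0), ctx_b.get(c, 0.0))
              for c in ctx_a.keys() | ctx_b.keys())
    return num / den if den else 0.0


def estimate_unseen(pair_counts, w1, w2):
    """Estimate P(w1, w2) for an unobserved pair from words similar to w1.

    Returns None when no similar word was observed together with w2.
    """
    mi, left, right, total = mutual_information(pair_counts)
    weighted = norm = 0.0
    for other in left:
        if other == w1 or (other, w2) not in mi:
            continue
        s = similarity(mi, w1, other)
        weighted += s * mi[(other, w2)]
        norm += s
    if norm == 0:
        return None
    est_mi = weighted / norm  # similarity-weighted average MI
    # Invert the MI definition: P(w1, w2) = P(w1) * P(w2) * 2 ** MI
    return (left[w1] / total) * (right[w2] / total) * 2 ** est_mi
```

    Under these assumptions, a word seen in few contexts borrows mutual information from words that share its observed contexts, so the estimate for an unseen pair does not depend on frequency alone.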
