34 research outputs found

    Genomic Relationships and Speciation Times of Human, Chimpanzee, and Gorilla Inferred from a Coalescent Hidden Markov Model

    The genealogical relationship of human, chimpanzee, and gorilla varies along the genome. We develop a hidden Markov model (HMM) that incorporates this variation and relate the model parameters to population genetics quantities such as speciation times and ancestral population sizes. Our HMM is an analytically tractable approximation to the coalescent process with recombination, and in simulations we see no apparent bias in the HMM estimates. We apply the HMM to four autosomal contiguous human–chimp–gorilla–orangutan alignments comprising a total of 1.9 million base pairs. We find a very recent speciation time of human–chimp (4.1 ± 0.4 million years), and fairly large ancestral effective population sizes (65,000 ± 30,000 for the human–chimp ancestor and 45,000 ± 10,000 for the human–chimp–gorilla ancestor). Furthermore, around 50% of the human genome coalesces with chimpanzee after speciation with gorilla. We also consider 250,000 base pairs of X-chromosome alignments and find an effective population size much smaller than 75% of the autosomal effective population sizes. Finally, we find that the rate of transitions between different genealogies correlates well with the region-wide present-day human recombination rate, but does not correlate with the fine-scale recombination rates and recombination hot spots, suggesting that the latter are evolutionarily transient.
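    To make the structure of such a model concrete, here is a minimal Python sketch of the scaled forward algorithm for an HMM whose hidden states are the four genealogies of a human–chimp–gorilla triple. The state labels (HC1, HC2, HG, CG), the toy site-pattern alphabet, the single switch probability and every emission value are illustrative assumptions. The abstract relates the real model's parameters to speciation times, ancestral population sizes and recombination; this toy does not attempt that mapping.

```python
import math
from typing import Dict, List, Sequence

# Hidden states: four genealogies of human (H), chimp (C) and gorilla (G).
# Labels are assumed for illustration: HC1 = human and chimp coalesce before
# the gorilla split (going back in time), HC2 = they coalesce first but only
# in the deeper ancestor, HG / CG = the two incongruent genealogies.
STATES = ["HC1", "HC2", "HG", "CG"]

# Hypothetical emission probabilities over a toy site-pattern alphabet:
# which pair shares a derived base ("none" = uninformative site).
EMIT: List[Dict[str, float]] = [
    {"HC": 0.7, "HG": 0.1, "CG": 0.1, "none": 0.1},  # HC1
    {"HC": 0.5, "HG": 0.1, "CG": 0.1, "none": 0.3},  # HC2
    {"HC": 0.1, "HG": 0.6, "CG": 0.1, "none": 0.2},  # HG
    {"HC": 0.1, "HG": 0.1, "CG": 0.6, "none": 0.2},  # CG
]


def transition_matrix(switch_prob: float) -> List[List[float]]:
    """Toy transitions: keep the genealogy with prob 1 - switch_prob,
    otherwise jump uniformly to one of the other three (an assumption,
    not the paper's parameterisation)."""
    n = len(STATES)
    off = switch_prob / (n - 1)
    return [[1.0 - switch_prob if i == j else off for j in range(n)] for i in range(n)]


def forward_loglik(obs: Sequence[str], init: Sequence[float],
                   trans: List[List[float]], emit: List[Dict[str, float]]) -> float:
    """Scaled forward algorithm: log P(obs), summed over all genealogy paths."""
    n = len(init)
    alpha = [init[k] * emit[k][obs[0]] for k in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            # Propagate through the transition matrix, then emit obs[t].
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)  # equals P(obs[t] | obs[:t])
        alpha = [a / scale for a in alpha]
        loglik += math.log(scale)
    return loglik
```

    For example, `forward_loglik(["HC", "HC", "none", "HG"], [0.25] * 4, transition_matrix(0.01), EMIT)` gives the log-likelihood of a four-site toy alignment; maximising such a likelihood over the model parameters is what turns an HMM of this kind into an estimator.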

    Assignment of isochores for all completely sequenced vertebrate genomes using a consensus

    A new consensus isochore assignment method and a database of isochore maps for all completely sequenced vertebrate genomes are presented.

    Evolutionary Sequence Modeling for Discovery of Peptide Hormones

    There are currently a large number of “orphan” G-protein-coupled receptors (GPCRs) whose endogenous ligands (peptide hormones) are unknown. Identifying these peptide hormones is a difficult and important problem. We describe a computational framework that models spatial structure along the genomic sequence simultaneously with the temporal evolutionary path structure across species, and we show how such models can be used to discover new functional molecules, in particular peptide hormones, via cross-genomic sequence comparisons. The framework incorporates a priori high-level knowledge of structural and evolutionary constraints into a hierarchical grammar of evolutionary probabilistic models. This method was used to identify novel prohormones and their processed peptide sites by producing sequence alignments across many species at the functional-element level. An initial implementation of the algorithm was used to identify potential prohormones by comparing the human and non-human proteins in the Swiss-Prot database of known annotated proteins. In this proof of concept, we identified 45 of 54 prohormones with only 44 false positives. Comparing known and hypothetical human and mouse proteins identified a novel putative prohormone with at least four potential neuropeptides. Finally, to validate the computational methodology, we present a basic molecular biological characterization of the novel putative peptide hormone, including its identification and regional localization in the brain. This species-comparison, HMM-based computational approach succeeded in identifying a previously undiscovered neuropeptide from whole-genome protein sequences. The novel putative peptide hormone is found in discrete brain regions as well as in other organs. The success of this approach will have a great impact on our understanding of GPCRs and associated pathways and help to identify new targets for drug development.
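    As a back-of-the-envelope reading of the proof-of-concept figures, assuming that the 44 false positives are predicted prohormones not among the 54 known ones (so there are 45 true positives and 9 missed), the counts correspond roughly to the following sensitivity and precision. These ratios are derived here and are not reported in the abstract.

```latex
\text{sensitivity} = \frac{TP}{TP + FN} = \frac{45}{45 + 9} = \frac{45}{54} \approx 0.833,
\qquad
\text{precision} = \frac{TP}{TP + FP} = \frac{45}{45 + 44} = \frac{45}{89} \approx 0.506
```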

    Gene identification using phylogenetic metrics with conditional random fields

    Thesis (S.M.), Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007, by Ameya Nitin Deoras. Includes bibliographical references (p. 69-72).
    While the complete sequence of the human genome contains all the information necessary to encode a complete human being, its interpretation remains a major challenge of modern biology. The first step in any genomic analysis is a comprehensive and accurate annotation of all genes encoded in the genome, which provides the basis for understanding human variation, gene regulation, health and disease. Traditionally, computational gene prediction has been addressed using graphical probabilistic models of genomic sequence. Although such models have been successful for small genomes with relatively simple gene structure, new methods are needed to scale them to the complete human genome and to leverage information across the multiple mammalian species currently being sequenced. Generative models such as hidden Markov models (HMMs) must model both coding and non-coding regions across a complete genome, whereas discriminative models such as Conditional Random Fields (CRFs), which have recently emerged, focus specifically on the discrimination problem of gene identification and can therefore be more powerful. One of the most attractive characteristics of these models is that their general framework allows the incorporation of any number of independently derived feature functions (metrics), which can increase discriminative power. While most work on CRFs for gene finding has focused on model construction and training, little attention has been paid to the metrics used in such discriminative frameworks. This is particularly important given the availability of rich comparative genome data, which enables the development of phylogenetic gene identification metrics that make maximal use of alignments of a large number of genomes.
    In this work I address the question of gene identification using multiple related genomes. First, I present novel comparative metrics for gene classification that show considerable improvement over existing work and scale well as the number of aligned genomes increases. Second, I describe a general methodology for extending pairwise metrics to alignments of multiple genomes that incorporates the phylogenetic relationships among the informant species. Third, I evaluate various methods of combining metrics that exploit metric independence and result in superior classification. Finally, I incorporate the metrics into a Conditional Random Field gene model to perform unrestricted de novo gene prediction on 12-species alignments of the D. melanogaster genome, and demonstrate accuracy rivaling that of state-of-the-art gene prediction systems.
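    The idea of plugging independently derived metrics into a CRF can be illustrated with a minimal linear-chain CRF decoder in Python. Everything in it is hypothetical: two labels (coding and non-coding), one feature that rewards a per-site comparative score `x[i]` for coding labels, one feature that rewards label persistence, and hand-picked weights. The thesis's actual metrics, training procedure and gene-structure states are not reproduced here.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical two-label state space; real gene models use many more states.
LABELS = ["noncoding", "coding"]

# A feature function f(prev_label, label, x, i) -> float; prev_label is None at i == 0.
Feature = Callable[[Optional[str], str, Sequence[float], int], float]


def comparative_metric(prev: Optional[str], y: str, x: Sequence[float], i: int) -> float:
    """Hypothetical per-site metric: a positive x[i] is evidence for 'coding'."""
    return x[i] if y == "coding" else -x[i]


def persistence(prev: Optional[str], y: str, x: Sequence[float], i: int) -> float:
    """Rewards keeping the same label as the previous site."""
    return 1.0 if prev is not None and prev == y else 0.0


FEATURES: List[Feature] = [comparative_metric, persistence]
WEIGHTS = [1.0, 0.5]  # hand-picked for illustration, not trained


def local_score(prev, y, x, i, features, weights) -> float:
    """Weighted sum of all feature functions at position i."""
    return sum(w * f(prev, y, x, i) for f, w in zip(features, weights))


def sequence_score(labels, x, features, weights) -> float:
    """Unnormalised CRF score of a complete labelling."""
    return sum(
        local_score(labels[i - 1] if i > 0 else None, y, x, i, features, weights)
        for i, y in enumerate(labels)
    )


def viterbi(x, features, weights) -> List[str]:
    """Highest-scoring labelling under a linear-chain CRF; x must be non-empty."""
    best = [{y: local_score(None, y, x, 0, features, weights) for y in LABELS}]
    back: List[dict] = [{}]
    for i in range(1, len(x)):
        best.append({})
        back.append({})
        for y in LABELS:
            # Choose the predecessor label that maximises the running score.
            p = max(LABELS, key=lambda q: best[i - 1][q] + local_score(q, y, x, i, features, weights))
            best[i][y] = best[i - 1][p] + local_score(p, y, x, i, features, weights)
            back[i][y] = p
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for i in range(len(x) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

    For example, `viterbi([0.8, -0.5, 1.2, 0.3, -1.0], FEATURES, WEIGHTS)` returns one label per site. Because features enter only through a weighted sum, adding another metric means appending one function and one weight, which is the flexibility the abstract highlights; learning the weights (for instance by maximising conditional log-likelihood) is omitted from this sketch.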

    Statistical Population Genomics

    This open access volume presents state-of-the-art inference methods in population genomics, focusing on data analysis based on rigorous statistical techniques. After introducing general concepts related to the biology of genomes and their evolution, the book covers state-of-the-art methods for the analysis of genomes in populations, including demography inference, population structure analysis and detection of selection, using both model-based inference and simulation procedures. Last but not least, it offers an overview of the current knowledge acquired by applying such methods to a large variety of eukaryotic organisms. Written in the highly successful Methods in Molecular Biology series format, chapters include introductions to their respective topics, pointers to the relevant literature, step-by-step, readily reproducible laboratory protocols, and tips on troubleshooting and avoiding known pitfalls. Authoritative and cutting-edge, Statistical Population Genomics aims to promote and ensure successful applications of population genomic methods to an increasing number of model systems and biological questions.

    Learning to Behave: Internalising Knowledge

    Bayesian molecular phylogenetics: estimation of divergence dates and hypothesis testing

    With the advent of automated sequencing, sequence data are now available to help us understand the functioning of our genome, as well as its history. To date, powerful methods such as maximum likelihood have been used to estimate its mode and tempo of evolution and its branching pattern. However, these methods appear to have some limitations. The purpose of this thesis is to examine these issues in light of Bayesian modelling, taking advantage of some recent advances in Bayesian computation. Firstly, Bayesian methods to estimate divergence dates when rates of evolution vary from lineage to lineage are extended and compared. The power of the technique is demonstrated by analysing twenty-two genes sampled across the metazoans to test the Cambrian explosion hypothesis. While the molecular clock gives divergence dates at least twice as old as those indicated by the fossil record, it is shown (i) that modelling rate change gives results consistent with the fossils, (ii) that this dramatically improves the fit to the data and (iii) that these results do not depend on the choice of a specific model of rate change. Results from this analysis support a molecular explosion of the metazoans about 600 million years (MY) ago, i.e. only some 50 MY before the morphological Cambrian explosion. Secondly, two new Bayesian tests of phylogenetic trees are developed. The first aims at selecting the correct tree, while the second constructs confidence sets of trees. Two other tests are also developed in the frequentist framework. Based on p-values adjusted for multiple comparisons, they are built to match their Bayesian counterparts. These four new tests are compared with previous tests, and their sensitivity to model misspecification and the problem of regions are discussed. Finally, the models examined are extended to estimate divergence dates from data on multiple genes and to detect positive selection.

    Music in Evolution and Evolution in Music

    Music in Evolution and Evolution in Music by Steven Jan is a comprehensive account of the relationships between evolutionary theory and music. Examining the ‘evolutionary algorithm’ that drives biological and musical-cultural evolution, the book provides a distinctive commentary on how musicality and music can shed light on our understanding of Darwin’s famous theory, and vice versa. Composed of seven chapters, with several musical examples, figures and definitions of terms, this original and accessible book is a valuable resource for anyone interested in the relationships between music and evolutionary thought. Jan guides the reader through key evolutionary ideas and the development of human musicality, before exploring cultural evolution, evolutionary ideas in musical scholarship, animal vocalisations, music generated through technology, and the nature of consciousness as an evolutionary phenomenon. A unique examination of how evolutionary thought intersects with music, Music in Evolution and Evolution in Music is essential to our understanding of how and why music arose in our species and why it is such a significant presence in our lives.

    Ancient human genomes suggest three ancestral populations for present-day Europeans

    We sequenced the genomes of a ∼7,000-year-old farmer from Germany and eight ∼8,000-year-old hunter-gatherers from Luxembourg and Sweden. We analysed these and other ancient genomes (refs 1–4) with 2,345 contemporary humans to show that most present-day Europeans derive from at least three highly differentiated populations: west European hunter-gatherers, who contributed ancestry to all Europeans but not to Near Easterners; ancient north Eurasians related to Upper Palaeolithic Siberians (ref. 3), who contributed to both Europeans and Near Easterners; and early European farmers, who were mainly of Near Eastern origin but also harboured west European hunter-gatherer-related ancestry. We model these populations’ deep relationships and show that early European farmers had ∼44% ancestry from a ‘basal Eurasian’ population that split before the diversification of other non-African lineages.