105 research outputs found

Very Few RNA and DNA Sequence Differences in the Human Transcriptome

Author: Daniel R. Schrider
Jean-Francois Gout
Matthew W. Hahn
Philip Awadalla
Publication venue: Public Library of Science
Publication date: 12/10/2011

RNA editing is an important cellular process by which the nucleotides in a mature RNA transcript are altered to cause them to differ from the corresponding DNA sequence. While this process yields essential transcripts in humans and other organisms, it is believed to occur at a relatively small number of loci. The rarity of RNA editing has been challenged by a recent comparison of human RNA and DNA sequence data from 27 individuals, which revealed that over 10,000 human exonic sites appear to exhibit RNA-DNA differences (RDDs). Many of these differences could not have been caused by either of the two previously known human RNA editing mechanisms (ADAR-mediated A→G substitutions or APOBEC1-mediated C→U switches), suggesting that a previously unknown mechanism of RNA editing may be active in humans. Here, we reanalyze these data and demonstrate that genomic sequences exist in these same individuals or in the human genome that match the majority of RDDs. Our results suggest that the majority of these RDD events were observed due to accurate transcription of sequences paralogous to the apparently edited gene but differing at the edited site. In light of our results, it seems prudent to conclude that if indeed an unknown mechanism is causing RDD events in humans, such events occur at a much lower frequency than originally proposed.
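The paralogy argument in this abstract can be illustrated with a toy check. The sketch below is not the authors' pipeline: the sequences, the paralog names, and the position are invented, and all sequences are assumed to be already aligned to the same coordinates. It only shows the logic of asking whether an apparent RNA-DNA difference is matched by the genomic base of a paralog.

```python
# Toy illustration only: sequences, names, and the position are made up,
# and all sequences are assumed to be pre-aligned to the same coordinates.

def paralogs_matching_rna(rna_allele, position, paralogs):
    """Return names of paralogs whose genomic base at `position` equals the RNA allele."""
    return [name for name, seq in paralogs.items() if seq[position] == rna_allele]

gene_dna = "ACGTTGCA"          # genomic sequence of the apparently edited gene
paralogs = {
    "paralog_1": "ACGTCGCA",   # differs from the gene only at index 4
    "paralog_2": "ACGTTGCA",   # identical to the gene
}
position = 4                   # 0-based index of the candidate site
rna_allele = "C"               # base observed in the RNA reads

# The site looks like an RDD if the RNA base differs from the gene's DNA base.
is_rdd = gene_dna[position] != rna_allele

# If any paralog carries the RNA base, mismapped reads are a simpler explanation.
matches = paralogs_matching_rna(rna_allele, position, paralogs)
print(is_rdd, matches)
```

A non-empty `matches` list means the apparent difference could be explained by faithful transcription of a paralog rather than by editing.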

Public Library of Science (PLOS)

Crossref

PubMed Central

IntroUNET: Identifying introgressed alleles via semantic segmentation

Author: Flagel Lex
Ray Dylan D.
Schrider Daniel R.
Publication venue: Public Library of Science
Publication date: 01/01/2024

A growing body of evidence suggests that gene flow between closely related species is a widespread phenomenon. Alleles that introgress from one species into a close relative are typically neutral or deleterious, but sometimes confer a significant fitness advantage. Given the potential relevance to speciation and adaptation, numerous methods have therefore been devised to identify regions of the genome that have experienced introgression. Recently, supervised machine learning approaches have been shown to be highly effective for detecting introgression. One especially promising approach is to treat population genetic inference as an image classification problem, and feed an image representation of a population genetic alignment as input to a deep neural network that distinguishes among evolutionary models (i.e. introgression or no introgression). However, if we wish to investigate the full extent and fitness effects of introgression, merely identifying genomic regions in a population genetic alignment that harbor introgressed loci is insufficient; ideally we would be able to infer precisely which individuals have introgressed material and at which positions in the genome. Here we adapt a deep learning algorithm for semantic segmentation, the task of correctly identifying the type of object to which each individual pixel in an image belongs, to the task of identifying introgressed alleles. Our trained neural network is thus able to infer, for each individual in a two-population alignment, which of that individual’s alleles were introgressed from the other population. We use simulated data to show that this approach is highly accurate, and that it can be readily extended to identify alleles that are introgressed from an unsampled “ghost” population, performing comparably to a supervised learning method tailored specifically to that task.
Finally, we apply this method to data from Drosophila, showing that it is able to accurately recover introgressed haplotypes from real data. This analysis reveals that introgressed alleles are typically confined to lower frequencies within genic regions, suggestive of purifying selection, but are found at much higher frequencies in a region previously shown to be affected by adaptive introgression. Our method’s success in recovering introgressed haplotypes in challenging real-world scenarios underscores the utility of deep learning approaches for making richer evolutionary inferences from genomic data.
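To make the per-allele output described in this abstract concrete, here is a minimal sketch of what a segmentation-style result looks like and how introgressed-allele frequencies follow from it. It does not implement IntroUNET or any neural network: the alignment, the label matrix, and the function name are hypothetical, and the labels are written by hand in place of model predictions.

```python
# Hypothetical data: rows are haplotypes, columns are segregating sites.
alignment = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 0, 0, 1],
]

# Hand-written stand-in for per-allele predictions with the same shape
# as the alignment: 1 = allele labeled as introgressed, 0 = not.
labels = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]

def introgressed_frequency(label_matrix):
    """Fraction of haplotypes labeled introgressed at each site (column)."""
    n_haplotypes = len(label_matrix)
    return [sum(column) / n_haplotypes for column in zip(*label_matrix)]

# One label per "pixel": the label matrix must match the alignment's shape.
same_shape = (len(labels) == len(alignment)
              and all(len(l) == len(a) for l, a in zip(labels, alignment)))

freqs = introgressed_frequency(labels)
print(same_shape, freqs)
```

Per-site frequencies of this kind are the quantity the abstract refers to when it contrasts low-frequency introgressed alleles in genic regions with high-frequency ones in an adaptively introgressed region.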

Carolina Digital Repository

Extensive error in the number of genes inferred from draft genome assemblies

Author: Denton James F
Hahn Matthew W
Lugo-Martinez Jose
Schrider Daniel R
Tucker Abraham E
Warren Wesley C
Publication venue: Digital Commons@Becker
Publication date: 01/01/2014

Current sequencing methods produce large amounts of data, but genome assemblies based on these data are often woefully incomplete. These incomplete and error-filled assemblies result in many annotation errors, especially in the number of genes present in a genome. In this paper we investigate the magnitude of the problem, both in terms of total gene number and the number of copies of genes in specific families. To do this, we compare multiple draft assemblies against higher-quality versions of the same genomes, using several new assemblies of the chicken genome based on both traditional and next-generation sequencing technologies, as well as published draft assemblies of chimpanzee. We find that upwards of 40% of all gene families are inferred to have the wrong number of genes in draft assemblies, and that these incorrect assemblies both add and subtract genes. Using simulated genome assemblies of Drosophila melanogaster, we find that the major cause of increased gene numbers in draft genomes is the fragmentation of genes onto multiple individual contigs. Finally, we demonstrate the usefulness of RNA-Seq in improving the gene annotation of draft assemblies, largely by connecting genes that have been fragmented in the assembly process.
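The way fragmentation inflates gene counts can be shown with a small counting sketch. The annotations below are invented and the function names are hypothetical; the code only illustrates how one gene split across two contigs is counted twice, while a gene missing from the draft is counted zero times.

```python
from collections import Counter

def family_sizes(annotation):
    """Count annotated gene models per family; `annotation` is a list of (contig, family)."""
    return Counter(family for _contig, family in annotation)

def miscounted_families(reference, draft):
    """Map each family whose size differs to (reference count, draft count)."""
    ref_sizes, draft_sizes = family_sizes(reference), family_sizes(draft)
    families = set(ref_sizes) | set(draft_sizes)
    return {f: (ref_sizes[f], draft_sizes[f])
            for f in sorted(families) if ref_sizes[f] != draft_sizes[f]}

# Higher-quality assembly: one gene in each of three families.
reference = [("chr1", "famA"), ("chr1", "famB"), ("chr2", "famC")]

# Draft assembly: the famA gene is fragmented onto two contigs and annotated
# twice, the famB gene is missing, and the famC gene is annotated correctly.
draft = [("contig_1", "famA"), ("contig_2", "famA"), ("contig_3", "famC")]

wrong = miscounted_families(reference, draft)
print(wrong)
```

In this toy case two of three families have the wrong size, one inflated and one deflated, which mirrors the abstract's point that draft assemblies both add and subtract genes.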

CiteSeerX

Crossref

Directory of Open Access Journals

Digital Commons@Becker

PubMed Central

FigShare

Discovery of Ongoing Selective Sweeps within Anopheles Mosquito Populations Using Deep Learning

Author: Ag1000g Consortium
Kern Andrew D
Schrider Daniel R
Xue Alexander T
Publication venue: Oxford University Press (OUP)
Publication date: 09/03/2021

Identification of partial sweeps, which include both hard and soft sweeps that have not currently reached fixation, provides crucial information about ongoing evolutionary responses. To this end, we introduce partialS/HIC, a deep learning method to discover selective sweeps from population genomic data. partialS/HIC uses a convolutional neural network for image processing, which is trained with a large suite of summary statistics derived from coalescent simulations incorporating population-specific history, to distinguish between completed versus partial sweeps, hard versus soft sweeps, and regions directly affected by selection versus those merely linked to nearby selective sweeps. We perform several simulation experiments under various demographic scenarios to demonstrate partialS/HIC's performance, which exhibits excellent resolution for detecting partial sweeps. We also apply our classifier to whole genomes from eight mosquito populations sampled across sub-Saharan Africa by the Anopheles gambiae 1000 Genomes Consortium, elucidating both continent-wide patterns as well as sweeps unique to specific geographic regions. These populations have experienced intense insecticide exposure over the past two decades, and we observe a strong overrepresentation of sweeps at insecticide resistance loci. Our analysis thus provides a list of candidate adaptive loci that may be relevant to mosquito control efforts. More broadly, our supervised machine learning approach introduces a method to distinguish between completed and partial sweeps, as well as between hard and soft sweeps, under a variety of demographic scenarios. As whole-genome data rapidly accumulate for a greater diversity of organisms, partialS/HIC addresses an increasing demand for useful selection scan tools that can track in-progress evolutionary dynamics.
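As a rough illustration of the windowed summary statistics mentioned in this abstract, the sketch below computes one common statistic, mean pairwise diversity, in adjacent subwindows of a toy haplotype matrix. It is not partialS/HIC: the data are invented, the function names are hypothetical, only a single statistic is shown, and no classifier is trained.

```python
from itertools import combinations

def pairwise_diversity(haplotypes):
    """Mean number of differences over all pairs of haplotypes (0/1 sequences)."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        return 0.0
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)

def windowed(haplotypes, n_windows, stat):
    """Apply `stat` to equal-sized, non-overlapping windows of sites."""
    size = len(haplotypes[0]) // n_windows
    return [stat([h[i * size:(i + 1) * size] for h in haplotypes])
            for i in range(n_windows)]

# Toy data: 4 haplotypes x 6 sites; the middle two sites carry no variation,
# the kind of local diversity trough a sweep is expected to leave.
haps = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
]

pi_by_window = windowed(haps, 3, pairwise_diversity)
print(pi_by_window)
```

Stacking several such statistics across windows gives a statistics-by-windows array, which is the kind of image-like input a convolutional network can take.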

Cold Spring Harbor Laboratory Institutional Repository

A community-maintained standard library of population genetic models

Author: Adrion Jeffrey R.
Baumdicker Franz
Carlson Jedidiah
Cartwright Reed A.
Cole Christopher B.
Dukler Noah
Durvasula Arun
Galloway Jared G.
Gladstein Ariella L.
Gower Graham
Gravel Simon
Gronau Ilan
Gutenkunst Ryan N.
Kelleher Jerome
Kern Andrew D.
Kim Bernard Y.
Kyriazis Christopher C.
Lohmueller Kirk E.
McKenzie Patrick
Messer Philipp W.
Noskova Ekaterina
Ortega-Del Vecchyo Diego
Racimo Fernando
Ragsdale Aaron P.
Ralph Peter L.
Schrider Daniel R.
Siepel Adam
Struck Travis J.
Tsambos Georgia
Publication venue: eLife Sciences Publications, Ltd
Publication date: 01/01/2020

The explosion in population genomic data demands ever more complex modes of analysis, and increasingly, these analyses depend on sophisticated simulations. Recent advances in population genetic simulation have made it possible to simulate large and complex models, but specifying such models for a particular simulation engine remains a difficult and error-prone task. Computational genetics researchers currently re-implement simulation models independently, leading to inconsistency and duplication of effort. This situation presents a major barrier to empirical researchers seeking to use simulations for power analyses of upcoming studies or sanity checks on existing genomic data. Population genetics, as a field, also lacks standard benchmarks by which new tools for inference might be measured. Here, we describe a new resource, stdpopsim, that attempts to rectify this situation. Stdpopsim is a community-driven open source project, which provides easy access to a growing catalog of published simulation models from a range of organisms and supports multiple simulation engine backends. This resource is available as a well-documented Python library with a simple command-line interface. We share some examples demonstrating how stdpopsim can be used to systematically compare demographic inference methods, and we encourage a broader community of developers to contribute to this growing resource.

Copenhagen University Research Information System

The University of Arizona

Expanding the stdpopsim species catalog, and lessons learned for realistic genome simulations

Simulation is a key tool in population genetics for both methods development and empirical research, but producing simulations that recapitulate the main features of genomic datasets remains a major obstacle. Today, more realistic simulations are possible thanks to large increases in the quantity and quality of available genetic data, and the sophistication of inference and simulation software. However, implementing these simulations still requires substantial time and specialized knowledge. These challenges are especially pronounced for simulating genomes for species that are not well-studied, since it is not always clear what information is required to produce simulations with a level of realism sufficient to confidently answer a given question. The community-developed framework stdpopsim seeks to lower this barrier by facilitating the simulation of complex population genetic models using up-to-date information. The initial version of stdpopsim focused on establishing this framework using six well-characterized model species (Adrion et al., 2020). Here, we report on major improvements made in the new release of stdpopsim (version 0.2), which includes a significant expansion of the species catalog and substantial additions to simulation capabilities. Features added to improve the realism of the simulated genomes include non-crossover recombination and provision of species-specific genomic annotations. Through community-driven efforts, we expanded the number of species in the catalog more than threefold and broadened coverage across the tree of life. During the process of expanding the catalog, we have identified common sticking points and developed the best practices for setting up genome-scale simulations. We describe the input data required for generating a realistic simulation, suggest good practices for obtaining the relevant information from the literature, and discuss common pitfalls and major considerations. 
These improvements to stdpopsim aim to further promote the use of realistic whole-genome population genetic simulations, especially in non-model organisms, making them available, transparent, and accessible to everyone.

Publikationer från Uppsala Universitet

Edinburgh Research Explorer

eScholarship - University of California

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Genome variation and population structure among 1142 mosquitoes of the African malaria vector species Anopheles gambiae and Anopheles coluzzii

Author: Amaya-Romero Jorge Edouardo
Ayala Diego
Battey C J
Besansky Nora J
Burt Austin
Cano Jorge
Caputo Beniamino
Clarkson Chris S
Constant Edi
Costantini Carlo
Coulibaly Boubacar
della Torre Alessandra
Diabaté Abdoulaye
Dinis Joao
Donnelly Martin J
Drury Eleanor
Elissa Nohal
Essandoh John
Fontaine Michael C
Godfray H Charles J
Hahn Matthew W
Harding Nicholas J
Henrichs Christa
Hubbart Christina
Isaacs Alison T
Jawara Musa
Jeffreys Anna E
Jyothi Dushyanth
Kamali Maryam
Kern Andrew D
Kwiatkowski Dominic P
Lawniczak Mara K N
Le Goff Gilbert
Lucas Eric R
Malangone Cinzia
Mawejje Henry D
Mbogo Charles
Mead Daniel
Midega Janet
Miles Alistair
Nwakanma Davis C
O'Loughlin Samantha
Pinto João
Riehle Michelle M
Robert Vincent
Rockett Kirk A
Rohatgi Kyanne R
Rowlands Kate
Schrider Daniel R
Sharakhov Igor
Simpson Victoria
Stalker Jim
Troco Arlete D
Vernick Kenneth D
White Bradley J
Publication venue: Cold Spring Harbor Laboratory
Publication date: 01/12/2019

Mosquito control remains a central pillar of efforts to reduce malaria burden in sub-Saharan Africa. However, insecticide resistance is entrenched in malaria vector populations, and countries with a high malaria burden face a daunting challenge to sustain malaria control with a limited set of surveillance and intervention tools. Here we report on the second phase of a project to build an open resource of high-quality data on genome variation among natural populations of the major African malaria vector species Anopheles gambiae and Anopheles coluzzii. We analyzed whole genomes of 1142 individual mosquitoes sampled from the wild in 13 African countries, as well as a further 234 individuals comprising parents and progeny of 11 laboratory crosses. The data resource includes high-confidence single-nucleotide polymorphism (SNP) calls at 57 million variable sites, genome-wide copy number variation (CNV) calls, and haplotypes phased at biallelic SNPs. We use these data to analyze genetic population structure and characterize genetic diversity within and between populations. We illustrate the utility of these data by investigating species differences in isolation by distance, genetic variation within proposed gene drive target sequences, and patterns of resistance to pyrethroid insecticides. This data resource provides a foundation for developing new operational systems for molecular surveillance and for accelerating research and development of new vector control tools. It also provides a unique resource for the study of population genomics and evolutionary biology in eukaryotic species with high levels of genetic diversity under strong anthropogenic evolutionary pressures.

LJMU Research Online (Liverpool John Moores University)

University of Groningen

HAL-IRD

Resistance to pirimiphos-methyl in West African Anopheles is spreading via duplication and introgression of the Ace1 locus

Author: Abdoulaye Diabate´
Alessandra della Torre
Alison T. Isaacs
Alistair Miles
Andrew D. Kern
Anna E. Jeffreys
Arlete D. Troco
Austin Burt
Beniamino Caputo
Boubacar Coulibaly
Bradley J. White
C. J. Battey
Carlo Costantini
Chabi Joseph
Charles Mbogo
Chris S. Clarkson
Christa Henrichs
Christina Hubbart
Cinzia Malangone
Constant Edi
Craig S. Wilding
Dadzie Samuel
Daniel Mead
Daniel R. Schrider
David Weetman
Davis C. Nwakanma
Diego Ayala
Djogbénou Luc
Dominic P. Kwiatkowski
Dushyanth Jyothi
Egyir-Yawson Alexander
Eleanor Drury
Eric R. Lucas
Gilbert Le Goff
Grau-Bové Xavier
H. Charles J. Godfray
Henry D. Mawejje
Igor Sharakhov
Janet Midega
Jim Stalker
John Essandoh
Jorge Cano
Jorge Edouardo Amaya-Romero
João Dinis
João Pinto
Kate Rowlands
Kenneth D. Vernick
Kirk A. Rockett
Kyanne R. Rohatgi
Mara K. N. Lawniczak
Martin J. Donnelly
Maryam Kamali
Matthew W. Hahn
Michael C. Fontaine
Michelle M. Riehle
Musa Jawara
Nicholas J. Harding
Nohal Elissa
Nora J. Besansky
Philip Bejon
Pipini Dimitra
Rippon Emily
Samantha O’Loughlin
van ‘t Hof Arjèn E.
Victoria Simpson
Vincent Robert
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2021

Vector population control using insecticides is a key element of current strategies to prevent malaria transmission in Africa. The introduction of effective insecticides, such as the organophosphate pirimiphos-methyl, is essential to overcome the recurrent emergence of resistance driven by the highly diverse Anopheles genomes. Here, we use a population genomic approach to investigate the basis of pirimiphos-methyl resistance in the major malaria vectors Anopheles gambiae and A. coluzzii. A combination of copy number variation and a single non-synonymous substitution in the acetylcholinesterase gene, Ace1, provides the key resistance diagnostic in an A. coluzzii population from Côte d’Ivoire that we used for sequence-based association mapping, with replication in other West African populations. The Ace1 substitution and duplications occur on a unique resistance haplotype that evolved in A. gambiae and introgressed into A. coluzzii, and is now common in West Africa primarily due to selection imposed by other organophosphate or carbamate insecticides. Our findings highlight the predictive value of this complex resistance haplotype for phenotypic resistance and clarify its evolutionary history, providing tools for molecular surveillance of the current and future effectiveness of pirimiphos-methyl based interventions.

Archivio della ricerca- Università di Roma La Sapienza