arXiv.org e-Print Archive

FigShare

SWEEPFINDER2: Increased sensitivity, robustness, and flexibility

Author: DeGiorgio Michael
Hellmann Ines
Huber Christian D.
Hubisz Melissa J.
Nielsen Rasmus
Publication date: 26/05/2015

SweepFinder is a popular program that implements a powerful likelihood-based method for detecting recent positive selection, or selective sweeps. Here, we present SweepFinder2, an extension of SweepFinder with increased sensitivity and robustness to the confounding effects of mutation rate variation and background selection, as well as increased flexibility that enables the user to examine genomic regions in greater detail and to specify a fixed distance between test sites. Moreover, SweepFinder2 enables the use of invariant sites for sweep detection, increasing both its power and precision relative to SweepFinder.

Public Library of Science (PLOS)

Error and Error Mitigation in Low-Coverage Genome Assemblies

Author: Hubisz Melissa J.
Kellis Manolis
Lin Michael F.
Siepel Adam
Publication venue: Public Library of Science (PLoS)
Publication date: 01/11/2010

The recent release of twenty-two new genome sequences has dramatically increased the data available for mammalian comparative genomics, but twenty of these new sequences are currently limited to ~2× coverage. Here we examine the extent of sequencing error in these 2× assemblies, and its potential impact in downstream analyses. By comparing 2× assemblies with high-quality sequences from the ENCODE regions, we estimate the rate of sequencing error to be 1–4 errors per kilobase. While this error rate is fairly modest, sequencing error can still have surprising effects. For example, an apparent lineage-specific insertion in a coding region is more likely to reflect sequencing error than a true biological event, and the length distribution of coding indels is strongly distorted by error. We find that most errors are contributed by a small fraction of bases with low quality scores, in particular, by the ends of reads in regions of single-read coverage in the assembly. We explore several approaches for automatic sequencing error mitigation (SEM), making use of the localized nature of sequencing error, the fact that it is well predicted by quality scores, and information about errors that comes from comparisons across species. Our automatic methods for error mitigation cannot replace the need for additional sequencing, but they do allow substantial fractions of errors to be masked or eliminated at the cost of modest amounts of over-correction, and they can reduce the impact of error in downstream phylogenomic analyses. Our error-mitigated alignments are available for download.

Funding: National Science Foundation (U.S.) (Faculty Early Career Development grant DBI-0644111); National Science Foundation (U.S.) (Faculty Early Career Development grant DBI-0644282); National Science Foundation (U.S.) (Faculty Early Career Development grant U54 HG004555-01); David & Lucile Packard Foundation; David & Lucile Packard Foundation (Fellowship for Science and Engineering)

CiteSeerX

DSpace@MIT

Cold Spring Harbor Laboratory Institutional Repository

Public Library of Science (PLOS)

Patterns of Positive Selection in Six Mammalian Genomes

Author: Bustamante Carlos D.
Fonseca Rute R. da
Hubisz Melissa J.
Kosiol Carolin
Nielsen Rasmus
Siepel Adam
Vinař Tomáš
Publication date: 03/01/2024

Genome-wide scans for positively selected genes (PSGs) in mammals have provided insight into the dynamics of genome evolution, the genetic basis of differences between species, and the functions of individual genes. However, previous scans have been limited in power and accuracy owing to small numbers of available genomes. Here we present the most comprehensive examination of mammalian PSGs to date, using the six high-coverage genome assemblies now available for eutherian mammals. The increased phylogenetic depth of this dataset results in substantially improved statistical power, and permits several new lineage- and clade-specific tests to be applied. Of ∼16,500 human genes with high-confidence orthologs in at least two other species, 400 genes showed significant evidence of positive selection (FDR<0.05), according to a standard likelihood ratio test. An additional 144 genes showed evidence of positive selection on particular lineages or clades. As in previous studies, the identified PSGs were enriched for roles in defense/immunity, chemosensory perception, and reproduction, but enrichments were also evident for more specific functions, such as complement-mediated immunity and taste perception. Several pathways were strongly enriched for PSGs, suggesting possible co-evolution of interacting genes. A novel Bayesian analysis of the possible “selection histories” of each gene indicated that most PSGs have switched multiple times between positive selection and nonselection, suggesting that positive selection is often episodic. A detailed analysis of Affymetrix exon array data indicated that PSGs are expressed at significantly lower levels, and in a more tissue-specific manner, than non-PSGs. Genes that are specifically expressed in the spleen, testes, liver, and breast are significantly enriched for PSGs, but no evidence was found for an enrichment for PSGs among brain-specific genes. 
This study provides additional evidence for widespread positive selection in mammalian evolution and new genome-wide insights into the functional implications of positive selection.

Knowledge UChicago

Localizing Recent Adaptive Evolution in the Human Genome

Author: Andrew G Clark
Bret A Payseur
Carlos D Bustamante
Gil McVean
Melissa J Hubisz
Rasmus Nielsen
Scott H Williamson
The International HapMap Consortium
Publication venue: Public Library of Science
Publication date: 01/01/2007

Identifying genomic locations that have experienced selective sweeps is an important first step toward understanding the molecular basis of adaptive evolution. Using statistical methods that account for the confounding effects of population demography, recombination rate variation, and single-nucleotide polymorphism ascertainment, while also providing fine-scale estimates of the position of the selected site, we analyzed a genomic dataset of 1.2 million human single-nucleotide polymorphisms genotyped in African-American, European-American, and Chinese samples. We identify 101 regions of the human genome with very strong evidence (p < 10⁻⁵) of a recent selective sweep and where our estimate of the position of the selective sweep falls within 100 kb of a known gene. Within these regions, genes of biological interest include genes in pigmentation pathways, components of the dystrophin protein complex, clusters of olfactory receptors, genes involved in nervous system development and function, immune system genes, and heat shock genes. We also observe consistent evidence of selective sweeps in centromeric regions. In general, we find that recent adaptation is strikingly pervasive in the human genome, with as much as 10% of the genome affected by linkage to a selective sweep.

CiteSeerX

Copenhagen University Research Information System

arXiv.org e-Print Archive

A Model-Based Analysis of GC-Biased Gene Conversion in the Human and Chimpanzee Genomes

Author: A Auton
A Kong
A Navarro
A Necşulea
A Ratnakumar
A Siepel
Adam Siepel
AJ Jeffreys
AJ Webb
AP Boyle
BC Lamb
C Kosiol
CC Spencer
CF Mugal
D Karolchik
D Kostka
Dennis Kostka
E Mancera
G Marais
Graham Coop
J Berglund
J Harrow
J Romiguier
JA Capra
JM Chen
John A. Capra
JW IJdo
K Lindblad-Toh
K Pollard
Katherine S. Pollard
L Arbiza
L Duret
LR Meyer
M Blanchette
M Hasegawa
Melissa J. Hubisz
MJ Hubisz
N Galtier
N Lartillot
P Flicek
P Stenson
RD George
S Glémin
S Katzman
S Myers
SE Ptak
ST Sherry
T Nagylaki
TC Brown
TR Dreszer
W Winckler
Y Zhang
Publication venue: Public Library of Science (PLoS)
Publication date: 01/01/2013

GC-biased gene conversion (gBGC) is a recombination-associated process that favors the fixation of G/C alleles over A/T alleles. In mammals, gBGC is hypothesized to contribute to variation in GC content, rapidly evolving sequences, and the fixation of deleterious mutations, but its prevalence and general functional consequences remain poorly understood. gBGC is difficult to incorporate into models of molecular evolution and so far has primarily been studied using summary statistics from genomic comparisons. Here, we introduce a new probabilistic model that captures the joint effects of natural selection and gBGC on nucleotide substitution patterns, while allowing for correlations along the genome in these effects. We implemented our model in a computer program, called phastBias, that can accurately detect gBGC tracts about 1 kilobase or longer in simulated sequence alignments. When applied to real primate genome sequences, phastBias predicts gBGC tracts that cover roughly 0.3% of the human and chimpanzee genomes and account for 1.2% of human-chimpanzee nucleotide differences. These tracts fall in clusters, particularly in subtelomeric regions; they are enriched for recombination hotspots and fast-evolving sequences; and they display an ongoing fixation preference for G and C alleles. They are also significantly enriched for disease-associated polymorphisms, suggesting that they contribute to the fixation of deleterious alleles. The gBGC tracts provide a unique window into historical recombination processes along the human and chimpanzee lineages. They supply additional evidence of long-term conservation of megabase-scale recombination rates accompanied by rapid turnover of hotspots. Together, these findings shed new light on the evolutionary, functional, and disease implications of gBGC. The phastBias program and our predicted tracts are freely available. © 2013 Capra et al.

Cold Spring Harbor Laboratory Institutional Repository

D-Scholarship@Pitt

FigShare

Tracking genes and finding mutations: finding genes for complex traits in the domestic dog (Canis familiaris)

Author: Auton Adam
Boyko Adam
Brisbin Abra
Bustamante Carlos D
Cargill Michelle
Degenhardt Jeremiah D
Elkahloun Abdel G
Hubisz Melissa J
Li Lin
Lohmueller Kurt E
Mosher Dana S
Novembre John
Ostrander Elaine A
Parker Heidi G
Quignon Pascale
Reynolds Andy
Schoenebeck Jeffrey J
Siepel Adam
Sutter Nathan B
Von Holdt Bridgett M
Wayne Robert K
Zhao Keyan
Publication venue: BioMed Central
Publication date: 01/01/2010

A high-resolution map of human evolutionary constraint using 29 mammals

Author: Alföldi Jessica
Baldwin Jen
Baylor College of Medicine Human Genome Sequencing Center Sequencing Team
Beal Kathryn
Birney Ewan
Bloom Toby
Broad Institute Sequencing Platform and Whole Genome Assembly Team
Chang Jean
Chin Chee Whye
Clamp Michele
Clawson Hiram
Cree Andrew
Cuff James
Delehaunty Kim
Di Palma Federica
Dihn Huyen H
Dooling David
Ernst Jason
Fitzgerald Stephen
Flicek Paul
Fowler Gerald
Fronik Catrina
Fulton Bob
Fulton Lucinda
Garber Manuel
Genome Institute at Washington University
Gibbs Richard A
Gnerre Sante
Goldman Nick
Graves Tina
Green Eric D
Guttman Mitchell
Haussler David
Heiman Dave
Herrero Javier
Holloway Alisha K
Hubisz Melissa J
Jaffe David B
Jhangiani Shalili
Jordan Gregory
Joshi Vandita
Jungreis Irwin
Kellis Manolis
Kent W James
Kheradpour Pouya
Kostka Dennis
Kovar Christie L
Lander Eric S
Lara Marcia
Lee Sandra
Lewis Lora R
Lin Michael F
Lindblad-Toh Kerstin
Lowe Craig B
Mardis Elaine R
Margulies Elliott H
Martins Andre L
Massingham Tim
Mauceli Evan
Minx Patrick
Moltke Ida
Muzny Donna M
Nazareth Lynne V
Nicol Robert
Nusbaum Chad
Okwuonu Geoffrey
Parker Brian J
Pedersen Jakob S
Pollard Katherine S
Raney Brian J
Rasmussen Matthew D
Robinson Jim
Santibanez Jireh
Siepel Adam
Sodergren Erica
Stark Alexander
Vilella Albert J
Ward Lucas D
Warren Wesley C
Washietl Stefan
Weinstock George M
Wen Jiayu
Wilkinson Jane
Wilson Richard K
Worley Kim C
Xie Xiaohui
Young Sarah
Zody Michael C
Zuk Or
Publication venue: eScholarship, University of California
Publication date: 01/10/2011

The comparison of related genomes has emerged as a powerful lens for genome interpretation. Here we report the sequencing and comparative analysis of 29 eutherian genomes. We confirm that at least 5.5% of the human genome has undergone purifying selection, and locate constrained elements covering ∼4.2% of the genome. We use evolutionary signatures and comparisons with experimental data sets to suggest candidate functions for ∼60% of constrained bases. These elements reveal a small number of new coding exons, candidate stop codon readthrough events and over 10,000 regions of overlapping synonymous constraint within protein-coding exons. We find 220 candidate RNA structural families, and nearly a million elements overlapping potential promoter, enhancer and insulator regions. We report specific amino acid residues that have undergone positive selection, 280,000 non-coding elements exapted from mobile elements and more than 1,000 primate- and human-accelerated elements. Overlap with disease-associated variants indicates that our findings will be relevant for studies of human biology, health and disease.

Comparative Genomic Analysis of the Streptococcus dysgalactiae Species Group: Gene Content, Molecular Adaptation, and Promoter Evolution

Author: Abdelsalam
Adam Siepel
Altschul
Andolfatto
Arakawa
Ashburner
Benjamini
Benson
Bergmann
Bert
Blanchette
Bourgoin
Brady
Brandt
Brochet
Chen
Chenier
Churchward
Davidsen
Davies
Demers
Duret
Ferretti
Gardner
Glazunova
Grant
Guindon
Haenni
Haruo Suzuki
Hashikawa
Haygood
Heather
Henry
Holden
Holm
Hsueh
Karlin
Karp
Kazakov
Kent
Kosakovsky Pond
Kosiol
Kreikemeyer
Laus
Lefébure
Leplae
Li
Lima-Mendez
Lobry
Lowe
Marraffini
McDonald
McShan
Melissa Jane Hubisz
Michael J. Stanhope
Ohtsubo
Panchaud
Paulina Pavinski Bitar
Pavlovic
Ping Lang
Pollard
Proft
R_Development_Core_Team
Rhead
Rolston
Roshan
Sharp
Siepel
Stothard
Sun
Sunaoshi
Suzuki
Takahashi
Talkington
Tanaka
Torgerson
Touchon
Tristan Lefébure
Vieira
Wray
Yang
Zerbino
Zhang
Zhao
Zhu
Publication venue: Oxford University Press
Publication date: 01/01/2011

Comparative genomics of closely related bacterial species with different pathogenesis and host preference can provide a means of identifying the specifics of adaptive differences. Streptococcus dysgalactiae (SD) comprises two subspecies: S. dysgalactiae subsp. equisimilis is both a human commensal organism and a human pathogen, and S. dysgalactiae subsp. dysgalactiae is strictly an animal pathogen. Here, we present complete genome sequences for both taxa, with analyses involving other species of Streptococcus but focusing on adaptation in the SD species group. We found little evidence for enrichment in biochemical categories of genes carried by each SD strain; however, differences in the virulence gene repertoire were apparent. Some of the differences could be ascribed to prophage and integrative conjugative elements. We identified approximately 9% of the nonrecombinant core genome to be under positive selection, some of which involved known virulence factors in other bacteria. Analyses of proteomes by pooling data across genes, by biochemical category, clade, or branch, provided evidence for increased rates of evolution in several gene categories, as well as external branches of the tree. Promoters were primarily evolving under purifying selection but with certain categories of genes evolving faster. Many of these fast-evolving categories were the same as those associated with rapid evolution in proteins. Overall, these results suggest that adaptation to changing environments and new hosts in the SD species group has involved the acquisition of key virulence genes along with selection of orthologous protein-coding loci and operon promoters.

Cold Spring Harbor Laboratory Institutional Repository