A MOSAIC of methods: Improving ortholog detection through integration of algorithmic diversity
Ortholog detection (OD) is a critical step for comparative genomic analysis
of protein-coding sequences. In this paper, we begin with a comprehensive
comparison of four popular, methodologically diverse OD methods: MultiParanoid,
Blat, Multiz, and OMA. In head-to-head comparisons, these methods are shown to
significantly outperform one another 12-30% of the time. This high
complementarity motivates the presentation of the first tool for integrating
methodologically diverse OD methods. We term this program MOSAIC, or Multiple
Orthologous Sequence Analysis and Integration by Cluster optimization. Relative
to component and competing methods, we demonstrate that MOSAIC more than
quintuples the number of alignments for which all species are present, while
simultaneously maintaining or improving functional-, phylogenetic-, and
sequence identity-based measures of ortholog quality. Further, we demonstrate
that this improvement in alignment quality yields 40-280% more confidently
aligned sites. Combined, these factors translate to higher estimated levels of
overall conservation, while at the same time allowing for the detection of up
to 180% more positively selected sites. MOSAIC is available as a Python package.
MOSAIC alignments, source code, and full documentation are available at
http://pythonhosted.org/bio-MOSAIC
Whole-genome sequencing of Theileria parva strains provides insight into parasite migration and diversification in the African continent
The disease caused by the apicomplexan protozoan parasite Theileria parva, known as East Coast fever or Corridor disease, is one of the most serious cattle diseases in Eastern, Central, and Southern Africa. We performed whole-genome sequencing of nine T. parva strains, including one of the vaccine strains (Kiambu 5), field isolates from Zambia, Uganda, Tanzania, or Rwanda, and two buffalo-derived strains. Comparison with the reference Muguga genome sequence revealed 34 814–121 545 single nucleotide polymorphisms (SNPs) that were more abundant in buffalo-derived strains. High-resolution phylogenetic trees were constructed with selected informative SNPs that allowed the investigation of possible complex recombination events among ancestors of the extant strains. We further analysed the dN/dS ratio (non-synonymous substitutions per non-synonymous site divided by synonymous substitutions per synonymous site) for 4011 coding genes to estimate potential selective pressure. Genes under possible positive selection were identified that may, in turn, assist in the identification of immunogenic proteins or vaccine candidates. This study elucidated the phylogeny of T. parva strains based on genome-wide SNP analysis with prediction of possible past recombination events, providing insight into the migration, diversification, and evolution of this parasite species in the African continent.
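The dN/dS ratio defined in the abstract above can be illustrated with a minimal Python sketch. It takes already-counted substitutions and sites as input and applies no multiple-hit correction. The abstract does not say which estimator the authors used, so the function name `dn_ds_ratio`, its parameters, and the example numbers are all hypothetical.

```python
def dn_ds_ratio(nonsyn_subs, nonsyn_sites, syn_subs, syn_sites):
    """Return dN/dS from raw counts, or None when dS is zero.

    dN = non-synonymous substitutions per non-synonymous site
    dS = synonymous substitutions per synonymous site
    Assumption: counts are given directly; no correction for
    multiple substitutions at the same site is applied.
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    if ds == 0:
        return None  # ratio is undefined without synonymous changes
    return dn / ds


# Hypothetical gene: 6 non-synonymous changes over 300 non-synonymous sites,
# 4 synonymous changes over 100 synonymous sites.
ratio = dn_ds_ratio(nonsyn_subs=6, nonsyn_sites=300.0,
                    syn_subs=4, syn_sites=100.0)
print(ratio)
```

A ratio above 1 is commonly read as consistent with positive selection and a ratio below 1 as consistent with purifying selection. Returning `None` when dS is zero is a design choice of this sketch, so that such genes are flagged and not reported as infinite.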
Population genetics models of local ancestry
Migrations have played an important role in shaping the genetic diversity of
human populations. Understanding genomic data thus requires careful modeling of
historical gene flow. Here we consider the effect of relatively recent
population structure and gene flow, and interpret genomes of individuals that
have ancestry from multiple source populations as mosaics of segments
originating from each population. We propose general and tractable models for
describing the evolution of these patterns of local ancestry and their impact
on genetic diversity. We focus on the length distribution of continuous
ancestry tracts, and the variance in total ancestry proportions among
individuals. The proposed models offer improved agreement with Wright-Fisher
simulation data when compared to state-of-the-art models, and can be used to
infer various demographic parameters in gene flow models. Considering HapMap
African-American (ASW) data, we find that a model with two distinct phases of
'European' gene flow significantly improves the modeling of both tract lengths
and ancestry variances.
Comment: 25 pages with 7 figures; Genetics: Published online before print April 4, 201
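The two summary statistics named in the abstract above, the lengths of continuous ancestry tracts and the variance in total ancestry proportions among individuals, can be computed from local-ancestry calls as in the following sketch. It does not implement the authors' models. The input layout (sorted `(start_cM, end_cM, label)` segments per individual), the function names, and the toy data are assumptions made for illustration.

```python
from statistics import pvariance


def tract_lengths(segments, ancestry):
    """Lengths of continuous tracts of `ancestry`.

    Assumes `segments` is sorted by position; adjacent segments with the
    same label that touch end-to-start are merged into a single tract.
    """
    lengths = []
    current = 0.0
    prev_end = None
    for start, end, label in segments:
        contiguous = prev_end is not None and start == prev_end
        if label == ancestry:
            if current > 0 and contiguous:
                current += end - start      # extend the open tract
            else:
                if current > 0:
                    lengths.append(current)  # close a tract broken by a gap
                current = end - start        # open a new tract
        else:
            if current > 0:
                lengths.append(current)      # tract ends at an ancestry switch
            current = 0.0
        prev_end = end
    if current > 0:
        lengths.append(current)
    return lengths


def ancestry_proportion(segments, ancestry):
    """Fraction of the total segment length assigned to `ancestry`."""
    total = sum(end - start for start, end, _ in segments)
    matching = sum(end - start for start, end, label in segments
                   if label == ancestry)
    return matching / total


def ancestry_variance(individuals, ancestry):
    """Population variance of ancestry proportions across individuals."""
    return pvariance(ancestry_proportion(segs, ancestry)
                     for segs in individuals.values())


# Toy data: one 100 cM chromosome per individual, labels "A" and "B".
individuals = {
    "ind1": [(0.0, 10.0, "A"), (10.0, 30.0, "B"),
             (30.0, 50.0, "B"), (50.0, 100.0, "A")],
    "ind2": [(0.0, 60.0, "B"), (60.0, 100.0, "A")],
    "ind3": [(0.0, 100.0, "A")],
}

all_b_tracts = [length for segs in individuals.values()
                for length in tract_lengths(segs, "B")]
print(all_b_tracts)
print(ancestry_variance(individuals, "B"))
```

The sketch uses the population variance (`pvariance`). A sample variance would be an equally reasonable choice, depending on how the statistic is compared with model predictions.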
An efficient method to identify, date, and describe admixture events using haplotype information
We present fastGLOBETROTTER, an efficient new haplotype-based technique to identify, date, and describe admixture events using genome-wide autosomal data. With simulations, we demonstrate how fastGLOBETROTTER reduces computation time by an order of magnitude relative to the related technique GLOBETROTTER without suffering loss of accuracy. We apply fastGLOBETROTTER to a cohort of >6000 Europeans from ten countries, revealing previously unreported admixture signals. In particular we infer multiple periods of admixture related to East Asian or Siberian-like sources, starting >2000 years ago, in people living in countries north of the Baltic Sea. In contrast, we infer admixture related to West Asian, North African and/or Southern European sources in populations south of the Baltic Sea, including admixture dated to ≈300-700 CE, overlapping the fall of the Roman Empire, in people from Belgium, France and parts of Germany. Our new approach scales to analyzing hundreds to thousands of individuals from a putatively admixed population and hence is applicable to emerging large-scale cohorts of genetically homogeneous populations.
Methods for Assessing Population Relationships and History Using Genomic Data
Genetic data contain a record of our evolutionary history. The availability of
large-scale datasets of human populations from various geographic areas and
timescales, coupled with advances in the computational methods to analyze
these data, has transformed our ability to use genetic data to learn about
our evolutionary past. Here, we review some of the widely used statistical
methods to explore and characterize population relationships and history
using genomic data. We describe the intuition behind commonly used approaches, their interpretation, and important limitations. For illustration, we
apply some of these techniques to genome-wide autosomal data from 929 individuals representing 53 worldwide populations that are part of the Human
Genome Diversity Project. Finally, we discuss the new frontiers in genomic
methods to learn about population history. In sum, this review highlights
the power (and limitations) of DNA to infer features of human evolutionary
history, complementing the knowledge gleaned from other disciplines, such
as archaeology, anthropology, and linguistics.
Dense sampling of ethnic groups within African countries reveals fine-scale genetic structure and extensive historical admixture
Previous studies have highlighted how African genomes have been shaped by a complex series of historical events. Despite this, genome-wide data have only been obtained from a small proportion of present-day ethnolinguistic groups. By analyzing new autosomal genetic variation data of 1333 individuals from over 150 ethnic groups from Cameroon, Republic of the Congo, Ghana, Nigeria, and Sudan, we demonstrate a previously underappreciated fine-scale level of genetic structure within these countries, for example, correlating with historical polities in western Cameroon. By comparing genetic variation patterns among populations, we infer that many northern Cameroonian and Sudanese groups share genetic links with multiple geographically disparate populations, likely resulting from long-distance migrations. In Ghana and Nigeria, we infer signatures of intermixing dated to over 2000 years ago, corresponding to reports of environmental transformations possibly related to climate change. We also infer recent intermixing signals in multiple African populations, including Congolese, that likely relate to the expansions of Bantu language-speaking peoples.