Enhanced methods for local ancestry assignment in sequenced admixed individuals.
Inferring the ancestry at each locus in the genome of recently admixed individuals (e.g., Latino Americans) plays a major role in medical and population genetic inferences, ranging from finding disease-risk loci, to inferring recombination rates, to mapping missing contigs in the human genome. Although many methods for local ancestry inference have been proposed, most are designed for use with genotyping arrays and fail to make use of the full spectrum of data available from sequencing. In addition, current haplotype-based approaches are very computationally demanding, with long runtimes even for moderately large sample sizes. Here we present new methods for local ancestry inference that leverage continent-specific variants (CSVs) to improve performance over existing approaches in sequenced admixed genomes. A key feature of our approach is that it incorporates the admixed genomes themselves jointly with public datasets, such as 1000 Genomes, to improve the accuracy of CSV calling. We use simulations to show that our approach attains accuracy similar to widely used, computationally intensive haplotype-based approaches, with large decreases in runtime. Most importantly, we show that in the real admixed individuals from the 1000 Genomes Project, our method recovers local ancestries comparable to the 1000 Genomes consensus local ancestry calls. We extend our approach to account for low-coverage sequencing and show that accurate local ancestry inference can be attained at low sequencing coverage. Finally, we generalize CSVs to sub-continental population-specific variants (sCSVs) and show that in some cases it is possible to determine the sub-continental ancestry of short chromosomal segments on the basis of sCSVs.
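To make the idea of continent-specific variants concrete, here is a minimal Python sketch built on simplifying assumptions of our own: a variant counts as a CSV if its alternate allele has nonzero frequency in exactly one continental reference panel, and a window of an admixed haplotype is assigned to whichever continent contributes the most CSV alternate alleles (no hits or a tie gives no call). The function names `find_csvs` and `assign_window_ancestry`, the frequency-dictionary input format and the tie rule are hypothetical illustrations, not the method or software described in the abstract, which also uses the admixed genomes themselves to refine CSV calling.

```python
from collections import Counter
from typing import Dict, Optional


def find_csvs(panel_alt_freqs: Dict[int, Dict[str, float]]) -> Dict[int, str]:
    """Map position -> continent for variants whose alt allele occurs in exactly one panel.

    Hypothetical input format: {position: {continent: alt_allele_frequency}}.
    """
    csvs: Dict[int, str] = {}
    for pos, freqs in panel_alt_freqs.items():
        carriers = [cont for cont, freq in freqs.items() if freq > 0]
        if len(carriers) == 1:
            csvs[pos] = carriers[0]
    return csvs


def assign_window_ancestry(
    haplotype: Dict[int, int], csvs: Dict[int, str], window: range
) -> Optional[str]:
    """Return the continent with the most CSV alt alleles in `window`.

    `haplotype` maps position -> allele (0 = ref, 1 = alt); absent positions count as ref.
    Returns None when the window has no CSV hits or the top continents are tied.
    """
    counts = Counter(
        cont
        for pos, cont in csvs.items()
        if pos in window and haplotype.get(pos, 0) == 1
    )
    if not counts:
        return None
    ranked = counts.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # ambiguous: tie between continents
    return ranked[0][0]
```

A real caller would slide `window` along the chromosome and smooth the calls; this sketch deliberately omits smoothing, phasing errors and low-coverage handling.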
A comparison of methods for haplotype inference
This study presents some of the available methods for haplotype reconstruction and evaluates the accuracy and efficiency of three software programs that implement them. The analysis is performed on the publicly available QTLMAS XII common dataset. LinkPHASE 5+, a rule-based program, considers only pedigree information (deduction and linkage). HiddenPHASE, a likelihood-based program, takes into account molecular information (linkage disequilibrium). DualPHASE combines both of these approaches. We show how the use of different sources of information, as well as the structure of the data, affects haplotype inference.
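The pedigree-based "deduction" mentioned for LinkPHASE 5+ can be illustrated with a toy single-locus rule: when a child is heterozygous and at least one parent is homozygous, the parental origin of each of the child's alleles is determined. The sketch below assumes biallelic genotypes coded as alternate-allele counts (0, 1, 2) with `None` for missing data; the function `deduce_phase` is hypothetical and does not reproduce LinkPHASE 5+, which also exploits linkage across loci.

```python
from typing import Optional, Tuple


def deduce_phase(
    child: int, father: Optional[int], mother: Optional[int]
) -> Optional[Tuple[int, int]]:
    """Return (paternal_allele, maternal_allele) as 0/1, or None if unresolved.

    Genotypes are alt-allele counts 0, 1, 2; None means the parent is untyped.
    Note: this toy rule does not check for Mendelian inconsistencies.
    """
    if child == 0:
        return (0, 0)  # homozygous ref: phase is trivial
    if child == 2:
        return (1, 1)  # homozygous alt: phase is trivial
    # Child is heterozygous: a homozygous parent fixes which allele it transmitted.
    if father in (0, 2):
        paternal = father // 2
        return (paternal, 1 - paternal)
    if mother in (0, 2):
        maternal = mother // 2
        return (1 - maternal, maternal)
    return None  # e.g. both parents heterozygous or untyped
```

Loci that this rule leaves unresolved are exactly where linkage with neighbouring markers, or population linkage disequilibrium as used by HiddenPHASE, has to take over.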
Linkage disequilibrium based genotype calling from low-coverage shotgun sequencing reads
Background: Recent technological advances have enabled the sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for whole-genome SNP genotyping.
Results: In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project. Experiments on publicly available 454, Illumina, and ABI SOLiD sequencing datasets suggest that integrating LD information yields genotype calling accuracy from low-coverage sequencing data comparable to that of microarray platforms. A software package implementing our algorithm, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/GeneSeq/.
Conclusions: Integrating LD information leads to significant improvements in genotype calling accuracy compared to prior LD-oblivious methods, making low-coverage sequencing a viable alternative to microarrays for large-scale genome-wide association studies.
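For reference, the sketch below shows the kind of single-site, LD-oblivious calculation that the abstract compares against. It computes genotype likelihoods from reference and alternate read counts under a binomial model with a per-read error rate, then combines them with a Hardy-Weinberg prior from a population allele frequency. This is an assumed textbook model, not the multi-locus LD model of the paper; the default error rate and the function names are illustrative only.

```python
from typing import List


def genotype_likelihoods(n_ref: int, n_alt: int, error: float = 0.01) -> List[float]:
    """P(reads | genotype) for genotypes with 0, 1, 2 alt alleles.

    The binomial coefficient is omitted because it is the same for all genotypes.
    """
    liks = []
    for g in (0, 1, 2):
        # Probability that a single read shows the alt allele given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        liks.append(p_alt ** n_alt * (1 - p_alt) ** n_ref)
    return liks


def genotype_posteriors(
    n_ref: int, n_alt: int, alt_freq: float, error: float = 0.01
) -> List[float]:
    """Posterior over genotypes 0/1/2 using a Hardy-Weinberg prior from `alt_freq`."""
    f = alt_freq
    prior = [(1 - f) ** 2, 2 * f * (1 - f), f ** 2]
    joint = [
        lik * p for lik, p in zip(genotype_likelihoods(n_ref, n_alt, error), prior)
    ]
    total = sum(joint)
    return [j / total for j in joint]
```

At low coverage (few reads per site) the posterior stays close to the prior, which is the gap that borrowing information from correlated neighbouring sites via LD is meant to close.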
Investigating the genetic diversity, population structure and archaic admixture history in worldwide human populations using high-coverage genomes
I present an analysis of 929 high-coverage (>30x) genomes from the Human Genome Diversity Project (HGDP) panel, a collection of cell lines from 54 populations across the world. Several data processing steps were necessary before downstream analysis, including lifting over resources from a different reference genome assembly, annotating the genome, and statistical phasing. Genome-wide genetic diversity is consistent with previous studies using SNP arrays and microsatellites, yet haplotype information reveals fine-scale structure and recent demographic histories that vary between populations.
This dataset also provides a valuable opportunity to explore the diversity and distribution of archaic segments in modern human populations. I implemented a hidden Markov model to detect such segments, based on patterns of allele sharing with sequenced archaic genomes and a sub-Saharan African control panel, and compared several variants of the model and different training methods using simulated data. Applying the model to the HGDP dataset with two Neanderthal genomes and one Denisovan genome, I detected variation in the level of archaic ancestry across continental regions, across populations, and among individuals within each population. I further compared Neanderthal and Denisovan segments in terms of their lengths, genomic distribution, divergence from the archaic genomes, nucleotide diversity, and haplotype networks to shed light on the structure of the admixture events. Neanderthal segments from all non-African populations appear largely homogeneous after accounting for the recent demographic history of modern human populations, which is consistent with a single admixture event that occurred before these populations diverged from each other. In contrast, Denisovan haplotypes recovered from Oceania are clearly separated from those from East and South Asia, and the complex structure in the latter cannot be explained by a single source of gene flow. I therefore propose that more than one episode of admixture with different Denisovan groups occurred in the ancestral population of present-day East Asian, South Asian and American populations after their separation from the ancestors of present-day Oceanians, and that a separate admixture event occurred between the ancestors of Oceanians and the Denisovan population.
Gates Cambridg
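The abstract does not specify the emission model, parameters or training method of the hidden Markov model, so the sketch below only illustrates generic two-state Viterbi decoding of the kind such a segment detector could use. It assumes, hypothetically, that each genomic window is summarized as 1 if it carries an allele shared with an archaic genome but absent from the sub-Saharan African control panel and 0 otherwise, with state 0 meaning modern human and state 1 meaning archaic. The function name and every probability are illustrative assumptions.

```python
import math
from typing import List, Sequence


def viterbi_two_state(
    obs: Sequence[int],
    p_stay: Sequence[float],
    emit: Sequence[Sequence[float]],
    start: Sequence[float],
) -> List[int]:
    """Most likely state path (0 = modern, 1 = archaic) for binary window observations.

    p_stay[s]: probability of staying in state s between adjacent windows.
    emit[s][o]: probability of observing o (0 or 1) in state s.
    start[s]: probability of starting in state s.
    All probabilities must be > 0 because scores are computed in log space.
    """
    if not obs:
        raise ValueError("need at least one observation")
    states = (0, 1)

    def log_trans(prev: int, cur: int) -> float:
        return math.log(p_stay[prev] if prev == cur else 1 - p_stay[prev])

    # Log-probability of the best path ending in each state at the first window.
    scores = [math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states]
    backptrs: List[List[int]] = []
    for o in obs[1:]:
        new_scores, ptrs = [], []
        for s in states:
            # Best predecessor state for reaching state s at this window.
            best_prev = max(states, key=lambda r: scores[r] + log_trans(r, s))
            ptrs.append(best_prev)
            new_scores.append(
                scores[best_prev] + log_trans(best_prev, s) + math.log(emit[s][o])
            )
        backptrs.append(ptrs)
        scores = new_scores
    # Trace back from the best final state.
    state = max(states, key=lambda s: scores[s])
    path = [state]
    for ptrs in reversed(backptrs):
        state = ptrs[state]
        path.append(state)
    return path[::-1]
```

For example, decoding `[0, 0, 0, 1, 1, 1, 1, 0, 0, 0]` with `p_stay=[0.9, 0.8]`, `emit=[[0.95, 0.05], [0.3, 0.7]]` and `start=[0.99, 0.01]` labels the middle run of four windows as archaic. In practice the transition probabilities would encode the expected segment length, and the emission model would be fitted, for instance by one of the training methods the abstract mentions comparing.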
- …