151 research outputs found

    Enhancements to the ADMIXTURE algorithm for individual ancestry estimation

    Background: The estimation of individual ancestry from genetic data has become essential to applied population genetics and genetic epidemiology. Software programs for calculating ancestry estimates have become essential tools in the geneticist's analytic arsenal. Results: Here we describe four enhancements to ADMIXTURE, a high-performance tool for estimating individual ancestries and population allele frequencies from SNP (single nucleotide polymorphism) data. First, ADMIXTURE can be used to estimate the number of underlying populations through cross-validation. Second, individuals of known ancestry can be exploited in supervised learning to yield more precise ancestry estimates. Third, by penalizing small admixture coefficients for each individual, one can encourage model parsimony, often yielding more interpretable results for small datasets or datasets with large numbers of ancestral populations. Finally, by exploiting multiple processors, large datasets can be analyzed even more rapidly. Conclusions: The enhancements we have described make ADMIXTURE a more accurate, efficient, and versatile tool for ancestry estimation.

    PREMIER - PRobabilistic Error-correction using Markov Inference in Errored Reads

    In this work we present a flexible, probabilistic and reference-free method of error correction for high-throughput DNA sequencing data. The key is to exploit the high coverage of sequencing data and model short sequence outputs as independent realizations of a Hidden Markov Model (HMM). We pose the problem of error correction of reads as one of maximum likelihood sequence detection over this HMM. While time and memory considerations rule out an implementation of the optimal Baum-Welch algorithm (for parameter estimation) and the optimal Viterbi algorithm (for error correction), we propose low-complexity approximate versions of both. Specifically, we propose an approximate Viterbi and a sequential decoding based algorithm for the error correction. Our results show that when compared with Reptile, a state-of-the-art error correction method, our methods consistently achieve superior performance on both simulated and real data sets. Comment: Submitted to ISIT 201
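
    The maximum likelihood sequence detection mentioned above can be illustrated with the textbook (exact) Viterbi recursion on a toy two-state HMM. This is only a sketch: it is not PREMIER's approximate Viterbi or its sequential decoder, and the function name `viterbi`, the states, and every probability below are hypothetical values chosen for illustration.

```python
import math


def _log(p):
    """Logarithm that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observed symbols."""
    # Log-score of the best path ending in each state after the first symbol.
    score = {s: _log(start_p[s]) + _log(emit_p[s][obs[0]]) for s in states}
    back = []  # one dict of back-pointers per later position
    for o in obs[1:]:
        prev, score, pointers = score, {}, {}
        for s in states:
            # Best predecessor state for landing in state s.
            best = max(states, key=lambda r: prev[r] + _log(trans_p[r][s]))
            score[s] = prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][o])
            pointers[s] = best
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (assumed values): the true base tends to persist,
# and each read position reports the true base with probability 0.8.
STATES = ["A", "C"]
START = {"A": 0.5, "C": 0.5}
TRANS = {"A": {"A": 0.9, "C": 0.1}, "C": {"A": 0.1, "C": 0.9}}
EMIT = {"A": {"A": 0.8, "C": 0.2}, "C": {"A": 0.2, "C": 0.8}}

corrected = "".join(viterbi("AACAA", STATES, START, TRANS, EMIT))
```

    In this toy setting the isolated `C` in the read `AACAA` is cheaper to explain as a read error than as two state transitions, so the decoded path is `AAAAA`.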

    A splice variant in KRT71 is associated with curly coat phenotype of Selkirk Rex cats.

    One of the salient features of the domestic cat is the aesthetics of its fur. The Selkirk Rex breed is defined by an autosomal dominant woolly rexoid hair (ADWH) abnormality that is characterized by tightly curled hair shafts. A genome-wide case-control association study was conducted using 9 curly coated Selkirk Rex and 29 controls, including straight-coated Selkirk Rex, British Shorthair and Persian, to localize the Selkirk autosomal dominant rexoid locus (SADRE). Although the control cats were from different breed lineages, they share recent breeding histories and were validated as controls by Bayesian clustering, multi-dimensional scaling and genomic inflation. A significant association was found on cat chromosome B4 (P_raw = 2.87 × 10^-11), and a unique haplotype spanning ~600 Kb was found in all the curly coated cats. Direct sequencing of four candidate genes revealed a splice site variant within the KRT71 gene associated with the hair abnormality in Selkirk Rex.

    Differential Evolution Approach to Detect Recent Admixture

    The genetic structure of human populations is extraordinarily complex and of fundamental importance to studies of anthropology, evolution, and medicine. As increasingly many individuals are of mixed origin, there is an unmet need for tools that can infer multiple origins. Misclassification of such individuals can lead to incorrect and costly misinterpretations of genomic data, primarily in disease studies and drug trials. We present an advanced tool to infer ancestry that can identify the biogeographic origins of highly mixed individuals. reAdmix is an online tool available at http://chcb.saban-chla.usc.edu/reAdmix/. Comment: presented at ISMB 2014, VariSI

    Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations.

    Background: Estimation of individual ancestry from genetic data is useful for the analysis of disease association studies, understanding human population history and interpreting personal genomic variation. New, computationally efficient methods are needed for ancestry inference that can effectively utilize existing information about allele frequencies associated with different human populations and can work directly with DNA sequence reads. Results: We describe a fast method for estimating the relative contribution of known reference populations to an individual's genetic ancestry. Our method utilizes allele frequencies from the reference populations and individual genotype or sequence data to obtain a maximum likelihood estimate of the global admixture proportions using the BFGS optimization algorithm. It accounts for the uncertainty in genotypes present in sequence data by using genotype likelihoods and does not require individual genotype data from external reference panels. Simulation studies and application of the method to real datasets demonstrate that our method is significantly faster than previous methods and has comparable accuracy. Using data from the 1000 Genomes project, we show that estimates of the genome-wide average ancestry for admixed individuals are consistent between exome sequence data and whole-genome low-coverage sequence data. Finally, we demonstrate that our method can be used to estimate admixture proportions using pooled sequence data, making it a valuable tool for controlling for population stratification in sequencing-based association studies that utilize DNA pooling. Conclusions: Our method is an efficient and versatile tool for estimating ancestry from DNA sequence data and is available from https://sites.google.com/site/vibansal/software/iAdmix
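
    A maximum likelihood estimate of global admixture proportions from fixed reference allele frequencies can be sketched as follows. This is not the iAdmix implementation: to stay dependency-free it swaps in a simple EM update in place of BFGS, it uses hard genotype calls (0, 1 or 2 copies of the reference allele) rather than genotype likelihoods, and the function name `estimate_admixture` and the toy data are hypothetical.

```python
def estimate_admixture(genotypes, freqs, n_iter=200, eps=1e-9):
    """EM estimate of admixture proportions q for one individual.

    genotypes: allele counts (0, 1 or 2), one per SNP.
    freqs:     freqs[j][k] is the frequency of the counted allele
               at SNP j in reference population k.
    Model: each allele copy at SNP j is the counted allele with
    probability p_j = sum_k q[k] * freqs[j][k].
    """
    k = len(freqs[0])
    q = [1.0 / k] * k  # start from equal proportions
    for _ in range(n_iter):
        new_q = [0.0] * k
        for g, f in zip(genotypes, freqs):
            p = sum(qk * fk for qk, fk in zip(q, f))
            p = min(max(p, eps), 1.0 - eps)  # guard against division by zero
            for i in range(k):
                # Expected number of this SNP's two allele copies
                # that originate from population i.
                new_q[i] += (g * q[i] * f[i] / p
                             + (2 - g) * q[i] * (1.0 - f[i]) / (1.0 - p))
        total = sum(new_q)
        q = [v / total for v in new_q]  # renormalise to sum to 1
    return q


# Toy data (assumed): two reference populations with opposite frequencies,
# and an individual whose genotypes match population 0 at every SNP.
toy_freqs = [(0.9, 0.1)] * 10 + [(0.1, 0.9)] * 10
toy_genotypes = [2] * 10 + [0] * 10
q_hat = estimate_admixture(toy_genotypes, toy_freqs)
```

    On this toy input the estimate puts nearly all weight on the first population. A gradient-based optimizer such as BFGS targets the same likelihood but needs a reparameterization or constraints to keep the proportions non-negative and summing to one.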

    The genome-wide structure of two economically important indigenous Sicilian cattle breeds

    Genomic technologies, such as high-throughput genotyping based on SNP arrays, provide background information concerning genome structure in domestic animals. The aim of this work was to investigate the genetic structure, the genome-wide estimates of inbreeding, coancestry, effective population size (Ne), and the patterns of linkage disequilibrium (LD) in two economically important Sicilian local cattle breeds, Cinisara (CIN) and Modicana (MOD), using the Illumina Bovine SNP50K v2 BeadChip. In order to understand the genetic relationship and to place both Sicilian breeds in a global context, genotypes from 134 other domesticated bovid breeds were used. Principal component analysis showed that the Sicilian cattle breeds were closer to individuals of B. t. taurus from Eurasia and formed non-overlapping clusters with other breeds. Between the Sicilian cattle breeds, MOD was the most differentiated, whereas the animals belonging to the CIN breed showed a lower value of assignment, the presence of substructure and genetic links with the MOD breed. The average molecular inbreeding and coancestry coefficients were moderately high, and the current estimates of Ne were low in both breeds. These values indicated low genetic variability. Considering levels of LD between adjacent markers, the average r² in the MOD breed was comparable to those reported for other cattle breeds, whereas CIN showed a lower value. Therefore, these results support the need for denser SNP arrays for high-power association mapping and efficient genomic selection, in particular for the CIN cattle breed. Controlling molecular inbreeding and coancestry would restrict inbreeding depression, the probability of losing beneficial rare alleles, and therefore, the risk of extinction. The results generated from this study have important implications for the development of conservation and/or selection breeding programs in these two local cattle breeds.
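
    The LD statistic r² between two markers is commonly approximated from unphased SNP-array genotypes as the squared Pearson correlation of allele counts. The sketch below shows that approximation only; the abstract does not state how the study computed r², and the function name `genotype_r2` and the example genotypes are hypothetical.

```python
def genotype_r2(a, b):
    """Squared Pearson correlation between two genotype vectors.

    a, b: allele counts (0, 1 or 2) for the same animals at two SNPs.
    Raises ValueError if either SNP is monomorphic, because the
    correlation is undefined when a variance is zero.
    """
    if len(a) != len(b) or not a:
        raise ValueError("genotype vectors must be non-empty and equal length")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("r2 is undefined for a monomorphic SNP")
    return cov * cov / (var_a * var_b)


# Toy genotypes (assumed): two markers in complete LD, then two independent ones.
r2_linked = genotype_r2([0, 1, 2, 1, 0], [2, 1, 0, 1, 2])
r2_unlinked = genotype_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

    Averaging this value over adjacent marker pairs gives a breed-level summary of the kind reported above; phased haplotype data would allow the haplotype-frequency definition of r² instead.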