3,734 research outputs found

    Methods for statistical and population genetics analyses.

    Genetics studies have advanced rapidly, from candidate region studies to genome-wide association studies (GWAS) and next-generation sequencing projects. The emergence of new technologies has brought with it an array of statistical challenges. In this thesis, we propose methods for statistical and population genetics in our effort to better understand the underlying architecture of our genomes. GWAS rely on indirect association, testing a reduced set of representative markers (tagSNPs) instead of all variants present in the genome. In the first chapter, we propose a graph-based method to select the optimal set of tagSNPs. We apply our method to chromosome-wide data and show that it outperforms the widely used greedy approach, selecting fewer tagSNPs while maintaining high correlation with non-tagSNP variants. Alignment to a reference sequence is an integral step in many sequencing studies. Multiply mapped reads, that is, reads that align to multiple locations in the reference, are discarded from downstream analyses, resulting in a loss of information. We develop a Gibbs sampling approach to identify the true location of multiply mapped reads obtained from the alignment step. We validate our method using simulation studies, and we use the improvement in variant discovery to quantify the effect of including multiply mapped reads in downstream analyses. In the third chapter, we explore the feasibility of admixture mapping, a population genetics tool, in identifying regions harboring rare susceptibility variants. We compare the power of admixture mapping to that of single-marker association studies in detecting causal regions, and find that admixture mapping performs better over a wide range of risk allele frequencies. The site frequency spectrum (SFS) is an important summary statistic in population genetics, encompassing information on selection and demographic history. We show that estimates of the SFS obtained from genotype calling methods underestimate the number of rare variants, especially singletons and doubletons. We derive a maximum likelihood estimate for the SFS and demonstrate, using both simulated and real data examples, that it performs better than SFS estimates obtained from genotype calling algorithms.
    Ph.D., Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/89609/1/gopalakr_1.pd
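
For readers unfamiliar with the site frequency spectrum mentioned in the abstract above, the following is a minimal sketch of how an SFS can be tabulated from already-called diploid genotypes. This is the naive calling-based estimate that the thesis says undercounts rare variants; it is not the maximum likelihood estimator derived there. The function name and the toy genotype matrix are hypothetical and chosen only for illustration.

```python
from collections import Counter

def site_frequency_spectrum(genotypes):
    """Tabulate the (unfolded) SFS from called diploid genotypes.

    genotypes: list of sites; each site is a list with one entry per
    individual giving the alternate-allele count (0, 1 or 2).
    Returns a list sfs where sfs[k] is the number of sites at which the
    alternate allele is seen exactly k times in the sample.
    """
    n_chromosomes = 2 * len(genotypes[0])
    counts = Counter(sum(site) for site in genotypes)
    return [counts.get(k, 0) for k in range(n_chromosomes + 1)]

# Hypothetical toy data: 4 sites genotyped in 4 diploid individuals.
toy_genotypes = [
    [0, 1, 0, 0],  # one copy: a singleton
    [1, 1, 0, 0],  # two copies: a doubleton
    [0, 0, 2, 0],  # two copies: a doubleton (one homozygote)
    [2, 2, 2, 1],  # seven copies: a common variant
]
sfs = site_frequency_spectrum(toy_genotypes)
print(sfs)
```

Because every entry relies on hard genotype calls, a heterozygote miscalled as homozygous reference removes a singleton from the tabulation, which is one way the undercounting described in the abstract can arise.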

    Effects of sampling close relatives on some elementary population genetics analyses

    Many molecular ecology analyses assume the genotyped individuals are sampled at random from a population and thus are representative of the population. Realistically, however, a sample may contain excessive close relatives (ECR) because, for example, localized juveniles are drawn from fecund species. Our knowledge is limited about how ECR affect the routinely conducted elementary genetics analyses, and how ECR are best dealt with to yield unbiased and accurate parameter estimates. This study quantifies the effects of ECR on some popular population genetics analyses of marker data, including the estimation of allele frequencies, F-statistics, expected heterozygosity (He), effective and observed numbers of alleles, and the tests of Hardy–Weinberg equilibrium (HWE) and linkage equilibrium (LE). It also investigates several strategies for handling ECR to mitigate their impact and to yield accurate parameter estimates. My analytical work, assisted by simulations, shows that ECR have large and global effects on all of the above marker analyses. The naïve approach of simply ignoring ECR could yield low-precision and often biased parameter estimates, and could cause too many false rejections of HWE and LE. The bold approach, which simply identifies and removes ECR, and the cautious approach, which estimates target parameters (e.g., He) by accounting for ECR and using naïve allele frequency estimates, eliminate the bias and the false HWE and LE rejections, but could reduce estimation precision substantially. The likelihood approach, which accounts for ECR in estimating allele frequencies and thus target parameters relying on allele frequencies, usually yields unbiased and the most accurate parameter estimates. Which of the four approaches is the most effective and efficient may depend on the particular marker analysis to be conducted. The results are discussed in the context of using marker data for understanding population properties and marker properties
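
As an illustration of one of the elementary analyses listed in the abstract above, here is a minimal sketch of the standard chi-square goodness-of-fit test of Hardy–Weinberg equilibrium at a biallelic locus. It uses naive allele frequency estimates and does not correct for close relatives, so it corresponds to the naïve approach rather than to the likelihood approach the study favours. The function name and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for HWE at a biallelic locus.

    n_aa, n_ab, n_bb: observed counts of the three genotypes.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # naive estimate of allele A frequency
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expectation (monomorphic locus).
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)

# Hypothetical sample of 100 individuals with a heterozygote deficit.
chi2 = hwe_chi_square(40, 20, 40)
print(chi2)
```

A statistic above 3.84 rejects HWE at the 5% level with one degree of freedom. The abstract's point is that such rejections can be inflated when the sample contains many close relatives.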

    New transcriptome-based SNP markers for noug (Guizotia abyssinica) and their conversion to KASP markers for population genetics analyses

    The development and use of genomic resources are essential for understanding the population genetics of crops for their efficient conservation and enhancement. Noug (Guizotia abyssinica) is an economically important oilseed crop in Ethiopia and India. The present study sought to develop new DNA markers for this crop. Transcriptome sequencing was conducted on two genotypes and 628 transcript sequences containing 959 single nucleotide polymorphisms (SNPs) were developed. A competitive allele-specific PCR (KASP) assay was developed for the SNPs and used for genotyping of 24 accessions. A total of 554 loci were successfully genotyped across the accessions, and 202 polymorphic loci were used for population genetics analyses. Polymorphism information content (PIC) of the loci varied from 0.01 to 0.37 with a mean of 0.24, and about 49% of the loci showed significant deviation from the Hardy-Weinberg equilibrium. The mean expected heterozygosity was 0.27 suggesting moderately high genetic variation within accessions. Low but significant differentiation existed among accessions (FST = 0.045, p < 0.0001). Landrace populations from isolated areas may have useful mutations and should be conserved and used in breeding this crop. The genomic resources developed in this study were shown to be useful for population genetics research and can also be used in, e.g., association genetics
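
The two diversity statistics reported in the abstract above can be sketched as follows, assuming the commonly used definitions: expected heterozygosity He = 1 - Σ p_i², and polymorphism information content PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². The function names are hypothetical, no small-sample correction is applied, and the software used in the study may compute these quantities differently.

```python
from itertools import combinations

def expected_heterozygosity(freqs):
    """He = 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

def polymorphism_information_content(freqs):
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * (p * q) ** 2 for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross_term

# A biallelic SNP with equal allele frequencies gives the largest values
# these statistics can take for two alleles.
he_max = expected_heterozygosity([0.5, 0.5])
pic_max = polymorphism_information_content([0.5, 0.5])
print(he_max, pic_max)
```

Under these definitions a biallelic locus cannot exceed PIC = 0.375, which is consistent with the upper end of the 0.01 to 0.37 range reported for the SNP loci.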

    Laboratory colonisation and genetic bottlenecks in the tsetse fly Glossina pallidipes

    Background: The IAEA colony is the only one available for mass rearing of Glossina pallidipes, a vector of human and animal African trypanosomiasis in eastern Africa. This colony is the source for Sterile Insect Technique (SIT) programs in East Africa. The source population of this colony is unclear, and its genetic diversity has not previously been evaluated and compared to field populations. Methodology/Principal Findings: We examined the genetic variation within and between the IAEA colony and its potential source populations in north Zimbabwe and the Kenya/Uganda border at 9 microsatellite loci to retrace the demographic history of the IAEA colony. We performed classical population genetics analyses and also combined historical and genetic data in a quantitative analysis using Approximate Bayesian Computation (ABC). There is no evidence of introgression from the north Zimbabwean population into the IAEA colony. Moreover, the ABC analyses revealed that the foundation and establishment of the colony was associated with a genetic bottleneck that has resulted in a loss of 35.7% of alleles and 54% of expected heterozygosity compared to its source population. We also show that tsetse control carried out in the 1990s has likely reduced the effective population size of the Kenya/Uganda border population. Conclusions/Significance: All the analyses indicate that the area of origin of the IAEA colony is the Kenya/Uganda border and that a genetic bottleneck was associated with the foundation and establishment of the colony. Genetic diversity associated with traits that are important for SIT may have been lost during this genetic bottleneck, which could lead to suboptimal competitiveness of the colony males in the field. The genetic diversity of the colony is lower than that of field populations, so studies using colony flies should be interpreted with caution when drawing general conclusions about G. pallidipes biology.

    Phred-Phrap package to analyses tools: a pipeline to facilitate population genetics re-sequencing studies

    BACKGROUND: Targeted re-sequencing is one of the most powerful and widely used strategies for population genetics studies because it allows an unbiased screening for variation that is suitable for a wide variety of organisms. Examples of studies that require re-sequencing data are evolutionary inferences, epidemiological studies designed to capture rare polymorphisms responsible for complex traits, and screenings for mutations in families and small populations with high incidences of specific genetic diseases. Despite the advent of next-generation sequencing technologies, Sanger sequencing is still the most popular approach in population genetics studies because of the widespread availability of automatic sequencers based on capillary electrophoresis and because it is still less prone to sequencing errors, which is critical in population genetics studies. Two popular software applications for re-sequencing studies are Phred-Phrap-Consed-Polyphred, which performs base calling, alignment, graphical editing and genotype calling, and DNAsp, which performs a set of population genetics analyses. These independent tools are the start and end points of basic analyses. Between the use of these tools, there is a set of basic but error-prone tasks to be performed with re-sequencing data. RESULTS: To assist with these intermediate tasks, we developed a pipeline that facilitates data handling typical of re-sequencing studies. Our pipeline: (1) consolidates different outputs produced by distinct Phred-Phrap-Consed contigs sharing a reference sequence; (2) checks for genotyping inconsistencies; (3) reformats genotyping data produced by Polyphred into a matrix of genotypes with individuals as rows and segregating sites as columns; (4) prepares input files for haplotype inferences using the popular software PHASE; and (5) handles PHASE output files that contain only polymorphic sites to reconstruct the inferred haplotypes including polymorphic and monomorphic sites, as required by population genetics software for re-sequencing data such as DNAsp. CONCLUSION: We tested the pipeline in re-sequencing studies of haploid and diploid data in humans, plants, animals and microorganisms and observed that it allowed a substantial decrease in the time required for sequencing analyses, as well as being a more controlled process that eliminates several classes of error that may occur when handling datasets. The pipeline is also useful for investigators using other tools for sequencing and population genetics analyses.

    Genome-wide sequence analyses of ethnic populations across Russia

    The Russian Federation is the largest and one of the most ethnically diverse countries in the world; however, no centralized reference database of genetic variation exists to date. Such data are crucial for medical genetics and essential for studying population history. The Genome Russia Project aims to fill this gap by performing whole genome sequencing and analysis of peoples of the Russian Federation. Here we report the characterization of genome-wide variation of 264 healthy adults, including 60 newly sequenced samples. People of Russia carry known and novel genetic variants of adaptive, clinical and functional consequence that in many cases show allele frequency divergence from neighboring populations. Population genetics analyses revealed six phylogeographic partitions among indigenous ethnicities corresponding to their geographic locales. This study presents a characterization of population-specific genomic variation in Russia with results important for medical genetics and for understanding the dynamic population history of the world's largest country.

    Sequences From First Settlers Reveal Rapid Evolution in Icelandic mtDNA Pool

    A major task in human genetics is to understand the nature of the evolutionary processes that have shaped the gene pools of contemporary populations. Ancient DNA studies have great potential to shed light on the evolution of populations because they provide the opportunity to sample from the same population at different points in time. Here, we show that a sample of mitochondrial DNA (mtDNA) control region sequences from 68 early medieval Icelandic skeletal remains is more closely related to sequences from contemporary inhabitants of Scotland, Ireland, and Scandinavia than to those from the modern Icelandic population. Due to a faster rate of genetic drift in the Icelandic mtDNA pool during the last 1,100 years, the sequences carried by the first settlers were better preserved in their ancestral gene pools than among their descendants in Iceland. These results demonstrate the inferential power gained in ancient DNA studies through the application of population genetics analyses to relatively large samples

    High performance computation of landscape genomic models integrating local indices of spatial association

    Since its introduction, landscape genomics has developed quickly with the increasing availability of both molecular and topo-climatic data. The current challenges of the field mainly involve processing large numbers of models and disentangling selection from demography. Several methods address the latter, either by estimating a neutral model from population structure or by simultaneously inferring environmental and demographic effects. Here we present Samβada, an integrated approach to study signatures of local adaptation, providing rapid processing of whole genome data and enabling assessment of spatial association using molecular markers. Specifically, candidate loci for adaptation are identified by automatically assessing genome-environment associations. In addition, measuring the Local Indicators of Spatial Association (LISA) for these candidate loci makes it possible to detect whether similar genotypes tend to cluster in space, which is a useful indication of possible kinship between individuals. In this paper, we also analyze SNP data from Ugandan cattle to detect signatures of local adaptation with Samβada, BayEnv, LFMM and an outlier method (FDIST approach in Arlequin) and compare their results. Samβada is open source software for Windows, Linux and MacOS X, available at http://lasig.epfl.ch/sambada
    Comment: 1 figure in text, 1 figure in supplementary material. The structure of the article was modified and some explanations were updated. The methods and results presented are the same as in the previous version.

    Development of genomic resources and tools for precision farming of pikeperch through high-throughput sequencing and computational genomics

    This thesis provides the first genomic tools and resources to enhance pikeperch's innovative farming, optimal domestication, and adaptation to modern intensive aquaculture systems, including a high-quality chromosome-level assembly, reference transcriptome, and gene expression atlas. The pikeperch genome was also used as a reference for comparative genomics analyses and population genetics analyses in domesticated individuals to establish the landscape of genetic variations. These findings lay the foundation for addressing critical issues in genomics-informed pikeperch farming.