13 research outputs found

    Using Rare Genetic Variation to Understand Human Demography and the Etiology of Complex Traits.

    Modern sequencing technology has revolutionized almost every aspect of human genetics research. Among the novel findings made possible by the sequencing of large samples is how abundant extremely rare genetic variation is in the human genome. Rare genetic variants are likely to have arisen recently. Thus, they provide novel information about recent population history, and because selection has had little time to act on them, sets of rare variants are potentially enriched with important regulatory and biologically functional variants. Detecting associations between rare variants and genetic traits is challenging; conventional single marker association statistics have little power at low allele counts. Several statistics that aggregate information from multiple variants to increase power and detect group-wise associations have been proposed. In chapter 2 we address the robustness of these group-based tests to population stratification. Using the joint site frequency spectrum of samples from multiple European populations, we show that group-based tests cluster into two classes, and p-value inflation in each class is correlated with a specific form of population structure. An abundance of rare genetic variation is evidence of recent population growth. Large sequencing studies have found that the frequency spectra observed in their samples are inconsistent with models of simple exponential growth, likely due to a recent acceleration in the growth rate. To address this, in chapter 3 we propose a two-parameter model of accelerating, faster-than-exponential population growth and incorporate it into the coalescent. We show that our model can generate samples containing large quantities of rare genetic variants without inflating the quantity of more common variants, making it well suited to modeling the recent history of humans.
    In chapter 4 we develop a series of analytic calculations that allow us to directly sample internal and external branches from a sample's genealogy without resorting to full coalescent simulations. We show that for constant size populations an exact probability function can be defined for branch lengths, and that by using the expected times between coalescent events we can expand our method to a broader range of demographic models.
    PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/109028/1/mreppell_1.pd
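The "expected times between coalescent events" mentioned in the chapter 4 summary can be illustrated for the simplest case. The sketch below is not the thesis's sampling method. It assumes the standard Kingman coalescent for a constant-size population, with time measured in coalescent units, where the waiting time while k lineages remain is exponential with rate k(k-1)/2. All function names are hypothetical.

```python
import random

def expected_coalescent_times(n):
    """Expected waiting time E[T_k] while k lineages remain, for k = n..2.

    Assumes a constant-size population and time in coalescent units,
    so that T_k ~ Exponential(rate = k(k-1)/2) and E[T_k] = 2 / (k(k-1)).
    """
    return {k: 2.0 / (k * (k - 1)) for k in range(n, 1, -1)}

def sample_coalescent_times(n, rng):
    """Draw one realization of the inter-coalescent times T_n, ..., T_2."""
    return {k: rng.expovariate(k * (k - 1) / 2.0) for k in range(n, 1, -1)}

def total_tree_length(times):
    """Total branch length of the genealogy: k lineages each persist for T_k."""
    return sum(k * t for k, t in times.items())

if __name__ == "__main__":
    n = 10
    rng = random.Random(1)
    expected = total_tree_length(expected_coalescent_times(n))
    draws = [total_tree_length(sample_coalescent_times(n, rng)) for _ in range(20000)]
    # The simulated mean should be close to the analytic expectation.
    print(expected, sum(draws) / len(draws))
```

Replacing the simulated times with their expectations, as in `expected_coalescent_times`, is the kind of shortcut that avoids a full simulation; how the thesis extends this to other demographic models is not specified in the abstract.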

    HLA-DQA1*05 carriage associated with development of anti-drug antibodies to infliximab and adalimumab in patients with Crohn's Disease

    Anti-tumor necrosis factor (anti-TNF) therapies are the most widely used biologic drugs for treating immune-mediated diseases, but repeated administration can induce the formation of anti-drug antibodies. The ability to identify patients at increased risk for development of anti-drug antibodies would facilitate selection of therapy and use of preventative strategies.

    Using pseudoalignment and base quality to accurately quantify microbial community composition

    Pooled DNA from multiple unknown organisms arises in a variety of contexts, for example microbial samples from ecological or human health research. Determining the composition of pooled samples can be difficult, especially at the scale of modern sequencing data and reference databases. Here we propose a novel method for taxonomic profiling in pooled DNA that combines the speed and low memory requirements of k-mer-based pseudoalignment with a likelihood framework that uses base quality information to better resolve multiply mapped reads. We apply the method to the problem of classifying 16S rRNA reads using a reference database of known organisms, a common challenge in microbiome research. Using simulations, we show the method is accurate across a variety of read lengths, with reference sequences of different lengths, at different sample depths, and when samples contain reads originating from organisms absent from the reference. We also assess performance in real 16S data, where we reanalyze previous genetic association data to show our method discovers a larger number of quantitative trait associations than other widely used methods. We implement our method in the software Karp, for k-mer-based analysis of read pools, to provide a novel combination of speed and accuracy that is uniquely suited for enhancing discoveries in microbial studies.
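To make the idea of a base-quality likelihood concrete, here is a minimal sketch of how Phred scores can weight matches and mismatches when a read maps to several references. It is an illustration under stated assumptions, not Karp's actual likelihood: it assumes an ungapped alignment, the standard Phred relation e = 10^(-Q/10), and that a miscalled base is equally likely to be any of the other three bases. Function names are hypothetical.

```python
import math

def phred_to_error(q):
    """Convert a Phred quality score Q to an error probability 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)

def read_log_likelihood(read, ref_segment, quals):
    """Log-likelihood of a read given an ungapped alignment to ref_segment.

    Assumed model: a base is called correctly with probability 1 - e and
    miscalled as each of the other three bases with probability e / 3.
    """
    if not (len(read) == len(ref_segment) == len(quals)):
        raise ValueError("read, ref_segment and quals must have equal length")
    loglik = 0.0
    for base, ref_base, q in zip(read, ref_segment, quals):
        # Cap e at 0.75 (a uniformly random call) so the logs stay finite at Q = 0.
        e = min(phred_to_error(q), 0.75)
        loglik += math.log(1.0 - e) if base == ref_base else math.log(e / 3.0)
    return loglik

if __name__ == "__main__":
    quals = [30, 30, 30, 2]
    # The read matches the first reference exactly and differs from the second
    # only at a low-quality base, so the second is penalized only mildly.
    print(read_log_likelihood("ACGT", "ACGT", quals))
    print(read_log_likelihood("ACGT", "ACGA", quals))
```

Under this model a mismatch at a low-quality position costs far less than one at a high-quality position, which is what lets quality information separate otherwise tied candidate references.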

    Materials and methods definitions.


    Computational requirements and speed of Karp, Kallisto, SINTAX, UCLUST, USEARCH61, SortMeRNA, and the Wang et al. (2007) Naive Bayes using Mothur.

    All programs were run using 12 multi-threaded cores except Mothur. Mothur's memory requirements scale with the number of cores used, so to keep memory below 16 GB we limited it to 4 cores. The values for UCLUST and USEARCH give the time to assign taxonomy; with these methods reads are generally clustered before taxonomy is assigned, and the value in parentheses gives the time to first cluster and then assign taxonomy. The results for the method 16S Classifier are not shown: it was fast, requiring a few minutes at most, but its memory usage scaled dramatically with the number of reads. To keep memory usage below 128 GB, samples had to be split into several smaller samples and then reassembled. Additionally, it could not be run in parallel, so a meaningful comparison of speed and memory requirements against the other methods was not possible.

    151bp and 301bp simulation results.

    Average relative error (AVGRE) with 95% confidence intervals from the taxonomic quantification of simulated samples with 151bp paired-end and 301bp paired-end reads. Per reference error was calculated using the classifications from Karp, Kallisto, UCLUST, USEARCH, and SortMeRNA. Only references with frequencies >0.1% in either the true reference or a simulated sample were used to calculate error. (A) Per reference error in full 16S gene samples of 151bp paired-end reads. (B) Per reference error in 151bp paired-end samples simulated using only the V4 hypervariable region of the 16S gene. (C) Per reference error in full 16S gene samples of 301bp paired-end reads. (D) Per reference error in V4 samples of 301bp paired-end reads.
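The caption does not give the AVGRE formula, so the following is only a sketch under an assumed definition: the mean over retained references of |estimated - true| / true, keeping a reference when its frequency exceeds 0.1% in either the truth or the estimate. Skipping references whose true frequency is zero is a further assumption made here to avoid dividing by zero; the function name and arguments are hypothetical.

```python
def average_relative_error(true_freq, est_freq, threshold=0.001):
    """Mean relative error over references above the frequency threshold.

    true_freq and est_freq map reference IDs to frequencies in [0, 1].
    ASSUMPTION: relative error is |est - true| / true, and references with a
    true frequency of zero are skipped because the ratio is undefined.
    """
    refs = set(true_freq) | set(est_freq)
    errors = []
    for ref in refs:
        t = true_freq.get(ref, 0.0)
        e = est_freq.get(ref, 0.0)
        if t <= threshold and e <= threshold:
            continue  # below the 0.1% cutoff in both truth and estimate
        if t == 0.0:
            continue  # assumed handling of undefined relative error
        errors.append(abs(e - t) / t)
    return sum(errors) / len(errors) if errors else float("nan")

if __name__ == "__main__":
    truth = {"ref_a": 0.5, "ref_b": 0.5, "ref_c": 0.0005}
    estimate = {"ref_a": 0.25, "ref_b": 0.75, "ref_c": 0.0}
    # ref_c is excluded by the threshold in both truth and estimate.
    print(average_relative_error(truth, estimate))
```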

    Results with missing taxa.

    Accuracy when the reference database used for quantification is missing taxa found in the sample. For each of one phylum (Acidobacteria), one order (Pseudomonadales), and one genus (Clostridiisalibacter), 10 samples were simulated in which 50% of the reads originated from the noted taxon. Each sample was classified with the full GreenGenes database and also with a reduced version of the database lacking all members of the taxon that had been used to simulate the sample. The estimates by Karp, Kallisto, and UCLUST for the 50% of each sample that did not originate from the absent taxon were compared with their true frequencies. Black bars give 95% confidence intervals.

    Overview of Karp.

    (1) Query reads are pseudoaligned against an index of the reference database, resulting in a set of references they could have potentially originated from. (2) The query reads are locally aligned to the possible references. (3) Using the best alignment, the likelihood that a read originated from a specific reference is calculated. (4) Using the read likelihoods an EM-algorithm is employed to estimate the relative abundances of the references in the pool of query reads.
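Step (4) can be sketched as a textbook EM for mixture proportions. This is a generic illustration, not Karp's implementation: it assumes the per-read likelihoods from step (3) are already available as a reads-by-references table, with zeros for references a read was not pseudoaligned to. The function name, iteration cap, and tolerance are hypothetical choices.

```python
def em_abundances(likelihoods, n_iter=500, tol=1e-10):
    """Estimate relative abundances of references from read likelihoods.

    likelihoods[i][j] is the (assumed precomputed) likelihood that read i
    originated from reference j. Returns a list of proportions summing to 1.
    """
    n_refs = len(likelihoods[0])
    pi = [1.0 / n_refs] * n_refs  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_refs
        for row in likelihoods:
            # E-step: posterior responsibility of each reference for this read.
            weights = [p * l for p, l in zip(pi, row)]
            total = sum(weights)
            if total == 0.0:
                continue  # read is incompatible with every reference
            for j, w in enumerate(weights):
                counts[j] += w / total
        assigned = sum(counts)
        if assigned == 0.0:
            break  # no read could be assigned; keep current estimate
        # M-step: abundances are the normalized expected read counts.
        new_pi = [c / assigned for c in counts]
        converged = max(abs(a - b) for a, b in zip(pi, new_pi)) < tol
        pi = new_pi
        if converged:
            break
    return pi

if __name__ == "__main__":
    # Three reads unique to reference 0, one unique to reference 1,
    # and one ambiguous read that fits both equally well.
    table = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
    print(em_abundances(table))
```

The ambiguous read is shared between references in proportion to the current abundance estimates, so uniquely mapped reads end up guiding how multiply mapped reads are apportioned.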