14 research outputs found

    Identification and correction of systematic error in high-throughput sequence data

    A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of _systematic_ error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low-coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.

    Sex and Fear: Mathematical models of mate choice, parental care, and maladaptive anxiety

    Thesis (Ph.D.)--University of Washington, 2018
    In many contexts, animals must infer salient information about another individual indirectly by observing some other characteristic of that individual. In Chapter 1 of this thesis, a model of costly signaling is developed to investigate how stochastic signal costs influence the overall cost of communication. Chapter 2 presents a model of mate choice in which females must infer from a potential mate's appearance whether he will choose to be a good parent to future offspring. Chapters 3 and 4 deal with mathematical models of anxiety disorders. These disorders affect a huge number of people and can be tremendously disabling. But it is clear that the capacity for anxiety is an evolutionary adaptation. This presents a puzzle: why has natural selection not protected us from such a common malfunctioning of an adaptation? Chapter 3 develops a model that shows how the basic information constraints inherent in the problem of learning about an environment can unavoidably cause a subset of the population to be overly sensitive to signs of danger. Chapter 4 addresses the perplexing observation that as society in developed countries has become steadily safer, anxiety has increased rather than decreased. A model is presented that shows how the mismatch between a modern environment and the environment to which we adapted can cause this seemingly paradoxical increase in levels of anxiety. This result is in some ways analogous to the well-known "hygiene hypothesis" of inflammatory and autoimmune diseases.

    IGF Deficiency
