19 research outputs found

    Detection and Removal of Biases in the Analysis of Next-Generation Sequencing Reads

    Since the emergence of next-generation sequencing (NGS) technologies, great effort has been put into the development of tools for analysis of the short reads. In parallel, knowledge is increasing regarding biases inherent in these technologies. Here we discuss four different biases we encountered while analyzing various Illumina datasets. These biases are due to both biological and statistical effects that in particular affect comparisons between different genomic regions. Specifically, we encountered biases pertaining to the distributions of nucleotides across sequencing cycles, to mappability, to contamination of pre-mRNA with mRNA, and to non-uniform hydrolysis of RNA. Most of these biases are not specific to one analyzed dataset, but are present across a variety of datasets and within a variety of genomic contexts. Importantly, some of these biases correlated in a highly significant manner with biological features, including transcript length, gene expression levels, conservation levels, and exon-intron architecture, which misleadingly lends credibility to results that arise from the biases. We also demonstrate the relevance of these biases in the context of analyzing an NGS dataset mapping transcriptionally engaged RNA polymerase II (RNAPII) in the context of exon-intron architecture, and show that elimination of these biases is crucial for avoiding erroneous interpretation of the data. Collectively, our results highlight several important pitfalls, challenges and approaches in the analysis of NGS reads.

    Phylogeny of Parasitic Parabasalia and Free-Living Relatives Inferred from Conventional Markers vs. Rpb1, a Single-Copy Gene

    Parabasalia are single-celled eukaryotes (protists) that mainly comprise endosymbionts of termites and wood roaches, intestinal commensals, human or veterinary parasites, and free-living species. Phylogenetic comparisons of parabasalids are typically based upon morphological characters and 18S ribosomal RNA gene sequence data (rDNA), while biochemical or molecular studies of parabasalids are limited to a few axenically cultivable parasites. These previous analyses and other studies based on PCR amplification of duplicated protein-coding genes are unable to fully resolve the evolutionary relationships of parabasalids. As a result, genetic studies of Parabasalia lag behind those of other organisms. Comparing parabasalid EF1α, α-tubulin, enolase and MDH protein-coding genes with information from the Trichomonas vaginalis genome reveals difficulty in resolving the history of species or isolates apart from duplicated genes. A conserved single-copy gene encodes the largest subunit of RNA polymerase II (Rpb1) in T. vaginalis and other eukaryotes. Here we directly sequenced Rpb1 degenerate PCR products from 10 parabasalid genera, including several T. vaginalis isolates and avian isolates, and compared these data by phylogenetic analyses. Rpb1 genes from parabasalids, diplomonads, Parabodo, Diplonema and Percolomonas were all intronless, unlike intron-rich homologs in Naegleria, Jakoba and Malawimonas. The phylogeny of Rpb1 from parasitic and free-living parabasalids, and conserved Rpb1 insertions, support Trichomonadea, Tritrichomonadea, and Hypotrichomonadea as monophyletic groups. These results are consistent with prior analyses of rDNA and GAPDH sequences and ultrastructural data. The Rpb1 phylogenetic tree also resolves species- and isolate-level relationships. These findings, together with the relative ease of Rpb1 isolation, make it an attractive tool for evaluating more extensive relationships within Parabasalia.

    Track E Implementation Science, Health Systems and Economics

    Peer Reviewed
    https://deepblue.lib.umich.edu/bitstream/2027.42/138412/1/jia218443.pd

    Historical and pooled flood frequency analysis for the River Tay at Perth, Scotland

    Improved estimates of UK flood risk during a period of increased climatic variability place challenges on existing methods that rely on short instrumental records. This paper examines the value of using historical data (both documentary and epigraphic) to augment existing gauged records for the River Tay at Perth as part of a multi-method approach to assessing flood risk. Single station and pooled methods are compared with flood risk estimates based on an augmented historical series (1815-2000) using the Generalized Logistic and Generalized Pareto distributions. The value of using an even longer, but less reliable, extended historical series (1210-2000) is also examined. It is recommended that modelling flood risk for return periods >100 years should incorporate historical data, where available, and that a multi-method approach using a high threshold Generalized Pareto distribution can also add confidence in flood risk estimates for return period