731 research outputs found

    Evaluating the accuracy of a functional SNP annotation system

    Many common and chronic diseases are influenced at some level by genetic variation. Research in population genetics, specifically on single nucleotide polymorphisms (SNPs), is critical to understanding human genetic variation. A key element in assessing the role of a given SNP is determining whether the variation is likely to result in a change in function. The SNP Integration Tool (SNPit) is a comprehensive tool that integrates diverse, existing predictors of SNP functionality, providing the user with information for improved association study analysis. To evaluate the SNPit system, we developed an alternative gold standard to measure accuracy using sensitivity and specificity. The results of our evaluation demonstrated that our alternative gold standard produced encouraging results.
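As a minimal sketch of the two accuracy measures named in this abstract, the snippet below computes sensitivity and specificity from paired labels. The function name and the toy labels are illustrative assumptions; they are not taken from SNPit or from the authors' gold standard.

```python
def sensitivity_specificity(predicted, gold):
    """Return (sensitivity, specificity) for paired boolean labels.

    predicted: calls made by a predictor (True = "functional").
    gold:      reference labels from a gold standard.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    tp = fn = tn = fp = 0
    for p, g in zip(predicted, gold):
        if g and p:
            tp += 1      # functional SNP correctly flagged
        elif g and not p:
            fn += 1      # functional SNP missed
        elif not g and p:
            fp += 1      # neutral SNP wrongly flagged
        else:
            tn += 1      # neutral SNP correctly passed over
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Hypothetical toy labels, for illustration only.
predicted = [True, True, False, False, True, False]
gold = [True, False, False, False, True, True]
sens, spec = sensitivity_specificity(predicted, gold)
print(f"sensitivity={sens:.3f} specificity={spec:.3f}")
```

Sensitivity is the fraction of truly functional SNPs that the predictor flags; specificity is the fraction of non-functional SNPs that it correctly leaves unflagged.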

    Computational Approaches for Analyzing High-Throughput Genomic Data

    With the improvement of high-throughput technologies, association studies related to molecular phenotypes have become increasingly significant. Associated genetic variants found from studies based on high-throughput omics experiments provide valuable information to help understand biological mechanisms behind complex traits. While analyses using high-throughput data can play a crucial role in studying complex traits, many analytical challenges remain unresolved. This dissertation primarily focuses on two outstanding issues in genetic association analysis of high-throughput sequence data. First, when functional annotations are incorporated into multi-SNP association analyses, the computational burden grows as the number of candidate SNPs increases. Second, there is a need to identify reproducible signals between studies. Measuring reproducibility between assays in high-throughput experiments and association results between studies is crucial to assess the quality of the overall procedures and the association evidence. In Chapter 2, we propose an algorithm to incorporate functional annotations into Bayesian multi-SNP analysis based on a probabilistic hierarchical model. The proposed algorithm, named deterministic approximation of posteriors (DAP), shows superior accuracy and computational efficiency over the existing methods, including Markov Chain Monte Carlo (MCMC) algorithms to fit a sparse Bayesian variable selection model. In Chapter 3, we propose a probabilistic quantification of association evidence, accounting for linkage disequilibrium (LD). By identifying a set of SNPs in LD and representing a single association signal, we are able to construct credible sets and perform appropriate false discovery rate (FDR) control in Bayesian multi-SNP association analysis. We also derive a set of sufficient summary statistics that yield inference results equivalent to those obtained from individual-level data.
In Chapter 4, we propose a set of computational methods to measure reproducibility among high-throughput sequencing experiments. In particular, we propose a statistical approach to take advantage of the fact that a strong and genuine signal is expected to show the same directional effects in multiple studies. We design a novel Bayesian hierarchical model and estimate the posterior probability of each testing unit (e.g., SNP) being reproducible under a proposed set of prior probabilities. We also propose visualization tools and quantification measures to assess the overall reproducibility among multiple experiments. Across these three chapters of the dissertation, we discuss several issues in studies utilizing high-throughput data and propose computational methods to deal with these issues.
PhD, Biostatistics, University of Michigan, Horace H. Rackham School of Graduate Studies. https://deepblue.lib.umich.edu/bitstream/2027.42/147631/1/yejilee_1.pd

    TRAPLINE: a standardized and automated pipeline for RNA sequencing data analysis, evaluation and annotation

    Background: Technical advances in Next Generation Sequencing (NGS) provide a means to acquire deeper insights into cellular functions. The lack of standardized and automated methodologies poses a challenge for the analysis and interpretation of RNA sequencing data. We critically compare and evaluate state-of-the-art bioinformatics approaches and present a workflow that integrates the best performing data analysis, data evaluation and annotation methods in a Transparent, Reproducible and Automated PipeLINE (TRAPLINE) for RNA sequencing data processing (suitable for Illumina, SOLiD and Solexa). Results: Comparative transcriptomics analyses with TRAPLINE result in a set of differentially expressed genes, their corresponding protein-protein interactions, splice variants, promoter activity, predicted miRNA-target interactions and files for single nucleotide polymorphism (SNP) calling. The obtained results are combined into a single file for downstream analysis such as network construction. We demonstrate the value of the proposed pipeline by characterizing the transcriptome of our recently described stem cell-derived, antibiotic-selected cardiac bodies ('aCaBs'). Conclusion: TRAPLINE supports NGS-based research by providing a workflow that requires no bioinformatics skills, decreases the processing time of the analysis and works in the cloud. The pipeline is implemented in the biomedical research platform Galaxy and is freely accessible via www.sbi.uni-rostock.de/RNAseqTRAPLINE or the specific Galaxy manual page (https://usegalaxy.org/u/mwolfien/p/trapline-manual).

    Selected abstracts of “Bioinformatics: from Algorithms to Applications 2020” conference

    The document contains only the abstract of the presentation. UCR::Vicerrectoría de Investigación::Unidades de Investigación::Ciencias de la Salud::Centro de Investigación en Enfermedades Tropicales (CIET); UCR::Vicerrectoría de Docencia::Salud::Facultad de Microbiologí

    Automating Genomic Data Mining via a Sequence-based Matrix Format and Associative Rule Set

    There is an enormous amount of information encoded in each genome – enough to create living, responsive and adaptive organisms. Raw sequence data alone is not enough to understand function, mechanisms or interactions. Changes in a single base pair can lead to disease, such as sickle-cell anemia, while some large megabase deletions have no apparent phenotypic effect. Genomic features are varied in their data types, and annotation of these features is spread across multiple databases. Herein, we develop a method to automate exploration of genomes by iteratively exploring sequence data for correlations and building upon them. First, to integrate and compare different annotation sources, a sequence matrix (SM) is developed to contain position-dependent information. Second, a classification tree is developed for matrix row types, specifying how each data type is to be treated with respect to other data types for analysis purposes. Third, correlative analyses are developed to analyze features of each matrix row in terms of the other rows, guided by the classification tree as to which analyses are appropriate. A prototype was developed and successfully detected coinciding genomic features among genes, exons, repetitive elements and CpG islands.
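The idea of a position-indexed matrix with one row per annotation type can be illustrated with a small sketch. Everything here is an assumption made for illustration: the binary row encoding, the half-open interval convention, the function names, and the use of a Jaccard index as the "coinciding features" score. The abstract does not specify the authors' actual matrix format, classification tree, or correlation measures.

```python
def build_sequence_matrix(length, features):
    """Build a toy position-by-feature matrix.

    length:   number of sequence positions (columns).
    features: dict mapping a feature name to a list of (start, end)
              half-open intervals, 0-based (hypothetical convention).
    Returns a dict mapping each feature name to a list of 0/1 flags.
    """
    matrix = {}
    for name, intervals in features.items():
        row = [0] * length
        for start, end in intervals:
            # Clip intervals to the sequence bounds before marking them.
            for pos in range(max(start, 0), min(end, length)):
                row[pos] = 1
        matrix[name] = row
    return matrix


def jaccard(row_a, row_b):
    """Share of positions covered by both rows among those covered by either."""
    both = sum(1 for a, b in zip(row_a, row_b) if a and b)
    either = sum(1 for a, b in zip(row_a, row_b) if a or b)
    return both / either if either else 0.0


# Hypothetical annotations on a 15-position toy sequence.
features = {
    "gene": [(2, 12)],
    "exon": [(2, 5), (8, 12)],
    "cpg_island": [(0, 4)],
}
sm = build_sequence_matrix(15, features)
for a, b in [("gene", "exon"), ("gene", "cpg_island"), ("exon", "cpg_island")]:
    print(a, b, round(jaccard(sm[a], sm[b]), 3))
```

A per-row type (binary, categorical, numeric) would decide which pairwise comparison is meaningful; this sketch handles only binary rows, which is the simplest case.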