155 research outputs found

    An eQTL biological data visualization challenge and approaches from the visualization community

    In 2011, the IEEE VisWeek conferences inaugurated a symposium on Biological Data Visualization. Like other domain-oriented Vis symposia, this symposium's purpose was to explore the unique characteristics and requirements of visualization within the domain, and to enhance both the Visualization and Bio/Life-Sciences communities by pushing Biological data sets and domain understanding into the Visualization community, and well-informed Visualization solutions back to the Biological community. Among several other activities, the BioVis symposium created a data analysis and visualization contest. Unlike many contests in other venues, where the purpose is primarily to allow entrants to demonstrate tour-de-force programming skills on sample problems with known solutions, the BioVis contest was intended to whet the participants' appetites for a tremendously challenging biological domain, and simultaneously produce viable tools for a biological grand challenge domain with no extant solutions. For this purpose, expression Quantitative Trait Locus (eQTL) data analysis was selected. In the BioVis 2011 contest, we provided contestants with a synthetic eQTL data set containing real biological variation, as well as a spiked-in gene expression interaction network influenced by single nucleotide polymorphism (SNP) DNA variation and a hypothetical disease model. Contestants were asked to elucidate the pattern of SNPs and interactions that predicted an individual's disease state. Nine teams competed in the contest using a mixture of methods, some analytical and others visual and exploratory. Independent panels of visualization and biological experts judged entries. Awards were given for each panel's favorite entry, and for an overall best entry agreed upon by both panels. Three special mention awards were given for particularly innovative and useful aspects of those entries. Further recognition was given to entries that correctly answered a bonus question about how a proposed "gene therapy" change to a SNP might change an individual's disease status, which served as a calibration for each approach's applicability to a typical domain question. In the future, BioVis will continue the data analysis and visualization contest, maintaining the philosophy of providing new challenging questions in open-ended and dramatically underserved Bio/Life Sciences domains

    Advanced Visual Analytics Approaches for the Integrative Study of Genomic and Transcriptomic Data

    Advances in next-generation sequencing (NGS) technology have enabled rapid and cost-effective whole genome analyses. It is now known that individual organisms have unique genome sequences and that differences between these sequences are the reason for genetic diversity. Furthermore, the biomolecular processes of living organisms are steered by genes and the interplay of their products. Perturbations in these systems often lead to disease. Thus, one of the major questions in biomedical research is how genetic variations influence gene function, and how these affect underlying biological pathways and gene interaction networks. Single nucleotide variations (SNVs) are among the most common sources of genetic diversity. Genome Wide Association Studies (GWAS) as well as expression Quantitative Trait Locus (eQTL) studies aim to associate SNVs with, for example, disease-related binary or quantitative traits. However, available methods are usually limited to statistical analyses, and previous approaches to improve the interpretation of the respective results are often insufficient. The goal of this dissertation was the development of new visual analytical approaches to assist purely statistical methods in the identification, characterization and interpretation of SNVs. Genomic variations, especially SNVs, also play an important role in the rapidly growing field of paleogenetics, where DNA of ancient origin is compared to modern DNA with the intention of gaining insights into evolutionary history. In this dissertation, a computational pipeline for comparative NGS analyses of ancient and modern DNA samples is described. Special attention was given to the read merging step, which is required to cope with the quality limitations inherent to ancient DNA (aDNA), in particular DNA fragmentation and nucleotide misincorporation. In addition, aDNA is usually only retrievable in low amounts and it is often contaminated with DNA of modern microorganisms. To solve this issue, a highly economical microarray-based DNA capturing strategy has been developed for the parallel detection and enrichment of aDNA from up to 100 different human pathogens

    iHAT: interactive Hierarchical Aggregation Table for Genetic Association Data

    In the search for single-nucleotide polymorphisms which influence the observable phenotype, genome wide association studies have become an important technique for the identification of associations between genotype and phenotype of a diverse set of sequence-based data. We present a methodology for the visual assessment of single-nucleotide polymorphisms using interactive hierarchical aggregation techniques combined with methods known from traditional sequence browsers and cluster heatmaps. Our tool, the interactive Hierarchical Aggregation Table (iHAT), facilitates the visualization of multiple sequence alignments, associated metadata, and hierarchical clusterings. Different color maps and aggregation strategies as well as filtering options support the user in finding correlations between sequences and metadata. Similar to other visualizations such as parallel coordinates or heatmaps, iHAT relies on the human pattern-recognition ability for spotting patterns that might indicate correlation or anticorrelation. We demonstrate iHAT using artificial and real-world datasets for DNA and protein association studies as well as expression Quantitative Trait Locus data

    From Classical to Modern Computational Approaches to Identify Key Genetic Regulatory Components in Plant Biology

    The selection of plant genotypes with improved productivity and tolerance to environmental constraints has always been a major concern in plant breeding. Classical approaches based on the generation of variability and selection of better phenotypes from large variant collections have improved their efficacy and processivity due to the implementation of molecular biology techniques, particularly genomics, Next Generation Sequencing and other omics such as proteomics and metabolomics. In this regard, the identification of interesting variants before they develop the phenotype trait of interest with molecular markers has advanced the breeding process of new varieties. Moreover, the correlation of phenotype or biochemical traits with gene expression or protein abundance has boosted the identification of potential new regulators of the traits of interest, using a relatively low number of variants. These important breakthrough technologies, built on top of classical approaches, will be improved in the future by including the spatial variable, allowing the identification of gene(s) involved in key processes at the tissue and cell levels

    Molecular epidemiology study on genetically regulated gene expression in the colonic mucosa and its role in disease susceptibility

    Gene expression is a key cellular process that is also linked to genetic susceptibility to complex diseases and traits. Most genes undergo alternative splicing (AS). Genetic variants that regulate gene expression and AS are called "quantitative trait loci" (e/sQTLs). Statistical techniques make it possible to predict gene expression in a specific tissue in silico from genetic data. This approach is used in transcriptome-wide association studies (TWAS). This thesis has three main objectives and presents three articles: 1) to generate gene expression profiles of the colonic mucosa of healthy individuals, together with their differences along the colon and their associated e/sQTLs; 2) to develop a web application for exploring gene expression data in the colon; 3) to carry out a TWAS to propose susceptibility genes for inflammatory bowel disease (IBD). In summary, 1) catalogs of e/sQTLs were generated from new colon gene expression data from 445 individuals, and more than 4,000 genes were found to vary in expression level along the colon; 2) the "Colon Transcriptome Explorer" was developed and is publicly available at https://barcuvaseq.org/cotrex/; 3) more than two hundred genetic susceptibility genes for IBD were proposed. In conclusion, our studies provide new data and evidence on the genes involved in susceptibility mechanisms for colon-related diseases, and will guide other researchers in proposing new hypotheses in this field

    Identification of Biomarker Systems of Autism Spectrum Disorder and Uterine Cancer

    Complex diseases and disorders pose a challenge to scientists due to their variable and often inconsistent genetic and environmental underpinnings across affected individuals. Because of this variability, large condition-specific datasets and corresponding analytical tools and approaches are being curated as resources to investigate potential genetic trends in complex diseases and disorders. In this Dissertation, I used DNA- and RNA-based resources to discover polygenic biosignatures associated with Autism Spectrum Disorder (ASD) or uterine cancer. To explore the intersection of small-effect common DNA variants and regulation in ASD, I discovered and analyzed trends in allelic associations at eQTLs within ASD-affected individuals. Association of eQTLs underlying any phenotype brings the genetic variation closer to biochemical mechanism leading to phenotypic expression. Uterine cancer was additionally investigated using gene expression profiles from normal and cancerous uterine tissue samples, from which gene co-expression networks and corresponding gene regulatory networks were built and further studied. The biomarker discoveries discussed here reflect the importance of dry lab resources and the potential they hold for future discovery

    Genome-wide association meta-analysis of spontaneous coronary artery dissection identifies risk variants and genes related to artery integrity and tissue-mediated coagulation

    Spontaneous coronary artery dissection (SCAD) is an understudied cause of myocardial infarction primarily affecting women. It is not known to what extent SCAD is genetically distinct from other cardiovascular diseases, including atherosclerotic coronary artery disease (CAD). Here we present a genome-wide association meta-analysis (1,917 cases and 9,292 controls) identifying 16 risk loci for SCAD. Integrative functional annotations prioritized genes that are likely to be regulated in vascular smooth muscle cells and artery fibroblasts and implicated in extracellular matrix biology. One locus containing the tissue factor gene F3, which is involved in blood coagulation cascade initiation, appears to be specific for SCAD risk. Several associated variants have diametrically opposite associations with CAD, suggesting that shared biological processes contribute to both diseases, but through different mechanisms. We also infer a causal role for high blood pressure in SCAD. Our findings provide novel pathophysiological insights involving arterial integrity and tissue-mediated coagulation in SCAD and set the stage for future specific therapeutics and preventions

    Epigenome-wide association study in peripheral tissues highlights DNA methylation profiles associated with episodic memory performance in humans

    The decline in episodic memory (EM) performance is a hallmark of cognitive aging and an early clinical sign in Alzheimer’s disease (AD). In this study, we conducted an epigenome-wide association study (EWAS) using DNA methylation (DNAm) profiles from buccal and blood samples for cross-sectional (n = 1019) and longitudinal changes in EM performance (n = 626; average follow-up time 5.4 years) collected under the auspices of the Lifebrain consortium project. The mean age of participants with cross-sectional data was 69 ± 11 years (30–90 years), with 50% being females. We identified 21 loci showing suggestive evidence of association (p < 1 × 10⁻⁵) with either or both EM phenotypes. Among these were SNCA, SEPW1 (both cross-sectional EM), ITPK1 (longitudinal EM), and APBA2 (both EM traits), which have been linked to AD or Parkinson’s disease (PD) in previous work. While the EM phenotypes were nominally significantly (p < 0.05) associated with poly-epigenetic scores (PESs) using EWASs on general cognitive function, none remained significant after correction for multiple testing. Likewise, estimating the degree of “epigenetic age acceleration” did not reveal significant associations with either of the two tested EM phenotypes. In summary, our study highlights several interesting candidate loci in which differential DNAm patterns in peripheral tissue are associated with EM performance in humans