Developing computational tools and datasets to investigate the genomic loci associated with disease

Abstract

The majority of genetic variants associated with complex diseases are located in non-coding, regulatory regions of the genome. Understanding of the genetic mechanisms behind the progression of these diseases has been largely advanced by sequencing-based genomic techniques including RNA-seq, ChIP-seq, Hi-C, genome-wide association studies (GWAS), and quantitative trait locus (QTL) mapping. However, the genetic underpinnings of disease have been difficult to interpret, largely because (1) currently available visualization software lacks the ability to efficiently and programmatically integrate large volumes of complex multi-omic data and (2) there are few datasets in disease-relevant cell types in which genomic changes are tracked in response to disease-specific stimuli. In the first part of this work, I describe plotgardener, a new programmatic R library for efficiently and reproducibly plotting publication-quality, multi-panel genomic figures. Plotgardener provides customizable genomic plotting and annotation functions that allow users to size and arrange plots in precisely defined coordinate systems based on user-defined units of measurement. I include example use cases for plotgardener, with both genomic data and ggplot2 objects, and provide extensively documented, freely available code for the package through Bioconductor and GitHub. I then create and investigate the first response allelic imbalance (AI) and response expression QTL (reQTL) datasets using an ex vivo model of osteoarthritis (OA) in which chondrocytes are stimulated with fibronectin fragment (FN-f), a known OA trigger. AI analysis revealed 55 unique genetic variants exhibiting AI at 58 positional genes only after FN-f treatment, with some of these genes exhibiting differential expression. reQTL mapping identified 384 eGenes specific to FN-f-treated samples, and colocalization of the identified reQTLs with GWAS of various OA phenotypes revealed one robust colocalization of a reQTL with multiple OA phenotypes.
I also use plotgardener to visualize these datasets in the context of the genes and linkage disequilibrium (LD) structure of the region. Overall, these studies have resulted in a broadly applicable genomic visualization tool and novel datasets that provide critical insights into the genetic basis of osteoarthritis.

Doctor of Philosophy
