298 research outputs found

    Epigenomic and Transcriptomic Profiling for the Study of Monogenic and Polygenic Traits and Disease

    Many trait-associated genomic loci are in non-coding regions of the genome. Determining which genetic variants in these regions are causally related to a trait and elucidating their downstream effects can be difficult. Layering transcriptomic and epigenomic data on top of genetic variation data can help nominate causal phenotype-associated variants and generate hypotheses about their effects in different cellular contexts. In this thesis, I first apply RNA-sequencing (RNA-seq) and the assay for transposase accessible chromatin using sequencing (ATAC-seq) to investigate gene expression and chromatin accessibility in the Danforth mouse, a model of caudal birth defects. The Danforth phenotype results from an endogenous retroviral insertion near the Ptf1a gene. I identify 49 genes differentially expressed between Danforth and WT E9.5 tailbuds, including increased expression of Ptf1a and the nearby Gm13344 lncRNA in Danforth. A gene ontology enrichment analysis indicates differentially expressed genes are enriched in the hedgehog signaling pathway, suggesting disruption of hedgehog signaling may cause the Danforth phenotype. I identify one region of increased chromatin accessibility in Danforth relative to WT mice, localizing to the Gm13344 promoter. This region is orthologous to a human PTF1A enhancer, suggesting it may mediate Ptf1a overexpression in the Danforth mouse. Next, I apply a software package for the quality control of ATAC-seq data (developed in our lab) to public datasets to measure heterogeneity, and analyze GM12878 ATAC-seq data to quantify the impact of Tn5 transposase concentration and sequencing lane cluster density. I find that increasing cluster density shifts the ATAC-seq fragment length distribution towards shorter fragments and results in greater transcription start site enrichment. 
I show that increasing Tn5 transposase concentration increases the enrichment of reads in enhancers and promoters, with ~80% of ATAC-seq peaks showing increased signal with increasing Tn5 concentration (5% FDR). Peaks bound by the CTCF transcription factor are less sensitive to Tn5 concentration than those bound by other transcription factors. This analysis demonstrates the difficulties in reliably quantifying chromatin accessibility and utilizing public datasets. I then apply single-nucleus ATAC-seq and RNA-seq to human and rat skeletal muscle to generate cell type specific transcriptomic and chromatin accessibility maps. I integrate these maps with UK Biobank genome-wide association study (GWAS) data to explore enrichment of GWAS signals in cell type specific ATAC-seq peaks. I demonstrate the utility of these maps by nominating causal genetic variants and cell types at several GWAS loci, including the T2D-associated ARL15 locus. At the ARL15 locus I nominate a credible set variant in a highly mesenchymal stem cell specific ATAC-seq peak. Lastly, to gain insight into the genetic regulation of chromatin architecture and its association with aerobic exercise capacity, I analyze skeletal muscle ATAC-seq (n = 129) and RNA-seq (n = 143) from a rat model for untrained running capacity. Although no genes associate with running capacity at 5% FDR, a gene ontology enrichment analysis indicates that the genes with the strongest association are enriched in fatty acid oxidation pathways, consistent with previous findings in this rat model. I identify no ATAC-seq peaks associated with running capacity (5% FDR) but find 4,477 ATAC-seq peaks associate with at least one SNP (5% FDR). Together, these projects demonstrate the value of epigenomic and transcriptomic data in the investigation of monogenic and polygenic traits, as well as the challenges and limitations of applying epigenomic and transcriptomic data in this context.
    PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
    http://deepblue.lib.umich.edu/bitstream/2027.42/163000/1/porchard_1.pd

    Sparse inverse covariance estimation in Gaussian graphical models

    One of the fundamental tasks in science is to find explainable relationships between observed phenomena. Recent work has addressed this problem by attempting to learn the structure of graphical models - especially Gaussian models - by the imposition of sparsity constraints. The graphical lasso is a popular method for learning the structure of a Gaussian model. It uses regularisation to impose sparsity. In real-world problems, there may be latent variables that confound the relationships between the observed variables. Ignoring these latents, and imposing sparsity in the space of the visibles, may lead to the pruning of important structural relationships. We address this problem by introducing an expectation maximisation (EM) method for learning a Gaussian model that is sparse in the joint space of visible and latent variables. By extending this to a conditional mixture, we introduce multiple structures, and allow side information to be used to predict which structure is most appropriate for each data point. Finally, we handle non-Gaussian data by extending each sparse latent Gaussian to a Gaussian copula. We train these models on a financial data set; we find the structures to be interpretable, and the new models to perform better than their existing competitors. A potential problem with the mixture model is that it does not require the structure to persist in time, whereas this may be expected in practice. So we construct an input-output HMM with sparse Gaussian emissions. But the main result is that, provided the side information is rich enough, the temporal component of the model provides little benefit, and reduces efficiency considerably. The GWishart distribution may be used as the basis for a Bayesian approach to learning a sparse Gaussian. However, sampling from this distribution often limits the efficiency of inference in these models. We make a small change to the state-of-the-art block Gibbs sampler to improve its efficiency. 
We then introduce a Hamiltonian Monte Carlo sampler that is much more efficient than block Gibbs, especially in high dimensions. We use these samplers to compare a Bayesian approach to learning a sparse Gaussian with the (non-Bayesian) graphical lasso. We find that, even when limited to the same time budget, the Bayesian method can perform better. In summary, this thesis introduces practically useful advances in structure learning for Gaussian graphical models and their extensions. The contributions include the addition of latent variables, a non-Gaussian extension, (temporal) conditional mixtures, and methods for efficient inference in a Bayesian formulation.
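As a small illustration of why these methods look for sparsity in the inverse covariance rather than in the covariance itself, the sketch below inverts the covariance of a three-variable chain exactly. It is not the graphical lasso or any method from the thesis: the chain model, the unit noise variances, and the helper name `invert` are assumptions chosen only for this example.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with exact fractions.

    Hypothetical helper for this illustration; assumes the matrix is invertible.
    """
    n = len(matrix)
    # Augment the matrix with the identity on the right.
    aug = [
        [Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
        for i, row in enumerate(matrix)
    ]
    for col in range(n):
        # Swap a row with a non-zero pivot into position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[n:] for row in aug]


# Assumed chain: x1 = e1, x2 = x1 + e2, x3 = x2 + e3, with independent unit-variance noise.
# Every pair of variables is correlated, so the covariance matrix is dense.
covariance = [
    [1, 1, 1],
    [1, 2, 2],
    [1, 2, 3],
]

precision = invert(covariance)

for row in precision:
    print([str(value) for value in row])
```

In this assumed chain, x1 and x3 are correlated but conditionally independent given x2, which shows up as a zero in the (1, 3) entry of the inverse covariance. Sparsity-based structure learning aims to recover zeros of this kind from data.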

    Computational Semantics with Functional Programming, by Jan van Eijck and Christina Unger

    One of the fundamental tasks of science is to find explainable relationships between observed phenomena. One approach to this task that has received attention in recent years is based on probabilistic graphical modelling with sparsity constraints on model structures. In this paper, we describe two new approaches to Bayesian inference of sparse structures of Gaussian graphical models (GGMs). One is based on a simple modification of the cutting-edge block Gibbs sampler for sparse GGMs, which results in significant computational gains in high dimensions. The other method is based on a specific construction of the Hamiltonian Monte Carlo sampler, which results in further significant improvements. We compare our fully Bayesian approaches with the popular regularisation-based graphical LASSO, and demonstrate significant advantages of the Bayesian treatment under the same computing costs. We apply the methods to a broad range of simulated data sets, and a real-life financial data set.

    Wavelength Tunability of Ion-bombardment Induced Ripples on Sapphire

    A study of ripple formation on sapphire surfaces by 300-2000 eV Ar+ ion bombardment is presented. Surface characterization by in-situ synchrotron grazing incidence small angle x-ray scattering and ex-situ atomic force microscopy is performed in order to study the wavelength of ripples formed on sapphire (0001) surfaces. We find that the wavelength can be varied over a remarkably wide range, nearly two orders of magnitude, by changing the ion incidence angle. Within the linear theory regime, the ion induced viscous flow smoothing mechanism explains the general trends of the ripple wavelength at low temperature and incidence angles larger than 30°. In this model, relaxation is confined to a few-nm thick damaged surface layer. The behavior at high temperature suggests relaxation by surface diffusion. However, strong smoothing is inferred from the observed ripple wavelength near normal incidence, which is not consistent with either surface diffusion or viscous flow relaxation.
    Comment: Revtex4, 19 pages, 10 figures with JPEG format

    Epidemiology of epidermolysis bullosa in the antipodes: The Australasian epidermolysis bullosa registry with a focus on Herlitz junctional epidermolysis bullosa

    Objective: To present epidemiologic and clinical data from the Australasian Epidermolysis Bullosa (EB) Registry, the first orphan disease registry in Australia. Design: Observational study (cross-sectional and longitudinal). Setting: Australian private dermatology practice, inpatient ward, and outpatient clinic. Patients: Systematic case finding of patients with EB simplex (EBS), junctional EB (JEB), and dystrophic EB and data collection were performed throughout Australia and New Zealand from January 1, 2006, through December 31, 2008. Patients were consecutively enrolled in the study after clinical assessment and laboratory diagnosis. Medical records were retrospectively examined, and physicians involved in EB care were contacted to obtain patient history. A Herlitz JEB case series was prepared from registry data. Main Outcome Measures: Demographics and prognosis of patients with Herlitz JEB. Results: A total of 259 patients were enrolled in the study: 139 with EBS, 91 with dystrophic EB, 28 with JEB, and 1 with Kindler syndrome. Most enrollees were Australian citizens (n=243), with an Australian prevalence rate of 10.3 cases per million. The age range in the registry was birth to 99 years, with a mean and median age of 24.1 and 18.0 years, respectively. Ages were similar in patients with EBS and dominant dystrophic EB but were markedly lower in patients with JEB. Patients with Herlitz JEB (n=10) had the highest morbidity and mortality rates, with a mean age at death of 6.8 months. Sepsis, failure to thrive, and tracheolaryngeal complications were the leading causes of death. Conclusions: The Australasian EB registry is the first registry in Australia and New Zealand to provide original data on age, sex, ethnicity, and geographical and disease subtype distribution. The Australasian Herlitz JEB cohort witnessed a high infant mortality rate and poor prognosis overall.