Epigenomic and Transcriptomic Profiling for the Study of Monogenic and Polygenic Traits and Disease
Many trait-associated genomic loci are in non-coding regions of the genome. Determining which genetic variants in these regions are causally related to a trait and elucidating their downstream effects can be difficult. Layering transcriptomic and epigenomic data on top of genetic variation data can help nominate causal phenotype-associated variants and generate hypotheses about their effects in different cellular contexts.
In this thesis, I first apply RNA-sequencing (RNA-seq) and the assay for transposase accessible chromatin using sequencing (ATAC-seq) to investigate gene expression and chromatin accessibility in the Danforth mouse, a model of caudal birth defects. The Danforth phenotype results from an endogenous retroviral insertion near the Ptf1a gene. I identify 49 genes differentially expressed between Danforth and WT E9.5 tailbuds, including increased expression of Ptf1a and the nearby Gm13344 lncRNA in Danforth. A gene ontology enrichment analysis indicates differentially expressed genes are enriched in the hedgehog signaling pathway, suggesting disruption of hedgehog signaling may cause the Danforth phenotype. I identify one region of increased chromatin accessibility in Danforth relative to WT mice, localizing to the Gm13344 promoter. This region is orthologous to a human PTF1A enhancer, suggesting it may mediate Ptf1a overexpression in the Danforth mouse.
Next, I apply a software package for the quality control of ATAC-seq data (developed in our lab) to public datasets to measure heterogeneity, and analyze GM12878 ATAC-seq data to quantify the impact of Tn5 transposase concentration and sequencing lane cluster density. I find that increasing cluster density shifts the ATAC-seq fragment length distribution towards shorter fragments and results in greater transcription start site enrichment. I show that increasing Tn5 transposase concentration increases the enrichment of reads in enhancers and promoters, with ~80% of ATAC-seq peaks showing increased signal with increasing Tn5 concentration (5% FDR). Peaks bound by the CTCF transcription factor are less sensitive to Tn5 concentration than those bound by other transcription factors. This analysis demonstrates the difficulties in reliably quantifying chromatin accessibility and utilizing public datasets.
I then apply single-nucleus ATAC-seq and RNA-seq to human and rat skeletal muscle to generate cell type specific transcriptomic and chromatin accessibility maps. I integrate these maps with UK Biobank genome-wide association study (GWAS) data to explore enrichment of GWAS signals in cell type specific ATAC-seq peaks. I demonstrate the utility of these maps by nominating causal genetic variants and cell types at several GWAS loci, including the T2D-associated ARL15 locus. At the ARL15 locus I nominate a credible set variant in a highly mesenchymal stem cell specific ATAC-seq peak.
Lastly, to gain insight into the genetic regulation of chromatin architecture and its association with aerobic exercise capacity, I analyze skeletal muscle ATAC-seq (n = 129) and RNA-seq (n = 143) from a rat model for untrained running capacity. Although no genes associate with running capacity at 5% FDR, a gene ontology enrichment analysis indicates that the genes with the strongest association are enriched in fatty acid oxidation pathways, consistent with previous findings in this rat model. I identify no ATAC-seq peaks associated with running capacity (5% FDR) but find 4,477 ATAC-seq peaks associate with at least one SNP (5% FDR).
Together, these projects demonstrate the value of epigenomic and transcriptomic data in the investigation of monogenic and polygenic traits, as well as the challenges and limitations of applying epigenomic and transcriptomic data in this context.
PhD, Bioinformatics, University of Michigan, Horace H. Rackham School of Graduate Studies
http://deepblue.lib.umich.edu/bitstream/2027.42/163000/1/porchard_1.pd
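The abstract above reports several results at a 5% FDR threshold without naming the correction procedure. As a minimal sketch, assuming the common Benjamini-Hochberg step-up procedure (an assumption, not something the abstract states), the thresholding could look like the following. The function name and the example p-values are hypothetical and purely illustrative.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans: True where the test is rejected at the given FDR.

    Assumption: Benjamini-Hochberg step-up; the abstract does not say which
    FDR procedure was actually used.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the tests, sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_k = rank
    # Reject every test ranked at or below max_k (the step-up rule).
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, e.g. one per ATAC-seq peak or gene (not real data).
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
significant = benjamini_hochberg(example_p, fdr=0.05)
print(sum(significant), "of", len(example_p), "tests pass at 5% FDR")
```

Because the rule is step-up, a test can be rejected even when its own p-value misses its rank threshold, provided a larger p-value further down the ranking meets its threshold.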
Assessment of needs and feasibility of commercial production of tropical fruits and vegetables for diversified exports in Ethiopia and Sudan
Diversification into the production and export of horticultural crops is a strategy increasingly adopted by developing countries to enhance incomes, employment and foreign exchange earnings. However, a relatively small number of countries dominate exports of horticultural products from Sub-Saharan Africa, and for most African countries the horticulture export sector has remained very small and/or has experienced severe bottlenecks to expansion. The Common Fund for Commodities (CFC) is financing projects to assist Least Developed Countries in the diversification of their commodity exports. As part of this programme, the CFC is funding this pilot project in Ethiopia and Sudan to assess the needs and feasibility of developing commercial production of high value tropical fruit and vegetable products based on these countries' comparative advantage. The purpose of this report is to identify and prioritise the various capacity building measures that need to be devised and developed to overcome these constraints and thus facilitate an expansion of fruit and vegetable exports. The aim is to advise the relevant international and national institutions on modalities to adopt in order to strengthen the existing horticultural strategy in Ethiopia and Sudan and thus reduce poverty while enhancing the livelihood benefits accruing. The report has five chapters. The first is an introduction, while aspects of horticultural production and trade in Ethiopia and Sudan are discussed in Chapters 2 and 3 respectively. Chapter 4 reviews the international market situation, looking specifically at European Union and Middle East markets. Chapter 5 contains the conclusions and recommendations of the study. In addition, there are 7 Annexes, covering methodology, contacts and itinerary, workshops in Sudan and Ethiopia, detailed production and trade data, and materials consulted.
Sparse inverse covariance estimation in Gaussian graphical models
One of the fundamental tasks in science is to find explainable relationships between
observed phenomena. Recent work has addressed this problem by attempting to learn
the structure of graphical models - especially Gaussian models - by the imposition of
sparsity constraints.
The graphical lasso is a popular method for learning the structure of a Gaussian
model. It uses regularisation to impose sparsity. In real-world problems, there may be
latent variables that confound the relationships between the observed variables. Ignoring
these latents, and imposing sparsity in the space of the visibles, may lead to the
pruning of important structural relationships. We address this problem by introducing
an expectation maximisation (EM) method for learning a Gaussian model that is
sparse in the joint space of visible and latent variables. By extending this to a conditional
mixture, we introduce multiple structures, and allow side information to be used
to predict which structure is most appropriate for each data point. Finally, we handle
non-Gaussian data by extending each sparse latent Gaussian to a Gaussian copula. We
train these models on a financial data set; we find the structures to be interpretable, and
the new models to perform better than their existing competitors.
A potential problem with the mixture model is that it does not require the structure
to persist in time, whereas this may be expected in practice. So we construct an input-output
HMM with sparse Gaussian emissions. But the main result is that, provided the
side information is rich enough, the temporal component of the model provides little
benefit, and reduces efficiency considerably.
The GWishart distribution may be used as the basis for a Bayesian approach to
learning a sparse Gaussian. However, sampling from this distribution often limits the
efficiency of inference in these models. We make a small change to the state-of-the-art
block Gibbs sampler to improve its efficiency. We then introduce a Hamiltonian
Monte Carlo sampler that is much more efficient than block Gibbs, especially in high
dimensions. We use these samplers to compare a Bayesian approach to learning a
sparse Gaussian with the (non-Bayesian) graphical lasso. We find that, even when
limited to the same time budget, the Bayesian method can perform better.
In summary, this thesis introduces practically useful advances in structure learning
for Gaussian graphical models and their extensions. The contributions include the addition
of latent variables, a non-Gaussian extension, (temporal) conditional mixtures,
and methods for efficient inference in a Bayesian formulation.
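A small worked example may help show why sparsity is imposed on the inverse covariance (precision) matrix rather than on the covariance itself: in a Gaussian model, a zero in the precision matrix corresponds to conditional independence between two variables given the rest. The sketch below uses a hypothetical three-variable chain X1 -> X2 -> X3 with unit noise and unit weights. The numbers and the helper `invert` are illustrative assumptions, and this is not the EM or graphical lasso procedure described in the thesis.

```python
from fractions import Fraction


def invert(matrix):
    """Invert a small non-singular square matrix exactly by Gauss-Jordan elimination."""
    n = len(matrix)
    # Build the augmented matrix [A | I] with exact rational entries.
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        # Find a row with a non-zero pivot and swap it into position.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot becomes 1.
        p = aug[col][col]
        aug[col] = [x / p for x in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    # The right-hand half now holds the inverse.
    return [row[n:] for row in aug]


# Covariance of the hypothetical chain X1 -> X2 -> X3: every entry is non-zero.
covariance = [[1, 1, 1],
              [1, 2, 2],
              [1, 2, 3]]
precision = invert(covariance)
for row in precision:
    print([str(x) for x in row])
```

The covariance is dense because X1 and X3 are marginally correlated through X2, but the (X1, X3) entry of the precision matrix is zero, which encodes the missing edge between X1 and X3 in the graph.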
The production of fresh produce in Africa for export to the United Kingdom: mapping different value chains
This report maps the involvement of African smallholders in supplying produce to UK markets (with emphasis on detailed characterisation of UK markets) by determining the origin of product, types of product, volumes, values, numbers of smallholders involved and destination markets. The study is important because there is strong evidence that exporters and importers are moving away from the smallest growers, not because of product quality or productivity, but because of transaction costs associated with private retailer standards. At present, it is not clear whether production by small-scale farmers throughout Africa destined for export to retailers abroad can remain viable.
Computational Semantics with Functional Programming, by Jan van Eijck and Christina Unger
One of the fundamental tasks of science is to find explainable relationships
between observed phenomena. One approach to this task that has received
attention in recent years is based on probabilistic graphical modelling with
sparsity constraints on model structures. In this paper, we describe two new
approaches to Bayesian inference of sparse structures of Gaussian graphical
models (GGMs). One is based on a simple modification of the cutting-edge block
Gibbs sampler for sparse GGMs, which results in significant computational gains
in high dimensions. The other method is based on a specific construction of the
Hamiltonian Monte Carlo sampler, which results in further significant
improvements. We compare our fully Bayesian approaches with the popular
regularisation-based graphical LASSO, and demonstrate significant advantages of
the Bayesian treatment under the same computing costs. We apply the methods to
a broad range of simulated data sets, and a real-life financial data set.
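For readers unfamiliar with Hamiltonian Monte Carlo, the following is a generic textbook sketch on a one-dimensional standard normal target. It is not the specific sampler construction for sparse GGMs that the abstract above refers to; the target, the step size, the number of leapfrog steps and all function names are assumptions chosen only to keep the example short.

```python
import math
import random


def u(q):
    """Potential energy: negative log density of a standard normal (up to a constant)."""
    return 0.5 * q * q


def grad_u(q):
    """Gradient of the potential energy."""
    return q


def leapfrog(q, p, grad, step, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    p = p - 0.5 * step * grad(q)          # initial half step for momentum
    for i in range(n_steps):
        q = q + step * p                  # full step for position
        if i < n_steps - 1:
            p = p - step * grad(q)        # full step for momentum
    p = p - 0.5 * step * grad(q)          # final half step for momentum
    return q, p


def hmc_sample(energy, grad, q0, n_samples, step=0.2, n_steps=10, seed=0):
    """Draw samples with basic HMC: resample momentum, integrate, accept or reject."""
    rng = random.Random(seed)
    q = q0
    samples = []
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)
        h_old = energy(q) + 0.5 * p * p
        q_new, p_new = leapfrog(q, p, grad, step, n_steps)
        h_new = energy(q_new) + 0.5 * p_new * p_new
        # Metropolis correction for the integrator's discretisation error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


samples = hmc_sample(u, grad_u, q0=0.0, n_samples=2000)
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(f"sample mean {mean:.3f}, sample variance {var:.3f}")
```

The leapfrog integrator is used because it is time-reversible and nearly conserves the Hamiltonian, which keeps the acceptance rate high even for long trajectories.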
Wavelength Tunability of Ion-bombardment Induced Ripples on Sapphire
A study of ripple formation on sapphire surfaces by 300-2000 eV Ar+ ion
bombardment is presented. Surface characterization by in-situ synchrotron
grazing incidence small angle x-ray scattering and ex-situ atomic force
microscopy is performed in order to study the wavelength of ripples formed on
sapphire (0001) surfaces. We find that the wavelength can be varied over a
remarkably wide range, nearly two orders of magnitude, by changing the ion
incidence angle. Within the linear theory regime, the ion induced viscous flow
smoothing mechanism explains the general trends of the ripple wavelength at low
temperature and incidence angles larger than 30°. In this model, relaxation is
confined to a few-nm thick damaged surface layer. The behavior at high
temperature suggests relaxation by surface diffusion. However, strong smoothing
is inferred from the observed ripple wavelength near normal incidence, which is
not consistent with either surface diffusion or viscous flow relaxation.
Epidemiology of epidermolysis bullosa in the antipodes: The Australasian epidermolysis bullosa registry with a focus on Herlitz junctional epidermolysis bullosa
Objective: To present epidemiologic and clinical data from the Australasian Epidermolysis Bullosa (EB) Registry, the first orphan disease registry in Australia. Design: Observational study (cross-sectional and longitudinal). Setting: Australian private dermatology practice, inpatient ward, and outpatient clinic. Patients: Systematic case finding of patients with EB simplex (EBS), junctional EB (JEB), and dystrophic EB and data collection were performed throughout Australia and New Zealand from January 1, 2006, through December 31, 2008. Patients were consecutively enrolled in the study after clinical assessment and laboratory diagnosis. Medical records were retrospectively examined, and physicians involved in EB care were contacted to obtain patient history. A Herlitz JEB case series was prepared from registry data. Main Outcome Measures: Demographics and prognosis of patients with Herlitz JEB. Results: A total of 259 patients were enrolled in the study: 139 with EBS, 91 with dystrophic EB, 28 with JEB, and 1 with Kindler syndrome. Most enrollees were Australian citizens (n=243), with an Australian prevalence rate of 10.3 cases per million. The age range in the registry was birth to 99 years, with a mean and median age of 24.1 and 18.0 years, respectively. Ages were similar in patients with EBS and dominant dystrophic EB but were markedly lower in patients with JEB. Patients with Herlitz JEB (n=10) had the highest morbidity and mortality rates, with a mean age at death of 6.8 months. Sepsis, failure to thrive, and tracheolaryngeal complications were the leading causes of death. Conclusions: The Australasian EB registry is the first registry in Australia and New Zealand to provide original data on age, sex, ethnicity, and geographical and disease subtype distribution. The Australasian Herlitz JEB cohort had a high infant mortality rate and a poor prognosis overall.
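The enrolment counts in the Results can be checked and turned into subtype proportions with a few lines of arithmetic. The sketch below uses only the counts quoted in the abstract; the percentages it prints are derived here for illustration and are not figures reported by the authors.

```python
# Enrolment counts as quoted in the abstract's Results.
enrolled = {
    "EB simplex": 139,
    "dystrophic EB": 91,
    "junctional EB": 28,
    "Kindler syndrome": 1,
}

# The subtype counts should add up to the reported total of 259 patients.
total = sum(enrolled.values())

# Share of each subtype among all enrolled patients, in percent.
shares = {name: 100.0 * count / total for name, count in enrolled.items()}

for name, pct in shares.items():
    print(f"{name}: {enrolled[name]} of {total} ({pct:.1f}%)")
```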