
    Network inference in matrix-variate Gaussian models with non-independent noise

    Inferring a graphical model or network from observational data on a large number of variables is a well-studied problem in machine learning and computational statistics. In this paper we consider a version of this problem that is relevant to the analysis of multiple phenotypes collected in genetic studies. In such datasets we expect correlations between phenotypes and between individuals. We model observations as a sum of two matrix normal variates such that the joint covariance function is a sum of Kronecker products. This model, which generalizes the Graphical Lasso, assumes observations are correlated due to known genetic relationships and corrupted with non-independent noise. We have developed a computationally efficient EM algorithm to fit this model. On simulated datasets we illustrate substantially improved performance in network reconstruction by allowing for a general noise distribution.
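The covariance structure described in this abstract can be illustrated with a minimal sketch. It is not the authors' code and does not implement their EM algorithm. It only builds the covariance of vec(Y) for Y = Z + E, where Z and E are matrix normal, and draws one sample. The function names, the toy kinship matrix, the choice of an identity row covariance for the noise term, and all numerical values are assumptions made for illustration.

```python
import numpy as np


def sample_matrix_normal(rng, row_cov, col_cov):
    """Draw one zero-mean matrix normal variate with the given row/column covariances."""
    a = np.linalg.cholesky(row_cov)  # a @ a.T == row_cov
    b = np.linalg.cholesky(col_cov)  # b @ b.T == col_cov
    g = rng.standard_normal((row_cov.shape[0], col_cov.shape[0]))
    return a @ g @ b.T


def kron_sum_covariance(row_gen, col_gen, row_noise, col_noise):
    """Covariance of the column-stacked vec(Z + E) for independent matrix normals Z and E."""
    return np.kron(col_gen, row_gen) + np.kron(col_noise, row_noise)


rng = np.random.default_rng(0)
n_individuals = 4

# Hypothetical relatedness between individuals (rows); values are illustrative only.
kinship = 0.5 * np.eye(n_individuals) + 0.5 * np.ones((n_individuals, n_individuals))
# Hypothetical covariance between three phenotypes for the genetic term.
trait_cov = np.array([[1.0, 0.3, 0.0], [0.3, 1.0, 0.2], [0.0, 0.2, 1.0]])
# Hypothetical non-diagonal (non-independent) noise covariance between phenotypes.
noise_cov = np.array([[0.5, 0.1, 0.0], [0.1, 0.5, 0.1], [0.0, 0.1, 0.5]])
# Assumption: noise is independent across individuals, so its row covariance is the identity.
noise_rows = np.eye(n_individuals)

sigma = kron_sum_covariance(kinship, trait_cov, noise_rows, noise_cov)
y = sample_matrix_normal(rng, kinship, trait_cov) + sample_matrix_normal(rng, noise_rows, noise_cov)
```

Here `sigma` is a 12 x 12 matrix (4 individuals by 3 phenotypes). Setting `noise_cov` to a diagonal matrix would correspond to the simpler independent-noise case that the abstract contrasts against.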

    Bio-On-Magnetic-Beads (BOMB): Open platform for high-throughput nucleic acid extraction and manipulation

    Current molecular biology laboratories rely heavily on the purification and manipulation of nucleic acids. Yet, commonly used centrifuge- and column-based protocols require specialised equipment, often use toxic reagents and are not economically scalable or practical to use in a high-throughput manner. Although it has been known for some time that magnetic beads can provide an elegant answer to these issues, the development of open-source protocols based on beads has been limited. In this article, we provide step-by-step instructions for an easy synthesis of functionalised magnetic beads, and detailed protocols for their use in the high-throughput purification of plasmids, genomic DNA and total RNA from different sources, as well as environmental TNA and PCR amplicons. We also provide a bead-based protocol for bisulfite conversion, and size selection of DNA and RNA fragments. Comparison to other methods highlights the capability, versatility and extreme cost-effectiveness of using magnetic beads. These open-source protocols and the associated webpage (https://bomb.bio) can serve as a platform for further protocol customisation and community engagement.

    Latent variable models for analysing multidimensional gene expression data

    Multi-tissue gene expression studies give rise to 3D arrays of data. These experiments make it possible to study the tissue-specific nature of gene regulation and also the relationship between genotypes and higher level traits such as disease status. Analysing these multidimensional data sets is a statistical challenge, as they contain high noise levels and missing data. In this thesis I introduce a new approach for analysing multidimensional gene expression data sets called SPIDER (SParse Integrated DEcomposition for RNA-sequencing). SPIDER is a sparse Bayesian tensor decomposition that models the data as a sum of components (or factors). Each component consists of three vectors of scores or loadings that describe modes of variation across individuals, genes and tissues. Sparsity is induced in the components using a spike and slab prior, allowing for recovery of sparse structure in the data. The decomposition is easily extended to jointly decompose several data types, handle missing data and allow for relatedness between individuals, another common problem in genetics. Inference for the model is performed using variational Bayes. SPIDER is compared to existing approaches for decomposing multidimensional data via simulations. Results suggest that SPIDER performs comparably to, or better than, existing approaches and particularly well when the underlying signals are very sparse. Additional simulations designed to contain realistic levels of signal and noise suggest that SPIDER has the power to recover gene networks from gene expression data. I have applied SPIDER to gene expression data measured using RNA-sequencing for 845 individuals in three tissues from the TwinsUK cohort. Estimated components were tested for association with genetic variation genome-wide. Five signals describing gene regulation networks driven by genetic variants are uncovered, building on the current understanding of these pathways. 
    In addition, components capturing the effects of experimental artefacts and covariates were recovered from the data.
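The "sum of components" structure described in this abstract, where each component consists of one vector over individuals, one over genes and one over tissues, can be sketched as follows. This is only an illustration of how such a decomposition reconstructs a 3D array. It is not SPIDER itself: there is no spike and slab prior and no variational Bayes inference here, and the function name, array sizes and the crude thresholding used to mimic sparsity are all assumptions.

```python
import numpy as np


def reconstruct_tensor(individual_scores, gene_loadings, tissue_loadings):
    """Sum of K rank-one components; inputs have shapes (N, K), (G, K) and (T, K)."""
    return np.einsum("nk,gk,tk->ngt", individual_scores, gene_loadings, tissue_loadings)


rng = np.random.default_rng(1)
n_individuals, n_genes, n_tissues, n_components = 5, 6, 3, 2

individual_scores = rng.standard_normal((n_individuals, n_components))
gene_loadings = rng.standard_normal((n_genes, n_components))
# Crude stand-in for sparsity: zero out small gene loadings (not a spike and slab prior).
gene_loadings[np.abs(gene_loadings) < 1.0] = 0.0
tissue_loadings = rng.standard_normal((n_tissues, n_components))

tensor = reconstruct_tensor(individual_scores, gene_loadings, tissue_loadings)

# The same array written explicitly as a sum of outer products, one per component.
explicit = sum(
    np.multiply.outer(
        np.multiply.outer(individual_scores[:, k], gene_loadings[:, k]),
        tissue_loadings[:, k],
    )
    for k in range(n_components)
)
```

Genes with a nonzero loading in a given component would be read as the members of that component's network, which is the sense in which a sparse decomposition yields sparse gene networks.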

    Tensor decomposition for multiple-tissue gene expression experiments

    Genome-wide association studies of gene expression traits and other cellular phenotypes have been successful in revealing links between genetic variation and biological processes. The majority of discoveries have uncovered cis eQTL effects via mass univariate testing of SNPs against gene expression in single tissues. We present a Bayesian method for multi-tissue experiments focusing on uncovering gene networks linked to genetic variation. Our method decomposes the 3D array (or tensor) of gene expression measurements into a set of latent components. We identify sparse gene networks, which can then be tested for association against genetic variation genome-wide. We apply our method to a dataset of 845 individuals from the TwinsUK cohort with gene expression measured via RNA sequencing in adipose, LCLs and skin. We uncover several gene networks with a genetic basis and clear biological and statistical significance. Extensions of this approach will allow integration of multi-omic, environmental and phenotypic datasets.

    Castration delays epigenetic aging and feminizes DNA methylation at androgen-regulated loci

    In mammals, females generally live longer than males. Nevertheless, the mechanisms underpinning sex-dependent longevity are currently unclear. Epigenetic clocks are powerful biomarkers capable of precisely estimating chronological age and identifying novel factors influencing the aging rate using only DNA methylation data. In this study, we developed the first epigenetic clock for domesticated sheep (Ovis aries), which can predict chronological age with a median absolute error of 5.1 months. We have discovered that castrated male sheep have a decelerated aging rate compared to intact males, mediated at least in part by the removal of androgens. Furthermore, we identified several androgen-sensitive CpG dinucleotides that become progressively hypomethylated with age in intact males, but remain stable in castrated males and females. Comparable sex-specific methylation differences in MKLN1 also exist in bat skin and a range of mouse tissues that have high androgen receptor expression, indicating that it may drive androgen-dependent hypomethylation in divergent mammalian species. In characterizing these sites, we identify biologically plausible mechanisms explaining how androgens drive male-accelerated aging.