67 research outputs found

    DiffVar: a new method for detecting differential variability with application to methylation in cancer and aging

    Methylation of DNA is known to be essential to development and dramatically altered in cancers. The Illumina HumanMethylation450 BeadChip has been used extensively as a cost-effective way to profile nearly half a million CpG sites across the human genome. Here we present DiffVar, a novel method to test for differential variability between sample groups. DiffVar employs an empirical Bayes model framework that can take into account any experimental design and is robust to outliers. We applied DiffVar to several datasets from The Cancer Genome Atlas, as well as an aging dataset. DiffVar is available in the missMethyl Bioconductor R package.

    A cross-package Bioconductor workflow for analysing methylation array data [version 1; referees: 3 approved, 1 approved with reservations]

    Methylation in the human genome is known to be associated with development and disease. The Illumina Infinium methylation arrays are by far the most common way to interrogate methylation across the human genome. This paper provides a Bioconductor workflow using multiple packages for the analysis of methylation array data. Specifically, we demonstrate the steps involved in a typical differential methylation analysis pipeline, including quality control, filtering, normalization, data exploration and statistical testing for probe-wise differential methylation. We further outline other analyses such as differential methylation of regions, differential variability analysis, estimating cell type composition and gene ontology testing. Finally, we provide some examples of how to visualise methylation array data.

    limma powers differential expression analyses for RNA-sequencing and microarray studies

    limma is an R/Bioconductor software package that provides an integrated solution for analysing data from gene expression experiments. It contains rich features for handling complex experimental designs and for information borrowing to overcome the problem of small sample sizes. Over the past decade, limma has been a popular choice for gene discovery through differential expression analyses of microarray and high-throughput PCR data. The package contains particularly strong facilities for reading, normalizing and exploring such data. Recently, the capabilities of limma have been significantly expanded in two important directions. First, the package can now perform both differential expression and differential splicing analyses of RNA sequencing (RNA-seq) data. All the downstream analysis tools previously restricted to microarray data are now available for RNA-seq as well. These capabilities allow users to analyse both RNA-seq and microarray data with very similar pipelines. Second, the package is now able to go past the traditional gene-wise expression analyses in a variety of ways, analysing expression profiles in terms of co-regulated sets of genes or in terms of higher-order expression signatures. This provides enhanced possibilities for biological interpretation of gene expression differences. This article reviews the philosophy and design of the limma package, summarizing both new and historical features, with an emphasis on recent enhancements and features that have not been previously described.

    Molecular dissection of the pea shoot apical meristem

    The shoot apical meristem (SAM) is responsible for the development of all the above-ground parts of a plant. Our understanding of the SAM at the molecular level is incomplete. This study investigates the gene expression repertoire of SAMs in the garden pea (Pisum sativum). To this end, 10 346 EST sequences representing 7610 unique genes were generated from SAM cDNA libraries. These sequences, together with previously reported pea ESTs, were used to construct a 12K oligonucleotide array to identify genes with differential SAM expression, as compared to axillary meristems, root apical meristems, or non-meristematic tissues. A number of genes were identified that are predominantly expressed in specific cell layers or domains of the SAM and are thus likely components of the gene networks involved in stem cell maintenance or the initiation of lateral organs. Further in situ hybridization analysis confirmed the spatial localization of some of these genes within the SAM. Our data also indicate the diversification of some gene expression patterns and hence functions in legume crop plants. A number of transcripts highly expressed in all three meristems have also been uncovered, and these candidates may provide valuable insight into molecular networks that underpin the maintenance of meristematic functionality.

    Empirical Bayes modelling of expression profiles and their associations

    © 2013 Dr. Belinda Phipson
    New biotechnology developments such as the microarray and, more recently, next-generation sequencing have necessitated the development of new statistical methodologies. These methods are designed to combat unique issues present in the data generated by these technologies. They provide the perfect environment for information sharing strategies, such as empirical Bayes methods, due to the large numbers of simultaneous tests performed. We explore different estimators of the proportion of true null hypotheses and develop a fast and accurate estimator which is valid for any number of p-values. This estimator is based on local false discovery rates and is used in several of the subsequent sections. Another interest is in developing robust hyper-parameter estimators in an empirical Bayes hierarchical model setting. An estimator for the prior degrees of freedom which is robust to outliers is developed using two different approaches. This has the effect that highly variable genes are unlikely to be significantly differentially expressed, and it also increases power to detect differential expression. The second half of the thesis focuses on gaining more information from the log fold changes obtained from microarray and sequencing experiments. More accurate log fold changes are developed for microarrays and RNA sequencing data, which provide additional information for ranking top differentially expressed genes. The new measure, called predictive log fold change, arises from the posterior distribution of the log fold changes. The relationship between two gene expression profiles is quantified when the p-values obtained from testing two hypotheses are not independent. This arises when two genotypes are compared to a common control group. The method is based on separating the true biological correlation from the technical correlation of the log fold changes. The hyperparameters of the prior distribution for the log fold changes need to be estimated in order to get an estimate of the biological correlation. This is possible since we show that the two dependent moderated t statistics have a scaled multivariate t distribution. The methods developed in this thesis are tested using simulations and applied to data sets collected in collaboration with biologists at The Walter and Eliza Hall Institute of Medical Research.

    RandBioconductor.pptx

    This is a talk for a Bioinformatics Miniconference at the Linux Conference Australia 2016 in Geelong, Australia. The talk is about R and Bioconductor and the importance of open source software to the Bioinformatics community.