
    Unsupervised Extraction of Stable Expression Signatures from Public Compendia with an Ensemble of Neural Networks

    Cross-experiment comparisons in public data compendia are challenged by unmatched conditions and technical noise. The ADAGE method, which performs unsupervised integration with denoising autoencoder neural networks, can identify biological patterns, but because ADAGE models, like many neural networks, are over-parameterized, different ADAGE models perform equally well. To enhance model robustness and better build signatures consistent with biological pathways, we developed an ensemble ADAGE (eADAGE) that integrated stable signatures across models. We applied eADAGE to a compendium of Pseudomonas aeruginosa gene expression profiling experiments performed in 78 media. eADAGE revealed a phosphate starvation response controlled by PhoB in media with moderate phosphate and predicted that a second stimulus provided by the sensor kinase, KinB, is required for this PhoB activation. We validated this relationship using both targeted and unbiased genetic approaches. eADAGE, which captures stable biological patterns, enables cross-experiment comparisons that can highlight measured but undiscovered relationships. Funding: Gordon and Betty Moore Foundation (GBMF 4552); National Institutes of Health (U.S.) (grant R01-AI091702); Cystic Fibrosis Foundation (STANTO15R0

    Optimisation and parallelisation of the partitioning around medoids function in R

    R is a free statistical programming language commonly used for the analysis of high-throughput microarray and other data. It is currently unable to easily utilise multiprocessor architectures without substantial changes to existing R scripts. Further, working with large volumes of data often leads to slow processing and even memory allocation faults. A recent survey highlighted clustering algorithms as both computation- and data-intensive bottlenecks in post-genomic data analyses. These algorithms aim to sort numeric vectors (such as gene expression profiles) into groups by minimising vector distances within groups and maximising them between groups. This paper describes the optimisation and parallelisation of a popular clustering algorithm, partitioning around medoids (PAM), for the Simple Parallel R INTerface (SPRINT). SPRINT allows R users to exploit high performance computing systems without expert knowledge of such systems. This paper reports on a serial optimisation of the original code and a subsequent parallel implementation. The parallel implementation enables the processing of data sets that exceed the available physical memory and can yield, depending on the data set, an over 100-fold increase in performance.
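The clustering idea described in this abstract (grouping numeric vectors around representative members so that within-group distances are small) can be illustrated with a minimal sketch. This is a naive, serial, PAM-style swap search written in Python for illustration only; it is not the SPRINT code or the R implementation, and the function names `total_cost` and `pam_sketch`, the first-k initialisation, and the Euclidean distance are all assumptions made for this example.

```python
import math
from itertools import product


def total_cost(dist, medoids):
    """Sum, over all points, of the distance to the nearest medoid."""
    return sum(min(row[m] for m in medoids) for row in dist)


def pam_sketch(points, k):
    """Naive PAM-style clustering: swap medoids while the total cost drops.

    Assumption: medoids start as the first k points. Full PAM uses a
    greedier BUILD phase; this sketch only shows the swap idea.
    """
    n = len(points)
    # Precompute the full pairwise Euclidean distance matrix (O(n^2) memory).
    dist = [[math.dist(p, q) for q in points] for p in points]
    medoids = list(range(k))
    best = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos, cand in product(range(k), range(n)):
            if cand in medoids:
                continue
            # Try replacing the medoid at `pos` with the candidate point.
            trial = medoids[:pos] + [cand] + medoids[pos + 1:]
            cost = total_cost(dist, trial)
            if cost < best - 1e-12:
                medoids, best, improved = trial, cost, True
    # Label each point with the index of its nearest medoid.
    labels = [min(medoids, key=lambda m: dist[i][m]) for i in range(n)]
    return medoids, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(pam_sketch(data, 2))
```

The full distance matrix and the repeated cost evaluations in this sketch are the kind of memory and compute costs that make the algorithm a bottleneck on large data sets.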

    Statistical modelling of masked gene regulatory pathway changes across microarray studies of interferon gamma activated macrophages

    Interferon gamma (IFN-γ) regulation of macrophages plays an essential role in innate immunity and pathogenicity of viral infections by directing large and small genome-wide changes in the transcriptional program of macrophages. Smaller changes at the transcriptional level are difficult to detect but can have profound biological effects, motivating the hypothesis of this thesis that responses of macrophages to immune activation by IFN-γ include small quantitative changes that are masked by noise but represent meaningful transcriptional systems in pathways against infection. To test this hypothesis, statistical meta-analysis of microarray studies is investigated as a tool to obtain the necessary increase in analysis sensitivity. Three meta-analysis models (Effect size model, Rank Product model, Fisher’s sum of logs) and three further modified versions were applied to a heterogeneous set of four microarray studies on the effect of IFN-γ on murine macrophages. Performance assessments include recovery of known biology and are followed by development of novel biological hypotheses through secondary analysis of meta-analysis outcomes in the context of independent biological data sources. A separate network analysis of a microarray time course study investigates whether gene sets with coordinated time-dependent relationships can also identify subtle IFN-γ-related transcriptional changes in macrophages that match those identified through meta-analysis. It was found that all meta-analysis models can identify biologically meaningful transcription at enhanced sensitivity levels, with a slight performance advantage for a non-parametric model (Rank Product meta-analysis).
    Meta-analysis yielded consistently regulated genes, hidden in individual microarray studies, related to sterol biosynthesis (Stard3, Pgrmc1, Galnt6, Rab11a, Golga4, Lrp10), implicated in cross-talk between type II and type I interferon or IL-10 signalling (Tbk1, Ikbke, Clic4, Ptpre, Batf), and circadian rhythm (Csnk1e). Further network analysis confirms that meta-analysis findings are highly concentrated in a distinct immune response cluster of co-expressed genes, and also identifies global expression modularisation in IFN-γ treated macrophages, pointing to Trafd1 as a central anti-correlated node topologically linked to interactions with down-regulated sterol biosynthesis pathway members. Outcomes from this thesis suggest that small transcriptional changes in IFN-γ activated macrophages can be detected by enhancing sensitivity through combination of multiple microarray studies. Together with use of bioinformatics resources, independent data sets and network analysis, further validation assigns a potential role for genes with low or variable transcription in linking type II interferon signalling to type I and TLR signalling, as well as the sterol metabolic network.
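Of the three meta-analysis models named in this abstract, Fisher's sum of logs is the simplest to show concretely. For k independent p-values for one gene, the statistic X = -2 Σ ln p_i follows a chi-square distribution with 2k degrees of freedom under the null hypothesis. The sketch below is a generic textbook version in Python, not the thesis's code or its modified variants; the function name `fisher_combined_p` and the example p-values are hypothetical. It uses the closed-form chi-square tail that holds for an even number of degrees of freedom, so it needs only the standard library.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's sum of logs.

    X = -2 * sum(ln p_i) is chi-square with 2k degrees of freedom under the
    null. For even degrees of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^i / i!, starting at i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


if __name__ == "__main__":
    # Hypothetical per-study p-values for a single gene across four studies.
    print(fisher_combined_p([0.1, 0.1, 0.1, 0.1]))
```

In the hypothetical example, four studies that each give only weak evidence combine to a smaller p-value than any single study, which is the gain in sensitivity the abstract refers to.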