SMAGEXP: a galaxy tool suite for transcriptomics data meta-analysis
Background: With the proliferation of available microarray and high throughput
sequencing experiments in the public domain, the use of meta-analysis methods
increases. In these experiments, where the sample size is often limited,
meta-analysis offers the possibility to considerably enhance the statistical
power and give more accurate results. For those purposes, it combines either
effect sizes or results of single studies in an appropriate manner. R packages
metaMA and metaRNASeq perform meta-analysis on microarray and NGS data,
respectively. They are not interchangeable as they rely on statistical modeling
specific to each technology.
Results: SMAGEXP (Statistical Meta-Analysis for Gene EXPression) integrates
the metaMA and metaRNASeq packages into Galaxy. We aim to propose a unified way to
carry out meta-analysis of gene expression data, while taking care of their
specificities. We have developed this tool suite to analyse microarray data
from the Gene Expression Omnibus (GEO) database or custom data from Affymetrix
microarrays. These data are then combined to carry out meta-analysis using
the metaMA package. SMAGEXP also offers to combine raw read counts from Next
Generation Sequencing (NGS) experiments using the DESeq2 and metaRNASeq packages. In
both cases, key values, independent from the technology type, are reported to
judge the quality of the meta-analysis. These tools are available on the Galaxy
main tool shed. Source code, help and installation instructions are available
on GitHub.
Conclusion: The use of Galaxy offers an easy-to-use gene expression
meta-analysis tool suite based on the metaMA and metaRNASeq packages.
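The abstract above says that meta-analysis combines either effect sizes or results of single studies. As a rough illustration of the effect-size route, the sketch below pools per-study effect sizes for one gene with fixed-effect inverse-variance weighting. This is a generic textbook scheme chosen for brevity and is not claimed to be the model used by metaMA or SMAGEXP; the function name `combine_effect_sizes` and the input values are hypothetical.

```python
import math


def combine_effect_sizes(effects, variances):
    """Pool per-study effect sizes for one gene (fixed-effect, inverse-variance).

    Illustrative assumption: each study reports an effect size and its
    sampling variance. Returns (pooled effect, pooled variance, z, two-sided p).
    """
    if not effects or len(effects) != len(variances):
        raise ValueError("need one positive variance per effect size")
    if any(v <= 0 for v in variances):
        raise ValueError("variances must be positive")

    # Precise studies (small variance) receive large weights.
    weights = [1.0 / v for v in variances]
    total_weight = sum(weights)

    pooled = sum(w * d for w, d in zip(weights, effects)) / total_weight
    pooled_var = 1.0 / total_weight

    # Two-sided p-value under a standard normal reference distribution.
    z = pooled / math.sqrt(pooled_var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_var, z, p_value


if __name__ == "__main__":
    # Hypothetical effect sizes for one gene measured in three studies.
    print(combine_effect_sizes([0.8, 1.1, 0.9], [0.30, 0.25, 0.40]))
```

The pooled variance is smaller than any single-study variance, which is the sense in which combining small studies can increase statistical power.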
Differential meta-analysis of RNA-seq data from multiple studies
High-throughput sequencing is now regularly used for studies of the
transcriptome (RNA-seq), particularly for comparisons among experimental
conditions. For the time being, a limited number of biological replicates are
typically considered in such experiments, leading to low detection power for
differential expression. As their cost continues to decrease, it is likely that
additional follow-up studies will be conducted to re-address the same
biological question. We demonstrate how p-value combination techniques
previously used for microarray meta-analyses can be used for the differential
analysis of RNA-seq data from multiple related studies. These techniques are
compared to a negative binomial generalized linear model (GLM) including a
fixed study effect on simulated data and real data on human melanoma cell
lines. The GLM with fixed study effect performed well for low inter-study
variation and small numbers of studies, but was outperformed by the
meta-analysis methods for moderate to large inter-study variability and larger
numbers of studies. To conclude, the p-value combination techniques illustrated
here are a valuable tool to perform differential meta-analyses of RNA-seq data
by appropriately accounting for biological and technical variability within
studies as well as additional study-specific effects. An R package metaRNASeq
is available on R-Forge.
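The abstract above does not name the p-value combination techniques it uses. As one widely known example of the family, the sketch below implements Fisher's method for a single gene, assuming independent per-study p-values. It is not claimed to reproduce the metaRNASeq implementation; the function names are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!, so no external library is needed.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(pvalues):
    """Combine independent p-values for one gene with Fisher's method.

    Under the null, -2 * sum(log p_s) follows a chi-square distribution
    with 2 * (number of studies) degrees of freedom.
    """
    if not pvalues or any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    statistic = -2.0 * sum(math.log(p) for p in pvalues)
    return statistic, chi2_sf_even_df(statistic, 2 * len(pvalues))


if __name__ == "__main__":
    # Hypothetical p-values for the same gene in two independent studies.
    print(fisher_combine([0.05, 0.05]))
```

With these inputs, two borderline p-values of 0.05 combine to roughly 0.017, which shows how agreement across studies can strengthen evidence that neither study provides alone.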
Dys-regulated Gene Expression Networks by Meta-Analysis of Microarray Data on Oral Squamous Cell Carcinoma
Background: Oral squamous cell carcinoma (OSCC) is the sixth most common type of carcinoma worldwide. Development of OSCC is a multi-step process involving genes related to cell cycle, growth control, apoptosis, DNA damage response and other cellular regulators. The pathogenic pathways involved in this tumor are mostly unknown, and a better characterization of the OSCC gene expression profile would therefore represent a considerable advance. The availability of public gene expression datasets has opened up new challenges, especially for the integration of data generated by different research groups and different array platforms with the purpose of obtaining new insights into the biological process investigated.

Results: In this work we performed a meta-analysis on four microarray and four datasets of gene expression data on OSCC in order to evaluate the degree of agreement of the biological results obtained by these different studies and to identify common regulatory pathways that could be responsible for tumor growth. Sixteen dys-regulated pathways implicated in OSCC were mined from the four published datasets and, most importantly, three pathways were reported for the first time. The significantly enriched regulatory pathways and biological processes were investigated by means of the literature, and four genes of the most altered pathway, ECM-receptor interaction, were validated by qRT-PCR and identified as possible candidates for the aggressiveness of OSCC.

Conclusion: We have developed a robust method for analyzing pathways altered in OSCC using three expression array data sets. This study sets the stage for the further discovery of the basic mechanisms that may underlie a diseased state and would help in identifying critical nodes in the pathway that can be targeted for diagnosis and therapeutic intervention. In addition, those who are interested in our approach can obtain the software package (MATLAB platform) freely by email.
yStreX: yeast stress expression database
Over the past decade genome-wide expression analyses have often been used to study how expression of genes changes in response to various environmental stresses. Many of these studies (such as effects of oxygen concentration, temperature stress, low pH stress, osmotic stress, depletion or limitation of nutrients, addition of different chemical compounds, etc.) have been conducted in the unicellular Eukaryal model, yeast Saccharomyces cerevisiae. However, the lack of a unifying or integrated bioinformatics platform that would permit efficient and rapid use of all these existing data remains an important issue. To facilitate research by exploiting existing transcription data in the field of yeast physiology, we have developed the yStreX database. It is an online repository of analyzed gene expression data from curated data sets from different studies that capture genome-wide transcriptional changes in response to diverse environmental transitions. The first aim of this online database is to facilitate comparison of cross-platform and cross-laboratory gene expression data. Additionally, we performed different expression analyses, meta-analyses and gene set enrichment analyses, and the results are also deposited in this database. Lastly, we constructed a user-friendly Web interface with interactive visualization to provide intuitive access and to display the queried data for users with no background in bioinformatics. Database URL: http://www.ystrexdb.co
Screening hub genes in coronary artery disease based on integrated analysis
Background: Coronary artery disease (CAD) is the leading cause of mortality worldwide. Identifying key pathogenic genes benefits the understanding of the molecular mechanism of CAD. Methods: In this study, 5 microarray data sets from blood samples of 312 CAD patients and 277 healthy controls were downloaded. The limma and metaMA packages were used to identify differentially expressed genes. Functional enrichment analysis of the differentially expressed genes was further performed with Gene Ontology and the Kyoto Encyclopedia of Genes and Genomes. Additionally, protein-protein interaction and transcription factor-target networks were constructed based on the top 10 up- and down-regulated differentially expressed genes to further study their biological function. Last, real-time quantitative polymerase chain reaction (RT-qPCR) was used to validate the integrated analysis result. Results: A total of 528 differentially expressed genes were obtained. All differentially expressed genes were significantly involved in signal transduction and the MAPK signaling pathway. Within the MAPK signaling pathway, IL1R2, ARRB2 and PRKX were associated with CAD. Furthermore, there were 4 common differentially expressed genes, PLAUR, HSPH1, ZMYND11 and S100A8, in the protein-protein interaction and transcription factor-target networks, which played crucial roles in the development of CAD. In RT-qPCR, the expression of PRKX, HSPH1 and ZMYND11 was down-regulated, consistent with the integrated analysis. Conclusions: The 7 identified differentially expressed genes (IL1R2, ARRB2, PRKX, PLAUR, HSPH1, ZMYND11 and S100A8) may play crucial roles in the development of CAD.
Should We Abandon the t-Test in the Analysis of Gene Expression Microarray Data: A Comparison of Variance Modeling Strategies
High-throughput post-genomic studies are now routinely and promisingly investigated in biological and biomedical research. The main statistical approach to select genes differentially expressed between two groups is to apply a t-test, which is the subject of criticism in the literature. Numerous alternatives have been developed based on different and innovative variance modeling strategies. However, a critical issue is that selecting a different test usually leads to a different gene list. In this context and given the current tendency to apply the t-test, identifying the most efficient approach in practice remains crucial. To help answer this question, we conduct a comparison of eight tests representative of variance modeling strategies in gene expression data: Welch's t-test, ANOVA [1], Wilcoxon's test, SAM [2], RVM [3], limma [4], VarMixt [5] and SMVar [6]. Our comparison process relies on four steps (gene list analysis, simulations, spike-in data and re-sampling) to formulate comprehensive and robust conclusions about test performance, in terms of statistical power, false-positive rate, execution time and ease of use. Our results raise concerns about the ability of some methods to control the expected number of false positives at a desirable level. Besides, two tests (limma and VarMixt) show significant improvement compared to the t-test, in particular when dealing with small sample sizes. In addition, limma presents several practical advantages, so we advocate its application to analyze gene expression data.
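For orientation, the baseline in this comparison, Welch's t-test, estimates each group's variance from that gene's own samples. The sketch below computes the Welch statistic and the Welch-Satterthwaite degrees of freedom for one gene. It is a generic illustration that does not implement limma, VarMixt or any other variance modeling method named above; the function name `welch_t` and the expression values are hypothetical.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Each group's variance is estimated separately from its own samples,
    which is why the estimate is unstable when only a few replicates exist.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")

    # Squared standard errors of the two group means.
    se2_a = variance(group_a) / n_a
    se2_b = variance(group_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")

    t_stat = (mean(group_a) - mean(group_b)) / (se2_a + se2_b) ** 0.5
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, df


if __name__ == "__main__":
    # Hypothetical log-expression values for one gene, three replicates per group.
    print(welch_t([7.1, 7.4, 6.9], [8.0, 8.6, 8.3]))
```

With three replicates per group, a single outlying value changes the per-gene variance estimate substantially. This is the small-sample setting in which the abstract reports that limma and VarMixt improve on the t-test.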
Meta-analysis approach as a gene selection method in class prediction: Does it improve model performance? A case study in acute myeloid leukemia
Background: Aggregating gene expression data across experiments via meta-analysis is expected to increase the precision of the effect estimates and to increase the statistical power to detect a certain fold change. This study evaluates the potential benefit of using a meta-analysis approach as a gene selection method prior to predictive modeling in gene expression data. Results: Six raw datasets from different gene expression experiments in acute myeloid leukemia (AML) and 11 different classification methods were used to build classification models to classify samples as either AML or healthy control. First, the classification models were trained on gene expression data from single experiments using conventional supervised variable selection and externally validated with the other five gene expression datasets (referred to as the individual-classification approach). Next, gene selection was performed through meta-analysis on four datasets, and predictive models were trained with the selected genes on the fifth dataset and validated on the sixth dataset. For some datasets, gene selection through meta-analysis helped classification models to achieve higher performance as compared to predictive modeling based on a single dataset; but for others, there was no major improvement. Synthetic datasets were generated from nine simulation scenarios. The effect of sample size, fold change and pairwise correlation between differentially expressed (DE) genes on the difference between the MA- and individual-classification models was evaluated. The fold change and pairwise correlation significantly contributed to the difference in performance between the two methods. The gene selection via meta-analysis approach was more effective when it was conducted using a set of data with low fold change and high pairwise correlation on the DE genes.
Conclusion: Gene selection through meta-analysis on previously published studies potentially improves the performance of a predictive model on a given gene expression dataset.
Integrated genomic approaches identify major pathways and upstream regulators in late onset Alzheimer's disease.
Previous studies have evaluated gene expression in Alzheimer's disease (AD) brains to identify mechanistic processes, but have been limited by the size of the datasets studied. Here we have implemented a novel meta-analysis approach to identify differentially expressed genes (DEGs) in published datasets comprising 450 late onset AD (LOAD) brains and 212 controls. We found 3124 DEGs, many of which were highly correlated with Braak stage and cerebral atrophy. Pathway analysis revealed the most perturbed pathways to be (a) nitric oxide and reactive oxygen species in macrophages (NOROS), (b) NFkB and (c) mitochondrial dysfunction. NOROS was also up-regulated, and mitochondrial dysfunction down-regulated, in healthy ageing subjects. Upstream regulator analysis predicted the TLR4 ligands, STAT3 and NFKBIA, for activated pathways and RICTOR for mitochondrial genes. Protein-protein interaction network analysis emphasised the role of NFKB; identified a key interaction of CLU with complement; and linked TYROBP, TREM2 and DOK3 to modulation of LPS signalling through TLR4 and to phosphatidylinositol metabolism. We suggest that NEUROD6, ZCCHC17, PPEF1 and MANBAL are potentially implicated in LOAD, with predicted links to calcium signalling and protein mannosylation. Our study demonstrates a highly injurious combination of TLR4-mediated NFKB signalling, NOROS inflammatory pathway activation, and mitochondrial dysfunction in LOAD.
Epigenome-wide association of PTSD from heterogeneous cohorts with a common multi-site analysis pipeline
Peer Reviewed
https://deepblue.lib.umich.edu/bitstream/2027.42/138305/1/ajmgb32568.pdf
https://deepblue.lib.umich.edu/bitstream/2027.42/138305/2/ajmgb32568_am.pd