
    Application of Volcano Plots in Analyses of mRNA Differential Expressions with Microarrays

    A volcano plot displays unstandardized signal (e.g. log-fold-change) against noise-adjusted/standardized signal (e.g. t-statistic or -log10(p-value) from the t test). We review the basic and interactive uses of the volcano plot, and its crucial role in understanding the regularized t-statistic. The joint filtering gene selection criterion based on regularized statistics has a curved discriminant line in the volcano plot, as compared to the two perpendicular lines for the "double filtering" criterion. This review attempts to provide a unifying framework for discussions on alternative measures of differential expression, improved methods for estimating variance, and visual display of a microarray analysis result. We also discuss the possibility of applying volcano plots to other fields beyond microarrays. Comment: 8 figures
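
    The two selection criteria contrasted in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the input is taken to be log2-scale expression values, the statistic is a Welch-style t, and the thresholds (`min_abs_diff`, `min_abs_t`) and the regularization constant `s0` are hypothetical values chosen for the example, not values from the paper.

```python
import math
from statistics import mean, variance

def volcano_point(group_a, group_b):
    """Return (log-fold-change, standard error, t-statistic) for one gene.

    Assumes the values are already on a log2 scale, so the difference of
    group means is the log-fold-change (the x-axis of a volcano plot).
    """
    diff = mean(group_b) - mean(group_a)
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return diff, se, diff / se

def double_filter(diff, t, min_abs_diff=1.0, min_abs_t=3.0):
    """'Double filtering': two perpendicular cut-offs in the volcano plot."""
    return abs(diff) >= min_abs_diff and abs(t) >= min_abs_t

def regularized_filter(diff, se, s0=0.2, min_abs_t=3.0):
    """Single cut-off on a regularized t-statistic, diff / (se + s0).

    Adding the constant s0 (hypothetical value) to the denominator penalizes
    genes whose large t comes only from a tiny variance; in the volcano plot
    this cut-off traces a curve rather than two straight lines.
    """
    return abs(diff) / (se + s0) >= min_abs_t

# Hypothetical data: gene_1 has a large change, gene_2 a tiny change
# measured with very little noise.
genes = {
    "gene_1": ([8.0, 8.2, 7.9], [10.1, 9.8, 10.3]),
    "gene_2": ([5.00, 5.01, 5.02], [5.10, 5.11, 5.12]),
}
for name, (a, b) in genes.items():
    diff, se, t = volcano_point(a, b)
    print(name, round(diff, 3), round(t, 2),
          double_filter(diff, t), regularized_filter(diff, se))
```

    In this toy data the second gene has a large ordinary t-statistic but a small fold change, so it is rejected by both criteria even though a t-statistic cut-off alone would select it.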

    Removing Batch Effects in Analysis of Expression Microarray Data: An Evaluation of Six Batch Adjustment Methods

    The expression microarray is a frequently used approach to study gene expression on a genome-wide scale. However, the data produced by the thousands of microarray studies published annually are confounded by “batch effects,” the systematic error introduced when samples are processed in multiple batches. Although batch effects can be reduced by careful experimental design, they cannot be eliminated unless the whole study is done in a single batch. A number of programs are now available to adjust microarray data for batch effects prior to analysis. We systematically evaluated six of these programs using multiple measures of precision, accuracy and overall performance. ComBat, an Empirical Bayes method, outperformed the other five programs by most metrics. We also showed that it is essential to standardize expression data at the probe level when testing for correlation of expression profiles, due to a sizeable probe effect in microarray data that can inflate the correlation among replicates and unrelated samples.
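
    The point about probe-level standardization can be illustrated with a small sketch. The expression matrix below is made up for the example (rows are probes, columns are samples), and z-scoring each probe across samples is one plausible reading of "standardize at the probe level", not necessarily the exact procedure used in the paper.

```python
import math
from statistics import mean, stdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def standardize_probes(matrix):
    """Z-score each probe (row) across samples (columns)."""
    out = []
    for row in matrix:
        m, s = mean(row), stdev(row)
        out.append([(v - m) / s for v in row])
    return out

def column(matrix, j):
    """Extract one sample (column) as an expression profile."""
    return [row[j] for row in matrix]

# Hypothetical data: each probe has its own baseline level (the probe
# effect) plus small sample-specific noise that is unrelated across samples.
expression = [
    [4.3, 3.8, 4.1, 3.8],
    [5.8, 6.3, 6.1, 5.8],
    [8.1, 7.7, 8.3, 7.9],
    [9.7, 10.1, 10.2, 10.0],
    [12.2, 12.2, 11.7, 11.9],
]

raw_r = pearson(column(expression, 0), column(expression, 1))
z = standardize_probes(expression)
std_r = pearson(column(z, 0), column(z, 1))
print("raw correlation:", round(raw_r, 3))
print("after probe-level standardization:", round(std_r, 3))
```

    The raw profiles of the first two samples correlate strongly only because both inherit the same probe baselines; once each probe is standardized, that shared component is gone and the correlation reflects the sample-specific part alone.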

    Differential expression analysis with global network adjustment

    Background: Large-scale chromosomal deletions or other non-specific perturbations of the transcriptome can alter the expression of hundreds or thousands of genes, and it is of biological interest to understand which genes are most profoundly affected. We present a method for predicting a gene’s expression as a function of other genes thereby accounting for the effect of transcriptional regulation that confounds the identification of genes differentially expressed relative to a regulatory network. The challenge in constructing such models is that the number of possible regulator transcripts within a global network is on the order of thousands, and the number of biological samples is typically on the order of 10. Nevertheless, there are large gene expression databases that can be used to construct networks that could be helpful in modeling transcriptional regulation in smaller experiments. Results: We demonstrate a type of penalized regression model that can be estimated from large gene expression databases, and then applied to smaller experiments. The ridge parameter is selected by minimizing the cross-validation error of the predictions in the independent out-sample. This tends to increase the model stability and leads to a much greater degree of parameter shrinkage, but the resulting biased estimation is mitigated by a second round of regression. Nevertheless, the proposed computationally efficient “over-shrinkage” method outperforms previously used LASSO-based techniques. In two independent datasets, we find that the median proportion of explained variability in expression is approximately 25%, and this results in a substantial increase in the signal-to-noise ratio allowing more powerful inferences on differential gene expression leading to biologically intuitive findings. We also show that a large proportion of gene dependencies are conditional on the biological state, which would be impossible with standard differential expression methods. Conclusions: By adjusting for the effects of the global network on individual genes, both the sensitivity and reliability of differential expression measures are greatly improved.

    META-analysis of microarray data to assess gender biased differential gene expression in hepatic tissue

    Hepatocellular carcinoma (HCC) is the second deadliest cancer globally, and with an estimated 782,000 new cases in 2012, it is the fifth most common cancer in men and ninth in women. HCC is of particular concern in Egypt because of the high prevalence of Hepatitis C Virus (HCV). Due to its poor prognosis, HCC is the leading cause of cancer-related deaths in Egypt. A gender disparity is observed in liver cancer cases, with a three- to five-fold higher prevalence in men. This sex bias is even more pronounced in mouse models of HCC, where it was found to be sex hormone-dependent. Some studies have attempted to elucidate the molecular mechanisms of this disparity, but with inconclusive and sometimes contradictory outcomes, they remain largely unresolved. Understanding the natural protective mechanisms in females would allow for the development of preventative and therapeutic strategies for patients at risk for HCC or already afflicted with the disease. In this study, we applied a meta-analysis approach to already available microarray data from human normal liver tissues to identify differentially expressed genes between males and females. Microarray datasets were downloaded from the Gene Expression Omnibus database, pre-processed with Robust Multiarray Average and analyzed for differential expression. The combination of 2 distinct datasets and analysis using a p-value cut-off of 0.05 and fold change cut-off of 2 revealed male up-regulated genes including RPS4Y1, EIF1AY, CYorf15B, UTY, DDX3Y and USP9Y. Female up-regulated genes included XIST, PNPLA4 and PZP. Our results confirm gender-specific differential expression patterns found in other tissues and call for further investigation using a larger sample size and more sensitive approaches such as RNA-Sequencing and targeted protein-level studies.

    CARMAweb: comprehensive R- and bioconductor-based web service for microarray data analysis

    CARMAweb (Comprehensive R-based Microarray Analysis web service) is a web application designed for the analysis of microarray data. CARMAweb performs data preprocessing (background correction, quality control and normalization), detection of differentially expressed genes, cluster analysis, dimension reduction and visualization, classification, and Gene Ontology-term analysis. This web application accepts raw data from a variety of imaging software tools for the most widely used microarray platforms: Affymetrix GeneChips, spotted two-color microarrays and Applied Biosystems (ABI) microarrays. R and packages from the Bioconductor project are used as an analytical engine in combination with the R function Sweave, which allows automatic generation of analysis reports. These report files contain all R commands used to perform the analysis and therefore guarantee maximum transparency and reproducibility for each analysis. The web application is implemented in Java based on the latest J2EE (Java 2 Enterprise Edition) software technology. CARMAweb is freely available at

    Tellipsoid: Exploiting inter-gene correlation for improved detection of differential gene expression

    Motivation: Algorithms for differential analysis of microarray data are vital to modern biomedical research. Their accuracy strongly depends on effective treatment of inter-gene correlation. Correlation is ordinarily accounted for in terms of its effect on significance cut-offs. In this paper it is shown that correlation can, in fact, be exploited to share information across tests, which, in turn, can increase statistical power. Results: Vastly and demonstrably improved differential analysis approaches are the result of combining identifiability (the fact that in most microarray data sets, a large proportion of genes can be identified a priori as non-differential) with optimization criteria that incorporate correlation. As a special case, we develop a method which builds upon the widely used two-sample t-statistic based approach and uses the Mahalanobis distance as an optimality criterion. Results on the prostate cancer data of Singh et al. (2002) suggest that the proposed method outperforms all published approaches in terms of statistical power. Availability: The proposed algorithm is implemented in MATLAB and in R. The software, called Tellipsoid, and relevant data sets are available at http://www.egr.msu.edu/~desaikey. Comment: 19 pages, submitted to Bioinformatics

    Microarray Analysis in Drug Discovery and Biomarker Identification

    Normal uniform mixture differential gene expression detection for cDNA microarrays

    BACKGROUND: One of the primary tasks in analysing gene expression data is finding genes that are differentially expressed in different samples. Multiple testing issues due to the thousands of tests run make some of the more popular methods for doing this problematic. RESULTS: We propose a simple method, Normal Uniform Differential Gene Expression (NUDGE) detection for finding differentially expressed genes in cDNA microarrays. The method uses a simple univariate normal-uniform mixture model, in combination with new normalization methods for spread as well as mean that extend the lowess normalization of Dudoit, Yang, Callow and Speed (2002) [1]. It takes account of multiple testing, and gives probabilities of differential expression as part of its output. It can be applied to either single-slide or replicated experiments, and it is very fast. Three datasets are analyzed using NUDGE, and the results are compared to those given by other popular methods: unadjusted and Bonferroni-adjusted t tests, Significance Analysis of Microarrays (SAM), and Empirical Bayes for microarrays (EBarrays) with both Gamma-Gamma and Lognormal-Normal models. CONCLUSION: The method gives a high probability of differential expression to genes known/suspected a priori to be differentially expressed and a low probability to the others. In terms of known false positives and false negatives, the method outperforms all multiple-replicate methods except for the Gamma-Gamma EBarrays method to which it offers comparable results with the added advantages of greater simplicity, speed, fewer assumptions and applicability to the single replicate case. An R package called nudge to implement the methods in this paper will be made available soon at
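
    A univariate normal-uniform mixture of the kind named in this abstract can be fitted with a few lines of EM. The sketch below is an assumption-laden illustration, not the NUDGE implementation: it assumes non-differential genes follow a normal distribution, differential genes follow a uniform distribution over the observed range, and it omits the normalization steps entirely. The function name, starting proportion and data are hypothetical.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std. dev. sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def fit_normal_uniform(values, n_iter=100, p_de=0.1):
    """EM fit of a two-component mixture: normal (non-DE) + uniform (DE).

    Returns the posterior probability of differential expression per value,
    together with the fitted mu, sigma and mixing proportion p_de.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("values must not all be identical")
    uniform_density = 1.0 / (hi - lo)
    n = len(values)
    mu = sum(values) / n
    sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / n)
    posterior = [p_de] * n
    for _ in range(n_iter):
        # E-step: posterior probability that each value is from the uniform.
        posterior = []
        for v in values:
            de = p_de * uniform_density
            null = (1.0 - p_de) * normal_pdf(v, mu, sigma)
            posterior.append(de / (de + null))
        # M-step: update the normal component with weights (1 - posterior).
        weights = [1.0 - p for p in posterior]
        w_sum = sum(weights)
        mu = sum(w * v for w, v in zip(weights, values)) / w_sum
        var = sum(w * (v - mu) ** 2 for w, v in zip(weights, values)) / w_sum
        sigma = max(math.sqrt(var), 1e-8)  # guard against collapse to zero
        p_de = sum(posterior) / n
    return posterior, mu, sigma, p_de

# Hypothetical normalized log ratios: most genes near zero, two outliers.
log_ratios = [-0.3, -0.25, -0.2, -0.15, -0.1, -0.1, -0.05, -0.05, 0.0, 0.0,
              0.0, 0.05, 0.05, 0.1, 0.1, 0.15, 0.2, 0.25, 0.3, 0.02,
              3.0, -2.8]
posterior, mu, sigma, p_de = fit_normal_uniform(log_ratios)
for value, prob in zip(log_ratios, posterior):
    print(f"{value:6.2f}  P(DE) = {prob:.3f}")
```

    Because the output is a probability per gene rather than a p-value, a selection rule can be stated directly on it (for example, keeping genes whose posterior exceeds one half), which is one way to read the abstract's remark that the method gives probabilities of differential expression as part of its output.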