
    Regularized gene selection in cancer microarray meta-analysis

    Background: In cancer studies, it is common that multiple microarray experiments are conducted to measure the same clinical outcome and expressions of the same set of genes. An important goal of such experiments is to identify a subset of genes that can potentially serve as predictive markers for cancer development and progression. Analyses of individual experiments may lead to unreliable gene selection results because of the small sample sizes. Meta analysis can be used to pool multiple experiments, increase statistical power, and achieve more reliable gene selection. The meta analysis of cancer microarray data is challenging because of the high dimensionality of gene expressions and the differences in experimental settings amongst different experiments. Results: We propose a Meta Threshold Gradient Descent Regularization (MTGDR) approach for gene selection in the meta analysis of cancer microarray data. The MTGDR has many advantages over existing approaches. It allows different experiments to have different experimental settings. It can account for the joint effects of multiple genes on cancer, and it can select the same set of cancer-associated genes across multiple experiments. Simulation studies and analyses of multiple pancreatic and liver cancer experiments demonstrate the superior performance of the MTGDR. Conclusion: The MTGDR provides an effective way of analyzing multiple cancer microarray studies and selecting reliable cancer-associated genes.

    Multi-TGDR: a regularization method for multi-class classification in microarray experiments

    Background: With microarray technology becoming mature and popular, the selection and use of a small number of relevant genes for accurate classification of samples is a hot topic in biostatistics and bioinformatics. However, most of the developed algorithms lack the ability to handle multiple classes, which is arguably a common application. Here, we propose an extension to an existing regularization algorithm called Threshold Gradient Descent Regularization (TGDR) to specifically tackle multi-class classification of microarray data. When there are several microarray experiments addressing the same or similar objectives, one option is to use the meta-analysis version of TGDR (Meta-TGDR), which treats the classification task as a combination of classifiers with the same structure/model while allowing the parameters to vary across studies. However, the original Meta-TGDR extension did not offer a solution for prediction on independent samples. Here, we propose an explicit method to estimate the overall coefficients of the biomarkers selected by Meta-TGDR. This extension permits broader applicability and allows a comparison between the predictive performance of Meta-TGDR and TGDR using an independent testing set. Results: Using real-world applications, we demonstrated that the proposed multi-TGDR framework works well and that the number of selected genes is less than the sum of all individualized binary TGDRs. Additionally, Meta-TGDR and TGDR on the batch-effect adjusted pooled data provided approximately the same results. By adding a bagging procedure in each application, stability and good predictive performance are warranted. Conclusions: Compared with Meta-TGDR, TGDR is less computationally intensive and does not require samples of all classes in each study. On the adjusted data, it has approximately the same predictive performance as Meta-TGDR. Thus, it is highly recommended.

    A Selective Review of Group Selection in High-Dimensional Models

    Grouping structures arise naturally in many statistical modeling problems. Several methods have been proposed for variable selection that respect grouping structure in variables. Examples include the group LASSO and several concave group selection methods. In this article, we give a selective review of group selection concerning methodological developments, theoretical properties and computational algorithms. We pay particular attention to group selection methods involving concave penalties. We address both group selection and bi-level selection methods. We describe several applications of these methods in nonparametric additive models, semiparametric regression, seemingly unrelated regressions, genomic data analysis and genome wide association studies. We also highlight some issues that require further study. Comment: Published at http://dx.doi.org/10.1214/12-STS392 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org).

    Detection of gene pathways with predictive power for breast cancer prognosis

    Background: Prognosis is of critical interest in breast cancer research. Biomedical studies suggest that genomic measurements may have independent predictive power for prognosis. Gene profiling studies have been conducted to search for predictive genomic measurements. Genes have an inherent pathway structure, where pathways are composed of multiple genes with coordinated functions. The goal of this study is to identify gene pathways with predictive power for breast cancer prognosis. Since our goal is fundamentally different from that of existing studies, a new pathway analysis method is proposed. Results: The new method advances beyond existing alternatives in the following respects. First, it can assess the predictive power of gene pathways, whereas existing methods tend to focus on model fitting accuracy only. Second, it can account for the joint effects of multiple genes in a pathway, whereas existing methods tend to focus on the marginal effects of genes. Third, it can accommodate multiple heterogeneous datasets, whereas existing methods analyze a single dataset only. We analyze four breast cancer prognosis studies and identify 97 pathways with significant predictive power for prognosis. Important pathways missed by alternative methods are identified. Conclusions: The proposed method provides a useful alternative to existing pathway analysis methods. Identified pathways can provide further insights into breast cancer prognosis.

    Meta-analysis of archived DNA microarrays identifies genes regulated by hypoxia and involved in a metastatic phenotype in cancer cells

    Background: Metastasis is a major cancer-related cause of death. Recent studies have described metastasis pathways. However, the exact contribution of each pathway remains unclear. Another key feature of a tumor is the presence of hypoxic areas caused by a lack of oxygen at the center of the tumor. Hypoxia leads to the expression of pro-metastatic genes as well as the repression of anti-metastatic genes. As many Affymetrix datasets about metastasis and hypoxia are publicly available and not fully exploited, this study proposes to re-analyze these datasets to extract new information about the metastatic phenotype induced by hypoxia in different cancer cell lines. Methods: Affymetrix datasets about metastasis and/or hypoxia were downloaded from GEO and ArrayExpress. AffyProbeMiner and GCRMA packages were used for pre-processing and the Window Welch t test was used for processing. Three approaches of meta-analysis were eventually used for the selection of genes of interest. Results: Three complementary approaches were used, which eventually selected 183 genes of interest. Out of these 183 genes, 99, including the well-known JUNB, FOS and TP63, have already been described in the literature as involved in cancer. Moreover, 39 of those genes, such as SERPINE1 and MMP7, are known to regulate metastasis. Twenty-one genes, including VEGFA and ID2, have also been described as involved in the response to hypoxia. Lastly, DAVID classified those 183 genes in 24 different pathways, among which 8 are directly related to cancer while 5 others are related to proliferation and cell motility. A negative control composed of 183 random genes failed to provide such results. Interestingly, 6 pathways retrieved by DAVID with the 183 genes of interest concern pathogen recognition and phagocytosis. Conclusion: The proposed methodology was able to find genes actually known to be involved in cancer, metastasis and hypoxia and, thus, we propose that the other genes selected based on the same methodology are of prime interest in the metastatic phenotype induced by hypoxia.

    Consistent Differential Expression Pattern (CDEP) on microarray to identify genes related to metastatic behavior

    Background: To utilize the large volume of gene expression information generated from different microarray experiments, several meta-analysis techniques have been developed. Despite these efforts, there remain significant challenges to effectively increasing the statistical power and decreasing the Type I error rate while pooling the heterogeneous datasets from public resources. The objective of this study is to develop a novel meta-analysis approach, Consistent Differential Expression Pattern (CDEP), to identify genes with common differential expression patterns across different datasets. Results: We combined False Discovery Rate (FDR) estimation and the non-parametric RankProd approach to estimate the Type I error rate in each microarray dataset of the meta-analysis. These Type I error rates from all datasets were then used to identify genes with common differential expression patterns. Our simulation study showed that CDEP achieved higher statistical power and maintained low Type I error rate when compared with two recently proposed meta-analysis approaches. We applied CDEP to analyze microarray data from different laboratories that compared transcription profiles between metastatic and primary cancer of different types. Many genes identified as differentially expressed consistently across different cancer types are in pathways related to metastatic behavior, such as ECM-receptor interaction, focal adhesion, and blood vessel development. We also identified novel genes such as AMIGO2, Gem, and CXCL11 that have not been shown to associate with, but may play roles in, metastasis. Conclusions: CDEP is a flexible approach that borrows information from each dataset in a meta-analysis in order to identify genes being differentially expressed consistently. We have shown that CDEP can gain higher statistical power than other existing approaches under a variety of settings considered in the simulation study, suggesting its robustness and insensitivity to data variation commonly associated with microarray experiments. Availability: CDEP is implemented in R and freely available at: http://genomebioinfo.musc.edu/CDEP/ Contact: [email protected]

    Meta-analysis derived atopic dermatitis (MADAD) transcriptome defines a robust AD signature highlighting the involvement of atherosclerosis and lipid metabolism pathways

    BACKGROUND: Atopic dermatitis (AD) is a common inflammatory skin disease with limited treatment options. Several microarray experiments have been conducted on lesional (LS) and non-lesional (NL) AD skin to develop a genomic disease phenotype. Although these experiments have shed light on disease pathology, inter-study comparisons reveal large differences in the resulting sets of differentially expressed genes (DEGs), limiting the utility of direct comparisons across studies. METHODS: We carried out a meta-analysis combining 4 published AD datasets to define a robust disease profile, termed the meta-analysis derived AD (MADAD) transcriptome. RESULTS: This transcriptome enriches key AD pathways more than the individual studies, and associates AD with novel pathways, such as atherosclerosis signaling (IL-37, selectin E/SELE). We identified wide lipid abnormalities and, for the first time in vivo, correlated Th2 immune activation with downregulation of key epidermal lipids (FA2H, FAR2, ELOVL3), emphasizing the role of cytokines in barrier disruption in AD. Key AD “classifier genes” discriminate lesional from non-lesional skin, and may be used to evaluate therapeutic responses. CONCLUSIONS: Our meta-analysis provides novel and powerful insights into AD disease pathology, and reinforces the concept of AD as a systemic disease. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12920-015-0133-x) contains supplementary material, which is available to authorized users.