5,532 research outputs found

    Asterias: a parallelized web-based suite for the analysis of expression and aCGH data

    Get PDF
    Asterias (\url{http://www.asterias.info}) is an integrated collection of freely-accessible web tools for the analysis of gene expression and aCGH data. Most of the tools use parallel computing (via MPI). Most of our applications allow the user to obtain additional information for user-selected genes by using clickable links in tables and/or figures. Our tools include: normalization of expression and aCGH data; converting between different types of gene/clone and protein identifiers; filtering and imputation; finding differentially expressed genes related to patient class and survival data; searching for models of class prediction; using random forests to search for minimal models for class prediction or for large subsets of genes with predictive capacity; searching for molecular signatures and predictive genes with survival data; detecting regions of genomic DNA gain or loss. The capability to send results between different applications, access to additional functional information, and parallelized computation make our suite unique and exploit features only available to web-based applications.Comment: web based application; 3 figure

    Can Survival Prediction Be Improved By Merging Gene Expression Data Sets?

    Get PDF
    BACKGROUND:High-throughput gene expression profiling technologies generating a wealth of data, are increasingly used for characterization of tumor biopsies for clinical trials. By applying machine learning algorithms to such clinically documented data sets, one hopes to improve tumor diagnosis, prognosis, as well as prediction of treatment response. However, the limited number of patients enrolled in a single trial study limits the power of machine learning approaches due to over-fitting. One could partially overcome this limitation by merging data from different studies. Nevertheless, such data sets differ from each other with regard to technical biases, patient selection criteria and follow-up treatment. It is therefore not clear at all whether the advantage of increased sample size outweighs the disadvantage of higher heterogeneity of merged data sets. Here, we present a systematic study to answer this question specifically for breast cancer data sets. We use survival prediction based on Cox regression as an assay to measure the added value of merged data sets. RESULTS:Using time-dependent Receiver Operating Characteristic-Area Under the Curve (ROC-AUC) and hazard ratio as performance measures, we see in overall no significant improvement or deterioration of survival prediction with merged data sets as compared to individual data sets. This apparently was due to the fact that a few genes with strong prognostic power were not available on all microarray platforms and thus were not retained in the merged data sets. Surprisingly, we found that the overall best performance was achieved with a single-gene predictor consisting of CYB5D1. CONCLUSIONS:Merging did not deteriorate performance on average despite (a) The diversity of microarray platforms used. (b) The heterogeneity of patients cohorts. (c) The heterogeneity of breast cancer disease. (d) Substantial variation of time to death or relapse. (e) The reduced number of genes in the merged data sets. Predictors derived from the merged data sets were more robust, consistent and reproducible across microarray platforms. Moreover, merging data sets from different studies helps to better understand the biases of individual studies and can lead to the identification of strong survival factors like CYB5D1 expression

    Integrative Model-based clustering of microarray methylation and expression data

    Full text link
    In many fields, researchers are interested in large and complex biological processes. Two important examples are gene expression and DNA methylation in genetics. One key problem is to identify aberrant patterns of these processes and discover biologically distinct groups. In this article we develop a model-based method for clustering such data. The basis of our method involves the construction of a likelihood for any given partition of the subjects. We introduce cluster specific latent indicators that, along with some standard assumptions, impose a specific mixture distribution on each cluster. Estimation is carried out using the EM algorithm. The methods extend naturally to multiple data types of a similar nature, which leads to an integrated analysis over multiple data platforms, resulting in higher discriminating power.Comment: Published in at http://dx.doi.org/10.1214/11-AOAS533 the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Cyclin D1-mediated microRNA expression signature predicts breast cancer outcome

    Get PDF
    Background: Genetic classification of breast cancer based on the coding mRNA suggests the evolution of distinct subtypes. Whether the non-coding genome is altered concordantly with the coding genome and the mechanism by which the cell cycle directly controls the non-coding genome is poorly understood. Methods: Herein, the miRNA signature maintained by endogenous cyclin D1 in human breast cancer cells was defined. In order to determine the clinical significance of the cyclin D1-mediated miRNA signature, we defined a miRNA expression superset from 459 breast cancer samples. We compared the coding and non-coding genome of breast cancer subtypes. Results: Hierarchical clustering of human breast cancers defined four distinct miRNA clusters (G1-G4) associated with distinguishable relapse-free survival by Kaplan-Meier analysis. The cyclin D1-regulated miRNA signature included several oncomirs, was conserved in multiple breast cancer cell lines, was associated with the G2 tumor miRNA cluster, ERα+ status, better outcome and activation of the Wnt pathway. The coding and non-coding genome were discordant within breast cancer subtypes. Seed elements for cyclin D1-regulated miRNA were identified in 63 genes of the Wnt signaling pathway including DKK. Cyclin D1 restrained DKK1 via the 3\u27UTR. In vivo studies using inducible transgenics confirmed cyclin D1 induces Wnt-dependent gene expression. Conclusion: The non-coding genome defines breast cancer subtypes that are discordant with their coding genome subtype suggesting distinct evolutionary drivers within the tumors. Cyclin D1 orchestrates expression of a miRNA signature that induces Wnt/β-catenin signaling, therefore cyclin D1 serves both upstream and downstream of Wnt/β-catenin signaling

    Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages

    Get PDF
    BACKGROUND: With an abundant amount of microarray gene expression data sets available through public repositories, new possibilities lie in combining multiple existing data sets. In this new context, analysis itself is no longer the problem, but retrieving and consistently integrating all this data before delivering it to the wide variety of existing analysis tools becomes the new bottleneck. RESULTS: We present the newly released inSilicoMerging R/Bioconductor package which, together with the earlier released inSilicoDb R/Bioconductor package, allows consistent retrieval, integration and analysis of publicly available microarray gene expression data sets. Inside the inSilicoMerging package a set of five visual and six quantitative validation measures are available as well. CONCLUSIONS: By providing (i) access to uniformly curated and preprocessed data, (ii) a collection of techniques to remove the batch effects between data sets from different sources, and (iii) several validation tools enabling the inspection of the integration process, these packages enable researchers to fully explore the potential of combining gene expression data for downstream analysis. The power of using both packages is demonstrated by programmatically retrieving and integrating gene expression studies from the InSilico DB repository [https://insilicodb.org/app/]

    A Toolbox for Functional Analysis and the Systematic Identification of Diagnostic and Prognostic Gene Expression Signatures Combining Meta-Analysis and Machine Learning

    Get PDF
    The identification of biomarker signatures is important for cancer diagnosis and prognosis. However, the detection of clinical reliable signatures is influenced by limited data availability, which may restrict statistical power. Moreover, methods for integration of large sample cohorts and signature identification are limited. We present a step-by-step computational protocol for functional gene expression analysis and the identification of diagnostic and prognostic signatures by combining meta-analysis with machine learning and survival analysis. The novelty of the toolbox lies in its all-in-one functionality, generic design, and modularity. It is exemplified for lung cancer, including a comprehensive evaluation using different validation strategies. However, the protocol is not restricted to specific disease types and can therefore be used by a broad community. The accompanying R package vignette runs in ~1 h and describes the workflow in detail for use by researchers with limited bioinformatics training

    Computational Models for Transplant Biomarker Discovery.

    Get PDF
    Translational medicine offers a rich promise for improved diagnostics and drug discovery for biomedical research in the field of transplantation, where continued unmet diagnostic and therapeutic needs persist. Current advent of genomics and proteomics profiling called "omics" provides new resources to develop novel biomarkers for clinical routine. Establishing such a marker system heavily depends on appropriate applications of computational algorithms and software, which are basically based on mathematical theories and models. Understanding these theories would help to apply appropriate algorithms to ensure biomarker systems successful. Here, we review the key advances in theories and mathematical models relevant to transplant biomarker developments. Advantages and limitations inherent inside these models are discussed. The principles of key -computational approaches for selecting efficiently the best subset of biomarkers from high--dimensional omics data are highlighted. Prediction models are also introduced, and the integration of multi-microarray data is also discussed. Appreciating these key advances would help to accelerate the development of clinically reliable biomarker systems
    corecore