
    KC-SMARTR: An R package for detection of statistically significant aberrations in multi-experiment aCGH data

    Background: Most approaches used to find recurrent or differential DNA Copy Number Alterations (CNA) in array Comparative Genomic Hybridization (aCGH) data from groups of tumour samples depend on discretizing the aCGH data into gain, loss or no-change states. Discretization discards valuable biological information in tumour samples, which are frequently heterogeneous. We previously developed an algorithm, KC-SMART, that estimates the magnitude of the CNA at a given genomic location by kernel convolution (Klijn et al., 2008). This estimate accounts for the intensity of the probe signal, its local genomic environment and the signal distribution across multiple samples. Results: Here we extend the approach to comparative analyses of two groups of samples and introduce the R implementation of both approaches. The comparative module enables a supervised analysis that identifies regions that are differentially aberrated between two user-defined classes. We analyzed data from a series of B- and T-cell lymphomas and retrieved all positive control regions (the VDJ regions) as well as a number of new regions. A t-test on segmented data, which we also implemented, located all the positive control regions and a number of new regions too, but these regions were highly fragmented. Conclusions: KC-SMARTR offers recurrent CNA and class-specific CNA detection, at different genomic scales, in a single package without the need for additional segmentation. It is memory efficient and runs on a wide range of machines. Most importantly, it does not rely on data discretization and therefore maximally exploits the biological information in the aCGH data.
    Mediamatics; Electrical Engineering, Mathematics and Computer Science
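
    To make the kernel-convolution idea concrete, here is a minimal Python sketch, assuming a Gaussian kernel, probe log2 ratios already aligned to shared genomic positions, and separate smoothing of gains and losses. The function names and the `width` parameter are hypothetical; this is not the KC-SMARTR API, and it omits the step that assesses statistical significance.

```python
import math


def gaussian_kernel(distance, width):
    # Unnormalized Gaussian weight; `width` is a hypothetical kernel-width parameter
    return math.exp(-0.5 * (distance / width) ** 2)


def kernel_smoothed_estimate(positions, samples, grid, width, gains=True):
    """Sum kernel-weighted log2 ratios over all samples at each grid position.

    positions: genomic coordinates of the probes (shared by all samples)
    samples:   one list of log2 ratios per sample, aligned with positions
    gains:     if True only positive ratios contribute, otherwise only negative ones
    """
    estimate = []
    for x in grid:
        total = 0.0
        for ratios in samples:
            for p, v in zip(positions, ratios):
                # Gains and losses are smoothed separately so they cannot cancel out
                if (gains and v > 0) or (not gains and v < 0):
                    total += v * gaussian_kernel(x - p, width)
        estimate.append(total)
    return estimate


positions = [0, 10, 20]
samples = [[1.0, 0.0, -1.0], [0.5, 0.0, 0.0]]
print(kernel_smoothed_estimate(positions, samples, [0, 20], width=5))
print(kernel_smoothed_estimate(positions, samples, [0, 20], width=5, gains=False))
```

    Because positive and negative ratios are smoothed separately, a region amplified in some samples and deleted in others shows up in both estimates instead of averaging out.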

    Gene Expression Profiles from Formalin Fixed Paraffin Embedded Breast Cancer Tissue Are Largely Comparable to Fresh Frozen Matched Tissue

    BACKGROUND AND METHODS: Formalin Fixed Paraffin Embedded (FFPE) samples represent a valuable resource for cancer research. However, the discovery and development of new cancer biomarkers often requires fresh frozen (FF) samples. Recently, the Whole Genome (WG) DASL (cDNA-mediated Annealing, Selection, extension and Ligation) assay was specifically developed to profile FFPE tissue. However, a thorough comparison of data generated from FFPE RNA and FF RNA using this platform is lacking. To this end we profiled, in duplicate, 20 FFPE tissues and 20 matched FF tissues and evaluated the concordance of the DASL results from FFPE and matched FF material. METHODOLOGY AND PRINCIPAL FINDINGS: We show that after proper normalization, all FFPE and FF pairs exhibit a high level of similarity (Pearson correlation > 0.7), significantly larger than the similarity between non-paired samples. Interestingly, the probes showing the highest correlation had a higher G/C content and were enriched for cell cycle genes. Predictions of gene expression signatures developed on frozen material (Intrinsic subtype, Genomic Grade Index, 70-gene signature) showed a high level of concordance between FFPE and FF matched pairs. Notably, predictions based on a 60-gene DASL list (the best match with the 70-gene signature) showed very high concordance with the MammaPrint® results. CONCLUSIONS AND SIGNIFICANCE: We demonstrate that data generated from FFPE material with the DASL assay, if properly processed, are comparable to data extracted from the FF counterpart. Specifically, gene expression profiles for a known set of prognostic genes for a specific disease are highly comparable between the two conditions. This opens up the possibility of using both FFPE and FF material in gene expression analyses, leading to a vast increase in the potential resources available for cancer research.
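
    The paired-versus-unpaired comparison can be sketched as follows, assuming the FFPE and FF profiles are already normalized and aligned probe by probe. The helper names and toy values are hypothetical, and the normalization step itself is not shown.

```python
import math


def pearson(x, y):
    # Pearson correlation coefficient of two equal-length expression profiles
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def paired_vs_unpaired(ffpe, ff):
    """ffpe[i] and ff[i] are assumed to be normalized profiles of the same tumour."""
    paired = [pearson(a, b) for a, b in zip(ffpe, ff)]
    unpaired = [pearson(ffpe[i], ff[j])
                for i in range(len(ffpe)) for j in range(len(ff)) if i != j]
    return paired, unpaired


ffpe = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
ff = [[2.0, 4.0, 6.0, 8.0], [8.0, 6.0, 4.0, 2.0]]
paired, unpaired = paired_vs_unpaired(ffpe, ff)
print(sum(paired) / len(paired), sum(unpaired) / len(unpaired))
```

    Comparing the distribution of `paired` against `unpaired` mirrors the claim that matched pairs are more similar than non-paired samples.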

    Identifying subgroup markers in heterogeneous populations

    Traditional methods that aim to identify biomarkers distinguishing two groups, such as Significance Analysis of Microarrays or the t-test, perform optimally when the biomarkers behave homogeneously within each group and differently between the groups. In many applications, however, this is not the case. Instead, a subgroup of samples in one group behaves differently from all other samples. Detecting markers with such imbalanced patterns of differential signal requires a different approach. We propose a novel method specifically designed for the Detection of Imbalanced Differential Signal (DIDS). We use an artificial dataset and a human breast cancer dataset to measure its performance and compare it with three traditional methods and four approaches that take imbalanced signal into account. Supported by extensive experimental results, we show that DIDS outperforms all other approaches in terms of power and positive predictive value. In a mouse breast cancer dataset, DIDS is the only approach that detects a functionally validated marker of chemotherapy resistance. DIDS can be applied to any continuous-valued data, including gene expression data, and in any context where imbalanced differential signal is manifested.
    Intelligent Systems; Electrical Engineering, Mathematics and Computer Science
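
    As an illustration of why imbalanced signal calls for a different statistic, the sketch below scores a gene only by the group-A samples that fall outside the full range of group B. This is a hypothetical outlier-style score chosen for illustration, not the published DIDS scoring function; the names and toy data are assumptions.

```python
def imbalanced_score(group_a, group_b):
    """Hypothetical subgroup-outlier score (not the published DIDS formula).

    Only samples of group_a outside the full range of group_b contribute,
    so a small deviating subgroup is not diluted by the rest of group_a.
    """
    hi, lo = max(group_b), min(group_b)
    up = sum((x - hi) ** 2 for x in group_a if x > hi)    # subgroup above group_b
    down = sum((lo - x) ** 2 for x in group_a if x < lo)  # subgroup below group_b
    return up, down


def rank_markers(data_a, data_b):
    """Rank genes (dict keys) by the larger of their up/down scores, best first."""
    scores = {g: max(imbalanced_score(data_a[g], data_b[g])) for g in data_a}
    return sorted(scores, key=scores.get, reverse=True)


# gene1: 2 of 6 group-A samples deviate strongly; gene2: group A is shifted slightly
data_a = {"gene1": [1.0, 1.2, 0.9, 1.1, 5.0, 6.0], "gene2": [2.3, 2.4, 2.2, 2.5, 2.3, 2.4]}
data_b = {"gene1": [0.8, 1.0, 1.3, 1.1, 0.9, 1.2], "gene2": [1.0, 1.5, 2.0, 1.2, 1.8, 1.6]}
print(rank_markers(data_a, data_b))
```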

    AUC scores of the best-performing predictors for each subtype.

    AUCs for (A) the HER2-positive subtype; (B) the luminal subtype; (C) the triple-negative subtype; and (D) the HER2-positive, ER-negative subtype. Red bars represent clinical predictors and blue bars expression-based predictors; darker shades represent non-subtype-specific predictors. A u-shaped line connecting two boxplots indicates that the mean AUCs of the two corresponding experiments differ significantly (two-sided t-test, p < 0.05, Bonferroni-corrected for multiple testing).
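
    The multiple-testing step in the caption can be sketched as a Bonferroni correction over the raw p-values of the pairwise comparisons; rejecting when p * m < alpha is equivalent to p < alpha / m. The raw p-values are assumed to come from a two-sided two-sample t-test (for example `scipy.stats.ttest_ind`); the function name and example values are hypothetical.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    adjusted = [min(1.0, p * m) for p in p_values]
    # A comparison is significant if its adjusted p-value stays below alpha
    return adjusted, [p < alpha for p in adjusted]


raw = [0.01, 0.02, 0.30]  # hypothetical p-values from three pairwise comparisons
adjusted, significant = bonferroni(raw)
print(adjusted, significant)
```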

    Cartoon of the double loop cross-validation scheme.

    Our analysis employed a double-loop cross-validation. The inner loop, depicted by the green block, determines the optimal number of features for a specific combination of feature selection method and classifier. The inner loop operates on 2/3 of the data (the training data); the remaining 1/3 is used to measure the performance of the trained classifier (i.e., a 3-fold cross-validation setup). The outer loop is repeated 15 times to obtain an average AUC for each predictor.
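
    A compact Python sketch of the double-loop scheme follows, assuming stratified 3-fold splits, a nearest-mean classifier, and a stand-in feature ranking (absolute difference of class means). These choices, the helper names, and the candidate feature counts are illustrative assumptions, not the study's code; test-fold scores are pooled into one AUC per repeat.

```python
import random


def auc(scores, labels):
    # Mann-Whitney form: fraction of (positive, negative) pairs ordered correctly
    pos = [s for s, t in zip(scores, labels) if t == 1]
    neg = [s for s, t in zip(scores, labels) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, k, rng):
    # Deal shuffled positives, then negatives, round-robin so each fold gets both classes
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    folds = [[] for _ in range(k)]
    for n, i in enumerate(pos + neg):
        folds[n % k].append(i)
    return folds


def class_means(X, y, feats):
    # Per-class mean of each selected feature
    means = {}
    for c in (0, 1):
        rows = [x for x, t in zip(X, y) if t == c]
        means[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    return means


def rank_features(X, y):
    # Stand-in ranking: absolute difference of class means, largest first
    feats = list(range(len(X[0])))
    m = class_means(X, y, feats)
    return sorted(feats, key=lambda j: -abs(m[1][j] - m[0][j]))


def nearest_mean_scores(Xtr, ytr, Xte, feats):
    # Nearest-mean classifier; a positive score means closer to the class-1 mean
    m = class_means(Xtr, ytr, feats)

    def dist2(x, mu):
        return sum((x[j] - mu[i]) ** 2 for i, j in enumerate(feats))

    return [dist2(x, m[0]) - dist2(x, m[1]) for x in Xte]


def cv_auc(X, y, k, rng, choose_n):
    # One pass of k-fold CV; choose_n(Xtr, ytr) picks the feature count for each fold
    scores, labels = [], []
    for fold in stratified_folds(y, k, rng):
        held = set(fold)
        Xtr = [X[i] for i in range(len(y)) if i not in held]
        ytr = [y[i] for i in range(len(y)) if i not in held]
        feats = rank_features(Xtr, ytr)[:choose_n(Xtr, ytr)]
        scores += nearest_mean_scores(Xtr, ytr, [X[i] for i in fold], feats)
        labels += [y[i] for i in fold]
    return auc(scores, labels)


def double_loop_auc(X, y, candidates, repeats=15, k=3, seed=0):
    rng = random.Random(seed)

    def inner(Xtr, ytr):
        # Inner loop: best feature count by CV AUC on the training data only
        return max(candidates, key=lambda n: cv_auc(Xtr, ytr, k, rng, lambda *_: n))

    # Outer loop: 3-fold CV (2/3 train, 1/3 test), repeated and averaged
    aucs = [cv_auc(X, y, k, rng, inner) for _ in range(repeats)]
    return sum(aucs) / len(aucs)


data_rng = random.Random(1)
y = [1] * 6 + [0] * 6
# Feature 0 separates the classes; features 1-4 are uniform noise
X = [[10 * t + data_rng.random()] + [data_rng.random() for _ in range(4)] for t in y]
print(double_loop_auc(X, y, candidates=[1, 2, 3]))
```

    The key property is that the feature count is chosen from the training folds only, so the held-out third never influences model selection.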

    Characteristics of the optimal predictors for the different subtypes.

    Each cell shows the optimal combination of classifier and feature selection method.
    Legend. Classifiers: NB = naive Bayes; NM = nearest mean; LREG = logistic regression; SVM = support vector machine; 3NN = 3-nearest neighbor. Feature selection methods: CFS = correlation-based feature selection; WMW = Wilcoxon-Mann-Whitney; BWR = ratio of between-class to within-class sum of squares; WMW-uncor. = Wilcoxon-Mann-Whitney with correlated features removed; Inf.gain = information gain.
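
    Two of the feature selection criteria in the legend, BWR and WMW, can be sketched as per-feature scores used to rank features. The function names are hypothetical, and scoring WMW as the distance of U from its null expectation n1 * n0 / 2 (so that both directions count) is an assumption about how the ranking is done.

```python
def bwr(values, labels):
    """Ratio of between-class to within-class sum of squares for one feature."""
    overall = sum(values) / len(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, t in zip(values, labels) if t == c]
        mean = sum(group) / len(group)
        between += len(group) * (mean - overall) ** 2
        within += sum((v - mean) ** 2 for v in group)
    return between / within


def wmw_u(values, labels):
    """Mann-Whitney U of class 1 versus class 0 (ties count one half)."""
    pos = [v for v, t in zip(values, labels) if t == 1]
    neg = [v for v, t in zip(values, labels) if t == 0]
    return sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)


def wmw_score(values, labels):
    # Distance of U from its null expectation, so shifts in either direction rank high
    n1 = sum(1 for t in labels if t == 1)
    n0 = len(labels) - n1
    return abs(wmw_u(values, labels) - n1 * n0 / 2)


def top_features(X, labels, score, n):
    """Rank the columns of X by a per-feature score and keep the n highest."""
    columns = list(zip(*X))
    ranked = sorted(range(len(columns)), key=lambda j: score(columns[j], labels), reverse=True)
    return ranked[:n]


labels = [0, 0, 0, 1, 1, 1]
X = [[1, 5, 0.2], [2, 4, 0.9], [3, 6, 0.4], [7, 5, 0.3], [8, 6, 0.8], [9, 4, 0.5]]
print(top_features(X, labels, bwr, 1), top_features(X, labels, wmw_score, 1))
```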

    Distribution of samples in the subgroups.

    The sample sizes shown are those of the expression-based predictors. Sample sizes for the clinical predictors are slightly lower due to missing data and are listed in Table S1 (http://www.plosone.org/article/info:doi/10.1371/journal.pone.0088551#pone.0088551.s004).
    *The HER2-positive, ER-positive group was excluded from the analysis because of its small sample size.