
    Adjusted Measures for Feature Selection Stability for Data Sets with Similar Features

    For data sets with similar features, for example highly correlated features, most existing stability measures behave in an undesirable way: they consider features that are almost identical but have different identifiers as different features. Existing adjusted stability measures, that is, stability measures that take the similarities between features into account, have major theoretical drawbacks. We introduce new adjusted stability measures that overcome these drawbacks. We compare them to each other and to existing stability measures based on both artificial and real sets of selected features. Based on the results, we suggest using one new stability measure that considers highly similar features as exchangeable.
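    As an illustration of the exchangeability idea, the sketch below (a minimal, assumption-laden example, not the measure proposed in the paper) adjusts a Jaccard-style stability score so that two selected features count as a match whenever their pairwise similarity exceeds a threshold:

```python
# Sketch of a similarity-adjusted, Jaccard-style stability score: features that are
# nearly identical but carry different identifiers are treated as exchangeable.
import numpy as np

def adjusted_jaccard(set_a, set_b, similarity, threshold=0.9):
    """set_a, set_b: feature indices selected in two runs.
    similarity: (p x p) matrix of pairwise feature similarities (e.g. |correlation|)."""
    set_a, set_b = list(set_a), list(set_b)
    matched_a = sum(any(similarity[i, j] >= threshold for j in set_b) for i in set_a)
    matched_b = sum(any(similarity[j, i] >= threshold for i in set_a) for j in set_b)
    union = len(set_a) + len(set_b)
    if union == 0:
        return 1.0  # convention: two empty selections are perfectly stable
    return (matched_a + matched_b) / union

# Features 0 and 1 are near-duplicates, so selecting either one should not count as instability.
sim = np.eye(4)
sim[0, 1] = sim[1, 0] = 0.95
print(adjusted_jaccard({0, 2}, {1, 2}, sim))  # 1.0, instead of the plain Jaccard 1/3
```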

    Validation of ZAP-70 methylation and its relative significance in predicting outcome in chronic lymphocytic leukemia

    ZAP-70 methylation 223 nucleotides downstream of the transcription start site (CpG+223) predicts outcome in chronic lymphocytic leukemia (CLL), but its impact relative to CD38 and ZAP-70 expression or immunoglobulin heavy chain variable region (IGHV) status is uncertain. Additionally, standardizing ZAP-70 expression analysis has been unsuccessful. CpG+223 methylation was quantitatively determined in 295 untreated CLL cases using MassARRAY. Impact on clinical outcome vs CD38 and ZAP-70 expression and IGHV status was evaluated. Cases with low methylation (0.90). Thus, ZAP-70 CpG+223 methylation represents a superior biomarker for time to treatment (TT) and overall survival (OS) that can be feasibly measured, supporting its use in risk-stratifying CLL.
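    One simple way to illustrate the kind of risk-group comparison described above is a log-rank test between low- and high-methylation groups; the sketch below uses simulated times and a hypothetical cut-off, not the study's data:

```python
# Minimal sketch, assuming simulated follow-up times: compare time to treatment
# between low- and high-methylation groups with a log-rank test (lifelines).
import numpy as np
from lifelines.statistics import logrank_test

rng = np.random.default_rng(0)
tt_low  = rng.exponential(scale=18, size=40)   # months, low CpG+223 methylation
tt_high = rng.exponential(scale=60, size=40)   # months, high CpG+223 methylation
event_low  = np.ones_like(tt_low)              # 1 = treatment started (event observed)
event_high = rng.integers(0, 2, size=40)       # some high-methylation cases remain untreated

res = logrank_test(tt_low, tt_high, event_observed_A=event_low, event_observed_B=event_high)
print(res.p_value)   # p-value for the separation of the two risk groups
```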

    Predictive value of DNA methylation patterns in AML patients treated with an azacytidine containing induction regimen

    BACKGROUND: Acute myeloid leukemia (AML) is a heterogeneous disease with a poor prognosis. Dysregulation of the epigenetic machinery is a significant contributor to disease development. Some AML patients benefit from treatment with hypomethylating agents (HMAs), but no predictive biomarkers for therapy response exist. Here, we investigated whether unbiased genome-wide assessment of pre-treatment DNA methylation profiles in AML bone marrow blasts can help to identify patients who will achieve a remission after an azacytidine-containing induction regimen. RESULTS: A total of n = 155 patients with newly diagnosed AML treated in the AMLSG 12-09 trial were randomly assigned to a screening cohort and a refinement and validation cohort. The cohorts were divided according to azacytidine-containing induction regimens and response status. Methylation status was assessed for 664,227 500-bp regions using methyl-CpG immunoprecipitation sequencing, resulting in 1755 differentially methylated regions (DMRs). Top regions were distilled and included genes such as WNT10A and GATA3. 80% of the regions identified as hits were represented on HumanMethylation450 BeadChips. Quantitative methylation analysis confirmed 90% of these regions (36 of 40 DMRs). A classifier containing 17 CpGs was trained using penalized logistic regression with fivefold cross-validation. Validation based on mass spectra generated by MALDI-TOF failed (AUC 0.59); however, discriminative ability was maintained by adding neighboring CpGs. A recomposed classifier with 12 CpGs resulted in an AUC of 0.77. When evaluated in the non-azacytidine-containing group, the AUC was 0.76. CONCLUSIONS: Our analysis evaluated the value of a whole-genome methyl-CpG screening assay for the identification of informative methylation changes. We also compared the informative content and discriminatory power of regions and single CpGs for predicting response to therapy. The relevance of the identified DMRs is supported by their association with key regulatory processes of oncogenic transformation and supports the idea that relevant DMRs are enriched at distinct loci rather than evenly distributed across the genome. Prediction of response to therapy could be established but lacked specificity for treatment with azacytidine. Our results suggest that a predictive epigenotype carries its methylation information at a complex, genome-wide level that is confined to regions rather than to single CpGs. With increasing application of combinatorial regimens, response prediction may become even more complicated.
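    The classifier-building step (penalized logistic regression tuned by fivefold cross-validation, evaluated by AUC) can be sketched as follows on synthetic stand-in data; the real analysis used the trial cohorts and the CpG regions described above, not the simulated matrix assumed here:

```python
# Sketch: L1-penalised logistic regression tuned by fivefold CV on CpG methylation
# values, then evaluated by AUC on a held-out set. Data are synthetic placeholders.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegressionCV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# 200 "patients" x 40 candidate CpGs; responders vs non-responders
X, y = make_classification(n_samples=200, n_features=40, n_informative=12, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegressionCV(
    Cs=10, cv=5, penalty="l1", solver="liblinear", scoring="roc_auc", max_iter=1000
).fit(X_train, y_train)

n_selected = np.sum(clf.coef_ != 0)                       # CpGs retained by the L1 penalty
auc = roc_auc_score(y_val, clf.predict_proba(X_val)[:, 1])
print(f"{n_selected} CpGs in the classifier, validation AUC = {auc:.2f}")
```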

    Assessment and optimisation of normalisation methods for dual-colour antibody microarrays

    Background: Recent advances in antibody microarray technology have made it possible to measure the expression of hundreds of proteins simultaneously in a competitive dual-colour approach similar to dual-colour gene expression microarrays. Thus, the established normalisation methods for gene expression microarrays, e.g. loess regression, can in principle be applied to protein microarrays. However, the typical assumptions of such normalisation methods might be violated due to a bias in the selection of the proteins to be measured. Due to high costs and limited availability of high-quality antibodies, the current arrays usually focus on a high proportion of regulated targets. Housekeeping features could be used to circumvent this problem, but they are typically underrepresented on protein arrays. Therefore, it might be beneficial to select invariant features among the features already represented on available arrays for normalisation by a dedicated selection algorithm. Results: We compare the performance of several normalisation methods that have been established for dual-colour gene expression microarrays. The focus is on an invariant selection algorithm, for which effective improvements are proposed. In a simulation study, the performances of the different normalisation methods are compared with respect to their impact on the ability to correctly detect differentially expressed features. Furthermore, we apply the different normalisation methods to a pancreatic cancer data set to assess the impact on the classification power. Conclusions: The simulation study and the data application demonstrate the superior performance of the improved invariant selection algorithms in comparison to other normalisation methods, especially in situations where the assumptions of the usual global loess normalisation are violated.
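    A minimal sketch of the underlying idea, assuming a simulated two-channel array: MA-based loess normalisation in which the curve may be fitted on a pre-selected set of invariant features only (the paper's invariant selection algorithm itself is not reproduced here):

```python
# Sketch of MA-based loess normalisation for one dual-colour array, optionally
# fitting the intensity-dependent bias on a chosen set of invariant features.
import numpy as np
from statsmodels.nonparametric.smoothers_lowess import lowess

def loess_normalise(red, green, invariant_idx=None, frac=0.4):
    """red, green: raw intensities of the two channels for one array."""
    M = np.log2(red) - np.log2(green)            # log-ratio
    A = 0.5 * (np.log2(red) + np.log2(green))    # average log-intensity
    idx = np.arange(M.size) if invariant_idx is None else np.asarray(invariant_idx)
    # fit the intensity-dependent bias on the chosen features only ...
    trend = lowess(M[idx], A[idx], frac=frac, return_sorted=False)
    # ... then subtract it from all features by interpolating over A
    order = np.argsort(A[idx])
    correction = np.interp(A, A[idx][order], trend[order])
    return M - correction

# Simulated intensities with a built-in dye bias of ~0.3 on the log2 scale
rng = np.random.default_rng(1)
green = rng.lognormal(8, 1, 500)
red = green * 2 ** (0.3 + rng.normal(0, 0.2, 500))
M_norm = loess_normalise(red, green, invariant_idx=np.arange(0, 500, 10))
print(M_norm.mean())   # close to zero once the bias has been removed
```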

    Effect of training-sample size and classification difficulty on the accuracy of genomic predictors

    Introduction: As part of the MicroArray Quality Control (MAQC)-II project, this analysis examines how the choice of univariate feature-selection methods and classification algorithms may influence the performance of genomic predictors under varying degrees of prediction difficulty, represented by three clinically relevant endpoints. Methods: We used gene-expression data from 230 breast cancers (grouped into training and independent validation sets), and we examined 40 predictors (five univariate feature-selection methods combined with eight different classifiers) for each of the three endpoints. Their classification performance was estimated on the training set by using two different resampling methods and compared with the accuracy observed in the independent validation set. Results: A ranking of the three classification problems was obtained, and the performance of 120 models was estimated and assessed on an independent validation set. The bootstrapping estimates were closer to the validation performance than were the cross-validation estimates. The required sample size for each endpoint was estimated, and both gene-level and pathway-level analyses were performed on the obtained models. Conclusions: We showed that genomic predictor accuracy is determined largely by an interplay between sample size and classification difficulty. Variations in univariate feature-selection methods and in the choice of classification algorithm have only a modest impact on predictor performance, and several statistically equally good predictors can be developed for any given classification problem.
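    The comparison of resampling-based performance estimates with an independent validation set can be sketched as below, using synthetic data and a single illustrative pipeline in place of the study's 40 predictors:

```python
# Sketch: estimate a predictor's accuracy by k-fold cross-validation and by
# out-of-bag bootstrap resampling, then compare both with a held-out validation set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.utils import resample

X, y = make_classification(n_samples=230, n_features=500, n_informative=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.35, random_state=0)

# univariate feature selection + classifier, kept inside one pipeline so that
# selection is redone on every resample (avoiding selection bias)
model = make_pipeline(SelectKBest(f_classif, k=30), KNeighborsClassifier())

cv_est = cross_val_score(model, X_tr, y_tr, cv=StratifiedKFold(5, shuffle=True, random_state=0)).mean()

boot_scores = []
for b in range(50):
    idx = resample(np.arange(len(y_tr)), random_state=b)   # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(len(y_tr)), idx)           # out-of-bag cases
    model.fit(X_tr[idx], y_tr[idx])
    boot_scores.append(model.score(X_tr[oob], y_tr[oob]))
boot_est = np.mean(boot_scores)

val_acc = model.fit(X_tr, y_tr).score(X_val, y_val)
print(f"CV: {cv_est:.2f}  bootstrap (OOB): {boot_est:.2f}  validation: {val_acc:.2f}")
```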

    Effect of Size and Heterogeneity of Samples on Biomarker Discovery: Synthetic and Real Data Assessment

    MOTIVATION: The identification of robust lists of molecular biomarkers related to a disease is a fundamental step for early diagnosis and treatment. However, methodologies for the discovery of biomarkers using microarray data often provide results with limited overlap. These differences are attributable to 1) dataset size (few subjects relative to the number of features); 2) heterogeneity of the disease; and 3) heterogeneity of the experimental protocols and computational pipelines employed in the analysis. In this paper, we focus on the first two issues and assess, both on simulated (through an in silico regulation network model) and real clinical datasets, the consistency of candidate biomarkers provided by a number of different methods. METHODS: We extensively simulated the effect of the heterogeneity characteristic of complex diseases on different sets of microarray data. Heterogeneity was reproduced by simulating both the intrinsic variability of the population and the alteration of regulatory mechanisms. Population variability was simulated by modeling the evolution of a pool of subjects; a subset of them then underwent alterations in regulatory mechanisms so as to mimic the disease state. RESULTS: The simulated data allowed us to outline advantages and drawbacks of the different methods across multiple studies and varying numbers of samples, and to evaluate the precision of feature selection on a benchmark with known biomarkers. Although comparable classification accuracy was reached by the different methods, the use of external cross-validation loops is helpful in finding features with a higher degree of precision and stability. Application to real data confirmed these results.
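    The external cross-validation loop mentioned in the results can be sketched as follows on synthetic data: feature selection is repeated inside every outer fold, and the selection frequency across folds gives a simple stability readout (an illustrative setup, not the paper's exact pipeline):

```python
# Sketch of an external cross-validation loop: select features within each fold,
# track how often each feature is chosen, and record held-out fold accuracy.
import numpy as np
from collections import Counter
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=120, n_features=300, n_informative=15, random_state=0)

selection_counts, fold_acc = Counter(), []
for train_idx, test_idx in StratifiedKFold(5, shuffle=True, random_state=0).split(X, y):
    selector = SelectKBest(f_classif, k=20).fit(X[train_idx], y[train_idx])
    chosen = np.flatnonzero(selector.get_support())
    selection_counts.update(chosen.tolist())
    clf = LinearSVC(max_iter=5000).fit(selector.transform(X[train_idx]), y[train_idx])
    fold_acc.append(clf.score(selector.transform(X[test_idx]), y[test_idx]))

stable = [f for f, n in selection_counts.items() if n == 5]   # picked in every fold
print(f"accuracy {np.mean(fold_acc):.2f}, {len(stable)} features selected in all folds")
```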

    Community assessment to advance computational prediction of cancer drug combinations in a pharmacogenomic screen

    The effectiveness of most cancer targeted therapies is short-lived. Tumors often develop resistance that might be overcome with drug combinations. However, the number of possible combinations is vast, necessitating data-driven approaches to find optimal patient-specific treatments. Here we report AstraZeneca’s large drug combination dataset, consisting of 11,576 experiments from 910 combinations across 85 molecularly characterized cancer cell lines, and the results of a DREAM Challenge to evaluate computational strategies for predicting synergistic drug pairs and biomarkers. 160 teams participated, providing comprehensive methodological development and benchmarking. Winning methods incorporate prior knowledge of drug-target interactions. Synergy is predicted with an accuracy matching biological replicates for >60% of combinations. However, 20% of drug combinations are poorly predicted by all methods. Genomic rationales for synergy predictions are identified, including antagonism of ADAM17 inhibitors when combined with PIK3CB/D inhibition, in contrast to synergy when combined with other PI3K-pathway inhibitors in PIK3CA-mutant cells.
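    As a loose illustration of the prediction task (synthetic features, not the AstraZeneca-DREAM data), the sketch below regresses a synergy score on cell-line molecular features joined with binary drug-target annotations, the kind of prior knowledge the winning methods exploited:

```python
# Illustrative sketch only: predict a combination synergy score from cell-line
# molecular features plus drug-target annotations, scored by cross-validated R^2.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_experiments = 600
cell_features = rng.normal(size=(n_experiments, 50))           # e.g. mutations / expression
drug_targets = rng.integers(0, 2, size=(n_experiments, 20))    # targets of the two drugs
X = np.hstack([cell_features, drug_targets])
# synthetic synergy score driven by an interaction between a marker and a target
synergy = 2.0 * cell_features[:, 0] * drug_targets[:, 0] + rng.normal(0, 0.5, n_experiments)

model = RandomForestRegressor(n_estimators=200, random_state=0)
r2 = cross_val_score(model, X, synergy, cv=5, scoring="r2").mean()
print(f"cross-validated R^2 = {r2:.2f}")
```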