
    An Evaluation of Feature Selection Robustness on Class Noisy Data

    With the growing dimensionality of data, feature selection has become a crucial step in many machine learning and data mining applications. It identifies the most important attributes for the task at hand, improving the efficiency, interpretability, and final performance of the induced models. Several recent studies have examined the strengths and weaknesses of the available feature selection methods from different points of view. Still, little work has investigated how sensitive these methods are to the presence of noisy instances in the input data, and this is the gap our work aims to address. Since noise is arguably inevitable in many application scenarios, it is important to understand the extent to which different selection heuristics can be affected by noise, in particular class noise, which is more harmful in supervised learning tasks. Such an evaluation may be especially important for class-imbalanced problems, where any perturbation in the set of training records can strongly affect the final selection outcome. In this regard, we make a two-fold contribution by presenting (i) a general methodology to evaluate feature selection robustness on class noisy data and (ii) an experimental study that involves different selection methods, both univariate and multivariate. The experiments were conducted on eight high-dimensional datasets chosen to represent different real-world domains, and they yield interesting insights into the intrinsic robustness of the considered selection approaches.
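    The abstract does not spell out the evaluation methodology, so the following is only a hypothetical sketch of the general idea it describes: inject class noise, re-run a selector, and measure how much the selected feature subset changes. The noise model (flipping a fixed fraction of binary labels uniformly at random), the univariate score (absolute difference of class means), the Jaccard overlap measure, and every function name below are assumptions chosen for brevity, not the authors' procedure.

```python
import random


def flip_labels(labels, noise_rate, seed=0):
    """Hypothetical noise model: flip a fraction noise_rate of binary (0/1) labels at random."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_flip = int(round(noise_rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_flip):
        noisy[i] = 1 - noisy[i]
    return noisy


def mean_diff_scores(X, y):
    """Assumed univariate score per feature: |mean(class 1) - mean(class 0)|."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        if not pos or not neg:
            # Noise emptied one class; this feature cannot be scored.
            scores.append(0.0)
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores


def top_k(scores, k):
    """Indices of the k highest-scoring features."""
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def jaccard(a, b):
    """Overlap between two feature subsets (1.0 means identical selections)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def robustness(X, y, k, noise_rate, seed=0):
    """Compare the top-k subset selected on clean labels with the one selected on noisy labels."""
    clean = top_k(mean_diff_scores(X, y), k)
    noisy = top_k(mean_diff_scores(X, flip_labels(y, noise_rate, seed)), k)
    return jaccard(clean, noisy)


if __name__ == "__main__":
    # Toy data: feature 0 tracks the label, features 1 and 2 are label-independent patterns.
    y = [0] * 10 + [1] * 10
    X = [[5.0 * label, float(i % 3), float(i % 2)] for i, label in enumerate(y)]
    for rate in (0.0, 0.1, 0.3):
        print(rate, robustness(X, y, k=1, noise_rate=rate))
```

    A real study would average the overlap over many random noise injections per rate, and would likely swap in the actual univariate and multivariate selectors and a chance-corrected stability index; the single-seed Jaccard score here only keeps the example short.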

    A Survey of Machine Learning Approaches Applied to Gene Expression Analysis for Cancer Prediction

    Machine learning approaches are powerful techniques commonly employed to develop cancer prediction models from gene expression and mutation data. Our survey provides a comprehensive review of recent cancer studies that have used gene expression data from several cancer types (breast, lung, kidney, ovarian, liver, central nervous system, and gallbladder) for survival prediction, tumor identification, and stratification. We also provide an overview of biomarker studies associated with these cancer types. The survey covers multiple aspects of machine learning-based cancer studies, including cancer classification, cancer prediction, identification of biomarker genes, and the use of microarray and RNA-Seq data. We discuss the technical issues with current cancer prediction models and the corresponding measurement tools for determining gene expression activity levels in cancerous versus noncancerous tissues. Additionally, we investigate how identifying putative biomarker gene expression patterns can help predict future cancer risk and inform personalized treatment.