4 research outputs found

    SVMRFE-GSEA: a supervised learning module in GSEA

    This work presents a software module that extends and strengthens the capabilities of GSEA. The main objective of SVMRFE-GSEA is to improve gene selection analysis and make it more robust. To that end, SVMRFE-GSEA complements the results of the GSEA metrics with machine learning that takes interactions between genes into account. Experimental results on the Diabetes and Leukemia datasets show a first contribution of the tool. (Sociedad Argentina de Informática e Investigación Operativa)
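    The abstract gives no implementation details, so the following is only a minimal sketch of the SVM-RFE step that SVMRFE-GSEA builds on: recursive feature elimination with a linear SVM over a gene expression matrix, written here with scikit-learn. The data, variable names, and parameter choices (expression_matrix, labels, the 10% elimination step) are illustrative assumptions, not part of the actual module.

```python
# Illustrative sketch of SVM-RFE gene ranking (assumed setup, not the
# actual SVMRFE-GSEA module): a linear SVM is trained repeatedly and the
# genes with the smallest absolute weights are discarded at each step.
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

rng = np.random.default_rng(0)

# Stand-in expression data: 60 samples x 500 genes (toy, Leukemia-sized).
expression_matrix = rng.normal(size=(60, 500))
labels = rng.integers(0, 2, size=60)          # two phenotype classes

# Linear kernel so that per-gene weights are available for ranking.
svm = SVC(kernel="linear", C=1.0)

# Eliminate 10% of the remaining genes per iteration until 50 are left.
selector = RFE(estimator=svm, n_features_to_select=50, step=0.1)
selector.fit(expression_matrix, labels)

# support_ marks genes kept in the final subset; ranking_ grows with how
# early a gene was eliminated.
selected_genes = np.where(selector.support_)[0]
print(f"{len(selected_genes)} genes selected:", selected_genes[:10], "...")
```

    In the actual tool, a ranking of this kind is combined with the GSEA metric results; the combination logic is not described in the abstract and is not sketched here.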

    SVM-RFE peak selection for cancer classification with mass spectrometry data


    Data mining of many-attribute data: investigating the interaction between feature selection strategy and statistical features of datasets

    In many datasets there is a very large number of attributes (e.g. many thousands). Such datasets can cause many problems for machine learning methods. Various feature selection (FS) strategies have been developed to address these problems. The idea of an FS strategy is to reduce the number of features in a dataset (e.g. from many thousands to a few hundred) so that machine learning and/or statistical analysis can be done much more quickly and effectively. FS strategies therefore attempt to select the features that are most important for the machine learning task at hand. The work presented in this dissertation concerns the comparison of several popular feature selection strategies and, in particular, the investigation of the interaction between feature selection strategy and simple statistical features of the dataset. The basic hypothesis, not investigated before, is that the correct choice of FS strategy for a particular dataset should be based on a simple (at least) statistical analysis of the dataset.

    First, we examined the performance of several strategies on a selection of datasets. The strategies examined were: four widely used FS strategies (Correlation, ReliefF, Evolutionary Algorithm, and no feature selection), several feature bias (FB) strategies (in which the machine learning method considers all features but makes use of bias values suggested by the FB strategy), and combinations of FS and FB strategies. The results showed that FB methods displayed strong capability on some datasets and that combined strategies were also often successful. Examining these results, we noted that the patterns of performance were not immediately understandable. This led to the above hypothesis (one of the main contributions of the thesis) that statistical features of the dataset are an important consideration when choosing an FS strategy.

    We then investigated this hypothesis with several further experiments. Analysis of the results revealed that a simple statistical feature of a dataset, which can be easily pre-calculated, has a clear relationship with the performance of certain FS methods, and a similar relationship with differences in performance between certain pairs of FS strategies. In particular, correlation-based feature selection (CFS) is a very widely used FS technique built on the hypothesis that good feature sets contain features that are highly correlated with the class, yet uncorrelated with each other. By analysing the outcome of several FS strategies on different artificial datasets, the experiments suggest that CFS is never the best choice for poorly correlated data. Finally, considering several methods, we suggest tentative guidelines for choosing an FS strategy based on simply calculated measures of the dataset.
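    The correlation-based reasoning above can be made concrete with a small sketch. The code below (an illustration on assumed toy data, not the thesis's experimental code) computes two easily pre-calculated dataset statistics, the mean absolute feature-class correlation and the mean absolute feature-feature correlation, and plugs them into the standard CFS merit heuristic, evaluated here with dataset-wide averages as a simple proxy. Low feature-class correlation keeps the merit low for any subset size, consistent with the observation that CFS is a poor choice for poorly correlated data.

```python
# Rough illustration (not the thesis's code) of simple, pre-calculable
# dataset statistics: average feature-class correlation and average
# feature-feature correlation, combined via the standard CFS "merit"
# formula for a feature subset of size k.
import numpy as np

def dataset_correlation_stats(X, y):
    """Mean |corr(feature, class)| and mean |corr(feature_i, feature_j)|."""
    n_features = X.shape[1]
    # Feature-class correlations.
    r_cf = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    # Pairwise feature-feature correlations (upper triangle, no diagonal).
    corr = np.abs(np.corrcoef(X, rowvar=False))
    iu = np.triu_indices(n_features, k=1)
    return r_cf.mean(), corr[iu].mean()

def cfs_merit(k, mean_rcf, mean_rff):
    """Standard CFS merit: k*r_cf / sqrt(k + k*(k-1)*r_ff)."""
    return (k * mean_rcf) / np.sqrt(k + k * (k - 1) * mean_rff)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 200 samples, 30 features, only the first 5 carry class signal.
    y = rng.integers(0, 2, size=200)
    X = rng.normal(size=(200, 30))
    X[:, :5] += y[:, None] * 1.5   # inject class-correlated features

    mean_rcf, mean_rff = dataset_correlation_stats(X, y)
    print(f"mean |feature-class corr|   = {mean_rcf:.3f}")
    print(f"mean |feature-feature corr| = {mean_rff:.3f}")
    print(f"CFS merit for a 10-feature subset: {cfs_merit(10, mean_rcf, mean_rff):.3f}")
```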