
    A Sparse-Modeling Based Approach for Class Specific Feature Selection

    In this work, we propose a novel feature selection framework called Sparse-Modeling Based Approach for Class-Specific Feature Selection (SMBA-CSFS), which simultaneously exploits the ideas of sparse modeling and class-specific feature selection. Feature selection plays a key role in several fields (e.g., computational biology): models with fewer variables are easier to explain, provide valuable insights into the role of each variable, and are likely to speed up experimental validation. Unfortunately, as corroborated by the no-free-lunch theorems, no approach in the literature is best suited to detect the optimal feature subset for building a final model, so feature selection still represents a challenge. The proposed procedure follows a two-step approach: (a) a sparse-modeling-based learning technique is first used to find the best subset of features for each class of a training set; (b) the discovered feature subsets are then fed to a class-specific feature selection scheme to assess the effectiveness of the selected features in classification tasks. To this end, an ensemble of classifiers is built, where each classifier is trained on the feature subset discovered for its class in the previous phase, and a proper decision rule is adopted to compute the ensemble response. To evaluate the performance of the proposed method, extensive experiments have been performed on publicly available datasets, in particular from computational biology, where feature selection is indispensable: acute lymphoblastic leukemia and acute myeloid leukemia, human carcinomas, human lung carcinomas, diffuse large B-cell lymphoma, and malignant glioma. SMBA-CSFS is able to identify the most representative features that maximize classification accuracy. With the top 20 and 80 features, SMBA-CSFS exhibits promising performance compared to its competitors from the literature on all considered datasets, especially those with a larger number of features. Experiments show that the proposed approach may outperform state-of-the-art methods when the number of features is high. For this reason, the introduced approach lends itself to the selection and classification of data with a large number of features and classes.
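    The two-step procedure described above can be sketched as follows. This is a minimal illustration, not the paper's method: `class_specific_subsets` stands in for the SMBA sparse-modeling step with a simple per-class separation score, the ensemble members are nearest-centroid classifiers, and the decision rule picks the class whose own-vs-rest margin is largest. All function names here are hypothetical.

```python
import numpy as np

def class_specific_subsets(X, y, k):
    """Step (a) stand-in: rank features per class by a simple
    separation score (the paper uses sparse modeling instead)."""
    subsets = {}
    for c in np.unique(y):
        mask = (y == c)
        score = np.abs(X[mask].mean(0) - X[~mask].mean(0)) / (X.std(0) + 1e-9)
        subsets[c] = np.argsort(score)[::-1][:k]
    return subsets

def fit_ensemble(X, y, subsets):
    """Step (b): one classifier per class, each trained on its own subset.
    Here each member stores the class and rest centroids on that subset."""
    models = {}
    for c, idx in subsets.items():
        mask = (y == c)
        models[c] = (idx, X[mask][:, idx].mean(0), X[~mask][:, idx].mean(0))
    return models

def predict(models, X):
    """Decision rule: choose the class whose member reports the
    largest margin (distance to rest-centroid minus distance to own)."""
    classes = sorted(models)
    margins = []
    for c in classes:
        idx, mu_c, mu_rest = models[c]
        d_c = np.linalg.norm(X[:, idx] - mu_c, axis=1)
        d_r = np.linalg.norm(X[:, idx] - mu_rest, axis=1)
        margins.append(d_r - d_c)
    return np.array(classes)[np.argmax(np.stack(margins, 1), 1)]
```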

    Modeling differences in the time-frequency representation of EEG signals through HMM’s for classification of imaginary motor tasks

    Brain-computer interfaces are systems that allow the control of external devices using information extracted from brain signals. Such systems find applications in rehabilitation, as an alternative communication channel, and in multimedia applications for entertainment and gaming. In this work, a new approach is developed based on the time-frequency (TF) distribution of the signal power, obtained by autoregressive methods, combined with Hidden Markov Models (HMMs). This approach takes into account the changes of power in different frequency bands over time: HMMs are used to model the changes in power during the execution of two different motor tasks. The use of TF methods raises a problem related to the selection of frequency bands, which can lead to overfitting (due to the curse of dimensionality), as well as problems related to the selection of model parameters. These problems are addressed here by combining two feature selection methods: Fisher score and Sequential Floating Forward Selection. The results are compared to the three top results of BCI competition IV. It is shown that the proposed method outperforms those methods on four subjects, and its average over all subjects equals that obtained by the competition's winning algorithm.
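    The two feature selection methods named above can be sketched as follows; this is an illustrative reimplementation, not the authors' code. `fisher_score` ranks individual features, while `sffs` is a generic Sequential Floating Forward Selection wrapper that accepts any subset-scoring function (the paper scores subsets via classification of the TF features; both helper names are hypothetical).

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score per feature: between-class spread of the class
    means over the pooled within-class variance."""
    mu = X.mean(0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        num += len(Xc) * (Xc.mean(0) - mu) ** 2
        den += len(Xc) * Xc.var(0)
    return num / (den + 1e-12)

def sffs(score_fn, n_feats, k):
    """Sequential Floating Forward Selection: greedily add the best
    feature, then conditionally drop features whenever removal beats
    the best score seen so far at the smaller subset size."""
    selected, best = [], {}            # best[size] = best score at that size
    while len(selected) < k:
        # forward step: add the feature that most improves the score
        rest = [f for f in range(n_feats) if f not in selected]
        f = max(rest, key=lambda f: score_fn(selected + [f]))
        selected.append(f)
        best[len(selected)] = max(best.get(len(selected), -np.inf),
                                  score_fn(selected))
        # floating step: backtrack while it strictly improves best[size]
        while len(selected) > 2:
            drops = [(score_fn([g for g in selected if g != f]), f)
                     for f in selected[:-1]]   # never drop the newest feature
            s, f = max(drops)
            if s > best.get(len(selected) - 1, -np.inf):
                selected.remove(f)
                best[len(selected)] = s
            else:
                break
    return selected
```

A common pattern, mirrored in the abstract, is to pre-filter features by Fisher score and then run SFFS only on the survivors, which keeps the wrapper search tractable.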

    Multinomial Logit Models with Implicit Variable Selection

    Multinomial logit models, which are most commonly used for modeling unordered multi-category responses, are typically restricted to a few predictors; in the high-dimensional case, maximum likelihood estimates frequently do not exist. In this paper we develop a boosting technique called multinomBoost that performs variable selection and fits the multinomial logit model even when predictors are high-dimensional. Since in multi-category models the effect of one predictor variable is represented by several parameters, one has to distinguish between variable selection and parameter selection. A special feature of the approach is that, in contrast to existing approaches, it selects variables, not parameters. The method can distinguish between mandatory and optional predictors, and it adapts to metric, binary, nominal, and ordinal predictors. Regularization within the algorithm allows the inclusion of nominal and ordinal variables with many categories; in the case of ordinal predictors, the order information is used. The performance of the boosting technique with respect to mean squared error, prediction error, and the identification of relevant variables is investigated in a simulation study. For two real-life data sets, the results are also compared with the Lasso approach, which selects parameters.
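    The variable-versus-parameter distinction can be illustrated with a group penalty that keeps or drops each predictor's whole row of category coefficients at once. This sketch uses proximal gradient descent with a row-wise group soft-threshold rather than the paper's boosting algorithm; `fit_group_sparse_multinomial` is a hypothetical helper that only mirrors the variable-level selection behaviour multinomBoost targets.

```python
import numpy as np

def softmax(Z):
    Z = Z - Z.max(1, keepdims=True)      # numerically stable softmax
    E = np.exp(Z)
    return E / E.sum(1, keepdims=True)

def fit_group_sparse_multinomial(X, Y, lam=0.1, lr=0.1, iters=500):
    """Multinomial logit fit by proximal gradient descent with a group
    penalty on each predictor's row of coefficients (one row per
    variable, one column per category), so a variable enters or leaves
    the model as a whole -- variable selection, not parameter selection."""
    n, p = X.shape
    K = Y.shape[1]                        # Y is one-hot, shape (n, K)
    B = np.zeros((p, K))
    for _ in range(iters):
        G = X.T @ (softmax(X @ B) - Y) / n    # gradient of neg. log-likelihood
        B -= lr * G
        # group soft-threshold: shrink each row toward zero; rows whose
        # norm falls below lr * lam are zeroed out entirely
        norms = np.linalg.norm(B, axis=1, keepdims=True)
        B *= np.clip(1.0 - lr * lam / (norms + 1e-12), 0.0, None)
    return B
```

With an elementwise L1 penalty instead of the row-wise threshold, the same loop would select individual parameters, which is exactly the Lasso behaviour the abstract contrasts with.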