Learning using Unselected Features (LUFe)
Feature selection has been studied in machine learning and data mining for many years, and is a valuable way to improve classification accuracy while reducing model complexity. The two main classes of feature selection methods, filters and wrappers, discard the features that are not selected and do not consider them in the predictive model. We propose that these unselected features may instead be used as an additional source of information at train time. We describe a strategy called Learning using Unselected Features (LUFe) that allows selected and unselected features to serve different functions in classification. In this framework, selected features are used directly to set the decision boundary, while unselected features are utilised in a secondary role, with no additional cost at test time. Our empirical results on 49 textual datasets show that LUFe can improve classification performance in comparison with standard wrapper and filter feature selection.
Privileged Learning using Unselected Features
This thesis proposes a novel machine learning paradigm called Learning using Unselected Features (LUFe), which front-loads computation to training time in order to improve classifier performance, without additional cost at deployment. This is achieved by repurposing and combining techniques from feature selection and Learning Using Privileged Information (LUPI). Feature selection is a means of reducing model complexity, which enables deployment on devices with limited computational power, but it can waste additional resources that are available at training time. LUPI is a paradigm that allows extra information about the training data to be harnessed by the learner, but it ordinarily requires an additional set of highly informative attributes. In the LUFe setting, feature selection is instead used to partition a dataset into primary and secondary subsets, rather than discarding the unselected features. Both subsets are then passed to a LUPI algorithm, enabling the secondary feature-set to provide additional guidance at training time only, in place of `privileged' information. Only the selected features are used at test time, maintaining low-cost deployment while exploiting train-time resources.
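The partitioning step described above can be sketched as follows. This is an illustrative reconstruction, not the thesis's implementation: the filter criterion used here (absolute difference of per-class feature means) is an assumed stand-in for any standard filter method, and the LUPI learner (e.g. SVM+) that would consume the secondary set is not implemented, only indicated in comments.

```python
def class_mean_diff_scores(X, y):
    """Score each feature by |mean over class 1 - mean over class 0|.
    A simple filter-style relevance criterion (assumed for illustration)."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    return scores

def lufe_partition(X, y, k):
    """Split feature indices into a primary (selected) set of size k and a
    secondary (unselected) set, instead of discarding the unselected ones."""
    scores = class_mean_diff_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    selected, unselected = ranked[:k], ranked[k:]
    X_primary = [[row[j] for j in selected] for row in X]
    X_secondary = [[row[j] for j in unselected] for row in X]
    # In LUFe, X_primary trains the decision boundary; X_secondary is passed
    # to a LUPI learner (e.g. SVM+) as train-time-only 'privileged' input.
    # At test time only the selected feature indices are needed.
    return selected, unselected, X_primary, X_secondary

# Toy data: feature 2 separates the classes most strongly.
X = [[1.0, 0.1, 5.0], [0.9, 0.2, 4.8], [0.1, 0.15, 1.0], [0.2, 0.05, 1.2]]
y = [1, 1, 0, 0]
sel, unsel, Xp, Xs = lufe_partition(X, y, k=1)
# sel → [2]; unsel → [0, 1]
```

The key design point is that `X_secondary` is returned rather than thrown away, so it remains available to guide training.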
Experimental results on a large number of datasets demonstrate that LUFe improves classification accuracy over standard feature selection approaches in a majority of cases. This performance boost is consistent across a range of feature selection approaches, and is largest when the SVM+ algorithm is used for implementation. The effect is shown to depend partly on the use of information in the unselected features, and partly on the additional constraints placed on the function space searched for the model. The enhancement by LUFe is shown to be inversely correlated with the performance of standard feature selection, and to be mediated by a further reduction in model variance beyond that provided by standard feature selection. Aside from demonstrating the direct practical benefit of LUFe, this work makes the contribution of broadening the scope of applications for the LUPI framework.