    Feature Selection with Cost Constraint

    When acquiring consumer data for marketing or new business initiatives, it is important to decide which features of potential customers should be acquired. We study the feature selection and acquisition problem with a cost constraint in the context of regression prediction. We formulate it as a nonlinear programming problem that minimizes the prediction error and the number of features used in the model, subject to a budget constraint. We derive analytical properties of the solution to this problem and provide a computational procedure for solving it. The results of a preliminary experiment demonstrate the effectiveness of our approach.
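
    The abstract does not reproduce the authors' nonlinear program, so the following is only a minimal sketch of budget-constrained feature selection for regression, assuming a greedy improvement-per-cost heuristic in place of their solver; the function name, the use of scikit-learn, and the cross-validated scoring are illustrative choices, not the paper's method.

```python
# Hedged sketch: greedy forward selection for regression under a feature-cost
# budget. Illustrative baseline only, NOT the paper's nonlinear program.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def greedy_budgeted_selection(X, y, costs, budget, cv=5):
    """Repeatedly add the feature with the best error improvement per unit
    cost until no affordable feature improves the model."""
    selected, spent = [], 0.0
    while True:
        # Baseline score of the current model (mean predictor if empty).
        if selected:
            base = cross_val_score(LinearRegression(), X[:, selected], y,
                                   scoring="neg_mean_squared_error",
                                   cv=cv).mean()
        else:
            base = -np.var(y)
        best_gain, best_j = 0.0, None
        for j in range(X.shape[1]):
            if j in selected or spent + costs[j] > budget:
                continue  # already chosen or unaffordable
            score = cross_val_score(LinearRegression(),
                                    X[:, selected + [j]], y,
                                    scoring="neg_mean_squared_error",
                                    cv=cv).mean()
            gain = (score - base) / costs[j]  # improvement per unit cost
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None:
            return selected
        selected.append(best_j)
        spent += costs[best_j]
```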

    Modelling with feature costs under a total cost budget constraint

    In modern high-dimensional data sets, feature selection is an essential pre-processing step for many statistical modelling tasks. The field of cost-sensitive feature selection extends the concepts of feature selection by introducing so-called feature costs. These do not necessarily relate to financial costs, but can be seen as a general construct to numerically valuate any disfavored aspect of a feature, such as the run-time of a measurement procedure or the patient harm of a biomarker test. There are multiple ways to define a cost-sensitive feature selection setup. The strategy applied in this thesis is to introduce an additive cost budget as an upper bound on the total costs. This extends the standard feature selection problem by an additional constraint on the sum of costs of the included features. Main areas of research in this field include adaptations of standard feature selection algorithms to account for this additional constraint. However, cost-aware selection criteria also play an important role in the overall performance of these methods and need to be discussed in detail as well. This cumulative dissertation summarizes the work of three papers in this field. Two of them introduce new methods for cost-sensitive feature selection with a fixed budget constraint. The third discusses a common trade-off criterion of performance and cost. For this criterion, an analysis of the selection outcome in different setups revealed a reduced ability to distinguish between information and noise, which can be counteracted, for example, by introducing a hyperparameter in the criterion. The presented research on new cost-sensitive methods comprises adaptations of Greedy Forward Selection, Genetic Algorithms, filter approaches and a novel Random-Forest-based algorithm, which selects individual trees from a low-cost tree ensemble (see the sketch below). Central concepts of each method are discussed, and thorough simulation studies evaluating individual strengths and weaknesses are provided. Every simulation study includes artificial as well as real-world data examples to validate the results in a broad context. Finally, all chapters present discussions with practical recommendations on the application of the proposed methods and conclude with an outlook on possible further research for the respective topics.
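
    As a rough illustration of the tree-selection idea mentioned above, the sketch below picks individual trees from a fitted random forest so that the union of features they use stays within a cost budget. This is an assumption-laden simplification, not the thesis algorithm: the ranking by validation accuracy and all names are invented for the example.

```python
# Hedged sketch of cost-aware tree selection from a fitted ensemble.
# `forest` is assumed to be a fitted sklearn RandomForestClassifier.

def select_trees_within_budget(forest, X_val, y_val, costs, budget):
    """Greedily keep the best-scoring trees whose combined feature set
    remains affordable: the ensemble's cost is the summed cost of the
    union of features used by the selected trees."""
    ranked = sorted(forest.estimators_,
                    key=lambda t: t.score(X_val, y_val), reverse=True)
    chosen, used_features = [], set()
    for tree in ranked:
        feats = {int(f) for f in tree.tree_.feature if f >= 0}  # -2 marks leaves
        if sum(costs[f] for f in (used_features | feats)) <= budget:
            chosen.append(tree)
            used_features |= feats
    return chosen, used_features
```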

    Intelligent feature based resource selection and process planning

    Link to the publisher's version: https://www.inderscience.com/books/index.php?action=record&rec_id=755&chapNum=3&journalID=1022&year=2010

    This paper presents an intelligent knowledge-based integrated manufacturing system that uses STEP feature-based modelling and rule-based intelligent techniques to generate suitable process plans for prismatic parts. The system carries out several stages of process planning, such as identification of the feature/tool pairs that satisfy the required conditions, generation of possible process plans from the identified tool/machine pairs, and selection of the most interesting process plans according to economic or timing indicators. Suitable process plans are selected according to an acceptable range of quality, time and cost factors. Each process plan is represented in a tree format by the information items corresponding to its CNC machine, required tool characteristics, times (machining, setup, preparatory) and the required machining sequences. A process simulation module is provided to demonstrate the different sequences of machining. After selection of a suitable process plan, the G-code used by CNC machines is generated automatically. The approach is validated through a case study.
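
    A process plan "in tree format" as described above might be represented, in simplified form, by a structure like the following sketch; the field names are hypothetical and omit the STEP feature model and rule base that the actual system relies on.

```python
# Hypothetical, simplified representation of a process plan node tree.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Operation:
    feature_id: str        # machining feature, e.g. a pocket or a hole
    tool: str              # required tool characteristics
    machining_time: float  # minutes
    setup_time: float
    preparatory_time: float

@dataclass
class ProcessPlan:
    cnc_machine: str
    operations: List[Operation] = field(default_factory=list)  # in sequence

    def total_time(self) -> float:
        """Timing indicator used when comparing candidate plans."""
        return sum(o.machining_time + o.setup_time + o.preparatory_time
                   for o in self.operations)
```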

    Oscar: Optimal subset cardinality regression using the L0-pseudonorm with applications to prognostic modelling of prostate cancer

    Author summary: Feature subset selection has become a crucial part of building biomedical models, due to the abundance of available predictors in many applications, yet there remains uncertainty about their importance and generalization ability. Regularized regression methods have become popular approaches to tackle this challenge by balancing the model goodness-of-fit against the increasing complexity of the model in terms of coefficients that deviate from zero. Regularization norms are pivotal in formulating the model complexity, and currently the L1-norm (LASSO), the L2-norm (Ridge Regression) and their hybrid (Elastic Net) dominate the field. In this paper, we present a novel methodology based on the L0-pseudonorm, also known as best subset selection, which has largely been overlooked due to its challenging discrete nature. Our methodology makes use of a continuous transformation of the discrete optimization problem and provides effective solvers implemented in a user-friendly R software package. We exemplify the use of the oscar package in the context of prostate cancer prognostic prediction using both real-world hospital registry and clinical cohort data. By benchmarking the methodology against existing regularization methods, we illustrate the advantages of the L0-pseudonorm for better clinical applicability and selection of grouped features, and demonstrate its applicability to high-dimensional transcriptomics datasets.

    In many real-world applications, such as those based on electronic health records, prognostic prediction of patient survival is based on heterogeneous sets of clinical laboratory measurements. To address the trade-off between the predictive accuracy of a prognostic model and the costs related to its clinical implementation, we propose an optimized L0-pseudonorm approach to learn sparse solutions in multivariable regression. The model sparsity is maintained by restricting the number of nonzero coefficients in the model with a cardinality constraint, which makes the optimization problem NP-hard. In addition, we generalize the cardinality constraint to grouped feature selection, which makes it possible to identify key sets of predictors that may be measured together in a kit in clinical practice. We demonstrate the operation of our cardinality-constraint-based feature subset selection method, named OSCAR, in the context of prognostic prediction for prostate cancer patients, where it enables one to determine the key explanatory predictors at different levels of model sparsity. We further explore how the model sparsity affects the model accuracy and implementation cost. Lastly, we demonstrate the generalization of the presented methodology to high-dimensional transcriptomics data.

    Peer reviewed
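
    The OSCAR solver itself is an R package built on a continuous transformation of the discrete problem; purely to make the cardinality constraint concrete, the sketch below solves the same L0-constrained least-squares objective by brute-force enumeration, which is feasible only for a small number of candidate features.

```python
# Hedged sketch: exact best-subset (L0-constrained) regression by brute
# force. Exponential in the feature count; illustrative only.
from itertools import combinations
import numpy as np
from sklearn.linear_model import LinearRegression

def best_subset(X, y, k):
    """Minimise the residual sum of squares subject to at most k nonzero
    coefficients, by enumerating all size-k feature subsets."""
    best_rss, best_cols = np.inf, None
    for cols in combinations(range(X.shape[1]), k):
        cols = list(cols)
        model = LinearRegression().fit(X[:, cols], y)
        rss = ((y - model.predict(X[:, cols])) ** 2).sum()
        if rss < best_rss:
            best_rss, best_cols = rss, cols
    return best_cols, best_rss
```

    Enumerating subsets makes the NP-hardness mentioned above tangible: with p features there are C(p, k) candidate models per sparsity level, which is exactly why OSCAR's continuous relaxation is needed at realistic scales.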

    Feature selection for chemical sensor arrays using mutual information

    We address the problem of feature selection for classifying a diverse set of chemicals using an array of metal oxide sensors. Our aim is to evaluate a filter approach to feature selection with reference to previous work, which used a wrapper approach on the same data set and established the best features and upper bounds on classification performance. We selected feature sets that exhibit maximal mutual information with the identity of the chemicals. The selected features closely match those found to perform well in the previous study, which used a wrapper approach to conduct an exhaustive search of all permitted feature combinations. By comparing the classification performance of support vector machines (using features selected by mutual information) with the performance observed in the previous study, we found that while our approach does not always give the maximum possible classification performance, it always selects features whose classification performance approaches the optimum obtained by exhaustive search. We performed further classification using the selected feature set with some common classifiers and found that, for the selected features, Bayesian networks gave the best performance. Finally, we compared the observed classification performances with the performance of classifiers using randomly selected features, and found that the selected features consistently outperformed randomly selected ones for all tested classifiers. The mutual information filter approach is therefore a computationally efficient method for selecting near-optimal features for chemical sensor arrays.
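
    A minimal version of such a mutual information filter can be written with scikit-learn's per-feature MI estimator, as sketched below; note that this ranks features individually, whereas the study evaluates the joint MI of feature sets, so the sketch is only an approximation of the approach.

```python
# Hedged sketch: filter-style feature selection by mutual information.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def top_k_by_mutual_information(X, y, k):
    """Estimate MI between each feature and the class label, then keep
    the indices of the k highest-scoring features."""
    mi = mutual_info_classif(X, y)
    return np.argsort(mi)[::-1][:k]
```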

    ASlib: A Benchmark Library for Algorithm Selection

    The task of algorithm selection involves choosing an algorithm from a set of algorithms on a per-instance basis in order to exploit the varying performance of algorithms over a set of instances. The algorithm selection problem is attracting increasing attention from researchers and practitioners in AI. Years of fruitful applications in a number of domains have resulted in a large amount of data, but the community lacks a standard format or repository for these data. This situation makes it difficult to share and compare different approaches effectively, as is done in other, more established fields, and it unnecessarily hinders new researchers who want to work in this area. To address this problem, we introduce a standardized format for representing algorithm selection scenarios and a repository that contains a growing number of data sets from the literature. Our format has been designed to be able to express a wide variety of scenarios. Demonstrating the breadth and power of our platform, we describe a set of example experiments that build and evaluate algorithm selection models through a common interface. The results display the potential of algorithm selection to achieve significant performance improvements across a broad range of problems and algorithms.

    Comment: Accepted to be published in the Artificial Intelligence Journal.
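
    Setting the scenario-file parsing aside, a simple per-instance selector in the spirit of the experiments described above can be sketched as one runtime model per algorithm, with the algorithm of lowest predicted runtime chosen for each instance. The class and model choice below are illustrative assumptions, not part of ASlib.

```python
# Hedged sketch: per-instance algorithm selection via runtime regression.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

class RuntimeSelector:
    def fit(self, X, runtimes):
        """X: (n_instances, n_features) instance features;
        runtimes: (n_instances, n_algorithms) observed runtimes."""
        self.models = [RandomForestRegressor().fit(X, runtimes[:, a])
                       for a in range(runtimes.shape[1])]
        return self

    def select(self, X):
        """Return the index of the predicted-fastest algorithm per instance."""
        preds = np.column_stack([m.predict(X) for m in self.models])
        return preds.argmin(axis=1)
```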

    Resource Constrained Structured Prediction

    We study the problem of structured prediction under test-time budget constraints. We propose a novel approach applicable to a wide range of structured prediction problems in computer vision and natural language processing. Our approach adaptively generates computationally costly features during test time in order to reduce the computational cost of prediction while maintaining prediction performance. We show that training the adaptive feature generation system can be reduced to a series of structured learning problems, resulting in efficient training using existing structured learning algorithms. This framework provides theoretical justification for several existing heuristic approaches found in the literature. We evaluate our proposed adaptive system on two structured prediction tasks, optical character recognition (OCR) and dependency parsing, and show strong performance in reducing feature costs without degrading accuracy.
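
    The paper's policy is learned for structured outputs; as a much-simplified illustration of adaptive test-time feature acquisition, the sketch below runs a cheap model first and computes the costly features only for inputs where it is uncertain. The two-stage design, the confidence threshold, and the feature-generation callback are all assumptions made for this example.

```python
# Hedged sketch: two-stage prediction that acquires costly features on demand.
import numpy as np
from sklearn.linear_model import LogisticRegression

class TwoStagePredictor:
    """Train a cheap-feature model and a full-feature model; at test time,
    compute the costly features only where the cheap model is uncertain."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold

    def fit(self, X_cheap, X_full, y):
        self.cheap = LogisticRegression(max_iter=1000).fit(X_cheap, y)
        self.full = LogisticRegression(max_iter=1000).fit(X_full, y)
        return self

    def predict(self, X_cheap, make_costly_features):
        proba = self.cheap.predict_proba(X_cheap)
        pred = self.cheap.classes_[proba.argmax(axis=1)]
        unsure = proba.max(axis=1) < self.threshold  # low-confidence inputs
        if unsure.any():
            # make_costly_features(mask) is an illustrative callback assumed
            # to return the full feature matrix for exactly the flagged rows.
            pred[unsure] = self.full.predict(make_costly_features(unsure))
        return pred
```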