
    Simple stopping criteria for information theoretic feature selection

    Feature selection aims to select the smallest feature subset that yields the minimum generalization error. Within the rich feature selection literature, information-theoretic approaches seek a subset of features such that the mutual information between the selected features and the class labels is maximized. Despite the simplicity of this objective, several optimization problems remain open, for example the automatic determination of the optimal subset size (i.e., the number of features), or a stopping criterion when a greedy search strategy is adopted. In this paper, we suggest two stopping criteria that only require monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi's α-entropy functional, which can be estimated directly from data samples, we show that the CMI among groups of variables can be computed easily, without any decomposition or approximation, which makes our criteria easy to implement and to integrate seamlessly into any existing information-theoretic feature selection method with a greedy search strategy.
    Comment: Paper published in the journal Entropy
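
    For intuition, the following is a minimal sketch of the general idea of greedy forward selection with a CMI-based stopping rule. The CMI is estimated here with a naive plug-in estimator on discretized data, and the threshold value is only a placeholder; the paper itself uses the matrix-based Rényi α-entropy functional, which avoids this kind of binning.

```python
# Minimal sketch, not the paper's method: greedy forward selection that stops
# when the best conditional mutual information (CMI) gain falls below a
# threshold.  CMI is estimated with a naive plug-in estimator on discrete data.
import numpy as np

def entropy(labels):
    """Plug-in Shannon entropy (nats) of a 1-D array of discrete symbols."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def joint(*cols):
    """Encode several discrete columns as one joint symbol per sample."""
    stacked = np.stack(cols, axis=1)
    _, codes = np.unique(stacked, axis=0, return_inverse=True)
    return codes

def cmi(x, y, z=None):
    """I(X; Y | Z) from empirical counts; z=None gives plain I(X; Y)."""
    if z is None:
        return entropy(x) + entropy(y) - entropy(joint(x, y))
    return (entropy(joint(x, z)) + entropy(joint(y, z))
            - entropy(joint(x, y, z)) - entropy(z))

def greedy_select(X, y, threshold):
    """Add the feature with the largest CMI gain; stop when the gain is small."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        z = joint(*(X[:, j] for j in selected)) if selected else None
        gains = {j: cmi(X[:, j], y, z) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < threshold:          # stopping criterion
            break
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(500, 10))
    y = ((X[:, 0] + X[:, 3]) > 2).astype(int)   # only features 0 and 3 matter
    # the threshold is a placeholder chosen to absorb plug-in estimation bias
    print(greedy_select(X, y, threshold=0.05))  # typically [0, 3] or [3, 0]
```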

    Resampling methods for parameter-free and robust feature selection with mutual information

    Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and deciding when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes to use resampling methods, namely K-fold cross-validation and the permutation test, to address both issues. The resampling methods bring information about the variance of the estimator, information that can then be used to automatically set the parameter and to calculate a threshold at which to stop the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples.
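
    A hedged sketch of the permutation-test part of this idea is shown below: the observed mutual information of a candidate feature is kept only if it exceeds a null threshold obtained by shuffling the labels. scikit-learn's mutual_info_classif (with its n_neighbors parameter) is used here merely as a stand-in for the estimator whose parameter the paper tunes by cross-validation, and features are scored marginally for brevity rather than by the joint criterion of the forward procedure.

```python
# Sketch of a permutation-test stopping threshold for forward selection.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def permutation_threshold(x, y, n_neighbors=3, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha)-quantile of the MI estimate under shuffled labels."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for b in range(n_perm):
        y_perm = rng.permutation(y)
        null[b] = mutual_info_classif(x.reshape(-1, 1), y_perm,
                                      n_neighbors=n_neighbors, random_state=0)[0]
    return np.quantile(null, 1.0 - alpha)

def forward_select(X, y, n_neighbors=3):
    """Keep adding the best feature while it beats its permutation threshold."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = {j: mutual_info_classif(X[:, j].reshape(-1, 1), y,
                                         n_neighbors=n_neighbors,
                                         random_state=0)[0]
                  for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= permutation_threshold(X[:, best], y, n_neighbors):
            break                 # gain indistinguishable from noise: stop
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 8))
    y = (X[:, 1] + X[:, 4] > 0).astype(int)   # only features 1 and 4 matter
    print(forward_select(X, y))               # features 1 and 4 rank first
```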

    Perfect Information vs Random Investigation: Safety Guidelines for a Consumer in the Jungle of Product Differentiation

    We present a graph-theoretic model of consumer choice, where final decisions are shown to be influenced by information and knowledge, in the form of individual awareness, discriminating ability, and perception of market structure. Building upon Hotelling's distance-based differentiation idea, we describe the behavioral experience of several prototypes of consumers, who walk a hypothetical cognitive path in an attempt to maximize their satisfaction. Our simulations show that even consumers endowed with a small amount of information and knowledge may reach a very high level of utility. On the other hand, complete ignorance negatively affects the whole consumption process. In addition, rather unexpectedly, a random walk on the graph turns out to be a winning strategy below a minimal threshold of information and knowledge.
    Comment: 27 pages, 12 figures
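
    As a purely illustrative toy (the graph, utilities, and function names below are hypothetical, not the paper's model), the "random investigation" strategy can be sketched as a random walk over a product graph that remembers the best product encountered:

```python
# Toy sketch of random investigation: walk randomly over a product graph and
# keep the highest utility seen so far.
import random

def random_investigation(graph, utility, start, steps=20, seed=0):
    """graph: dict node -> list of neighbours; utility: dict node -> float."""
    rng = random.Random(seed)
    node, best = start, utility[start]
    for _ in range(steps):
        node = rng.choice(graph[node])     # move to a random neighbouring product
        best = max(best, utility[node])    # remember the best product found
    return best

if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    utility = {0: 0.2, 1: 0.5, 2: 0.9, 3: 0.4}
    print(random_investigation(graph, utility, start=0))   # likely finds 0.9
```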

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while PFBP dominates other competitive algorithms in its class.
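
    The meta-analysis ingredient can be illustrated with a short sketch (not the PFBP algorithm itself): per-partition p-values of a simple, unconditional independence test are combined with Fisher's method, and features whose combined p-value stays large are dropped early. The test and cut-off used here are stand-ins for the conditional independence tests PFBP relies on.

```python
# Illustrative sketch only: combine per-partition p-values with Fisher's
# method, then drop features whose combined p-value exceeds a cut-off.
import numpy as np
from scipy import stats

def combined_pvalue(x, y, n_parts=4):
    """Split samples into row partitions, test in each, meta-combine p-values."""
    pvals = []
    for xs, ys in zip(np.array_split(x, n_parts), np.array_split(y, n_parts)):
        _, p = stats.pearsonr(xs, ys)      # stand-in for a conditional test
        pvals.append(p)
    return stats.combine_pvalues(pvals, method="fisher")[1]

def early_drop(X, y, alpha=0.05, n_parts=4):
    """Keep only features whose combined p-value is below alpha."""
    return [j for j in range(X.shape[1])
            if combined_pvalue(X[:, j], y, n_parts) < alpha]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 6))
    y = 2.0 * X[:, 2] + rng.normal(size=2000)   # only feature 2 is relevant
    print(early_drop(X, y))                     # feature 2 should be kept
```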

    Optimal Structuring of Assessment Processes in Competition Law: A Survey of Theoretical Approaches

    In competition law, the problem of optimally designing institutional and procedural rules concerns the processes for assessing the pro- and anticompetitive effects of business behaviors. This is well recognized in the discussion of the relative merits of different assessment principles, such as the rule of reason and per se rules. Supported by modern industrial organization research, which applies a more differentiated analysis to the welfare effects of different business behaviors, a full-scale case-by-case assessment seems to be the prevailing idea. Even though the discussion mainly focuses on extreme solutions, different theoretical approaches do exist; they identify important determinants and allow for a sound analysis of appropriate legal directives and investigation procedures from a ‘Law and Economics’ perspective. Integrating these approaches and examining them in light of various constellations yields differentiated solutions for optimally structured assessment processes.
    Keywords: Law Enforcement, Competition Law, Competition Policy, Antitrust Law, Antitrust Policy, Decision-Making

    Basics of Feature Selection and Statistical Learning for High Energy Physics

    This document introduces the basics of data preparation, feature selection, and statistical learning for high energy physics tasks. The emphasis is on feature selection by principal component analysis, information gain, and significance measures for features. As examples of basic statistical learning algorithms, the maximum a posteriori and maximum likelihood classifiers are shown. Furthermore, simple rule-based classification is introduced as a means of automated cut finding. Finally, two toolboxes for the application of statistical learning techniques are introduced.
    Comment: 12 pages, 8 figures. Part of the proceedings of the Track 'Computational Intelligence for HEP Data Analysis' at iCSC 200
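
    Two of the feature-ranking ideas mentioned above can be sketched in a few lines, with scikit-learn used as a stand-in toolbox; the toy data and labels are illustrative only.

```python
# Sketch: variance explained by principal components, and a per-feature
# information-gain proxy (mutual information between feature and class label).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy "signal vs background" label

# Principal component analysis: how much variance each component captures.
pca = PCA(n_components=5).fit(X)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))

# Information gain proxy: mutual information between each feature and the label.
ig = mutual_info_classif(X, y, random_state=0)
print("features ranked by information gain:", np.argsort(ig)[::-1])
```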
