
    Simple stopping criteria for information theoretic feature selection

    Feature selection aims to select the smallest feature subset that yields the minimum generalization error. Within the rich feature selection literature, information-theoretic approaches seek a subset of features such that the mutual information between the selected features and the class labels is maximized. Despite the simplicity of this objective, several optimization problems remain open, for example the automatic determination of the optimal subset size (i.e., the number of features), or a stopping criterion when a greedy search strategy is adopted. In this paper, we suggest two stopping criteria that only require monitoring the conditional mutual information (CMI) among groups of variables. Using the recently developed multivariate matrix-based Rényi's α-entropy functional, which can be estimated directly from data samples, we show that the CMI among groups of variables can be computed easily, without any decomposition or approximation, which makes our criteria easy to implement and to integrate seamlessly into any existing information-theoretic feature selection method with a greedy search strategy.
    Comment: Paper published in the journal Entropy
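
    For intuition, the following is a minimal sketch of the general idea of greedy forward selection with a CMI-based stopping rule. The CMI is estimated here with a naive plug-in estimator on discretized data, and the threshold value is only a placeholder; the paper itself uses the matrix-based Rényi α-entropy functional, which avoids this kind of binning.

```python
# Minimal sketch, not the paper's method: greedy forward selection that stops
# when the best conditional mutual information (CMI) gain falls below a
# threshold.  CMI is estimated with a naive plug-in estimator on discrete data.
import numpy as np

def entropy(labels):
    """Plug-in Shannon entropy (nats) of a 1-D array of discrete symbols."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def joint(*cols):
    """Encode several discrete columns as one joint symbol per sample."""
    stacked = np.stack(cols, axis=1)
    _, codes = np.unique(stacked, axis=0, return_inverse=True)
    return codes

def cmi(x, y, z=None):
    """I(X; Y | Z) from empirical counts; z=None gives plain I(X; Y)."""
    if z is None:
        return entropy(x) + entropy(y) - entropy(joint(x, y))
    return (entropy(joint(x, z)) + entropy(joint(y, z))
            - entropy(joint(x, y, z)) - entropy(z))

def greedy_select(X, y, threshold):
    """Add the feature with the largest CMI gain; stop when the gain is small."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        z = joint(*(X[:, j] for j in selected)) if selected else None
        gains = {j: cmi(X[:, j], y, z) for j in remaining}
        best = max(gains, key=gains.get)
        if gains[best] < threshold:          # stopping criterion
            break
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.integers(0, 3, size=(500, 10))
    y = ((X[:, 0] + X[:, 3]) > 2).astype(int)   # only features 0 and 3 matter
    # the threshold is a placeholder chosen to absorb plug-in estimation bias
    print(greedy_select(X, y, threshold=0.05))  # typically [0, 3] or [3, 0]
```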

    Resampling methods for parameter-free and robust feature selection with mutual information

    Combining the mutual information criterion with a forward feature selection strategy offers a good trade-off between optimality of the selected feature subset and computation time. However, it requires setting the parameter(s) of the mutual information estimator and deciding when to halt the forward procedure. These two choices are difficult to make because, as the dimensionality of the subset increases, the estimation of the mutual information becomes less and less reliable. This paper proposes to use resampling methods, namely K-fold cross-validation and the permutation test, to address both issues. The resampling methods bring information about the variance of the estimator, information that can then be used to automatically set the parameter and to calculate a threshold at which to stop the forward procedure. The procedure is illustrated on a synthetic dataset as well as on real-world examples.
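
    A hedged sketch of the permutation-test part of this idea is shown below: the observed mutual information of a candidate feature is kept only if it exceeds a null threshold obtained by shuffling the labels. scikit-learn's mutual_info_classif (with its n_neighbors parameter) is used here merely as a stand-in for the estimator whose parameter the paper tunes by cross-validation, and features are scored marginally for brevity rather than by the joint criterion of the forward procedure.

```python
# Sketch of a permutation-test stopping threshold for forward selection.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def permutation_threshold(x, y, n_neighbors=3, n_perm=200, alpha=0.05, seed=0):
    """(1 - alpha)-quantile of the MI estimate under shuffled labels."""
    rng = np.random.default_rng(seed)
    null = np.empty(n_perm)
    for b in range(n_perm):
        y_perm = rng.permutation(y)
        null[b] = mutual_info_classif(x.reshape(-1, 1), y_perm,
                                      n_neighbors=n_neighbors, random_state=0)[0]
    return np.quantile(null, 1.0 - alpha)

def forward_select(X, y, n_neighbors=3):
    """Keep adding the best feature while it beats its permutation threshold."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining:
        scores = {j: mutual_info_classif(X[:, j].reshape(-1, 1), y,
                                         n_neighbors=n_neighbors,
                                         random_state=0)[0]
                  for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] <= permutation_threshold(X[:, best], y, n_neighbors):
            break                 # gain indistinguishable from noise: stop
        selected.append(best)
        remaining.remove(best)
    return selected

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(400, 8))
    y = (X[:, 1] + X[:, 4] > 0).astype(int)   # only features 1 and 4 matter
    print(forward_select(X, y))               # features 1 and 4 rank first
```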

    Perfect Information vs Random Investigation: Safety Guidelines for a Consumer in the Jungle of Product Differentiation

    We present a graph-theoretic model of consumer choice, where final decisions are shown to be influenced by information and knowledge, in the form of individual awareness, discriminating ability, and perception of market structure. Building upon Hotelling's distance-based differentiation idea, we describe the behavioral experience of several prototypes of consumers, who walk a hypothetical cognitive path in an attempt to maximize their satisfaction. Our simulations show that even consumers endowed with a small amount of information and knowledge may reach a very high level of utility. On the other hand, complete ignorance negatively affects the whole consumption process. In addition, rather unexpectedly, a random walk on the graph turns out to be a winning strategy below a minimal threshold of information and knowledge.
    Comment: 27 pages, 12 figures
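
    As a purely illustrative toy (the graph, utilities, and function names below are hypothetical, not the paper's model), the "random investigation" strategy can be sketched as a random walk over a product graph that remembers the best product encountered:

```python
# Toy sketch of random investigation: walk randomly over a product graph and
# keep the highest utility seen so far.
import random

def random_investigation(graph, utility, start, steps=20, seed=0):
    """graph: dict node -> list of neighbours; utility: dict node -> float."""
    rng = random.Random(seed)
    node, best = start, utility[start]
    for _ in range(steps):
        node = rng.choice(graph[node])     # move to a random neighbouring product
        best = max(best, utility[node])    # remember the best product found
    return best

if __name__ == "__main__":
    graph = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
    utility = {0: 0.2, 1: 0.5, 2: 0.9, 3: 0.4}
    print(random_investigation(graph, utility, start=0))   # likely finds 0.9
```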

    Massively-Parallel Feature Selection for Big Data

    We present the Parallel, Forward-Backward with Pruning (PFBP) algorithm for feature selection (FS) in Big Data settings (high dimensionality and/or sample size). To tackle the challenges of Big Data FS, PFBP partitions the data matrix both in terms of rows (samples, training examples) and columns (features). By employing the concepts of p-values of conditional independence tests and meta-analysis techniques, PFBP manages to rely only on computations local to a partition while minimizing communication costs. It then employs powerful and safe (asymptotically sound) heuristics to make early, approximate decisions, such as Early Dropping of features from consideration in subsequent iterations, Early Stopping of consideration of features within the same iteration, and Early Return of the winner in each iteration. PFBP provides asymptotic guarantees of optimality for data distributions faithfully representable by a causal network (Bayesian network or maximal ancestral graph). Our empirical analysis confirms a super-linear speedup of the algorithm with increasing sample size and linear scalability with respect to the number of features and processing cores, while PFBP dominates other competitive algorithms in its class.
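
    The meta-analysis ingredient can be illustrated with a short sketch (not the PFBP algorithm itself): per-partition p-values of a simple, unconditional independence test are combined with Fisher's method, and features whose combined p-value stays large are dropped early. The test and cut-off used here are stand-ins for the conditional independence tests PFBP relies on.

```python
# Illustrative sketch only: combine per-partition p-values with Fisher's
# method, then drop features whose combined p-value exceeds a cut-off.
import numpy as np
from scipy import stats

def combined_pvalue(x, y, n_parts=4):
    """Split samples into row partitions, test in each, meta-combine p-values."""
    pvals = []
    for xs, ys in zip(np.array_split(x, n_parts), np.array_split(y, n_parts)):
        _, p = stats.pearsonr(xs, ys)      # stand-in for a conditional test
        pvals.append(p)
    return stats.combine_pvalues(pvals, method="fisher")[1]

def early_drop(X, y, alpha=0.05, n_parts=4):
    """Keep only features whose combined p-value is below alpha."""
    return [j for j in range(X.shape[1])
            if combined_pvalue(X[:, j], y, n_parts) < alpha]

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(2000, 6))
    y = 2.0 * X[:, 2] + rng.normal(size=2000)   # only feature 2 is relevant
    print(early_drop(X, y))                     # feature 2 should be kept
```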

    Optimal Structuring of Assessment Processes in Competition Law: A Survey of Theoretical Approaches

    In competition law, the problem of optimally designing institutional and procedural rules concerns the processes for assessing the pro- and anticompetitive effects of business behaviors. This is well recognized in the discussion of the relative merits of different assessment principles, such as the rule of reason and per se rules. Supported by modern industrial organization research, which applies a more differentiated analysis to the welfare effects of different business behaviors, a full-scale case-by-case assessment seems to be the prevailing idea. Even though the discussion mainly focuses on extreme solutions, different theoretical approaches do exist; they identify important determinants and allow for a sound analysis of appropriate legal directives and investigation procedures from a ‘Law and Economics’ perspective. Integrating these approaches and examining them in light of various constellations yields differentiated solutions for optimally structured assessment processes.
    Keywords: Law Enforcement, Competition Law, Competition Policy, Antitrust Law, Antitrust Policy, Decision-Making

    Basics of Feature Selection and Statistical Learning for High Energy Physics

    This document introduces the basics of data preparation, feature selection, and statistical learning for high energy physics tasks. The emphasis is on feature selection by principal component analysis, information gain, and significance measures for features. As examples of basic statistical learning algorithms, the maximum a posteriori and maximum likelihood classifiers are shown. Furthermore, simple rule-based classification is introduced as a means of automated cut finding. Finally, two toolboxes for the application of statistical learning techniques are introduced.
    Comment: 12 pages, 8 figures. Part of the proceedings of the Track 'Computational Intelligence for HEP Data Analysis' at iCSC 200
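
    Two of the feature-ranking ideas mentioned above can be sketched in a few lines, with scikit-learn used as a stand-in toolbox; the toy data and labels are illustrative only.

```python
# Sketch: variance explained by principal components, and a per-feature
# information-gain proxy (mutual information between feature and class label).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy "signal vs background" label

# Principal component analysis: how much variance each component captures.
pca = PCA(n_components=5).fit(X)
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 3))

# Information gain proxy: mutual information between each feature and the label.
ig = mutual_info_classif(X, y, random_state=0)
print("features ranked by information gain:", np.argsort(ig)[::-1])
```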
