Consistent Feature Construction with Constrained Genetic Programming for Experimental Physics
A good feature representation is a determining factor in achieving high
classification performance for many machine learning algorithms. This is
especially true for techniques that do not build complex internal
representations of data (e.g. decision trees, in contrast to deep neural
networks). To transform the feature space, feature construction techniques
build new high-level features from the original ones. Among these techniques,
Genetic Programming is a good candidate for providing the interpretable
features required for data analysis in high-energy physics. Classically, the
original features, or higher-level features derived from physics first
principles, are used as inputs for training. However, physicists would benefit
from automatic and interpretable feature construction for the classification
of particle-collision events.
Our main contribution is to combine different aspects of Genetic Programming
and apply them to feature construction for experimental physics. In
particular, dimensional consistency is enforced using grammars, making the
approach applicable to physics.
Results of experiments on three physics datasets show that the constructed
features can significantly improve classification accuracy. To the best of
our knowledge, this is the first method proposed for interpretable feature
construction with units of measurement, and the first whose overall approach,
as well as the interpretability of the built features, has been validated by
experts in high-energy physics.
Comment: Accepted in this version to CEC 201
Avoiding the bloat with stochastic grammar-based genetic programming
The application of Genetic Programming to the discovery of empirical laws is often impaired by the huge size of the search space and, consequently, by the computer resources needed. In many cases, the extreme demand for memory and CPU is due to the massive growth of non-coding segments, the introns. This paper presents a new program-evolution framework that combines distribution-based evolution, in the spirit of PBIL, with grammar-based genetic programming: the information is stored as a probability distribution over the grammar rules rather than in a population. Experiments on a real-world-like problem show that this approach gives a practical solution to the problem of intron growth.
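The distribution-based idea can be illustrated with a toy sketch. The grammar, rule set, fitness function, and learning rate below are all assumptions for illustration, not the paper's setup: a probability vector over grammar rules replaces the population, programs are sampled from it, and the vector is nudged toward the rules used by the best sample (PBIL-style).

```python
import random

# Toy grammar: two terminals, two binary rules (illustrative only).
RULES = ["x", "1", "(%s + %s)", "(%s * %s)"]

def sample(probs, depth):
    """Sample a program top-down from the rule distribution.
    Returns (expression string, list of rule indices used)."""
    r = random.choices(range(len(RULES)), weights=probs)[0]
    if depth == 0 or r < 2:
        t = r if r < 2 else random.randint(0, 1)  # force a terminal at max depth
        return RULES[t], [t]
    a, used_a = sample(probs, depth - 1)
    b, used_b = sample(probs, depth - 1)
    return RULES[r] % (a, b), [r] + used_a + used_b

def pbil_step(probs, fitness, n=50, lr=0.1, depth=4):
    """Sample n programs; move the rule distribution toward the
    rule-usage frequencies of the best one (no population is kept)."""
    best = max((sample(probs, depth) for _ in range(n)),
               key=lambda pu: fitness(pu[0]))
    used = best[1]
    freq = [used.count(i) / len(used) for i in range(len(RULES))]
    new_probs = [(1 - lr) * p + lr * f for p, f in zip(probs, freq)]
    return new_probs, best[0]
```

Because the search state is a small probability vector rather than a population of trees, intron-laden giants cannot accumulate; sampling depth bounds program size directly.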