14,973 research outputs found

    Risk Bounds for Embedded Variable Selection in Classification Trees

    Get PDF
    The problems of model and variable selection for classification trees are jointly considered. A penalized criterion is proposed which explicitly takes into account the number of variables, and a risk bound inequality is provided for the tree classifier minimizing this criterion. This penalized criterion is compared to the one used during the pruning step of the CART algorithm. It is shown that the two criteria are similar under some specific margin assumptions. In practice, the tuning parameter of the CART penalty has to be calibrated by hold-out. Simulation studies are performed which confirm that the hold-out procedure mimics the form of the proposed penalized criterion.
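
    Since the abstract turns on calibrating the CART penalty weight by hold-out, here is a minimal sketch of that step, using scikit-learn's cost-complexity pruning as a stand-in for the paper's criterion (the paper's penalty also accounts for the number of variables, which this sketch does not; the data are synthetic):

```python
# Hold-out calibration of the CART cost-complexity penalty weight.
# A sketch only: sklearn's ccp_alpha plays the role of the tuning
# parameter that the abstract says must be calibrated by hold-out.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, random_state=0)

# Candidate penalty weights along the CART pruning path.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(
    X_train, y_train)

# Refit at each candidate alpha and keep the best hold-out score.
best_alpha, best_score = 0.0, -1.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=0)
    score = tree.fit(X_train, y_train).score(X_hold, y_hold)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"selected alpha={best_alpha:.4f}, hold-out accuracy={best_score:.3f}")
```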

    Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded

    Get PDF
    Decision trees usefully represent sparse, high-dimensional and noisy data. Having learned a function from this data, we may want to thereafter integrate the function into a larger decision-making problem, e.g., for picking the best chemical process catalyst. We study a large-scale, industrially relevant mixed-integer nonlinear nonconvex optimization problem involving both gradient-boosted trees and penalty functions mitigating risk. This mixed-integer optimization problem with convex penalty terms broadly applies to optimizing pre-trained regression tree models. Decision makers may wish to optimize discrete models to repurpose legacy predictive models, or they may wish to optimize a discrete model that particularly well represents a data set. We develop several heuristic methods to find feasible solutions, and an exact branch-and-bound algorithm leveraging structural properties of the gradient-boosted trees and penalty functions. We computationally test our methods on a concrete mixture design instance and an industrial chemical catalysis instance.
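
    To make the optimization setting concrete, the sketch below minimizes a trained gradient-boosted-tree prediction plus a convex risk penalty by random multistart, one of the simplest feasible-solution heuristics. It is not the paper's branch-and-bound algorithm; the data, penalty weight lam, and model are all illustrative assumptions:

```python
# Heuristic optimization over a pre-trained gradient-boosted tree model:
# sample many candidate inputs, score each by prediction + convex penalty,
# and keep the best. A sketch of the feasible-solution idea only.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 3))
y = (X ** 2).sum(axis=1) + 0.05 * rng.standard_normal(400)

gbt = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)

def objective(x, lam=0.1):
    # GBT prediction plus a convex (quadratic) penalty mitigating risk.
    return gbt.predict(x.reshape(1, -1))[0] + lam * float(x @ x)

candidates = rng.uniform(-1, 1, size=(2000, 3))
values = np.array([objective(c) for c in candidates])
best = candidates[values.argmin()]
print("heuristic minimizer:", best, "objective:", values.min())
```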

    PhysicsGP: A Genetic Programming Approach to Event Selection

    Full text link
    We present a novel multivariate classification technique based on Genetic Programming. The technique is distinct from Genetic Algorithms and offers several advantages compared to Neural Networks and Support Vector Machines. The technique optimizes a set of human-readable classifiers with respect to some user-defined performance measure. We calculate the Vapnik-Chervonenkis dimension of this class of learning machines and consider a practical example: the search for the Standard Model Higgs Boson at the LHC. The resulting classifier is very fast to evaluate, human-readable, and easily portable. The software may be downloaded at http://cern.ch/~cranmer/PhysicsGP.html. Comment: 16 pages, 9 figures, 1 table. Submitted to Comput. Phys. Commun.
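
    As a toy illustration of the evolutionary idea (an illustrative re-implementation, not the PhysicsGP software itself), the sketch below evolves human-readable arithmetic expressions whose sign classifies events. It uses mutation-only evolution; full Genetic Programming would also use subtree crossover:

```python
# Mutation-only genetic programming over arithmetic expression trees.
# An expression classifies an event as signal when it evaluates > 0.
import operator
import random

random.seed(0)
OPS = [(operator.add, "+"), (operator.sub, "-"), (operator.mul, "*")]

def random_tree(depth, nf):
    if depth == 0 or random.random() < 0.3:
        return ("x", random.randrange(nf))      # leaf: one input feature
    return (random.choice(OPS),
            random_tree(depth - 1, nf), random_tree(depth - 1, nf))

def evaluate(t, x):
    if t[0] == "x":
        return x[t[1]]
    (fn, _), left, right = t
    return fn(evaluate(left, x), evaluate(right, x))

def show(t):                                    # human-readable form
    if t[0] == "x":
        return f"x{t[1]}"
    (_, sym), left, right = t
    return f"({show(left)} {sym} {show(right)})"

def mutate(t, nf):
    # Replace a randomly chosen subtree with a fresh random one.
    if t[0] == "x" or random.random() < 0.3:
        return random_tree(2, nf)
    op, left, right = t
    if random.random() < 0.5:
        return (op, mutate(left, nf), right)
    return (op, left, mutate(right, nf))

def accuracy(t, X, y):
    return sum((evaluate(t, x) > 0) == lab for x, lab in zip(X, y)) / len(y)

# Toy "event selection" data: signal iff x0 * x1 > 0.
X = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
y = [x[0] * x[1] > 0 for x in X]

pop = [random_tree(3, 2) for _ in range(100)]
for _ in range(30):
    pop.sort(key=lambda t: -accuracy(t, X, y))  # keep the fittest
    pop = pop[:20] + [mutate(random.choice(pop[:20]), 2) for _ in range(80)]

best = max(pop, key=lambda t: accuracy(t, X, y))
print("best classifier:", show(best), "accuracy:", accuracy(best, X, y))
```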

    A Regularized Method for Selecting Nested Groups of Relevant Genes from Microarray Data

    Full text link
    Gene expression analysis aims at identifying the genes able to accurately predict biological parameters such as disease subtyping or progression. While accurate prediction can be achieved by means of many different techniques, gene identification, due to gene correlation and the limited number of available samples, is a much more elusive problem. Small changes in the expression values often produce different gene lists, and solutions which are both sparse and stable are difficult to obtain. We propose a two-stage regularization method able to learn linear models characterized by a high prediction performance. By varying a suitable parameter, these linear models allow one to trade sparsity for the inclusion of correlated genes and to produce gene lists which are almost perfectly nested. Experimental results on synthetic and microarray data confirm the interesting properties of the proposed method and its potential as a starting point for further biological investigations. Comment: 17 pages, 8 PostScript figures.
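
    Here is a minimal sketch of the sparsity-versus-correlation trade-off the abstract describes, using scikit-learn's ElasticNet as a stand-in for the paper's two-stage l1/l2 method (the data and parameter values are invented for illustration):

```python
# Lowering l1_ratio strengthens the l2 term, which pulls groups of
# correlated "genes" into the model while the l1 term keeps it sparse.
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(0)
n, p = 60, 200                               # few samples, many genes
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.01 * rng.standard_normal(n)   # genes 0, 1 correlate
y = X[:, 0] + 0.1 * rng.standard_normal(n)

for l1_ratio in (1.0, 0.5, 0.1):             # 1.0 = pure lasso
    model = ElasticNet(alpha=0.1, l1_ratio=l1_ratio, max_iter=10000)
    selected = np.flatnonzero(model.fit(X, y).coef_)
    keeps_pair = {0, 1} <= set(selected.tolist())
    print(f"l1_ratio={l1_ratio}: {selected.size} genes selected, "
          f"correlated pair kept together: {keeps_pair}")
```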

    The Voice of Optimization

    Full text link
    We introduce the idea that, using optimal classification trees (OCTs) and optimal classification trees with hyperplanes (OCT-Hs), interpretable machine learning algorithms developed by Bertsimas and Dunn [2017, 2018], we are able to obtain insight into the strategy behind the optimal solution of continuous and mixed-integer convex optimization problems as a function of key parameters that affect the problem. In this way, optimization is no longer a black box. Instead, we redefine optimization as a multiclass classification problem where the predictor gives insight into the logic behind the optimal solution. In other words, OCTs and OCT-Hs give optimization a voice. We show on several realistic examples that the accuracy of our method is in the 90%-100% range, and even when the predictions are not correct, the degree of suboptimality or infeasibility is very low. We compare optimal strategy predictions of OCTs, OCT-Hs, and feedforward neural networks (NNs) and conclude that the performance of OCT-Hs and NNs is comparable, while OCTs are somewhat weaker but often competitive. Therefore, our approach provides a novel, insightful understanding of optimal strategies for solving a broad class of continuous and mixed-integer optimization problems.
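
    The following sketch illustrates the strategy-prediction idea on a toy parametric problem, using an ordinary CART tree in place of the OCTs of Bertsimas and Dunn (the problem, strategy labels, and recovery map are all invented for illustration):

```python
# Learn a map from problem parameters to optimal "strategies" for
#   minimize (x - theta)^2  subject to  0 <= x <= 1,
# where the strategy is which bound (if any) is active at the optimum.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def strategy(theta):
    if theta < 0:
        return 0                  # lower bound active: x* = 0
    if theta > 1:
        return 2                  # upper bound active: x* = 1
    return 1                      # interior optimum:   x* = theta

thetas = np.random.default_rng(0).uniform(-2, 3, size=(1000, 1))
labels = np.array([strategy(t) for t in thetas[:, 0]])

# Optimization recast as multiclass classification over strategies.
tree = DecisionTreeClassifier(max_depth=3).fit(thetas, labels)

# Recover the solution from the predicted strategy, with no solver call.
recover = {0: lambda t: 0.0, 1: lambda t: t, 2: lambda t: 1.0}
for t in (-0.5, 0.3, 1.7):
    s = tree.predict([[t]])[0]
    print(f"theta={t}: strategy {s}, x* = {recover[s](t)}")
```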