Mixed-Integer Convex Nonlinear Optimization with Gradient-Boosted Trees Embedded
Decision trees usefully represent sparse, high-dimensional, and noisy data.
Having learned a function from this data, we may want to thereafter integrate
the function into a larger decision-making problem, e.g., for picking the best
chemical process catalyst. We study a large-scale, industrially relevant
mixed-integer nonlinear nonconvex optimization problem involving both
gradient-boosted trees and penalty functions mitigating risk. This
mixed-integer optimization problem with convex penalty terms broadly applies to
optimizing pre-trained regression tree models. Decision makers may wish to
optimize discrete models to repurpose legacy predictive models, or they may
wish to optimize a discrete model that represents a data set particularly well.
We develop several heuristic methods to find feasible solutions, and an exact,
branch-and-bound algorithm leveraging structural properties of the
gradient-boosted trees and penalty functions. We computationally test our
methods on a concrete mixture design instance and a chemical catalysis
industrial instance.
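As a rough illustration of the embedding idea (a sketch, not the paper's formulation), the following encodes a single hand-made regression tree into a mixed-integer linear program with PuLP, using one binary indicator per leaf and big-M constraints that tie the active leaf to the continuous input. The tree, bounds, and linear cost term are hypothetical, and the convex penalty terms and specialized branch-and-bound are omitted.

import pulp

# Hypothetical regression tree over one feature x in [0, 10]:
#   x < 2        -> leaf value 1.0
#   2 <= x < 5   -> leaf value 4.0
#   x >= 5       -> leaf value 2.5
leaves = [
    {"value": 1.0, "lo": 0.0, "hi": 2.0},
    {"value": 4.0, "lo": 2.0, "hi": 5.0},
    {"value": 2.5, "lo": 5.0, "hi": 10.0},
]
M, EPS = 10.0, 1e-4  # big-M constant and strict-inequality tolerance

prob = pulp.LpProblem("gbt_embedded", pulp.LpMaximize)
x = pulp.LpVariable("x", lowBound=0.0, upBound=10.0)
y = pulp.LpVariable.dicts("leaf", range(len(leaves)), cat="Binary")

# Objective: tree prediction minus a hypothetical linear cost on x.
prob += pulp.lpSum(leaves[i]["value"] * y[i] for i in range(len(leaves))) - 0.3 * x
# Exactly one leaf of the tree is active.
prob += pulp.lpSum(y[i] for i in range(len(leaves))) == 1
for i, leaf in enumerate(leaves):
    # If leaf i is active, x must lie in its interval; relaxed by M otherwise.
    prob += x >= leaf["lo"] - M * (1 - y[i])
    prob += x <= leaf["hi"] - EPS + M * (1 - y[i])

prob.solve(pulp.PULP_CBC_CMD(msg=0))
print("x* =", pulp.value(x), " objective =", pulp.value(prob.objective))

An ensemble is handled the same way, one set of leaf indicators per tree, with the objective summing the trees' leaf values.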
New multicategory boosting algorithms based on multicategory Fisher-consistent losses
Fisher-consistent loss functions play a fundamental role in the construction
of successful binary margin-based classifiers. In this paper we establish the
Fisher-consistency condition for multicategory classification problems. Our
approach uses the margin vector concept which can be regarded as a
multicategory generalization of the binary margin. We characterize a wide class
of smooth convex loss functions that are Fisher-consistent for multicategory
classification. We then consider using the margin-vector-based loss functions
to derive multicategory boosting algorithms. In particular, we derive two new
multicategory boosting algorithms by using the exponential and logistic
regression losses.
Comment: Published in the Annals of Applied Statistics
(http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics
(http://www.imstat.org), DOI: http://dx.doi.org/10.1214/08-AOAS198.
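To make the margin-vector idea concrete, here is a small sketch assuming nothing beyond the abstract: a K-class margin vector sums to zero across its coordinates, and a convex decreasing loss phi is applied to the coordinate of the true class. The exponential and logistic losses are the two instances named above; the boosting updates themselves, and the helper names and data below, are illustrative.

import numpy as np

def margin_vector_loss(f, y, phi):
    """f: (n, K) margin vectors with rows summing to zero; y: (n,) labels."""
    assert np.allclose(f.sum(axis=1), 0.0), "margin vectors must sum to zero"
    return phi(f[np.arange(len(y)), y]).mean()

exp_loss = lambda t: np.exp(-t)          # exponential loss phi(t) = e^{-t}
logit_loss = lambda t: np.log1p(np.exp(-t))  # logistic regression loss

rng = np.random.default_rng(0)
raw = rng.normal(size=(5, 3))
f = raw - raw.mean(axis=1, keepdims=True)  # enforce the sum-to-zero constraint
y = rng.integers(0, 3, size=5)
print(margin_vector_loss(f, y, exp_loss), margin_vector_loss(f, y, logit_loss))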
Formal Verification of Input-Output Mappings of Tree Ensembles
Recent advances in machine learning and artificial intelligence are now being
considered in safety-critical autonomous systems where software defects may
cause severe harm to humans and the environment. Design organizations in these
domains are currently unable to provide convincing arguments that their systems
are safe to operate when machine learning algorithms are used to implement
their software.
In this paper, we present an efficient method to extract equivalence classes
from decision trees and tree ensembles, and to formally verify that their
input-output mappings comply with requirements. The idea is that, given that
safety requirements can be traced to desirable properties on system
input-output patterns, we can use positive verification outcomes in safety
arguments. This paper presents the implementation of the method in the tool
VoTE (Verifier of Tree Ensembles), and evaluates its scalability on two case
studies presented in current literature.
We demonstrate that our method is practical for tree ensembles trained on
low-dimensional data with up to 25 decision trees and tree depths of up to 20.
Our work also studies the limitations of the method on high-dimensional data
and preliminarily investigates the trade-off between the number of trees and
the time taken for verification.
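To illustrate the equivalence-class idea (a sketch with a hypothetical tree encoding, not VoTE's implementation): a decision tree with axis-aligned splits partitions the input domain into hyper-rectangles, one per leaf, so each rectangle is an equivalence class with a single output and a requirement can be checked exhaustively, class by class.

def equivalence_classes(node, box):
    # node is ("leaf", value) or ("split", feature, threshold, left, right);
    # box maps each feature to its (lo, hi) interval.
    if node[0] == "leaf":
        yield box, node[1]
        return
    _, feat, thr, left, right = node
    lo, hi = box[feat]
    if lo < thr:  # sub-region where x[feat] < threshold
        left_box = dict(box)
        left_box[feat] = (lo, min(hi, thr))
        yield from equivalence_classes(left, left_box)
    if hi > thr:  # sub-region where x[feat] >= threshold
        right_box = dict(box)
        right_box[feat] = (max(lo, thr), hi)
        yield from equivalence_classes(right, right_box)

tree = ("split", 0, 0.5,
        ("leaf", 0),
        ("split", 1, 0.3, ("leaf", 0), ("leaf", 1)))
domain = {0: (0.0, 1.0), 1: (0.0, 1.0)}

# Check a hypothetical requirement: output 1 only when feature 1 >= 0.3.
for box, out in equivalence_classes(tree, domain):
    assert out == 0 or box[1][0] >= 0.3, f"requirement violated on {box}"
    print(box, "->", out)

For an ensemble, the classes are intersections of the per-tree rectangles, which is where the scalability limits reported above come from.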
Efficient Monte Carlo Integration Using Boosted Decision Trees and Generative Deep Neural Networks
New machine-learning algorithms based on generative Boosted Decision Trees and
Deep Neural Networks have been developed and tested for Monte Carlo
integration. Both exhibit substantial improvements
compared to existing algorithms for non-factorizable integrands in terms of the
achievable integration precision for a given number of target function
evaluations. Large scale Monte Carlo generation of complex collider physics
processes with improved efficiency can be achieved by implementing these
algorithms in commonly used matrix-element Monte Carlo generators, once their
robustness is demonstrated and their performance validated for the relevant
classes of matrix elements.
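One way to see how a learned generative model can improve Monte Carlo integration is through importance sampling: for any density q covering the support of the integrand f, the integral equals E_q[f(X)/q(X)], and the variance shrinks as q approaches the shape of f. The sketch below illustrates that identity with a moment-matched Gaussian as a stand-in for the generative tree or network proposals; the integrand and all constants are hypothetical.

import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.exp(-0.5 * ((x - 2.0) / 0.3) ** 2)  # peaked test integrand

# Baseline: uniform Monte Carlo over [0, 5].
n = 100_000
xu = rng.uniform(0.0, 5.0, n)
est_uniform = 5.0 * f(xu).mean()

# "Learned" proposal: a Gaussian moment-matched to pilot evaluations of f.
xp = rng.uniform(0.0, 5.0, 2_000)
w = f(xp)
w /= w.sum()
mu = (w * xp).sum()
sigma = np.sqrt((w * (xp - mu) ** 2).sum())

# Importance-sampling estimate: E_q[f(X)/q(X)] with X ~ q = N(mu, sigma).
xq = rng.normal(mu, sigma, n)
q = np.exp(-0.5 * ((xq - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
est_is = (f(xq) / q).mean()

exact = 0.3 * np.sqrt(2 * np.pi)  # f is negligible outside [0, 5]
print(exact, est_uniform, est_is)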
Optimization by gradient boosting
Gradient boosting is a state-of-the-art prediction technique that
sequentially produces a model as a linear combination of simple predictors
(typically decision trees) by solving an infinite-dimensional
convex optimization problem. We provide in the present paper a thorough
analysis of two widespread versions of gradient boosting, and introduce a
general framework for studying these algorithms from the point of view of
functional optimization. We prove their convergence as the number of iterations
tends to infinity and highlight the importance of having a strongly convex risk
functional to minimize. We also present a reasonable statistical context
ensuring consistency properties of the boosting predictors as the sample size
grows. In our approach, the optimization procedures are run forever (that is,
without resorting to an early stopping strategy), and statistical
regularization is achieved via an appropriate penalization of the loss
together with strong convexity arguments.
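A minimal sketch of the functional-gradient view (with illustrative constants, not the paper's analysis): for the squared-error risk, the negative functional gradient at the sample points is the residual vector, so each boosting step fits a simple predictor, here a depth-1 tree, to the residuals and moves a small step nu in that direction.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)

nu, n_iter = 0.1, 500            # step size and iteration count (illustrative)
pred = np.zeros_like(y)          # F_0 = 0
for _ in range(n_iter):
    residual = y - pred          # negative gradient of 0.5 * ||y - F||^2 in F
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    pred += nu * stump.predict(X)  # F_{t+1} = F_t + nu * h_t

print("training MSE:", np.mean((y - pred) ** 2))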