23 research outputs found
Optimal Decision Trees for Nonlinear Metrics
Nonlinear metrics, such as the F1-score, Matthews correlation coefficient,
and Fowlkes-Mallows index, are often used to evaluate the performance of
machine learning models, in particular, when facing imbalanced datasets that
contain more samples of one class than the other. Recent optimal decision tree
algorithms have shown remarkable progress in producing trees that are optimal
with respect to linear criteria, such as accuracy, but unfortunately nonlinear
metrics remain a challenge. To address this gap, we propose a novel algorithm
based on bi-objective optimisation, which treats misclassifications of each
binary class as a separate objective. We show that, for a large class of
metrics, the optimal tree lies on the Pareto frontier. Consequently, we obtain
the optimal tree by using our method to generate the set of all nondominated
trees. To the best of our knowledge, this is the first method to compute
provably optimal decision trees for nonlinear metrics. Our approach leads to a
trade-off when compared to optimising linear metrics: the resulting trees may
be more desirable according to the given nonlinear metric at the expense of
higher runtimes. Nevertheless, the experiments illustrate that runtimes are
reasonable for the majority of the tested datasets
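To make the Pareto-frontier argument concrete, here is a minimal sketch. It is not the authors' algorithm: it assumes the candidate trees have already been summarised as hypothetical (false positives, false negatives) pairs, and the names `pareto_front`, `f1_score` and `best_by_f1` are illustrative only. Because F1 never improves when either error count grows, the F1-optimal candidate must be one of the nondominated pairs.

```python
def pareto_front(points):
    """Return the (fp, fn) pairs not dominated by any other pair (lower is better)."""
    front = []
    for p in points:
        dominated = any(
            q != p and q[0] <= p[0] and q[1] <= p[1]
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(set(front))


def f1_score(fp, fn, positives):
    """F1 computed from error counts; `positives` is the number of positive samples."""
    tp = positives - fn
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def best_by_f1(points, positives):
    """Pick the F1-optimal candidate by scanning only the Pareto front."""
    front = pareto_front(points)
    return max(front, key=lambda p: f1_score(p[0], p[1], positives))


# Hypothetical (fp, fn) summaries of six candidate trees on data with 10 positives.
candidates = [(0, 8), (2, 4), (5, 1), (6, 1), (3, 5), (12, 0)]
print(pareto_front(candidates))    # [(0, 8), (2, 4), (5, 1), (12, 0)]
print(best_by_f1(candidates, 10))  # (5, 1), with F1 = 0.75
```

Scanning only the front gives the same answer as scanning every candidate, which is the property the abstract relies on.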
Smart Predict-and-Optimize for Hard Combinatorial Optimization Problems
Combinatorial optimization assumes that all parameters of the optimization
problem, e.g. the weights in the objective function, are fixed. Often, these
weights are mere estimates, and increasingly machine learning techniques are
used for their estimation. Recently, Smart Predict and Optimize (SPO) has
been proposed for problems with a linear objective function over the
predictions, more specifically linear programming problems. It takes the regret
of the predictions on the linear problem into account, by repeatedly solving it
during learning. We investigate the use of SPO to solve more realistic discrete
optimization problems. The main challenge is the repeated solving of the
optimization problem. To this end, we investigate ways to relax the problem as
well as warmstarting the learning and the solving. Our results show that even
for discrete problems it often suffices to train by solving the relaxation in
the SPO loss. Furthermore, this approach outperforms, for most instances, the
state-of-the-art approach of Wilder, Dilkina, and Tambe. We experiment with
weighted knapsack problems as well as complex scheduling problems and show for
the first time that a predict-and-optimize approach can successfully be used on
large-scale combinatorial optimization problems
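The notion of regret mentioned above can be illustrated on a toy knapsack. This is a sketch under stated assumptions, not the paper's training procedure: it uses a brute-force solver, made-up item values and sizes, and the hypothetical helper names `solve_knapsack` and `regret`. Regret is taken here as the true value lost by acting on the predicted values instead of the true ones.

```python
from itertools import product


def solve_knapsack(values, sizes, capacity):
    """Brute-force 0/1 knapsack; returns the best selection as a tuple of 0/1."""
    best, best_val = None, float("-inf")
    for sel in product((0, 1), repeat=len(values)):
        if sum(s * x for s, x in zip(sizes, sel)) <= capacity:
            val = sum(v * x for v, x in zip(values, sel))
            if val > best_val:
                best, best_val = sel, val
    return best


def regret(true_values, predicted_values, sizes, capacity):
    """True objective lost by optimising over predictions instead of true values."""
    x_true = solve_knapsack(true_values, sizes, capacity)
    x_pred = solve_knapsack(predicted_values, sizes, capacity)

    def true_value(x):
        return sum(v * xi for v, xi in zip(true_values, x))

    return true_value(x_true) - true_value(x_pred)


# Made-up instance: three items, capacity 4.
true_values = [10, 7, 4]
sizes = [3, 2, 2]
close_but_wrong = [11, 6, 4]   # small prediction error, yet picks the wrong items
far_but_right = [5, 12, 9]     # large prediction error, yet picks the right items
print(regret(true_values, close_but_wrong, sizes, 4))  # 1
print(regret(true_values, far_but_right, sizes, 4))    # 0
```

The two predictions show why a regret-based loss can rank models differently from a plain prediction-error loss.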
Necessary and Sufficient Conditions for Optimal Decision Trees using Dynamic Programming
Global optimization of decision trees has been shown to be promising in terms of
accuracy, size, and consequently human comprehensibility. However, many of the
methods used rely on general-purpose solvers for which scalability remains an
issue. Dynamic programming methods have been shown to scale much better because
they exploit the tree structure by solving subtrees as independent subproblems.
However, this only works when an objective can be optimized separately for
subtrees. We explore this relationship in detail and show necessary and
sufficient conditions for such separability and generalize previous dynamic
programming approaches into a framework that can optimize any combination of
separable objectives and constraints. Experiments on five application domains
show the general applicability of this framework, while outperforming the
scalability of general-purpose solvers by a large margin
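The separability idea can be sketched with a small dynamic program for one separable objective, the misclassification count. This is an illustrative toy under assumptions (binary features, binary labels, a depth limit, and the hypothetical function name `optimal_tree_cost`), not the framework described in the abstract. The key line is the sum of two independently solved subproblems.

```python
from functools import lru_cache


def optimal_tree_cost(X, y, depth):
    """Minimum misclassifications of a tree of at most `depth` on binary features."""
    n_features = len(X[0]) if X else 0

    def leaf_cost(idx):
        # A leaf predicts the majority label, so it misclassifies the minority.
        ones = sum(y[i] for i in idx)
        return min(ones, len(idx) - ones)

    @lru_cache(maxsize=None)
    def solve(idx, d):
        best = leaf_cost(idx)
        if d == 0 or best == 0:
            return best
        for f in range(n_features):
            left = tuple(i for i in idx if X[i][f] == 0)
            right = tuple(i for i in idx if X[i][f] == 1)
            if not left or not right:
                continue  # this split does not separate anything
            # Separability: each subtree is optimised on its own and the costs add up.
            best = min(best, solve(left, d - 1) + solve(right, d - 1))
        return best

    return solve(tuple(range(len(X))), depth)


# XOR data: no tree of depth 1 is perfect, but a depth-2 tree is.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]
print([optimal_tree_cost(X, y, d) for d in (0, 1, 2)])  # [2, 2, 0]
```

An objective that cannot be written as such a combination of per-subtree values would break the `solve(left) + solve(right)` step, which is the limitation the abstract refers to.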
Evaluation of Free Form Deformation and Demons Registration with Discontinuities
Medical image registration plays an important part in most of today’s clinical procedures. The goal of registration is to find a transformation that warps one image into the space of another. Registration of moving organs in the human body plays a significant part in therapy planning. This task is harder when one organ (tissue) slides along another, i.e. when the motion field contains discontinuities. Discontinuities introduce unwanted transformations, which often lead to poor or unsatisfactory registration results. In this paper we evaluate one form of discontinuities for two well-known and widely used registration algorithms, namely Free Form Deformation and Demons
MurTree: Optimal Classification Trees via Dynamic Programming and Search
Decision tree learning is a widely used approach in machine learning,
favoured in applications that require concise and interpretable models.
Heuristic methods are traditionally used to quickly produce models with
reasonably high accuracy. A commonly criticised point, however, is that the
resulting trees may not necessarily be the best representation of the data in
terms of accuracy and size. In recent years, this motivated the development of
optimal classification tree algorithms that globally optimise the decision tree
in contrast to heuristic methods that perform a sequence of locally optimal
decisions. We follow this line of work and provide a novel algorithm for
learning optimal classification trees based on dynamic programming and search.
Our algorithm supports constraints on the depth of the tree and number of
nodes. The success of our approach is attributed to a series of specialised
techniques that exploit properties unique to classification trees. Whereas
algorithms for optimal classification trees have traditionally been plagued by
high runtimes and limited scalability, we show in a detailed experimental study
that our approach uses only a fraction of the time required by the
state-of-the-art and can handle datasets with tens of thousands of instances,
providing several orders of magnitude improvements and notably contributing
towards the practical realisation of optimal decision trees
With food to health : proceedings of 11th International symposium
The proceedings contain 13 original scientific papers, 10 professional papers and 2 review papers, which were presented at the "10th International Scientific and Professional Conference WITH FOOD TO HEALTH", organised in the following sections: Nutrition, Dietetics and diet therapy, Functional food and food supplements, Food safety, Food analysis, Production of safe food and food with added nutritional value
SAT-Based approaches for the general high school timetabling problem
High School Timetabling (HSTT) is a well-known and widespread problem. The problem consists of coordinating resources (e.g. teachers, rooms), times, and events (e.g. lectures) with respect to various constraints. Unfortunately, HSTT is hard to solve, and just finding a feasible solution for simple variants of HSTT has been proven to be NP-complete. In this work, we consider the general HSTT problem, abbreviated as XHSTT. Despite significant research efforts for XHSTT and other timetabling problems, no silver bullet algorithm has been found so far. Many problems have yet to be efficiently and/or optimally solved. The main goal of this thesis is to explore the relation between propositional logic and high school timetabling, as well as related approaches. We model the complex formalism of XHSTT using Boolean variables and basic logical connectives only. We evaluated different cardinality constraint encodings, solvers, and important special cases in order to significantly simplify the modeling in practice. We note that resource assignment constraints have been considered only for special cases, rather than in general. In addition, we investigated a maxSAT-based satisfiability modulo theories (SMT) approach. Another model we studied in this work is based on bitvectors. By using a series of bitvector operations (such as AND, OR, and XOR) on the set of event bitvectors, we were able to model all constraints, with the exception of resource assignment constraints. The bitvector model serves as an efficient data structure for local search algorithms such as hill climbing and simulated annealing. To integrate maxSAT into a hybrid algorithm, we combined local search with a large neighborhood search algorithm that exploits maxSAT. Furthermore, to the best of our knowledge, this is the first time maxSAT has been used within a large neighborhood search scheme.
We carried out thorough experimentation on important benchmark instances that can be found in the repository of the third international timetabling competition (ITC 2011) and compared with the state-of-the-art algorithms for XHSTT. Detailed experiments were performed in order to determine the most appropriate maxSAT solvers and cardinality constraint encodings, evaluate our SMT approach, and compare with integer programming and the ITC 2011 results. Computational results demonstrate that we outperform the integer programming approach on numerous benchmarks. We are able to obtain even better results by combining several maxSAT solvers. When compared to the leading KHE engine for XHSTT, the bitvector modeling approach provided significant improvements for local search algorithms such as hill climbing and simulated annealing. Lastly, our large neighborhood search algorithm excelled in situations where limited computational time is allocated, being able to obtain better results than the state-of-the-art solvers and the pure maxSAT approach in many benchmarks.
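The abstract does not say which cardinality constraint encodings were compared. As a point of reference only, the sketch below shows the standard pairwise encoding of an "exactly one" constraint into clauses, for instance to state that an event occupies exactly one time slot. The variable numbering and the helper names `at_most_one`, `exactly_one` and `satisfies` are assumptions made for illustration.

```python
from itertools import combinations, product


def at_most_one(variables):
    """Pairwise encoding: for every pair of variables, at least one is false."""
    return [[-a, -b] for a, b in combinations(variables, 2)]


def exactly_one(variables):
    """At least one variable is true, and no two are true together."""
    return [list(variables)] + at_most_one(variables)


def satisfies(clauses, assignment):
    """assignment maps variable -> bool; literals are signed ints (DIMACS style)."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )


# Hypothetical variables 1..3: "the event is placed in slot 1, 2 or 3".
slots = [1, 2, 3]
clauses = exactly_one(slots)
print(clauses)  # [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]

# Brute-force check: only the three single-slot assignments satisfy the clauses.
models = [
    bits for bits in product((False, True), repeat=3)
    if satisfies(clauses, dict(zip(slots, bits)))
]
print(len(models))  # 3
```

The pairwise encoding needs a quadratic number of clauses, which is one reason more compact encodings are usually evaluated alongside it.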
Modeling high school timetabling with bitvectors
High school timetabling (HSTT) is a well-known and widespread problem. The problem consists of coordinating resources (e.g. teachers, rooms), times, and events (e.g. lectures) with respect to various constraints. Unfortunately, HSTT is hard to solve, and just finding a feasible solution for simple variants of HSTT has been proven to be NP-complete. We propose a new modeling approach for HSTT using bitvectors in which constraint costs of the general HSTT can be calculated using bit operations. This model allows efficient computation of constraint costs, making it useful when implementing HSTT algorithms. Additionally, it can be used to solve HSTT with satisfiability modulo theories (SMT) solvers that support bitvectors. We evaluate the performance of our bitvector modeling approach and compare it to the leading engine KHE when developing local search algorithms such as hill climbing and simulated annealing. The experimental results show that our approach is useful for this problem. Furthermore, experimental results using SMT are given on instances from the ITC 2011 benchmark repository. Austrian Science Fund (FWF
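A rough sketch of how constraint costs can be computed with bit operations follows. It does not reproduce the paper's model or the XHSTT constraint definitions. It assumes each event is a Python integer whose set bits mark the time slots it occupies, and the helper names `clash_cost` and `idle_slots` are hypothetical.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def clash_cost(event_slots):
    """Count clashes among events that share one resource.

    A slot used by k events contributes k - 1 clashes.
    """
    seen = 0      # slots already occupied by earlier events
    clashes = 0
    for slots in event_slots:
        clashes += popcount(seen & slots)  # AND finds doubly booked slots
        seen |= slots                      # OR accumulates occupied slots
    return clashes


def idle_slots(busy):
    """Count free slots lying between the first and last busy slot."""
    if busy == 0:
        return 0
    low = (busy & -busy).bit_length() - 1   # index of the lowest set bit
    high = busy.bit_length() - 1            # index of the highest set bit
    span = ((1 << (high + 1)) - 1) & ~((1 << low) - 1)
    return popcount(span & ~busy)


# Three hypothetical events of one teacher; bit i set means slot i is used.
events = [0b00011, 0b00110, 0b00010]
print(clash_cost(events))    # 2: slot 1 is used by all three events
print(idle_slots(0b101101))  # 2: slots 1 and 4 are gaps
```

Each cost is a handful of word-level operations, so it can be re-evaluated cheaply after every move of a local search.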
Blossom: an Anytime Algorithm for Computing Optimal Decision Trees
We propose a simple algorithm to learn optimal decision trees of bounded depth. This algorithm is essentially an anytime version of the state-of-the-art dynamic programming approach. It has virtually no overhead compared to heuristic methods and is comparable to the best exact methods at proving optimality on most data sets. Experiments show that whereas existing exact methods hardly scale to deep trees, this algorithm learns trees comparable to standard heuristics without computational overhead, and can significantly improve their accuracy when given more computation time, even for deep trees