Optimistic Robust Optimization With Applications To Machine Learning
Robust Optimization has traditionally taken a pessimistic, or worst-case
viewpoint of uncertainty which is motivated by a desire to find sets of optimal
policies that maintain feasibility under a variety of operating conditions. In
this paper, we explore an optimistic, or best-case view of uncertainty and show
that it can be a fruitful approach. We show that these techniques can be used
to address a wide variety of problems. First, we apply our methods in the
context of robust linear programming, providing a method for reducing
conservatism in intuitive ways that encode economically realistic modeling
assumptions. Second, we look at problems in machine learning and find that this
approach is strongly connected to the existing literature. Specifically, we
provide a new interpretation for popular sparsity inducing non-convex
regularization schemes. Additionally, we show that successful approaches for
dealing with outliers and noise can be interpreted as optimistic robust
optimization problems. Although many of the problems resulting from our
approach are non-convex, we find that DCA (difference-of-convex algorithm) or
DCA-like optimization approaches can be intuitive and efficient.
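As a point of reference for the DCA approach mentioned above, here is a minimal sketch of the generic DCA iteration for minimizing a difference of convex functions f(x) = g(x) - h(x): linearize h at the current iterate and solve the resulting convex surrogate exactly. The function names and the toy objective are illustrative assumptions, not taken from the paper.

    import numpy as np

    def dca(grad_h, solve_surrogate, x0, n_iter=100, tol=1e-8):
        """Generic DCA sketch for minimizing f(x) = g(x) - h(x), g and h convex.

        grad_h(x)          : a (sub)gradient of h at x
        solve_surrogate(v) : returns argmin_x g(x) - <v, x>, i.e. the convex
                             subproblem obtained by linearizing h at x
        """
        x = x0
        for _ in range(n_iter):
            v = grad_h(x)               # linearize the subtracted convex part
            x_new = solve_surrogate(v)  # solve the convex surrogate exactly
            if np.linalg.norm(x_new - x) < tol:
                break
            x = x_new
        return x

    # Toy instance (hypothetical): f(x) = 0.5*||x - a||^2 - lam*||x||_1,
    # non-convex overall; here argmin_x 0.5*||x - a||^2 - <v, x> = a + v
    # in closed form.
    a, lam = np.array([0.3, -2.0]), 0.5
    x_star = dca(lambda x: lam * np.sign(x), lambda v: a + v, np.zeros(2))

Each iterate solves a convex surrogate that upper-bounds f, so the objective decreases monotonically, which is what makes the scheme practical despite the non-convexity.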
Binarized support vector machines
The widely used Support Vector Machine (SVM) method has been shown to yield very good results in
Supervised Classification problems. Other methods such as Classification Trees have become
more popular among practitioners than SVM thanks to their interpretability, which is an important
issue in Data Mining.
In this work, we propose an SVM-based method that automatically detects the most important
predictor variables, and the role they play in the classifier. In particular, the proposed method is
able to detect those values and intervals which are critical for the classification. The method
involves the optimization of a Linear Programming problem, with a large number of decision
variables. The numerical experiments reported show that a rather direct use of the standard
Column-Generation strategy leads to a classification method which, in terms of classification
ability, is competitive against the standard linear SVM and Classification Trees. Moreover, the
proposed method is robust, i.e., it is stable in the presence of outliers and invariant to changes of
scale or measurement units of the predictor variables.
When the complexity of the classifier is an important issue, a wrapper feature selection method is
applied, yielding simpler, yet still competitive, classifiers.
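To make the Linear Programming problem at the core of such a method concrete, here is a minimal sketch of an L1-regularized hinge-loss classifier written as a pure LP and solved with scipy.optimize.linprog. The variable names and the trade-off constant C are illustrative, and the Column-Generation layer described above (pricing in binarized features on demand) is omitted: all columns are enumerated up front.

    import numpy as np
    from scipy.optimize import linprog

    def l1_svm_lp(X, y, C=1.0):
        """L1-penalized hinge-loss classifier as a pure LP (sketch).

        Variables: w = wp - wm (wp, wm >= 0), b = bp - bm, slacks xi >= 0.
        minimize   C * 1'(wp + wm) + 1'xi
        s.t.       y_i * (w . x_i + b) >= 1 - xi_i   for all i
        """
        n, d = X.shape
        # Variable order: [wp (d), wm (d), bp, bm, xi (n)], all non-negative.
        c = np.concatenate([C * np.ones(2 * d), [0.0, 0.0], np.ones(n)])
        Yx = y[:, None] * X
        A_ub = np.hstack([-Yx, Yx, -y[:, None], y[:, None], -np.eye(n)])
        b_ub = -np.ones(n)
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
        z = res.x
        w = z[:d] - z[d:2 * d]
        b = z[2 * d] - z[2 * d + 1]
        return w, b   # predict with np.sign(X @ w + b)

With binarized 0/1 predictors the number of columns grows quickly, which is exactly why the paper resorts to Column Generation: start from a restricted set of columns and iteratively price in only those with negative reduced cost.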
Learning Dynamic Feature Selection for Fast Sequential Prediction
We present paired learning and inference algorithms for significantly
reducing computation and increasing speed of the vector dot products in the
classifiers that are at the heart of many NLP components. This is accomplished
by partitioning the features into a sequence of templates which are ordered
such that high confidence can often be reached using only a small fraction of
all features. Parameter estimation is arranged to maximize accuracy and early
confidence in this sequence. Our approach is simpler and better suited to NLP
than other related cascade methods. We present experiments in left-to-right
part-of-speech tagging, named entity recognition, and transition-based
dependency parsing. On typical benchmark datasets we can preserve POS
tagging accuracy above 97% and parsing LAS above 88.5%, both with over a
five-fold reduction in run-time, and NER F1 above 88 with a more than 2x
increase in speed.
Comment: Appears in the 53rd Annual Meeting of the Association for Computational Linguistics, Beijing, China, July 2015.
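A minimal sketch of the inference side of this idea follows, assuming a linear multi-class classifier whose features are grouped into ordered templates; all names are hypothetical, and the per-template confidence thresholds would be chosen during the parameter estimation the abstract describes.

    import numpy as np

    def staged_predict(x, templates, W, margins):
        """Early-stopping prediction over ordered feature templates (sketch).

        x         : input token/context handed to each template function
        templates : list of functions; templates[t](x) yields the active
                    feature indices for template t (cheap templates first)
        W         : (n_features, n_labels) weight matrix
        margins   : per-template confidence thresholds (hypothetical values)
        """
        scores = np.zeros(W.shape[1])
        for t, template in enumerate(templates):
            for f in template(x):
                scores += W[f]   # this template's share of the dot product
            top2 = np.partition(scores, -2)[-2:]   # runner-up, best
            if top2[1] - top2[0] >= margins[t]:
                break            # confident: skip the remaining templates
        return int(np.argmax(scores))

The speed-up comes from never computing the expensive templates on easy inputs, while hard inputs still fall through to the full feature set.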
Discussion: The Dantzig selector: Statistical estimation when p is much larger than n
Discussion of "The Dantzig selector: Statistical estimation when p is much
larger than n" [math/0506081].
Comment: Published in the Annals of Statistics (http://www.imstat.org/aos/) by the Institute of
Mathematical Statistics (http://www.imstat.org), DOI: http://dx.doi.org/10.1214/009053607000000442.
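For reference, the estimator under discussion is itself a linear program: with an n × p design matrix X, response y, and noise level σ, the Dantzig selector of Candès and Tao solves (in LaTeX notation)

    \min_{\beta \in \mathbb{R}^p} \|\beta\|_1
    \quad \text{subject to} \quad
    \|X^\top (y - X\beta)\|_\infty \le \lambda_p \, \sigma,
    \qquad \lambda_p = \sqrt{2 \log p}.

The ℓ∞ constraint caps every predictor's correlation with the residual, which keeps the problem a linear program and the estimator meaningful even when p is much larger than n.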
Optimization with Sparsity-Inducing Penalties
Sparse estimation methods are aimed at using or obtaining parsimonious
representations of data or models. They were first dedicated to linear variable
selection but numerous extensions have now emerged such as structured sparsity
or kernel selection. It turns out that many of the related estimation problems
can be cast as convex optimization problems by regularizing the empirical risk
with appropriate non-smooth norms. The goal of this paper is to present from a
general perspective optimization tools and techniques dedicated to such
sparsity-inducing penalties. We cover proximal methods, block-coordinate
descent, reweighted-ℓ2 penalized techniques, working-set and homotopy
methods, as well as non-convex formulations and extensions, and provide an
extensive set of experiments to compare various algorithms from a computational
point of view.
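As one concrete instance of the proximal methods the survey covers, here is a minimal sketch of the proximal gradient (ISTA) iteration for ℓ1-regularized least squares; the step size uses the standard Lipschitz constant of the smooth part, and the function names are illustrative.

    import numpy as np

    def soft_threshold(v, t):
        """Proximal operator of t*||.||_1 (elementwise soft-thresholding)."""
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def ista(X, y, lam, n_iter=500):
        """Proximal gradient for  min_w 0.5*||Xw - y||^2 + lam*||w||_1."""
        L = np.linalg.norm(X, 2) ** 2      # Lipschitz constant of the gradient
        w = np.zeros(X.shape[1])
        for _ in range(n_iter):
            grad = X.T @ (X @ w - y)       # gradient of the smooth part
            w = soft_threshold(w - grad / L, lam / L)
        return w

Each iteration takes a gradient step on the smooth loss and then applies the penalty's proximal operator; swapping the soft-thresholding for another norm's proximal operator gives the structured-sparsity variants the paper discusses.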