Binarized support vector machines
The widely used Support Vector Machine (SVM) method has been shown to yield very good results in
Supervised Classification problems. Other methods such as Classification Trees have become
more popular among practitioners than SVM thanks to their interpretability, which is an important
issue in Data Mining.
In this work, we propose an SVM-based method that automatically detects the most important
predictor variables, and the role they play in the classifier. In particular, the proposed method is
able to detect those values and intervals which are critical for the classification. The method
involves the optimization of a Linear Programming problem, with a large number of decision
variables. The numerical experience reported shows that a rather direct use of the standard
Column-Generation strategy leads to a classification method which, in terms of classification
ability, is competitive with the standard linear SVM and Classification Trees. Moreover, the
proposed method is robust, i.e., it is stable in the presence of outliers and invariant to change of
scale or measurement units of the predictor variables.
When the complexity of the classifier is an important issue, a wrapper feature selection method is
applied, yielding simpler, still competitive, classifiers.
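The binarization idea can be illustrated on a toy problem: turn a numeric predictor into interval indicators and fit an L1-penalized soft-margin classifier by linear programming. This is a minimal sketch, not the authors' formulation (it omits the column-generation machinery entirely); the cut points, penalty C, and the use of scipy's `linprog` are illustrative assumptions.

```python
# Minimal sketch (not the paper's method): binarize a predictor into
# interval indicators, then fit an LP-based soft-margin classifier.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=60)          # one numeric predictor
y = np.where(x > 5, 1, -1)               # labels: critical threshold at 5

# Binarization: indicator variables for candidate cut points (assumed)
cuts = [2.5, 5.0, 7.5]
Z = np.column_stack([(x > c).astype(float) for c in cuts])

n, d = Z.shape
C = 1.0
# LP variables: [w+ (d), w- (d), b+, b-, slacks xi (n)], all >= 0
c_obj = np.concatenate([np.ones(2 * d), [0.0, 0.0], C * np.ones(n)])
# Margin constraints y_i (w.z_i + b) >= 1 - xi_i, rewritten as A u <= -1
A = np.hstack([-y[:, None] * Z, y[:, None] * Z,
               -y[:, None], y[:, None], -np.eye(n)])
res = linprog(c_obj, A_ub=A, b_ub=-np.ones(n),
              bounds=[(0, None)] * (2 * d + 2 + n))

w = res.x[:d] - res.x[d:2 * d]           # recover signed weights
b = res.x[2 * d] - res.x[2 * d + 1]
pred = np.sign(Z @ w + b)
accuracy = np.mean(pred == y)
```

Because the example is separable at the cut 5.0, the LP places essentially all weight on that single indicator, which is the kind of interval-level interpretability the abstract refers to.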
Robust risk aggregation with neural networks
We consider settings in which the distribution of a multivariate random
variable is partly ambiguous. We assume the ambiguity lies on the level of the
dependence structure, and that the marginal distributions are known.
Furthermore, a current best guess for the distribution, called reference
measure, is available. We work with the set of distributions that are both
close to the given reference measure in a transportation distance (e.g. the
Wasserstein distance), and additionally have the correct marginal structure.
The goal is to find upper and lower bounds for integrals of interest with
respect to distributions in this set. The described problem appears naturally
in the context of risk aggregation. When aggregating different risks, the
marginal distributions of these risks are known and the task is to quantify
their joint effect on a given system. This is typically done by applying a
meaningful risk measure to the sum of the individual risks. For this purpose,
the stochastic interdependencies between the risks need to be specified. In
practice the models of this dependence structure are however subject to
relatively high model ambiguity. The contribution of this paper is twofold:
Firstly, we derive a dual representation of the considered problem and prove
that strong duality holds. Secondly, we propose a generally applicable and
computationally feasible method, which relies on neural networks, in order to
numerically solve the derived dual problem. The latter method is tested on a
number of toy examples, before it is finally applied to perform robust risk
aggregation in a real-world instance.
Comment: Revised version. Accepted for publication in "Mathematical Finance".
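A much simpler baseline than the paper's neural-network dual method illustrates why dependence ambiguity matters for risk aggregation: with the marginals fixed, different admissible couplings give different risk numbers, and for expected shortfall the comonotone coupling attains the unconstrained upper bound. The marginal laws, sample size, and level below are illustrative assumptions.

```python
# Sketch (not the paper's method): compare expected shortfall of a sum
# under an independent coupling vs. the comonotone coupling, with the
# two marginal distributions held fixed.
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
alpha = 0.95

def expected_shortfall(s, alpha):
    """Empirical ES: mean of the sample beyond the alpha-quantile."""
    q = np.quantile(s, alpha)
    return s[s >= q].mean()

x = rng.lognormal(0.0, 0.5, n)   # marginal of risk 1 (assumed)
y = rng.lognormal(0.0, 0.5, n)   # marginal of risk 2 (assumed)

es_indep = expected_shortfall(x + y, alpha)                   # one coupling
es_comon = expected_shortfall(np.sort(x) + np.sort(y), alpha) # comonotone
```

The gap between `es_indep` and `es_comon` is exactly the kind of spread that the Wasserstein neighborhood around a reference measure narrows: instead of ranging over all couplings, one only ranges over those close to the current best guess.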
Parameter Selection and Pre-Conditioning for a Graph Form Solver
In a recent paper, Parikh and Boyd describe a method for solving a convex
optimization problem, where each iteration involves evaluating a proximal
operator and projection onto a subspace. In this paper we address the critical
practical issues of how to select the proximal parameter in each iteration, and
how to scale the original problem variables, so as to achieve reliable
practical performance. The resulting method has been implemented as an
open-source software package called POGS (Proximal Graph Solver), that targets
multi-core and GPU-based systems, and has been tested on a wide variety of
practical problems. Numerical results show that POGS can solve very large
problems (with, say, more than a billion coefficients in the data), to modest
accuracy in a few tens of seconds. As just one example, a radiation treatment
planning problem with around 100 million coefficients in the data can be solved
in a few seconds, as compared to around one hour with an interior-point method.
Comment: 28 pages, 1 figure, 1 open source implementation
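The graph-form iteration that POGS implements can be sketched on a small dense problem: alternate a separable proximal step with a projection onto the graph {(x, y) : y = Ax}, plus a dual update. This is a minimal illustration on a lasso instance with a fixed proximal parameter rho = 1; the problem sizes and iteration count are arbitrary assumptions, and the adaptive parameter selection and preconditioning that this paper contributes are deliberately not modeled.

```python
# Sketch of graph-form ADMM (fixed rho, no preconditioning) on a lasso:
#   minimize 0.5*||y - b||^2 + lam*||x||_1  subject to  y = A x
import numpy as np

rng = np.random.default_rng(2)
m, n_feat = 30, 10
A = rng.normal(size=(m, n_feat))
x_true = np.zeros(n_feat)
x_true[:3] = [1.5, -2.0, 0.7]
b = A @ x_true + 0.01 * rng.normal(size=m)
lam, rho = 0.1, 1.0

# Cache the factorization used by every graph projection:
#   x = (I + A^T A)^{-1} (c + A^T d),  y = A x
K = np.eye(n_feat) + A.T @ A
def project_graph(c, d):
    x = np.linalg.solve(K, c + A.T @ d)
    return x, A @ x

x = np.zeros(n_feat); y = np.zeros(m)
xt = np.zeros(n_feat); yt = np.zeros(m)   # scaled dual variables
for _ in range(500):
    # proximal step on g(x) = lam*||x||_1 (soft threshold)
    xh = np.sign(x - xt) * np.maximum(np.abs(x - xt) - lam / rho, 0.0)
    # proximal step on f(y) = 0.5*||y - b||^2
    yh = (rho * (y - yt) + b) / (rho + 1.0)
    # projection onto the graph of A
    x, y = project_graph(xh + xt, yh + yt)
    # dual update
    xt = xt + xh - x
    yt = yt + yh - y
```

With a fixed rho this converges but can be slow or unstable on badly scaled data, which is precisely the practical issue the paper's parameter selection and preconditioning address.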
Efficient intrusion detection scheme based on SVM
The network intrusion detection problem is a focus of current academic research. In this paper, we propose a Support Vector Machine (SVM) model to identify and detect network intrusions, and simultaneously introduce a new optimization search method, referred to as the Improved Harmony Search (IHS) algorithm, to determine the parameters of the SVM model for better classification accuracy. Taking the general mechanism network system of a growing city in China between 2006 and 2012 as the sample, this study divides the mechanism into normal network systems and crisis network systems according to the extent of harm caused by network intrusion. We consider a crisis network system coupled with two to three normal network systems as paired samples. Experimental results show that SVMs based on IHS achieve high prediction accuracy, can perform prediction and classification for network intrusion detection, and can assist in guarding against network intrusion.
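A generic harmony-search loop of the kind IHS refines can be sketched as follows. The toy objective below is only a stand-in for SVM cross-validation error over (C, gamma), and every parameter value (memory size, hmcr, par, bandwidth) is an illustrative assumption, not the paper's setting; the "improved" adaptive rules of IHS are not modeled.

```python
# Sketch of plain harmony search over two SVM-style hyperparameters.
import numpy as np

rng = np.random.default_rng(4)

def objective(p):
    # Toy stand-in for cross-validation error; optimum at C=1, gamma=0.1
    C, gamma = p
    return (np.log10(C) - 0.0) ** 2 + (np.log10(gamma) + 1.0) ** 2

lo, hi = np.array([1e-2, 1e-3]), np.array([1e2, 1e1])  # assumed ranges
hms, hmcr, par, bw = 10, 0.9, 0.3, 0.1                 # assumed settings

# Harmony memory: random initial candidate parameter vectors (log-uniform)
mem = np.exp(rng.uniform(np.log(lo), np.log(hi), size=(hms, 2)))
scores = np.array([objective(m) for m in mem])

for _ in range(500):
    new = np.empty(2)
    for j in range(2):
        if rng.uniform() < hmcr:                  # memory consideration
            new[j] = mem[rng.integers(hms), j]
            if rng.uniform() < par:               # pitch adjustment
                new[j] *= np.exp(bw * rng.uniform(-1, 1))
        else:                                     # random re-initialization
            new[j] = np.exp(rng.uniform(np.log(lo[j]), np.log(hi[j])))
        new[j] = np.clip(new[j], lo[j], hi[j])
    s = objective(new)
    worst = np.argmax(scores)
    if s < scores[worst]:                         # replace worst harmony
        mem[worst], scores[worst] = new, s

best = mem[np.argmin(scores)]
```

In the actual pipeline described by the abstract, `objective` would train an SVM with the candidate (C, gamma) and return its validation error.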
Semistochastic Quadratic Bound Methods
Partition functions arise in a variety of settings, including conditional
random fields, logistic regression, and latent Gaussian models. In this paper,
we consider semistochastic quadratic bound (SQB) methods for maximum likelihood
inference based on partition function optimization. Batch methods based on the
quadratic bound were recently proposed for this class of problems, and
performed favorably in comparison to state-of-the-art techniques.
Semistochastic methods fall in between batch algorithms, which use all the
data, and stochastic gradient type methods, which use small random selections
at each iteration. We build semistochastic quadratic bound-based methods, and
prove both global convergence (to a stationary point) under very weak
assumptions, and linear convergence rate under stronger assumptions on the
objective. To make the proposed methods faster and more stable, we consider
inexact subproblem minimization and batch-size selection schemes. The efficacy
of SQB methods is demonstrated via comparison with several state-of-the-art
techniques on commonly used datasets.
Comment: 11 pages, 1 figure
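The semistochastic idea, interpolating between batch and stochastic steps by growing the mini-batch, can be sketched for logistic regression, where the classic bound sigma''(z) <= 1/4 yields a fixed quadratic majorizer of the Hessian. This is not the authors' SQB code: the growth factor, iteration counts, and synthetic data are illustrative assumptions, and the inexact-subproblem schemes are omitted.

```python
# Sketch: quadratic-bound steps for logistic regression with a growing
# mini-batch ("semistochastic" schedule between SGD and full batch).
import numpy as np

rng = np.random.default_rng(3)
n, d = 2000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)

# Quadratic bound: the logistic curvature satisfies s(1-s) <= 1/4, so
# H = (1/4) X^T X / n majorizes the Hessian uniformly over w.
H = 0.25 * (X.T @ X) / n + 1e-6 * np.eye(d)
H_inv = np.linalg.inv(H)

def grad(w, idx):
    """Average logistic-loss gradient over the rows in idx."""
    s = 1.0 / (1.0 + np.exp(-X[idx] @ w))
    return X[idx].T @ (s - y[idx]) / len(idx)

w = np.zeros(d)
batch = 50
for _ in range(60):
    idx = rng.choice(n, size=min(batch, n), replace=False)
    w = w - H_inv @ grad(w, idx)      # bound-based (Newton-like) step
    batch = int(batch * 1.2)          # grow batch toward the full dataset
```

Early iterations behave like cheap stochastic steps; once the batch reaches the full dataset, each step is an exact majorize-minimize step, which is where the linear convergence rate mentioned in the abstract applies.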