38 research outputs found
Laplacian Support Vector Machines Trained in the Primal
In the last few years, due to the growing ubiquity of unlabeled data, much
effort has been spent by the machine learning community to better understand
and improve the quality of classifiers that exploit unlabeled data.
Following the manifold regularization approach, Laplacian Support Vector
Machines (LapSVMs) have shown state-of-the-art performance in
semi-supervised classification. In this paper we present two strategies to
solve the primal LapSVM problem, in order to overcome some issues of the
original dual formulation. Whereas training a LapSVM in the dual requires two
steps, using the primal form allows us to collapse training to a single step.
Moreover, the computational complexity of the training algorithm is reduced
from O(n^3) to O(n^2) using preconditioned conjugate gradient, where n is the
combined number of labeled and unlabeled examples. We speed up training by
using an early stopping strategy based on the prediction on unlabeled data or,
if available, on labeled validation examples. This allows the algorithm to
quickly compute approximate solutions with roughly the same classification
accuracy as the optimal ones, considerably reducing the training time. Due to
its simplicity, training LapSVM in the primal can be the starting point for
additional enhancements of the original LapSVM formulation, such as those for
dealing with large datasets. We present an extensive experimental evaluation on
real-world data showing the benefits of the proposed approach.
Comment: 39 pages, 14 figures
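As a rough illustration of the primal approach, the sketch below trains a manifold-regularized kernel machine in the primal with conjugate gradient. It substitutes a squared loss for the hinge loss (so it is closer to LapRLS than to the paper's exact LapSVM formulation), and the parameter names and RBF affinity graph are illustrative choices, not the paper's:

```python
import numpy as np
from scipy.sparse.linalg import cg
from scipy.spatial.distance import cdist

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian (RBF) kernel matrix between the rows of A and B.
    return np.exp(-gamma * cdist(A, B, "sqeuclidean"))

def train_primal_manifold_rls(X, y_labeled, n_labeled, gamma_A=1e-2, gamma_I=1e-2):
    """Squared-loss stand-in for primal LapSVM training (LapRLS-style).

    X         : (n, d) labeled + unlabeled points, labeled rows first
    y_labeled : (n_labeled,) labels in {-1, +1}
    Returns expansion coefficients alpha with f(x) = sum_i alpha_i k(x, x_i).
    """
    n = X.shape[0]
    K = rbf_kernel(X, X)
    # Graph Laplacian L = D - W from the same affinity (illustrative choice;
    # self-loops cancel inside the Laplacian quadratic form).
    W = rbf_kernel(X, X)
    L = np.diag(W.sum(axis=1)) - W
    # J selects labeled points; y_hat pads the labels with zeros elsewhere.
    J = np.zeros(n); J[:n_labeled] = 1.0
    y_hat = np.zeros(n); y_hat[:n_labeled] = y_labeled
    # Setting the primal gradient to zero gives a symmetric PSD system
    #   H alpha = b,  H = K J K / l + gamma_A K + gamma_I K L K,
    # which conjugate gradient solves at O(n^2) cost per iteration.
    KJ = K * J[None, :]                    # K with unlabeled columns zeroed
    H = (KJ @ K) / n_labeled + gamma_A * K + gamma_I * (K @ L @ K)
    b = K @ (J * y_hat) / n_labeled
    # Capping CG iterations plays the role of a crude early-stopping rule.
    alpha, _ = cg(H, b, maxiter=200)
    return alpha
```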
Comparison of SVM Optimization Techniques in the Primal
This paper examines the efficacy of different optimization techniques in a
primal formulation of a support vector machine (SVM). Three main techniques are
compared. The dataset used to compare all three techniques was the Sentiment
Analysis on Movie Reviews dataset, from kaggle.com.
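The abstract does not name the three techniques, so the following is only a generic example of optimizing an SVM in the primal: a Pegasos-style stochastic subgradient method on the regularized hinge loss (the optional projection step of Pegasos is omitted):

```python
import numpy as np

def primal_svm_sgd(X, y, lam=1e-3, epochs=20, seed=0):
    """Stochastic subgradient descent on the primal SVM objective
        lam/2 * ||w||^2 + mean(max(0, 1 - y * (X @ w))).
    One common primal optimization technique; not necessarily one of the
    three compared in the paper."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, t = np.zeros(d), 0
    for _ in range(epochs):
        for i in rng.permutation(n):
            t += 1
            eta = 1.0 / (lam * t)          # standard Pegasos step size
            margin = y[i] * (X[i] @ w)
            w *= (1 - eta * lam)           # gradient of the L2 term
            if margin < 1:                 # hinge subgradient is active
                w += eta * y[i] * X[i]
    return w
```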
Representing data by sparse combination of contextual data points for classification
In this paper, we study the problem of using the contextual data points of a
data point for its classification. We propose to represent a data point
as the sparse linear reconstruction of its context, and to learn the sparse
context together with a linear classifier in a supervised way to increase
its discriminative ability. We propose a novel formulation for context
learning, modeling the learning of the context reconstruction coefficients and
the classifier in a unified objective. In this objective, the reconstruction
error is minimized and coefficient sparsity is encouraged. Moreover, the hinge
loss of the classifier is minimized and the complexity of the classifier is
reduced. This objective is optimized with an alternating strategy in an
iterative algorithm. Experiments on three benchmark data sets show its
advantage over state-of-the-art context-based data representation and
classification methods.
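A minimal sketch of the sparse-context idea, assuming Lasso for the reconstruction step and a linear hinge-loss classifier; the paper couples the two in one joint objective, whereas this decoupled version is for illustration only:

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.svm import LinearSVC

def sparse_context_codes(X, alpha=0.05):
    """Code each point as a sparse linear combination of the other points
    (its context); the coefficient vector becomes its new representation."""
    n = X.shape[0]
    S = np.zeros((n, n))
    for i in range(n):
        ctx = np.delete(np.arange(n), i)           # everything but x_i
        lasso = Lasso(alpha=alpha, max_iter=5000)  # sparse reconstruction
        lasso.fit(X[ctx].T, X[i])                  # rebuild x_i from context
        S[i, ctx] = lasso.coef_
    return S

# One half-step of the alternation: given fixed codes, fit the hinge-loss
# classifier. In the paper's joint objective the codes and classifier are
# updated in turn against each other rather than in this decoupled way.
X = np.random.randn(60, 10); y = np.sign(X[:, 0])
S = sparse_context_codes(X)
clf = LinearSVC().fit(S, y)
```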
Distributed Majorization-Minimization for Laplacian Regularized Problems
We consider the problem of minimizing a block separable convex function
(possibly nondifferentiable, and including constraints) plus Laplacian
regularization, a problem that arises in applications including model fitting,
regularizing stratified models, and multi-period portfolio optimization. We
develop a distributed majorization-minimization method for this general
problem, and derive a complete, self-contained, general, and simple proof of
convergence. Our method is able to scale to very large problems, and we
illustrate our approach on two applications, demonstrating its scalability and
accuracy.
Comment: 18 pages, 3 figures
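A toy sketch of the majorization-minimization step, under an assumed setup where each block objective is a least-squares term; the separable majorizer of the Laplacian term makes every block update independent, which is what enables the distributed implementation:

```python
import numpy as np

def mm_laplacian(A, b, edges, weights, n_blocks, d, iters=50):
    """Majorization-minimization for
        sum_i ||A[i] x_i - b[i]||^2 + sum_{(i,j)} w_ij ||x_i - x_j||^2.
    At the current iterate z the Laplacian term is majorized by the
    separable surrogate sum_{(i,j)} 2 w_ij (||x_i - m||^2 + ||x_j - m||^2)
    with m = (z_i + z_j)/2, which is tight at x = z. Each block update is
    then independent and could run on a separate worker."""
    x = np.zeros((n_blocks, d))
    for _ in range(iters):
        # Each block's surrogate needs only its total edge weight and the
        # weighted sum of edge midpoints, computable from neighbors alone.
        s = np.zeros(n_blocks)
        m_sum = np.zeros((n_blocks, d))
        for (i, j), w in zip(edges, weights):
            mid = 0.5 * (x[i] + x[j])
            s[i] += w; s[j] += w
            m_sum[i] += w * mid; m_sum[j] += w * mid
        # Independent (distributable) closed-form block updates; assumes
        # every block touches at least one edge or has full-rank A[i].
        for i in range(n_blocks):
            H = A[i].T @ A[i] + 2.0 * s[i] * np.eye(d)
            rhs = A[i].T @ b[i] + 2.0 * m_sum[i]
            x[i] = np.linalg.solve(H, rhs)
    return x
```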
Supervised learning of sparse context reconstruction coefficients for data representation and classification
Context of data points, which is usually defined as the other data points in
a data set, has been found to play important roles in data representation and
classification. In this paper, we study the problem of using the context of a
data point for its classification. Our work is inspired by the observation
that actually only very few data points are critical in the context of a data
point for its representation and classification. We propose to represent a data
point as the sparse linear combination of its context, and learn the sparse
context in a supervised way to increase its discriminative ability. To this
end, we propose a novel formulation for context learning, modeling the
learning of the context parameters and the classifier in a unified objective,
and optimizing it with an alternating strategy in an iterative algorithm.
Experiments on three benchmark data sets show its advantage over
state-of-the-art context-based data representation and classification methods.
Comment: arXiv admin note: substantial text overlap with arXiv:1507.0001
Semi-Supervised Representation Learning based on Probabilistic Labeling
In this paper, we present a new algorithm for semi-supervised representation
learning. In this algorithm, we first find a vector representation for the
labels of the data points based on their local positions in the space. Then, we
map the data to lower-dimensional space using a linear transformation such that
the dependency between the transformed data and the assigned labels is
maximized. In fact, we try to find a mapping that is as discriminative as
possible. The approach uses the Hilbert-Schmidt Independence Criterion (HSIC) as
the dependence measure. We also present a kernelized version of the algorithm,
which allows non-linear transformations and provides more flexibility in
finding the appropriate mapping. Using unlabeled data to learn a new
representation is not always beneficial, and no algorithm can
deterministically guarantee that exploiting unlabeled data will improve
performance. Therefore, we also propose a bound on the performance of the
algorithm, which can be used to determine the effectiveness of using the
unlabeled data in the algorithm. We demonstrate the ability of the algorithm in
finding the transformation using both toy examples and real-world datasets.
Comment: 8 pages, 7 figures
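For the linear (non-kernelized) case, one standard way to maximize HSIC-style dependence between a linear projection and a label kernel is an eigenvector problem; the sketch below follows that recipe, with the label kernel K_y left as an input since the paper builds it from the probabilistic label representation:

```python
import numpy as np

def hsic_linear_map(X, K_y, k):
    """Find a linear map U maximizing the (biased) empirical HSIC between
    the projected data X @ U and a label kernel K_y:
        max_{U^T U = I} tr(U^T X^T H K_y H X U),
    solved by the top-k eigenvectors of Q = X^T H K_y H X."""
    n = X.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    Q = X.T @ H @ K_y @ H @ X
    vals, vecs = np.linalg.eigh(Q)          # eigenvalues in ascending order
    return vecs[:, -k:]                     # top-k directions

# Usage with a simple label kernel built from any vector representation of
# the labels (e.g. one-hot rows in y_repr):
#   K_y = y_repr @ y_repr.T
#   U = hsic_linear_map(X, K_y, k=2)
#   Z = X @ U                               # lower-dimensional representation
```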
Data-dependent kernels in nearly-linear time
We propose a method to efficiently construct data-dependent kernels which can
make use of large quantities of (unlabeled) data. Our construction makes an
approximation in the standard construction of semi-supervised kernels in
Sindhwani et al. 2005. In typical cases these kernels can be computed in
nearly-linear time (in the amount of data), improving on the cubic time of the
standard construction, enabling large scale semi-supervised learning in a
variety of contexts. The methods are validated on semi-supervised and
unsupervised problems on data sets containing up to 64,000 sample points.
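For reference, here is a direct (cubic-time) implementation of the standard construction from Sindhwani et al. 2005 that this paper approximates, as I understand it; M is a graph regularizer such as a scaled Laplacian, and the dense linear solve is the bottleneck the approximation removes:

```python
import numpy as np

def deformed_kernel(k_x, k_z, k_xz, K, M):
    """Standard semi-supervised (deformed) kernel:
        k~(x, z) = k(x, z) - k_x^T (I + M K)^{-1} M k_z,
    where K is the Gram matrix over the n (mostly unlabeled) points,
    k_x and k_z are the kernel vectors of x and z against those points,
    and M is a graph regularizer (e.g. a scaled graph Laplacian). The
    (I + M K) solve is the O(n^3) step that the paper's approximation
    reduces to nearly-linear time."""
    n = K.shape[0]
    return k_xz - k_x @ np.linalg.solve(np.eye(n) + M @ K, M @ k_z)
```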
Scaling Active Search using Linear Similarity Functions
Active Search has become an increasingly useful tool in information retrieval
problems where the goal is to discover as many target elements as possible
using only limited label queries. With the advent of big data, there is a
growing emphasis on the scalability of such techniques to handle very large and
very complex datasets.
In this paper, we consider the problem of Active Search where we are given a
similarity function between data points. We look at an algorithm introduced by
Wang et al. [2013] for Active Search over graphs and propose crucial
modifications which allow it to scale significantly. Their approach selects
points by minimizing an energy function over the graph induced by the
similarity function on the data. Our modifications require the similarity
function to be a dot-product between feature vectors of data points, equivalent
to having a linear kernel for the adjacency matrix. With this, we are able to
scale tremendously: for n data points, the original algorithm runs in O(n^2)
time per iteration while ours runs in only O(nd^2) given d-dimensional
features.
We also describe a simple alternate approach using a weighted-neighbor
predictor which also scales well. In our experiments, we show that our method
is competitive with existing semi-supervised approaches. We also briefly
discuss conditions under which our algorithm performs well.
Comment: To be published as a conference paper at IJCAI 2017, 11 pages, 2 figures
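The kind of saving the linear-kernel assumption buys can be sketched with the Woodbury identity: a quadratic graph energy with adjacency W = X X^T leads to linear systems in (D - X X^T), which can be solved without ever forming the n x n similarity matrix. The matrix shapes here are illustrative, not the paper's exact update:

```python
import numpy as np

def woodbury_solve(Ddiag, X, b):
    """Solve (D - X X^T) f = b, with D = diag(Ddiag), via the Woodbury
    identity:
      (D - X X^T)^{-1} = D^{-1} + D^{-1} X (I - X^T D^{-1} X)^{-1} X^T D^{-1}.
    Cost is O(n d^2 + d^3) for n points with d-dimensional features,
    versus O(n^3) for a dense solve on the full similarity matrix."""
    d = X.shape[1]
    Dinv_b = b / Ddiag
    Dinv_X = X / Ddiag[:, None]
    core = np.eye(d) - X.T @ Dinv_X          # d x d capacitance matrix
    return Dinv_b + Dinv_X @ np.linalg.solve(core, X.T @ Dinv_b)
```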
A Laplacian-Based Approach for Finding Near Globally Optimal Solutions to OPF Problems
A semidefinite programming (SDP) relaxation globally solves many optimal
power flow (OPF) problems. For other OPF problems where the SDP relaxation only
provides a lower bound on the objective value rather than the globally optimal
decision variables, recent literature has proposed a penalization approach to
find feasible points that are often nearly globally optimal. A disadvantage of
this penalization approach is the need to specify penalty parameters. This
paper presents an alternative approach that algorithmically determines a
penalization appropriate for many OPF problems. The proposed approach
constrains the generation cost to be close to the lower bound from the SDP
relaxation. The objective function is specified using iteratively determined
weights for a Laplacian matrix. This approach yields feasible points to the OPF
problem that are guaranteed to have objective values near the global optimum
due to the constraint on generation cost. The proposed approach is demonstrated
on both small OPF problems and a variety of large test cases representing
portions of European power systems.
Comment: 11 pages, 4 figures
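A toy cvxpy sketch of the two-stage idea, with stand-in matrices for the cost and the iteratively reweighted Laplacian (actual OPF constraints and the paper's weight-update rule are omitted):

```python
import cvxpy as cp
import numpy as np

# Stage 1: solve an SDP relaxation for a lower bound on generation cost.
# Stage 2: re-solve with the cost constrained near that bound and a
# weighted-Laplacian objective that steers W toward a near-rank-1 point.
# C and L_w are illustrative stand-ins, not actual power-system data.
n = 4
rng = np.random.default_rng(0)
C = np.diag(rng.uniform(1, 2, n))            # stand-in cost matrix
A = rng.standard_normal((n, n)); A = A + A.T
L_w = np.diag(np.abs(A).sum(1)) - np.abs(A)  # stand-in weighted Laplacian

W = cp.Variable((n, n), symmetric=True)
base = [W >> 0, cp.diag(W) <= 1.1, cp.diag(W) >= 0.9]  # toy OPF-like limits
lower = cp.Problem(cp.Minimize(cp.trace(C @ W)), base).solve()

eps = 0.01   # keep generation cost within 1% of the relaxation bound
cp.Problem(cp.Minimize(cp.trace(L_w @ W)),
           base + [cp.trace(C @ W) <= (1 + eps) * lower]).solve()
```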
Multi-Label Learning with Global and Local Label Correlation
It is well-known that exploiting label correlations is important to
multi-label learning. Existing approaches either assume that the label
correlations are global and shared by all instances; or that the label
correlations are local and shared only by a data subset. In real-world
applications, both cases may occur: some label correlations apply globally
while others are shared only by a local group of instances.
Moreover, it is also a usual case that only partial labels are observed, which
makes the exploitation of the label correlations much more difficult. That is,
it is hard to estimate the label correlations when many labels are absent. In
this paper, we propose a new multi-label approach, GLOCAL, that deals with
both the full-label and missing-label cases, exploiting global and local label
correlations simultaneously by learning a latent label representation and
optimizing label manifolds. Extensive experimental studies validate the
effectiveness of our approach on both full-label and missing-label data.
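The latent-label ingredient can be sketched as a masked low-rank factorization of the label matrix; this omits GLOCAL's global and local manifold terms and is only a simplified illustration:

```python
import numpy as np

def latent_label_als(Y, mask, rank=5, lam=0.1, iters=30):
    """Alternating least squares for a latent label representation
    Y ~ U V under missing labels (mask[i, j] = 1 iff label observed).
    Captures only the low-rank / missing-label ingredient of GLOCAL;
    the global and local label-correlation terms are omitted."""
    n, m = Y.shape
    rng = np.random.default_rng(0)
    U = rng.standard_normal((n, rank)) * 0.1
    V = rng.standard_normal((rank, m)) * 0.1
    I = lam * np.eye(rank)
    for _ in range(iters):
        for i in range(n):                    # update each instance row
            o = mask[i].astype(bool)
            Vo = V[:, o]
            U[i] = np.linalg.solve(Vo @ Vo.T + I, Vo @ Y[i, o])
        for j in range(m):                    # update each label column
            o = mask[:, j].astype(bool)
            Uo = U[o]
            V[:, j] = np.linalg.solve(Uo.T @ Uo + I, Uo.T @ Y[o, j])
    return U, V          # predicted labels: sign(U @ V)
```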