Remote Sensing Image Classification with Large Scale Gaussian Processes
Current remote sensing image classification problems have to deal with an
unprecedented amount of heterogeneous and complex data sources. Upcoming
missions will soon provide large data streams that will make land cover/use
classification difficult. Machine learning classifiers can help with this, and
many methods are currently available. A popular kernel classifier is the
Gaussian process classifier (GPC), since it approaches the classification
problem with a solid probabilistic treatment, thus yielding confidence
intervals for the predictions as well as results that are competitive with
state-of-the-art neural networks and support vector machines. However, its
computational cost is prohibitive for large scale applications, and constitutes
the main obstacle precluding wide adoption. This paper tackles this problem by
introducing two novel efficient methodologies for Gaussian Process (GP)
classification. We first incorporate the standard random Fourier features
approximation into GPC, which largely decreases its computational cost and
permits large scale remote sensing image classification. In addition, we
propose a model which avoids randomly sampling a number of Fourier frequencies,
and instead learns the optimal ones within a variational Bayes approach.
The performance of the proposed methods is illustrated in complex problems of
cloud detection from multispectral imagery and infrared sounding data.
Excellent empirical results support the proposal in both computational cost and
accuracy.
Comment: 11 pages, 6 figures; accepted for publication in IEEE Transactions on Geoscience and Remote Sensing; added the IEEE copyright statement.
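As a rough illustration of the random Fourier features approximation this abstract builds on, the minimal sketch below approximates an RBF kernel with a fixed bandwidth; the frequencies W are sampled at random, whereas the paper's second method would learn them variationally (not shown). All names and default values here are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of random Fourier features (RFF) for an RBF kernel
# k(x, y) = exp(-||x - y||^2 / (2 * sigma^2)); sigma and the feature
# count are arbitrary choices for this example.
import numpy as np

def rff_features(X, n_features=500, sigma=1.0, rng=None):
    """Map X (n_samples, d) to a space where a linear model
    approximates an RBF-kernel model."""
    rng = np.random.default_rng(rng)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # random frequencies
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)       # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# Z @ Z.T then approximates the exact kernel matrix:
X = np.random.default_rng(0).normal(size=(100, 5))
Z = rff_features(X, n_features=2000, sigma=1.0, rng=0)
K_approx = Z @ Z.T
```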
Compact Nonlinear Maps and Circulant Extensions
Kernel approximation via nonlinear random feature maps is widely used in
speeding up kernel machines. There are two main challenges for the conventional
kernel approximation methods. First, before performing kernel approximation, a
good kernel has to be chosen. Picking a good kernel is a very challenging
problem in itself. Second, high-dimensional maps are often required in order to
achieve good performance. This leads to high computational cost in both
generating the nonlinear maps, and in the subsequent learning and prediction
process. In this work, we propose to optimize the nonlinear maps directly with
respect to the classification objective in a data-dependent fashion. The
proposed approach achieves kernel approximation and kernel learning in a joint
framework. This leads to much more compact maps without hurting the
performance. As a by-product, the same framework can also be used to achieve
more compact kernel maps to approximate a known kernel. We also introduce
Circulant Nonlinear Maps, which use a circulant-structured projection matrix
to speed up the nonlinear maps for high-dimensional data.
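The speedup from circulant structure comes from the FFT: a circulant matrix is defined by a single d-vector, so the projection costs O(d log d) instead of the O(d^2) of a dense map. The sketch below shows that trick; the random sign flipping and the cosine nonlinearity on top are common companions in this literature, and whether they match the paper's exact construction is an assumption.

```python
# Minimal sketch: apply a circulant projection via the FFT.
# circ(r) @ x equals the circular convolution r * x, computed as
# ifft(fft(r) * fft(x)).
import numpy as np

def circulant_project(X, r, signs):
    """Apply x -> circ(r) @ (signs * x) to each row of X using the FFT."""
    Xs = X * signs                                   # random sign flip
    return np.real(np.fft.ifft(np.fft.fft(Xs, axis=1) *
                               np.fft.fft(r)[None, :], axis=1))

rng = np.random.default_rng(0)
d = 8
X = rng.normal(size=(4, d))
r = rng.normal(size=d)                    # one vector defines the whole matrix
signs = rng.choice([-1.0, 1.0], size=d)
Z = np.cos(circulant_project(X, r, signs))  # nonlinear map on top
```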
Approximate Stochastic Subgradient Estimation Training for Support Vector Machines
Subgradient algorithms for training support vector machines have been quite
successful for solving large-scale and online learning problems. However, they
have been restricted to linear kernels and strongly convex formulations. This
paper describes efficient subgradient approaches without such limitations. Our
approaches make use of randomized low-dimensional approximations to nonlinear
kernels, and minimization of a reduced primal formulation using an algorithm
based on robust stochastic approximation, which does not require strong
convexity. Experiments illustrate that our approaches produce solutions with
prediction accuracy comparable to that of solutions obtained from existing SVM
solvers, but often in much shorter time. We also suggest efficient prediction
schemes that depend only on the dimension of kernel approximation, not on the
number of support vectors.
Comment: An extended version of the ICPRAM 2012 paper.
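A minimal sketch of this style of training, under two stated stand-ins: random Fourier features play the role of the "randomized low-dimensional approximations to nonlinear kernels", and plain averaged stochastic subgradient descent on the (non-strongly-convex) hinge loss plays the role of robust stochastic approximation. Step sizes and iteration counts are arbitrary.

```python
# Averaged stochastic subgradient descent on an unregularized hinge loss
# over approximate kernel features Z (n, D); labels y in {-1, +1}.
import numpy as np

def train_sgd_hinge(Z, y, n_iter=10000, step=0.1, seed=None):
    rng = np.random.default_rng(seed)
    n, D = Z.shape
    w = np.zeros(D)
    w_avg = np.zeros(D)
    for t in range(1, n_iter + 1):
        i = rng.integers(n)
        if y[i] * (Z[i] @ w) < 1.0:          # hinge-loss subgradient step
            w += step / np.sqrt(t) * y[i] * Z[i]
        w_avg += (w - w_avg) / t              # iterate averaging
    return w_avg

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 50))
y = np.sign(Z[:, 0] + 0.1 * rng.normal(size=200))
w = train_sgd_hinge(Z, y, seed=1)
# Prediction cost depends only on D, the feature dimension, not on a
# set of support vectors: the point made at the end of the abstract.
```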
Probabilistic classifiers with low rank indefinite kernels
Indefinite similarity measures can be frequently found in bio-informatics by
means of alignment scores, but are also common in other fields like shape
measures in image retrieval. Lacking an underlying vector space, the data are
given as pairwise similarities only. The few algorithms available for such data
do not scale to larger datasets. Focusing on probabilistic batch classifiers,
the Indefinite Kernel Fisher Discriminant (iKFD) and the Probabilistic
Classification Vector Machine (PCVM) are both effective algorithms for this
type of data, but with cubic complexity. Here we propose an extension of iKFD
and PCVM such that linear runtime and memory complexity is achieved for low
rank indefinite kernels. Employing the Nystr\"om approximation for indefinite
kernels, we also propose a new, almost parameter-free approach to identify the
landmarks, restricted to a supervised learning problem. Evaluations on several
larger similarity datasets from various domains show that the proposed methods
provide similar generalization capabilities while being easier to parametrize
and substantially faster for large-scale data.
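A minimal sketch of a Nystr\"om approximation that tolerates indefinite similarity matrices by keeping eigenvalue signs. Uniform landmark sampling replaces the paper's supervised landmark selection here, and the tanh similarity is just an example of an indefinite kernel; both are assumptions.

```python
# Nystrom factorization K ~ L diag(signs) L.T from an n x m block
# (all points vs. landmarks) and the small m x m landmark block.
import numpy as np

def nystrom(K_nm, K_mm):
    vals, vecs = np.linalg.eigh(K_mm)               # small eigendecomposition
    mag = np.abs(vals)                              # keep magnitudes...
    inv_sqrt = np.where(mag > 1e-10, mag, np.inf) ** -0.5
    L = K_nm @ (vecs * inv_sqrt)
    return L, np.sign(vals)                          # ...and signs separately

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
idx = rng.choice(500, size=50, replace=False)        # uniform landmarks
K_nm = np.tanh(X @ X[idx].T)                         # indefinite similarity
K_mm = np.tanh(X[idx] @ X[idx].T)
L, signs = nystrom(K_nm, K_mm)
K_hat = (L * signs) @ L.T                            # ~ K_nm K_mm^+ K_mn
```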
A literature survey of matrix methods for data science
Efficient numerical linear algebra is a core ingredient in many applications
across almost all scientific and industrial disciplines. With this survey we
want to illustrate that numerical linear algebra has played and is playing a
crucial role in enabling and improving data science computations with many new
developments being fueled by the availability of data and computing resources.
We highlight the role of various factorizations and the power of
changing the representation of the data, and we discuss topics such as
randomized algorithms, functions of matrices, and high-dimensional problems. We
briefly touch upon the role of techniques from numerical linear algebra used
within deep learning.
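As one concrete representative of the randomized algorithms such surveys cover, the sketch below is the well-known randomized range-finder recipe for a truncated SVD (in the spirit of Halko, Martinsson, and Tropp); which specific variants the survey treats is not stated here.

```python
# Randomized rank-k SVD: a few matrix products recover a near-optimal
# low-rank factorization of A.
import numpy as np

def randomized_svd(A, k, oversample=10, seed=None):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.normal(size=(n, k + oversample))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)                 # orthonormal range basis
    B = Q.T @ A                                    # small projected matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```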
Parallel Support Vector Machines in Practice
In this paper, we evaluate the performance of various parallel optimization
methods for Kernel Support Vector Machines on multicore CPUs and GPUs. In
particular, we provide the first comparison of algorithms with explicit and
implicit parallelization. Most existing parallel implementations for multi-core
or GPU architectures are based on explicit parallelization of Sequential
Minimal Optimization (SMO)---the programmers identified parallelizable
components and hand-parallelized them, specifically tuned for a particular
architecture. We compare these approaches with each other and with implicitly
parallelized algorithms---where the algorithm is expressed such that most of
the work is done within a few iterations of large dense linear algebra
operations. These can be computed with highly optimized libraries that are
carefully parallelized for a large variety of parallel platforms. We highlight
the advantages and disadvantages of both approaches and compare them on various
benchmark data sets. We find an approximate implicitly parallel algorithm which
is surprisingly efficient, permits a much simpler implementation, and leads to
unprecedented speedups in SVM training.
Comment: 10 pages.
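A minimal sketch of the "implicit parallelization" contrast drawn above: rather than hand-parallelizing SMO's scalar updates, express the dominant work as one large dense operation that a multithreaded BLAS parallelizes automatically. The RBF kernel block below is a generic example of such an operation, not code from the paper.

```python
# Dense RBF kernel block computed via one GEMM plus elementwise ops;
# the @ operator dispatches to whatever parallel BLAS NumPy is linked
# against, so no explicit threading code is needed.
import numpy as np

def rbf_kernel_block(X, Y, gamma=1.0):
    sq = (X**2).sum(1)[:, None] + (Y**2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))   # clamp tiny negatives
```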
Learning Data-adaptive Nonparametric Kernels
Traditional kernels or their combinations are often not sufficiently flexible
to fit the data in complicated practical tasks. In this paper, we present a
Data-Adaptive Nonparametric Kernel (DANK) learning framework by imposing an
adaptive matrix on the kernel/Gram matrix in an entry-wise manner. Since we
do not specify the formulation of the adaptive matrix, each entry in it can be
directly and flexibly learned from the data. Therefore, the solution space of
the learned kernel is greatly expanded, which makes DANK flexible enough to
adapt to the data. Specifically, the proposed kernel learning framework can be
seamlessly embedded into support vector machines (SVM) and support vector
regression (SVR), enlarging the margin between classes and reducing the
model's generalization error. Theoretically, we
demonstrate that the objective function of our devised model is
gradient-Lipschitz continuous. Thereby, the training process for kernel and
parameter learning in SVM/SVR can be efficiently optimized in a unified
framework. Further, to address the scalability issue in DANK, a
decomposition-based scalable approach is developed, of which the effectiveness
is demonstrated by both empirical studies and theoretical guarantees.
Experimentally, our method outperforms other representative kernel learning
based algorithms on various classification and regression benchmark datasets.
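A minimal sketch of the general idea, and explicitly not the paper's exact model: place a learnable entry-wise factor on a fixed base Gram matrix and fit it, together with classifier coefficients, against a differentiable loss. The exp() parametrization, the squared hinge objective, and the penalty keeping the kernel near its base are all assumptions made for a short example.

```python
# Entry-wise adaptive kernel sketch in PyTorch.
import torch

torch.manual_seed(0)
X = torch.randn(80, 4)
y = torch.sign(X[:, 0] + 0.2 * torch.randn(80))

K_base = torch.exp(-0.5 * torch.cdist(X, X) ** 2)  # fixed RBF Gram matrix
A = torch.zeros(80, 80, requires_grad=True)        # entry-wise adaptive matrix
alpha = torch.zeros(80, requires_grad=True)        # classifier coefficients

opt = torch.optim.Adam([A, alpha], lr=0.05)
for _ in range(200):
    opt.zero_grad()
    K = K_base * torch.exp((A + A.T) / 2)          # symmetric adaptation
    scores = K @ alpha
    loss = torch.clamp(1 - y * scores, min=0).pow(2).mean() \
           + 1e-2 * (A ** 2).mean()                # keep K close to K_base
    loss.backward()
    opt.step()
```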
Scalable Nonlinear AUC Maximization Methods
The area under the ROC curve (AUC) is a measure of interest in various
machine learning and data mining applications. It has been widely used to
evaluate classification performance on heavily imbalanced data. The kernelized
AUC maximization machines have established a superior generalization ability
compared to linear AUC machines because of their capability to model the
complex nonlinear structure underlying most real-world data. However, the high
training complexity renders the kernelized AUC machines infeasible for
large-scale data. In this paper, we present two nonlinear AUC maximization
algorithms that optimize pairwise linear classifiers over a finite-dimensional
feature space constructed via the k-means Nystr\"{o}m method. Our first
algorithm maximizes the AUC metric by optimizing a pairwise squared hinge loss
function using the truncated Newton method. However, the second-order batch AUC
maximization method becomes expensive to optimize for extremely massive
datasets. This motivates us to develop a first-order stochastic AUC maximization
algorithm that incorporates a scheduled regularization update and scheduled
averaging techniques to accelerate the convergence of the classifier.
Experiments on several benchmark datasets demonstrate that the proposed AUC
classifiers are more efficient than kernelized AUC machines while surpassing
or at least matching their AUC performance. The experiments also show that the
proposed stochastic AUC classifier
outperforms the state-of-the-art online AUC maximization methods in terms of
AUC classification accuracy.
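A minimal sketch of the pipeline described above: k-means centers act as Nystr\"om landmarks to build finite-dimensional features, and a linear scorer is then trained on a pairwise squared hinge loss over sampled positive/negative pairs. The plain SGD loop is a simple stand-in for the paper's truncated-Newton and scheduled-regularization solvers; all hyperparameters are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_nystrom_features(X, n_landmarks=30, gamma=1.0, seed=0):
    """Nystrom features using k-means centers as landmarks."""
    centers = KMeans(n_clusters=n_landmarks, n_init=10,
                     random_state=seed).fit(X).cluster_centers_
    K_nm = np.exp(-gamma * ((X[:, None] - centers[None]) ** 2).sum(-1))
    K_mm = np.exp(-gamma * ((centers[:, None] - centers[None]) ** 2).sum(-1))
    vals, vecs = np.linalg.eigh(K_mm)
    inv_sqrt = np.where(vals > 1e-10, vals, np.inf) ** -0.5
    return K_nm @ (vecs * inv_sqrt)

def train_pairwise_hinge(Z, y, n_iter=20000, step=0.05, seed=0):
    """SGD on a pairwise squared hinge loss: push positive scores
    above negative scores by a margin, which targets the AUC."""
    rng = np.random.default_rng(seed)
    pos, neg = np.where(y > 0)[0], np.where(y <= 0)[0]
    w = np.zeros(Z.shape[1])
    for t in range(1, n_iter + 1):
        diff = Z[rng.choice(pos)] - Z[rng.choice(neg)]
        margin = 1.0 - w @ diff
        if margin > 0:                     # squared hinge gradient step
            w += (step / np.sqrt(t)) * 2.0 * margin * diff
    return w
```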
Efficient Approximation Algorithms for String Kernel Based Sequence Classification
Sequence classification algorithms, such as SVM, require a definition of
distance (similarity) measure between two sequences. A commonly used notion of
similarity is the number of matches between $k$-mers ($k$-length subsequences)
in the two sequences. Extending this definition, by considering two $k$-mers to
match if their distance is at most $m$, yields better classification
performance. This, however, makes the problem computationally much more
complex. Known algorithms to compute this similarity have computational
complexity that renders them applicable only for small values of $k$ and $m$. In
this work, we develop novel techniques to efficiently and accurately estimate
the pairwise similarity score, which enables us to use much larger values of
$k$ and $m$, and get higher predictive accuracy. This opens up a broad avenue
of applying this classification approach to audio, images, and text sequences.
Our algorithm achieves excellent approximation performance with theoretical
guarantees. In the process we solve an open combinatorial problem, which was
posed as a major hindrance to the scalability of existing solutions. We give
analytical bounds on quality and runtime of our algorithm and report its
empirical performance on real-world biological and music sequence datasets.
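For concreteness, the brute-force reference implementation below computes the similarity the abstract defines: the number of $k$-mer pairs, one from each sequence, whose distance is at most $m$ (Hamming distance is assumed here). Its cost grows with the product of the sequence lengths, which is exactly what makes the paper's fast estimator necessary for large $k$ and $m$.

```python
# Exact (slow) k-mer mismatch similarity, for small inputs only.
def kmer_mismatch_similarity(s, t, k, m):
    def kmers(x):
        return [x[i:i + k] for i in range(len(x) - k + 1)]
    def hamming(a, b):
        return sum(c1 != c2 for c1, c2 in zip(a, b))
    return sum(1 for a in kmers(s) for b in kmers(t) if hamming(a, b) <= m)

# e.g. on short DNA fragments:
print(kmer_mismatch_similarity("ACGTACGT", "ACGAACGT", k=3, m=1))
```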
The Extreme Value Machine
It is often desirable to be able to recognize when inputs to a recognition
function learned in a supervised manner correspond to classes unseen at
training time. With this ability, new class labels could be assigned to these
inputs by a human operator, allowing them to be incorporated into the
recognition function --- ideally under an efficient incremental update
mechanism. While good algorithms that assume inputs from a fixed set of classes
exist, e.g., artificial neural networks and kernel machines, it is not
immediately obvious how to extend them to perform incremental learning in the
presence of unknown query classes. Existing algorithms take little to no
distributional information into account when learning recognition functions and
lack a strong theoretical foundation. We address this gap by formulating a
novel, theoretically sound classifier --- the Extreme Value Machine (EVM). The
EVM has a well-grounded interpretation derived from statistical Extreme Value
Theory (EVT), and is the first classifier to be able to perform nonlinear
kernel-free variable bandwidth incremental learning. Compared to other
classifiers in the same deep network derived feature space, the EVM is accurate
and efficient on an established benchmark partition of the ImageNet dataset.
Comment: Pre-print of a manuscript accepted to the IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) journal.
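A minimal sketch of the EVT intuition behind the EVM: for one anchor point, fit a Weibull distribution to the tail of half-distances to the nearest points of other classes, then score new inputs by the resulting probability of inclusion. The tail size and floc=0 choice are assumptions, and the full EVM (model reduction, incremental updates, multi-anchor classes) is not shown.

```python
import numpy as np
from scipy.stats import weibull_min

def fit_psi(anchor, negatives, tail=10):
    """Fit Weibull shape/scale to the smallest half-distances
    from the anchor to points of other classes."""
    d = np.linalg.norm(negatives - anchor, axis=1) / 2.0
    shape, _, scale = weibull_min.fit(np.sort(d)[:tail], floc=0)
    return shape, scale

def psi(x, anchor, shape, scale):
    """Probability that x belongs with the anchor's class."""
    d = np.linalg.norm(x - anchor)
    return np.exp(-(d / scale) ** shape)

rng = np.random.default_rng(0)
pos = rng.normal(0, 1, size=(50, 2))
neg = rng.normal(4, 1, size=(50, 2))
shape, scale = fit_psi(pos[0], neg)
print(psi(np.array([0.5, 0.0]), pos[0], shape, scale))  # high: looks known
print(psi(np.array([4.0, 4.0]), pos[0], shape, scale))  # near zero: unseen
```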