Multi-Task Kernel Null-Space for One-Class Classification
The one-class kernel spectral regression (OC-KSR) approach, the regression-based
formulation of the kernel null-space method, has been found to be an effective
Fisher criterion-based methodology for one-class classification (OCC),
achieving state-of-the-art performance while providing relatively high
robustness against data corruption. This work extends
the OC-KSR methodology to a multi-task setting where multiple one-class
problems share information for improved performance. By viewing the multi-task
structure learning problem as one of compositional function learning, first,
the OC-KSR method is extended to learn multiple tasks' structure
\textit{linearly} by posing it as an instantiation of the separable kernel
learning problem in a vector-valued reproducing kernel Hilbert space where an
output kernel encodes tasks' structure while another kernel captures input
similarities. Next, a non-linear structure learning mechanism is proposed which
captures multiple tasks' relationships \textit{non-linearly} via an output
kernel. The non-linear structure learning method is then extended to a sparse
setting where different tasks compete in an output composition mechanism,
leading to a sparse non-linear structure among multiple problems. Through
extensive experiments on different data sets, the merits of the proposed
multi-task kernel null-space techniques are verified against the baseline as
well as other existing multi-task one-class learning techniques.
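The linear structure-learning step casts multi-task OC-KSR as separable kernel learning: the joint kernel factorizes into an output kernel encoding task relationships and an input kernel capturing input similarities. A minimal numpy sketch of such a separable (Kronecker) kernel; the bandwidth and task-similarity values are illustrative, not taken from the paper:

```python
import numpy as np

def rbf_gram(X, gamma=1.0):
    """RBF (Gaussian) Gram matrix capturing input similarities."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))          # 5 training points, 3 features
K = rbf_gram(X)                      # input kernel (5 x 5)
B = np.array([[1.0, 0.6],            # output kernel encoding the
              [0.6, 1.0]])           # relationship between 2 tasks
K_joint = np.kron(B, K)              # separable vector-valued kernel
print(K_joint.shape)                 # (10, 10)
```

Since both factors are positive semi-definite, the Kronecker product is a valid kernel for the vector-valued RKHS.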
A Benchmark to Select Data Mining Based Classification Algorithms For Business Intelligence And Decision Support Systems
Decision support systems (DSS) serve the management, operations, and planning
levels of an organization and help in making decisions that may change rapidly
and are not easily specified in advance. Data mining plays a vital role in
extracting the information a DSS needs for decision making; integrating the two
can improve performance and enable new types of problems to be tackled.
Artificial intelligence methods are improving the quality of decision support
and have become embedded in many applications, ranging from anti-lock
automobile brakes to today's interactive search engines, and they provide
various machine learning techniques to support data mining. Classification is
one of the main and most valuable data mining tasks. Several types of
classification algorithms have been proposed, tested, and compared for
predicting trends from unseen data, yet no single algorithm has been found to be
superior over all others for all data sets. The objective of this paper is to
compare various classification algorithms that have been frequently used in
data mining for decision support systems. Three decision tree-based algorithms,
one artificial neural network, one statistical algorithm, one support vector
machine (with and without AdaBoost), and one clustering algorithm are tested
and compared on four data sets from different domains in terms of predictive
accuracy, error rate, classification index, comprehensibility, and training
time. Experimental results demonstrate that Genetic Algorithm (GA) and support
vector machine based algorithms are better in terms of predictive accuracy.
SVM without AdaBoost should be the first choice when both speed and predictive
accuracy matter; AdaBoost improves the accuracy of SVM, but at the cost of long
training time.

Comment: 18 Pages, 11 Figures, 6 Tables, Journal
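A benchmark of this kind boils down to fitting several classifiers on the same splits and comparing predictive accuracy. A toy sketch with two simple stand-in classifiers (nearest centroid and 1-nearest-neighbour) on synthetic data; the paper itself compares decision trees, a neural network, SVMs, and clustering:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two well-separated Gaussian classes: 100 train / 100 test points each
Xtr = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
ytr = np.repeat([0, 1], 100)
Xte = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
yte = np.repeat([0, 1], 100)

def nearest_centroid(Xtr, ytr, Xte):
    """Classify each test point by its closest class centroid."""
    cents = np.stack([Xtr[ytr == c].mean(axis=0) for c in (0, 1)])
    d = ((Xte[:, None, :] - cents[None]) ** 2).sum(-1)
    return d.argmin(axis=1)

def one_nn(Xtr, ytr, Xte):
    """Classify each test point by its single nearest training point."""
    d = ((Xte[:, None, :] - Xtr[None]) ** 2).sum(-1)
    return ytr[d.argmin(axis=1)]

acc_nc = (nearest_centroid(Xtr, ytr, Xte) == yte).mean()
acc_nn = (one_nn(Xtr, ytr, Xte) == yte).mean()
print(acc_nc, acc_nn)
```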
High Dimensional Linear Regression using Lattice Basis Reduction
We consider a high dimensional linear regression problem where the goal is to
efficiently recover an unknown vector $\beta^*$ from noisy linear
observations $Y = X\beta^* + W$, for known $X$ and unknown noise $W$. Unlike
most of the literature on this model we make no sparsity assumption on
$\beta^*$. Instead we adopt a regularization based on assuming that the
underlying vectors $\beta^*$ have rational entries with the same denominator
$Q$. We call this the $Q$-rationality assumption.

We propose a new polynomial-time algorithm for this task which is based on
the seminal Lenstra-Lenstra-Lovasz (LLL) lattice basis reduction algorithm. We
establish that under the $Q$-rationality assumption, our algorithm recovers
exactly the vector $\beta^*$ for a large class of distributions for the iid
entries of $X$ and non-zero noise $W$. We prove that it is successful under
small noise, even when the learner has access to only one observation ($m=1$).
Furthermore, we prove that in the case of Gaussian white noise for $W$, and
for $Q$ sufficiently large, our algorithm tolerates
a nearly optimal information-theoretic level of the noise
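A full LLL-based solver is beyond a short example, but the role of the $Q$-rationality assumption can be illustrated by brute force: when the entries of $\beta^*$ are known to be rationals with denominator $Q$, even a single noisy observation typically identifies $\beta^*$ exactly among the finitely many candidates. This toy enumeration is not the paper's algorithm, which achieves the same in polynomial time via lattice basis reduction:

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
Q, n = 5, 3                              # denominator and dimension
beta_true = np.array([1, 4, 2]) / Q      # entries p_i / Q, p_i in {0..Q}
x = rng.normal(size=n)                   # a single observation row (m = 1)
y = x @ beta_true + 1e-6 * rng.normal()  # tiny noise

# Enumerate all (Q+1)^n candidate numerator vectors, pick the best fit.
best = min(itertools.product(range(Q + 1), repeat=n),
           key=lambda p: abs(y - x @ (np.array(p) / Q)))
beta_hat = np.array(best) / Q
print(beta_hat)
```

With generic (continuous) $x$, distinct candidates give distinct inner products, so small noise cannot flip the minimizer; the exponential cost of this enumeration is exactly what LLL avoids.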
Online Hyperparameter-Free Sparse Estimation Method
In this paper we derive an online estimator for sparse parameter vectors
which, unlike the LASSO approach, does not require the tuning of any
hyperparameters. The algorithm is based on a covariance matching approach and
is equivalent to a weighted version of the square-root LASSO. The computational
complexity of the estimator is of the same order as that of the online versions
of regularized least-squares (RLS) and LASSO. We provide a numerical comparison
with feasible and infeasible implementations of the LASSO and RLS to illustrate
the advantage of the proposed online hyperparameter-free estimator.
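The square-root LASSO that the estimator is equivalent to (in a weighted form) minimizes $\|y - X\beta\|_2 + \lambda\|\beta\|_1$. A minimal proximal-gradient sketch of the unweighted version; the fixed step size and $\lambda$ are illustrative and this is not the paper's covariance-matching recursion:

```python
import numpy as np

def soft(v, t):
    """Soft-thresholding: proximal operator of the l1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(3)
n, p = 40, 10
beta_true = np.zeros(p); beta_true[0], beta_true[1] = 2.0, -3.0
X = rng.normal(size=(n, p))
y = X @ beta_true + 0.1 * rng.normal(size=n)

lam, step = 0.5, 0.01
obj = lambda b: np.linalg.norm(y - X @ b) + lam * np.abs(b).sum()

beta = np.zeros(p)
for _ in range(500):
    r = y - X @ beta
    grad = -X.T @ r / np.linalg.norm(r)   # gradient of the sqrt-loss term
    beta = soft(beta - step * grad, step * lam)
print(obj(beta), obj(np.zeros(p)))
```

Note the un-squared loss: its gradient is scale-free in the residual, which is what makes the method pivotal (no noise-level-dependent tuning).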
Predicting the Future Behavior of a Time-Varying Probability Distribution
We study the problem of predicting the future, though only in the
probabilistic sense of estimating a future state of a time-varying probability
distribution. This is not only an interesting academic problem, but solving
this extrapolation problem also has many practical applications, e.g. for
training classifiers that have to operate under time-varying conditions. Our
main contribution is a method for predicting the next step of the time-varying
distribution from a given sequence of sample sets from earlier time steps. For
this we rely on two recent machine learning techniques: embedding probability
distributions into a reproducing kernel Hilbert space, and learning operators
by vector-valued regression. We illustrate the working principles and the
practical usefulness of our method by experiments on synthetic and real data.
We also highlight an exemplary application: training a classifier in a domain
adaptation setting without having access to examples from the test time
distribution at training time.
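A simple rendering of the two ingredients: each sample set is summarized by an empirical kernel mean embedding (here evaluated on a fixed landmark grid), and an operator mapping consecutive embeddings is learned by ridge regression. The landmarks, bandwidth, and drift model are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
landmarks = np.linspace(-2, 8, 30)            # fixed evaluation grid

def embed(sample, gamma=0.5):
    """Empirical kernel mean embedding evaluated at the landmarks."""
    d2 = (sample[:, None] - landmarks[None, :]) ** 2
    return np.exp(-gamma * d2).mean(axis=0)

# A 1-D distribution whose mean drifts by 0.5 per time step.
T = 12
mus = [embed(rng.normal(0.5 * t, 1.0, size=200)) for t in range(T)]

# Ridge-regress a linear operator A with  mu_{t+1} ~= A mu_t.
Phi = np.stack(mus[:-1]); Psi = np.stack(mus[1:])
A = np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(30), Phi.T @ Psi).T
mu_pred = A @ mus[-1]                         # forecast embedding at t = T
print(mu_pred.shape)
```

The forecast lives in embedding space; downstream tasks (e.g. training a classifier for the future distribution) can consume it directly without ever sampling from the predicted distribution.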
Kernels on Sample Sets via Nonparametric Divergence Estimates
Most machine learning algorithms, such as classification or regression, treat
the individual data point as the object of interest. Here we consider extending
machine learning algorithms to operate on groups of data points. We suggest
treating a group of data points as an i.i.d. sample set from an underlying
feature distribution for that group. Our approach employs kernel machines with
a kernel on i.i.d. sample sets of vectors. We define certain kernel functions
on pairs of distributions, and then use a nonparametric estimator to
consistently estimate those functions based on sample sets. The projection of
the estimated Gram matrix to the cone of symmetric positive semi-definite
matrices enables us to use kernel machines for classification, regression,
anomaly detection, and low-dimensional embedding in the space of distributions.
We present several numerical experiments both on real and simulated datasets to
demonstrate the advantages of our new approach.

Comment: Substantially updated version as submitted to T-PAMI. 15 pages
including appendix.
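The PSD projection step mentioned above is simple to state: symmetrize the estimated Gram matrix and clip its negative eigenvalues. A sketch with a small hand-made indefinite matrix standing in for a matrix of nonparametrically estimated kernel values:

```python
import numpy as np

def project_psd(G):
    """Project a matrix onto the PSD cone: symmetrize, then clip
    negative eigenvalues to zero."""
    G = (G + G.T) / 2.0                      # symmetrize first
    w, V = np.linalg.eigh(G)
    return V @ np.diag(np.maximum(w, 0.0)) @ V.T

# An "estimated" Gram matrix that is slightly indefinite, as can
# happen when kernel values are replaced by sample-based estimates.
G_est = np.array([[1.0, 0.9, 0.2],
                  [0.9, 1.0, 0.9],
                  [0.2, 0.9, 1.0]])
G_psd = project_psd(G_est)
print(np.linalg.eigvalsh(G_psd).min())
```

After projection the matrix is a valid Gram matrix and can be handed to any kernel machine (SVM, kernel ridge, kernel PCA).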
Distance-based analysis of variance: approximate inference and an application to genome-wide association studies
In several modern applications, ranging from genetics to genomics and
neuroimaging, there is a need to compare observations across different
populations, such as groups of healthy and diseased individuals. The interest
is in detecting a group effect. When the observations are vectorial,
real-valued and follow a multivariate Normal distribution, multivariate
analysis of variance (MANOVA) tests are routinely applied. However, such
traditional procedures are not suitable when dealing with more complex data
structures such as functional (e.g. curves) or graph-structured (e.g. trees and
networks) objects, where the required distributional assumptions may be
violated. In this paper we discuss a distance-based MANOVA-like approach, the
DBF test, for detecting differences between groups for a wider range of data
types. The test statistic, analogously to other distance-based statistics, only
relies on a suitably chosen distance measure that captures the pairwise
dissimilarity among all available samples. An approximate null probability
distribution of the DBF statistic is proposed thus allowing inferences to be
drawn without the need for costly permutation procedures. Through extensive
simulations we provide evidence that the proposed methodology works well for a
range of data types and distances, and generalizes the traditional MANOVA
tests. We also report on an application of the proposed methodology for the
analysis of a multi-locus genome-wide association study of Alzheimer's disease,
which has been carried out using several genetic distance measures.
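A generic distance-based pseudo-F statistic of the kind such tests build on can be computed directly from the pairwise distance matrix via Gower centring; the exact DBF statistic and its approximate null distribution differ, so this is only an illustration of the distance-based principle:

```python
import numpy as np

def pseudo_f(D, labels):
    """Pseudo-F statistic from a pairwise distance matrix D,
    in the spirit of distance-based MANOVA-like tests."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    G = -0.5 * J @ (D ** 2) @ J              # Gower-centred inner products
    groups = np.unique(labels)
    k = len(groups)
    H = np.zeros((n, n))                     # group-membership "hat" matrix
    for g in groups:
        idx = labels == g
        H[np.ix_(idx, idx)] = 1.0 / idx.sum()
    num = np.trace(H @ G @ H) / (k - 1)
    den = np.trace((np.eye(n) - H) @ G @ (np.eye(n) - H)) / (n - k)
    return num / den

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 1, (20, 4)), rng.normal(2, 1, (20, 4))])
labels = np.repeat([0, 1], 20)
D = np.sqrt(((X[:, None] - X[None]) ** 2).sum(-1))  # Euclidean distances
F = pseudo_f(D, labels)
print(F)
```

Because only D enters, the same statistic applies to curves, trees, or networks once a suitable distance is chosen; the paper's contribution is the approximate null that avoids permutation.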
Vector-Valued Graph Trend Filtering with Non-Convex Penalties
This work studies the denoising of piecewise smooth graph signals that
exhibit inhomogeneous levels of smoothness over a graph, where the value at
each node can be vector-valued. We extend the graph trend filtering framework
to denoising vector-valued graph signals with a family of non-convex
regularizers, which exhibit superior recovery performance over existing convex
regularizers. Using an oracle inequality, we establish the statistical error
rates of first-order stationary points of the proposed non-convex method for
generic graphs. Furthermore, we present an ADMM-based algorithm to solve the
proposed method and establish its convergence. Numerical experiments are
conducted on both synthetic and real-world data for denoising, support
recovery, event detection, and semi-supervised classification.

Comment: The first two authors contributed equally.
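For intuition, here is a graph-trend-filtering-style denoiser on a chain graph with the convex $\ell_1$ penalty, solved by ADMM; the paper's non-convex penalties would modify only the z-update (the thresholding step). All constants are illustrative:

```python
import numpy as np

# Denoise a piecewise-constant signal on a chain graph by minimizing
#   0.5 * ||y - beta||^2 + lam * ||D beta||_1   via ADMM.
rng = np.random.default_rng(6)
n = 40
truth = np.concatenate([np.zeros(n // 2), 3.0 * np.ones(n // 2)])
y = truth + 0.5 * rng.normal(size=n)

D = np.eye(n - 1, n, k=1) - np.eye(n - 1, n)  # graph difference operator
lam, rho = 1.0, 1.0
soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

beta = y.copy()
z = D @ beta
u = np.zeros(n - 1)
M = np.linalg.inv(np.eye(n) + rho * D.T @ D)  # cached for the beta-update
for _ in range(200):
    beta = M @ (y + rho * D.T @ (z - u))      # beta-update (quadratic)
    z = soft(D @ beta + u, lam / rho)         # z-update (prox of l1)
    u = u + D @ beta - z                      # dual update
print(((beta - truth) ** 2).sum(), ((y - truth) ** 2).sum())
```

Swapping the soft-threshold for the proximal map of, e.g., SCAD or MCP gives the non-convex variants, which reduce the bias that the $\ell_1$ penalty puts on large jumps.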
A Unified SVM Framework for Signal Estimation
This paper presents a unified framework to tackle estimation problems in
Digital Signal Processing (DSP) using Support Vector Machines (SVMs). The use
of SVMs in estimation problems has been traditionally limited to its mere use
as a black-box model. Noting such limitations in the literature, we take
advantage of several properties of Mercer's kernels and functional analysis to
develop a family of SVM methods for estimation in DSP. Three types of signal
model equations are analyzed. First, when a specific time-signal structure is
assumed to model the underlying system that generated the data, the linear
signal model (the so-called Primal Signal Model formulation) is stated and
analyzed. Then, non-linear versions of the signal structure can be readily
developed by following two different approaches. On the one hand, the signal
model equation is written in reproducing kernel Hilbert spaces (RKHS) using the
well-known RKHS Signal Model formulation, and Mercer's kernels are readily used
in SVM non-linear algorithms. On the other hand, in the alternative and not so
common Dual Signal Model formulation, a signal expansion is made by using an
auxiliary signal model equation given by a non-linear regression of each time
instant in the observed time series. These building blocks can be used to
generate different novel SVM-based methods for problems of signal estimation,
and we deal with several of the most important ones in DSP. We illustrate the
usefulness of this methodology by defining SVM algorithms for linear and
non-linear system identification, spectral analysis, nonuniform interpolation,
sparse deconvolution, and array processing. The performance of the developed
SVM methods is compared to standard approaches in all these settings. The
experimental results illustrate the generality, simplicity, and capabilities of
the proposed SVM framework for DSP.

Comment: 22 pages, 13 figures. Digital Signal Processing, 201
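As a stand-in for the SVM machinery, the RKHS signal-model idea can be illustrated with kernel ridge regression for non-linear system identification; an SVR would replace the squared loss with an ε-insensitive one, but the kernel expansion of the estimate is the same in form:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.sort(rng.uniform(-3, 3, size=80))
y = np.sinc(x) + 0.05 * rng.normal(size=80)   # noisy non-linear "system"

def rbf(a, b, gamma=2.0):
    """RBF (Mercer) kernel between two sets of 1-D inputs."""
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

K = rbf(x, x)
alpha = np.linalg.solve(K + 0.1 * np.eye(80), y)  # ridge-regularized fit
x_new = np.linspace(-3, 3, 200)
y_hat = rbf(x_new, x) @ alpha                     # RKHS signal model
mse = np.mean((y_hat - np.sinc(x_new)) ** 2)
print(mse)
```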
Multi-view Vector-valued Manifold Regularization for Multi-label Image Classification
In computer vision, image datasets used for classification are naturally
associated with multiple labels and comprised of multiple views, because each
image may contain several objects (e.g. pedestrian, bicycle and tree) and is
properly characterized by multiple visual features (e.g. color, texture and
shape). Currently available tools ignore either the label relationships or the
complementarity of the views. Motivated by the success of vector-valued
functions that construct matrix-valued kernels to explore the multi-label structure in the
output space, we introduce multi-view vector-valued manifold regularization
(MVMR) to integrate multiple features. MVMR exploits
the complementary property of different features and discovers the intrinsic
local geometry of the compact support shared by different features under the
theme of manifold regularization. We conducted extensive experiments on two
challenging but popular datasets, PASCAL VOC'07 (VOC) and MIR Flickr (MIR),
and validated the effectiveness of the proposed MVMR for image
classification.
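Manifold regularization of the kind MVMR builds on augments a kernel ridge fit with a graph-Laplacian penalty that enforces smoothness along the data manifold; multiple views can be handled by combining per-view kernels. A single-label numpy sketch with illustrative weights and parameters, not the vector-valued MVMR solver itself:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 60
X = rng.normal(size=(n, 2))
y = np.sign(X[:, 0])                      # a simple binary label

def rbf_gram(X, gamma):
    sq = (X ** 2).sum(1)
    return np.exp(-gamma * (sq[:, None] + sq[None] - 2 * X @ X.T))

# Two "views": here, the same points seen through two kernel bandwidths.
K = 0.5 * rbf_gram(X, 0.5) + 0.5 * rbf_gram(X, 2.0)

# k-NN graph Laplacian encoding the intrinsic local geometry.
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
W = np.zeros((n, n))
for i in range(n):
    for j in np.argsort(d2[i])[1:6]:      # 5 nearest neighbours
        W[i, j] = W[j, i] = 1.0
L = np.diag(W.sum(1)) - W

# Solve (K + gA I + gI L K) alpha = y  (Laplacian-regularized LS).
gamma_A, gamma_I = 0.1, 0.1
alpha = np.linalg.solve(K + gamma_A * np.eye(n) + gamma_I * L @ K, y)
pred = np.sign(K @ alpha)
acc = (pred == y).mean()
print(acc)
```

MVMR replaces the scalar kernel by a matrix-valued one so that all labels are predicted jointly, with the same Laplacian-penalized least-squares structure.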