Proposing a Localized Relevance Vector Machine for Pattern Classification
The relevance vector machine (RVM) can be seen as a probabilistic version of
the support vector machine that produces sparse solutions by linearly
weighting a small number of basis functions instead of using all of them.
Despite merits of the RVM such as probabilistic predictions and relaxed
parameter tuning, it predicts poorly for test instances that are far away
from the relevance vectors. As a solution, we propose a new combination of
the RVM and the k-nearest neighbor (k-NN) rule that resolves this issue by
dealing with every test instance regionally. In our setting, we obtain the
relevance vectors for each test instance within the local area given by the
k-NN rule. In this way, the relevance vectors are closer and more relevant
to the test instance, which results in a more accurate model. This can be
seen as a piecewise learner that classifies test instances locally; the
model is hence called the localized relevance vector machine (LRVM). The
LRVM is examined on several datasets from the University of California,
Irvine (UCI) repository. Results supported by statistical tests indicate
that the performance of the LRVM is competitive with a few state-of-the-art
classifiers.
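The localized idea sketched in this abstract can be illustrated briefly: for each test instance, restrict training to its k nearest neighbors and fit a sparse Bayesian model there. The sketch below is not the authors' implementation; `ARDRegression` from scikit-learn serves as a stand-in for an RVM (both use automatic relevance determination priors), and all dataset and parameter choices are illustrative.

```python
# Minimal sketch of local learning: fit a sparse Bayesian model per test
# instance on its k-NN neighbourhood. ARDRegression is a stand-in for an
# RVM; k and the dataset are illustrative assumptions.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import ARDRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors

X, y = load_iris(return_X_y=True)
mask = y < 2                       # keep two classes for a +/-1 encoding
X, y = X[mask], np.where(y[mask] == 0, -1.0, 1.0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

k = 25
nn = NearestNeighbors(n_neighbors=k).fit(X_tr)

preds = []
for x in X_te:
    _, idx = nn.kneighbors(x.reshape(1, -1))        # local neighbourhood
    local = ARDRegression().fit(X_tr[idx[0]], y_tr[idx[0]])
    preds.append(np.sign(local.predict(x.reshape(1, -1))[0]))

accuracy = np.mean(np.array(preds) == y_te)
print(f"local model accuracy: {accuracy:.2f}")
```

Because each model is trained only on nearby points, the basis functions it retains are, by construction, close to the instance being classified.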
A Survey on Multi-View Clustering
With advances in information acquisition technologies, multi-view data become
ubiquitous. Multi-view learning has thus become more and more popular in
machine learning and data mining. Multi-view unsupervised and
semi-supervised learning, such as co-training and co-regularization, have
gained considerable attention. Although multi-view clustering (MVC) methods
have recently developed rapidly, no survey has yet summarized and analyzed
the current progress. This paper therefore reviews the common strategies for
combining multiple views of data and, based on this summary, proposes a
novel taxonomy of MVC approaches. We further discuss the relationships
between MVC and multi-view representation, ensemble clustering, multi-task
clustering, and multi-view supervised and semi-supervised learning. Several
representative real-world applications are elaborated. To promote the future
development of MVC, we envision several open problems that may require
further investigation and thorough examination.
Comment: 17 pages, 4 figures
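One of the simplest combination strategies a survey like this covers is late fusion: build a similarity matrix per view, fuse them, and cluster the fused matrix. The sketch below uses synthetic two-view data and an averaged RBF affinity; the views, kernel bandwidth, and fusion rule are all illustrative assumptions, not a method from the survey.

```python
# Minimal late-fusion sketch: average per-view affinity matrices, then run
# spectral clustering on the fused affinity. Both "views" are synthetic.
import numpy as np
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.RandomState(0)
X1, y = make_blobs(n_samples=150, centers=3, random_state=0)
X2 = X1 @ rng.randn(2, 2) + 0.5 * rng.randn(150, 2)   # noisy second view

A = 0.5 * (rbf_kernel(X1, gamma=0.1) + rbf_kernel(X2, gamma=0.1))
labels = SpectralClustering(n_clusters=3, affinity="precomputed",
                            random_state=0).fit_predict(A)
print(f"ARI: {adjusted_rand_score(y, labels):.2f}")
```

Averaging affinities is only one point in the design space; much of the MVC literature concerns smarter fusion, e.g. weighting views by quality or co-regularizing per-view clusterings.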
Efficient Pairwise Learning Using Kernel Ridge Regression: an Exact Two-Step Method
Pairwise learning or dyadic prediction concerns the prediction of properties
for pairs of objects. It can be seen as an umbrella covering various machine
learning problems such as matrix completion, collaborative filtering,
multi-task learning, transfer learning, network prediction and zero-shot
learning. In this work we analyze kernel-based methods for pairwise learning,
with a particular focus on a recently suggested two-step method. We show that
this method offers an appealing alternative to commonly applied
Kronecker-based methods that model dyads by means of pairwise feature
representations and pairwise kernels. In a series of theoretical results, we
establish correspondences between the two types of methods in terms of linear
algebra and spectral filtering, and we analyze their statistical consistency.
In addition, the two-step method allows us to establish novel algorithmic
shortcuts for efficient training and validation on very large datasets. Putting
those properties together, we believe that this simple, yet powerful method can
become a standard tool for many problems. Extensive experimental results for a
range of practical settings are reported.
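The two-step construction can be written down compactly: first kernel ridge regression over the row objects, then a second ridge regression over the column objects applied to the step-one predictions. The numpy sketch below uses linear kernels and a synthetic low-rank label matrix; all sizes and regularization constants are illustrative assumptions.

```python
# Minimal sketch of two-step kernel ridge regression for dyadic labels Y:
#   step 1: F1 = K_u (K_u + lam_u I)^{-1} Y        (ridge over rows)
#   step 2: F  = F1 (K_v + lam_v I)^{-1} K_v       (ridge over columns)
import numpy as np

rng = np.random.RandomState(0)
n_u, n_v = 30, 20
U = rng.randn(n_u, 4)              # features of the first object type
V = rng.randn(n_v, 3)              # features of the second object type
Y = U @ rng.randn(4, 3) @ V.T      # dyadic labels with low-rank structure

K_u, K_v = U @ U.T, V @ V.T        # linear kernels, for illustration
lam_u, lam_v = 1e-2, 1e-2

step1 = K_u @ np.linalg.solve(K_u + lam_u * np.eye(n_u), Y)
F = np.linalg.solve(K_v + lam_v * np.eye(n_v), step1.T).T @ K_v

rmse = np.sqrt(np.mean((F - Y) ** 2))
print(f"training RMSE: {rmse:.4f}")
```

Because each step involves only one kernel matrix, model selection can reuse the eigendecompositions of K_u and K_v separately, which is the source of the algorithmic shortcuts the abstract mentions.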
Multiple Kernel Learning and Automatic Subspace Relevance Determination for High-dimensional Neuroimaging Data
Alzheimer's disease is a major cause of dementia. Its diagnosis requires
accurate biomarkers that are sensitive to disease stages. In this respect, we
regard probabilistic classification as a method of designing a probabilistic
biomarker for disease staging. Probabilistic biomarkers naturally support the
interpretation of decisions and evaluation of uncertainty associated with them.
In this paper, we obtain probabilistic biomarkers via Gaussian Processes.
Gaussian Processes enable probabilistic kernel machines that offer flexible
means to accomplish Multiple Kernel Learning. Exploiting this flexibility, we
propose a new variation of Automatic Relevance Determination and tackle the
challenges of high dimensionality through multiple kernels. Our research
results demonstrate that the Gaussian Process models are competitive with or
better than the well-known Support Vector Machine in terms of classification
performance even in the cases of single kernel learning. Extending the basic
scheme towards the Multiple Kernel Learning, we improve the efficacy of the
Gaussian Process models and their interpretability in terms of the known
anatomical correlates of the disease. For instance, the disease pathology
starts in and around the hippocampus and entorhinal cortex. Through the use of
Gaussian Processes and Multiple Kernel Learning, we automatically and
efficiently identify those portions of the neuroimaging data. In addition to
their interpretability, our Gaussian Process models are competitive with recent
deep learning solutions under similar settings.Comment: The material presented here is to promote the dissemination of
scholarly and technical work in a timely fashion. Data in this article are
from ADNI (adni.loni.usc.edu). As such, ADNI provided data but did not
participate in writing of this repor
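The core mechanism here, automatic relevance determination in a Gaussian process, can be shown in a few lines: an anisotropic RBF kernel learns one length scale per feature by maximizing the marginal likelihood, and irrelevant features receive large length scales. This is plain per-feature ARD on synthetic data, not the paper's subspace variant or its multiple-kernel setup.

```python
# Minimal ARD sketch: a GP classifier with an anisotropic RBF kernel.
# Only feature 0 carries signal; its learned length scale should be the
# smallest. Data and kernel settings are illustrative assumptions.
import numpy as np
from sklearn.gaussian_process import GaussianProcessClassifier
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.randn(120, 3)
y = (X[:, 0] > 0).astype(int)      # only feature 0 matters

kernel = RBF(length_scale=[1.0, 1.0, 1.0])   # one length scale per feature
gpc = GaussianProcessClassifier(kernel=kernel, random_state=0).fit(X, y)

scales = gpc.kernel_.length_scale
print("learned length scales:", np.round(scales, 2))
```

In the multiple-kernel extension, one kernel per feature group (e.g. per brain region) plays the role these per-feature length scales play here.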
ELKI: A large open-source library for data analysis - ELKI Release 0.7.5 "Heidelberg"
This paper documents the release of the ELKI data mining framework, version
0.7.5.
ELKI is an open source (AGPLv3) data mining software written in Java. The
focus of ELKI is research in algorithms, with an emphasis on unsupervised
methods in cluster analysis and outlier detection. In order to achieve high
performance and scalability, ELKI offers data index structures such as the
R*-tree that can provide major performance gains. ELKI is designed to be easy
to extend for researchers and students in this domain, and welcomes
contributions of additional methods. ELKI aims at providing a large collection
of highly parameterizable algorithms, in order to allow easy and fair
evaluation and benchmarking of algorithms.
We will first outline the motivation for this release and the plans for the
future, and then give a brief overview of the new functionality in this
version. We also include an appendix presenting an overview of the overall
implemented functionality.
A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning and its
aim is to leverage useful information contained in multiple related tasks to
help improve the generalization performance of all the tasks. In this paper, we
give a survey for MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models have difficulty handling such settings, so online,
parallel, and distributed MTL models, as well as dimensionality reduction and
feature hashing, are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance,
and we review representative works. Finally, we present theoretical analyses
and discuss several future directions for MTL.
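A minimal instance of the task-relation idea surveyed here is mean-regularized ridge regression: each task's weights are shrunk toward a shared center instead of toward zero, so related tasks borrow statistical strength. The synthetic tasks, sample sizes, and regularization strength below are illustrative assumptions, not a method from the survey.

```python
# Minimal MTL sketch: per-task ridge regression shrunk toward the mean of
# the independent solutions. Tasks share one weight vector plus small
# task-specific offsets, so the shared center is informative.
import numpy as np

rng = np.random.RandomState(0)
d, n, T = 10, 15, 5
w_shared = rng.randn(d)
tasks = []
for _ in range(T):
    w_t = w_shared + 0.1 * rng.randn(d)     # related, not identical, tasks
    X = rng.randn(n, d)
    y = X @ w_t + 0.1 * rng.randn(n)
    tasks.append((X, y, w_t))

def ridge(X, y, lam, center):
    # argmin_w ||Xw - y||^2 + lam * ||w - center||^2
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y + lam * center)

lam = 5.0
w_ind = [ridge(X, y, lam, np.zeros(d)) for X, y, _ in tasks]   # independent
center = np.mean(w_ind, axis=0)
w_mtl = [ridge(X, y, lam, center) for X, y, _ in tasks]        # multi-task

def err(ws):
    return np.mean([np.linalg.norm(w - wt)
                    for (_, _, wt), w in zip(tasks, ws)])

print(f"independent error: {err(w_ind):.3f}  multi-task error: {err(w_mtl):.3f}")
```

This toy corresponds to the "task relation learning" category in the taxonomy above; the feature-learning and low-rank categories instead share a representation or constrain the stacked weight matrix.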
A review of heterogeneous data mining for brain disorders
With rapid advances in neuroimaging techniques, the research on brain
disorder identification has become an emerging area in the data mining
community. Brain disorder data poses many unique challenges for data mining
research. For example, the raw data generated by neuroimaging experiments is in
tensor representations, with typical characteristics of high dimensionality,
structural complexity and nonlinear separability. Furthermore, brain
connectivity networks can be constructed from the tensor data, embedding subtle
interactions between brain regions. Other clinical measures are usually
available reflecting the disease status from different perspectives. It is
expected that integrating complementary information in the tensor data and the
brain network data, and incorporating other clinical parameters will be
potentially transformative for investigating disease mechanisms and for
informing therapeutic interventions. Many research efforts have been devoted to
this area. They have achieved great success in various applications, such as
tensor-based modeling, subgraph pattern mining, and multi-view feature
analysis. In this paper, we review some recent data mining methods that are
used for analyzing brain disorders.
Towards Ultrahigh Dimensional Feature Selection for Big Data
In this paper, we present a new adaptive feature scaling scheme for
ultrahigh-dimensional feature selection on Big Data. To solve this problem
effectively, we first reformulate it as a convex semi-infinite programming
(SIP) problem and then propose an efficient \emph{feature generating paradigm}.
In contrast with traditional gradient-based approaches that conduct
optimization on all input features, the proposed method iteratively activates a
group of features and solves a sequence of multiple kernel learning (MKL)
subproblems of much reduced scale. To further speed up the training, we propose
to solve the MKL subproblems in their primal forms through a modified
accelerated proximal gradient approach. Due to such an optimization scheme,
some efficient cache techniques are also developed. The feature generating
paradigm can guarantee that the solution converges globally under mild
conditions and achieve lower feature selection bias. Moreover, the proposed
method can tackle two challenging tasks in feature selection: 1) group-based
feature selection with complex structures and 2) nonlinear feature selection
with explicit feature mappings. Comprehensive experiments on a wide range of
synthetic and real-world datasets containing tens of millions of data points
with ultrahigh-dimensional features demonstrate the competitive performance
of the proposed method over state-of-the-art feature selection methods in
terms of generalization performance and training efficiency.
Comment: 61 pages
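The active-set flavour of the feature-generating paradigm can be sketched very simply: rather than optimizing over all features, repeatedly activate the features most correlated with the current residual and solve a small subproblem on the active set. The sketch below uses plain least squares on synthetic sparse data; it mimics the iteration structure only, not the paper's MKL or semi-infinite programming machinery.

```python
# Simplified feature-generating loop: activate a small group of features
# per round by residual correlation, then refit on the active set only.
# Problem sizes and the group size of 2 are illustrative assumptions.
import numpy as np

rng = np.random.RandomState(0)
n, d, k_true = 200, 1000, 5
X = rng.randn(n, d)
w_true = np.zeros(d)
w_true[:k_true] = 3.0                   # only the first 5 features matter
y = X @ w_true + 0.1 * rng.randn(n)

active = []
residual = y.copy()
for _ in range(5):                      # activate 2 features per round
    scores = np.abs(X.T @ residual)
    scores[active] = -np.inf            # skip already-active features
    active.extend(np.argsort(scores)[-2:])
    Xa = X[:, active]
    w_a, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    residual = y - Xa @ w_a

recovered = set(active) >= set(range(k_true))
print("true features recovered:", recovered)
```

Each subproblem touches only the active columns, which is why this style of method stays tractable when d is in the billions while gradient methods over all features do not.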
Robust and Discriminative Labeling for Multi-label Active Learning Based on Maximum Correntropy Criterion
Multi-label learning draws great interest in many real-world applications.
It is highly costly for the oracle to assign many labels to one instance,
yet it is also hard to build a good model without diagnosing discriminative
labels. Can we reduce the labeling costs and improve the ability to train a
good model for multi-label learning simultaneously?
Active learning addresses the scarcity of training samples by querying the
most valuable samples to achieve better performance at little cost. In
multi-label active learning, prior research has queried the relevant labels
with fewer training samples, or queried all labels without diagnosing the
discriminative information. None of these approaches can effectively handle
the outlier labels for the measurement of uncertainty. Since Maximum
Correntropy Criterion (MCC) provides a robust analysis for outliers in many
machine learning and data mining algorithms, in this paper, we derive a robust
multi-label active learning algorithm based on MCC by merging uncertainty and
representativeness, and propose an efficient alternating optimization method to
solve it. With MCC, our method can eliminate the influence of outlier labels
that are not discriminative to measure the uncertainty. To make further
improvement on the ability of information measurement, we merge uncertainty and
representativeness with the prediction labels of unknown data. It can not only
enhance the uncertainty but also improve the similarity measurement of
multi-label data with label information. Experiments on benchmark multi-label
datasets show performance superior to that of state-of-the-art methods.
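Why the Maximum Correntropy Criterion is robust can be seen in a few lines: under a Gaussian correntropy kernel, the influence of an error saturates, so in the half-quadratic view an outlier receives a weight near zero. The toy below robustly estimates a location parameter; the data, kernel width, and iteration count are illustrative assumptions, not the paper's algorithm.

```python
# Minimal MCC sketch: iteratively reweighted mean estimation, where the
# weight of each point comes from a Gaussian correntropy kernel. A single
# outlier wrecks the squared-loss mean but barely moves the MCC estimate.
import numpy as np

def correntropy_weight(e, sigma=1.0):
    # induced weight in the half-quadratic view of MCC
    return np.exp(-e ** 2 / (2 * sigma ** 2))

data = np.array([0.9, 1.1, 1.0, 0.95, 10.0])   # 10.0 is an outlier
mu = data.mean()                                # squared-loss estimate
for _ in range(20):                             # half-quadratic iterations
    w = correntropy_weight(data - mu)
    mu = float(np.sum(w * data) / np.sum(w))

print(f"squared-loss mean: {data.mean():.2f}  MCC estimate: {mu:.2f}")
```

In the paper's setting the same mechanism downweights outlier labels when measuring uncertainty, instead of downweighting outlier observations in a mean.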
Prototype selection for interpretable classification
Prototype methods seek a minimal subset of samples that can serve as a
distillation or condensed view of a data set. As the size of modern data sets
grows, being able to present a domain specialist with a short list of
"representative" samples chosen from the data set is of increasing
interpretative value. While much recent statistical research has been focused
on producing sparse-in-the-variables methods, this paper aims at achieving
sparsity in the samples. We discuss a method for selecting prototypes in the
classification setting (in which the samples fall into known discrete
categories). Our method of focus is derived from three basic properties that we
believe a good prototype set should satisfy. This intuition is translated into
a set cover optimization problem, which we solve approximately using standard
approaches. While prototype selection is usually viewed as purely a means
toward building an efficient classifier, in this paper we emphasize the
inherent value of having a set of prototypical elements. That said, by using
the nearest-neighbor rule on the set of prototypes, we can of course discuss
our method as a classifier as well.
Comment: Published at http://dx.doi.org/10.1214/11-AOAS495 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org). arXiv admin note: text
overlap with arXiv:0908.228
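The set-cover view of prototype selection admits a short greedy sketch: each candidate prototype "covers" the same-class points within some radius, and we repeatedly pick the prototype covering the most still-uncovered points. The radius choice and the greedy rule below are simplifications of the paper's formulation, shown on a standard dataset for illustration.

```python
# Greedy set-cover sketch for prototype selection: pick prototypes that
# cover the most uncovered same-class points within radius eps. The eps
# heuristic (median distance / 4) is an illustrative assumption.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import pairwise_distances

X, y = load_iris(return_X_y=True)
D = pairwise_distances(X)
eps = np.median(D) / 4
# covers[i, j] is True when candidate i covers point j (same class, close)
covers = (D <= eps) & (y[:, None] == y[None, :])

prototypes = []
uncovered = np.ones(len(X), dtype=bool)
while uncovered.any():
    gains = (covers & uncovered).sum(axis=1)   # newly covered points
    best = int(np.argmax(gains))
    prototypes.append(best)
    uncovered &= ~covers[best]

print(f"{len(prototypes)} prototypes summarize {len(X)} samples")
```

The resulting prototype set can then be used directly with the nearest-neighbor rule, as the abstract notes, giving an interpretable classifier whose "model" is just a short list of representative samples.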