A Survey on Multi-Task Learning
Multi-Task Learning (MTL) is a learning paradigm in machine learning whose
aim is to leverage useful information contained in multiple related tasks to
help improve the generalization performance of all the tasks. In this paper, we
give a survey of MTL. First, we classify different MTL algorithms into several
categories, including feature learning approach, low-rank approach, task
clustering approach, task relation learning approach, and decomposition
approach, and then discuss the characteristics of each approach. In order to
improve the performance of learning tasks further, MTL can be combined with
other learning paradigms including semi-supervised learning, active learning,
unsupervised learning, reinforcement learning, multi-view learning and
graphical models. When the number of tasks is large or the data dimensionality
is high, batch MTL models have difficulty handling this situation, and online,
parallel and distributed MTL models as well as dimensionality reduction and
feature hashing are reviewed to reveal their computational and storage
advantages. Many real-world applications use MTL to boost their performance and
we review representative works. Finally, we present theoretical analyses and
discuss several future directions for MTL.
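The feature-learning category above can be made concrete with a toy sketch: two regression tasks share one linear feature map while keeping task-specific heads, and everything is trained jointly. All sizes, data, and the learning rate below are illustrative assumptions, not taken from the survey.

```python
import numpy as np

# Hedged sketch of feature-learning MTL: tasks share the feature map U,
# each task t has its own head w[t]; joint training by gradient descent.
rng = np.random.default_rng(0)
d, k, n = 8, 3, 50
U_true = rng.normal(size=(d, k))
X = [rng.normal(size=(n, d)) for _ in range(2)]      # per-task inputs
w_true = [rng.normal(size=k) for _ in range(2)]
Y = [X[t] @ U_true @ w_true[t] for t in range(2)]    # per-task targets

U = rng.normal(size=(d, k)) * 0.1                    # shared feature map
w = [np.zeros(k), np.zeros(k)]                       # task-specific heads
lr = 1e-3

def loss():
    return sum(np.mean((X[t] @ U @ w[t] - Y[t]) ** 2) for t in range(2))

start = loss()
for _ in range(500):
    gU = np.zeros_like(U)
    for t in range(2):
        r = X[t] @ U @ w[t] - Y[t]                   # residual, shape (n,)
        gU += 2 * X[t].T @ np.outer(r, w[t]) / n     # gradient w.r.t. shared U
        w[t] -= lr * 2 * (U.T @ X[t].T @ r) / n      # gradient step on head t
    U -= lr * gU
assert loss() < start                                # joint loss decreases
```

The shared map U is where information transfers between the tasks; with a single task this reduces to ordinary reduced-rank regression.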
Kernelized LRR on Grassmann Manifolds for Subspace Clustering
Low rank representation (LRR) has recently attracted great interest due to
its pleasing efficacy in exploring low-dimensional subspace structures
embedded in data. One of its successful applications is subspace clustering, by
which data are clustered according to the subspaces they belong to. In this
paper, at a higher level, we intend to cluster subspaces into classes of
subspaces. This is naturally described as a clustering problem on Grassmann
manifold. The novelty of this paper is to generalize LRR on Euclidean space
onto an LRR model on Grassmann manifold in a uniform kernelized LRR framework.
The new method has many applications in data analysis in computer vision tasks.
The proposed models have been evaluated on a number of practical data analysis
applications. The experimental results show that the proposed models outperform
a number of state-of-the-art subspace clustering methods.
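The Euclidean LRR model that this paper lifts onto the Grassmann manifold has, in the noise-free case (min ||Z||_* subject to X = XZ), a known closed-form solution: the shape interaction matrix V V^T from the skinny SVD of X. A minimal numpy sketch on toy data follows; the kernelized Grassmann variant itself is not reproduced here.

```python
import numpy as np

def lrr_noise_free(X):
    # closed-form solution of min ||Z||_* s.t. X = X Z:
    # Z* = V V^T, where V holds the right singular vectors of X
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    r = int(np.sum(s > 1e-10 * s[0]))     # numerical rank of X
    V = Vt[:r].T
    return V @ V.T

rng = np.random.default_rng(0)
# 40 points drawn from two independent 2-D subspaces of R^10 (toy data)
X = np.hstack([rng.normal(size=(10, 2)) @ rng.normal(size=(2, 20))
               for _ in range(2)])
Z = lrr_noise_free(X)                     # n x n low-rank representation
affinity = np.abs(Z) + np.abs(Z.T)        # affinity for spectral clustering
assert np.allclose(X @ Z, X, atol=1e-8)   # self-expressiveness X = XZ holds
```

Spectral clustering on the affinity matrix then recovers the subspace memberships; the paper's contribution is replacing the Euclidean inner products behind this construction with a kernel on Grassmann points.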
Feature Concatenation Multi-view Subspace Clustering
Multi-view clustering aims to achieve more promising clustering results than
single-view clustering by exploiting multi-view information. Since the
statistical properties of different views are diverse, even incompatible, few
approaches perform multi-view clustering on concatenated features directly,
even though feature concatenation is a natural way to combine multiple views. To
this end, this paper proposes a novel multi-view subspace clustering approach
dubbed Feature Concatenation Multi-view Subspace Clustering (FCMSC).
Specifically, by exploring the consensus information, multi-view data are
first concatenated into a joint representation; then, the -norm is
integrated into the objective function to handle the sample-specific and
cluster-specific corruptions of multiple views, benefiting the clustering
performance. Furthermore, by introducing graph Laplacians of multiple views, a
graph regularized FCMSC is also introduced to explore both the consensus
information and complementary information for clustering. It is noteworthy that
the obtained coefficient matrix is not derived by simply applying Low-Rank
Representation (LRR) to the joint view representation directly. Finally,
an effective algorithm based on the Augmented Lagrangian Multiplier (ALM) is
designed to optimize the objective functions. Comprehensive experiments on six
real-world datasets illustrate the superiority of the proposed methods over
several state-of-the-art approaches for multi-view clustering.
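Two ingredients of this approach, concatenating view features into a joint representation and forming per-view graph Laplacians, can be sketched as follows. The RBF affinity, sizes, and data are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

def graph_laplacian(X, sigma=1.0):
    # RBF affinity W and unnormalized graph Laplacian L = D - W for one view
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return np.diag(W.sum(axis=1)) - W

rng = np.random.default_rng(1)
views = [rng.normal(size=(8, 4)), rng.normal(size=(8, 3))]   # two toy views
joint = np.hstack(views)                  # feature concatenation, shape (8, 7)
L_multi = sum(graph_laplacian(V) for V in views)  # summed per-view Laplacians
assert joint.shape == (8, 7)
assert np.allclose(L_multi.sum(axis=1), 0)  # Laplacian rows sum to zero
```

In the graph-regularized variant described above, terms like trace(Z L_v Z^T) built from such per-view Laplacians encourage the learned coefficients to respect each view's local geometry.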
Multi-View Spectral Clustering Tailored Tensor Low-Rank Representation
This paper explores the problem of multi-view spectral clustering (MVSC)
based on tensor low-rank modeling. Unlike the existing methods that all adopt
an off-the-shelf tensor low-rank norm without considering the special
characteristics of the tensor in MVSC, we design a novel structured tensor
low-rank norm tailored to MVSC. Specifically, we explicitly impose a symmetric
low-rank constraint and a structured sparse low-rank constraint on the frontal
and horizontal slices of the tensor to characterize the intra-view and
inter-view relationships, respectively. Moreover, the two constraints could be
jointly optimized to achieve mutual refinement. On the basis of the novel
tensor low-rank norm, we formulate MVSC as a convex low-rank tensor recovery
problem, which is then efficiently solved in an iterative manner with an
augmented Lagrange multiplier based method. Extensive experimental results on five benchmark
datasets show that the proposed method outperforms state-of-the-art methods to
a significant extent. Impressively, our method is able to produce perfect
clustering results. In addition, the parameters of our method can be easily tuned, and
the proposed model is robust across different datasets, demonstrating its potential
in practice.
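The tensor layout underlying the proposed norm can be sketched concretely: stack the per-view coefficient matrices as frontal slices of a third-order tensor, so that frontal slices carry intra-view structure and horizontal slices cut across views. The symmetrization and toy data below are illustrative, not the paper's optimization.

```python
import numpy as np

# Toy sketch: V per-view coefficient matrices (n x n) stacked as frontal
# slices of an n x n x V tensor. Frontal slices T[:, :, v] are the intra-view
# direction (symmetrized below, echoing the symmetric low-rank constraint);
# horizontal slices T[i, :, :] are the inter-view direction.
rng = np.random.default_rng(2)
n, V = 5, 3
T = np.stack([rng.normal(size=(n, n)) for _ in range(V)], axis=2)
T = 0.5 * (T + T.transpose(1, 0, 2))       # symmetrize every frontal slice
frontal = T[:, :, 0]                       # one intra-view slice, (n, n)
horizontal = T[0, :, :]                    # one inter-view slice, (n, V)
assert frontal.shape == (5, 5) and horizontal.shape == (5, 3)
assert np.allclose(T[:, :, 1], T[:, :, 1].T)
```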
Kernelized Low Rank Representation on Grassmann Manifolds
Low rank representation (LRR) has recently attracted great interest due to
its pleasing efficacy in exploring low-dimensional subspace structures embedded
in data. One of its successful applications is subspace clustering, in which
data are clustered according to the subspaces they belong to. In this paper, at
a higher level, we intend to cluster subspaces into classes of subspaces. This
is naturally described as a clustering problem on Grassmann manifold. The
novelty of this paper is to generalize LRR on Euclidean space onto an LRR model
on Grassmann manifold in a uniform kernelized framework. The new methods have
many applications in computer vision tasks. Several clustering experiments are
conducted on handwritten digit images, dynamic textures, human face clips and
traffic scene sequences. The experimental results show that the proposed
methods outperform a number of state-of-the-art subspace clustering methods.
A review of heterogeneous data mining for brain disorders
With rapid advances in neuroimaging techniques, the research on brain
disorder identification has become an emerging area in the data mining
community. Brain disorder data poses many unique challenges for data mining
research. For example, the raw data generated by neuroimaging experiments are in
tensor representations, with typical characteristics of high dimensionality,
structural complexity and nonlinear separability. Furthermore, brain
connectivity networks can be constructed from the tensor data, embedding subtle
interactions between brain regions. Other clinical measures are usually
available reflecting the disease status from different perspectives. It is
expected that integrating complementary information in the tensor data and the
brain network data, and incorporating other clinical parameters will be
potentially transformative for investigating disease mechanisms and for
informing therapeutic interventions. Many research efforts have been devoted to
this area. They have achieved great success in various applications, such as
tensor-based modeling, subgraph pattern mining, and multi-view feature analysis. In
this paper, we review some recent data mining methods that are used for
analyzing brain disorders.
Multi-view Low-rank Sparse Subspace Clustering
Most existing approaches address the multi-view subspace clustering problem by
constructing the affinity matrix on each view separately and then propose
how to extend the spectral clustering algorithm to handle multi-view data. This
paper presents an approach to multi-view subspace clustering that learns a
joint subspace representation by constructing an affinity matrix shared among all
views. Relying on the importance of both low-rank and sparsity constraints in
the construction of the affinity matrix, we introduce an objective that
balances agreement across different views while encouraging sparsity and
low-rankness of the solution. The related low-rank and sparsity constrained
optimization problem is solved for each view using the alternating direction
method of multipliers. Furthermore, we extend our
approach to cluster data drawn from nonlinear subspaces by solving the
corresponding problem in a reproducing kernel Hilbert space. The proposed
algorithm outperforms state-of-the-art multi-view subspace clustering
algorithms on one synthetic and four real-world datasets.
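The alternating direction method of multipliers used here repeatedly applies the proximal maps of the two penalties. A hedged sketch of those two building blocks, not the full per-view solver, on toy data:

```python
import numpy as np

def svt(M, tau):
    # singular value thresholding: prox of tau * nuclear norm ||.||_*
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def soft(M, tau):
    # entrywise soft thresholding: prox of tau * l1 norm ||.||_1
    return np.sign(M) * np.maximum(np.abs(M) - tau, 0)

rng = np.random.default_rng(3)
M = rng.normal(size=(6, 6))               # illustrative matrix
L = svt(M, tau=1.0)                       # low-rank-promoting shrinkage
S = soft(M, tau=1.0)                      # sparsity-promoting shrinkage
assert np.all(np.abs(S) <= np.abs(M))     # shrinkage never grows entries
```

Inside an ADMM loop, the representation matrix for each view alternates between an `svt` step, a `soft` step, and a dual update enforcing agreement across views.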
Joint Embedding of Meta-Path and Meta-Graph for Heterogeneous Information Networks
Meta-graph is currently the most powerful tool for similarity search on
heterogeneous information networks, where a meta-graph is a composition of
meta-paths that captures the complex structural information. However, current
relevance computing based on meta-graph only considers the complex structural
information, but ignores its embedded meta-paths information. To address this
problem, we propose MEta-GrAph-based network embedding models, called MEGA and
MEGA++, respectively. The MEGA model uses normalized relevance or similarity
measures that are derived from a meta-graph and its embedded meta-paths between
nodes simultaneously, and then leverages a tensor decomposition method to perform
node embedding. The MEGA++ model further employs a coupled tensor-matrix
decomposition method to obtain a joint embedding for nodes, which
simultaneously considers the hidden relations of all meta information of a
meta-graph. Extensive experiments on two real datasets demonstrate that MEGA and
MEGA++ are more effective than state-of-the-art approaches.
Comment: accepted by ICBK 1
Supervised Nonnegative Matrix Factorization to Predict ICU Mortality Risk
ICU mortality risk prediction is a tough yet important task. On one hand, due
to the complex temporal data collected, it is difficult to identify the
effective features and interpret them easily; on the other hand, good
prediction can help clinicians take timely actions to prevent the mortality.
These correspond to the interpretability and accuracy problems. Most existing
methods lack interpretability, but recently Subgraph Augmented
Nonnegative Matrix Factorization (SANMF) has been successfully applied to time
series data to provide a path to interpret the features well. Therefore, we
adopted this approach as the backbone to analyze the patient data. One
limitation of the raw SANMF method is its poor prediction ability due to its
unsupervised nature. To deal with this problem, we proposed a supervised SANMF
algorithm by integrating the logistic regression loss function into the NMF
framework and solved it with an alternating optimization procedure. We used
simulation data to verify the effectiveness of this method, and then applied
it to ICU mortality risk prediction and demonstrated its superiority over other
conventional supervised NMF methods.
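The idea of coupling an NMF reconstruction term with a logistic regression loss can be sketched in a few lines. This is a generic alternating projected-gradient version of such an objective, with toy data; the paper's own SANMF-based updates and clinical features are not reproduced.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
m, n, k = 12, 30, 3                       # features, patients, factors (toy)
X = np.abs(rng.normal(size=(m, n)))       # nonnegative data matrix
y = rng.integers(0, 2, size=n).astype(float)  # binary outcome labels

W = np.abs(rng.normal(size=(m, k)))       # nonnegative basis factors
H = np.abs(rng.normal(size=(k, n)))       # per-patient coefficients
theta = np.zeros(k)                       # logistic weights on coefficients
lam, lr = 0.5, 5e-3

def objective():
    # half squared reconstruction error + lam-weighted logistic loss on H
    p = sigmoid(theta @ H)
    ll = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    return 0.5 * np.linalg.norm(X - W @ H) ** 2 + lam * ll

start = objective()
for _ in range(300):                      # alternating projected-gradient steps
    W = np.maximum(W - lr * ((W @ H - X) @ H.T), 0)
    p = sigmoid(theta @ H)
    gH = W.T @ (W @ H - X) + lam * np.outer(theta, p - y) / n
    H = np.maximum(H - lr * gH, 0)        # supervision also shapes H
    p = sigmoid(theta @ H)
    theta -= lr * lam * (H @ (p - y)) / n
assert objective() < start                # joint loss decreases
```

The supervision term pulls the learned coefficients H toward directions that discriminate the labels, which is what lifts prediction above unsupervised NMF.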
Vectorial Dimension Reduction for Tensors Based on Bayesian Inference
Dimensionality reduction for high-order tensors is a challenging problem. In
conventional approaches, higher order tensors are either `vectorized` or
reduced to lower order tensors via Tucker decomposition; this destroys the
inherent high-order structures or results in undesired tensors, respectively. This
paper introduces a probabilistic vectorial dimensionality reduction model for
tensorial data. The model represents a tensor by employing a linear combination
of same order basis tensors, thus it offers a mechanism to directly reduce a
tensor to a vector. Under this expression, the projection base of the model is
based on the tensor CandeComp/PARAFAC (CP) decomposition and the number of free
parameters in the model only grows linearly with the number of modes rather
than exponentially. Bayesian inference is established via the variational EM
approach, and an empirical criterion is given for setting the parameters (the
CP factor number and the number of extracted features). The
model outperforms several existing PCA-based methods and CP decomposition on
several publicly available databases in terms of classification and clustering
accuracy.
Comment: Submitting to TNNL