1,384 research outputs found
Multimodal sparse representation learning and applications
Unsupervised methods have proven effective for discriminative tasks in a
single-modality scenario. In this paper, we present a multimodal framework for
learning sparse representations that can capture semantic correlation between
modalities. The framework can model relationships at a higher level by
enforcing a shared sparse representation. In particular, we propose a joint
dictionary learning technique for sparse coding and formulate the joint
representation for concision, for cross-modal representations (in case of a
missing modality), and for the union of the cross-modal representations. Given
the accelerated
growth of multimodal data posted on the Web such as YouTube, Wikipedia, and
Twitter, learning good multimodal features is becoming increasingly important.
We show that the shared representations enabled by our framework substantially
improve the classification performance under both unimodal and multimodal
settings. We further show how deep architectures built on the proposed
framework are effective for the case of highly nonlinear correlations between
modalities. The effectiveness of our approach is demonstrated experimentally in
image denoising, multimedia event detection and retrieval on the TRECVID
dataset (audio-video), category classification on the Wikipedia dataset
(image-text).
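The shared-code idea can be sketched roughly as follows (pure NumPy; the joint dictionary, dimensions, and the OMP encoder here are illustrative placeholders, not the paper's actual learned model): stacking the modalities row-wise forces one sparse code to explain both, and encoding over the image rows alone handles a missing text modality.

```python
import numpy as np

rng = np.random.default_rng(0)

def omp(D, x, k):
    """Greedy Orthogonal Matching Pursuit: k-sparse code of x over D."""
    resid, idx = x.copy(), []
    coef = np.zeros(0)
    for _ in range(k):
        j = int(np.argmax(np.abs(D.T @ resid)))
        if j not in idx:
            idx.append(j)
        coef, *_ = np.linalg.lstsq(D[:, idx], x, rcond=None)
        resid = x - D[:, idx] @ coef
    code = np.zeros(D.shape[1])
    code[idx] = coef
    return code

d_img, d_txt, n_atoms, k = 20, 10, 40, 3
# Illustrative joint dictionary: image and text sub-dictionaries stacked
# row-wise, so one sparse code must explain both modalities at once.
D = rng.standard_normal((d_img + d_txt, n_atoms))
D /= np.linalg.norm(D, axis=0)

# A synthetic multimodal sample built from 3 shared atoms
true_code = np.zeros(n_atoms)
true_code[[2, 7, 11]] = [1.0, -0.5, 0.8]
x = D @ true_code
x_img, x_txt = x[:d_img], x[d_img:]

shared = omp(D, x, k)                 # joint code from both modalities
code_img = omp(D[:d_img], x_img, k)   # text missing: encode image rows only
x_txt_hat = D[d_img:] @ code_img      # cross-modal prediction of the text part
```

A real system would learn D jointly from paired training data rather than sampling it at random; the point of the sketch is only the row-stacking that couples the modalities through one code.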
Discriminative Bayesian Dictionary Learning for Classification
We propose a Bayesian approach to learn discriminative dictionaries for
sparse representation of data. The proposed approach infers probability
distributions over the atoms of a discriminative dictionary using a Beta
Process. It also computes sets of Bernoulli distributions that associate class
labels to the learned dictionary atoms. This association signifies the
selection probabilities of the dictionary atoms in the expansion of
class-specific data. Furthermore, the non-parametric character of the proposed
approach allows it to infer the correct size of the dictionary. We exploit the
aforementioned Bernoulli distributions in separately learning a linear
classifier. The classifier uses the same hierarchical Bayesian model as the
dictionary, which we present along with the analytical inference solution for Gibbs
sampling. For classification, a test instance is first sparsely encoded over
the learned dictionary and the codes are fed to the classifier. We performed
experiments for face and action recognition, and for object and scene-category
classification using five public datasets and compared the results with
state-of-the-art discriminative sparse representation approaches. Experiments
show that the proposed Bayesian approach consistently outperforms the existing
approaches.
Comment: 15 pages
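The test-time pipeline described above is simply "sparse-encode, then classify the codes". A minimal sketch (NumPy; the dictionary D and classifier W are random placeholders here, whereas in the paper both are inferred by Gibbs sampling under the Beta-Bernoulli model, and the crude top-k encoder stands in for the model's sparse inference):

```python
import numpy as np

rng = np.random.default_rng(1)

dim, n_atoms, n_classes, k = 64, 128, 4, 5
# Placeholders: in the paper, D and W come from the Gibbs sampler
D = rng.standard_normal((dim, n_atoms))
D /= np.linalg.norm(D, axis=0)
W = rng.standard_normal((n_classes, n_atoms))

x_test = rng.standard_normal(dim)

# Step 1: crude k-sparse encoding over the learned dictionary
proj = D.T @ x_test
thresh = np.sort(np.abs(proj))[-k]
code = np.where(np.abs(proj) >= thresh, proj, 0.0)

# Step 2: feed the sparse codes to the linear classifier
label = int(np.argmax(W @ code))
```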
Robust Scene Text Recognition Using Sparse Coding based Features
In this paper, we propose an effective scene text recognition method using
sparse coding based features, called Histograms of Sparse Codes (HSC) features.
For character detection, we use the HSC features instead of using the
Histograms of Oriented Gradients (HOG) features. The HSC features are extracted
by computing sparse codes with dictionaries that are learned from data using
K-SVD, and aggregating per-pixel sparse codes to form local histograms. For
word recognition, we integrate multiple cues including character detection
scores and geometric contexts in an objective function. The final recognition
results are obtained by searching for the words which correspond to the maximum
value of the objective function. The parameters in the objective function are
learned using the Minimum Classification Error (MCE) training method.
Experiments on several challenging datasets demonstrate that the proposed
HSC-based scene text recognition method outperforms HOG-based methods
significantly and outperforms most state-of-the-art methods.
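The HSC recipe, per-pixel sparse codes aggregated into local histograms, can be sketched as follows (NumPy; the random dictionary and the top-k correlation encoder are stand-ins for the K-SVD dictionary and a proper sparse solver, and the patch and cell sizes are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

# Stand-in for a K-SVD dictionary learned on image patches
patch, n_atoms = 5, 32
D = rng.standard_normal((patch * patch, n_atoms))
D /= np.linalg.norm(D, axis=0)

img = rng.standard_normal((20, 20))

def per_pixel_codes(img, D, k=3):
    """Sparse code (here: top-k correlated atoms) for each patch center."""
    h, w = img.shape
    r = patch // 2
    codes = np.zeros((h, w, n_atoms))
    for i in range(r, h - r):
        for j in range(r, w - r):
            p = img[i - r:i + r + 1, j - r:j + r + 1].ravel()
            proj = D.T @ p
            keep = np.argsort(np.abs(proj))[-k:]
            codes[i, j, keep] = np.abs(proj[keep])
    return codes

def hsc(codes, cell=10):
    """Aggregate per-pixel codes into L1-normalized local histograms."""
    h, w, m = codes.shape
    feats = []
    for i in range(0, h, cell):
        for j in range(0, w, cell):
            hist = codes[i:i + cell, j:j + cell].reshape(-1, m).sum(0)
            feats.append(hist / (hist.sum() + 1e-8))
    return np.concatenate(feats)

feat = hsc(per_pixel_codes(img, D))  # 4 cells x 32 atoms = 128-dim descriptor
```

The resulting descriptor plays the same structural role as a HOG feature (local histograms over cells), which is what lets HSC drop in as a replacement for HOG in the character detector.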
Classifying Multi-channel UWB SAR Imagery via Tensor Sparsity Learning Techniques
Using low-frequency (UHF to L-band) ultra-wideband (UWB) synthetic aperture
radar (SAR) technology for detecting buried and obscured targets, e.g., bombs
or mines, has been successfully demonstrated recently. Despite promising recent
progress, a significant open challenge is to distinguish obscured targets from
other (natural and manmade) clutter sources in the scene. The problem becomes
exacerbated in the presence of noisy responses from rough ground surfaces. In
this paper, we present three novel sparsity-driven techniques, which not only
exploit the subtle features of raw captured data but also take advantage of the
polarization diversity and the aspect angle dependence information from
multi-channel SAR data. First, the traditional sparse representation-based
classification (SRC) is generalized to exploit shared information of classes
and various sparsity structures of tensor coefficients for multi-channel data.
Second, corresponding tensor dictionary learning models are consequently
proposed to enhance classification accuracy. Lastly, a new tensor sparsity model is
proposed to model responses from multiple consecutive looks of objects, which
is a unique characteristic of the dataset we consider. Extensive experimental
results on a high-fidelity electromagnetic simulated dataset and radar data
collected from the U.S. Army Research Laboratory side-looking SAR demonstrate
the advantages of the proposed tensor sparsity models.
Joint Projection and Dictionary Learning using Low-rank Regularization and Graph Constraints
In this paper, we aim at learning simultaneously a discriminative dictionary
and a robust projection matrix from noisy data. The joint learning makes the
learned projection and dictionary a better fit for each other, so more
accurate classification can be obtained. However, current prevailing joint
dimensionality reduction and dictionary learning methods would fail when the
training samples are noisy or heavily corrupted. To address this issue, we
propose a joint projection and dictionary learning using low-rank
regularization and graph constraints (JPDL-LR). Specifically, the
discrimination of the dictionary is achieved by imposing Fisher criterion on
the coding coefficients. In addition, our method explicitly encodes the local
structure of data by incorporating a graph regularization term that further
improves the discriminative ability of the projection matrix. Inspired by
recent advances of low-rank representation for removing outliers and noise, we
enforce a low-rank constraint on sub-dictionaries of all classes to make them
more compact and robust to noise. Experimental results on several benchmark
datasets verify the effectiveness and robustness of our method for both
dimensionality reduction and image classification, especially when the data
contains considerable noise or variations.
Transfer Adaptation Learning: A Decade Survey
The world we see is ever-changing and it always changes with people, things,
and the environment. A domain refers to the state of the world at a certain
moment. A research problem is characterized as transfer adaptation
learning (TAL) when it needs knowledge correspondence between different
moments/domains. Conventional machine learning aims to find a model with the
minimum expected risk on test data by minimizing the regularized empirical risk
on the training data, which, however, supposes that the training and test data
share similar joint probability distribution. TAL aims to build models that can
perform tasks in a target domain by learning knowledge from a semantically
related but differently distributed source domain. It is an energetic research
field of increasing influence and importance, exhibiting an explosive
publication trend. This paper surveys the advances of TAL methodologies over
the past decade and discusses the technical challenges and essential problems
of TAL with deep insights and new perspectives. The broader solutions of
transfer adaptation learning being created by researchers are identified, i.e.,
instance re-weighting adaptation, feature adaptation, classifier adaptation,
deep network adaptation and adversarial adaptation, which are beyond the early
semi-supervised and unsupervised split. The survey helps researchers rapidly
but comprehensively understand and identify the research foundation, research
status, theoretical limitations, future challenges and under-studied issues
(universality, interpretability, and credibility) that must be addressed for
the field to move toward universal representation and safe applications in
open-world scenarios.
Comment: 26 pages, 4 figures
Collaborative Representation for Classification, Sparse or Non-sparse?
Sparse representation based classification (SRC) has been proved to be a
simple, effective and robust solution to face recognition. As it has grown
popular, doubts about the necessity of enforcing sparsity have arisen, and
preliminary experimental results showed that simply changing the l1-norm based
regularization to the computationally much more efficient l2-norm based
non-sparse version would lead to similar or even better performance. However,
that is not always the case. Given a new classification task, it is still unclear
which regularization strategy (i.e., making the coefficients sparse or
non-sparse) is a better choice without trying both for comparison. In this
paper, we present as far as we know the first study on solving this issue,
based on plenty of diverse classification experiments. We propose a scoring
function for pre-selecting the regularization strategy using only the dataset
size, the feature dimensionality and a discrimination score derived from a
given feature representation. Moreover, we show that when dictionary learning
is taken into account, non-sparse representation has an even clearer advantage
over sparse representation. This work is expected to enrich our understanding
of sparse/non-sparse collaborative representation for classification and to
motivate further research activities.
Comment: 8 pages, 1 figure
Dictionary Learning for Robotic Grasp Recognition and Detection
The ability to grasp ordinary and potentially never-seen objects is an
important feature in both domestic and industrial robotics. For a system to
accomplish this, it must autonomously identify grasping locations by using
information from various sensors, such as the Microsoft Kinect 3D camera.
Despite considerable progress, significant work still remains to be done in
this field. To
this effect, we propose a dictionary learning and sparse representation (DLSR)
framework for representing RGBD images from 3D sensors in the context of
determining such good grasping locations. In contrast to previously proposed
approaches that relied on sophisticated regularization or very large datasets,
the derived perception system has a fast training phase and can work with small
datasets. It is also theoretically founded for dealing with masked-out entries,
which are common with 3D sensors. We contribute by presenting a comparative
study of several DLSR approach combinations for recognizing and detecting grasp
candidates on the standard Cornell dataset. Importantly, experimental results
show a performance improvement of 1.69% in detection and 3.16% in recognition
over the current state-of-the-art convolutional neural network (CNN) approach.
Even though CNNs are nowadays the most popular vision-based approach, this
suggests that DLSR is a viable alternative with interesting advantages that
CNNs lack.
Comment: Submitted to the 2016 IEEE/RSJ International Conference on
Intelligent Robots and Systems (IROS 2016)
An Overview of Cross-media Retrieval: Concepts, Methodologies, Benchmarks and Challenges
Multimedia retrieval plays an indispensable role in big data utilization.
Past efforts mainly focused on single-media retrieval. However, the
requirements of users are highly flexible, such as retrieving relevant audio
clips with a single image query. So challenges stemming from the "media
gap", which means that representations of different media types are
inconsistent, have attracted increasing attention. Cross-media retrieval is
designed for the scenarios where the queries and retrieval results are of
different media types. As a relatively new research topic, its concepts,
methodologies and benchmarks are still not clear in the literature. To address
these issues, we review more than 100 references, give an overview including
the concepts, methodologies, major challenges and open issues, as well as build
up the benchmarks including datasets and experimental results. Researchers can
directly adopt the benchmarks to promptly evaluate their proposed methods. This
will help them focus on algorithm design rather than on the time-consuming
reproduction of compared methods and results. It is noted that we have constructed a new
dataset XMedia, which is the first publicly available dataset with up to five
media types (text, image, video, audio and 3D model). We believe this overview
will attract more researchers to focus on cross-media retrieval and be helpful
to them.
Comment: 14 pages, accepted by IEEE Transactions on Circuits and Systems for
Video Technology
A Study on Unsupervised Dictionary Learning and Feature Encoding for Action Classification
Many efforts have been devoted to developing alternatives to traditional
vector quantization in the image domain, such as sparse coding and
soft-assignment.
These approaches can be split into a dictionary learning phase and a feature
encoding phase which are often closely connected. In this paper, we investigate
the effects of these phases by separating them for video-based action
classification. We compare several dictionary learning methods and feature
encoding schemes through extensive experiments on KTH and HMDB51 datasets.
Experimental results indicate that sparse coding performs consistently better
than the other encoding methods on the large, complex dataset (i.e., HMDB51),
and that it is robust to different dictionaries. On the small, simple dataset
(i.e., KTH) with less variation, however, all the encoding strategies perform
competitively. In
addition, we note that the strength of sophisticated encoding approaches comes
not from their corresponding dictionaries but the encoding mechanisms, and we
can just use randomly selected exemplars as dictionaries for video-based action
classification.
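The closing observation, that randomly selected exemplars work as dictionaries when the encoder is sparse coding, can be sketched like this (NumPy; random vectors stand in for local motion descriptors, ISTA for the sparse coder, and the max-pooling step for whatever pooling the pipeline uses):

```python
import numpy as np

rng = np.random.default_rng(4)

dim, n_desc, n_atoms = 16, 50, 24
descriptors = rng.standard_normal((n_desc, dim))  # stand-in local descriptors

# "Dictionary learning" by random exemplar selection, as the study suggests
idx = rng.choice(n_desc, n_atoms, replace=False)
D = descriptors[idx].T
D /= np.linalg.norm(D, axis=0)

def encode(D, x, lam=0.2, iters=100):
    """Sparse coding of one descriptor via ISTA."""
    L = np.linalg.norm(D, 2) ** 2
    a = np.zeros(D.shape[1])
    for _ in range(iters):
        g = a - (D.T @ (D @ a - x)) / L
        a = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)
    return a

# Video-level feature: max-pool the absolute codes over all descriptors
codes = np.stack([encode(D, d) for d in descriptors])
video_feat = np.abs(codes).max(axis=0)
```

The design point matches the abstract: the effort goes into the encoder (the ISTA loop), while the dictionary is just a random subset of the data.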