Joint Geometrical and Statistical Alignment for Visual Domain Adaptation
This paper presents a novel unsupervised domain adaptation method for
cross-domain visual recognition. We propose a unified framework that reduces
the shift between domains both statistically and geometrically, referred to as
Joint Geometrical and Statistical Alignment (JGSA). Specifically, we learn two
coupled projections that project the source domain and target domain data into
low-dimensional subspaces in which the geometrical shift and distribution shift
are reduced simultaneously. The objective function can be solved efficiently in
closed form. Extensive experiments have verified that the proposed method
significantly outperforms several state-of-the-art domain adaptation methods on
a synthetic dataset and three different real-world cross-domain visual
recognition tasks.
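One ingredient commonly used to measure the "distribution shift" in objectives of this kind is the empirical maximum mean discrepancy (MMD) between projected source and target samples. The sketch below assumes a linear kernel and two coupled projection matrices, here called `A` (source) and `B` (target). The names, the linear kernel and the helper functions are illustrative assumptions, not the JGSA objective itself, which also includes geometrical and other terms.

```python
def project(X, P):
    """Project each d-dim sample in X with a d x k matrix P (list of rows)."""
    k = len(P[0])
    return [[sum(x[i] * P[i][c] for i in range(len(x))) for c in range(k)] for x in X]


def mean_vector(Z):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(Z) for col in zip(*Z)]


def linear_mmd(Xs, Xt, A, B):
    """Squared distance between projected source and target means (linear-kernel MMD)."""
    ms = mean_vector(project(Xs, A))  # source projected with A
    mt = mean_vector(project(Xt, B))  # target projected with its own coupled B
    return sum((a - b) ** 2 for a, b in zip(ms, mt))


# Usage: identical data and projections give zero discrepancy.
Xs = [[1, 0], [0, 1]]
A = B = [[1], [0]]
print(linear_mmd(Xs, Xs, A, B))  # 0.0
```

Learning `A` and `B` so that this quantity becomes small (together with other terms) is what "reducing the distribution shift in a subspace" amounts to in this simplified view.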
Learning a Pose Lexicon for Semantic Action Recognition
This paper presents a novel method for learning a pose lexicon comprising
semantic poses defined by textual instructions and their associated visual
poses defined by visual features. The proposed method simultaneously takes two
input streams, semantic poses and visual pose candidates, and statistically
learns a mapping between them to construct the lexicon. With the learned
lexicon, action recognition can be cast as the problem of finding the maximum
translation probability of a sequence of semantic poses given a stream of
visual pose candidates. Experiments on pre-trained and zero-shot action
recognition, conducted on the MSRC-12 gesture and WorkoutSu-10 exercise datasets,
verify the efficacy of the proposed method.
Comment: Accepted by the 2016 IEEE International Conference on Multimedia and
Expo (ICME 2016). 6-page paper and 4 pages of supplementary material.
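To make "maximum translation probability of a sequence of semantic poses given a stream of visual pose candidates" concrete, here is a simplified sketch. It assumes a lexicon stored as a table of P(visual | semantic) values and a monotonic alignment in which each visual frame maps to one semantic pose and every semantic pose covers at least one consecutive frame. The dynamic program, the `floor` probability and the toy action names are hypothetical illustrations, not the paper's model.

```python
import math


def best_log_translation_prob(semantic_seq, visual_seq, lexicon, floor=1e-6):
    """Best log-probability of visual_seq under an in-order alignment to semantic_seq.

    lexicon maps (semantic_pose, visual_pose) -> P(visual | semantic);
    unseen pairs fall back to a small hypothetical floor probability.
    """
    n, m = len(semantic_seq), len(visual_seq)
    neg_inf = float("-inf")
    if n == 0 or m < n:
        return neg_inf  # every semantic pose needs at least one frame
    # dp[i][j]: best score with the first j frames aligned to the first i poses
    dp = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for j in range(1, m + 1):
        for i in range(1, min(j, n) + 1):
            prob = lexicon.get((semantic_seq[i - 1], visual_seq[j - 1]), floor)
            prev = max(dp[i][j - 1], dp[i - 1][j - 1])  # stay on pose i or advance to it
            if prev > neg_inf:
                dp[i][j] = prev + math.log(prob)
    return dp[n][m]


def recognize(actions, visual_seq, lexicon):
    """Return the action whose semantic-pose sequence best explains the frames."""
    return max(actions, key=lambda name: best_log_translation_prob(actions[name], visual_seq, lexicon))


lexicon = {("raise", "up"): 0.9, ("raise", "down"): 0.1,
           ("lower", "down"): 0.9, ("lower", "up"): 0.1}
actions = {"A": ["raise", "lower"], "B": ["lower", "raise"]}
print(recognize(actions, ["up", "up", "down"], lexicon))  # A
```

Zero-shot recognition fits the same pattern in this sketch: a new action only needs a new entry in `actions`, defined by its textual (semantic) pose sequence.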
Unsupervised Domain Adaptation: A Multi-task Learning-based Method
This paper presents a novel multi-task learning-based method for unsupervised
domain adaptation. Specifically, the source and target domain classifiers are
jointly learned by considering the geometry of the target domain and the divergence
between the source and target domains, based on the concept of multi-task
learning. Two novel algorithms based on this method are proposed, using Regularized
Least Squares and Support Vector Machines respectively. Experiments on both
synthetic and real-world cross-domain recognition tasks have shown that the
proposed methods outperform several state-of-the-art domain adaptation methods.
Importance Weighted Adversarial Nets for Partial Domain Adaptation
This paper proposes an importance weighted adversarial nets-based method for
unsupervised domain adaptation, specifically for partial domain adaptation, where
the target domain has fewer classes than the source domain.
Previous domain adaptation methods generally assume identical label spaces,
so that reducing the distribution divergence leads to feasible knowledge
transfer. However, this assumption no longer holds in the more realistic
scenario of adapting from a larger and more diverse source domain
to a smaller target domain with fewer classes. This paper extends
adversarial nets-based domain adaptation and proposes a novel adversarial
nets-based partial domain adaptation method that identifies the source samples
potentially belonging to the outlier classes and, at the same time, reduces the
shift of shared classes between domains.
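A simple way to down-weight source samples that are likely to come from outlier classes is to derive a weight from an auxiliary domain discriminator. The sketch below assumes the discriminator outputs D(x) in [0, 1] as the probability that a sample comes from the source domain, and uses the heuristic weight 1 - D(x), normalized to a mean of one. This convention and the helper names are assumptions for illustration, not necessarily the exact scheme in the paper.

```python
def importance_weights(d_probs):
    """Map discriminator outputs D(x) (assumed P(source)) to per-sample weights.

    Samples the discriminator finds easy to tell apart from the target
    (D close to 1, plausibly outlier classes) receive small weights.
    """
    raw = [1.0 - p for p in d_probs]
    mean = sum(raw) / len(raw)
    if mean == 0.0:
        return [1.0] * len(raw)  # degenerate case: fall back to uniform weights
    return [r / mean for r in raw]  # normalize so the average weight is 1


def weighted_source_loss(losses, weights):
    """Importance-weighted average of per-sample source losses."""
    return sum(w * l for w, l in zip(weights, losses)) / len(losses)


w = importance_weights([0.9, 0.5, 0.1])
print([round(x, 3) for x in w])  # [0.2, 1.0, 1.8]
```

Normalizing to mean one keeps the scale of the weighted adversarial loss comparable to the unweighted one, so the weights only change the relative influence of samples.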
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. This comprehensive problem-oriented review
of the advances in transfer learning has revealed not only the challenges in
transfer learning for visual recognition, but also the problems (e.g. eight of
the seventeen) that have scarcely been studied. This survey presents not only
an up-to-date technical review for researchers, but also a systematic approach
and a reference that a machine learning practitioner can use to categorise a
real problem and look up a possible solution accordingly.
Signal analysis using a multiresolution form of the singular value decomposition
This paper proposes a multiresolution form of the singular value decomposition (SVD) and shows how it may be used for signal analysis and approximation. It is well known that the SVD has optimal decorrelation and subrank approximation properties. The multiresolution form of the SVD proposed here retains those properties and, moreover, has linear computational complexity. Using the multiresolution SVD, the following important characteristics of a signal may be measured at each of several levels of resolution: isotropy, sphericity of principal components, self-similarity under scaling, and resolution of mean-squared error into meaningful components. Theoretical calculations are provided for simple statistical models to show what might be expected. Results on real images are provided to show the usefulness of the multiresolution SVD.
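The following sketch shows one common way to build a multiresolution SVD of a 1D signal: split it into even/odd samples to form a 2 x N/2 matrix, diagonalize the 2 x 2 scatter matrix in closed form, keep the principal row as the coarser approximation and the other row as detail, then recurse on the approximation. Each level costs O(N) and the lengths halve, so the total cost is linear. Using the scatter matrix without mean removal, the function names and the even-length requirement are assumptions of this sketch, not details taken from the paper.

```python
import math


def _principal_axes(a, b, c):
    """Orthonormal eigenvectors of [[a, b], [b, c]], largest eigenvalue first."""
    if b == 0:
        u = (1.0, 0.0) if a >= c else (0.0, 1.0)
    else:
        l1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b * b)
        x, y = l1 - c, b  # satisfies (T - l1 I) u = 0 via the characteristic equation
        norm = math.hypot(x, y)
        u = (x / norm, y / norm)
    v = (-u[1], u[0])  # orthogonal complement
    return u, v


def mrsvd_level(signal):
    """One level: returns (approximation, detail, axes)."""
    if len(signal) % 2:
        raise ValueError("signal length must be even at every level")
    x0, x1 = signal[0::2], signal[1::2]
    a = sum(p * p for p in x0)
    b = sum(p * q for p, q in zip(x0, x1))
    c = sum(q * q for q in x1)
    u, v = _principal_axes(a, b, c)
    approx = [u[0] * p + u[1] * q for p, q in zip(x0, x1)]
    detail = [v[0] * p + v[1] * q for p, q in zip(x0, x1)]
    return approx, detail, (u, v)


def mrsvd(signal, levels):
    """Decompose over several levels; returns final approximation and per-level (detail, axes)."""
    approx, stack = list(signal), []
    for _ in range(levels):
        approx, detail, axes = mrsvd_level(approx)
        stack.append((detail, axes))
    return approx, stack


def mrsvd_inverse(approx, stack):
    """Exact reconstruction, since each 2 x 2 transform is orthogonal."""
    for detail, (u, v) in reversed(stack):
        out = []
        for s, t in zip(approx, detail):
            out.extend([u[0] * s + v[0] * t, u[1] * s + v[1] * t])
        approx = out
    return approx


approx, stack = mrsvd([1, 1, 2, 2, 3, 3, 4, 4], 1)
print(stack[0][0])  # detail coefficients are (numerically) zero for pairwise-equal samples
```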
Investigation of Different Skeleton Features for CNN-based 3D Action Recognition
Deep learning techniques are being used in skeleton-based action recognition
tasks and outstanding performance has been reported. Compared with RNN-based
methods which tend to overemphasize temporal information, CNN-based approaches
can jointly capture spatio-temporal information from texture color images
encoded from skeleton sequences. There are several skeleton-based features that
have proven effective in RNN-based and handcrafted-feature-based methods.
However, it remains unknown whether they are suitable for CNN-based approaches.
This paper proposes to encode five spatial skeleton features into images with
different encoding methods. In addition, the performance implication of
different joints used for feature extraction is studied. The proposed method
achieved state-of-the-art performance on the NTU RGB+D dataset for 3D human action
analysis. An accuracy of 75.32% was achieved in the Large Scale 3D Human Activity
Analysis Challenge in Depth Videos.
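As a minimal illustration of "encoding a spatial skeleton feature into an image", the sketch below takes one possible feature, pairwise joint distances, computes it per frame, and stacks frames as columns of a grayscale image scaled to 0..255. The choice of feature, the global min-max normalization and the function name are hypothetical; the paper studies five features and several encoding methods that are not reproduced here.

```python
import math
from itertools import combinations


def encode_joint_distances(frames):
    """Encode a skeleton sequence as an image: rows = joint pairs, columns = frames.

    frames: list of frames, each a list of (x, y, z) joint coordinates.
    Returns a list of rows of integer pixel values in [0, 255].
    """
    if not frames:
        raise ValueError("empty sequence")
    n_joints = len(frames[0])
    pairs = list(combinations(range(n_joints), 2))
    # One row per joint pair, one column per frame
    feats = [[math.dist(frame[i], frame[j]) for frame in frames] for i, j in pairs]
    flat = [v for row in feats for v in row]
    lo, hi = min(flat), max(flat)
    scale = 255.0 / (hi - lo) if hi > lo else 0.0  # global min-max normalization
    return [[int(round((v - lo) * scale)) for v in row] for row in feats]


frames = [[(0, 0, 0), (1, 0, 0), (0, 2, 0)],
          [(0, 0, 0), (2, 0, 0), (0, 2, 0)]]
image = encode_joint_distances(frames)
print(len(image), len(image[0]))  # 3 2
```

The resulting rows-by-columns array can then be resized and fed to an image CNN; the "different joints used for feature extraction" question corresponds to choosing which joints enter `pairs`.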
Creating Simplified 3D Models with High Quality Textures
This paper presents an extension to the KinectFusion algorithm which allows
creating simplified 3D models with high quality RGB textures. This is achieved
through (i) creating model textures using images from an HD RGB camera that is
calibrated with the Kinect depth camera, (ii) using a modified scheme to update
model textures in an asymmetrical colour volume that contains a higher number
of voxels than that of the geometry volume, (iii) simplifying the dense polygon
mesh model using a quadric-based mesh decimation algorithm, and (iv) creating and
mapping 2D textures to every polygon in the output 3D model. The proposed
method is implemented in real time by means of GPU parallel processing.
Visualization via ray casting of both geometry and colour volumes provides
users with real-time feedback on the currently scanned 3D model. Experimental
results show that the proposed method is capable of preserving model texture
quality even for a heavily decimated model and that, when reconstructing small
objects, photorealistic RGB textures can still be obtained.
Comment: 2015 International Conference on Digital Image Computing: Techniques
and Applications (DICTA), Page 1 -
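Quadric-based decimation is usually built on the Garland-Heckbert quadric error metric: each triangle plane p = (a, b, c, d) with unit normal contributes a 4 x 4 quadric K = p p^T, a vertex accumulates the quadrics of its incident faces, and the cost of placing a vertex at v is v^T Q v in homogeneous coordinates, i.e. the sum of squared distances to those planes. The sketch below computes only this error term; it assumes the standard metric and leaves out the edge-collapse loop and priority queue that a full decimator needs.

```python
import math


def plane_from_triangle(p0, p1, p2):
    """Unit-normal plane (a, b, c, d) with a*x + b*y + c*z + d = 0 through the triangle."""
    e1 = [p1[k] - p0[k] for k in range(3)]
    e2 = [p2[k] - p0[k] for k in range(3)]
    n = [e1[1] * e2[2] - e1[2] * e2[1],
         e1[2] * e2[0] - e1[0] * e2[2],
         e1[0] * e2[1] - e1[1] * e2[0]]  # cross product e1 x e2
    length = math.sqrt(sum(c * c for c in n))
    if length == 0:
        raise ValueError("degenerate triangle")
    n = [c / length for c in n]
    d = -sum(n[k] * p0[k] for k in range(3))
    return (n[0], n[1], n[2], d)


def plane_quadric(plane):
    """Fundamental error quadric K = p p^T."""
    return [[plane[i] * plane[j] for j in range(4)] for i in range(4)]


def add_quadrics(q1, q2):
    return [[q1[i][j] + q2[i][j] for j in range(4)] for i in range(4)]


def vertex_error(Q, v):
    """v^T Q v with v in homogeneous coordinates: sum of squared plane distances."""
    h = (v[0], v[1], v[2], 1.0)
    return sum(h[i] * Q[i][j] * h[j] for i in range(4) for j in range(4))


q_z = plane_quadric(plane_from_triangle((0, 0, 0), (1, 0, 0), (0, 1, 0)))  # plane z = 0
q_x = plane_quadric(plane_from_triangle((0, 0, 0), (0, 1, 0), (0, 0, 1)))  # plane x = 0
print(vertex_error(add_quadrics(q_z, q_x), (3, 0, 4)))  # 25.0 (3^2 + 4^2)
```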
Learning Discriminative Bayesian Networks from High-dimensional Continuous Neuroimaging Data
Owing to their causal semantics, Bayesian networks (BNs) have been widely employed
to discover underlying data relationships in exploratory studies such as
brain research. Despite their success in modeling the probability distribution of
variables, BNs are inherently generative models, which are not necessarily
discriminative. As a result, subtle but critical network changes across
populations, which are of investigative value, may be overlooked. In this paper, we
propose to improve the discriminative power of BN models for continuous
variables from two different perspectives, yielding two general
discriminative learning frameworks for Gaussian Bayesian networks (GBNs). In the
first framework, we employ the Fisher kernel to bridge generative GBN models
and discriminative SVM classifiers, and convert GBN parameter
learning into Fisher kernel learning by minimizing a generalization error bound
of SVMs. In the second framework, we employ the max-margin criterion and build
it directly upon GBN models to explicitly optimize the classification
performance of the GBNs. The advantages and disadvantages of the two frameworks
are discussed and experimentally compared. Both demonstrate strong
power in learning discriminative parameters of GBNs for neuroimaging-based
brain network analysis, while maintaining reasonable representation
capacity. The contributions of this paper also include a new Directed Acyclic
Graph (DAG) constraint with a theoretical guarantee to ensure the graph validity
of the GBN.
Comment: 16 pages and 5 figures for the article (excluding appendix)
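Whatever constraint is used during learning, a learned GBN structure is only valid if its directed graph has no cycles. The sketch below is a generic acyclicity check using Kahn's topological-sort algorithm; it is a post-hoc validity test for a given edge list, not the paper's DAG constraint, which is enforced during optimization and is not reproduced here.

```python
from collections import deque


def is_dag(n_nodes, edges):
    """Return True if the directed graph (nodes 0..n_nodes-1, edges u->v) is acyclic."""
    indegree = [0] * n_nodes
    children = [[] for _ in range(n_nodes)]
    for u, v in edges:
        children[u].append(v)
        indegree[v] += 1
    queue = deque(i for i in range(n_nodes) if indegree[i] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in children[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    # Nodes on a cycle never reach indegree 0, so they are never visited
    return visited == n_nodes


print(is_dag(3, [(0, 1), (1, 2)]))          # True
print(is_dag(3, [(0, 1), (1, 2), (2, 0)]))  # False
```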
Cooperative Training of Deep Aggregation Networks for RGB-D Action Recognition
A novel deep neural network training paradigm that exploits the conjoint
information in multiple heterogeneous sources is proposed. Specifically, in an
RGB-D based action recognition task, it cooperatively trains a single
convolutional neural network (named c-ConvNet) on both RGB visual features and
depth features, and deeply aggregates the two kinds of features for action
recognition. Unlike the conventional ConvNet, which learns deep
separable features for homogeneous modality-based classification with only one
softmax loss function, the c-ConvNet enhances the discriminative power of the
deeply learned features and weakens the undesired modality discrepancy by
jointly optimizing a ranking loss and a softmax loss for both homogeneous and
heterogeneous modalities. The ranking loss consists of intra-modality and
cross-modality triplet losses, and it reduces both intra-modality and
cross-modality feature variations. Furthermore, the correlations between RGB
and depth data are embedded in the c-ConvNet; they can be retrieved from either
modality and contribute to recognition even when only one of
the modalities is available. The proposed method was extensively evaluated on
two large RGB-D action recognition datasets, ChaLearn LAP IsoGD and NTU RGB+D,
and one small dataset, SYSU 3D HOI, and achieved state-of-the-art
results.
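The ranking loss described above combines intra-modality and cross-modality triplet losses. The sketch below uses the standard hinge triplet loss max(0, d(a, p) - d(a, n) + margin) with squared Euclidean distance, and sums two intra-modality terms (RGB-only, depth-only) with two cross-modality terms (an RGB anchor against depth positive/negative, and vice versa). The equal weighting, the margin value and the function names are assumptions for illustration; the c-ConvNet's exact formulation may differ.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge triplet loss: pull the positive closer than the negative by a margin."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def ranking_loss(rgb, depth, margin=1.0):
    """Hypothetical equal-weight sum of intra- and cross-modality triplet losses.

    rgb, depth: (anchor, positive, negative) feature triples from each modality,
    where the positive shares the anchor's action class and the negative does not.
    """
    ra, rp, rn = rgb
    da, dp, dn = depth
    intra = triplet_loss(ra, rp, rn, margin) + triplet_loss(da, dp, dn, margin)
    cross = triplet_loss(ra, dp, dn, margin) + triplet_loss(da, rp, rn, margin)
    return intra + cross


print(triplet_loss([0, 0], [0, 1], [3, 0]))  # 0.0 (negative already far enough)
print(triplet_loss([0, 0], [2, 0], [1, 0]))  # 4.0 (4 - 1 + margin 1)
```

The cross-modality terms are what push RGB and depth features of the same action close together, which is the property that lets one modality stand in for the other at test time.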