3D human pose estimation from depth maps using a deep combination of poses
Many real-world applications require the estimation of human body joints for
higher-level tasks such as human behaviour understanding. In recent
years, depth sensors have become a popular approach to obtain three-dimensional
information. The depth maps generated by these sensors provide information that
can be employed to disambiguate the poses observed in two-dimensional images.
This work addresses the problem of 3D human pose estimation from depth maps
employing a Deep Learning approach. We propose a model, named Deep Depth Pose
(DDP), which receives a depth map containing a person and a set of predefined
3D prototype poses and returns the 3D position of the body joints of the
person. In particular, DDP is defined as a ConvNet that computes the specific
weights needed to linearly combine the prototypes for the given input. We have
thoroughly evaluated DDP on the challenging 'ITOP' and 'UBC3V' datasets, which
respectively depict realistic and synthetic samples, defining a new
state-of-the-art on them. Comment: Accepted for publication at "Journal of Visual Communication and
Image Representation"
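The linear combination of prototypes described in the DDP abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `combine_prototypes`, the two-joint toy prototypes, and the fixed weights (standing in for the output of the ConvNet) are all assumptions made for illustration.

```python
from typing import List, Sequence

# A pose is a list of joints, each joint an [x, y, z] position.
Pose = List[List[float]]


def combine_prototypes(weights: Sequence[float],
                       prototypes: Sequence[Pose]) -> Pose:
    """Return the weighted sum of the prototype poses, joint by joint."""
    if len(weights) != len(prototypes):
        raise ValueError("need exactly one weight per prototype")
    num_joints = len(prototypes[0])
    pose = [[0.0, 0.0, 0.0] for _ in range(num_joints)]
    for weight, prototype in zip(weights, prototypes):
        for j, joint in enumerate(prototype):
            for axis in range(3):
                pose[j][axis] += weight * joint[axis]
    return pose


# Two hypothetical prototypes with two joints each.
prototypes = [
    [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[1.0, 0.0, 0.0], [1.0, 2.0, 0.0]],
]
# Hypothetical weights; in DDP these would be predicted from the depth map.
weights = [0.25, 0.75]

estimated_pose = combine_prototypes(weights, prototypes)
```

In this sketch the network only has to predict one scalar per prototype, and the joint positions follow from the fixed prototype set.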
Visual identification by signature tracking
We propose a new camera-based biometric: visual signature identification. We discuss the importance of the parameterization of the signatures in order to achieve good classification results, independently of variations in the position of the camera with respect to the writing surface. We show that affine arc-length parameterization performs better than conventional time and Euclidean arc-length ones. We find that the system verification performance is better than 4 percent error on skilled forgeries and 1 percent error on random forgeries, and that its recognition performance is better than 1 percent error rate, comparable to the best camera-based biometrics.
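For reference, the two arc-length parameterizations compared in this abstract have standard textbook definitions for a planar curve $(x(t), y(t))$. The abstract itself gives no formulas, so the notation below is an assumption rather than the authors' own.

```latex
\begin{align}
s_{\text{Euclidean}}(t) &= \int_{0}^{t} \sqrt{\dot{x}(\tau)^{2} + \dot{y}(\tau)^{2}}\; d\tau, \\
s_{\text{affine}}(t) &= \int_{0}^{t} \left| \dot{x}(\tau)\,\ddot{y}(\tau) - \ddot{x}(\tau)\,\dot{y}(\tau) \right|^{1/3} d\tau .
\end{align}
```

Under an affine map with linear part $A$, the quantity $\dot{x}\ddot{y} - \ddot{x}\dot{y}$ is multiplied by $\det A$, so the affine arc length changes only by the constant factor $|\det A|^{1/3}$. This is consistent with the abstract's emphasis on robustness to camera position.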
Online Multi-Stage Deep Architectures for Feature Extraction and Object Recognition
Multi-stage visual architectures have recently found success in achieving high classification accuracies over image datasets with large variations in pose, lighting, and scale. Inspired by techniques currently at the forefront of deep learning, such architectures are typically composed of one or more layers of preprocessing, feature encoding, and pooling to extract features from raw images. Training these components traditionally relies on large sets of patches that are extracted from a potentially large image dataset. In this context, high-dimensional feature space representations are often helpful for obtaining the best classification performances and providing a higher degree of invariance to object transformations. Large datasets with high-dimensional features complicate the implementation of visual architectures in memory-constrained environments. This dissertation constructs online learning replacements for the components within a multi-stage architecture and demonstrates that the proposed replacements (namely fuzzy competitive clustering, an incremental covariance estimator, and a multi-layer neural network) can offer performance competitive with their offline batch counterparts while providing a reduced memory footprint. The online nature of this solution allows for the development of a method for adjusting parameters within the architecture via stochastic gradient descent. Testing over multiple datasets shows the potential benefits of this methodology when appropriate priors on the initial parameters are unknown. Alternatives to batch-based decompositions for a whitening preprocessing stage, which take advantage of natural image statistics and allow simple dictionary learners to work well in the problem domain, are also explored. Expansions of the architecture using additional pooling statistics and multiple layers are presented and indicate that larger codebook sizes are not the only step forward to higher classification accuracies. Experimental results from these expansions further indicate the important role of sparsity and appropriate encodings within multi-stage visual feature extraction architectures.
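To make the idea of an incremental covariance estimator concrete, the following is a minimal sketch of a Welford-style online update. The abstract does not specify the dissertation's estimator, so this is a swapped-in, well-known technique; the class name `OnlineCovariance` and the sample data are hypothetical.

```python
from typing import List, Sequence


class OnlineCovariance:
    """Welford-style running mean and covariance (illustrative only)."""

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self.n = 0
        self.mean = [0.0] * dim
        # Running sum of outer products of deviations.
        self.m2 = [[0.0] * dim for _ in range(dim)]

    def update(self, x: Sequence[float]) -> None:
        """Fold one sample into the estimate without storing it."""
        self.n += 1
        # Deviation from the mean before the update.
        delta_old = [xi - mi for xi, mi in zip(x, self.mean)]
        self.mean = [mi + di / self.n for mi, di in zip(self.mean, delta_old)]
        # Deviation from the mean after the update.
        delta_new = [xi - mi for xi, mi in zip(x, self.mean)]
        for i in range(self.dim):
            for j in range(self.dim):
                self.m2[i][j] += delta_old[i] * delta_new[j]

    def covariance(self) -> List[List[float]]:
        """Return the unbiased sample covariance matrix."""
        if self.n < 2:
            raise ValueError("need at least two samples")
        return [[v / (self.n - 1) for v in row] for row in self.m2]


# Hypothetical stream of 2-D feature vectors, processed one at a time.
estimator = OnlineCovariance(dim=2)
for sample in ([1.0, 2.0], [3.0, 6.0], [5.0, 10.0]):
    estimator.update(sample)

cov = estimator.covariance()
```

Memory use depends only on the feature dimension, not on the number of samples seen, which is the property that matters in a memory-constrained setting.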
Diversity vs. Recognizability: Human-like generalization in one-shot generative models
Robust generalization to new concepts has long remained a distinctive feature
of human intelligence. However, recent progress in deep generative models has
now led to neural architectures capable of synthesizing novel instances of
unknown visual concepts from a single training example. Yet, a more precise
comparison between these models and humans is not possible because existing
performance metrics for generative models (i.e., FID, IS, likelihood) are not
appropriate for the one-shot generation scenario. Here, we propose a new
framework to evaluate one-shot generative models along two axes: sample
recognizability vs. diversity (i.e., intra-class variability). Using this
framework, we perform a systematic evaluation of representative one-shot
generative models on the Omniglot handwritten dataset. We first show that
GAN-like and VAE-like models fall on opposite ends of the
diversity-recognizability space. Extensive analyses of the effect of key model
parameters further revealed that spatial attention and context integration have
a linear contribution to the diversity-recognizability trade-off. In contrast,
disentanglement transports the model along a parabolic curve that could be used
to maximize recognizability. Using the diversity-recognizability framework, we
were able to identify models and parameters that closely approximate human
data.
Recent Advances in Transfer Learning for Cross-Dataset Visual Recognition: A Problem-Oriented Perspective
This paper takes a problem-oriented perspective and presents a comprehensive
review of transfer learning methods, both shallow and deep, for cross-dataset
visual recognition. Specifically, it categorises the cross-dataset recognition
into seventeen problems based on a set of carefully chosen data and label
attributes. Such a problem-oriented taxonomy has allowed us to examine how
different transfer learning approaches tackle each problem and how well each
problem has been researched to date. The comprehensive problem-oriented review
of the advances in transfer learning with respect to each problem has revealed
not only the challenges in transfer learning for visual recognition, but also
the problems (e.g. eight of the seventeen problems) that have been scarcely
studied. This survey not only presents an up-to-date technical review for
researchers, but also a systematic approach and a reference for a machine
learning practitioner to categorise a real problem and to look up a
possible solution accordingly.
Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data
Knowing when a trained segmentation model is encountering data that is
different to its training data is important. Understanding and mitigating the
effects of this shift plays an important part in deploying such models, from
both a performance and an assurance perspective, and is a safety concern in
applications such as autonomous vehicles (AVs). This work presents a segmentation network that can
detect errors caused by challenging test domains without any additional
annotation in a single forward pass. As annotation costs limit the diversity of
labelled datasets, we use easy-to-obtain, uncurated and unlabelled data to
learn to perform uncertainty estimation by selectively enforcing consistency
over data augmentation. To this end, a novel segmentation benchmark based on
the SAX Dataset is used, which includes labelled test data spanning three
autonomous-driving domains, ranging in appearance from dense urban to off-road.
The proposed method, named Gamma-SSL, consistently outperforms uncertainty
estimation and Out-of-Distribution (OoD) techniques on this difficult benchmark
- by up to 10.7% in area under the receiver operating characteristic (ROC)
curve and 19.2% in area under the precision-recall (PR) curve in the most
challenging of the three scenarios. Comment: Accepted for publication in IEEE Transactions on Robotics (T-RO)
- …