Learning Social Relation Traits from Face Images
Social relations define associations, e.g., warmth, friendliness, and
dominance, between two or more people. Motivated by psychological studies, we
investigate if such fine-grained and high-level relation traits can be
characterised and quantified from face images in the wild. To address this
challenging problem, we propose a deep model that learns a rich face
representation to capture gender, expression, head pose, and age-related
attributes, and then performs pairwise-face reasoning for relation prediction.
To learn from heterogeneous attribute sources, we formulate a new network
architecture with a bridging layer to leverage the inherent correspondences
among these datasets. It can also cope with missing target attribute labels.
Extensive experiments show that our approach is effective for fine-grained
social relation learning in images and videos.
Comment: To appear in International Conference on Computer Vision (ICCV) 201
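As a rough illustration of pairwise-face reasoning, the sketch below is a
minimal PyTorch stand-in, not the paper's bridging-layer architecture: a
shared encoder embeds each face, and a small head predicts relation traits
from the concatenated pair embedding. All layer sizes, the trait count, and
the name PairwiseRelationNet are illustrative assumptions.

    # Minimal sketch (not the authors' exact model) of pairwise-face
    # reasoning: shared CNN embeds each face; a head predicts traits
    # from the concatenated pair embedding. Sizes are assumptions.
    import torch
    import torch.nn as nn

    class PairwiseRelationNet(nn.Module):
        def __init__(self, num_traits=8):
            super().__init__()
            # Shared face encoder (stands in for the rich
            # attribute-trained representation in the paper).
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            )
            # Relation head reasons over the concatenated pair embedding.
            self.head = nn.Sequential(
                nn.Linear(128, 64), nn.ReLU(),
                nn.Linear(64, num_traits),
            )

        def forward(self, face_a, face_b):
            z = torch.cat([self.encoder(face_a), self.encoder(face_b)], dim=1)
            return torch.sigmoid(self.head(z))  # per-trait probabilities

    model = PairwiseRelationNet()
    traits = model(torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64))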
A note on exploratory item factor analysis by singular value decomposition
We revisit a singular value decomposition (SVD) algorithm given in Chen et al. (Psychometrika 84:124–146, 2019b) for exploratory item factor analysis (IFA). This algorithm estimates a multidimensional IFA model by SVD and was used to obtain a starting point for joint maximum likelihood estimation in Chen et al. (2019b). Thanks to the analytic and computational properties of the SVD, the algorithm guarantees a unique solution and has a computational advantage over other exploratory IFA methods; this advantage becomes significant when the numbers of respondents, items, and factors are all large. The algorithm can be viewed as a generalization of principal component analysis to binary data. In this note, we provide the statistical underpinning of the algorithm. In particular, we show its statistical consistency under the same double asymptotic setting as in Chen et al. (2019b). We also demonstrate how the algorithm provides a scree plot for investigating the number of factors and develop the corresponding asymptotic theory. Further extensions of the algorithm are discussed. Finally, simulation studies suggest that the algorithm has good finite-sample performance.
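For concreteness, the following sketch implements one common reading of such
an SVD-based estimator: a truncated SVD denoises the binary response matrix,
entries are clipped into (eps, 1 - eps), the inverse logistic link is applied,
and a second SVD of the column-centered result yields intercepts and loadings.
The clipping constant, the use of K + 1 components in the first step, and the
scaling conventions are assumptions here, not the paper's exact specification.

    # Hedged sketch of an SVD-based exploratory IFA estimator in the
    # spirit of Chen et al. (2019b). eps and the K + 1 truncation level
    # are illustrative assumptions.
    import numpy as np

    def svd_ifa(Y, K, eps=1e-2):
        N, J = Y.shape
        # Step 1: rank-(K + 1) SVD approximation of the N x J responses.
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        P = U[:, :K + 1] @ np.diag(s[:K + 1]) @ Vt[:K + 1, :]
        # Step 2: clip into (0, 1) so the logit below is well defined.
        P = np.clip(P, eps, 1 - eps)
        # Step 3: inverse logistic link.
        M = np.log(P / (1 - P))
        # Step 4: intercepts = column means; factor the centered matrix.
        d = M.mean(axis=0)
        U2, s2, Vt2 = np.linalg.svd(M - d, full_matrices=False)
        scores = np.sqrt(N) * U2[:, :K]                  # factor scores
        loadings = (Vt2[:K, :].T * s2[:K]) / np.sqrt(N)  # item loadings
        return d, loadings, scores, s2  # s2 supports a scree plot

    Y = (np.random.rand(500, 20) < 0.5).astype(float)
    d, A, theta, singvals = svd_ifa(Y, K=3)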
3D ShapeNets: A Deep Representation for Volumetric Shapes
3D shape is a crucial but heavily underutilized cue in today's computer
vision systems, mostly due to the lack of a good generic shape representation.
With the recent availability of inexpensive 2.5D depth sensors (e.g. Microsoft
Kinect), it is becoming increasingly important to have a powerful 3D shape
representation in the loop. Apart from category recognition, recovering full 3D
shapes from view-based 2.5D depth maps is also a critical part of visual
understanding. To this end, we propose to represent a geometric 3D shape as a
probability distribution of binary variables on a 3D voxel grid, using a
Convolutional Deep Belief Network. Our model, 3D ShapeNets, learns the
distribution of complex 3D shapes across different object categories and
arbitrary poses from raw CAD data, and discovers hierarchical compositional
part representations automatically. It naturally supports joint object
recognition and shape completion from 2.5D depth maps, and it enables active
object recognition through view planning. To train our 3D deep learning model,
we construct ModelNet -- a large-scale 3D CAD model dataset. Extensive
experiments show that our 3D deep representation enables significant
performance improvement over the state of the art in a variety of tasks.
Comment: To appear in CVPR 201
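To make the volumetric representation concrete, the sketch below voxelizes a
point set into a binary occupancy grid and feeds it to a small 3D CNN. The
paper trains a Convolutional Deep Belief Network; a plain discriminative 3D
CNN is substituted here purely to illustrate the voxel input, and the 32^3
resolution, layer sizes, and 40-way output are assumptions.

    # Hedged sketch of the volumetric pipeline: binary occupancy grid
    # plus a small 3D CNN. The paper's generative CDBN is replaced by a
    # plain discriminative 3D CNN for illustration only.
    import torch
    import torch.nn as nn

    def voxelize(points, res=32):
        """Map points in [0, 1]^3 to a binary res^3 occupancy grid."""
        grid = torch.zeros(res, res, res)
        idx = (points.clamp(0, 1 - 1e-6) * res).long()
        grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
        return grid

    cnn = nn.Sequential(
        nn.Conv3d(1, 16, kernel_size=5, stride=2), nn.ReLU(),
        nn.Conv3d(16, 32, kernel_size=3, stride=2), nn.ReLU(),
        nn.Flatten(),
        nn.LazyLinear(40),  # e.g. 40 object categories
    )

    voxels = voxelize(torch.rand(2048, 3)).unsqueeze(0).unsqueeze(0)
    logits = cnn(voxels)  # input shape (1, 1, 32, 32, 32)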
Residual Attention Network for Image Classification
In this work, we propose the "Residual Attention Network", a convolutional
neural network with an attention mechanism that can be incorporated into
state-of-the-art feed-forward network architectures in an end-to-end training
fashion. Our Residual
Attention Network is built by stacking Attention Modules which generate
attention-aware features. The attention-aware features from different modules
change adaptively as layers go deeper. Inside each Attention Module, a
bottom-up top-down feedforward structure is used to unfold the feedforward and
feedback attention process into a single feedforward process. Importantly, we
propose attention residual learning to train very deep Residual Attention
Networks which can be easily scaled up to hundreds of layers. Extensive
analyses are conducted on CIFAR-10 and CIFAR-100 datasets to verify the
effectiveness of every module mentioned above. Our Residual Attention Network
achieves state-of-the-art object recognition performance on three benchmark
datasets including CIFAR-10 (3.90% error), CIFAR-100 (20.45% error) and
ImageNet (4.8% single model and single crop, top-5 error). Notably, our
method achieves a 0.6% top-1 accuracy improvement with 46% of the trunk depth
and 69% of the forward FLOPs compared to ResNet-200. The experiments also
demonstrate that our network is robust against noisy labels.
Comment: Accepted to CVPR201
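A minimal sketch of attention residual learning: the module output follows
H(x) = (1 + M(x)) * T(x), where T(x) is the trunk branch and M(x) is a soft
mask produced by a bottom-up top-down (downsample/upsample) branch. The
channel width and branch depths below are illustrative, not the paper's
exact configuration.

    # Minimal sketch of an Attention Module with attention residual
    # learning, H(x) = (1 + M(x)) * T(x). Widths/depths are assumptions.
    import torch
    import torch.nn as nn

    class AttentionModule(nn.Module):
        def __init__(self, channels=64):
            super().__init__()
            # Trunk branch: ordinary feed-forward feature processing.
            self.trunk = nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
            # Mask branch: bottom-up (pooling) then top-down (upsampling),
            # ending in a sigmoid so the mask lies in (0, 1).
            self.mask = nn.Sequential(
                nn.MaxPool2d(2),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
                nn.Upsample(scale_factor=2, mode='bilinear',
                            align_corners=False),
                nn.Conv2d(channels, channels, 1), nn.Sigmoid(),
            )

        def forward(self, x):
            t = self.trunk(x)
            m = self.mask(x)
            return (1 + m) * t  # attention residual learning

    y = AttentionModule()(torch.randn(1, 64, 32, 32))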