72 research outputs found
Beauty3DFaceNet:Deep geometry and texture fusion for 3D facial attractiveness prediction
We present Beauty3DFaceNet, the first deep convolutional neural network to predict attractiveness on 3D faces using both geometry and texture information. The proposed network learns discriminative and complementary 2D and 3D facial features, allowing accurate attractiveness prediction for 3D faces. The main component of our network is a fusion module that combines geometric and texture features. We further employ a novel sampling strategy based on a facial-landmark prior, which improves the learning of aesthetic features from a face point cloud. Compared with previous work, our approach takes full advantage of 3D geometry and 2D texture and does not rely on handcrafted features derived from highly accurate facial characteristics such as feature points. To facilitate 3D facial attractiveness research, we also construct the first 3D face dataset for this task, ShadowFace3D, which contains 6,000 high-quality 3D faces with attractiveness labeled by human annotators. Extensive quantitative and qualitative evaluations show that Beauty3DFaceNet achieves a significant correlation with average human ratings, validating that a deep learning network can effectively learn and predict 3D facial attractiveness.
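The geometry-texture fusion at the heart of such a network can be conveyed with a minimal sketch. Everything below (function names, feature sizes, and the concatenation-plus-linear-head design) is a hypothetical stand-in, not the paper's actual fusion module:

```python
import numpy as np

def fuse_features(geom_feat, tex_feat, w_geom, w_tex, w_out):
    """Concatenation-style fusion of a 3D geometry feature and a 2D
    texture feature, followed by a linear scoring head. All weights are
    hypothetical stand-ins for learned parameters."""
    g = np.tanh(geom_feat @ w_geom)          # project the geometry branch
    t = np.tanh(tex_feat @ w_tex)            # project the texture branch
    joint = np.concatenate([g, t], axis=-1)  # fuse by concatenation
    return joint @ w_out                     # per-face attractiveness score

rng = np.random.default_rng(0)
geom = rng.normal(size=(4, 128))  # batch of point-cloud features
tex = rng.normal(size=(4, 64))    # batch of texture features
score = fuse_features(geom, tex,
                      rng.normal(size=(128, 32)),
                      rng.normal(size=(64, 32)),
                      rng.normal(size=(64,)))
```

The key property the sketch illustrates is that both branches are projected into a joint space before scoring, so the head sees geometry and texture cues together rather than either one alone.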
Relational Self-Supervised Learning
Self-supervised learning (SSL), including mainstream contrastive learning, has achieved great success in learning visual representations without data annotations. However, most methods focus mainly on instance-level information (i.e., different augmented views of the same instance should have the same feature or cluster into the same class), while paying little attention to the relationships between different instances. In this paper, we introduce a novel SSL paradigm, the relational self-supervised learning (ReSSL) framework, which learns representations by modeling the relationships between different instances. Specifically, our method employs a sharpened distribution of pairwise similarities among different instances as the relation metric, which is then used to match the feature embeddings of different augmentations. To boost performance, we argue that weak augmentations matter for representing a more reliable relation, and we leverage a momentum strategy for practical efficiency. The designed asymmetric predictor head and an InfoNCE warm-up strategy enhance robustness to hyper-parameters and benefit the resulting performance. Experimental results show that our proposed ReSSL substantially outperforms state-of-the-art methods across different network architectures, including various lightweight networks (e.g., EfficientNet and MobileNet).
Comment: Extended version of NeurIPS 2021 paper. arXiv admin note: substantial text overlap with arXiv:2107.0928
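The relation-matching objective described above can be sketched numerically: a sharpened similarity distribution over a set of reference instances serves as the target that the other augmented view must match. The memory bank, temperatures, and all names below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_loss(z_weak, z_strong, memory, t_teacher=0.04, t_student=0.1):
    """Cross-entropy between the similarity distributions of two views.
    The weak view, sharpened by a lower temperature, provides the target
    relation; the strong view is pushed to match it."""
    z_weak = z_weak / np.linalg.norm(z_weak, axis=1, keepdims=True)
    z_strong = z_strong / np.linalg.norm(z_strong, axis=1, keepdims=True)
    memory = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    p_teacher = softmax(z_weak @ memory.T / t_teacher)   # sharpened target relation
    p_student = softmax(z_strong @ memory.T / t_student)
    return -(p_teacher * np.log(p_student + 1e-12)).sum(axis=1).mean()

rng = np.random.default_rng(1)
loss = relation_loss(rng.normal(size=(8, 16)),   # weak-augmentation embeddings
                     rng.normal(size=(8, 16)),   # strong-augmentation embeddings
                     rng.normal(size=(32, 16)))  # reference (memory-bank) instances
```

Note how the supervision is the full distribution over reference instances, not a single positive pair, which is what distinguishes a relational objective from standard instance-level contrastive learning.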
Dynamic fusion with intra-and inter-modality attention flow for visual question answering
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method of dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities. It robustly captures the high-level interactions between the language and vision domains, thus significantly improving visual question answering performance. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the target modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
Comment: CVPR 2019 Oral
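A generic cross-attention pass conveys the flavor of one direction of inter-modality information flow. This is a textbook sketch under assumed shapes, not the paper's specific module:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def inter_modality_attention(q_feats, v_feats):
    """Question tokens attend over visual region features, so the
    language features are updated with visual information (one
    direction of an inter-modality information flow)."""
    d = q_feats.shape[-1]
    attn = softmax(q_feats @ v_feats.T / np.sqrt(d))  # token-to-region weights
    return attn @ v_feats                             # visually-informed tokens

rng = np.random.default_rng(2)
q = rng.normal(size=(14, 32))  # 14 question-token features
v = rng.normal(size=(36, 32))  # 36 image-region features
q_updated = inter_modality_attention(q, v)
```

In the dynamic setting described by the abstract, such passes alternate in both directions, and the intra-modality attention within each modality is additionally conditioned on the other modality.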
CoNe: Contrast Your Neighbours for Supervised Image Classification
Image classification is a longstanding problem in computer vision and machine learning research. Most recent works (e.g., SupCon, Triplet, and max-margin) focus mainly on grouping intra-class samples aggressively and compactly, under the assumption that all intra-class samples should be pulled tightly towards their class centers. However, such an objective is very hard to achieve, since it ignores the intra-class variance in the dataset (i.e., different instances from the same class can have significant differences). Thus, such a monotonous objective is not sufficient. To provide a more informative objective, we introduce Contrast Your Neighbours (CoNe), a simple yet practical learning framework for supervised image classification. Specifically, in CoNe, each sample is not only supervised by its class center but also directly employs the features of its similar neighbors as anchors to generate more adaptive and refined targets. Moreover, to further boost performance, we propose "distributional consistency" as a more informative regularization that enables similar instances to have similar probability distributions. Extensive experimental results demonstrate that CoNe achieves state-of-the-art performance across different benchmark datasets, network architectures, and settings. Notably, even without a complicated training recipe, CoNe achieves 80.8% Top-1 accuracy on ImageNet with ResNet-50, surpassing the recent Timm training recipe (80.4%). Code and pre-trained models are available at https://github.com/mingkai-zheng/CoNe
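The two ingredients described above, neighbor-refined targets and distributional consistency, can be sketched as follows. The mixing weight, temperature, and symmetric-KL formulation are illustrative choices, not the paper's exact definitions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cone_target(feat, neighbor_feats, class_center, alpha=0.5):
    """Blend the class center with the most similar neighbor's feature
    to obtain a refined, more adaptive target (alpha is an illustrative
    mixing weight, not a value from the paper)."""
    sims = neighbor_feats @ feat           # similarity to candidate neighbors
    nearest = neighbor_feats[np.argmax(sims)]
    return alpha * class_center + (1 - alpha) * nearest

def distributional_consistency(logits_a, logits_b, tau=0.1):
    """Encourage similar instances to share a similar probability
    distribution via a symmetric KL between softened logits."""
    pa, pb = softmax(logits_a / tau), softmax(logits_b / tau)
    kl = lambda p, q: (p * np.log(p / (q + 1e-12) + 1e-12)).sum()
    return 0.5 * (kl(pa, pb) + kl(pb, pa))

rng = np.random.default_rng(3)
feat = rng.normal(size=16)
target = cone_target(feat, rng.normal(size=(10, 16)), rng.normal(size=16))
dc_same = distributional_consistency(np.arange(5.0), np.arange(5.0))
```

The refined target moves supervision away from a single rigid class center toward whichever nearby instance the sample actually resembles, which is how the framework accommodates intra-class variance.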
Blending using ODE swept surfaces with shape control and C1 continuity
Surface blending with tangential continuity is widely applied in computer-aided design, manufacturing systems, and geometric modeling. In this paper, we propose a new blending method that effectively controls the shape of blending surfaces while exactly satisfying the blending constraints of tangent continuity. This method is based on the concept of swept surfaces controlled by a vector-valued fourth-order ordinary differential equation (ODE). It creates blending surfaces by sweeping a generator along two trimlines, making the generator exactly satisfy the tangential constraints at the trimlines. The shape of the blending surfaces is controlled by manipulating the generator through the solution of the vector-valued fourth-order ODE. This new blending method has the following advantages: 1) exact satisfaction of C1-continuous blending boundary constraints; 2) effective shape control of blending surfaces; 3) high computational efficiency due to the explicit mathematical representation of the blending surfaces; and 4) the ability to blend multiple (more than two) primary surfaces.
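One illustrative form of such a vector-valued fourth-order ODE for a generator G(u, v), with v the sweeping parameter between the two trimlines, is the following. The coefficients and the exact shape of the equation are a hedged sketch, not the paper's formulation:

```latex
a\,\frac{\partial^4 \mathbf{G}}{\partial v^4}
 + b\,\frac{\partial^2 \mathbf{G}}{\partial v^2}
 + c\,\mathbf{G} = \mathbf{0},
\qquad
\mathbf{G}(u,0)=\mathbf{C}_0(u),\quad
\frac{\partial \mathbf{G}}{\partial v}(u,0)=\mathbf{T}_0(u),\quad
\mathbf{G}(u,1)=\mathbf{C}_1(u),\quad
\frac{\partial \mathbf{G}}{\partial v}(u,1)=\mathbf{T}_1(u).
```

The point of the sketch is the counting argument: a fourth-order ODE admits exactly four boundary conditions, which is precisely enough to pin down positions C₀, C₁ and tangents T₀, T₁ at the two trimlines (hence exact C1 continuity), while the coefficients a, b, c remain free as shape-control parameters.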
Dynamic skin deformation using finite difference solutions for character animation
We present a new skin deformation method for creating dynamic skin deformations. The core elements of our approach are a dynamic deformation model, an efficient data-driven finite difference solution, and a curve-based representation of 3D models. We first reconstruct skin deformation models at different poses from photographs of a male human arm in motion, obtaining realistic deformed skin shapes. Then, we extract curves from these reconstructed skin deformation models. A new dynamic deformation model is proposed to describe the physics of dynamic curve deformations, and its finite difference solution is developed to determine the shape changes of the extracted curves. To improve the visual realism of skin deformations, we employ data-driven methods and introduce the skin shapes at the initial and final poses into the proposed dynamic deformation model. Experimental examples and comparisons made in this paper indicate that the proposed dynamic skin deformation technique can create realistic deformed skin shapes efficiently with a small data size.
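An explicit finite-difference time step for a simple damped dynamic curve model gives the flavor of such a solution. The model equation (rho * s_tt + eta * s_t = k * s_xx) and every coefficient below are illustrative assumptions, not the paper's dynamic deformation model:

```python
import numpy as np

def step_curve(s, s_prev, dt=0.01, rho=1.0, eta=0.5, k=100.0):
    """One explicit finite-difference time step of a damped dynamic
    curve: rho * s_tt + eta * s_t = k * s_xx, with the curve's two
    endpoints held fixed. s and s_prev hold the point positions at the
    current and previous time steps."""
    lap = np.zeros_like(s)
    lap[1:-1] = s[2:] - 2 * s[1:-1] + s[:-2]          # spatial second difference
    accel = (k * lap - eta * (s - s_prev) / dt) / rho  # solve for s_tt
    s_next = 2 * s - s_prev + dt * dt * accel          # central difference in time
    s_next[0], s_next[-1] = s[0], s[-1]                # clamp curve endpoints
    return s_next

# A curve of 50 points with a small bump, advanced for 20 steps.
x = np.linspace(0.0, 1.0, 50)
s = np.stack([x, np.sin(np.pi * x) * 0.1], axis=1)
s_prev = s.copy()
for _ in range(20):
    s, s_prev = step_curve(s, s_prev), s
```

Because the update is explicit, each step is a handful of array operations per curve point, which is consistent with the efficiency the abstract claims for a finite-difference solution over a curve-based representation.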