Adaptive Graphical Model Network for 2D Handpose Estimation
In this paper, we propose a new architecture called Adaptive Graphical Model
Network (AGMN) to tackle the task of 2D hand pose estimation from a monocular
RGB image. The AGMN consists of two branches of deep convolutional neural
networks for calculating unary and pairwise potential functions, followed by a
graphical model inference module for integrating unary and pairwise potentials.
Unlike existing architectures proposed to combine DCNNs with graphical models,
our AGMN is novel in that the parameters of its graphical model are conditioned
on and fully adaptive to individual input images. Experiments show that our
approach outperforms the state-of-the-art method for 2D hand keypoint
estimation by a notable margin on two public datasets. Comment: 30th British Machine Vision Conference (BMVC).
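The inference module's core idea can be illustrated with a toy max-product pass on a 1D kinematic chain. This is only a sketch: the paper performs convolutional message passing over 2D heatmaps with image-conditioned pairwise kernels, whereas here the unary and pairwise potentials are small hand-picked arrays, chosen just to show how pairwise terms reconcile inconsistent unary predictions.

```python
import numpy as np

def chain_max_product(unary, pairwise):
    """Max-product (Viterbi-style) inference on a kinematic chain.

    unary:    (K, S) scores per keypoint over S discrete locations
              (the role of the unary-branch CNN output).
    pairwise: (K-1, S, S) compatibility between consecutive keypoints
              (the role of the pairwise-branch output).
    Returns the jointly most likely location index per keypoint.
    """
    K, S = unary.shape
    msg = np.zeros((K, S))
    back = np.zeros((K, S), dtype=int)
    msg[0] = unary[0]
    for k in range(1, K):
        # scores[i, j] = best score ending with keypoint k-1 at i, k at j
        scores = msg[k - 1][:, None] + pairwise[k - 1]
        back[k] = scores.argmax(axis=0)
        msg[k] = unary[k] + scores.max(axis=0)
    # Backtrack from the best final location.
    path = np.zeros(K, dtype=int)
    path[-1] = msg[-1].argmax()
    for k in range(K - 1, 0, -1):
        path[k - 1] = back[k, path[k]]
    return path
```

With a strong "stay at the same location" pairwise prior, the second keypoint is pulled to agree with the first even though its unary term prefers a different location; with a zero pairwise term the output degenerates to per-keypoint argmaxes.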
Diffeomorphic Image Registration with Neural Velocity Field
Diffeomorphic image registration, offering smooth transformation and topology
preservation, is required in many medical image analysis tasks. Traditional
methods impose certain modeling constraints on the space of admissible
transformations and use optimization to find the optimal transformation between
two images. Specifying the right space of admissible transformations is
challenging: the registration quality can be poor if the space is too
restrictive, while the optimization can be hard to solve if the space is too
general. Recent learning-based methods, utilizing deep neural networks to learn
the transformation directly, achieve fast inference but struggle with
accuracy, owing to the difficulty of capturing small local deformations, and
with generalization. Here we propose a new optimization-based method named
DNVF (Diffeomorphic Image Registration with Neural Velocity Field) which
utilizes a deep neural network to model the space of admissible transformations.
A multilayer perceptron (MLP) with a sinusoidal activation function is used to
represent the continuous velocity field, assigning a velocity vector to every
point in space and providing the flexibility to model complex deformations as
well as the convenience of optimization. Moreover, we propose a cascaded image
registration framework (Cas-DNVF) by combining the benefits of both
optimization and learning based methods, where a fully convolutional neural
network (FCN) is trained to predict the initial deformation, followed by DNVF
for further refinement. Experiments on two large-scale 3D MR brain scan
datasets demonstrate that our proposed methods significantly outperform the
state-of-the-art registration methods. Comment: WACV 202
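The two ingredients of the method — a sinusoidal MLP that maps a coordinate to a velocity, and numerical integration of that field into a deformation — can be sketched as follows. This is a minimal 2D stand-in with random (unoptimized) weights and plain forward-Euler integration; DNVF works in 3D and optimizes the network per image pair, and class/function names here are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

class SirenVelocityField:
    """Tiny MLP with sinusoidal activations mapping a 2D point to a 2D
    velocity. Weights are random here, whereas DNVF optimizes them for
    each image pair; initialization follows the usual SIREN scaling."""
    def __init__(self, dim=2, hidden=32, omega=30.0):
        self.omega = omega
        self.w1 = rng.uniform(-1.0 / dim, 1.0 / dim, (dim, hidden))
        bound = np.sqrt(6.0 / hidden) / omega
        self.w2 = rng.uniform(-bound, bound, (hidden, hidden))
        self.w3 = rng.uniform(-bound, bound, (hidden, dim))

    def __call__(self, x):
        h = np.sin(self.omega * (x @ self.w1))
        h = np.sin(self.omega * (h @ self.w2))
        return h @ self.w3

def integrate(field, points, steps=16, T=1.0):
    """Forward-Euler integration of dx/dt = v(x). For a smooth, bounded
    velocity field, sufficiently small steps keep the resulting map
    smooth and invertible, which is the diffeomorphic property."""
    x = points.copy()
    dt = T / steps
    for _ in range(steps):
        x = x + dt * field(x)
    return x
```

Because the field is continuous in space, the same network can be queried at any resolution, which is what gives the method its flexibility relative to grid-based displacement fields.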
Identity-Aware Hand Mesh Estimation and Personalization from RGB Images
Reconstructing 3D hand meshes from monocular RGB images has attracted
an increasing amount of attention due to its enormous potential applications in
the field of AR/VR. Most state-of-the-art methods attempt to tackle this task
in an anonymous manner. Specifically, the identity of the subject is ignored
even though it is practically available in real applications where the user is
unchanged in a continuous recording session. In this paper, we propose an
identity-aware hand mesh estimation model, which can incorporate the identity
information represented by the intrinsic shape parameters of the subject. We
demonstrate the importance of the identity information by comparing the
proposed identity-aware model to a baseline that treats the subject anonymously.
Furthermore, to handle the use case where the test subject is unseen, we
propose a novel personalization pipeline to calibrate the intrinsic shape
parameters using only a few unlabeled RGB images of the subject. Experiments on
two large-scale public datasets validate the state-of-the-art performance of
our proposed method. Comment: ECCV 2022. GitHub:
https://github.com/deyingk/PersonalizedHandMeshEstimatio
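The role of intrinsic shape parameters, and the idea of calibrating them from a few observations, can be illustrated with a toy linear shape model. All names and dimensions below are illustrative: the basis is random rather than MANO's learned basis, and the "personalization" fits a single shape vector to a few noisy mesh estimates by least squares, whereas the paper calibrates from unlabeled RGB images.

```python
import numpy as np

rng = np.random.default_rng(1)
V, B = 50, 10                               # toy vertex count, shape-basis size
template = rng.normal(size=(V, 3))          # mean hand mesh (illustrative)
shape_basis = rng.normal(size=(B, V, 3)) * 0.05

def mesh_from_shape(beta):
    """MANO-style linear shape model: the subject's identity enters through
    the intrinsic shape parameters beta that the identity-aware network is
    conditioned on."""
    return template + np.tensordot(beta, shape_basis, axes=1)

def calibrate_beta(observed_meshes, lam=1e-3):
    """Toy stand-in for personalization: ridge-regularized least-squares fit
    of one shared beta to a few per-frame mesh estimates, exploiting the
    fact that the subject is unchanged across the recording session."""
    A = shape_basis.reshape(B, -1).T                       # (3V, B)
    r = (np.mean(observed_meshes, axis=0) - template).reshape(-1)
    return np.linalg.solve(A.T @ A + lam * np.eye(B), A.T @ r)
```

The key assumption mirrored here is the one the paper exploits: per-frame estimates are noisy, but the identity (beta) is constant, so averaging a few frames recovers it accurately.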
PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation
Recently, the vision transformer and its variants have played an increasingly
important role in both monocular and multi-view human pose estimation.
Considering image patches as tokens, transformers can model the global
dependencies within the entire image or across images from other views.
However, global attention is computationally expensive. As a consequence, it is
difficult to scale up these transformer-based methods to high-resolution
features and many views.
In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D
human pose estimation, which locates a rough human mask and performs
self-attention only within selected tokens. Furthermore, we extend our PPT to
multi-view human pose estimation. Built upon PPT, we propose a new cross-view
fusion strategy, called human area fusion, which considers all human foreground
pixels as corresponding candidates. Experimental results on COCO and MPII
demonstrate that our PPT can match the accuracy of previous pose transformer
methods while reducing the computation. Moreover, experiments on Human3.6M and
Ski-Pose demonstrate that our Multi-view PPT can efficiently fuse cues from
multiple views and achieve new state-of-the-art results. Comment: ECCV 2022. Code is available at https://github.com/HowieMa/PP
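The pruning idea — keep only tokens likely to lie on the human and attend among those — can be sketched in a few lines. This is a rough illustration, not the paper's architecture: a single attention head with no learned projections, and a generic per-token score in place of the paper's learned token-selection mechanism.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def pruned_self_attention(tokens, keep_scores, keep_ratio=0.5):
    """Self-attention restricted to the top-scoring tokens (standing in for
    the rough human mask); pruned tokens pass through untouched. The
    attention matrix shrinks from (n, n) to (k, k), which is where the
    computational saving comes from."""
    n, d = tokens.shape
    k = max(1, int(n * keep_ratio))
    keep = np.sort(np.argsort(keep_scores)[-k:])   # indices of kept tokens
    sel = tokens[keep]
    attn = softmax(sel @ sel.T / np.sqrt(d))       # (k, k) attention
    out = tokens.copy()
    out[keep] = attn @ sel
    return out
```

With quadratic attention cost, halving the token count cuts the attention computation roughly fourfold, which is what makes scaling to high-resolution features and many views feasible.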
Hybrid-CSR: Coupling Explicit and Implicit Shape Representation for Cortical Surface Reconstruction
We present Hybrid-CSR, a geometric deep-learning model that combines explicit
and implicit shape representations for cortical surface reconstruction.
Specifically, Hybrid-CSR begins with explicit deformations of template meshes
to obtain coarsely reconstructed cortical surfaces, based on which the oriented
point clouds are estimated for the subsequent differentiable Poisson surface
reconstruction. By doing so, our method unifies explicit (oriented point
clouds) and implicit (indicator function) cortical surface reconstruction.
Compared to explicit representation-based methods, our hybrid approach is
better suited to capturing detailed structures; compared with implicit
representation-based methods, our method can be topology-aware thanks to
end-to-end training with a mesh-based deformation module. In order to address
topology defects, we propose a new topology correction pipeline that relies on
optimization-based diffeomorphic surface registration. Experimental results on
three brain datasets show that our approach surpasses existing implicit and
explicit cortical surface reconstruction methods in numeric metrics in terms of
accuracy, regularity, and consistency.
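One concrete link in the pipeline — turning a coarsely deformed mesh into the oriented point cloud consumed by differentiable Poisson reconstruction — amounts to estimating per-vertex normals. The sketch below shows a standard area-weighted normal estimate; it is a generic illustration of that step only, and the Poisson solve itself is omitted.

```python
import numpy as np

def vertex_normals(verts, faces):
    """Area-weighted per-vertex normals: each face normal (a cross product
    whose length is proportional to face area) is accumulated onto the
    face's three vertices, then normalized. The vertices plus these
    normals form an oriented point cloud suitable for Poisson-style
    surface reconstruction."""
    fn = np.cross(verts[faces[:, 1]] - verts[faces[:, 0]],
                  verts[faces[:, 2]] - verts[faces[:, 0]])
    vn = np.zeros_like(verts)
    # ufunc.at handles repeated vertex indices correctly (unbuffered add).
    np.add.at(vn, faces.reshape(-1), np.repeat(fn, 3, axis=0))
    return vn / np.clip(np.linalg.norm(vn, axis=1, keepdims=True), 1e-12, None)
```

Because every operation here is differentiable almost everywhere, gradients from an implicit-reconstruction loss can flow back through the normals to the explicit mesh deformation, which is what lets the two representations be trained end to end.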
The abundance of homoeologue transcripts is disrupted by hybridization and is partially restored by genome doubling in synthetic hexaploid wheat
Dataset S7. List of nonadditively expressed genes in F1 hybrids derived from AS2255 × AS60. (XLSX 72 kb)
Deep Learning Models On Hand Pose Estimation and Mesh Reconstruction From RGB Images
Estimating and reconstructing human hand pose is a crucial task involved in many real-world AI applications, such as human-computer interaction, augmented reality, and virtual reality. However, hand pose estimation is challenging because the hand is highly articulated and dexterous, and estimation suffers severely from self-occlusion. To address the challenges of hand pose estimation from RGB images, several algorithms are proposed in this thesis. In the first part, the task of 2D hand pose estimation from RGB images is investigated. We introduce new techniques that combine traditional probabilistic graphical models with deep convolutional neural networks, and use these techniques to incorporate structural constraints of the hand to improve pose estimation. In addition, a novel graph neural network, the spatial-information-aware GCN, is proposed, which can efficiently extract spatial information from heatmaps of hand keypoints and propagate it through graph convolution. In the second part, the more challenging problem of 3D hand mesh reconstruction is tackled. We introduce an identity-aware hand mesh estimation network and a novel method to perform hand model calibration from RGB images. Extensive experiments have been conducted on multiple large-scale public datasets, demonstrating state-of-the-art performance.
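The graph-convolution step mentioned above — propagating per-keypoint features along the hand skeleton — can be sketched with a standard GCN layer. This is a generic layer in the style of Kipf and Welling's formulation, not the thesis's spatial-information-aware variant; the feature sizes and adjacency are illustrative.

```python
import numpy as np

def gcn_layer(X, A, W):
    """One graph-convolution layer over hand-keypoint features.

    X: (K, F)  node features, e.g. pooled from per-keypoint heatmaps
    A: (K, K)  adjacency of the hand skeleton (symmetric, 0/1)
    W: (F, F') learned weights
    Uses symmetric normalization D^{-1/2} (A + I) D^{-1/2} followed by ReLU,
    so each keypoint mixes its own features with its skeletal neighbors'.
    """
    A_hat = A + np.eye(A.shape[0])                  # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ X @ W, 0.0)
```

Stacking a few such layers lets information from confident keypoints (e.g. the wrist) reach occluded fingertips through the skeleton's edges, which is the structural prior the thesis exploits.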