DT-NeRF: Decomposed Triplane-Hash Neural Radiance Fields for High-Fidelity Talking Portrait Synthesis

Authors: Su Yaoyu, Wang Haoqian, Wang Shaohui
Publication date: 14/09/2023

In this paper, we present the decomposed triplane-hash neural radiance fields (DT-NeRF), a framework that significantly improves the photorealistic rendering of talking faces and achieves state-of-the-art results on key evaluation datasets. Our architecture decomposes the facial region into two specialized triplanes: one specialized for representing the mouth, and the other for the broader facial features. We introduce audio features as residual terms and integrate them as query vectors into our model through an audio-mouth-face transformer. Additionally, our method leverages the capabilities of Neural Radiance Fields (NeRF) to enrich the volumetric representation of the entire face through additive volumetric rendering techniques. Comprehensive experimental evaluations corroborate the effectiveness and superiority of our proposed approach.

Comment: 5 pages, 5 figures. Submitted to ICASSP 202

arXiv.org e-Print Archive

Deep Planar Parallax for Monocular Depth Estimation

Authors: Li Zhichao, Liang Haoqian, Wang Naiyan, Yang Ya
Publication date: 28/11/2023

Recent research has highlighted the utility of Planar Parallax Geometry in monocular depth estimation. However, its potential has yet to be fully realized because networks rely heavily on appearance for depth prediction. Our in-depth analysis reveals that utilizing flow-pretrain can optimize the network's usage of consecutive frame modeling, leading to substantial performance enhancement. Additionally, we propose Planar Position Embedding (PPE) to handle dynamic objects that defy static scene assumptions and to tackle slope variations that are challenging to differentiate. Comprehensive experiments on autonomous driving datasets, namely KITTI and the Waymo Open Dataset (WOD), prove that our Planar Parallax Network (PPNet) significantly surpasses existing learning-based methods in performance.

arXiv.org e-Print Archive

In-Motion Initial Alignment Method Based on Vector Observation and Truncated Vectorized K-Matrix for SINS

Authors: Huang Haoqian, Wang Bing, Wang Di, Wei Jiaying, Zhang Li
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 04/08/2022

Royal Holloway - Pure

A Coarse Alignment Method Based on Vector Observation and Truncated Vectorized κ-matrix for Underwater Vehicle

Authors: Huang Haoqian, Wang Bing, Wang Shengli, Wei Jiaying, Zhang Li
Publication date: 15/02/2023

Royal Holloway - Pure

Adaptive Sparse and Monotonic Attention for Transformer-based Automatic Speech Recognition

Authors: Qu Xiaoyang, Wang Haoqian, Wang Jianzong, Wei Wen qi, Xiao Jing, Zhao Chendong
Publication date: 29/09/2022

The Transformer architecture model, based on self-attention and multi-head attention, has achieved remarkable success in offline end-to-end Automatic Speech Recognition (ASR). However, self-attention and multi-head attention cannot be easily applied for streaming or online ASR. For self-attention in Transformer ASR, the softmax normalization function-based attention mechanism makes it impossible to highlight important speech information. For multi-head attention in Transformer ASR, it is not easy to model monotonic alignments in different heads. To overcome these two limits, we integrate sparse attention and monotonic attention into Transformer-based ASR. The sparse mechanism introduces a learned sparsity scheme to enable each self-attention structure to fit the corresponding head better. The monotonic attention deploys regularization to prune redundant heads for the multi-head attention structure. The experiments show that our method can effectively improve the attention mechanism on widely used benchmarks of speech recognition.

Comment: Accepted to DSAA 202

arXiv.org e-Print Archive
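
The abstract above contrasts softmax attention, which gives every position a strictly positive weight, with sparse attention. The sketch below illustrates that contrast only. It uses sparsemax, a standard fixed sparse normalizer, as a stand-in; the paper's learned sparsity scheme is not described in this record, and the function names and example scores are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Dense normalization: every output weight is strictly positive."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of the scores onto the probability simplex.

    Stand-in for a sparse attention normalizer (an assumption, not the
    paper's method): low scores receive exactly zero weight.
    """
    z = sorted(scores, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for j, zj in enumerate(z, start=1):
        cumsum += zj
        # Position j stays in the support while this inequality holds.
        if 1 + j * zj > cumsum:
            tau = (cumsum - 1) / j
    return [max(s - tau, 0.0) for s in scores]


# Hypothetical attention scores for four frames.
scores = [1.0, 0.8, 0.1, -1.0]
dense = softmax(scores)
sparse = sparsemax(scores)
print("softmax  :", [round(p, 3) for p in dense])
print("sparsemax:", [round(p, 3) for p in sparse])
```

With these scores, softmax spreads some weight over all four positions, while sparsemax keeps only the two highest-scoring positions and sets the rest to zero.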

One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer

Authors: Li Yu, Lin Jing, Wang Haoqian, Zeng Ailing, Zhang Lei
Publication date: 28/03/2023

Whole-body mesh recovery aims to estimate the 3D human body, face, and hands parameters from a single image. It is challenging to perform this task with a single network due to resolution issues, i.e., the face and hands are usually located in extremely small regions. Existing works usually detect hands and faces, enlarge their resolution to feed in a specific network to predict the parameter, and finally fuse the results. While this copy-paste pipeline can capture the fine-grained details of the face and hands, the connections between different parts cannot be easily recovered in late fusion, leading to implausible 3D rotation and unnatural pose. In this work, we propose a one-stage pipeline for expressive whole-body mesh recovery, named OSX, without separate networks for each part. Specifically, we design a Component Aware Transformer (CAT) composed of a global body encoder and a local face/hand decoder. The encoder predicts the body parameters and provides a high-quality feature map for the decoder, which performs a feature-level upsample-crop scheme to extract high-resolution part-specific features and adopts keypoint-guided deformable attention to estimate hand and face precisely. The whole pipeline is simple yet effective without any manual post-processing and naturally avoids implausible prediction. Comprehensive experiments demonstrate the effectiveness of OSX. Lastly, we build a large-scale Upper-Body dataset (UBody) with high-quality 2D and 3D whole-body annotations. It contains persons with partially visible bodies in diverse real-life scenarios to bridge the gap between the basic task and downstream applications.

Comment: Accepted to CVPR2023; Top-1 on AGORA SMPLX benchmark; Project Page: https://osx-ubody.github.io

arXiv.org e-Print Archive