DVGaze: Dual-View Gaze Estimation
Gaze estimation methods typically estimate gaze from facial appearance captured with a single camera. However, due to the limited view of a single camera, the captured facial appearance cannot provide complete facial information, which complicates the gaze estimation problem. Camera hardware has advanced rapidly in recent years: dual cameras are affordable for users and have been integrated into many devices. This development suggests that we can further improve gaze estimation performance with dual-view gaze estimation. In this paper, we propose a
dual-view gaze estimation network (DV-Gaze). DV-Gaze estimates dual-view gaze
directions from a pair of images. We first propose a dual-view interactive
convolution (DIC) block in DV-Gaze. DIC blocks exchange dual-view information during convolution at multiple feature scales: each block fuses dual-view features along epipolar lines and augments the original features with the fused ones. We further propose a dual-view transformer to estimate gaze from
dual-view features, in which camera poses are encoded to provide positional information. We also consider the geometric relation between
dual-view gaze directions and propose a dual-view gaze consistency loss for
DV-Gaze. DV-Gaze achieves state-of-the-art performance on the ETH-XGaze and EVE datasets, and our experiments further demonstrate the potential of dual-view gaze estimation. Code is available at https://github.com/yihuacheng/DVGaze.
Comment: ICCV 2023
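The abstract does not spell out the consistency loss; as a rough illustration of the geometric relation it refers to, the following sketch (hypothetical names, NumPy) penalizes the angle between the two views' gaze predictions after mapping one into the other camera's coordinate frame, assuming the relative camera rotation is known:

```python
import numpy as np

def dual_view_consistency_loss(g1, g2, R_1to2):
    """Angular disagreement between gaze directions predicted from two views
    (illustrative sketch, not the paper's exact formulation).

    g1, g2 : (3,) unit gaze vectors, each in its own camera's coordinate frame.
    R_1to2 : (3, 3) rotation taking camera-1 coordinates to camera-2 coordinates.
    Returns the angle in radians between the view-1 gaze expressed in the
    view-2 frame and the view-2 prediction; zero means perfect consistency.
    """
    g1_in_2 = R_1to2 @ g1                            # express view-1 gaze in view-2 frame
    cos = np.clip(np.dot(g1_in_2, g2), -1.0, 1.0)    # clamp against float error
    return np.arccos(cos)
```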
High-Fidelity Eye Animatable Neural Radiance Fields for Human Face
Face rendering using neural radiance fields (NeRF) is a rapidly developing
research area in computer vision. While recent methods primarily focus on
controlling facial attributes such as identity and expression, they often overlook the crucial aspect of modeling eyeball rotation, which is important for various downstream tasks. In this paper, we aim to learn a face
NeRF model that is sensitive to eye movements from multi-view images. We
address two key challenges in eye-aware face NeRF learning: how to effectively
capture eyeball rotation for training and how to construct a manifold for
representing eyeball rotation. To accomplish this, we first fit FLAME, a
well-established parametric face model, to the multi-view images considering
multi-view consistency. Subsequently, we introduce a new Dynamic Eye-aware NeRF
(DeNeRF). DeNeRF transforms 3D points from different views into a canonical
space to learn a unified face NeRF model. We design an eye deformation field
for this transformation that comprises both a rigid component, e.g., eyeball rotation, and a non-rigid component. Through experiments conducted on the ETH-XGaze
dataset, we demonstrate that our model is capable of generating high-fidelity
images with accurate eyeball rotation and non-rigid periocular deformation,
even under novel viewing angles. Furthermore, we show that utilizing the
rendered images can effectively enhance gaze estimation performance.
Comment: Under review
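As a loose illustration of the rigid part of such an eye deformation field, the sketch below (all names hypothetical; DeNeRF additionally includes a learned non-rigid component) undoes an eyeball rotation about its center to map observed points back to a canonical, gaze-neutral pose:

```python
import numpy as np

def rigid_eye_deform(points, eye_center, R_gaze):
    """Map 3D sample points near the eyeball back to a canonical pose by
    undoing a rigid eyeball rotation about its center (sketch only).

    points     : (N, 3) points in the observed (deformed) space.
    eye_center : (3,) eyeball rotation center.
    R_gaze     : (3, 3) rotation giving the current eyeball orientation.
    """
    # Right-multiplying row vectors by R_gaze applies R_gaze^T = R_gaze^{-1},
    # i.e., the inverse rotation about the eyeball center.
    return (points - eye_center) @ R_gaze + eye_center
```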
A Coarse-to-Fine Adaptive Network for Appearance-Based Gaze Estimation
Human gaze is essential for a variety of applications. Aiming at more accurate gaze estimation, a series of recent works propose to utilize face and eye images simultaneously. Nevertheless, face and eye images serve only as independent or parallel feature sources in those works; the intrinsic correlation between their features is overlooked. In this paper we make the
following contributions: 1) We propose a coarse-to-fine strategy which
estimates a basic gaze direction from face image and refines it with
corresponding residual predicted from eye images. 2) Guided by the proposed
strategy, we design a framework which introduces a bi-gram model to bridge gaze
residual and basic gaze direction, and an attention component to adaptively
acquire suitable fine-grained features. 3) Integrating the above innovations, we construct a coarse-to-fine adaptive network named CA-Net and achieve state-of-the-art performance on MPIIGaze and EyeDiap.
Comment: 9 pages, 7 figures, AAAI 2020
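The coarse-to-fine composition itself is simple to state; a minimal sketch under assumed interfaces (base_net and residual_net stand in for the paper's learned sub-networks, including its bi-gram bridging and attention components) might look like:

```python
def coarse_to_fine_gaze(face_image, eye_images, base_net, residual_net):
    """Sketch of the coarse-to-fine strategy: a basic gaze direction is
    estimated from the face image, then refined by a residual predicted
    from the eye images conditioned on that coarse estimate.
    Gaze values are assumed to be NumPy arrays, e.g., (pitch, yaw)."""
    g_basic = base_net(face_image)                   # coarse gaze from the face
    g_residual = residual_net(eye_images, g_basic)   # fine-grained correction
    return g_basic + g_residual                      # refined gaze direction
```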
Eloquent: A More Robust Transmission Scheme for LLM Token Streaming
To render each generated token for users in real time, a Large Language Model (LLM) server generates tokens one by one and streams each token (or group of a few tokens) through the network to the user right after generation, which we refer to as LLM token streaming. However, under unstable network conditions, the LLM token streaming experience can suffer greatly from stalls, since a single packet loss can block the rendering of later tokens even if the packets containing them arrive on time. With a measurement study, we show that current applications suffer from increased stalls under unstable networks. For this emerging token streaming problem in LLM chatbots, which differs from previous multimedia and text applications, we propose a novel transmission scheme, called Eloquent, which puts newly generated tokens as well as currently unacknowledged tokens in the next outgoing packet. This ensures that each packet contains some new tokens and, at the same time, can be rendered independently when received, avoiding the aforementioned stalls caused by missing packets. Through simulation under various networks, we show that Eloquent reduces the stall ratio (the proportion of time spent waiting for token rendering) by 71.0% compared with the retransmission method commonly used by real chatbot applications and by 31.6% compared with the baseline packet-duplication scheme. By tailoring Eloquent to the token-by-token generation of LLMs, we enable chatbots to respond like an eloquent speaker, letting users better enjoy pervasive AI.
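The core packetization idea is easy to sketch. The toy sender below (hypothetical interface; the actual scheme must also bound payload growth and handle token grouping) resends all unacknowledged tokens with every new token, so that any single received packet is independently renderable:

```python
class EloquentSender:
    """Minimal sketch of the Eloquent idea: every outgoing packet carries all
    tokens the receiver has not yet acknowledged, so each packet can be
    rendered on its own and one packet loss cannot stall later tokens.
    The transport (send, ACK delivery) is stubbed out and assumed."""

    def __init__(self, transport):
        self.transport = transport
        self.unacked = []          # (index, token) pairs awaiting acknowledgment

    def on_new_token(self, index, token):
        # Bundle the new token with everything still unacknowledged.
        self.unacked.append((index, token))
        self.transport.send(list(self.unacked))

    def on_ack(self, acked_index):
        # Receiver has rendered everything up to acked_index; drop those tokens.
        self.unacked = [(i, t) for (i, t) in self.unacked if i > acked_index]
```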
NL2Contact: Natural Language Guided 3D Hand-Object Contact Modeling with Diffusion Model
Modeling the physical contacts between the hand and object is standard for refining inaccurate hand poses and generating novel human grasps in 3D hand-object reconstruction. However, existing methods rely on geometric constraints that cannot be specified or controlled. This paper introduces a novel task of controllable 3D hand-object contact modeling with natural language descriptions. Challenges include i) the complexity of cross-modal modeling from language to contact, and ii) a lack of descriptive text for contact patterns. To address these issues, we propose NL2Contact, a model that generates controllable contacts by leveraging staged diffusion models. Provided with a language description of the hand and contact, NL2Contact generates realistic and faithful 3D hand-object contacts. To train the model, we build ContactDescribe, the first dataset with hand-centered contact descriptions. It contains multi-level and diverse descriptions generated by large language models, based on carefully designed prompts (e.g., grasp action, grasp type, contact location, free-finger status). We show applications of our model to grasp pose optimization and novel human grasp generation, both based on a textual contact description.
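The staged diffusion pipeline itself is not detailed in this abstract; as a generic illustration of text-conditioned diffusion sampling of a contact representation, a toy DDPM-style sampler (hypothetical denoiser interface and shapes, not the paper's implementation) could look like:

```python
import numpy as np

def sample_contact(denoiser, text_emb, shape=(1024,), T=100, seed=0):
    """Toy DDPM ancestral sampler conditioned on a language embedding,
    sketching how a text-guided diffusion model could produce a contact map
    (e.g., per-point contact values). `denoiser(x, t, text_emb)` is assumed
    to predict the noise added at step t; all names here are hypothetical."""
    rng = np.random.default_rng(seed)
    betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    x = rng.standard_normal(shape)           # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = denoiser(x, t, text_emb)       # language-conditioned noise estimate
        # Standard DDPM posterior mean for x_{t-1}.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * rng.standard_normal(shape)  # stochastic term
    return x
```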