Human 3D Avatar Modeling with Implicit Neural Representation: A Brief Survey
A human 3D avatar is one of the key elements of the metaverse, and the
quality of its modeling directly affects people's visual experience. However,
the human body has a complex topology and diverse details, so building a
satisfactory model is often expensive, time-consuming, and laborious. Recent
studies have proposed a novel approach, implicit neural representation: a
continuous representation that can describe objects of arbitrary topology at
arbitrary resolution. Researchers have applied implicit neural representation
to human 3D avatar modeling and achieved better results than traditional
methods. This paper comprehensively reviews the application of
implicit neural representation in human body modeling. First, we introduce
three implicit representations, namely the occupancy field, the signed
distance function (SDF), and the neural radiance field (NeRF), and classify
the literature investigated in this paper. Then the applications of implicit
modeling methods to the body, hands, and head are compared and analyzed
respectively. Finally, we point out the shortcomings of current work and
provide suggestions for researchers.
Comment: A Brief Survey
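The defining property of an implicit representation, a continuous function that can be queried at any point and hence at any resolution, can be sketched in a few lines. Below, an analytic sphere SDF stands in for the trained MLP the surveyed papers use; everything here (shape, grid ranges, resolutions) is an illustrative assumption, not taken from any specific paper:

```python
import numpy as np

# A shape is represented as a continuous function f(p) -> value rather than
# a mesh or voxel grid, so it can be sampled at arbitrary resolution.
# Here the "network" is an analytic signed distance function of a unit
# sphere; in the surveyed works this function is a trained MLP.

def sphere_sdf(points, radius=1.0):
    """Signed distance to a sphere: negative inside, positive outside."""
    return np.linalg.norm(points, axis=-1) - radius

def occupancy(points, radius=1.0):
    """Occupancy field derived from the SDF: 1 inside the surface, 0 outside."""
    return (sphere_sdf(points, radius) < 0).astype(np.float32)

def sample_grid(resolution):
    """Regular 3D query grid over the box [-1.5, 1.5]^3."""
    axis = np.linspace(-1.5, 1.5, resolution)
    grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1)
    return grid.reshape(-1, 3)

# The same shape queried on coarse and fine grids, with no remeshing needed.
coarse = occupancy(sample_grid(8))
fine = occupancy(sample_grid(64))
```

The key point the sketch shows: changing the output resolution is just changing the set of query points, which is what makes these representations attractive for detailed avatar surfaces.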
Text-oriented Modality Reinforcement Network for Multimodal Sentiment Analysis from Unaligned Multimodal Sequences
Multimodal Sentiment Analysis (MSA) aims to mine sentiment information from
text, visual, and acoustic modalities. Previous works have focused on
representation learning and feature fusion strategies. However, most of these
efforts have ignored the disparity in semantic richness across modalities and
treated each modality in the same manner, which may lead to strong modalities
being neglected and weak modalities being overvalued. Motivated by
these observations, we propose a Text-oriented Modality Reinforcement Network
(TMRN), which focuses on the dominance of the text modality in MSA. More
specifically, we design a Text-Centered Cross-modal Attention (TCCA) module
to enable full interaction between the text/acoustic and text/visual pairs,
and a Text-Gated Self-Attention (TGSA) module to guide the self-reinforcement
of the other two modalities. Furthermore, we present an adaptive fusion
mechanism to decide the
proportion of different modalities involved in the fusion process. Finally, we
combine the feature matrices into vectors to get the final representation for
the downstream tasks. Experimental results show that our TMRN outperforms the
state-of-the-art methods on two MSA benchmarks.
Comment: Accepted by CICAI 2023 (Finalist of Best Student Paper Award).
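The core of a text-centered cross-modal attention can be sketched generically: the text sequence supplies the queries, while another modality supplies keys and values, so text pulls the information it needs from the weaker modality. This is an illustration of the general mechanism, not the authors' TCCA implementation; all dimensions, weights, and the function name are toy assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def text_centered_attention(text, other, d_k=16, rng=np.random.default_rng(0)):
    """Cross-modal attention with text as the query side.
    text: (T_t, d), other: (T_o, d). Returns (T_t, d): other-modality
    features re-expressed on the text time axis."""
    d = text.shape[-1]
    # Random projections stand in for learned weight matrices.
    W_q = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_k = rng.standard_normal((d, d_k)) / np.sqrt(d)
    W_v = rng.standard_normal((d, d)) / np.sqrt(d)
    q, k, v = text @ W_q, other @ W_k, other @ W_v
    attn = softmax(q @ k.T / np.sqrt(d_k))   # (T_t, T_o) alignment weights
    return attn @ v

rng = np.random.default_rng(1)
text_feats = rng.standard_normal((6, 32))    # 6 text tokens
audio_feats = rng.standard_normal((20, 32))  # 20 acoustic frames
fused = text_centered_attention(text_feats, audio_feats)
```

Note that the output length matches the text sequence, which is what makes the text modality the anchor for the later fusion step.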
SAM3D: Zero-Shot 3D Object Detection via Segment Anything Model
With the development of large language models, many remarkable linguistic
systems like ChatGPT have thrived and achieved astonishing success on many
tasks, showing the incredible power of foundation models. In the spirit of
unleashing the capability of foundation models on vision tasks, the Segment
Anything Model (SAM), a vision foundation model for image segmentation, has
been proposed recently and presents strong zero-shot ability on many downstream
2D tasks. However, whether SAM can be adapted to 3D vision tasks remains
unexplored, especially for 3D object detection. Inspired by this, we explore
adapting the zero-shot ability of SAM to 3D object detection in this paper. We
propose a SAM-powered BEV processing pipeline to detect objects and get
promising results on the large-scale Waymo open dataset. As an early attempt,
our method takes a step toward 3D object detection with vision foundation
models and presents the opportunity to unleash their power on 3D vision tasks.
The code is released at https://github.com/DYZhang09/SAM3D.
Comment: Technical Report. The code is released at
https://github.com/DYZhang09/SAM3D.
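The abstract does not detail the BEV processing, so the sketch below only illustrates the generic pre/post-processing such a pipeline needs: LiDAR points are rasterized into a top-down image that a 2D segmenter like SAM can consume, and a 2D mask is mapped back to a ground-plane box. Grid ranges and resolution are invented values, not the paper's:

```python
import numpy as np

def points_to_bev(points, x_range=(-40, 40), y_range=(-40, 40), res=0.5):
    """Rasterize a LiDAR point cloud (N, 3) into a 2D BEV occupancy image."""
    w = int((x_range[1] - x_range[0]) / res)
    h = int((y_range[1] - y_range[0]) / res)
    ix = ((points[:, 0] - x_range[0]) / res).astype(int)
    iy = ((points[:, 1] - y_range[0]) / res).astype(int)
    keep = (ix >= 0) & (ix < w) & (iy >= 0) & (iy < h)
    bev = np.zeros((h, w), dtype=np.float32)
    bev[iy[keep], ix[keep]] = 1.0
    return bev

def mask_to_bev_box(mask, x_range=(-40, 40), y_range=(-40, 40), res=0.5):
    """Map a 2D boolean segmentation mask back to metric BEV coordinates as
    an axis-aligned ground-plane box (x_min, y_min, x_max, y_max)."""
    ys, xs = np.nonzero(mask)
    return (x_range[0] + xs.min() * res, y_range[0] + ys.min() * res,
            x_range[0] + (xs.max() + 1) * res, y_range[0] + (ys.max() + 1) * res)
```

The appeal of this route is that once the scene is a 2D image, any off-the-shelf 2D foundation model can be applied zero-shot; the 3D-specific work reduces to the rasterization and back-projection above.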
Boosting the Transferability of Adversarial Attacks with Global Momentum Initialization
Deep neural networks are vulnerable to adversarial examples, which attach
human-imperceptible perturbations to benign inputs. Moreover, adversarial
examples exhibit transferability across different models, which makes
practical black-box attacks feasible. However, existing methods are still
incapable of achieving the desired transfer attack performance. In this work,
from the
perspective of gradient optimization and consistency, we analyze and discover
the gradient elimination phenomenon as well as the local momentum optimum
dilemma. To tackle these issues, we propose Global Momentum Initialization (GI)
to suppress gradient elimination and help search for the global optimum.
Specifically, we perform gradient pre-convergence before the attack and carry
out a global search during the pre-convergence stage. Our method can be easily
combined with almost all existing transfer methods, and we improve the success
rate of transfer attacks significantly by an average of 6.4% under various
advanced defense mechanisms compared to state-of-the-art methods. Eventually,
we achieve an attack success rate of 95.4%, fully illustrating the insecurity
of existing defense mechanisms.
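The two-stage structure described above, a pre-convergence phase that warms up the momentum buffer with a coarse global search, followed by the usual momentum-based iterations, can be sketched on a toy loss. This illustrates the idea only, not the paper's algorithm: `loss_grad`, the step sizes, and the iteration counts are all made-up values.

```python
import numpy as np

def loss_grad(x):
    # Gradient of a toy loss L(x) = 0.5 * ||x - target||^2 (stand-in for a
    # real model's loss gradient w.r.t. the input).
    target = np.array([3.0, -2.0])
    return x - target

def momentum_attack(x0, eps=1.0, steps=10, mu=1.0, pre_steps=5):
    alpha = eps / steps
    g = np.zeros_like(x0)
    # Pre-convergence stage: accumulate a global momentum direction with a
    # larger search step, then discard the iterate and keep only the momentum.
    x = x0.copy()
    for _ in range(pre_steps):
        grad = loss_grad(x)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x = x + eps * np.sign(g)           # coarse global search step
    # Attack stage: restart from the clean input with warmed-up momentum,
    # instead of the usual zero initialization.
    x = x0.copy()
    for _ in range(steps):
        grad = loss_grad(x)
        g = mu * g + grad / (np.abs(grad).sum() + 1e-12)
        x = np.clip(x + alpha * np.sign(g), x0 - eps, x0 + eps)
    return x

adv = momentum_attack(np.array([0.0, 0.0]))
```

The only change relative to a standard momentum iterative attack is that `g` is non-zero when the attack stage begins, which is the "global momentum initialization" the title refers to.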
Improving Generalization in Visual Reinforcement Learning via Conflict-aware Gradient Agreement Augmentation
Learning a policy with great generalization to unseen environments remains
challenging but critical in visual reinforcement learning. Despite the
success of augmentation combination for generalization in supervised
learning, naively applying it to visual RL algorithms can damage training
efficiency and cause severe performance degradation. In this paper, we first
conduct a qualitative analysis and identify the main causes: (i)
high-variance gradient magnitudes and (ii) gradient conflicts among various
augmentation methods.
To alleviate these issues, we propose a general policy gradient optimization
framework, named Conflict-aware Gradient Agreement Augmentation (CG2A), which
better integrates augmentation combination into visual RL algorithms to
address the generalization bias. In particular, CG2A develops a Gradient
Agreement
Solver to adaptively balance the varying gradient magnitudes, and introduces a
Soft Gradient Surgery strategy to alleviate the gradient conflicts. Extensive
experiments demonstrate that CG2A significantly improves the generalization
performance and sample efficiency of visual RL algorithms.
Comment: Accepted by ICCV 2023.
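The two mechanisms named in the abstract, balancing gradient magnitudes and softly resolving gradient conflicts, can be sketched as follows. The function `harmonize` and the coefficient `soft` are invented for illustration; the actual Gradient Agreement Solver and Soft Gradient Surgery are more involved than this:

```python
import numpy as np

def harmonize(gradients, soft=0.5):
    """gradients: list of 1D arrays, one per augmentation.
    Returns a single combined update direction."""
    # (i) Magnitude agreement: normalize each gradient to unit norm so no
    # single augmentation's high-variance gradient dominates the update.
    gs = [g / (np.linalg.norm(g) + 1e-12) for g in gradients]
    # (ii) Soft surgery: when two gradients conflict (negative dot product),
    # shrink, rather than fully remove, the conflicting projection.
    out = []
    for i, gi in enumerate(gs):
        gi = gi.copy()
        for j, gj in enumerate(gs):
            if i == j:
                continue
            dot = gi @ gj
            if dot < 0:  # conflicting directions
                gi = gi - soft * dot * gj / (gj @ gj + 1e-12)
        out.append(gi)
    return np.mean(out, axis=0)

g_weak = np.array([1.0, 0.0])
g_conflict = np.array([-10.0, 1.0])   # large, partly opposing gradient
update = harmonize([g_weak, g_conflict])
```

With `soft=1.0` this degenerates to a full PCGrad-style projection; the soft coefficient keeps some of the conflicting signal, which is the distinction the abstract draws.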
Context De-confounded Emotion Recognition
Context-Aware Emotion Recognition (CAER) is a crucial and challenging task
that aims to perceive the emotional states of the target person with contextual
information. Recent approaches invariably focus on designing sophisticated
architectures or mechanisms to extract seemingly meaningful representations
from subjects and contexts. However, a long-overlooked issue is that a context
bias in existing datasets leads to a significantly unbalanced distribution of
emotional states among different context scenarios. Concretely, the harmful
bias is a confounder that misleads existing models to learn spurious
correlations based on conventional likelihood estimation, significantly
limiting the models' performance. To tackle this issue, this paper provides
a causality-based perspective to disentangle the models from the impact of
such bias and formulates the causalities among variables in the CAER task via
a tailored causal graph. Then, we propose a Contextual Causal Intervention
Module
(CCIM) based on the backdoor adjustment to de-confound the confounder and
exploit the true causal effect for model training. CCIM is plug-and-play and
model-agnostic, improving diverse state-of-the-art approaches by
considerable margins. Extensive experiments on three benchmark datasets
demonstrate the effectiveness of our CCIM and the significance of causal
insight.
Comment: Accepted by CVPR 2023. CCIM is available at
https://github.com/ydk122024/CCIM.
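The causal tool the module is built on, backdoor adjustment, has a short numeric form: instead of conditioning on the input X (which lets the confounder Z, here the context prior, leak in), the context-specific predictions P(Y | X, z) are averaged over the prior P(z). The toy probability tables below are made up purely for illustration:

```python
import numpy as np

contexts = np.array([0.7, 0.3])       # P(z): imbalanced context prior
# P(y = positive | x, z) for one fixed subject x under each context z:
p_y_given_xz = np.array([0.9, 0.2])
# P(z | x): the context actually co-occurring with this subject in the data,
# i.e. the spurious link the bias exploits:
p_z_given_x = np.array([0.1, 0.9])

# Likelihood-based prediction P(y | x): confounded by the observed context.
p_conditional = (p_y_given_xz * p_z_given_x).sum()
# Backdoor-adjusted prediction P(y | do(x)): averages over the context prior.
p_backdoor = (p_y_given_xz * contexts).sum()
```

In this toy setup the confounded estimate (0.27) and the adjusted one (0.69) diverge sharply, which is exactly the spurious-correlation effect the paper argues existing CAER models suffer from.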
Direct field-to-pattern monolithic design of holographic metasurface via residual encoder-decoder convolutional neural network
Complex-amplitude holographic metasurfaces (CAHMs), with their flexibility in modulating phase and amplitude profiles, have been used to manipulate the propagation of wavefronts to an unprecedented degree, leading to higher image-reconstruction quality than their natural counterparts. However, prevailing design methods for CAHMs rely on Huygens-Fresnel theory, meta-atom optimization, numerical simulation, and experimental verification, which consumes substantial computing resources. Here, we apply a residual encoder-decoder convolutional neural network to directly map electric field distributions to input images for monolithic metasurface design. A network is first pretrained on electric field distributions calculated by diffraction theory and is subsequently migrated, in a transfer learning framework, to map simulated electric field distributions to input images. The training results show a normalized mean pixel error of about 3% on the dataset. For verification, metasurface prototypes were fabricated, simulated, and measured. The reconstructed electric field of the reverse-engineered metasurface exhibits high similarity to the target electric field, demonstrating the effectiveness of our design. Encouragingly, this work provides a monolithic field-to-pattern design method for CAHMs, paving a new route for the direct reconstruction of metasurfaces.
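The ~3% figure quoted above is a normalized mean pixel error between predicted and target field images. The abstract does not give the exact normalization, so the definition below, mean absolute pixel difference divided by the target's dynamic range, is one plausible assumption rather than the paper's formula:

```python
import numpy as np

def normalized_mean_pixel_error(pred, target):
    """Mean absolute pixel difference, normalized by the target's dynamic
    range (an assumed definition; the paper may normalize differently)."""
    span = target.max() - target.min()
    return np.abs(pred - target).mean() / span

# Toy example: a uniform 3% offset over a unit dynamic range.
target = np.linspace(0.0, 1.0, 100).reshape(10, 10)
pred = target + 0.03
```

Under this definition the toy prediction above scores exactly 0.03, matching the scale of the error the abstract reports.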
Alighting Stop Determination of Unlinked Trips Based on a Two-Layer Stacking Framework
Smart card data from conventional bus passengers are important basic data for many studies, such as bus network optimization. Because only boarding information is recorded in most cities, alighting stops need to be identified. The classical trip chain method can only detect the destinations of passengers who have trip cycles; the remaining unlinked trips, whose destinations are unknown, are hard to analyze. To improve the accuracy of existing methods for determining the alighting stops of unlinked trips, a method based on a two-layer stacking framework is proposed in this work. In the first layer, five methods are used: the high-frequency stop method, the stop attraction method, the transfer convenience method, the land-use type attraction method, and an improved group historical set method (I-GHSM). The last of these is presented here to cluster records with similar behavior patterns into groups more accurately. In the second layer, a logistic regression model learns an appropriate weight for each first-layer method on different datasets, which provides generalization ability. Taking data from Xiamen BRT Line Kuai 1 as an example, the I-GHSM introduced in the first layer proves necessary and effective. Moreover, the two-layer stacking-based method detects all destinations of unlinked trips with an accuracy of 51.88%, higher than that of the comparison methods, i.e., two-step algorithms with KNN (k-nearest neighbor), Decision Tree, or Random Forest, and a step-by-step method. The results indicate that the proposed framework-based method identifies the alighting stops of unlinked trips with high accuracy.
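The two-layer stacking structure can be sketched generically: each first-layer method scores candidate alighting stops, and a second-layer logistic regression learns how much to trust each method. The five real methods are replaced here by random score columns, and the logistic regression is a minimal gradient-descent version; only the structure mirrors the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n_trips, n_methods = 200, 5
# Layer 1 stand-in: each column is one method's score for its predicted stop.
scores = rng.random((n_trips, n_methods))
# Synthetic ground truth: methods have unknown, unequal reliability.
true_w = np.array([2.0, 0.5, 1.5, 0.1, 0.8])
signal = scores @ true_w
labels = (signal + rng.normal(0, 0.3, n_trips) > signal.mean()).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Layer 2: learn per-method weights by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * (p - y).mean()
    return w, b

w, b = fit_logistic(scores, labels)
accuracy = ((sigmoid(scores @ w + b) > 0.5) == labels).mean()
```

The point of the second layer is visible in `w`: methods whose scores correlate with the correct stop receive larger weights, so the ensemble adapts to whichever first-layer methods work best on a given dataset.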