33 research outputs found

    Visual Recognition with Deep Nearest Centroids

    We devise deep nearest centroids (DNC), a conceptually elegant yet surprisingly effective network for large-scale visual recognition, by revisiting nearest centroids, one of the most classic and simple classifiers. Current deep models learn the classifier in a fully parametric manner, ignoring the latent data structure and lacking simplicity and explainability. DNC instead conducts nonparametric, case-based reasoning; it utilizes sub-centroids of training samples to describe class distributions and clearly explains the classification as the proximity of test data to the class sub-centroids in the feature space. Due to its distance-based nature, the network output dimensionality is flexible, and all the learnable parameters are only for data embedding. That means all the knowledge learnt for ImageNet classification can be completely transferred to pixel recognition learning under the "pre-training and fine-tuning" paradigm. Apart from its nested simplicity and intuitive decision-making mechanism, DNC can even possess ad-hoc explainability when the sub-centroids are selected as actual training images that humans can view and inspect. Compared with parametric counterparts, DNC performs better on image classification (CIFAR-10, ImageNet) and greatly boosts pixel recognition (ADE20K, Cityscapes), with improved transparency and fewer learnable parameters, using various network architectures (ResNet, Swin) and segmentation models (FCN, DeepLabV3, Swin). We feel this work brings fundamental insights into related fields. Comment: 23 pages, 8 figures
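
    A minimal sketch (not the authors' code) of the distance-based classification the abstract describes: class logits are negative distances between an embedded sample and per-class sub-centroids, so all learnable parameters live in the embedding network. The number of sub-centroids and the temperature below are illustrative assumptions.

    import torch

    def dnc_logits(features, sub_centroids, temperature=0.1):
        """features: (B, D) embeddings; sub_centroids: (C, K, D), K sub-centroids per class."""
        # Pairwise squared Euclidean distances between samples and all sub-centroids.
        diff = features[:, None, None, :] - sub_centroids[None, :, :, :]   # (B, C, K, D)
        dist = (diff ** 2).sum(-1)                                          # (B, C, K)
        # A sample scores a class by its nearest sub-centroid: negate the minimum distance.
        return -dist.min(dim=-1).values / temperature                      # (B, C)

    # Toy usage: 4 samples, 10 classes, 3 sub-centroids per class, 64-d embeddings.
    feats = torch.randn(4, 64)
    centroids = torch.randn(10, 3, 64)
    pred = dnc_logits(feats, centroids).argmax(dim=1)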

    Bird's-Eye-View Scene Graph for Vision-Language Navigation

    Vision-language navigation (VLN), which requires an agent to navigate 3D environments following human instructions, has shown great advances. However, current agents are built upon panoramic observations, which hinders their ability to perceive 3D scene geometry and easily leads to ambiguous selection of panoramic views. To address these limitations, we present a BEV Scene Graph (BSG), which leverages multi-step BEV representations to encode the scene layouts and geometric cues of indoor environments under the supervision of 3D detection. During navigation, BSG builds a local BEV representation at each step and maintains a BEV-based global scene map, which stores and organizes all the local BEV representations collected online according to their topological relations. Based on BSG, the agent predicts a local BEV grid-level decision score and a global graph-level decision score, combined with a sub-view selection score on panoramic views, for more accurate action prediction. Our approach significantly outperforms state-of-the-art methods on REVERIE, R2R, and R4R, showing the potential of BEV perception in VLN. Comment: Accepted at ICCV 2023; Project page: https://github.com/DefaultRui/BEV-Scene-Grap
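
    A minimal sketch, under stated assumptions, of fusing the three decision scores mentioned above (local BEV grid-level, global graph-level, and panoramic sub-view selection) into one action distribution. This is not the released BSG code; the learnable fusion weights are a hypothetical design choice.

    import torch
    import torch.nn as nn

    class ScoreFusion(nn.Module):
        def __init__(self):
            super().__init__()
            # One learnable weight per score source (illustrative choice).
            self.w = nn.Parameter(torch.ones(3))

        def forward(self, bev_grid_score, graph_score, subview_score):
            # Each input: (B, N) scores over the same N candidate actions.
            w = torch.softmax(self.w, dim=0)
            fused = w[0] * bev_grid_score + w[1] * graph_score + w[2] * subview_score
            return torch.log_softmax(fused, dim=-1)   # action log-probabilities

    fusion = ScoreFusion()
    logp = fusion(torch.randn(2, 8), torch.randn(2, 8), torch.randn(2, 8))
    action = logp.argmax(dim=-1)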

    Omnidirectional Information Gathering for Knowledge Transfer-based Audio-Visual Navigation

    Audio-visual navigation is an audio-targeted wayfinding task in which a robot agent is required to travel through a never-before-seen 3D environment towards a sounding source. In this article, we present ORAN, an omnidirectional audio-visual navigator based on cross-task navigation skill transfer. In particular, ORAN sharpens its two basic abilities for such a challenging task, namely wayfinding and audio-visual information gathering. First, ORAN is trained with a confidence-aware cross-task policy distillation (CCPD) strategy. CCPD transfers the fundamental, point-to-point wayfinding skill that is well trained on the large-scale PointGoal task to ORAN, so as to help ORAN better master audio-visual navigation with far fewer training samples. To improve the efficiency of knowledge transfer and address the domain gap, CCPD is made adaptive to the decision confidence of the teacher policy. Second, ORAN is equipped with an omnidirectional information gathering (OIG) mechanism, i.e., gleaning visual-acoustic observations from different directions before decision-making. As a result, ORAN yields more robust navigation behaviour. Taking CCPD and OIG together, ORAN significantly outperforms previous competitors. After model ensembling, we placed 1st in the SoundSpaces Challenge 2022, improving SPL and SR by 53% and 35% in relative terms. Comment: ICCV 2023
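
    A minimal sketch in the spirit of confidence-aware policy distillation: the student imitates the teacher's action distribution, with each sample's loss weighted by how confident the teacher is. The confidence measure (max teacher probability) and the threshold are illustrative assumptions, not the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def ccpd_loss(student_logits, teacher_logits, min_conf=0.5):
        teacher_probs = F.softmax(teacher_logits, dim=-1)            # (B, A)
        conf = teacher_probs.max(dim=-1).values                      # (B,)
        # KL(teacher || student), one value per sample.
        kl = F.kl_div(F.log_softmax(student_logits, dim=-1),
                      teacher_probs, reduction="none").sum(-1)       # (B,)
        # Weight each sample by teacher confidence; drop very uncertain ones.
        weight = conf * (conf > min_conf).float()
        return (weight * kl).sum() / weight.sum().clamp(min=1e-8)

    loss = ccpd_loss(torch.randn(16, 4), torch.randn(16, 4))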

    Differentiable Multi-Granularity Human Representation Learning for Instance-Aware Human Semantic Parsing

    To address the challenging task of instance-aware human part parsing, a new bottom-up regime is proposed to learn category-level human semantic segmentation as well as multi-person pose estimation in a joint and end-to-end manner. It is a compact, efficient and powerful framework that exploits structural information over different human granularities and eases the difficulty of person partitioning. Specifically, a dense-to-sparse projection field, which allows explicitly associating dense human semantics with sparse keypoints, is learnt and progressively improved over the network feature pyramid for robustness. Then, the difficult pixel grouping problem is cast as an easier, multi-person joint assembling task. By formulating joint association as maximum-weight bipartite matching, a differentiable solution is developed to exploit projected gradient descent and Dykstra's cyclic projection algorithm. This makes our method end-to-end trainable and allows back-propagating the grouping error to directly supervise multi-granularity human representation learning. This is distinguished from current bottom-up human parsers or pose estimators which require sophisticated post-processing or heuristic greedy algorithms. Experiments on three instance-aware human parsing datasets show that our model outperforms other bottom-up alternatives with much more efficient inference. Comment: CVPR 2021 (Oral). Code: https://github.com/tfzhou/MG-HumanParsin
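
    A minimal sketch of casting joint assembling as a differentiable soft assignment. The paper uses projected gradient descent with Dykstra's cyclic projections; as a simpler illustrative stand-in, this uses Sinkhorn-style alternating row/column normalization of an affinity matrix, which likewise yields a soft matching that gradients can flow through.

    import torch

    def soft_assignment(scores, n_iters=20, temperature=0.1):
        """scores: (N, M) affinities between N detected joints and M person candidates."""
        log_alpha = scores / temperature
        for _ in range(n_iters):
            log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=1, keepdim=True)  # rows sum to 1
            log_alpha = log_alpha - torch.logsumexp(log_alpha, dim=0, keepdim=True)  # cols sum to 1
            # Note: with unequal N and M this is only an approximate projection.
        return log_alpha.exp()

    P = soft_assignment(torch.randn(5, 3))   # soft joint-to-person assignment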

    ClusterFormer: Clustering As A Universal Visual Learner

    This paper presents CLUSTERFORMER, a universal vision model based on the CLUSTERing paradigm with TransFORMER. It comprises two novel designs: 1. recurrent cross-attention clustering, which reformulates the cross-attention mechanism in the Transformer and enables recursive updates of cluster centers to facilitate strong representation learning; and 2. feature dispatching, which uses the updated cluster centers to redistribute image features through similarity-based metrics, resulting in a transparent pipeline. This elegant design streamlines an explainable and transferable workflow, capable of tackling heterogeneous vision tasks (i.e., image classification, object detection, and image segmentation) with varying levels of clustering granularity (i.e., image-, box-, and pixel-level). Empirical results demonstrate that CLUSTERFORMER outperforms various well-known specialized architectures, achieving 83.41% top-1 accuracy on ImageNet-1K for image classification, 54.2% and 47.0% mAP on MS COCO for object detection and instance segmentation, 52.4% mIoU on ADE20K for semantic segmentation, and 55.8% PQ on COCO Panoptic for panoptic segmentation. In light of its efficacy, we hope this work can catalyze a paradigm shift towards universal models in computer vision.
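
    A minimal sketch of the two ideas named above, recurrent cross-attention clustering and similarity-based feature dispatching, not the CLUSTERFORMER implementation. The shapes, iteration count, and single-head attention are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def recurrent_clustering(features, centers, n_iters=3):
        """features: (B, N, D) tokens; centers: (B, K, D) cluster centers."""
        for _ in range(n_iters):
            # Cross-attention: centers attend to features and are recursively updated.
            attn = F.softmax(centers @ features.transpose(1, 2) / features.shape[-1] ** 0.5, dim=-1)  # (B, K, N)
            centers = attn @ features                                  # (B, K, D)
        return centers

    def dispatch(features, centers):
        # Feature dispatching: redistribute features by their similarity to cluster centers.
        sim = F.softmax(features @ centers.transpose(1, 2), dim=-1)    # (B, N, K)
        return sim @ centers                                           # (B, N, D)

    x = torch.randn(2, 196, 64)
    c = recurrent_clustering(x, torch.randn(2, 8, 64))
    x_new = dispatch(x, c)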

    Target-Driven Structured Transformer Planner for Vision-Language Navigation

    Vision-language navigation is the task of directing an embodied agent to navigate in 3D scenes with natural language instructions. For the agent, inferring the long-term navigation target from visual-linguistic clues is crucial for reliable path planning, which, however, has rarely been studied in the literature. In this article, we propose a Target-Driven Structured Transformer Planner (TD-STP) for long-horizon, goal-guided and room layout-aware navigation. Specifically, we devise an Imaginary Scene Tokenization mechanism for explicit estimation of the long-term target (even when it is located in unexplored environments). In addition, we design a Structured Transformer Planner which elegantly incorporates the explored room layout into a neural attention architecture for structured and global planning. Experimental results demonstrate that our TD-STP substantially improves the previous best methods' success rate by 2% and 5% on the test sets of the R2R and REVERIE benchmarks, respectively. Our code is available at https://github.com/YushengZhao/TD-STP
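
    A minimal, hypothetical sketch of planning over an explored-layout graph with an extra "imaginary" token that stands in for the unexplored long-term target, loosely following the description above (not the released TD-STP code). The token dimensions and the single transformer layer are illustrative assumptions.

    import torch
    import torch.nn as nn

    class StructuredPlanner(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.imaginary_token = nn.Parameter(torch.zeros(1, 1, dim))  # estimates the long-term target
            self.encoder = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
            self.score = nn.Linear(dim, 1)

        def forward(self, node_tokens):
            """node_tokens: (B, N, dim) embeddings of explored room-layout nodes."""
            B = node_tokens.shape[0]
            tokens = torch.cat([self.imaginary_token.expand(B, -1, -1), node_tokens], dim=1)
            tokens = self.encoder(tokens)                 # global, layout-aware attention
            return self.score(tokens[:, 1:]).squeeze(-1)  # one navigation score per explored node

    planner = StructuredPlanner()
    scores = planner(torch.randn(2, 6, 256))              # (2, 6) candidate scores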

    Real-time superpixel segmentation by DBSCAN clustering algorithm

    In this paper, we propose a real-time image superpixel segmentation method, running at 50 frames/s, based on the density-based spatial clustering of applications with noise (DBSCAN) algorithm. To decrease the computational costs of superpixel algorithms, we adopt a fast two-step framework. In the first, clustering stage, the DBSCAN algorithm with color-similarity and geometric restrictions is used to rapidly cluster the pixels; in the second, merging stage, small clusters are merged into superpixels via their neighborhoods using a distance measure defined on color and spatial features. A robust and simple distance function is defined for obtaining better superpixels in these two steps. The experimental results demonstrate that our real-time (50 frames/s) DBSCAN-based superpixel algorithm outperforms state-of-the-art superpixel segmentation methods in terms of both accuracy and efficiency.
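
    A minimal sketch of the kind of combined color + spatial distance used in both stages described above (growing DBSCAN clusters, then merging small clusters into superpixels). The exact formula and the weighting constants are illustrative assumptions, not the paper's definition.

    import numpy as np

    def cluster_distance(color_a, pos_a, color_b, pos_b, spacing=16.0, compactness=10.0):
        """Distance between two pixels/clusters given mean Lab colors (3,) and positions (2,).
        `spacing` is the expected superpixel size; `compactness` trades color vs. space."""
        d_color = np.linalg.norm(np.asarray(color_a, float) - np.asarray(color_b, float))
        d_space = np.linalg.norm(np.asarray(pos_a, float) - np.asarray(pos_b, float))
        return d_color + (compactness / spacing) * d_space

    # Usage: decide whether a neighbouring pixel can join a growing cluster.
    joins = cluster_distance([50, 10, 5], [12, 40], [52, 11, 6], [14, 42]) < 8.0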

    Organic-Inorganic Perovskite Light-Emitting Electrochemical Cells with a Large Capacitance

    Perovskite light-emitting diodes are typically made with a high work function anode and a low work function cathode and have recently gained intense interest. Here, perovskite light-emitting devices with two high work function electrodes are demonstrated and show several interesting features. Firstly, electroluminescence can easily be obtained under both forward and reverse bias. Secondly, impedance spectroscopy indicates that the ionic conductivity in the iodide perovskite (CH3NH3PbI3) is large, with a value of ≈10⁻⁸ S cm⁻¹. Thirdly, the shift of the emission spectrum in the mixed halide perovskite (CH3NH3PbI3-xBrx) light-emitting devices indicates that I⁻ ions are mobile in the perovskites. Fourthly, this work shows that the ions accumulated at the interfaces result in a large capacitance (≈100 μF cm⁻²). These results conclusively prove that organic-inorganic halide perovskites are solid electrolytes with mixed ionic and electronic conductivity and that the light-emitting device is a light-emitting electrochemical cell. The work also suggests that organic-inorganic halide perovskites are potential energy-storage materials, which may be applicable in solid-state supercapacitors and batteries. In summary, light-emitting electrochemical cells (LECs) of the organic-inorganic perovskite CH3NH3PbI3 with two high work function electrodes are demonstrated; CH3NH3PbI3 has an ionic conductivity of ≈10⁻⁸ S cm⁻¹, and the ions accumulated at the interfaces produce a large capacitance, suggesting potential applications in electrochemical energy-storage devices such as solid-state supercapacitors and batteries.