
    Object Detection in 20 Years: A Survey

    Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetic under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold-weapon era. This paper extensively reviews 400+ papers on object detection in light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of detection systems, speed-up techniques, and recent state-of-the-art detection methods. The paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years. Comment: This work has been submitted to the IEEE TPAMI for possible publication.

    Progressive Scale-aware Network for Remote Sensing Image Change Captioning

    Remote sensing (RS) images contain numerous objects of different scales, which poses significant challenges for the RS image change captioning (RSICC) task of identifying visual changes of interest in complex scenes and describing them via language. However, current methods still fall short in sufficiently extracting and utilizing multi-scale information. In this paper, we propose a progressive scale-aware network (PSNet) to address the problem. PSNet is a pure Transformer-based model. To sufficiently extract multi-scale visual features, multiple progressive difference perception (PDP) layers are stacked to progressively exploit the difference features of the bitemporal features. To sufficiently utilize the extracted multi-scale features for captioning, we propose a scale-aware reinforcement (SR) module and combine it with the Transformer decoding layer to progressively utilize the features from different PDP layers. Experiments show that the PDP layer and SR module are effective and that our PSNet outperforms previous methods. Our code is public at https://github.com/Chen-Yang-Liu/PSNet. Comment: IGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium.
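
    The abstract only outlines the PDP idea, so the following is a minimal, hypothetical PyTorch sketch of a progressive difference-perception step; the class and variable names (PDPLayer, feat_t1, feat_t2) and the dimensions are placeholders of mine, not the authors' implementation.

        # Hypothetical sketch: each layer refines bitemporal token features with
        # shared self-attention and exposes a projected difference feature that a
        # captioning decoder could consume; stacking layers yields multi-scale
        # difference features, one per layer.
        import torch
        import torch.nn as nn

        class PDPLayer(nn.Module):
            def __init__(self, dim: int, num_heads: int = 4):
                super().__init__()
                self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
                self.diff_proj = nn.Linear(dim, dim)

            def forward(self, feat_t1: torch.Tensor, feat_t2: torch.Tensor):
                # Refine each temporal feature map (tokens: B x N x C) with shared attention.
                feat_t1 = feat_t1 + self.self_attn(feat_t1, feat_t1, feat_t1)[0]
                feat_t2 = feat_t2 + self.self_attn(feat_t2, feat_t2, feat_t2)[0]
                # Perceive the change as a projected difference of the refined features.
                diff = self.diff_proj(feat_t2 - feat_t1)
                return feat_t1, feat_t2, diff

        layers = nn.ModuleList([PDPLayer(256) for _ in range(3)])
        f1, f2 = torch.randn(2, 49, 256), torch.randn(2, 49, 256)
        diffs = []
        for layer in layers:
            f1, f2, d = layer(f1, f2)
            diffs.append(d)  # multi-scale difference features for the decoder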

    Dense Pixel-to-Pixel Harmonization via Continuous Image Representation

    High-resolution (HR) image harmonization is of great significance in real-world applications such as image synthesis and image editing. However, due to high memory costs, existing dense pixel-to-pixel harmonization methods mainly focus on low-resolution (LR) images. Some recent works combine harmonization with color-to-color transformations, but they are either limited to certain resolutions or depend heavily on hand-crafted image filters. In this work, we explore leveraging the implicit neural representation (INR) and propose a novel image Harmonization method based on Implicit neural Networks (HINet), which, to the best of our knowledge, is the first dense pixel-to-pixel method applicable to HR images without any hand-crafted filter design. Inspired by the Retinex theory, we decouple the MLPs into two parts that respectively capture the content and the environment of composite images. A Low-Resolution Image Prior (LRIP) network is designed to alleviate the boundary inconsistency problem, and we also propose new designs for the training and inference process. Extensive experiments have demonstrated the effectiveness of our method compared with state-of-the-art methods. Furthermore, some interesting and practical applications of the proposed method are explored. Our code is available at https://github.com/WindVChen/INR-Harmonization. Comment: Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT).
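
    As a rough illustration of the Retinex-inspired decoupling described above (not the paper's actual HINet code), one can picture two coordinate-conditioned MLPs whose outputs are multiplied, one modeling content and the other the environment; all names, feature sources, and sizes below are assumptions.

        # Illustrative-only sketch of a Retinex-style decoupled INR: a content MLP
        # and an environment MLP, combined multiplicatively at each queried pixel.
        import torch
        import torch.nn as nn

        def mlp(in_dim, hidden, out_dim):
            return nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))

        class DecoupledINR(nn.Module):
            def __init__(self, feat_dim=64, hidden=256):
                super().__init__()
                # Content branch: coordinate + local feature -> reflectance-like term.
                self.content_mlp = mlp(2 + feat_dim, hidden, 3)
                # Environment branch: coordinate + global feature -> illumination-like term.
                self.env_mlp = mlp(2 + feat_dim, hidden, 3)

            def forward(self, coords, local_feat, global_feat):
                content = self.content_mlp(torch.cat([coords, local_feat], dim=-1))
                env = torch.sigmoid(self.env_mlp(torch.cat([coords, global_feat], dim=-1)))
                return content * env  # content modulated by environment (Retinex-style)

        model = DecoupledINR()
        coords = torch.rand(1, 1024, 2)                       # normalized pixel coordinates
        local_feat = torch.randn(1, 1024, 64)                 # assumed per-pixel encoder features
        global_feat = torch.randn(1, 1, 64).expand(-1, 1024, -1)
        rgb = model(coords, local_feat, global_feat)          # (1, 1024, 3)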

    Implicit Ray-Transformers for Multi-view Remote Sensing Image Segmentation

    The mainstream CNN-based remote sensing (RS) image semantic segmentation approaches typically rely on massive labeled training data. Such a paradigm struggles with RS multi-view scene segmentation under limited labeled views because it does not consider 3D information within the scene. In this paper, we propose the "Implicit Ray-Transformer (IRT)", based on Implicit Neural Representation (INR), for RS scene semantic segmentation with sparse labels (such as 4-6 labels per 100 images). We explore a new way of introducing multi-view 3D structure priors to the task for accurate and view-consistent semantic segmentation. The proposed method follows a two-stage learning process. In the first stage, we optimize a neural field to encode the color and 3D structure of the remote sensing scene from multi-view images. In the second stage, we design a Ray Transformer to leverage the relations between the neural field's 3D features and 2D texture features to learn better semantic representations. Different from previous methods that consider only a 3D prior or only 2D features, we incorporate both by broadcasting CNN features to the point features along each sampled ray. To verify the effectiveness of the proposed method, we construct a challenging dataset containing six synthetic sub-datasets collected from the Carla platform and three real sub-datasets from Google Maps. Experiments show that the proposed method outperforms CNN-based methods and state-of-the-art INR-based segmentation methods in both quantitative and qualitative evaluations.
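
    To make the broadcasting step concrete, here is a hedged sketch of fusing a pixel's 2D CNN feature with the 3D point features sampled along its ray, followed by a small Transformer over the ray samples; the class name, shapes, and pooling choice are my own placeholders, not the authors' implementation.

        # Hypothetical sketch: the same pixel-level CNN feature is broadcast to every
        # sample on the pixel's ray, fused with the neural-field point features, and a
        # Transformer encoder relates the samples before per-ray classification.
        import torch
        import torch.nn as nn

        class RayTransformer(nn.Module):
            def __init__(self, dim3d=32, dim2d=64, dim=128, num_classes=6):
                super().__init__()
                self.fuse = nn.Linear(dim3d + dim2d, dim)
                enc_layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
                self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
                self.classify = nn.Linear(dim, num_classes)

            def forward(self, point_feats, pixel_feat):
                # point_feats: (num_rays, samples_per_ray, dim3d) from the neural field
                # pixel_feat:  (num_rays, dim2d) CNN texture feature of the ray's pixel
                num_rays, samples, _ = point_feats.shape
                pixel_feat = pixel_feat.unsqueeze(1).expand(-1, samples, -1)  # broadcast along the ray
                tokens = self.fuse(torch.cat([point_feats, pixel_feat], dim=-1))
                tokens = self.encoder(tokens)                 # relate samples on the ray
                return self.classify(tokens.mean(dim=1))      # one semantic logit vector per ray/pixel

        model = RayTransformer()
        logits = model(torch.randn(8, 64, 32), torch.randn(8, 64))  # (8, 6)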

    Continuous Cross-resolution Remote Sensing Image Change Detection

    Most contemporary supervised Remote Sensing (RS) image Change Detection (CD) approaches are customized for equal-resolution bitemporal images. Real-world applications raise the need for cross-resolution change detection, i.e., CD based on bitemporal images with different spatial resolutions. Given training samples with a fixed resolution difference (ratio) between the high-resolution (HR) image and the low-resolution (LR) one, current cross-resolution methods may fit that ratio but fail to adapt to other resolution differences. Toward continuous cross-resolution CD, we propose scale-invariant learning to enforce the model to consistently predict HR results given synthesized samples of varying resolution differences. Concretely, we synthesize blurred versions of the HR image by randomly downsampled reconstructions to reduce the gap between the HR and LR images. We introduce coordinate-based representations to decode per-pixel predictions by feeding the coordinate query and corresponding multi-level embedding features into an MLP that implicitly learns the shape of land cover changes, thereby helping recognize blurred objects in the LR image. Moreover, considering that spatial resolution mainly affects local textures, we apply local-window self-attention to align bitemporal features during the early stages of the encoder. Extensive experiments on two synthesized datasets and one real-world different-resolution CD dataset verify the effectiveness of the proposed method. Our method significantly outperforms several vanilla CD methods and two cross-resolution CD methods on the three datasets in both in-distribution and out-of-distribution settings. The empirical results suggest that our method can yield relatively consistent HR change predictions regardless of varying bitemporal resolution ratios. Our code is available at https://github.com/justchenhao/SILI_CD. Comment: 21 pages, 11 figures. Accepted article by IEEE TGRS.
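
    A minimal sketch of the synthesis step described above, assuming a simple bilinear downsample-then-upsample degradation; the function name, ratio range, and interpolation mode are illustrative assumptions, not the released SILI_CD code.

        # During training, one temporal image is degraded with a random ratio each step
        # while the change mask stays at HR, so the model sees varying bitemporal
        # resolution gaps but is always supervised to predict HR results.
        import random
        import torch
        import torch.nn.functional as F

        def random_resolution_gap(hr_image: torch.Tensor, max_ratio: float = 8.0) -> torch.Tensor:
            """hr_image: (B, C, H, W). Returns a blurred reconstruction at the original size."""
            ratio = random.uniform(1.0, max_ratio)
            h, w = hr_image.shape[-2:]
            lr = F.interpolate(hr_image,
                               size=(max(1, int(h / ratio)), max(1, int(w / ratio))),
                               mode='bilinear', align_corners=False)
            return F.interpolate(lr, size=(h, w), mode='bilinear', align_corners=False)

        img_t2 = torch.randn(4, 3, 256, 256)
        img_t2_blurred = random_resolution_gap(img_t2)  # synthesized LR-like view of the HR image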

    Continuous Remote Sensing Image Super-Resolution based on Context Interaction in Implicit Function Space

    Despite its fruitful applications in remote sensing, image super-resolution is cumbersome to train and deploy because different resolution magnifications are handled by separate models. Accordingly, we propose a highly applicable super-resolution framework called FunSR, which handles different magnifications with a unified model by exploiting context interaction within an implicit function space. FunSR comprises a functional representor, a functional interactor, and a functional parser. Specifically, the representor transforms the low-resolution image from Euclidean space to multi-scale pixel-wise function maps; the interactor enables pixel-wise function expression with global dependencies; and the parser, which is parameterized by the interactor's output, converts the discrete coordinates with additional attributes to RGB values. Extensive experimental results demonstrate that FunSR achieves state-of-the-art performance in both fixed-magnification and continuous-magnification settings; meanwhile, it enables many convenient applications thanks to its unified nature.
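
    The parser-parameterized-by-the-interactor idea can be pictured as a tiny hypernetwork-style MLP whose weights are generated from the interactor's output and which maps a coordinate plus attributes to RGB. The sketch below is a loose illustration under that assumption; the class name, dimensions, and two-layer structure are mine, not the paper's design.

        # Hedged sketch of a "functional parser": weights generated from an image-level
        # feature, applied to coordinate queries of arbitrary count, which is what makes
        # continuous magnification possible.
        import torch
        import torch.nn as nn

        class FunctionalParser(nn.Module):
            def __init__(self, feat_dim=256, in_dim=4, hidden=64):
                super().__init__()
                # Generate the parser's two weight matrices from the interactor output.
                self.gen_w1 = nn.Linear(feat_dim, in_dim * hidden)
                self.gen_w2 = nn.Linear(feat_dim, hidden * 3)
                self.in_dim, self.hidden = in_dim, hidden

            def forward(self, feat, query):
                # feat:  (B, feat_dim) interactor output, one vector per image
                # query: (B, N, in_dim) coordinates (x, y) plus attributes (e.g. target cell size)
                w1 = self.gen_w1(feat).view(-1, self.in_dim, self.hidden)
                w2 = self.gen_w2(feat).view(-1, self.hidden, 3)
                return torch.relu(query @ w1) @ w2            # (B, N, 3) RGB values

        parser = FunctionalParser()
        rgb = parser(torch.randn(2, 256), torch.rand(2, 1000, 4))  # any N of queries works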

    Intelligent Exploration for User Interface Modules of Mobile App with Collective Learning

    A mobile app interface usually consists of a set of user interface modules. How these modules are designed is vital to user satisfaction with the app. However, there are few methods for determining the design variables of user interface modules other than relying on designers' judgment. Usually, a laborious post-processing step is needed to verify the effect of each design variable, so only a very limited number of design solutions can be tested. It is time-consuming and practically impossible to find the best design solutions when there are many modules. To this end, we introduce FEELER, a framework for quickly and intelligently exploring design solutions of user interface modules with a collective machine learning approach. FEELER helps designers quantitatively measure the preference score of different design solutions, enabling them to conveniently and quickly adjust user interface modules. We conducted extensive experimental evaluations on two real-life datasets to demonstrate its applicability to user interface module design in the Baidu App, one of the most popular mobile apps in China. Comment: 10 pages, accepted as a full paper at KDD 2020.
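
    Purely as an illustration of the "score a design solution" idea (this is not Baidu's FEELER implementation and it omits the collective-learning aspect of gathering crowd feedback), a candidate UI-module design could be represented as a vector of design variables and ranked by a small learned scorer; every name and dimension below is a placeholder.

        # Illustrative-only sketch: a regressor that maps a vector of design variables
        # (e.g. font size, margin, image ratio) to a scalar preference score, so many
        # candidate solutions can be ranked without manual testing.
        import torch
        import torch.nn as nn

        class PreferenceScorer(nn.Module):
            def __init__(self, num_design_vars: int, hidden: int = 64):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Linear(num_design_vars, hidden), nn.ReLU(),
                    nn.Linear(hidden, hidden), nn.ReLU(),
                    nn.Linear(hidden, 1),              # scalar preference score
                )

            def forward(self, design: torch.Tensor) -> torch.Tensor:
                return self.net(design).squeeze(-1)

        scorer = PreferenceScorer(num_design_vars=8)
        candidates = torch.rand(100, 8)                # 100 candidate design solutions
        best = candidates[scorer(candidates).argmax()] # pick the highest-scoring design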

    Constructing quantum dots@flake g-C3N4 isotype heterojunctions for enhanced visible-light-driven NADH regeneration and enzymatic hydrogenation

    The authors thank the financial support from the National Natural Science Funds of China (21406163, 91534126, 21621004), the Tianjin Research Program of Application Foundation and Advanced Technology (15JCQNJC10000), the Open Funding Project of the National Key Laboratory of Biochemical Engineering (2015KF-03), and the Program of Introducing Talents of Discipline to Universities (B06006). X.W. also acknowledges financial support from The Carnegie Trust for the Universities of Scotland (70265) and The Royal Society (RG150001 and IE150611). Peer reviewed. Postprint.