698 research outputs found

Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report

Authors: Bai Furui, Cai Mingxuan, Chen Zewen, Cho Minhyeok, Choi Ui-Jin, Conde Marcos V., Dong Mengchuan, Ershov Egor, Fan Zhihao, Feng Chaoyu, Hui Zheng, Ignatov Andrey, Kong Dehui, Kwon Minsu, Lei Lei, Li Ran, Li Shaoqing, Liu Shuai, Liu Zibin, Lou Xin, No Albert, Pang Cong, Perevozchikov Georgy, Qin Haina, Shi Keming, Timofte Radu, Wang Juan, Wang Xiaotao, Wang Zhiming, Wu Xun, Wu Yaqi, Xiang Yan, Xu Ke, Yi Ziyao, Zhang Feng, Zhang Xiaze, Zheng Jiesi, Zhou Wei
Publication date: 07/11/2022

The role of mobile cameras has increased dramatically over the past few years, leading to more and more research in automatic image quality enhancement and RAW photo processing. In this Mobile AI challenge, the target was to develop an efficient end-to-end AI-based image signal processing (ISP) pipeline that replaces the standard mobile ISPs and can run on modern smartphone GPUs using TensorFlow Lite. The participants were provided with a large-scale Fujifilm UltraISP dataset consisting of thousands of paired photos captured with a normal mobile camera sensor and a professional 102MP medium-format Fujifilm GFX100 camera. The runtime of the resulting models was evaluated on the Snapdragon 8 Gen 1 GPU, which provides excellent acceleration results for the majority of common deep learning ops. The proposed solutions are compatible with all recent mobile GPUs and can process Full HD photos in under 20 to 50 milliseconds while achieving high-fidelity results. A detailed description of all models developed in this challenge is provided in this paper.
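To put the reported runtimes in perspective, the short sketch below converts a per-photo latency into frames per second and megapixels per second. It assumes that "Full HD" means 1920×1080 pixels; the function name `throughput` is illustrative and does not come from the challenge report. Because the abstract states upper bounds on latency, the results are lower bounds on throughput.

```python
# Hypothetical helper: convert per-photo latency into throughput figures.
# Assumption: "Full HD" is 1920 x 1080 pixels.
FULL_HD_PIXELS = 1920 * 1080  # 2,073,600 pixels per photo


def throughput(latency_ms):
    """Return (frames per second, megapixels per second) for one latency."""
    fps = 1000.0 / latency_ms
    megapixels_per_s = fps * FULL_HD_PIXELS / 1e6
    return fps, megapixels_per_s


# The two ends of the 20-50 ms range quoted in the abstract.
fast_fps, fast_mps = throughput(20)
slow_fps, slow_mps = throughput(50)
print(f"20 ms: {fast_fps:.0f} fps, {fast_mps:.2f} MP/s")
print(f"50 ms: {slow_fps:.0f} fps, {slow_mps:.2f} MP/s")
```

Under these assumptions, the quoted range corresponds to roughly 20 to 50 Full HD photos per second.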

arXiv.org e-Print Archive

Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions

Authors: Dou Dejing, Gao Junyu, Hu Di, Hua Yuansheng, Mou Lichao, Wang Qingzhong, Zhu Xiao Xiang
Publication date: 16/05/2020

Visual crowd counting has recently been studied as a way to count people in crowd scenes from images. Although successful, vision-based crowd counting approaches can fail to capture informative features in extreme conditions, e.g., imaging at night and occlusion. In this work, we introduce a novel task of audiovisual crowd counting, in which visual and auditory information are integrated for counting purposes. We collect a large-scale benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of 1,935 images and the corresponding audio clips, and 170,270 annotated instances. To fuse the two modalities, we use a linear feature-wise fusion module that carries out an affine transformation on visual and auditory features. Finally, we conduct extensive experiments using the proposed dataset and approach. Experimental results show that introducing auditory information can benefit crowd counting under different illumination, noise, and occlusion conditions. The dataset and code will be released. Code and data have been made available.
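The abstract does not spell out the fusion module. One common reading of a "feature-wise affine transformation" is that the audio features predict a per-channel scale and shift applied to the visual feature maps. The pure-Python sketch below illustrates that reading only. The names `linear` and `affine_fuse`, the shapes, and the toy weights are all assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of feature-wise affine fusion, under the assumption that
# audio features predict one scale (gamma) and one shift (beta) per visual
# channel. All names, shapes, and weights here are illustrative.

def linear(weights, bias, x):
    """Return weights @ x + bias for a list-of-lists matrix and a vector."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def affine_fuse(visual, audio, w_gamma, b_gamma, w_beta, b_beta):
    """Scale and shift each visual channel using audio-derived parameters.

    visual: list of C channels, each a flat list of spatial activations.
    audio:  flat audio feature vector of length D.
    """
    gamma = linear(w_gamma, b_gamma, audio)  # one scale per channel
    beta = linear(w_beta, b_beta, audio)     # one shift per channel
    return [[g * v + b for v in channel]
            for channel, g, b in zip(visual, gamma, beta)]


# Toy example: C = 2 visual channels with 2 positions each, D = 2 audio features.
visual = [[1.0, 2.0], [3.0, 4.0]]
audio = [0.5, -1.0]
w_gamma, b_gamma = [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0]
w_beta, b_beta = [[0.0, 0.0], [2.0, 0.0]], [0.0, 0.0]

fused = affine_fuse(visual, audio, w_gamma, b_gamma, w_beta, b_beta)
print(fused)  # [[1.5, 3.0], [1.0, 1.0]]
```

In this toy case the audio vector yields gamma = [1.5, 0.0] and beta = [0.0, 1.0]. The first channel is scaled by the audio signal, while the second is zeroed by its scale and then set to its shift, so its output is determined entirely by the audio features.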

arXiv.org e-Print Archive

Institute of Transport Research:Publications