Learned Smartphone ISP on Mobile GPUs with Deep Learning, Mobile AI & AIM 2022 Challenge: Report
The role of mobile cameras has increased dramatically over the past few years,
leading to more and more research in automatic image quality enhancement and
RAW photo processing. In this Mobile AI challenge, the target was to develop an
efficient end-to-end AI-based image signal processing (ISP) pipeline replacing
the standard mobile ISPs that can run on modern smartphone GPUs using
TensorFlow Lite. The participants were provided with a large-scale Fujifilm
UltraISP dataset consisting of thousands of paired photos captured with a
normal mobile camera sensor and a professional 102MP medium-format Fujifilm
GFX100 camera. The runtime of the resulting models was evaluated on the
Snapdragon 8 Gen 1 GPU, which provides excellent acceleration results for the
majority of common deep learning ops. The proposed solutions are compatible
with all recent mobile GPUs and can process Full HD photos in less than
20-50 milliseconds while achieving high-fidelity results. A detailed
description of all models developed in this challenge is provided in this
paper.
Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions
Visual crowd counting has recently been studied as a way to count people
in crowd scenes from images. Although successful, vision-based crowd
counting approaches can fail to capture informative features in extreme
conditions, such as night-time imaging and occlusion. In this work, we introduce a
novel task of audiovisual crowd counting, in which visual and auditory
information are integrated for counting purposes. We collect a large-scale
benchmark, named auDiovISual Crowd cOunting (DISCO) dataset, consisting of
1,935 images and the corresponding audio clips, and 170,270 annotated
instances. In order to fuse the two modalities, we make use of a linear
feature-wise fusion module that carries out an affine transformation on visual
and auditory features. Finally, we conduct extensive experiments using the
proposed dataset and approach. Experimental results show that introducing
auditory information can benefit crowd counting under different illumination,
noise, and occlusion conditions. The dataset and code will be released. Code
and data have been made available.
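
The feature-wise fusion mentioned in the abstract above can be illustrated with a minimal sketch. It assumes one possible reading of the abstract: an audio feature vector predicts a per-channel scale and shift, which are then applied to the visual feature channels as an affine transformation. The function names, shapes, and weights below are hypothetical and chosen only for illustration; they are not taken from the paper or its released code.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(weights: Matrix, bias: Vector, x: Vector) -> Vector:
    """Plain fully connected layer: y = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def feature_wise_fusion(visual: Matrix, audio: Vector,
                        w_gamma: Matrix, b_gamma: Vector,
                        w_beta: Matrix, b_beta: Vector) -> Matrix:
    """Scale and shift each visual channel with parameters predicted from audio.

    visual: C channels, each a flat list of spatial activations.
    audio:  one audio feature vector of length D.
    The weight matrices have shape C x D (hypothetical layout).
    """
    gamma = linear(w_gamma, b_gamma, audio)  # one scale per visual channel
    beta = linear(w_beta, b_beta, audio)     # one shift per visual channel
    return [[g * v + b for v in channel]
            for channel, g, b in zip(visual, gamma, beta)]


# Toy example: 2 visual channels with 2 spatial positions, 2-dim audio feature.
visual = [[1.0, 2.0], [3.0, 4.0]]
audio = [0.5, 1.0]
fused = feature_wise_fusion(
    visual, audio,
    w_gamma=[[2.0, 0.0], [0.0, 0.5]], b_gamma=[0.0, 0.0],
    w_beta=[[0.0, 1.0], [2.0, 0.0]], b_beta=[0.0, -1.0],
)
print(fused)  # [[2.0, 3.0], [1.5, 2.0]]
```

In this toy run the audio vector yields scales (1.0, 0.5) and shifts (1.0, 0.0), so the first channel is shifted and the second is halved. In a trained model the weights would be learned, and the fusion would typically be applied at several layers of the visual network.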
- …