8,396 research outputs found
s-LWSR: Super Lightweight Super-Resolution Network
Deep learning (DL) architectures for super-resolution (SR) normally contain a
tremendous number of parameters, which has been regarded as crucial for
obtaining satisfying performance. However, with the widespread use of mobile
phones for taking and retouching photos, this characteristic greatly hampers
the deployment of DL-SR models on mobile devices. To address this problem, in
this paper, we propose a super lightweight SR network: s-LWSR. Our work makes
three main contributions. First, to efficiently extract features from the
low-resolution image, we build an information pool that mixes multi-level
information from the first half of the pipeline. The information pool then
feeds the second half with a combination of hierarchical features from the
preceding layers. Second, we employ a compression module to further reduce the
number of parameters. Intensive analysis confirms its ability to trade off
model complexity against accuracy. Third, by revealing the specific role of
activation in deep models, we remove several activation layers from our SR
model to retain more information and improve performance. Extensive
experiments show that our s-LWSR, with limited parameters and operations,
achieves performance similar to that of far heavier DL-SR methods.
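To make the information-pool idea concrete, here is a minimal PyTorch sketch of a pipeline whose first-half features are collected and fused before feeding the second half. Module names, channel counts, and the 1x1 fusion are illustrative assumptions, not the authors' code.

```python
# Sketch of an "information pool" in the spirit of s-LWSR: outputs of the
# first-half blocks are concatenated and fused by a cheap 1x1 convolution,
# which then feeds the second half of the pipeline. Names are illustrative.
import torch
import torch.nn as nn

class InfoPoolSR(nn.Module):
    def __init__(self, channels=32, depth=4, scale=2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 3, padding=1)
        # First half: every block's output is kept for the pool.
        self.first_half = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=1) for _ in range(depth)
        )
        # Information pool: fuse the multi-level features.
        self.pool_fuse = nn.Conv2d(channels * depth, channels, 1)
        # Second half consumes the fused hierarchical features.
        self.second_half = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearrange channels into the SR grid
        )

    def forward(self, x):
        feat = self.head(x)
        pool = []
        for block in self.first_half:
            feat = block(feat)   # the paper removes several activation
            pool.append(feat)    # layers; this sketch omits them entirely
        fused = self.pool_fuse(torch.cat(pool, dim=1))
        return self.second_half(fused)

out = InfoPoolSR()(torch.randn(1, 3, 48, 48))  # -> (1, 3, 96, 96)
```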
Online Video Super-Resolution with Convolutional Kernel Bypass Graft
Deep learning-based models have achieved remarkable performance in video
super-resolution (VSR) in recent years, but most of these models are less
applicable to online video applications. These methods solely consider the
distortion quality and ignore crucial requirements for online applications,
e.g., low latency and low model complexity. In this paper, we focus on online
video transmission, in which VSR algorithms are required to generate
high-resolution video sequences frame by frame in real time. To address such
challenges, we propose an extremely low-latency VSR algorithm based on a novel
kernel knowledge transfer method, named convolutional kernel bypass graft
(CKBG). First, we design a lightweight network structure that does not require
future frames as inputs, saving the extra time cost of caching these frames.
Then, our proposed CKBG method enhances this lightweight base model by
bypassing the original network with "kernel grafts", which are extra
convolutional kernels containing the prior knowledge of external pretrained
image SR models. In the testing phase, we further accelerate the grafted
multi-branch network by converting it into a simple single-path structure.
Experimental results show that our proposed method can process online video
sequences at up to 110 FPS, with very low model complexity and competitive SR
performance.
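The test-time conversion of the grafted multi-branch network into a single path rests on the linearity of convolution. The sketch below shows the generic re-parameterization step for two parallel, same-shape 3x3 convolutions; it illustrates the principle, not the paper's exact grafting procedure.

```python
# Fold conv_base(x) + conv_graft(x) into one equivalent convolution by
# summing weights and biases; a stand-in for collapsing "kernel grafts"
# into a single-path network at test time.
import torch
import torch.nn as nn

def merge_parallel_convs(base: nn.Conv2d, graft: nn.Conv2d) -> nn.Conv2d:
    assert base.weight.shape == graft.weight.shape
    merged = nn.Conv2d(base.in_channels, base.out_channels,
                       base.kernel_size, padding=base.padding, bias=True)
    with torch.no_grad():
        merged.weight.copy_(base.weight + graft.weight)
        merged.bias.copy_(base.bias + graft.bias)
    return merged

# Sanity check: the single-path conv reproduces the two-branch output.
x = torch.randn(1, 16, 32, 32)
b = nn.Conv2d(16, 16, 3, padding=1)      # base kernel
g = nn.Conv2d(16, 16, 3, padding=1)      # stands in for a "kernel graft"
assert torch.allclose(b(x) + g(x), merge_parallel_convs(b, g)(x), atol=1e-5)
```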
Human-Machine Interface for Remote Training of Robot Tasks
Regardless of their industrial or research application, the streamlining of
robot operations is limited by the need for experienced users to be near the
actual hardware. Be it massive open online robotics courses, crowd-sourced
robot task training, or remote research on large robot farms for machine
learning, the need for an apt remote Human-Machine Interface is prevalent.
This paper proposes a novel solution for the programming/training of remote
robots, employing an intuitive and accurate user interface that offers the
benefits of working with real robots without imposing delays or inefficiency.
The system includes: a vision-based 3D hand detection and gesture recognition
subsystem, a simulated digital twin of a robot as visual feedback, and the
"remote" robot learning/executing trajectories using dynamic motion
primitives. Our results indicate that the system is a promising solution to
the problem of remote training of robot tasks.
Comment: Accepted in IEEE International Conference on Imaging Systems and
Techniques - IST201
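For reference, below is a minimal sketch of the discrete dynamic motion primitive formulation (Ijspeert-style, usually called dynamic movement primitives) that such a system can use to learn and execute trajectories. The gains, canonical-system decay, and forcing interface are textbook defaults, not the paper's parameters.

```python
# One-dimensional discrete DMP rollout: a critically damped spring toward
# the goal, shaped by a phase-dependent forcing term learned from a
# demonstration. Parameters here are illustrative defaults.
import numpy as np

def dmp_rollout(y0, goal, forcing, tau=1.0, dt=0.01, alpha=25.0, beta=6.25):
    y, v, x = y0, 0.0, 1.0                       # position, velocity, phase
    path = [y]
    for _ in range(int(tau / dt)):
        f = forcing(x) * (goal - y0)             # phase-dependent forcing
        dv = (alpha * (beta * (goal - y) - v) + f) / tau
        v += dv * dt
        y += (v / tau) * dt
        x += (-2.0 * x / tau) * dt               # canonical system decay
        path.append(y)
    return np.array(path)

# With zero forcing the DMP simply converges to the goal; a forcing term
# fit to a demonstrated trajectory (e.g., via RBF regression) shapes the
# path in between while keeping the goal-convergence guarantee.
trajectory = dmp_rollout(y0=0.0, goal=1.0, forcing=lambda x: 0.0)
print(round(float(trajectory[-1]), 3))           # ~1.0
```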
LHDR: HDR Reconstruction for Legacy Content using a Lightweight DNN
High dynamic range (HDR) images are widely used in graphics and photography
due to the rich information they contain. Recently, the community has started
using deep neural networks (DNNs) to reconstruct standard dynamic range (SDR)
images into HDR. Despite the superiority of current DNN-based methods, their
application scenarios are still limited: (1) heavy models impede real-time
processing, and (2) they are inapplicable to legacy SDR content with more
degradation types. Therefore, we propose a lightweight DNN-based method
trained to handle legacy SDR. For a better design, we reformulate the problem
and emphasize the degradation model. Experiments show that our method achieves
appealing performance at minimal computational cost compared with others.
Comment: Accepted in ACCV202
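Since the method hinges on an explicit degradation model for legacy SDR, here is a hedged sketch of how training pairs might be synthesized; the specific operations (tone curve, 8-bit quantization, compression-like noise) and their parameters are assumptions, not the paper's pipeline.

```python
# Synthesize a degraded "legacy SDR" input from linear HDR ground truth.
# This degradation chain is a plausible stand-in, not LHDR's actual model.
import numpy as np

def degrade_to_legacy_sdr(hdr, gamma=2.2, noise_sigma=0.01):
    """Map linear HDR (float, >= 0) to a degraded 8-bit-style SDR image."""
    sdr = np.clip(hdr, 0.0, 1.0) ** (1.0 / gamma)         # clip + gamma tone curve
    sdr = np.round(sdr * 255.0) / 255.0                   # 8-bit quantization
    sdr += np.random.normal(0.0, noise_sigma, sdr.shape)  # compression-like noise
    return np.clip(sdr, 0.0, 1.0)

hdr_patch = np.random.rand(64, 64, 3) * 4.0    # synthetic linear HDR target
sdr_patch = degrade_to_legacy_sdr(hdr_patch)   # degraded network input
```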
MetaISP -- Exploiting Global Scene Structure for Accurate Multi-Device Color Rendition
Image signal processors (ISPs) are historically grown legacy software systems
for reconstructing color images from noisy raw sensor measurements. Each
smartphone manufacturer has developed its ISP with its own characteristic
heuristics for improving color rendition, for example, of skin tones and other
visually essential colors. Recent work on replacing historically grown ISP
systems with deep-learned pipelines that match DSLR image quality has improved
structural features in the image. However, these works ignore the superior
color processing, based on semantic scene analysis, that distinguishes mobile
phone ISPs from DSLRs. Here, we present MetaISP, a single model designed to
learn how to translate between the color and local contrast characteristics of
different devices. MetaISP takes the RAW image from device A as input and
translates it into RGB images that inherit the appearance characteristics of
devices A, B, and C. We achieve this by employing a lightweight deep learning
technique that conditions its output appearance on the device of interest. In
this approach, we leverage novel attention mechanisms inspired by
cross-covariance to learn global scene semantics. Additionally, we use the
metadata that typically accompanies RAW images, and we estimate scene
illuminants when they are unavailable.
Comment: VMV 2023, Project page: https://www.github.com/vccimaging/MetaIS
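A minimal sketch of cross-covariance attention (in the spirit of XCiT, which the abstract's "attention mechanisms inspired by cross-covariance" suggests): the attention map is d x d across channels rather than N x N across pixels, so the cost grows linearly with image size. Shapes and names are illustrative, not taken from the MetaISP code.

```python
# Channel-wise ("cross-covariance") attention: queries and keys are
# L2-normalized along the token axis, and attention mixes channels.
import torch
import torch.nn.functional as F

def cross_covariance_attention(x, wq, wk, wv, temp=1.0):
    """x: (B, N, d) tokens; wq/wk/wv: (d, d) projections."""
    q, k, v = x @ wq, x @ wk, x @ wv           # each (B, N, d)
    q = F.normalize(q, dim=1)                  # unit norm along tokens
    k = F.normalize(k, dim=1)
    attn = F.softmax(k.transpose(1, 2) @ q / temp, dim=-1)  # (B, d, d)
    return v @ attn                            # mix channels: (B, N, d)

x = torch.randn(2, 1024, 64)                   # e.g., a 32x32 feature map
wq, wk, wv = (torch.randn(64, 64) * 0.1 for _ in range(3))
out = cross_covariance_attention(x, wq, wk, wv)  # same shape as x
```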
Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search
Though recent years have witnessed remarkable progress in single image
super-resolution (SISR) with the rapid development of deep neural networks
(DNNs), deep learning methods face computation and memory consumption issues
in practice, especially on resource-limited platforms such as mobile devices.
To overcome this challenge and facilitate the real-time deployment of SISR on
mobile, we combine neural architecture search with pruning search and propose
an automatic search framework that derives sparse super-resolution (SR) models
with high image quality while satisfying real-time inference requirements. To
decrease the search cost, we leverage a weight-sharing strategy by introducing
a supernet, and we decouple the search problem into three stages: supernet
construction, compiler-aware architecture and pruning search, and
compiler-aware pruning-ratio search. With the proposed framework, we are the
first to achieve real-time SR inference (with only tens of milliseconds per
frame) at 720p resolution with competitive image quality (in terms of PSNR and
SSIM) on mobile platforms (Samsung Galaxy S20).
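As an illustration of the final stage, a compiler-aware pruning-ratio search can be posed as constrained optimization of per-layer ratios against a device latency budget. The toy search below uses a hypothetical latency lookup and accuracy proxy; the authors' actual cost models and search strategy are not specified in the abstract.

```python
# Toy pruning-ratio search: pick per-layer channel-pruning ratios that
# meet a latency budget while maximizing an accuracy proxy. Both the
# latency table and the proxy are hypothetical stand-ins.
import itertools

RATIOS = [0.0, 0.25, 0.5, 0.75]                 # fraction of channels pruned

def latency_ms(cfg):                            # hypothetical per-layer lookup
    return sum(10.0 * (1.0 - r) for r in cfg)

def accuracy_proxy(cfg):                        # hypothetical PSNR surrogate
    return -sum(r * r for r in cfg)             # prefers spreading pruning out

def search_pruning_ratios(num_layers=4, budget_ms=25.0):
    best, best_score = None, float("-inf")
    for cfg in itertools.product(RATIOS, repeat=num_layers):
        if latency_ms(cfg) > budget_ms:
            continue                            # violates real-time budget
        if accuracy_proxy(cfg) > best_score:
            best, best_score = cfg, accuracy_proxy(cfg)
    return best

print(search_pruning_ratios())                  # e.g., (0.25, 0.25, 0.5, 0.5)
```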
- …