10 research outputs found
An overview of video super-resolution algorithms
We investigate leading artificial-intelligence-based algorithms for video spatial super-resolution, structurally analyzing their network architectures and commonly used loss functions. We also analyze the characteristics of algorithms in the emerging field of video space-time super-resolution. This work helps researchers gain a deeper understanding of artificial-intelligence-based video super-resolution technology.
Video Face Super-Resolution with Motion-Adaptive Feedback Cell
Video super-resolution (VSR) methods have recently achieved remarkable success due to the development of deep convolutional neural networks (CNNs). Current state-of-the-art CNN methods usually treat the VSR problem as a large number of separate multi-frame super-resolution tasks, in which a batch of low-resolution (LR) frames is used to generate a single high-resolution (HR) frame, and sliding a window over the entire video to select LR frames yields a series of HR frames. However, due to the complex temporal dependency between frames, the quality of the reconstructed HR frames worsens as the number of LR input frames grows. The reason is that these methods cannot model complex temporal dependencies and struggle to produce accurate motion estimation and compensation for the VSR process, so their performance degrades drastically when the motion between frames is complex. In this paper, we propose the Motion-Adaptive Feedback Cell (MAFC), a simple but effective block that efficiently captures motion compensation and feeds it back to the network in an adaptive way. Our approach efficiently exploits inter-frame motion information, so the network's dependence on explicit motion estimation and compensation methods can be avoided. In addition, benefiting from this property of MAFC, the network achieves better performance in extremely complex motion scenarios. Extensive evaluations and comparisons validate the strengths of our approach, and the experimental results demonstrate that the proposed framework outperforms state-of-the-art methods.
Comment: To appear in AAAI 202
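The sliding-window formulation the abstract criticizes can be sketched in a few lines. This is a minimal illustration, not any paper's actual pipeline: the frame shapes, the window radius, and the stand-in `toy_sr_net` are all assumptions for demonstration.

```python
import numpy as np

def sliding_window_vsr(frames, radius, sr_net):
    """Treat VSR as a series of multi-frame SR tasks: for each target
    frame, gather its 2*radius+1 LR neighbors (clamping at the sequence
    boundaries) and let a network map the window to one HR frame."""
    n = len(frames)
    hr_frames = []
    for t in range(n):
        # Clamp neighbor indices so edge frames repeat at the boundaries.
        window = [frames[min(max(i, 0), n - 1)]
                  for i in range(t - radius, t + radius + 1)]
        hr_frames.append(sr_net(np.stack(window)))
    return hr_frames

# Toy stand-in "network": 2x upscale of the center frame by pixel
# repetition, ignoring its neighbors entirely.
def toy_sr_net(window):
    center = window[len(window) // 2]
    return center.repeat(2, axis=0).repeat(2, axis=1)

lr_video = [np.full((4, 4), t, dtype=np.float32) for t in range(5)]
hr_video = sliding_window_vsr(lr_video, radius=1, sr_net=toy_sr_net)
print(len(hr_video), hr_video[0].shape)  # 5 (8, 8)
```

Note how each HR frame is computed independently of the others; this per-window independence is exactly the redundancy and lack of long-range temporal modeling that MAFC-style feedback aims to address.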
A Lightweight Recurrent Grouping Attention Network for Video Super-Resolution
Effective aggregation of temporal information from consecutive frames is the core of video super-resolution. Many researchers have used structures such as sliding windows and recurrent architectures to gather the spatio-temporal information of frames. However, although the performance of the resulting VSR models keeps improving, the models keep growing as well, increasing the demands placed on hardware. To reduce this burden on the device, we propose a novel lightweight recurrent grouping attention network. The model has only 0.878M parameters, far fewer than current mainstream video super-resolution models. We design a forward feature extraction module and a backward feature extraction module to collect temporal information between consecutive frames from two directions. Moreover, a new grouping mechanism is proposed to efficiently collect the spatio-temporal information of the reference frame and its neighboring frames. An attention supplementation module is presented to further enlarge the model's information-gathering range, and a feature reconstruction module aggregates information from both directions to reconstruct high-resolution features. Experiments demonstrate that our model achieves state-of-the-art performance on multiple datasets.
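The forward/backward propagation scheme described above can be sketched as a pair of recurrent passes whose per-frame states are fused. This is a schematic sketch, not the paper's architecture: `toy_cell`, the hidden dimension, and the concatenation-based fusion are illustrative assumptions (a real model would use learned convolutional cells and a learned fusion layer).

```python
import numpy as np

def bidirectional_propagate(frames, cell, hidden_dim):
    """Collect temporal information from two directions: a recurrent
    cell carries a hidden state across the frames once in each temporal
    order, and the two resulting features are fused per frame."""
    n = len(frames)
    h, w = frames[0].shape
    fwd, bwd = [], [None] * n
    state = np.zeros((hidden_dim, h, w), dtype=np.float32)
    for t in range(n):                       # forward pass
        state = cell(frames[t], state)
        fwd.append(state)
    state = np.zeros((hidden_dim, h, w), dtype=np.float32)
    for t in reversed(range(n)):             # backward pass
        state = cell(frames[t], state)
        bwd[t] = state
    # Fuse both directions by channel concatenation.
    return [np.concatenate([f, b], axis=0) for f, b in zip(fwd, bwd)]

# Toy recurrent cell: decay the carried state and add the new frame.
def toy_cell(frame, state):
    return 0.5 * state + frame[None, :, :]

video = [np.ones((4, 4), dtype=np.float32) * t for t in range(3)]
feats = bidirectional_propagate(video, toy_cell, hidden_dim=1)
print(feats[0].shape)  # (2, 4, 4): forward + backward features
```

Each frame's fused feature thus sees both past and future context, which is what lets a small recurrent model compete with much larger sliding-window ones.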
TMP: Temporal Motion Propagation for Online Video Super-Resolution
Online video super-resolution (online-VSR) relies heavily on an effective
alignment module to aggregate temporal information, while strict latency
requirements make accurate and efficient alignment very challenging. Though
much progress has been achieved, most of the existing online-VSR methods
estimate the motion fields of each frame separately to perform alignment, which
is computationally redundant and ignores the fact that the motion fields of
adjacent frames are correlated. In this work, we propose an efficient Temporal
Motion Propagation (TMP) method, which leverages the continuity of motion fields
to achieve fast pixel-level alignment across consecutive frames. Specifically,
we first propagate the offsets from previous frames to the current frame, and
then refine them in the neighborhood, which significantly reduces the matching
space and speeds up the offset estimation process. Furthermore, to enhance the
robustness of alignment, we perform spatial-wise weighting on the warped
features, where the positions with more precise offsets are assigned higher
importance. Experiments on benchmark datasets demonstrate that the proposed TMP
method achieves leading accuracy and inference speed among online-VSR methods. The
source code of TMP can be found at https://github.com/xtudbxk/TMP
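The propagate-then-refine idea can be illustrated with a tiny block-matching version: each pixel's offset is initialized from the previous frame's offset field, then only a small neighborhood around that guess is searched. This is a simplified sketch of the general idea, not the released TMP code; the brute-force per-pixel search and absolute-difference cost are illustrative assumptions.

```python
import numpy as np

def refine_offsets(prev_offsets, ref, cur, radius=1):
    """For each pixel of `cur`, start from the offset propagated from
    the previous frame and search only a (2*radius+1)^2 neighborhood
    around it for the best match in `ref`, instead of re-estimating
    motion from scratch. This shrinks the matching space drastically."""
    h, w = ref.shape
    offsets = np.zeros_like(prev_offsets)
    for y in range(h):
        for x in range(w):
            dy0, dx0 = prev_offsets[y, x]
            best, best_err = (dy0, dx0), np.inf
            for dy in range(dy0 - radius, dy0 + radius + 1):
                for dx in range(dx0 - radius, dx0 + radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        err = abs(float(ref[yy, xx]) - float(cur[y, x]))
                        if err < best_err:
                            best, best_err = (dy, dx), err
            offsets[y, x] = best
    return offsets

# A ramp image shifted one pixel left; zero initial offsets suffice
# because the true motion lies within the search radius.
ref = np.arange(64, dtype=np.float32).reshape(8, 8)
cur = np.concatenate([ref[:, 1:], ref[:, 7:]], axis=1)
prev = np.zeros((8, 8, 2), dtype=int)
offsets = refine_offsets(prev, ref, cur)
print(tuple(int(v) for v in offsets[3, 3]))  # (0, 1)
```

The abstract's spatial weighting step would then score each warped position by its matching error, down-weighting pixels whose refined offsets are unreliable.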
Detection of dish manufacturing defects using a deep learning-based approach
Quality control is essential to ensure the smooth running of an industrial process. This work proposes to use and adapt a deep learning-based algorithm as part of an automatic quality control system at a porcelain dish factory. The system will receive images acquired in real time by high-resolution cameras placed directly over the production line. The proposed algorithm will classify the dishes shown in the images as "defective" or "without defect". The objective of the system is therefore the detection of defective dishes, so that fewer defective dishes reach the market, contributing to a better reputation for the factory.
This system is based on a convolutional neural network, an algorithm that requires a large amount of training data to perform image classification. Since the COVID-19 pandemic was being felt on a large scale in Portugal at the time this research work was developed, it was impossible to obtain data directly from the factory. Owing to this setback, the data used in this work was generated artificially. When given complete images of dishes, the algorithm achieved a defect detection accuracy of 92.7% with the first dataset and 91.9% with the second. When given 100x100-pixel segments of the original images from the second dataset, it reached 91.6% accuracy in the classification of these segments, which translated into a 52.0% accuracy rate in the
classification of the complete dish images.
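The segment-based inspection scheme can be sketched as follows. This is an illustrative sketch, not the thesis's implementation: the `classify_dish` helper, the `toy_cnn` stand-in for the trained network, and the "any defective segment flags the dish" aggregation rule are all assumptions made for demonstration.

```python
import numpy as np

def classify_dish(image, patch_classifier, patch=100):
    """Tile the dish image into patch x patch segments, classify each
    segment, and flag the whole dish as defective as soon as any
    segment is predicted defective (illustrative aggregation rule)."""
    h, w = image.shape[:2]
    for y in range(0, h - patch + 1, patch):
        for x in range(0, w - patch + 1, patch):
            segment = image[y:y + patch, x:x + patch]
            if patch_classifier(segment):  # True means "defective"
                return "defective"
    return "without defect"

# Toy classifier standing in for the trained CNN: a segment counts as
# "defective" when it contains any saturated pixel.
toy_cnn = lambda seg: bool((seg > 0.9).any())

clean = np.zeros((300, 300), dtype=np.float32)
flawed = clean.copy()
flawed[150, 150] = 1.0  # simulated defect
print(classify_dish(clean, toy_cnn), classify_dish(flawed, toy_cnn))
```

Such an aggregation also hints at why per-segment accuracy (91.6%) can collapse at the dish level (52.0%): with nine segments per dish, even a small per-segment error rate compounds into frequent whole-dish misclassifications.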
Interpretability and Generalization of Deep Low-Level Vision Models
Low-level vision tasks form an important class of problems in computer vision, covering various image restoration tasks such as image super-resolution, image denoising, and image deraining. In recent years, deep learning has become the de facto method for solving low-level vision problems, owing to its excellent performance and ease of use. By training on large amounts of paired data, deep low-level vision models are expected to learn rich semantic knowledge and process images intelligently for real-world applications. However, because our understanding of deep learning models and low-level vision tasks is not yet deep enough, we cannot explain the successes and failures of these models. Deep learning models are widely regarded as "black boxes" due to their complexity and non-linearity: we cannot know what information a model used when processing its input, or whether it learned what we intended. When a model misbehaves, we cannot identify the underlying source of the problem, such as the generalization problem of low-level vision models. This research proposes an interpretability analysis of deep low-level vision models to gain deeper insight into deep learning models for low-level vision tasks. I aim to elucidate the mechanisms of the deep learning approach and to discern insights regarding the successes and shortcomings of these methods. This is the first study to perform interpretability analysis on deep low-level vision models.