10 research outputs found

    An overview of video super-resolution algorithms

    Get PDF
    We survey leading artificial-intelligence-based algorithms for video spatial super-resolution, analyzing their network structures and the loss functions they commonly use. We also examine the characteristics of algorithms in the emerging field of video space-time super-resolution. This work helps researchers gain a deeper understanding of artificial-intelligence-based video super-resolution technology.
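
    Since the survey discusses the loss functions commonly used by these networks, here is a minimal sketch (not taken from the survey) of one such reconstruction loss, the Charbonnier loss, a smooth L1 variant frequently used in super-resolution training; the tensor shapes are illustrative assumptions.

```python
# Minimal sketch: Charbonnier loss, a smooth L1 variant commonly used as a
# pixel-wise reconstruction loss in super-resolution work.
import torch

def charbonnier_loss(sr: torch.Tensor, hr: torch.Tensor, eps: float = 1e-3) -> torch.Tensor:
    """Mean Charbonnier penalty sqrt((sr - hr)^2 + eps^2) over all pixels."""
    return torch.sqrt((sr - hr) ** 2 + eps ** 2).mean()

if __name__ == "__main__":
    sr = torch.rand(1, 3, 64, 64)   # super-resolved frame (hypothetical shape)
    hr = torch.rand(1, 3, 64, 64)   # ground-truth high-resolution frame
    print(charbonnier_loss(sr, hr))
```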

    Video Face Super-Resolution with Motion-Adaptive Feedback Cell

    Full text link
    Video super-resolution (VSR) methods have recently achieved remarkable success thanks to the development of deep convolutional neural networks (CNNs). Current state-of-the-art CNN methods usually treat the VSR problem as a large number of separate multi-frame super-resolution tasks, in which a batch of low-resolution (LR) frames is used to generate a single high-resolution (HR) frame, and a sliding window over the entire video yields a series of HR frames. However, due to the complex temporal dependencies between frames, the quality of the reconstructed HR frames degrades as the number of LR input frames increases. The reason is that these methods lack the ability to model complex temporal dependencies and struggle to provide accurate motion estimation and compensation for the VSR process, so their performance drops drastically when the motion between frames is complex. In this paper, we propose the Motion-Adaptive Feedback Cell (MAFC), a simple but effective block that efficiently captures motion compensation and feeds it back to the network in an adaptive way. Our approach exploits inter-frame motion information efficiently, so the network's dependence on explicit motion estimation and compensation methods can be avoided. In addition, benefiting from the design of MAFC, the network achieves better performance in extremely complex motion scenarios. Extensive evaluations and comparisons validate the strengths of our approach, and the experimental results demonstrate that the proposed framework outperforms the state-of-the-art methods. Comment: To appear in AAAI 202
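
    To make the idea of adaptive motion feedback concrete, the sketch below is a hypothetical illustration only, not the authors' MAFC: it uses the difference between adjacent frame features as a crude motion cue, turns it into per-pixel gating weights, and feeds those weights back into the temporal fusion, with no explicit optical-flow estimator. The module name and layer choices are assumptions.

```python
# Hypothetical illustration (not the authors' MAFC): inter-frame feature
# differences produce adaptive gates that re-weight the temporal fusion.
import torch
import torch.nn as nn

class MotionFeedbackSketch(nn.Module):
    def __init__(self, channels: int = 64):
        super().__init__()
        # Maps the frame-difference features to per-pixel, per-channel gates.
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, feat_cur: torch.Tensor, feat_prev: torch.Tensor) -> torch.Tensor:
        motion = feat_cur - feat_prev          # crude motion cue between frames
        weights = self.gate(motion)            # adaptive feedback weights
        return feat_cur + weights * feat_prev  # re-weighted temporal fusion

if __name__ == "__main__":
    cell = MotionFeedbackSketch()
    f_t, f_tm1 = torch.rand(1, 64, 32, 32), torch.rand(1, 64, 32, 32)
    print(cell(f_t, f_tm1).shape)  # torch.Size([1, 64, 32, 32])
```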

    A Lightweight Recurrent Grouping Attention Network for Video Super-Resolution

    Full text link
    Effective aggregation of temporal information from consecutive frames is the core of video super-resolution. Many researchers have used structures such as sliding windows and recurrent networks to gather the spatio-temporal information of frames. However, while the performance of these VSR models keeps improving, their size is also increasing, raising the demands on hardware. To reduce this burden, we propose a novel lightweight recurrent grouping attention network. The model has only 0.878M parameters, far fewer than current mainstream video super-resolution models. We design a forward feature extraction module and a backward feature extraction module to collect temporal information between consecutive frames from two directions. Moreover, a new grouping mechanism is proposed to efficiently collect the spatio-temporal information of the reference frame and its neighboring frames. An attention supplementation module is presented to further enlarge the model's information-gathering range, and a feature reconstruction module aggregates information from both directions to reconstruct high-resolution features. Experiments demonstrate that our model achieves state-of-the-art performance on multiple datasets.
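
    The forward/backward extraction idea follows a common bidirectional recurrent pattern in VSR. The sketch below shows that general pattern under simplifying assumptions; it does not reproduce the paper's grouping mechanism, attention supplementation module, or parameter budget, and all layer sizes are illustrative.

```python
# Hedged sketch of bidirectional recurrent feature propagation over a clip
# (a common VSR pattern; the paper's grouping and attention modules are omitted).
import torch
import torch.nn as nn

class BidirectionalPropagationSketch(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        self.embed = nn.Conv2d(3, channels, 3, padding=1)
        self.forward_rnn = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.backward_rnn = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, 3, 3, padding=1)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t, _, h, w = frames.shape
        feats = [self.embed(frames[:, i]) for i in range(t)]
        hidden = torch.zeros_like(feats[0])
        fwd = []
        for i in range(t):                       # forward pass over time
            hidden = torch.relu(self.forward_rnn(torch.cat([feats[i], hidden], dim=1)))
            fwd.append(hidden)
        hidden = torch.zeros_like(feats[0])
        bwd = [None] * t
        for i in reversed(range(t)):             # backward pass over time
            hidden = torch.relu(self.backward_rnn(torch.cat([feats[i], hidden], dim=1)))
            bwd[i] = hidden
        out = [self.fuse(torch.cat([fwd[i], bwd[i]], dim=1)) for i in range(t)]
        return torch.stack(out, dim=1)           # (batch, time, 3, H, W)

if __name__ == "__main__":
    model = BidirectionalPropagationSketch()
    clip = torch.rand(1, 5, 3, 32, 32)
    print(model(clip).shape)  # torch.Size([1, 5, 3, 32, 32])
```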

    TMP: Temporal Motion Propagation for Online Video Super-Resolution

    Full text link
    Online video super-resolution (online-VSR) relies heavily on an effective alignment module to aggregate temporal information, while the strict latency requirement makes accurate and efficient alignment very challenging. Though much progress has been achieved, most existing online-VSR methods estimate the motion field of each frame separately to perform alignment, which is computationally redundant and ignores the fact that the motion fields of adjacent frames are correlated. In this work, we propose an efficient Temporal Motion Propagation (TMP) method, which leverages the continuity of the motion field to achieve fast pixel-level alignment among consecutive frames. Specifically, we first propagate the offsets from previous frames to the current frame and then refine them in a local neighborhood, which significantly reduces the matching space and speeds up offset estimation. Furthermore, to enhance the robustness of alignment, we apply spatial weighting to the warped features, assigning higher importance to positions with more precise offsets. Experiments on benchmark datasets demonstrate that the proposed TMP method achieves leading online-VSR accuracy as well as inference speed. The source code of TMP can be found at https://github.com/xtudbxk/TMP
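
    The propagate-then-refine step can be pictured with a small sketch. The code below is a hedged illustration under my own assumptions, not the released TMP code: offsets from the previous step are reused to warp the previous features, and a tiny convolution predicts a residual correction; the spatial weighting of warped features described in the abstract is omitted, and the `warp` and `refine` components are hypothetical names.

```python
# Hedged sketch of propagating offsets to the next frame and refining them
# (illustrative only; not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

def warp(feat: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    """Backward-warp `feat` with per-pixel (dx, dy) offsets in pixel units."""
    b, _, h, w = feat.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=-1).float().to(feat.device)   # (H, W, 2)
    grid = base.unsqueeze(0) + offsets.permute(0, 2, 3, 1)         # add offsets
    norm_x = 2.0 * grid[..., 0] / (w - 1) - 1.0                    # normalize to [-1, 1]
    norm_y = 2.0 * grid[..., 1] / (h - 1) - 1.0
    return F.grid_sample(feat, torch.stack((norm_x, norm_y), dim=-1), align_corners=True)

class PropagateRefineSketch(nn.Module):
    def __init__(self, channels: int = 32):
        super().__init__()
        # Predicts a small residual offset from the current and warped features.
        self.refine = nn.Conv2d(2 * channels, 2, 3, padding=1)

    def forward(self, feat_cur, feat_prev, offsets_prev):
        warped = warp(feat_prev, offsets_prev)                 # reuse previous motion
        residual = self.refine(torch.cat([feat_cur, warped], dim=1))
        return offsets_prev + residual                         # refined offsets

if __name__ == "__main__":
    m = PropagateRefineSketch()
    f_t, f_tm1 = torch.rand(1, 32, 16, 16), torch.rand(1, 32, 16, 16)
    off = torch.zeros(1, 2, 16, 16)
    print(m(f_t, f_tm1, off).shape)  # torch.Size([1, 2, 16, 16])
```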

    Detection of dish manufacturing defects using a deep learning-based approach

    Get PDF
    Quality control is essential to ensure the smooth running of an industrial process. This work proposes to use and adapt a deep learning-based algorithm as part of an automatic quality control system at a porcelain dish factory. The system will receive images acquired in real time by high-resolution cameras placed directly over the production line, and the algorithm will classify the dishes in the images as "defective" or "without defect". The objective of the system is therefore the detection of defective dishes, so that fewer defective dishes reach the market, contributing to a better reputation for the factory. The system is based on a convolutional neural network, which requires a large amount of data to be trained to perform image classification. Because the COVID-19 pandemic was at its height in Portugal during the development of this work, it was impossible to obtain data directly from the factory, so the data used in this work were generated artificially. When given complete images of dishes, the algorithm achieved a defect detection accuracy of 92.7% with the first dataset and 91.9% with the second. When given 100x100 pixel segments of the original images from the second dataset, it reached 91.6% accuracy in classifying the segments, which translated into 52.0% accuracy in classifying the complete dish images.
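
    As a rough illustration of the segment-based pipeline described above, the sketch below (not the thesis code) pairs a small CNN that labels 100x100 segments with one possible rule for turning segment predictions into a whole-dish decision; the network architecture and the "any defective segment flags the dish" aggregation rule are assumptions on my part.

```python
# Hedged sketch: binary CNN classifier for 100x100 dish segments plus an
# assumed aggregation rule for the whole dish.
import torch
import torch.nn as nn

class SegmentClassifierSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 100 -> 50
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 50 -> 25
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, 1)  # single logit: probability of a defect

    def forward(self, segments: torch.Tensor) -> torch.Tensor:
        x = self.features(segments).flatten(1)
        return torch.sigmoid(self.head(x)).squeeze(1)

def dish_is_defective(segment_probs: torch.Tensor, threshold: float = 0.5) -> bool:
    # Assumed aggregation: flag the dish if any segment looks defective.
    return bool((segment_probs > threshold).any())

if __name__ == "__main__":
    model = SegmentClassifierSketch()
    segments = torch.rand(8, 3, 100, 100)        # 8 segments from one dish image
    print(dish_is_defective(model(segments)))
```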

    Interpretability and Generalization of Deep Low-Level Vision Models

    Get PDF
    Low-level vision is an important class of tasks in computer vision, encompassing various image restoration problems such as image super-resolution, image denoising, and image deraining. In recent years, deep learning has become the de facto method for solving low-level vision problems, owing to its excellent performance and ease of use. By training on large amounts of paired data, deep low-level vision models are expected to learn rich semantic knowledge and process images intelligently in real-world applications. However, because our understanding of deep learning models and low-level vision tasks is not deep enough, we cannot explain the successes and failures of these models. Deep learning models are widely regarded as "black boxes" because of their complexity and non-linearity: we cannot know what information a model used when processing an input, or whether it learned what we intended. When something goes wrong, such as the generalization failure of a low-level vision model, we cannot identify the underlying source of the problem. This research proposes interpretability analysis of deep low-level vision models to gain deeper insight into deep learning models for low-level vision tasks. I aim to elucidate the mechanisms of the deep learning approach and to discern insights regarding the successes and shortcomings of these methods. This is the first study to perform interpretability analysis on deep low-level vision models.
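
    For a concrete, if simplistic, flavor of what interpretability probing of a restoration network can look like, the sketch below (my own illustration, not the thesis method) computes the gradient of a single super-resolved output pixel with respect to the low-resolution input, indicating which input pixels influenced it; the stand-in network and pixel location are arbitrary assumptions.

```python
# Hedged illustration of a simple gradient-based attribution probe for an
# SR model (not the thesis method).
import torch
import torch.nn as nn

sr_model = nn.Sequential(                     # stand-in SR network (x2 upscale)
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 12, 3, padding=1), nn.PixelShuffle(2),
)

lr = torch.rand(1, 3, 32, 32, requires_grad=True)
sr = sr_model(lr)                             # (1, 3, 64, 64)
target_pixel = sr[0, 0, 32, 32]               # one output location of interest
target_pixel.backward()
saliency = lr.grad.abs().sum(dim=1)[0]        # (32, 32) input attribution map
print(saliency.shape, saliency.max().item())
```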