License Plate Super-Resolution Using Diffusion Models
In surveillance, accurately recognizing license plates is hindered by their
often low quality and small dimensions, compromising recognition precision.
Despite advancements in AI-based image super-resolution, methods like
Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs)
still fall short in enhancing license plate images. This study leverages the
cutting-edge diffusion model, which has consistently outperformed other deep
learning techniques in image restoration. By training this model using a
curated dataset of Saudi license plates, both in low and high resolutions, we
discovered the diffusion model's superior efficacy. The method achieves a
12.55% and 37.32% improvement in Peak Signal-to-Noise Ratio (PSNR) over SwinIR
and ESRGAN, respectively. Moreover, our method surpasses these techniques in
terms of Structural Similarity Index (SSIM), registering a 4.89% and 17.66%
improvement over SwinIR and ESRGAN, respectively. Furthermore, 92% of human
evaluators preferred our images over those from other algorithms. In essence,
this research presents a pioneering solution for license plate
super-resolution, with tangible potential for surveillance systems.
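The abstract above reports relative PSNR gains over baseline methods. As a point of reference (not code from the paper), a minimal NumPy sketch of the PSNR metric used for such comparisons:

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak Signal-to-Noise Ratio between two images (higher is better)."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy example: a reference patch vs. a slightly perturbed reconstruction.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int16) + rng.integers(-5, 6, size=ref.shape),
                0, 255).astype(np.uint8)
print(round(psnr(ref, noisy), 2))
```

A relative improvement such as the 12.55% reported above would then be computed as `(psnr_ours - psnr_baseline) / psnr_baseline * 100` over the test set.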
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers
Recent years have seen significant developments in the field of License Plate
Recognition (LPR) through the integration of deep learning techniques and the
increasing availability of training data. Nevertheless, reconstructing license
plates (LPs) from low-resolution (LR) surveillance footage remains challenging.
To address this issue, we introduce a Single-Image Super-Resolution (SISR)
approach that integrates attention and transformer modules to enhance the
detection of structural and textural features in LR images. Our approach
incorporates sub-pixel convolution layers (also known as PixelShuffle) and a
loss function that uses an Optical Character Recognition (OCR) model for
feature extraction. We trained the proposed architecture on synthetic images
created by applying heavy Gaussian noise to high-resolution LP images from two
public datasets, followed by bicubic downsampling. As a result, the generated
images have a Structural Similarity Index Measure (SSIM) of less than 0.10. Our
results show that our approach for reconstructing these low-resolution
synthesized images outperforms existing ones in both quantitative and
qualitative measures. Our code is publicly available at
https://github.com/valfride/lpr-rsr-ext
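The sub-pixel convolution layer (PixelShuffle) mentioned in this abstract upscales by rearranging channels into spatial positions rather than by interpolation. A minimal NumPy sketch of that rearrangement (the real layer pairs it with a preceding convolution; see the authors' repository for their actual implementation):

```python
import numpy as np

def pixel_shuffle(x, r):
    """PixelShuffle rearrangement: (C*r^2, H, W) -> (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # Split channels into (C, r, r), then interleave into the spatial dims.
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16).reshape(4, 2, 2)   # 4 = 1 output channel * 2^2
print(pixel_shuffle(x, 2).shape)     # (1, 4, 4)
```

Each 2x2 output block draws one pixel from each of the four input channels, which is why the preceding convolution can learn the upsampling filters end to end.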
Leveraging Model Fusion for Improved License Plate Recognition
License Plate Recognition (LPR) plays a critical role in various
applications, such as toll collection, parking management, and traffic law
enforcement. Although LPR has witnessed significant advancements through the
development of deep learning, there has been a noticeable lack of studies
exploring the potential improvements in results by fusing the outputs from
multiple recognition models. This research aims to fill this gap by
investigating the combination of up to 12 different models using
straightforward approaches, such as selecting the most confident prediction or
employing majority vote-based strategies. Our experiments encompass a wide
range of datasets, revealing substantial benefits of fusion approaches in both
intra- and cross-dataset setups. Essentially, fusing multiple models reduces
considerably the likelihood of obtaining subpar performance on a particular
dataset/scenario. We also found that combining models based on their speed is
an appealing approach. Specifically, for applications where the recognition
task can tolerate some additional time, though not excessively, an effective
strategy is to combine 4-6 models. These models may not be the most accurate
individually, but their fusion strikes an optimal balance between accuracy and
speed.
Comment: Accepted for presentation at the Iberoamerican Congress on Pattern Recognition (CIARP) 202
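The two fusion strategies named in this abstract (most-confident prediction and majority vote) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the tie-breaking choice is an assumption:

```python
from collections import Counter

def most_confident(preds):
    """preds: list of (plate_string, confidence) pairs from different models."""
    return max(preds, key=lambda p: p[1])[0]

def majority_vote(preds):
    """Return the plate string predicted by most models; on a tie,
    fall back to the most confident prediction (an assumed tie-break)."""
    counts = Counter(plate for plate, _ in preds)
    top, n = counts.most_common(1)[0]
    return top if n > 1 else most_confident(preds)

preds = [("ABC1234", 0.91), ("ABC1234", 0.85), ("A8C1234", 0.97)]
print(most_confident(preds))  # 'A8C1234'  (single confident model wins)
print(majority_vote(preds))   # 'ABC1234'  (consensus overrides it)
```

The example shows why fusion reduces the risk of subpar results: a single overconfident model misreading B as 8 is outvoted by the consensus.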
Video and Image Super-Resolution via Deep Learning with Attention Mechanism
Image demosaicing, image super-resolution and video super-resolution are three important tasks in the color imaging pipeline. Demosaicing deals with the recovery of missing color information and the generation of full-resolution color images from a so-called Color Filter Array (CFA) such as the Bayer pattern. Image super-resolution aims at increasing the spatial resolution and enhancing important structures (e.g., edges and textures) in super-resolved images. Both spatial and temporal dependency are important to the task of video super-resolution, which has received increasingly more attention in recent years. Traditional solutions to these three low-level vision tasks lack generalization capability, especially for real-world data. Recently, deep learning methods have achieved great success in vision problems including image demosaicing and image/video super-resolution. Conceptually similar to adaptation in model-based approaches, attention has seen increasing usage in deep learning recently. As a tool to reallocate limited computational resources based on the importance of informative components, the attention mechanism, which includes channel attention, spatial attention, non-local attention, etc., has found successful applications in both high-level and low-level vision tasks. However, to the best of our knowledge, 1) most approaches have studied super-resolution and demosaicing independently; little is known about the potential benefit of formulating a joint demosaicing and super-resolution (JDSR) problem; 2) the attention mechanism has not been studied for the spectral channels of color images in the open literature; 3) current approaches to video super-resolution implement deformable-convolution-based frame alignment methods and a naive spatial attention mechanism. How to exploit the attention mechanism in the spectral and temporal domains sets the stage for the research in this dissertation.
In this dissertation, we conduct a systematic study of these issues and make the following contributions: 1) we propose a spatial color attention network (SCAN) designed to jointly exploit the spatial and spectral dependency within color images for the single image super-resolution (SISR) problem. We present a spatial color attention module that calibrates important color information for individual color components from the output feature maps of residual groups. Experimental results have shown that SCAN achieves superior performance in terms of both subjective and objective quality on the NTIRE2019 dataset; 2) we propose two competing end-to-end joint optimization solutions to the JDSR problem: a Densely-Connected Squeeze-and-Excitation Residual Network (DSERN) vs. a Residual-Dense Squeeze-and-Excitation Network (RDSEN). Experimental results have shown that the enhanced design, RDSEN, significantly improves both subjective and objective performance over DSERN; 3) we propose a novel deep learning based framework, the Deformable Kernel Spatial Attention Network (DKSAN), to super-resolve videos with a scale factor as large as 16 (the extreme SR situation). Thanks to the newly designed Deformable Kernel Convolution Alignment (DKC Align) and Deformable Kernel Spatial Attention (DKSA) modules, DKSAN achieves better subjective and objective results than the existing state-of-the-art approach, the enhanced deformable convolutional network (EDVR).
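The channel (squeeze-and-excitation) attention that underlies several of the networks above can be illustrated in a few lines of NumPy. This is a generic SE-style sketch with made-up weight shapes, not the dissertation's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(features, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map:
    squeeze (global average pool) -> excitation (two FC layers) -> rescale."""
    squeezed = features.mean(axis=(1, 2))    # (C,) channel descriptors
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck, (C // r,)
    scale = sigmoid(w2 @ hidden)             # per-channel weights in (0, 1)
    return features * scale[:, None, None]   # reweight each channel

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduction ratio r = 4 (illustrative)
w2 = rng.standard_normal((8, 2))
out = channel_attention(feats, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the per-channel weights lie in (0, 1), the module can only attenuate uninformative channels, which is the "reallocation of limited computational resources" described in the abstract.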
Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition
This work draws attention to the large fraction of near-duplicates in the
training and test sets of datasets widely adopted in License Plate Recognition
(LPR) research. These duplicates refer to images that, although different, show
the same license plate. Our experiments, conducted on the two most popular
datasets in the field, show a substantial decrease in recognition rate when six
well-known models are trained and tested under fair splits, that is, in the
absence of duplicates in the training and test sets. Moreover, in one of the
datasets, the ranking of models changed considerably when they were trained and
tested under duplicate-free splits. These findings suggest that such duplicates
have significantly biased the evaluation and development of deep learning-based
models for LPR. The list of near-duplicates we have found and proposals for
fair splits are publicly available for further research at
https://raysonlaroca.github.io/supp/lpr-train-on-test/
Comment: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 202
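The "fair split" idea above amounts to splitting by plate rather than by image, so that two images of the same plate can never land on opposite sides of the train/test boundary. A minimal sketch of that grouping (the function name and the deterministic ordering are illustrative choices, not the authors' protocol):

```python
from collections import defaultdict

def fair_split(samples, test_fraction=0.2):
    """Split (image_path, plate_text) pairs so that near-duplicates
    (different images of the same plate) stay on one side of the split."""
    groups = defaultdict(list)
    for image_path, plate_text in samples:
        groups[plate_text].append(image_path)
    plates = sorted(groups)  # deterministic here; shuffle in practice
    n_test = max(1, int(len(plates) * test_fraction))
    test_plates = set(plates[:n_test])
    train = [p for pl in plates if pl not in test_plates for p in groups[pl]]
    test = [p for pl in test_plates for p in groups[pl]]
    return train, test

samples = [("img0.jpg", "ABC1234"), ("img1.jpg", "ABC1234"),
           ("img2.jpg", "XYZ9876"), ("img3.jpg", "QWE5555"),
           ("img4.jpg", "XYZ9876")]
train, test = fair_split(samples)
print(sorted(test))  # both ABC1234 images end up in the test set together
```

A naive random split over images would, with high probability, put one ABC1234 image in training and the other in testing, inflating the recognition rate exactly as the paper describes.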
Does Deep Learning-Based Super-Resolution Help Humans With Face Recognition?
The last decade witnessed a renaissance of machine learning for image processing.
Super-resolution (SR) is one of the areas where deep learning techniques have achieved
impressive results, with a specific focus on the SR of facial images. Examining and
comparing facial images is one of the critical activities in forensic video analysis; a
compelling question is thus whether recent SR techniques could help face recognition
(FR) made by a human operator, especially in the challenging scenario where very low
resolution images are available, which is typical of surveillance recordings. This paper
addresses such a question through a simple yet insightful experiment: we used two state-
of-the-art deep learning-based SR algorithms to enhance some very low-resolution faces
of 30 worldwide celebrities. We then asked a heterogeneous group of more than 130
individuals to recognize them and compared the recognition accuracy against the one
achieved by presenting a simple bicubic-interpolated version of the same faces. Results
are somewhat surprising: despite an undisputed general superiority of SR-enhanced
images in terms of visual appearance, SR techniques brought no considerable
advantage in overall recognition accuracy.