License Plate Super-Resolution Using Diffusion Models
In surveillance, accurately recognizing license plates is hindered by their
often low quality and small dimensions, compromising recognition precision.
Despite advancements in AI-based image super-resolution, methods like
Convolutional Neural Networks (CNNs) and Generative Adversarial Networks (GANs)
still fall short in enhancing license plate images. This study leverages the
cutting-edge diffusion model, which has consistently outperformed other deep
learning techniques in image restoration. By training this model using a
curated dataset of Saudi license plates, both in low and high resolutions, we
discovered the diffusion model's superior efficacy. The method achieves a
12.55% and 37.32% improvement in Peak Signal-to-Noise Ratio (PSNR) over SwinIR
and ESRGAN, respectively. Moreover, our method surpasses these techniques in
terms of Structural Similarity Index (SSIM), registering a 4.89% and 17.66%
improvement over SwinIR and ESRGAN, respectively. Furthermore, 92% of human
evaluators preferred our images over those from other algorithms. In essence,
this research presents a pioneering solution for license plate
super-resolution, with tangible potential for surveillance systems.
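The abstract above reports relative PSNR gains over baseline methods. As a point of reference (not code from the paper), a minimal NumPy sketch of the PSNR metric used for such comparisons:

```python
import numpy as np

def psnr(reference, reconstruction, max_value=255.0):
    """Peak Signal-to-Noise Ratio between two images (higher is better)."""
    diff = reference.astype(np.float64) - reconstruction.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy example: a reference patch vs. a slightly perturbed reconstruction.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
noisy = np.clip(ref.astype(np.int16) + rng.integers(-5, 6, size=ref.shape),
                0, 255).astype(np.uint8)
print(round(psnr(ref, noisy), 2))
```

A relative improvement such as the 12.55% reported above would then be computed as `(psnr_ours - psnr_baseline) / psnr_baseline * 100` over the test set.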
Super-Resolution of License Plate Images Using Attention Modules and Sub-Pixel Convolution Layers
Recent years have seen significant developments in the field of License Plate
Recognition (LPR) through the integration of deep learning techniques and the
increasing availability of training data. Nevertheless, reconstructing license
plates (LPs) from low-resolution (LR) surveillance footage remains challenging.
To address this issue, we introduce a Single-Image Super-Resolution (SISR)
approach that integrates attention and transformer modules to enhance the
detection of structural and textural features in LR images. Our approach
incorporates sub-pixel convolution layers (also known as PixelShuffle) and a
loss function that uses an Optical Character Recognition (OCR) model for
feature extraction. We trained the proposed architecture on synthetic images
created by applying heavy Gaussian noise to high-resolution LP images from two
public datasets, followed by bicubic downsampling. As a result, the generated
images have a Structural Similarity Index Measure (SSIM) of less than 0.10. Our
results show that our approach for reconstructing these low-resolution
synthesized images outperforms existing ones in both quantitative and
qualitative measures. Our code is publicly available at
https://github.com/valfride/lpr-rsr-ext
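The sub-pixel convolution layer (PixelShuffle) mentioned in this abstract upscales by rearranging channels into spatial positions rather than by interpolation. A minimal NumPy sketch of that rearrangement (the real layer pairs it with a preceding convolution; see the authors' repository for their actual implementation):

```python
import numpy as np

def pixel_shuffle(x, r):
    """PixelShuffle rearrangement: (C*r^2, H, W) -> (C, H*r, W*r)."""
    c_r2, h, w = x.shape
    assert c_r2 % (r * r) == 0, "channel count must be divisible by r^2"
    c = c_r2 // (r * r)
    # Split channels into (C, r, r), then interleave into the spatial dims.
    x = x.reshape(c, r, r, h, w)
    x = x.transpose(0, 3, 1, 4, 2)   # (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

x = np.arange(16).reshape(4, 2, 2)   # 4 = 1 output channel * 2^2
print(pixel_shuffle(x, 2).shape)     # (1, 4, 4)
```

Each 2x2 output block draws one pixel from each of the four input channels, which is why the preceding convolution can learn the upsampling filters end to end.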
Leveraging Model Fusion for Improved License Plate Recognition
License Plate Recognition (LPR) plays a critical role in various
applications, such as toll collection, parking management, and traffic law
enforcement. Although LPR has witnessed significant advancements through the
development of deep learning, there has been a noticeable lack of studies
exploring the potential improvements in results by fusing the outputs from
multiple recognition models. This research aims to fill this gap by
investigating the combination of up to 12 different models using
straightforward approaches, such as selecting the most confident prediction or
employing majority vote-based strategies. Our experiments encompass a wide
range of datasets, revealing substantial benefits of fusion approaches in both
intra- and cross-dataset setups. Essentially, fusing multiple models reduces
considerably the likelihood of obtaining subpar performance on a particular
dataset/scenario. We also found that combining models based on their speed is
an appealing approach. Specifically, for applications where the recognition
task can tolerate some additional time, though not excessively, an effective
strategy is to combine 4-6 models. These models may not be the most accurate
individually, but their fusion strikes an optimal balance between accuracy and
speed.
Comment: Accepted for presentation at the Iberoamerican Congress on Pattern Recognition (CIARP) 202
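The two fusion strategies named in this abstract (most-confident prediction and majority vote) can be sketched in a few lines. This is an illustrative reconstruction, not the authors' code; the tie-breaking choice is an assumption:

```python
from collections import Counter

def most_confident(preds):
    """preds: list of (plate_string, confidence) pairs from different models."""
    return max(preds, key=lambda p: p[1])[0]

def majority_vote(preds):
    """Return the plate string predicted by most models; on a tie,
    fall back to the most confident prediction (an assumed tie-break)."""
    counts = Counter(plate for plate, _ in preds)
    top, n = counts.most_common(1)[0]
    return top if n > 1 else most_confident(preds)

preds = [("ABC1234", 0.91), ("ABC1234", 0.85), ("A8C1234", 0.97)]
print(most_confident(preds))  # 'A8C1234'  (single confident model wins)
print(majority_vote(preds))   # 'ABC1234'  (consensus overrides it)
```

The example shows why fusion reduces the risk of subpar results: a single overconfident model misreading B as 8 is outvoted by the consensus.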
Video and Image Super-Resolution via Deep Learning with Attention Mechanism
Image demosaicing, image super-resolution and video super-resolution are three important tasks in the color imaging pipeline. Demosaicing deals with the recovery of missing color information and the generation of full-resolution color images from a so-called Color Filter Array (CFA) such as the Bayer pattern. Image super-resolution aims at increasing the spatial resolution and enhancing important structures (e.g., edges and textures) in super-resolved images. Both spatial and temporal dependency are important to the task of video super-resolution, which has received increasingly more attention in recent years. Traditional solutions to these three low-level vision tasks lack generalization capability, especially for real-world data. Recently, deep learning methods have achieved great success in vision problems including image demosaicing and image/video super-resolution. Conceptually similar to adaptation in model-based approaches, attention has seen increasing usage in deep learning recently. As a tool to reallocate limited computational resources based on the importance of informative components, the attention mechanism, which includes channel attention, spatial attention, non-local attention, etc., has found successful applications in both high-level and low-level vision tasks. However, to the best of our knowledge, 1) most approaches have studied super-resolution and demosaicing independently; little is known about the potential benefit of formulating a joint demosaicing and super-resolution (JDSR) problem; 2) the attention mechanism has not been studied for the spectral channels of color images in the open literature; 3) current approaches to video super-resolution implement deformable-convolution-based frame alignment methods and a naive spatial attention mechanism. How to exploit the attention mechanism in the spectral and temporal domains sets the stage for the research in this dissertation.
In this dissertation, we conduct a systematic study of these issues and make the following contributions: 1) we propose a spatial color attention network (SCAN) designed to jointly exploit the spatial and spectral dependency within color images for the single image super-resolution (SISR) problem. We present a spatial color attention module that calibrates important color information for individual color components from the output feature maps of residual groups. Experimental results have shown that SCAN achieves superior performance in terms of both subjective and objective quality on the NTIRE2019 dataset; 2) we propose two competing end-to-end joint optimization solutions to the JDSR problem: a Densely-Connected Squeeze-and-Excitation Residual Network (DSERN) vs. a Residual-Dense Squeeze-and-Excitation Network (RDSEN). Experimental results have shown that the enhanced design, RDSEN, significantly improves both subjective and objective performance over DSERN; 3) we propose a novel deep learning based framework, the Deformable Kernel Spatial Attention Network (DKSAN), to super-resolve videos with a scale factor as large as 16 (the extreme SR situation). Thanks to the newly designed Deformable Kernel Convolution Alignment (DKC Align) and Deformable Kernel Spatial Attention (DKSA) modules, DKSAN achieves better subjective and objective results than the existing state-of-the-art approach, the enhanced deformable convolutional network (EDVR).
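The channel (squeeze-and-excitation) attention that underlies several of the networks above can be illustrated in a few lines of NumPy. This is a generic SE-style sketch with made-up weight shapes, not the dissertation's architecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(features, w1, w2):
    """SE-style channel attention on a (C, H, W) feature map:
    squeeze (global average pool) -> excitation (two FC layers) -> rescale."""
    squeezed = features.mean(axis=(1, 2))    # (C,) channel descriptors
    hidden = np.maximum(w1 @ squeezed, 0.0)  # ReLU bottleneck, (C // r,)
    scale = sigmoid(w2 @ hidden)             # per-channel weights in (0, 1)
    return features * scale[:, None, None]   # reweight each channel

rng = np.random.default_rng(0)
feats = rng.standard_normal((8, 4, 4))
w1 = rng.standard_normal((2, 8))   # reduction ratio r = 4 (illustrative)
w2 = rng.standard_normal((8, 2))
out = channel_attention(feats, w1, w2)
print(out.shape)  # (8, 4, 4)
```

Because the per-channel weights lie in (0, 1), the module can only attenuate uninformative channels, which is the "reallocation of limited computational resources" described in the abstract.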
Do We Train on Test Data? The Impact of Near-Duplicates on License Plate Recognition
This work draws attention to the large fraction of near-duplicates in the
training and test sets of datasets widely adopted in License Plate Recognition
(LPR) research. These duplicates refer to images that, although different, show
the same license plate. Our experiments, conducted on the two most popular
datasets in the field, show a substantial decrease in recognition rate when six
well-known models are trained and tested under fair splits, that is, in the
absence of duplicates in the training and test sets. Moreover, in one of the
datasets, the ranking of models changed considerably when they were trained and
tested under duplicate-free splits. These findings suggest that such duplicates
have significantly biased the evaluation and development of deep learning-based
models for LPR. The list of near-duplicates we have found and proposals for
fair splits are publicly available for further research at
https://raysonlaroca.github.io/supp/lpr-train-on-test/
Comment: Accepted for presentation at the International Joint Conference on Neural Networks (IJCNN) 202
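The "fair split" idea above amounts to splitting by plate rather than by image, so that two images of the same plate can never land on opposite sides of the train/test boundary. A minimal sketch of that grouping (the function name and the deterministic ordering are illustrative choices, not the authors' protocol):

```python
from collections import defaultdict

def fair_split(samples, test_fraction=0.2):
    """Split (image_path, plate_text) pairs so that near-duplicates
    (different images of the same plate) stay on one side of the split."""
    groups = defaultdict(list)
    for image_path, plate_text in samples:
        groups[plate_text].append(image_path)
    plates = sorted(groups)  # deterministic here; shuffle in practice
    n_test = max(1, int(len(plates) * test_fraction))
    test_plates = set(plates[:n_test])
    train = [p for pl in plates if pl not in test_plates for p in groups[pl]]
    test = [p for pl in test_plates for p in groups[pl]]
    return train, test

samples = [("img0.jpg", "ABC1234"), ("img1.jpg", "ABC1234"),
           ("img2.jpg", "XYZ9876"), ("img3.jpg", "QWE5555"),
           ("img4.jpg", "XYZ9876")]
train, test = fair_split(samples)
print(sorted(test))  # both ABC1234 images end up in the test set together
```

A naive random split over images would, with high probability, put one ABC1234 image in training and the other in testing, inflating the recognition rate exactly as the paper describes.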
Does Deep Learning-Based Super-Resolution Help Humans With Face Recognition?
The last decade witnessed a renaissance of machine learning for image processing.
Super-resolution (SR) is one of the areas where deep learning techniques have achieved
impressive results, with a specific focus on the SR of facial images. Examining and
comparing facial images is one of the critical activities in forensic video analysis; a
compelling question is thus whether recent SR techniques could help face recognition
(FR) made by a human operator, especially in the challenging scenario where very low
resolution images are available, which is typical of surveillance recordings. This paper
addresses such a question through a simple yet insightful experiment: we used two state-
of-the-art deep learning-based SR algorithms to enhance some very low-resolution faces
of 30 worldwide celebrities. We then asked a heterogeneous group of more than 130
individuals to recognize them and compared the recognition accuracy against the one
achieved by presenting a simple bicubic-interpolated version of the same faces. Results
are somewhat surprising: despite an undisputed general superiority of SR-enhanced
images in terms of visual appearance, SR techniques brought no considerable
advantage in overall recognition accuracy.