RCRN: Real-world Character Image Restoration Network via Skeleton Extraction
Constructing high-quality character image datasets is challenging because
real-world images are often affected by image degradation. There are
limitations when applying current image restoration methods to such real-world
character images, since (i) the categories of noise in character images are
different from those in general images; (ii) real-world character images
usually contain more complex image degradation, e.g., mixed noise at different
noise levels. To address these problems, we propose a real-world character
restoration network (RCRN) to effectively restore degraded character images,
where character skeleton information and scale-ensemble feature extraction are
utilized to obtain better restoration performance. The proposed method consists
of a skeleton extractor (SENet) and a character image restorer (CiRNet). SENet
aims to preserve the structural consistency of the character and normalize
complex noise. Then, CiRNet reconstructs clean images from degraded character
images and their skeletons. Due to the lack of benchmarks for real-world
character image restoration, we constructed a dataset containing 1,606
character images with real-world degradation to evaluate the validity of the
proposed method. The experimental results demonstrate that RCRN outperforms
state-of-the-art methods quantitatively and qualitatively.
Comment: Accepted to ACM MM 202
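The abstract does not describe how SENet computes skeletons, so as background on what character skeleton extraction produces, here is a minimal sketch of Lantuéjoul's classical morphological skeleton in NumPy. The 3x3 structuring element and all function names are illustrative assumptions, not the paper's architecture:

```python
import numpy as np

def erode(img):
    """Binary erosion with a 3x3 square structuring element (zero-padded)."""
    p = np.pad(img, 1, mode="constant")
    out = np.ones_like(img, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out &= p[1 + dy : 1 + dy + img.shape[0], 1 + dx : 1 + dx + img.shape[1]]
    return out

def dilate(img):
    """Binary dilation with the same 3x3 element."""
    p = np.pad(img, 1, mode="constant")
    out = np.zeros_like(img, dtype=bool)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out |= p[1 + dy : 1 + dy + img.shape[0], 1 + dx : 1 + dx + img.shape[1]]
    return out

def morphological_skeleton(img):
    """Lantuejoul's skeleton: union of (eroded - opened) over erosion levels."""
    skel = np.zeros_like(img, dtype=bool)
    eroded = img.astype(bool)
    while eroded.any():
        opened = dilate(erode(eroded))   # morphological opening
        skel |= eroded & ~opened         # points removed by opening
        eroded = erode(eroded)
    return skel
```

For a filled 5x5 square glyph, this reduces the mask to its single center pixel, which is the kind of structure-preserving thinning that skeleton-based restoration builds on.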
DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF
When capturing colored document images, e.g., posters and magazines, multiple
degradations such as shadows and wrinkles are often introduced simultaneously
by external factors. Restoring multi-degraded colored document images is a
great yet overlooked challenge, as most existing algorithms focus on enhancing
color-ignored document images via binarization.
Thus, we propose DocStormer, a novel algorithm designed to restore
multi-degraded colored documents to their potential pristine PDF. The
contributions are: firstly, we propose a "Perceive-then-Restore" paradigm with
a reinforced transformer block, which more effectively encodes and utilizes the
distribution of degradations. Secondly, we are the first to utilize GAN and
pristine PDF magazine images to narrow the distribution gap between the
enhanced results and PDF images, in pursuit of less degradation and better
visual quality. Thirdly, we propose a non-parametric strategy, PFILI, which
enables a smaller training scale and larger testing resolutions with acceptable
detail trade-off, while saving memory and inference time. Fourthly, we are the
first to propose a novel Multi-Degraded Colored Document image Enhancing
dataset, named MD-CDE, for both training and evaluation. Experimental results
show that the DocStormer exhibits superior performance, capable of revitalizing
multi-degraded colored documents into their potential pristine digital
versions, which fills the current academic gap from the perspective of method,
data, and task.
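The abstract does not spell out how PFILI trains at a small scale yet tests at larger resolutions. One common non-parametric way to get that trade-off is to restore a downscaled copy of the image and re-inject the high-frequency residual of the full-resolution input; the sketch below illustrates only that generic idea (all names are illustrative, not DocStormer's actual strategy):

```python
import numpy as np

def downscale(img, k):
    """Naive k x k average-pool downscale (assumes divisible shape)."""
    h, w = img.shape
    return img.reshape(h // k, k, w // k, k).mean(axis=(1, 3))

def upscale(img, k):
    """Nearest-neighbour upscale by factor k."""
    return np.repeat(np.repeat(img, k, axis=0), k, axis=1)

def restore_large(img, restore_small, k=2):
    """Run a restorer trained at low resolution on a large image,
    then add back the fine detail the downscale removed."""
    small = downscale(img, k)
    restored = upscale(restore_small(small), k)
    detail = img - upscale(small, k)      # high-frequency residual
    return restored + detail
```

With an identity restorer this reproduces the input exactly, which is the sanity check for the detail-reinjection step.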
CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization
Efficiently extracting textual information from color-degraded document
images is an important research topic. Long-term imperfect preservation of
ancient documents has led to various types of degradation such as page
staining, paper yellowing, and ink bleeding; these degradations badly impact
the image processing for information extraction. In this paper, we present
CCDWT-GAN, a generative adversarial network (GAN) that utilizes the discrete
wavelet transform (DWT) on RGB (red, green, blue) channel-split images. The
proposed method comprises three stages: image preprocessing, image enhancement,
and image binarization. This work conducts comparative experiments in the image
preprocessing stage to determine the optimal selection of DWT with
normalization. Additionally, we perform an ablation study on the results of the
image enhancement stage and the image binarization stage to validate their
positive effect on the model performance. This work compares the performance of
the proposed method with other state-of-the-art (SOTA) methods on DIBCO and
H-DIBCO ((Handwritten) Document Image Binarization Competition) datasets. The
experimental results demonstrate that CCDWT-GAN achieves top-two performance
on multiple benchmark datasets and outperforms other SOTA methods.
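The preprocessing stage combines RGB channel splitting with a DWT. As a minimal sketch of that combination, here is a one-level 2D Haar transform applied per channel (the Haar wavelet and keeping only the LL band are assumptions; the paper compares several DWT and normalization choices):

```python
import numpy as np

def haar_dwt2(ch):
    """One-level 2D Haar DWT of a single channel (even-sized array)."""
    lo = (ch[:, 0::2] + ch[:, 1::2]) / 2.0   # row averages
    hi = (ch[:, 0::2] - ch[:, 1::2]) / 2.0   # row details
    ll = (lo[0::2] + lo[1::2]) / 2.0         # approximation band
    lh = (lo[0::2] - lo[1::2]) / 2.0
    hl = (hi[0::2] + hi[1::2]) / 2.0
    hh = (hi[0::2] - hi[1::2]) / 2.0
    return ll, lh, hl, hh

def per_channel_ll(rgb):
    """Split an HxWx3 image into R, G, B and keep each channel's LL band."""
    return np.stack([haar_dwt2(rgb[..., c])[0] for c in range(3)], axis=-1)
```

Each level halves the spatial resolution, so the LL band of an HxW image is H/2 x W/2 per channel.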
CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising
Degraded images are common in general sources of character images,
leading to unsatisfactory character recognition results. Existing methods have
dedicated efforts to restoring degraded character images. However, the
denoising results obtained by these methods do not appear to improve character
recognition performance. This is mainly because current methods only focus on
pixel-level information and ignore critical features of a character, such as
its glyph, resulting in character-glyph damage during the denoising process. In
this paper, we introduce a novel generic framework based on glyph fusion and
attention mechanisms, i.e., CharFormer, for precisely recovering character
images without changing their inherent glyphs. Unlike existing frameworks,
CharFormer introduces a parallel target task for capturing additional
information and injecting it into the image denoising backbone, which will
maintain the consistency of character glyphs during character image denoising.
Moreover, we utilize attention-based networks for global-local feature
interaction, which will help to deal with blind denoising and enhance denoising
performance. We compare CharFormer with state-of-the-art methods on multiple
datasets. The experimental results show the superiority of CharFormer
quantitatively and qualitatively.
Comment: Accepted by ACM MM 202
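The abstract says the parallel glyph task's features are injected into the denoising backbone but not how. A minimal, hypothetical sketch of one such injection is an attention-style gate on the glyph features before adding them to the backbone features (the gating form and names below are my assumptions, not CharFormer's actual fusion):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def inject_glyph(denoise_feat, glyph_feat, w=1.0):
    """Gate the glyph-branch features and add them to the denoising
    backbone's features: f = f_d + sigma(w * f_g) * f_g."""
    gate = sigmoid(w * glyph_feat)   # attention-style gate in (0, 1)
    return denoise_feat + gate * glyph_feat
```

When the glyph branch contributes nothing (all-zero features), the backbone features pass through unchanged, so the auxiliary task can only refine, not overwrite, the denoising path.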
Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization
In this paper, we propose an end-to-end trainable framework for restoring
historical document content in the correct reading order. In this
framework, two branches named character branch and layout branch are added
behind the feature extraction network. The character branch localizes
individual characters in a document image and recognizes them simultaneously.
Then we adopt a post-processing method to group them into text lines. The
layout branch based on fully convolutional network outputs a binary mask. We
then use Hough transform for line detection on the binary mask and combine
character results with the layout information to restore document content.
These two branches can be trained in parallel and are easy to train.
Furthermore, we propose a re-score mechanism to minimize recognition error.
Experiment results on the extended Chinese historical document MTHv2 dataset
demonstrate the effectiveness of the proposed framework.
Comment: 6 pages, 6 figures
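The layout branch's binary mask is processed with a Hough transform to detect lines. A minimal version of the standard voting scheme, which the sketch below implements from scratch (accumulator resolution and angle range are illustrative), votes each foreground pixel into a (rho, theta) accumulator:

```python
import numpy as np

def hough_lines(mask, n_theta=180):
    """Minimal Hough transform: each foreground pixel votes for every
    (rho, theta) line passing through it; peaks in the accumulator
    correspond to detected lines."""
    h, w = mask.shape
    diag = int(np.ceil(np.hypot(h, w)))
    thetas = np.deg2rad(np.arange(n_theta))         # 0..179 degrees
    rhos = np.arange(-diag, diag + 1)
    acc = np.zeros((len(rhos), n_theta), dtype=np.int32)
    ys, xs = np.nonzero(mask)
    for x, y in zip(xs, ys):
        r = np.round(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[r + diag, np.arange(n_theta)] += 1      # one vote per angle
    return acc, rhos, thetas
```

A horizontal row of foreground pixels at y = 3 produces a peak at theta = 90 degrees and rho = 3, i.e. all pixels on the line vote into the same cell.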
A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement
Document image enhancement is a fundamental and important stage for attaining
the best performance in any document analysis task, because many kinds of
degradation can harm document images and make them harder to recognize and
analyze. In this paper, we propose
\textbf{T2T-BinFormer} which is a novel document binarization encoder-decoder
architecture based on a Tokens-to-token vision transformer. Each image is
divided into a set of fixed-length tokens, and the transformer is then applied
several times to model the global relationships between them. However, the
conventional tokenization of input data does not
adequately reflect the crucial local structure between adjacent pixels of the
input image, which results in low efficiency. Instead of using a simple ViT and
hard splitting of images for the document image enhancement task, we employed a
progressive tokenization technique to capture this local information from an
image to achieve more effective results. Experiments on various DIBCO and
H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing
CNN and ViT-based state-of-the-art methods. In this research, the primary area
of examination is the application of the proposed architecture to the task of
document binarization. The source code will be made available at
https://github.com/RisabBiswas/T2T-BinFormer.
Comment: arXiv admin note: text overlap with arXiv:2312.0356
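The key difference from plain ViT tokenization is the "soft split": tokens come from overlapping patches, so neighbouring tokens share pixels and local structure survives. A minimal sketch of that step (patch size and stride are illustrative, not the paper's settings):

```python
import numpy as np

def soft_split(feat, k=3, stride=2):
    """Tokens-to-token 'soft split': extract overlapping k x k patches
    (unlike ViT's hard, non-overlapping split) and flatten each into a
    token, so adjacent tokens share pixels."""
    h, w, c = feat.shape
    tokens = []
    for y in range(0, h - k + 1, stride):
        for x in range(0, w - k + 1, stride):
            tokens.append(feat[y : y + k, x : x + k].reshape(-1))
    return np.stack(tokens)            # shape: (n_tokens, k*k*c)
```

Because stride < k, each application shortens the token sequence while aggregating local neighbourhoods, which is what "progressive tokenization" refers to.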
CT-Net: Cascade T-shape Deep Fusion Networks for Document Binarization
Document binarization is a key step in most document analysis tasks. However, historical-document images usually suffer from various degradations, making this a very challenging processing stage. The performance of document image binarization has improved dramatically in recent years through the use of Convolutional Neural Networks (CNNs). In this paper, a dual-task, T-shaped neural network is proposed that has the main task of binarization and an auxiliary task of image enhancement. The neural network for enhancement learns the degradations in document images, and the specific CNN-kernel features can be adapted towards the binarization task during training. In addition, the enhanced image can be considered an improved version of the input image, which can be fed back into the network for fine-tuning, making it possible to design a chained-cascade network (CT-Net). Experimental results on document binarization competition datasets (DIBCO datasets) and the MCS dataset show that our proposed method outperforms competing state-of-the-art methods in most cases.
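The chained-cascade idea reduces to a simple loop: the enhancement output of one pass becomes the input of the next, while the binarization head produces the final mask. A hedged sketch, with `net` standing in for the dual-task network (its tuple interface is my assumption):

```python
def cascade(net, img, steps=2):
    """Chained cascade: feed the enhanced output back in as an improved
    input, in the spirit of CT-Net's enhancement branch.

    `net` is any callable returning (enhanced_image, binary_mask)."""
    x = img
    binary = None
    for _ in range(steps):
        enhanced, binary = net(x)   # dual task: enhancement + binarization
        x = enhanced                # enhanced image becomes the next input
    return binary
```

With a toy stand-in `net = lambda x: (x + 1, x * 2)` and `steps=2`, the enhancement chain is applied twice before the final binarization output is taken.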
Artificial Intelligence in the Creative Industries: A Review
This paper reviews the current state of the art in Artificial Intelligence
(AI) technologies and applications in the context of the creative industries. A
brief background of AI, and specifically Machine Learning (ML) algorithms, is
provided including Convolutional Neural Network (CNNs), Generative Adversarial
Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement
Learning (DRL). We categorise creative applications into five groups related to
how AI technologies are used: i) content creation, ii) information analysis,
iii) content enhancement and post production workflows, iv) information
extraction and enhancement, and v) data compression. We critically examine the
successes and limitations of this rapidly advancing technology in each of these
areas. We further differentiate between the use of AI as a creative tool and
its potential as a creator in its own right. We foresee that, in the near
future, machine learning-based AI will be adopted widely as a tool or
collaborative assistant for creativity. In contrast, we observe that the
successes of machine learning in domains with fewer constraints, where AI is
the `creator', remain modest. The potential of AI (or its developers) to win
awards for its original creations in competition with human creatives is also
limited, based on contemporary technologies. We therefore conclude that, in the
context of creative industries, maximum benefit from AI will be derived where
its focus is human centric -- where it is designed to augment, rather than
replace, human creativity.
Detection of counterfeit coins based on 3D Height-Map Image Analysis
Analyzing 3-D height-map images leads to the discovery of a new set of features that cannot be extracted, or even seen, in 2-D images. To the best of our knowledge, there has been no research in the literature analyzing height-map images to detect counterfeit coins or to classify coins. The main goal of this thesis is to propose a new comprehensive method for analyzing 3-D height-map images to detect counterfeits of any type of coin, regardless of its country of origin, language, shape, and quality. Therefore, we applied a precise 3-D scanner to produce coin height-map images, since detecting a counterfeit coin using 2-D image processing is nearly impossible in some cases, especially when the coin is damaged, corroded, or worn out. In this research, we propose several 3-D approaches to model and analyze large datasets. In our first and second methods, we aimed to solve the degradation problem of shiny coin images caused by the scanning process. To solve this problem, first, the characters of the coin images were straightened by a proposed straightening algorithm. The height-map image was then decomposed row-wise into a set of 1-D signals, which were analyzed separately and restored by two different proposed methods. These approaches produced remarkable results.
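The row-wise decomposition step can be sketched directly: treat each row of the height-map as an independent 1-D signal, restore it, and reassemble the image. The moving-average filter below is only a stand-in for the thesis's two restoration methods, which are not detailed here:

```python
import numpy as np

def moving_average(sig, k=3):
    """Stand-in 1-D restorer: centred moving average with edge padding."""
    pad = k // 2
    padded = np.pad(sig, pad, mode="edge")
    kernel = np.ones(k) / k
    return np.convolve(padded, kernel, mode="valid")

def restore_rows(height_map, restore_1d):
    """Decompose a height-map row-wise into 1-D signals, restore each
    independently, and stack the results back into an image."""
    return np.stack([restore_1d(row) for row in height_map])
```

Restoring a constant-height region leaves it unchanged, so the reassembled map has the same shape and baseline as the input.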
We also proposed a 3-D approach to detect and analyze the precipice borders of the coin surface and extract significant features that are unaffected by the degradation problem. To extract the features, we also proposed Binned Borders in Spherical Coordinates (BBSC) to analyze different parts of the precipice borders at different polar and azimuthal angles. We also took advantage of stacked generalization to classify the coins and added a reject option to increase the reliability of the system. The results illustrate that the proposed method outperforms other counterfeit coin detectors.
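The binning behind BBSC can be illustrated with a small sketch: convert 3-D border points to spherical coordinates and count them per (polar, azimuthal) cell. The bin counts and the use of raw counts as features are illustrative assumptions, not the thesis's exact descriptor:

```python
import numpy as np

def spherical_bins(points, n_polar=4, n_azim=8):
    """Bin 3-D border points by polar and azimuthal angle and return the
    (n_polar, n_azim) occupancy histogram."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    polar = np.arccos(np.clip(z / np.maximum(r, 1e-12), -1, 1))  # [0, pi]
    azim = np.arctan2(y, x) + np.pi                              # [0, 2*pi]
    p_idx = np.minimum((polar / np.pi * n_polar).astype(int), n_polar - 1)
    a_idx = np.minimum((azim / (2 * np.pi) * n_azim).astype(int), n_azim - 1)
    hist = np.zeros((n_polar, n_azim))
    np.add.at(hist, (p_idx, a_idx), 1)   # unbuffered accumulation
    return hist
```

Angular binning makes the descriptor insensitive to where on the radius a border point sits, which matches the idea of analyzing border parts by direction rather than position.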
Since traces of deep learning appear in most recent research related to image processing, it is worthwhile to benefit from deep learning approaches in our study. In another proposed method of this thesis, we applied deep learning algorithms in two steps to detect counterfeit coins. As Generative Adversarial Networks are used for generating fake images in image processing applications, we proposed a novel method based on this network to augment our fake-coin class and compensate for the lack of fake coins for training the classifier. We also decomposed the coin height-map image into three types of slopes: Steep, Moderate, and Gentle. The grayscale height-map image is thereby turned into the proposed SMG height-map channels. Then, we proposed a hybrid CNN-based deep neural network to train on and classify these new SMG images. The results illustrated that a deep neural network trained with the proposed SMG images outperforms the system trained on the grayscale images. In this research, the proposed methods were trained and tested with four types of Danish and two types of Chinese coins, with encouraging results.
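The SMG decomposition can be sketched as thresholding the local gradient magnitude of the height-map into three slope channels. The thresholds below are illustrative placeholders, not the thesis's values:

```python
import numpy as np

def smg_channels(height_map, t_steep=1.0, t_gentle=0.2):
    """Decompose a grayscale height-map into Steep / Moderate / Gentle
    slope channels based on local gradient magnitude."""
    gy, gx = np.gradient(height_map.astype(float))
    mag = np.hypot(gx, gy)               # slope steepness at each pixel
    steep = mag >= t_steep
    gentle = mag < t_gentle
    moderate = ~steep & ~gentle
    return np.stack([steep, moderate, gentle], axis=-1).astype(float)
```

The result is a three-channel image the same size as the input, so it can replace the single grayscale channel as CNN input, which is the substitution the thesis evaluates.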