439 research outputs found

    RCRN: Real-world Character Image Restoration Network via Skeleton Extraction

    Full text link
    Constructing high-quality character image datasets is challenging because real-world images are often affected by image degradation. There are limitations when applying current image restoration methods to such real-world character images, since (i) the categories of noise in character images are different from those in general images; (ii) real-world character images usually contain more complex image degradation, e.g., mixed noise at different noise levels. To address these problems, we propose a real-world character restoration network (RCRN) to effectively restore degraded character images, where character skeleton information and scale-ensemble feature extraction are utilized to obtain better restoration performance. The proposed method consists of a skeleton extractor (SENet) and a character image restorer (CiRNet). SENet aims to preserve the structural consistency of the character and normalize complex noise. Then, CiRNet reconstructs clean images from degraded character images and their skeletons. Due to the lack of benchmarks for real-world character image restoration, we constructed a dataset containing 1,606 character images with real-world degradation to evaluate the validity of the proposed method. The experimental results demonstrate that RCRN outperforms state-of-the-art methods quantitatively and qualitatively.Comment: Accepted to ACM MM 202

    DocStormer: Revitalizing Multi-Degraded Colored Document Images to Pristine PDF

    Full text link
    For capturing colored document images, e.g. posters and magazines, it is common that multiple degradations such as shadows, wrinkles, etc., are simultaneously introduced due to external factors. Restoring multi-degraded colored document images is a great challenge, yet overlooked, as most existing algorithms focus on enhancing color-ignored document images via binarization. Thus, we propose DocStormer, a novel algorithm designed to restore multi-degraded colored documents to their potential pristine PDF. The contributions are: firstly, we propose a "Perceive-then-Restore" paradigm with a reinforced transformer block, which more effectively encodes and utilizes the distribution of degradations. Secondly, we are the first to utilize GAN and pristine PDF magazine images to narrow the distribution gap between the enhanced results and PDF images, in pursuit of less degradation and better visual quality. Thirdly, we propose a non-parametric strategy, PFILI, which enables a smaller training scale and larger testing resolutions with acceptable detail trade-off, while saving memory and inference time. Fourthly, we are the first to propose a novel Multi-Degraded Colored Document image Enhancing dataset, named MD-CDE, for both training and evaluation. Experimental results show that the DocStormer exhibits superior performance, capable of revitalizing multi-degraded colored documents into their potential pristine digital versions, which fills the current academic gap from the perspective of method, data, and task

    CCDWT-GAN: Generative Adversarial Networks Based on Color Channel Using Discrete Wavelet Transform for Document Image Binarization

    Full text link
    To efficiently extract the textual information from color degraded document images is an important research topic. Long-term imperfect preservation of ancient documents has led to various types of degradation such as page staining, paper yellowing, and ink bleeding; these degradations badly impact the image processing for information extraction. In this paper, we present CCDWT-GAN, a generative adversarial network (GAN) that utilizes the discrete wavelet transform (DWT) on RGB (red, green, blue) channel splited images. The proposed method comprises three stages: image preprocessing, image enhancement, and image binarization. This work conducts comparative experiments in the image preprocessing stage to determine the optimal selection of DWT with normalization. Additionally, we perform an ablation study on the results of the image enhancement stage and the image binarization stage to validate their positive effect on the model performance. This work compares the performance of the proposed method with other state-of-the-art (SOTA) methods on DIBCO and H-DIBCO ((Handwritten) Document Image Binarization Competition) datasets. The experimental results demonstrate that CCDWT-GAN achieves a top two performance on multiple benchmark datasets, and outperforms other SOTA methods

    CharFormer: A Glyph Fusion based Attentive Framework for High-precision Character Image Denoising

    Full text link
    Degraded images commonly exist in the general sources of character images, leading to unsatisfactory character recognition results. Existing methods have dedicated efforts to restoring degraded character images. However, the denoising results obtained by these methods do not appear to improve character recognition performance. This is mainly because current methods only focus on pixel-level information and ignore critical features of a character, such as its glyph, resulting in character-glyph damage during the denoising process. In this paper, we introduce a novel generic framework based on glyph fusion and attention mechanisms, i.e., CharFormer, for precisely recovering character images without changing their inherent glyphs. Unlike existing frameworks, CharFormer introduces a parallel target task for capturing additional information and injecting it into the image denoising backbone, which will maintain the consistency of character glyphs during character image denoising. Moreover, we utilize attention-based networks for global-local feature interaction, which will help to deal with blind denoising and enhance denoising performance. We compare CharFormer with state-of-the-art methods on multiple datasets. The experimental results show the superiority of CharFormer quantitatively and qualitatively.Comment: Accepted by ACM MM 202

    Joint Layout Analysis, Character Detection and Recognition for Historical Document Digitization

    Full text link
    In this paper, we propose an end-to-end trainable framework for restoring historical documents content that follows the correct reading order. In this framework, two branches named character branch and layout branch are added behind the feature extraction network. The character branch localizes individual characters in a document image and recognizes them simultaneously. Then we adopt a post-processing method to group them into text lines. The layout branch based on fully convolutional network outputs a binary mask. We then use Hough transform for line detection on the binary mask and combine character results with the layout information to restore document content. These two branches can be trained in parallel and are easy to train. Furthermore, we propose a re-score mechanism to minimize recognition error. Experiment results on the extended Chinese historical document MTHv2 dataset demonstrate the effectiveness of the proposed framework.Comment: 6 pages, 6 figure

    A Layer-Wise Tokens-to-Token Transformer Network for Improved Historical Document Image Enhancement

    Full text link
    Document image enhancement is a fundamental and important stage for attaining the best performance in any document analysis assignment because there are many degradation situations that could harm document images, making it more difficult to recognize and analyze them. In this paper, we propose \textbf{T2T-BinFormer} which is a novel document binarization encoder-decoder architecture based on a Tokens-to-token vision transformer. Each image is divided into a set of tokens with a defined length using the ViT model, which is then applied several times to model the global relationship between the tokens. However, the conventional tokenization of input data does not adequately reflect the crucial local structure between adjacent pixels of the input image, which results in low efficiency. Instead of using a simple ViT and hard splitting of images for the document image enhancement task, we employed a progressive tokenization technique to capture this local information from an image to achieve more effective results. Experiments on various DIBCO and H-DIBCO benchmarks demonstrate that the proposed model outperforms the existing CNN and ViT-based state-of-the-art methods. In this research, the primary area of examination is the application of the proposed architecture to the task of document binarization. The source code will be made available at https://github.com/RisabBiswas/T2T-BinFormer.Comment: arXiv admin note: text overlap with arXiv:2312.0356

    CT-Net:Cascade T-shape deep fusion networks for document binarization

    Get PDF
    Document binarization is a key step in most document analysis tasks. However, historical-document images usually suffer from various degradations, making this a very challenging processing stage. The performance of document image binarization has improved dramatically in recent years by the use of Convolutional Neural Networks (CNNs). In this paper, a dual-task, T-shaped neural network is proposed that has the main task of binarization and an auxiliary task of image enhancement. The neural network for enhancement learns the degradations in document images and the specific CNN-kernel features can be adapted towards the binarization task in the training process. In addition, the enhancement image can be considered as an improved version of the input image, which can be fed into the network for fine-tuning, making it possible to design a chained-cascade network (CT-Net). Experimental results on document binarization competition datasets (DIBCO datasets) and MCS dataset show that our proposed method outperforms competing state-of-the-art methods in most cases

    Artificial Intelligence in the Creative Industries: A Review

    Full text link
    This paper reviews the current state of the art in Artificial Intelligence (AI) technologies and applications in the context of the creative industries. A brief background of AI, and specifically Machine Learning (ML) algorithms, is provided including Convolutional Neural Network (CNNs), Generative Adversarial Networks (GANs), Recurrent Neural Networks (RNNs) and Deep Reinforcement Learning (DRL). We categorise creative applications into five groups related to how AI technologies are used: i) content creation, ii) information analysis, iii) content enhancement and post production workflows, iv) information extraction and enhancement, and v) data compression. We critically examine the successes and limitations of this rapidly advancing technology in each of these areas. We further differentiate between the use of AI as a creative tool and its potential as a creator in its own right. We foresee that, in the near future, machine learning-based AI will be adopted widely as a tool or collaborative assistant for creativity. In contrast, we observe that the successes of machine learning in domains with fewer constraints, where AI is the `creator', remain modest. The potential of AI (or its developers) to win awards for its original creations in competition with human creatives is also limited, based on contemporary technologies. We therefore conclude that, in the context of creative industries, maximum benefit from AI will be derived where its focus is human centric -- where it is designed to augment, rather than replace, human creativity

    Detection of counterfeit coins based on 3D Height-Map Image Analysis

    Get PDF
    Analyzing 3-D height-map images leads to the discovery of a new set of features that cannot be extracted or even seen in 2-D images. To the best of our knowledge, there was no research in the literature analyzing height-map images to detect counterfeit coins or to classify coins. The main goal of this thesis is to propose a new comprehensive method for analyzing 3D height-map images to detect counterfeit of any type of coins regardless of their country of origin, language, shape, and quality. Therefore, we applied a precise 3-D scanner to produce coin height-map images, since detecting a counterfeit coin using 2D image processing is nearly impossible in some cases, especially when the coin is damaged, corroded or worn out. In this research, we propose some 3-D approaches to model and analyze several large datasets. In our first and second methods, we aimed to solve the degradation problem of shiny coin images due to the scanning process. To solve this problem, first, the characters of the coin images were straightened by a proposed straightening algorithm. The height-map image, then, was decomposed row-wise to a set of 1-D signals, which were analyzed separately and restored by two different proposed methods. These approaches produced remarkable results. We also proposed a 3-D approach to detect and analyze the precipice borders from the coin surface and extract significant features that ignored the degradation problem. To extract the features, we also proposed Binned Borders in Spherical Coordinates (BBSC) to analyze different parts of precipice borders at different polar and azimuthal angles. We also took advantage of stack generalization to classify the coins and add a reject option to increase the reliability of the system. The results illustrate that the proposed method outperforms other counterfeit coin detectors. Since there are traces of deep learning in most recent research related to image processing, it is worthwhile to benefit from deep learning approaches in our study. In another proposed method of this thesis, we applied deep learning algorithms in two steps to detect counterfeit coins. As Generative Adversarial Network is being used for generating fake images in image processing applications, we proposed a novel method based on this network to augment our fake coin class and compensate for the lack of fake coins for training the classifier. We also decomposed the coin height-map image into three types of Steep, Moderate, and Gentle slopes. Therefore, the grayscale height-map image is turned to the proposed SMG height-map channel. Then, we proposed a hybrid CNN-based deep neural network to train and classify these new SMG images. The results illustrated that a deep neural network trained with the proposed SMG images outperforms the system trained by the grayscale images. In this research, the proposed methods were trained and tested with four types of Danish and two types of Chinese coins with encouraging results
    corecore