    Subjective and objective quality assessment of ancient degraded documents

    Archiving, restoration and analysis of damaged manuscripts have increased considerably in recent decades. These documents are usually physically degraded because of aging and improper handling. They also cannot be processed manually because a massive volume of them exists in libraries and archives around the world. Therefore, automatic methodologies are needed to preserve and to process their content. These documents are usually processed through their images. Degraded document image processing is a difficult task, mainly because of the existing physical degradations. While it can be very difficult to accurately locate and remove such distortions, analyzing their severity and type(s) is feasible, and this analysis has a number of applications. The main contributions of this thesis are to propose models for objectively assessing the physical condition of document images and to classify their degradations. In this thesis, three datasets of degraded document images, along with the subjective ratings for each image, are developed. In addition, three no-reference document image quality assessment (NR-DIQA) metrics are proposed for historical and medieval document images. Note that degraded medieval document images are a subset of historical document images and may contain both graphical and textual content. Finally, we propose a degradation classification model to identify common distortion types in old document images. Existing no-reference image quality assessment (NR-IQA) metrics are not designed to assess physical document distortions. In the first contribution, we propose the first dataset of degraded document images along with the human opinion scores for each document image. This dataset is introduced to evaluate the quality of historical document images. 
We also propose an objective NR-DIQA metric based on the statistics of the mean subtracted contrast normalized (MSCN) coefficients computed from segmented layers of each document image. The segmentation into four layers of foreground and background is based on an analysis of log-Gabor filters and on the assumption that the sensitivity of the human visual system (HVS) differs between text and non-text locations. Experimental results show that the proposed metric performs comparably to or better than state-of-the-art metrics at moderate complexity. Degradation identification and quality assessment can complement each other to provide information on both the type and the severity of degradations in document images. Therefore, in the second contribution, we introduce a multi-distortion historical document image database that can be used for research on quality assessment of degraded documents as well as degradation classification. The developed dataset contains historical document images classified into four categories based on their distortion types, namely paper translucency, stain, readers’ annotations, and worn holes. An efficient NR-DIQA metric is then proposed based on three sets of spatial and frequency image features extracted from two layers, text and non-text. In addition, these features are used to estimate the probability of the four aforementioned physical distortions for the first time in the literature. Both the proposed quality assessment and degradation classification models deliver promising performance. Finally, in the third contribution, we develop a dataset and a quality assessment metric for degraded medieval document (DMD) images. This type of degraded image contains both textual and pictorial information. The introduced DMD dataset is the first dataset in its category that also provides human ratings. 
We also propose a new no-reference metric to evaluate the quality of DMD images in the developed dataset. The proposed metric is based on the extraction of several statistical features from three layers: text, non-text, and graphics. The segmentation is based on color saliency, with the assumption that pictorial parts are colorful. It also follows the HVS in giving a different weight to each layer. The experimental results validate the effectiveness of the proposed NR-DIQA strategy for DMD images.
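
The MSCN coefficients mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the thesis's implementation: the 7x7 Gaussian window with sigma 7/6 and the stabilizing constant `c` are assumptions borrowed from common BRISQUE-style practice, the helper `layer_stats` is hypothetical, and the log-Gabor layer segmentation is replaced by a boolean mask supplied by the caller.

```python
import numpy as np

def gaussian_kernel(size=7, sigma=7.0 / 6.0):
    """1-D Gaussian window, normalized to sum to 1 (window size is an assumption)."""
    ax = np.arange(size) - (size - 1) / 2.0
    k = np.exp(-(ax ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def blur(img, kernel):
    """Separable Gaussian filtering with reflect padding; output has the input shape."""
    pad = len(kernel) // 2
    padded = np.pad(img, pad, mode="reflect")
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(
        lambda col: np.convolve(col, kernel, mode="valid"), 0, rows)

def mscn(img, c=1.0):
    """Mean subtracted contrast normalized coefficients of a grayscale image."""
    img = np.asarray(img, dtype=np.float64)
    k = gaussian_kernel()
    mu = blur(img, k)                          # local mean
    var = blur(img * img, k) - mu * mu         # local variance
    sigma = np.sqrt(np.maximum(var, 0.0))      # guard against tiny negatives
    return (img - mu) / (sigma + c)

def layer_stats(coeffs, mask):
    """Hypothetical per-layer feature: mean and variance of MSCN values in a mask."""
    values = coeffs[mask]
    return float(values.mean()), float(values.var())
```

A per-layer feature vector could then be built by calling `layer_stats` once for each segmentation mask (for example, text and non-text) and concatenating the results.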

    Quantifying image distortion based on Gabor filter bank and multiple regression analysis

    Image quality assessment is indispensable for image-based applications. The approaches to image quality assessment fall into two main categories: subjective and objective methods. Subjective assessment has been widely used. However, careful subjective assessments are experimentally difficult and lengthy, and the results obtained may vary depending on the test conditions. Objective image quality assessment, on the other hand, would not only alleviate these difficulties but would also help to expand the field of application. Several works have therefore been developed for quantifying the distortion present in an image, achieving a goodness of fit between subjective and objective scores of up to 92%. Nevertheless, current methodologies are designed assuming that the nature of the distortion is known. This is generally a limiting assumption for practical applications, since in a majority of cases the distortions in the image are unknown. Therefore, we believe that current methods of image quality assessment should be adapted to identify and quantify the distortion of images at the same time. That combination can improve processes such as enhancement, restoration, compression and transmission, among others. We present an approach based on the power of experimental design and the joint localization of Gabor filters for studying the influence of spatial and frequency content on image quality assessment. In this way, we achieve a correct identification and quantification of the distortion affecting images. This method provides accurate scores and differentiability between distortions.
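
A Gabor filter bank of the kind this abstract refers to can be sketched as below. This is an assumption-laden illustration rather than the authors' method: the kernel size, the wavelengths, the number of orientations, the `sigma = wavelength / 2` rule and the mean-absolute-response feature are all choices made here for the example, and the regression step that maps features to quality scores is not shown.

```python
import numpy as np

def gabor_kernel(size, wavelength, theta, sigma, gamma=0.5):
    """Real (even) Gabor kernel with its mean removed so flat regions give no response."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(np.float64)
    xr = x * np.cos(theta) + y * np.sin(theta)      # axis along the carrier
    yr = -x * np.sin(theta) + y * np.cos(theta)     # axis across the carrier
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    kernel = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    return kernel - kernel.mean()

def gabor_bank_features(img, wavelengths=(4, 8), n_orientations=4, size=15):
    """Mean absolute filter response for each (wavelength, orientation index) pair.

    Filtering uses circular convolution via the FFT, so the image should be
    at least as large as the kernel.
    """
    img = np.asarray(img, dtype=np.float64)
    spectrum = np.fft.fft2(img)
    features = {}
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * np.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=0.5 * wavelength)
            response = np.fft.ifft2(spectrum * np.fft.fft2(kernel, s=img.shape)).real
            features[(wavelength, k)] = float(np.mean(np.abs(response)))
    return features
```

Each entry of the returned dictionary summarizes how much image energy falls in one spatial-frequency and orientation band; such band-wise responses are one plausible input to a regression model that predicts subjective scores.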

    The Unreasonable Effectiveness of Deep Features as a Perceptual Metric

    While it is nearly effortless for humans to quickly assess the perceptual similarity between two images, the underlying processes are thought to be quite complex. Despite this, the most widely used perceptual metrics today, such as PSNR and SSIM, are simple, shallow functions, and fail to account for many nuances of human perception. Recently, the deep learning community has found that features of the VGG network trained on ImageNet classification have been remarkably useful as a training loss for image synthesis. But how perceptual are these so-called "perceptual losses"? What elements are critical for their success? To answer these questions, we introduce a new dataset of human perceptual similarity judgments. We systematically evaluate deep features across different architectures and tasks and compare them with classic metrics. We find that deep features outperform all previous metrics by large margins on our dataset. More surprisingly, this result is not restricted to ImageNet-trained VGG features, but holds across different deep architectures and levels of supervision (supervised, self-supervised, or even unsupervised). Our results suggest that perceptual similarity is an emergent property shared across deep visual representations. Comment: Accepted to CVPR 2018; Code and data available at https://www.github.com/richzhang/PerceptualSimilarit
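
To make concrete why a metric such as PSNR counts as a "simple, shallow function", here is a minimal sketch of it. The 8-bit peak value of 255 and the two toy distortions are assumptions chosen for illustration and are not taken from the paper.

```python
import numpy as np

def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    reference = np.asarray(reference, dtype=np.float64)
    distorted = np.asarray(distorted, dtype=np.float64)
    mse = np.mean((reference - distorted) ** 2)
    if mse == 0.0:
        return float("inf")
    return float(10.0 * np.log10(max_value ** 2 / mse))

# Two visually different distortions with the same mean squared error:
reference = np.full((8, 8), 100.0)
brightness_shift = reference + 10.0                      # uniform +10 offset
signs = np.where(np.indices(reference.shape).sum(axis=0) % 2 == 0, 1.0, -1.0)
checkerboard = reference + 10.0 * signs                  # alternating +/-10 pattern

# Both have MSE = 100, so PSNR cannot tell them apart.
same_score = np.isclose(psnr(reference, brightness_shift),
                        psnr(reference, checkerboard))
```

Because PSNR depends only on the pixel-wise squared error, the uniform brightness shift and the high-frequency checkerboard receive the same score even though they look very different, which is the kind of limitation the abstract alludes to.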

    Towards a Semantic Perceptual Image Metric

    We present a full-reference perceptual image metric based on VGG-16, an artificial neural network trained on object classification. We fit the metric to a new database based on 140k unique images annotated with ground truth by human raters who received minimal instruction. The resulting metric shows competitive performance on TID 2013, a database widely used to evaluate image quality assessment methods. More interestingly, it shows strong responses to objects potentially carrying semantic relevance, such as faces and text, which we demonstrate using a visualization technique and ablation experiments. In effect, the metric appears to model a higher influence of semantic context on judgments, which we observe particularly in untrained raters. As the vast majority of users of image processing systems are unfamiliar with Image Quality Assessment (IQA) tasks, these findings may have a significant impact on real-world applications of perceptual metrics.

    Terahertz Security Image Quality Assessment by No-reference Model Observers

    To enable the development of objective image quality assessment (IQA) algorithms for THz security images, we constructed the THz security image database (THSID), comprising a total of 181 THz security images with a resolution of 127*380. The main distortion types in THz security images were first analyzed to design the subjective evaluation criteria used to acquire the mean opinion scores. Subsequently, existing no-reference IQA algorithms, namely 5 opinion-aware approaches (NFERM, GMLF, DIIVINE, BRISQUE and BLIINDS2) and 8 opinion-unaware approaches (QAC, SISBLIM, NIQE, FISBLIM, CPBD, S3 and Fish_bb), were run to evaluate THz security image quality. The statistical results demonstrated the superiority of Fish_bb over the other tested IQA approaches for assessing THz image quality, with PLCC (SROCC) values of 0.8925 (-0.8706) and an RMSE value of 0.3993. The linear regression analysis and Bland-Altman plot further verified that Fish_bb could substitute for subjective IQA. Nonetheless, for the classification of THz security images, we preferred S3 as a criterion for ranking THz security image grades because of its relatively low false positive rate in classifying poor-quality THz images into the acceptable category (24.69%). Interestingly, due to the specific properties of THz images, the average pixel intensity performed better than the more complicated IQA algorithms above, with PLCC, SROCC and RMSE of 0.9001, -0.8800 and 0.3857, respectively. This study will help users such as researchers or security staff to obtain THz security images of good quality. Currently, our research group is attempting to make this research more comprehensive. Comment: 13 pages, 8 figures, 4 table
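
The three agreement measures reported in this abstract (PLCC, SROCC and RMSE) can be computed as in the sketch below. It is a plain illustration under stated assumptions: the function names are hypothetical, ties receive average ranks, and no nonlinear mapping of objective scores onto the opinion-score scale is applied before PLCC and RMSE, although IQA studies often fit one first.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    values = np.asarray(values, dtype=np.float64)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values), dtype=np.float64)
    ranks[order] = np.arange(1, len(values) + 1)
    for v in np.unique(values):
        tied = values == v
        ranks[tied] = ranks[tied].mean()
    return ranks

def plcc(x, y):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(np.asarray(x, float), np.asarray(y, float))[0, 1])

def srocc(x, y):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(average_ranks(x), average_ranks(y))

def rmse(x, y):
    """Root mean squared error between two score vectors on the same scale."""
    diff = np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))
```

A negative SROCC simply means the raw objective score decreases as the subjective score increases. A positive PLCC reported alongside it may indicate that the objective scores were mapped onto the opinion-score scale before the linear correlation and RMSE were computed, but the abstract does not say so.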