111 research outputs found

    A Comprehensive Study of ImageNet Pre-Training for Historical Document Image Analysis

    Automatic analysis of scanned historical documents comprises a wide range of image analysis tasks, which are often challenging for machine learning due to a lack of human-annotated learning samples. With the advent of deep neural networks, a promising way to cope with the lack of training data is to pre-train models on images from a different domain and then fine-tune them on historical documents. In current research, a typical example of such cross-domain transfer learning is the use of neural networks that have been pre-trained on the ImageNet database for object recognition. It remains a mostly open question whether or not this pre-training helps to analyse historical documents, which have fundamentally different image properties when compared with ImageNet. In this paper, we present a comprehensive empirical survey on the effect of ImageNet pre-training for diverse historical document analysis tasks, including character recognition, style classification, manuscript dating, semantic segmentation, and content-based retrieval. While we obtain mixed results for semantic segmentation at the pixel level, we observe a clear trend across different network architectures that ImageNet pre-training has a positive effect on classification as well as content-based retrieval.

    Offline signature verification using writer-dependent ensembles and static classifier selection with handcraft features

    Advisor: Eduardo Todt. Master's dissertation, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 17/02/2022. Includes references: p. 85-94. Area of concentration: Computer Science. Abstract: Signature recognition and identification in documents and manuscripts are challenging tasks that have been studied over time, especially with regard to discerning genuine signatures from forgeries. With recent advances in technology, especially in the field of computing, research in this area has become increasingly frequent, enabling new methods of signature analysis and increasing the accuracy and confidence of verification. There is still much to be explored in this area within computing. Signature verification generally consists of obtaining features of a signature and using them to distinguish it from others. Studies proposing different types of methods have been carried out in recent years in order to improve the results obtained by signature verification and identification systems. Different ways of extracting features have been explored, such as the use of artificial neural networks specifically aimed at verifying signatures, like ResNet and SigNet, which represent the state of the art in this research area. Despite this, simpler feature extraction methods are still widely used, such as the Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP) and Local Phase Quantization (LPQ), and in many cases they yield results similar to the state of the art. In addition, different ways of combining information from feature extractors and results from classifiers have been proposed, such as feature selectors, committee machine (ensemble) methods and feature quality analysis algorithms. Accordingly, this work explores different feature extraction methods combined in an ensemble of classifiers, so that each ensemble is built in a writer-dependent way and is specifically adapted to recognize the best features for each author, learning which combinations of classifiers and feature groups are best suited to recognizing that author's signatures. The performance and functionality of the system were compared with the main works in the area developed in recent years, with tests carried out on the CEDAR, MCYT and UTSig databases. Although it does not surpass the state of the art, the system performed well and is comparable to some other important works in the area. In addition, the system showed the efficiency of Support Vector Machine (SVM) classifiers and voting schemes for meta-classification, as well as the potential of some feature extractors for signature verification, such as the Compound Local Binary Pattern (CLBP).
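
    The abstract above names Local Binary Patterns (LBP) as one of the simpler handcrafted feature extractors. As a rough illustration only, the following sketch computes a basic 3x3 LBP histogram in plain Python. It is not the dissertation's implementation and not the Compound LBP (CLBP) variant it highlights; the function name `lbp_histogram`, the neighbor ordering, and the choice to skip border pixels are all assumptions made for this example.

```python
def lbp_histogram(image):
    """Return the normalized 256-bin histogram of basic 3x3 LBP codes.

    `image` is a list of equal-length rows of grayscale integers.
    Border pixels are skipped because they lack a full 8-neighborhood
    (an assumption of this sketch, not a rule taken from the source).
    """
    # Neighbor offsets, clockwise starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    hist = [0] * 256
    rows, cols = len(image), len(image[0])
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as bright as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            hist[code] += 1
    total = sum(hist)
    # Normalize so that images of different sizes are comparable.
    return [h / total for h in hist] if total else hist
```

    A histogram like this could serve as one feature vector fed to a per-writer classifier; how the dissertation actually combines such features in its ensembles is not specified in the abstract.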

    Fine-grained Incident Video Retrieval with Video Similarity Learning.

    PhD Theses. In this thesis, we address the problem of Fine-grained Incident Video Retrieval (FIVR) using video similarity learning methods. FIVR is a video retrieval task that aims to retrieve all videos that depict the same incident given a query video. Related video retrieval tasks adopt either very narrow or very broad scopes, considering only near-duplicate or same-event videos. To formulate the case of same-incident videos, we define three video associations taking into account the spatio-temporal spans captured by video pairs. To cover the benchmarking needs of FIVR, we construct a large-scale dataset, called FIVR-200K, consisting of 225,960 YouTube videos from major news events crawled from Wikipedia. The dataset contains four annotation labels according to the FIVR definitions; hence, it can simulate several retrieval scenarios with the same video corpus. To address FIVR, we propose two video-level approaches leveraging features extracted from intermediate layers of Convolutional Neural Networks (CNN). The first is an unsupervised method that relies on a modified Bag-of-Words scheme, which generates video representations from the aggregation of the frame descriptors based on learned visual codebooks. The second is a supervised method based on Deep Metric Learning, which learns an embedding function that maps videos to a feature space where relevant video pairs are closer than irrelevant ones. However, video-level approaches generate global video representations, losing all spatial and temporal relations between compared videos. Therefore, we propose a video similarity learning approach that captures fine-grained relations between videos for accurate similarity calculation. We train a CNN architecture to compute video-to-video similarity from refined frame-to-frame similarity matrices derived from a pairwise region-level similarity function. The proposed approaches have been extensively evaluated on FIVR-200K and other large-scale datasets, demonstrating their superiority over other video retrieval methods and highlighting the challenging aspect of the FIVR problem.
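
    To make the idea of a frame-to-frame similarity matrix more concrete, here is a minimal sketch under simplifying assumptions. Each frame is represented by a single hypothetical descriptor vector (the thesis uses region-level similarities), and the matrix is reduced to one score with a fixed mean-of-row-maxima rule standing in for the trained CNN described in the abstract. All function names are invented for this illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def frame_similarity_matrix(query_frames, target_frames):
    """Rows index query frames, columns index target frames."""
    return [[cosine(q, t) for t in target_frames] for q in query_frames]


def mean_of_row_maxima(matrix):
    """Average, over query frames, of the best-matching target frame.

    This fixed rule is an assumption of the sketch; it replaces the
    learned video-to-video similarity network from the abstract.
    """
    return sum(max(row) for row in matrix) / len(matrix)
```

    For example, with `query = [[1, 0], [0, 1]]` and `target = [[1, 0]]`, the matrix is `[[1.0], [0.0]]` and the aggregated score is `0.5`: only one of the two query frames has a match in the target video.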

    Spotting Keywords in Offline Handwritten Documents Using Hausdorff Edit Distance

    Keyword spotting has become a crucial topic in handwritten document recognition, enabling content-based retrieval of scanned documents using search terms. With a query keyword, one can search and index digitized handwriting, which in turn facilitates the understanding of manuscripts. Common automated techniques address the keyword spotting problem through statistical representations. Structural representations such as graphs capture the complex structure of handwriting. However, they are rarely used, particularly for keyword spotting, due to their high computational cost. The graph edit distance, a powerful and versatile method for matching any type of labeled graph, has exponential time complexity for calculating the similarity of graphs. Hence, the use of graph edit distance is constrained to small graphs. The recently developed Hausdorff edit distance algorithm approximates the graph edit distance with quadratic time complexity by efficiently matching local substructures. This dissertation hypothesizes that the Hausdorff edit distance could be a promising alternative to other template-based keyword spotting approaches in terms of computational time and accuracy. Accordingly, the core contribution of this thesis is the investigation and development of a graph-based keyword spotting technique based on the Hausdorff edit distance algorithm. The high representational power of graphs combined with the efficiency of the Hausdorff edit distance for graph matching achieves a remarkable speedup as well as accuracy. In a comprehensive experimental evaluation, we demonstrate the solid performance of the proposed graph-based method when compared with the state of the art, concerning both precision and speed. The second contribution of this thesis is a keyword spotting technique which incorporates dynamic time warping and Hausdorff edit distance approaches. The structural representation of the graph-based approach and the statistical representation based on geometric features complement each other to provide a more accurate system. The proposed system has been extensively evaluated with four types of handwriting graphs and geometric feature vectors on benchmark datasets. The experiments demonstrate a performance boost: the combined system outperforms the individual systems.
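
    The quadratic-time matching idea behind the Hausdorff edit distance can be illustrated with a deliberately simplified sketch. The version below considers only graph nodes, labeled with 2D coordinates, and ignores edges entirely, so it is not the full algorithm evaluated in the thesis. The deletion/insertion cost `tau`, the Euclidean substitution cost, and the function name are assumptions of this example. Substitution costs are halved because each matched pair can be counted once from each side.

```python
import math


def hausdorff_edit_distance_nodes(nodes_a, nodes_b, tau=1.0):
    """Node-only Hausdorff-style edit distance between two point sets.

    Every node is assigned independently to its cheapest option: either
    deletion/insertion at cost `tau`, or substitution with the nearest
    node of the other graph at half the Euclidean distance. The two
    nested loops give O(len(nodes_a) * len(nodes_b)) time.
    """
    def one_sided(src, dst):
        total = 0.0
        for p in src:
            best = tau  # cost of removing (or inserting) this node
            for q in dst:
                best = min(best, math.dist(p, q) / 2.0)
            total += best
        return total

    return one_sided(nodes_a, nodes_b) + one_sided(nodes_b, nodes_a)
```

    Because each node picks its best match without regard to what the other nodes pick, several nodes may map to the same target. That relaxation is what makes the computation quadratic, and it is also why the result is a lower-bound style approximation rather than an exact graph edit distance.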

    LIPIcs, Volume 244, ESA 2022, Complete Volume
