
    Historical Document Enhancement Using LUT Classification

    The fast evolution of scanning and computing technologies in recent years has led to the creation of large collections of scanned historical documents. It is almost always the case that these scanned documents suffer from some form of degradation. Large degradations make documents hard to read and substantially deteriorate the performance of automated document processing systems. Enhancement of degraded document images is normally performed assuming global degradation models. When the degradation is large, global degradation models do not perform well. In contrast, we propose to learn local degradation models and use them to enhance degraded document images. Using a semi-automated enhancement system, we have labeled a subset of the Frieder diaries collection (the diaries of Rabbi Dr. Avraham Abba Frieder, http://ir.iit.edu/collections/). This labeled subset was then used to train classifiers based on lookup tables in conjunction with the approximate nearest neighbor algorithm. The resulting algorithm is highly efficient and effective. Experimental evaluation results are provided using the same Frieder diaries collection. © Springer-Verlag 2009
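
    The abstract does not spell out the implementation, so the following is only a minimal sketch of the lookup-table classification idea, assuming 3×3 binarized pixel neighborhoods as the local context and hypothetical pairs of degraded/clean training images; the approximate nearest neighbor component mentioned by the authors is omitted.

```python
import numpy as np

def neighborhood_codes(binary_img):
    """Encode each pixel's 3x3 binarized neighborhood as a 9-bit integer index."""
    padded = np.pad(binary_img, 1, mode="edge")
    h, w = binary_img.shape
    codes = np.zeros((h, w), dtype=np.int32)
    bit = 0
    for dy in range(3):
        for dx in range(3):
            codes |= padded[dy:dy + h, dx:dx + w].astype(np.int32) << bit
            bit += 1
    return codes

def train_lut(degraded_imgs, clean_imgs):
    """Learn, for each of the 512 neighborhood patterns, the majority clean label (0/1 images assumed)."""
    counts = np.zeros((512, 2), dtype=np.int64)               # votes per (pattern, label)
    for degraded, clean in zip(degraded_imgs, clean_imgs):
        codes = neighborhood_codes(degraded)
        np.add.at(counts, (codes.ravel(), clean.ravel().astype(int)), 1)
    return (counts[:, 1] > counts[:, 0]).astype(np.uint8)     # LUT: pattern -> cleaned pixel value

def enhance(degraded_img, lut):
    """Apply the learned lookup table to every pixel of a degraded binary image."""
    return lut[neighborhood_codes(degraded_img)]
```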

    Knowledge Elicitation in Deep Learning Models

    Though a buzzword in modern problem-solving across various domains, deep learning presents a significant challenge: interpretability. This thesis journeys through a landscape of knowledge elicitation in deep learning models, shedding light on feature visualization, saliency maps, and model distillation techniques. These techniques were applied to two deep learning architectures: convolutional neural networks (CNNs) and a black-box package model (Google Vision). Our investigation provided valuable insights into their effectiveness in eliciting and interpreting the encoded knowledge. While they demonstrated potential, limitations were also observed, suggesting room for further development in this field. This work does not just highlight the need for more transparent, more explainable deep learning models; it also encourages the development of innovative techniques to extract knowledge. It is all about ensuring responsible deployment and emphasizing the importance of transparency and comprehension in machine learning.
    In addition to evaluating existing methods, this thesis also explores the potential for combining multiple techniques to enhance the interpretability of deep learning models. A blend of feature visualization, saliency maps, and model distillation techniques was used in a complementary manner to extract and interpret the knowledge from our chosen architectures. Experimental results highlight the utility of this combined approach, revealing a more comprehensive understanding of the models' decision-making processes. Furthermore, we propose a novel framework for systematic knowledge elicitation in deep learning, which cohesively integrates these methods. This framework showcases the value of a holistic approach toward model interpretability rather than relying on a single method. Lastly, we discuss the ethical implications of our work. As deep learning models continue to permeate various sectors, from healthcare to finance, ensuring their decisions are explainable and justified becomes increasingly crucial. Our research underscores this importance, laying the groundwork for creating more transparent, accountable AI systems in the future.
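
    As an illustration of one of the techniques named above, here is a minimal vanilla-gradient saliency-map sketch in PyTorch, using a pretrained ResNet-18 (torchvision ≥ 0.13) as a stand-in for the CNNs studied in the thesis; it is not the thesis's pipeline, and the black-box Google Vision model cannot be probed this way.

```python
import torch
from torchvision import models, transforms
from PIL import Image

def saliency_map(image_path):
    """Gradient of the top class score w.r.t. the input pixels (vanilla saliency)."""
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()
    preprocess = transforms.Compose([
        transforms.Resize(256), transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ])
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    x.requires_grad_(True)

    scores = model(x)                        # class logits, shape (1, 1000)
    scores[0, scores.argmax()].backward()    # back-propagate the winning class score

    # Pixel importance = maximum absolute gradient over the colour channels.
    return x.grad.abs().max(dim=1)[0].squeeze(0)   # (224, 224) saliency map
```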

    Text detection in natural scenes through weighted majority voting of DCT high pass filters, line removal, and color consistency filtering

    Detecting text in images presents the unique challenge of finding both in-scene and superimposed text of various sizes, fonts, colors, and textures in complex backgrounds. The goal of this system is not to recognize specific letters or words but only to determine whether a pixel is text or not. This pixel-level decision is made by applying a set of weighted classifiers, created using a set of high-pass filters, together with a series of image processing techniques. It is our assertion that the learned weighted combination of frequency filters in conjunction with image processing techniques may show better pixel-level text detection performance, in terms of precision, recall, and f-metric, than any of the components do individually. Qualitatively, our algorithm performs well and shows promising results. Quantitative numbers are not as high as desired, but they are not unreasonable. For the complete ensemble, the f-metric was found to be 0.36.
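
    A minimal sketch of one such component follows: block-wise DCT energy in high-frequency coefficients as a per-block text score. The block size, cut-off, and threshold are illustrative assumptions rather than the paper's values, and the weighted voting, line removal, and color-consistency stages are not reproduced.

```python
import numpy as np
from scipy.fft import dctn

def dct_highpass_text_mask(gray, block=8, cutoff=2, thresh=50.0):
    """Flag blocks whose high-frequency DCT energy is large (text regions tend to be busy)."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            coeffs = dctn(gray[y:y + block, x:x + block].astype(float), norm="ortho")
            coeffs[:cutoff, :cutoff] = 0.0        # discard low-frequency (smooth background) content
            energy = np.abs(coeffs).mean()        # average high-frequency magnitude
            mask[y:y + block, x:x + block] = energy > thresh
    return mask
```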

    Modern Tools for Old Content - in Search of Named Entities in a Finnish OCRed Historical Newspaper Collection 1771-1910

    Named entity recognition (NER), the search, classification, and tagging of names and name-like frequent informational elements in texts, has become a standard information extraction procedure for textual data. NER has been applied to many types of texts and different types of entities: newspapers, fiction, historical records, persons, locations, chemical compounds, protein families, animals, etc. In general, a NER system's performance is genre- and domain-dependent, and the entity categories used also vary [1]. The most general set of named entities is usually some version of a tripartite categorization into locations, persons, and organizations. In this paper we report first trials and an evaluation of NER with data from a digitized Finnish historical newspaper collection, Digi. The Digi collection contains 1,960,921 pages of newspaper material from the years 1771–1910, in both Finnish and Swedish. We use only the Finnish-language material in our evaluation. The OCRed newspaper collection contains many OCR errors; its estimated word-level correctness is about 74–75% [2]. Our principal NER tagger is a rule-based tagger of Finnish, FiNER, provided by the FIN-CLARIN consortium. We also show results of limited-category semantic tagging with tools of the Semantic Computing Research Group (SeCo) of Aalto University. FiNER is able to achieve up to a 60.0 F-score on named entities in the evaluation data. SeCo's tools achieve F-scores of 30.0–60.0 on locations and persons. The performance of FiNER and SeCo's tools on this data shows that, at best, about half of the named entities can be recognized even in quite erroneous OCRed text.
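
    The F-scores quoted above are entity-level measures. For reference, here is a minimal sketch of the standard exact-match precision/recall/F1 computation over gold and predicted entity spans; the (start, end, type) tuple format is a hypothetical convention, not FiNER's own evaluation script.

```python
def entity_fscore(gold, predicted):
    """Exact-match precision, recall, and F1 over (start, end, type) entity spans."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Hypothetical example: two correct entities, one missed, one spurious.
gold = {(0, 7, "PER"), (15, 23, "LOC"), (30, 41, "ORG")}
pred = {(0, 7, "PER"), (15, 23, "LOC"), (50, 55, "LOC")}
print(entity_fscore(gold, pred))   # roughly (0.667, 0.667, 0.667)
```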

    Off-line Arabic Handwriting Recognition System Using Fast Wavelet Transform

    In this research, an off-line handwriting recognition system for the Arabic alphabet is introduced. The system contains three main stages: preprocessing, segmentation, and recognition. In the preprocessing stage, the Radon transform was used in the design of algorithms for page, line, and word skew correction as well as for word slant correction. In the segmentation stage, a Hough transform approach was used for line extraction. For line-to-word and word-to-character segmentation, a statistical method using a mathematical representation of the binary images of lines and words was used. Unlike most current handwriting recognition systems, our system simulates the human mechanism for image recognition, where images are encoded and saved in memory as groups according to their similarity to each other. Characters are decomposed into coefficient vectors using the fast wavelet transform; the vectors that represent a character in its different possible shapes are then saved as groups, with one representative for each group. Recognition is achieved by comparing the vector of the character to be recognized with the group representatives. Experiments showed that the proposed system is able to achieve the recognition task with 90.26% accuracy. The system needs at most 3.41 seconds to recognize a single character in a text of 15 lines, where each line has 10 words on average.
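
    A minimal sketch of the feature-extraction and matching idea follows: decompose a character image with a fast wavelet transform and assign it to the nearest stored group representative. The wavelet family, decomposition level, distance measure, and the use of PyWavelets are illustrative assumptions, and character images are assumed normalized to a common size so that the coefficient vectors are comparable.

```python
import numpy as np
import pywt

def wavelet_features(char_img, wavelet="haar", level=2):
    """Flatten the multi-level 2D wavelet decomposition into a single coefficient vector."""
    coeffs = pywt.wavedec2(char_img, wavelet, level=level)
    arr, _ = pywt.coeffs_to_array(coeffs)
    return arr.ravel()

def recognize(char_img, representatives):
    """Return the label of the nearest group representative (Euclidean distance)."""
    vec = wavelet_features(char_img)
    best_label, best_dist = None, np.inf
    for label, rep in representatives.items():   # {label: representative coefficient vector}
        dist = np.linalg.norm(vec - rep)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label
```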

    Adaptive Methods for Robust Document Image Understanding

    A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding represent a stringent necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation, and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions, and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, putting special focus on generality, computational efficiency, and the exploitation of all available sources of information. More specifically, we introduce the following original methods: a fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot-metal typeset prints, a theoretically optimal solution to the document binarization problem from both the computational-complexity and the threshold-selection points of view, a layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm, and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy.
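
    Binarization is one of the stages listed above. As a point of reference only, here is a minimal sketch of the classical Otsu threshold selection (maximizing between-class variance over the grey-level histogram); it is the standard baseline, not the thesis's proposed optimal method.

```python
import numpy as np

def otsu_threshold(gray):
    """Pick the grey level that maximizes between-class variance (classical Otsu criterion)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    prob = hist.astype(float) / hist.sum()
    omega = np.cumsum(prob)                   # probability of class 0 up to each level
    mu = np.cumsum(prob * np.arange(256))     # cumulative mean
    mu_total = mu[-1]
    # Between-class variance for every candidate threshold; guard against empty classes.
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
    sigma_b = np.nan_to_num(sigma_b)
    return int(np.argmax(sigma_b))

def binarize(gray):
    """Foreground (ink) = pixels darker than the Otsu threshold."""
    return gray < otsu_threshold(gray)
```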

    Recognition of Characters from Streaming Videos
