931 research outputs found

    DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning

    Full text link
    Twenty-five hundred years ago, the paperwork of the Achaemenid Empire was recorded on clay tablets. In 1933, archaeologists from the University of Chicago's Oriental Institute (OI) found tens of thousands of these tablets and fragments during the excavation of Persepolis. Many of these tablets have been painstakingly photographed and annotated by expert cuneiformists, and now provide a rich dataset consisting of over 5,000 annotated tablet images and 100,000 cuneiform sign bounding boxes. We leverage this dataset to develop DeepScribe, a modular computer vision pipeline capable of localizing cuneiform signs and providing suggestions for the identity of each sign. We investigate the difficulty of learning subtasks relevant to cuneiform tablet transcription on ground-truth data, finding that a RetinaNet object detector can achieve a localization mAP of 0.78 and a ResNet classifier can achieve a top-5 sign classification accuracy of 0.89. The end-to-end pipeline achieves a top-5 classification accuracy of 0.80. As part of the classification module, DeepScribe groups cuneiform signs into morphological clusters. We consider how this automatic clustering approach differs from the organization of standard, printed sign lists and what we may learn from it. These components, trained individually, are sufficient to produce a system that can analyze photos of cuneiform tablets from the Achaemenid period and provide useful transliteration suggestions to researchers. We evaluate the model's end-to-end performance on locating and classifying signs, providing a roadmap to a linguistically-aware transliteration system, then consider the model's potential utility when applied to other periods of cuneiform writing.Comment: Currently under review in the ACM JOCC

    Character Recognition

    Get PDF
    Character recognition is one of the pattern recognition technologies that are most widely used in practical applications. This book presents recent advances that are relevant to character recognition, from technical topics such as image processing, feature extraction or classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    Get PDF
    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (\eg, signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance

    Understanding Optical Music Recognition

    Get PDF
    For over 50 years, researchers have been trying to teach computers to read music notation, referred to as Optical Music Recognition (OMR). However, this field is still difficult to access for new researchers, especially those without a significant musical background: Few introductory materials are available, and, furthermore, the field has struggled with defining itself and building a shared terminology. In this work, we address these shortcomings by (1) providing a robust definition of OMR and its relationship to related fields, (2) analyzing how OMR inverts the music encoding process to recover the musical notation and the musical semantics from documents, and (3) proposing a taxonomy of OMR, with most notably a novel taxonomy of applications. Additionally, we discuss how deep learning affects modern OMR research, as opposed to the traditional pipeline. Based on this work, the reader should be able to attain a basic understanding of OMR: its objectives, its inherent structure, its relationship to other fields, the state of the art, and the research opportunities it affords

    Advanced document data extraction techniques to improve supply chain performance

    Get PDF
    In this thesis, a novel machine learning technique to extract text-based information from scanned images has been developed. This information extraction is performed in the context of scanned invoices and bills used in financial transactions. These financial transactions contain a considerable amount of data that must be extracted, refined, and stored digitally before it can be used for analysis. Converting this data into a digital format is often a time-consuming process. Automation and data optimisation show promise as methods for reducing the time required and the cost of Supply Chain Management (SCM) processes, especially Supplier Invoice Management (SIM), Financial Supply Chain Management (FSCM) and Supply Chain procurement processes. This thesis uses a cross-disciplinary approach involving Computer Science and Operational Management to explore the benefit of automated invoice data extraction in business and its impact on SCM. The study adopts a multimethod approach based on empirical research, surveys, and interviews performed on selected companies.The expert system developed in this thesis focuses on two distinct areas of research: Text/Object Detection and Text Extraction. For Text/Object Detection, the Faster R-CNN model was analysed. While this model yields outstanding results in terms of object detection, it is limited by poor performance when image quality is low. The Generative Adversarial Network (GAN) model is proposed in response to this limitation. The GAN model is a generator network that is implemented with the help of the Faster R-CNN model and a discriminator that relies on PatchGAN. The output of the GAN model is text data with bonding boxes. For text extraction from the bounding box, a novel data extraction framework consisting of various processes including XML processing in case of existing OCR engine, bounding box pre-processing, text clean up, OCR error correction, spell check, type check, pattern-based matching, and finally, a learning mechanism for automatizing future data extraction was designed. Whichever fields the system can extract successfully are provided in key-value format.The efficiency of the proposed system was validated using existing datasets such as SROIE and VATI. Real-time data was validated using invoices that were collected by two companies that provide invoice automation services in various countries. Currently, these scanned invoices are sent to an OCR system such as OmniPage, Tesseract, or ABBYY FRE to extract text blocks and later, a rule-based engine is used to extract relevant data. While the system’s methodology is robust, the companies surveyed were not satisfied with its accuracy. Thus, they sought out new, optimized solutions. To confirm the results, the engines were used to return XML-based files with text and metadata identified. The output XML data was then fed into this new system for information extraction. This system uses the existing OCR engine and a novel, self-adaptive, learning-based OCR engine. This new engine is based on the GAN model for better text identification. Experiments were conducted on various invoice formats to further test and refine its extraction capabilities. For cost optimisation and the analysis of spend classification, additional data were provided by another company in London that holds expertise in reducing their clients' procurement costs. This data was fed into our system to get a deeper level of spend classification and categorisation. This helped the company to reduce its reliance on human effort and allowed for greater efficiency in comparison with the process of performing similar tasks manually using excel sheets and Business Intelligence (BI) tools.The intention behind the development of this novel methodology was twofold. First, to test and develop a novel solution that does not depend on any specific OCR technology. Second, to increase the information extraction accuracy factor over that of existing methodologies. Finally, it evaluates the real-world need for the system and the impact it would have on SCM. This newly developed method is generic and can extract text from any given invoice, making it a valuable tool for optimizing SCM. In addition, the system uses a template-matching approach to ensure the quality of the extracted information

    Using contour information and segmentation for object registration, modeling and retrieval

    Get PDF
    This thesis considers different aspects of the utilization of contour information and syntactic and semantic image segmentation for object registration, modeling and retrieval in the context of content-based indexing and retrieval in large collections of images. Target applications include retrieval in collections of closed silhouettes, holistic w ord recognition in handwritten historical manuscripts and shape registration. Also, the thesis explores the feasibility of contour-based syntactic features for improving the correspondence of the output of bottom-up segmentation to semantic objects present in the scene and discusses the feasibility of different strategies for image analysis utilizing contour information, e.g. segmentation driven by visual features versus segmentation driven by shape models or semi-automatic in selected application scenarios. There are three contributions in this thesis. The first contribution considers structure analysis based on the shape and spatial configuration of image regions (socalled syntactic visual features) and their utilization for automatic image segmentation. The second contribution is the study of novel shape features, matching algorithms and similarity measures. Various applications of the proposed solutions are presented throughout the thesis providing the basis for the third contribution which is a discussion of the feasibility of different recognition strategies utilizing contour information. In each case, the performance and generality of the proposed approach has been analyzed based on extensive rigorous experimentation using as large as possible test collections

    Neighborhood Label Extension for Handwritten/Printed Text Separation in Arabic Documents

    Get PDF
    International audienceThis paper addresses the problem of handwritten and printed text separation in Arabic document images. The objective is to extract handwritten text from other parts of the document. This allows the application, in a second time, of a specialized processing on the extracted handwritten part or even on the printed one. Documents are first preprocessed in order to remove eventual noise and correct document orientation. Then, the document is segmented into pseudo-lines that are segmented in turn into pseudo-words. A local classification step, using a Gaussian kernel SVM, associates each pseudo-word into handwritten or printed classes. This label is then propagated in the pseudo-word's neighborhood in order to recover from classification errors. The proposed methodology has been tested on a set of public real Arabic documents achieving a separation rate of around 90%
    corecore