    A step towards understanding paper documents

    This report focuses on the analysis steps necessary for processing paper documents. It is divided into three major parts: document image preprocessing, knowledge-based geometric classification of the image, and expectation-driven text recognition. It first illustrates several low-level image processing procedures that provide the physical structure of a scanned document image. It then describes a knowledge-based approach developed for identifying logical objects (e.g., the sender or the footnote of a letter) in a document image. The logical identifiers allow the text they contain to be considered in a restricted context. By using specific logical dictionaries, expectation-driven text recognition can identify text parts of specific interest. The system has been implemented in Common Lisp on a SUN 3/60 workstation for the analysis of single-sided business letters. It runs on a large population of different letters. The report also illustrates and discusses examples of typical results obtained by the system.
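
    The dictionary-restricted recognition step described in this abstract can be illustrated with a minimal Python sketch. It is not the report's Common Lisp implementation: the dictionary contents, the function name `recognize_in_context`, and the similarity cutoff are all hypothetical, and fuzzy string matching from the standard library stands in for whatever matching the original system uses.

```python
from difflib import get_close_matches

# Hypothetical per-label dictionaries; the report does not list its actual entries.
LOGICAL_DICTIONARIES = {
    "sender": ["Acme GmbH", "Example Bank AG", "Northwind Traders Ltd"],
    "salutation": ["Dear Sir or Madam", "Dear Mr.", "Dear Ms."],
}


def recognize_in_context(raw_text, logical_label, cutoff=0.6):
    """Map a noisy recognized string to the closest entry of the dictionary
    attached to its logical label, or return None if nothing is close enough."""
    candidates = LOGICAL_DICTIONARIES.get(logical_label, [])
    matches = get_close_matches(raw_text, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # A misread sender line is only compared against sender entries.
    print(recognize_in_context("Examp1e Bamk AG", "sender"))
```

    Restricting the candidates to one logical label keeps the search space small, which is the point of using the logical structure as context for recognition.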

    UNDERSTANDING HANDWRITTEN TEXT IN A STRUCTURED ENVIRONMENT: DETERMINING ZIP CODES FROM ADDRESSES

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (e.g., signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance.

    Multi-script handwritten character recognition: Using feature descriptors and machine learning

    Offline printed Arabic character recognition

    Optical Character Recognition (OCR) shows great potential for rapid data entry, but has limited success when applied to the Arabic language. Normal OCR problems are compounded by the right-to-left nature of Arabic and because the script is largely connected. This research investigates current approaches to the Arabic character recognition problem and introduces a new approach. The main work involves a Haar-Cascade Classifier (HCC) approach modified for the first time for Arabic character recognition. This technique eliminates the problematic steps in the pre-processing and recognition phases in addition to the character segmentation stage. A classifier was produced for each of the 61 Arabic glyphs that exist after the removal of diacritical marks. These 61 classifiers were trained and tested on an average of about 2,000 images each. A Multi-Modal Arabic Corpus (MMAC) has also been developed to support this work. MMAC makes innovative use of the new concept of connected segments of Arabic words (PAWs) with and without diacritical marks. These new tokens have significance for linguistic as well as OCR research and applications and have been applied here in the post-processing phase. A complete Arabic OCR application has been developed to manipulate the scanned images and extract a list of detected words. It consists of the HCC to extract glyphs, systems for parsing and correcting these glyphs, and the MMAC to apply linguistic constraints. The HCC produces a recognition rate for Arabic glyphs of 87%. MMAC is based on 6 million words, is published on the web, and has been applied and validated both in research and commercial use.

    CAPTCHA Types and Breaking Techniques: Design Issues, Challenges, and Future Research Directions

    The proliferation of the Internet and mobile devices has given malicious bots access to genuine resources and data. Bots may instigate phishing, unauthorized access, denial-of-service, and spoofing attacks, to name a few. Authentication and testing mechanisms that verify end-users and prevent malicious programs from infiltrating services and data are strong defenses against malicious bots. The Completely Automated Public Turing test to tell Computers and Humans Apart (CAPTCHA) is an authentication process that confirms the user is a human before access is granted. This paper provides an in-depth survey of CAPTCHAs and focuses on two main things: (1) a detailed discussion of various CAPTCHA types along with their advantages, disadvantages, and design recommendations, and (2) an in-depth analysis of different CAPTCHA breaking techniques. The survey is based on over two hundred studies on the subject conducted from 2003 to date. The analysis reinforces the need to design more attack-resistant CAPTCHAs while keeping their usability intact. The paper also highlights the design challenges and open issues related to CAPTCHAs. Furthermore, it provides useful recommendations for breaking CAPTCHAs.

    Choreographing and Reinventing Chinese Diasporic Identities - An East-West Collaboration

    In demonstrating Eastern- and Western-based Chinese diasporic dances as equally critical and question-provoking in Chinese identity reconstructions, this research compares choreographic implications in the Hong Kong-Taiwan and Toronto-Vancouver dance milieus of recent decades (1990s to 2010s). An auto-ethnographic study of Yuri Ng's (Hong Kong) and Lin Hwai-min's (Taiwan) works versus my own (Toronto) and Wen Wei Wang's (Vancouver), it probes identities choreographed in place-constituted third spaces between Chinese selves and Euro-American Others. I suggest that these identities perpetrate hybrid movements and aesthetics of geo-cultural-political distinctness from the Chinese ancestral land ones, manifesting ultimate glocalization intersecting global political economies and local cultural-creative experiences. Echoing the diasporic habitats' cultural and socio-historical specificities, they are constantly (re)appropriated and reinvented via translation, interpretation, negotiation, and integration of East-West cultural-artistic and socio-political ingredients. The event unfolds such identities' placial uniqueness that indicates the same Chinese roots yet divergent diasporic routes. In reviewing Ng's balletic and contemporary photo-choreographic productions of post-British colonial Hong Kong-ness alongside Lin's repertories of Chinese traditional, Taiwan indigenous, American modern and Other artistic impacts noting Taiwanese-ness, the study unearths cultural roots as the core source of Chinese identity rebuilding from East Asian displacements. It traces an ingrained third space between Chinese historic-social values, Western cultural elements, and Other performing artistries of Hong Kong and Taiwanese belongings. Juxtaposing my Chinese traditional-based and transcultural Toronto dance projects with Wang's Vancouver balletic-contemporary fusions of Chinese iconicity, Chinese-Canadian identities marked by a hyphenated (third/in-between) space are associated as varying North American self-generated routes of social and artistic possibilities in a Canadian mosaic-cosmopolitical setting, the persistent state of Canadian becoming. My conclusion resolves the examined choreographic cases as continually developed through third-space instigated East-West cultural-political crossings plus interpenetrative local creativities and global receptivity. Of gains or losses, struggles or rebirths, the cases of placial-temporal significations elicit multiple questions on Chinese diasporic cultural infusions, social sustenance, artistic integrity, and identity representations amid East-West negotiations: my experiential reflection on the dance role and potency in the reimagining and remaking of Chinese diasporic identities.

    Bayesian Action–Perception Computational Model: Interaction of Production and Recognition of Cursive Letters

    In this paper, we study the collaboration of perception and action representations involved in cursive letter recognition and production. We propose a mathematical formulation for the whole perception–action loop, based on probabilistic modeling and Bayesian inference, which we call the Bayesian Action–Perception (BAP) model. Because it models both perception and action processes, it can be used to study how these processes interact. More precisely, the model includes a feedback loop from motor production, which implements an internal simulation of movement. Motor knowledge can therefore be involved during perception tasks. In this paper, we formally define the BAP model and show how it solves the following six varied cognitive tasks using Bayesian inference: i) letter recognition (purely sensory), ii) writer recognition, iii) letter production (with different effectors), iv) copying of trajectories, v) copying of letters, and vi) letter recognition (with internal simulation of movements). We present computer simulations of each of these cognitive tasks, and discuss experimental predictions and theoretical developments.
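
    As a rough illustration of the kind of Bayesian inference this abstract refers to, the following Python sketch applies Bayes' rule to a purely sensory letter-recognition toy problem. It is not the BAP model: the three letters, the two observable features, and every probability value are invented for the example, and the function name `posterior` is hypothetical.

```python
def posterior(prior, likelihood, observation):
    """Return P(letter | observation) for a discrete set of letters via Bayes' rule."""
    # Numerator of Bayes' rule for each letter: P(letter) * P(observation | letter).
    unnormalized = {
        letter: prior[letter] * likelihood[letter].get(observation, 0.0)
        for letter in prior
    }
    evidence = sum(unnormalized.values())  # P(observation)
    if evidence == 0.0:
        raise ValueError("observation has zero probability under every letter")
    return {letter: p / evidence for letter, p in unnormalized.items()}


# Invented numbers: a prior over letters and P(feature | letter).
PRIOR = {"a": 0.5, "o": 0.3, "u": 0.2}
LIKELIHOOD = {
    "a": {"closed_loop": 0.7, "open_loop": 0.3},
    "o": {"closed_loop": 0.9, "open_loop": 0.1},
    "u": {"closed_loop": 0.1, "open_loop": 0.9},
}

if __name__ == "__main__":
    beliefs = posterior(PRIOR, LIKELIHOOD, "closed_loop")
    print(max(beliefs, key=beliefs.get), beliefs)
```

    With these invented numbers, observing a closed loop favours "a" over "o" because the higher prior for "a" outweighs the slightly higher likelihood for "o"; observing an open loop shifts the belief towards "u".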