
    Computationally Efficient Implementation of Convolution-based Locally Adaptive Binarization Techniques

    One of the most important steps of document image processing is binarization. The computational requirements of locally adaptive binarization techniques make them unsuitable for devices with limited computing facilities. In this paper, we have presented a computationally efficient implementation of convolution-based locally adaptive binarization techniques, keeping the performance comparable to the original implementation. The computational complexity has been reduced from O(W²N²) to O(WN²), where W×W is the window size and N×N is the image size. Experiments over benchmark datasets show that the computation time has been reduced by 5 to 15 times, depending on the window size, while memory consumption remains the same with respect to the state-of-the-art algorithmic implementation.
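
    The abstract does not spell out the implementation, so the following is only an illustrative sketch of the general idea behind convolution-based adaptive thresholding with separable filtering: local means and variances are computed with two 1-D box convolutions, so each pixel costs O(W) rather than O(W²), matching the stated reduction from O(W²N²) to O(WN²). The Niblack rule, window size and k value are assumptions for illustration, not necessarily the paper's exact method.

        # Illustrative sketch (not the paper's exact implementation): a Niblack-style
        # locally adaptive threshold whose W x W local mean and variance come from
        # two separable 1-D box convolutions, i.e. O(W) work per pixel.
        import numpy as np

        def box_1d(a, w, axis):
            """1-D box sum of width w along the given axis (edges replicated)."""
            kernel = np.ones(w)
            return np.apply_along_axis(
                lambda v: np.convolve(np.pad(v, w // 2, mode='edge'),
                                      kernel, mode='valid')[:v.size],
                axis, a)

        def niblack_binarize(img, w=25, k=-0.2):
            img = img.astype(np.float64)
            n = float(w * w)
            s  = box_1d(box_1d(img, w, 0), w, 1)        # local sum
            s2 = box_1d(box_1d(img ** 2, w, 0), w, 1)   # local sum of squares
            mean = s / n
            std = np.sqrt(np.maximum(s2 / n - mean ** 2, 0.0))
            threshold = mean + k * std                  # Niblack: T = m + k * s
            return np.where(img > threshold, 255, 0).astype(np.uint8)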

    Eyes-Free Vision-Based Scanning of Aligned Barcodes and Information Extraction from Aligned Nutrition Tables

    Visually impaired (VI) individuals struggle with grocery shopping and have to rely on friends, family or grocery store associates for shopping. ShopMobile 2 is a proof-of-concept system that allows VI shoppers to shop independently in a grocery store using only their smartphone. Unlike other assistive shopping systems that use dedicated hardware, this system is a software-only solution that relies on fast computer vision algorithms. It consists of three modules: an eyes-free barcode scanner, an optical character recognition (OCR) module, and a tele-assistance module. The eyes-free barcode scanner allows VI shoppers to locate and retrieve products by scanning barcodes on shelves and on products. The OCR module allows shoppers to read nutrition facts on products, and the tele-assistance module allows them to obtain help from sighted individuals at remote locations. This dissertation discusses, provides implementations of, and presents laboratory and real-world experiments related to all three modules.
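
    The barcode module is described only at a high level; as a hedged illustration of a software-only scanning loop of this kind, the sketch below locates and decodes barcodes in camera frames with OpenCV and pyzbar. Both libraries, and the loop itself, are assumptions for illustration rather than the dissertation's actual code.

        # Hypothetical sketch of a software-only barcode scanning loop; OpenCV and
        # pyzbar stand in for the dissertation's (unspecified) vision algorithms.
        import cv2
        from pyzbar import pyzbar

        def scan_frame(frame):
            """Return the decoded barcode strings found in one camera frame."""
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            return [b.data.decode('utf-8') for b in pyzbar.decode(gray)]

        cap = cv2.VideoCapture(0)                # smartphone / webcam stream
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            for code in scan_frame(frame):
                print('UPC:', code)              # ShopMobile would give audio feedback here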

    FingerReader: A Wearable Device to Explore Printed Text on the Go

    Accessing printed text in a mobile context is a major challenge for the blind. A preliminary study with blind people reveals numerous difficulties with existing state-of-the-art technologies, including problems with alignment, focus, accuracy, mobility and efficiency. In this paper, we present a finger-worn device, FingerReader, that assists blind users with reading printed text on the go. We introduce a novel computer vision algorithm for local-sequential text scanning that enables reading single lines or blocks of text, or skimming the text, with complementary, multimodal feedback. This system is implemented in a small finger-worn form factor that enables a more manageable eyes-free operation with trivial setup. We offer findings from three studies performed to determine the usability of the FingerReader. (SUTD-MIT International Design Centre)
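
    The local-sequential scanning algorithm itself is not detailed in the abstract. Purely as an illustration of the idea of reading a single line near the finger, the sketch below OCRs a small window above a given fingertip position and reports only newly seen words; pytesseract and the fingertip coordinates are assumptions, not FingerReader's actual algorithm.

        # Illustration only: OCR a small window just above a (given) fingertip and
        # announce words the first time they appear, mimicking sequential reading.
        import cv2
        import pytesseract

        def read_local_window(frame, finger_xy, w=200, h=60):
            """OCR the region of the printed line directly above the fingertip."""
            x, y = finger_xy
            roi = frame[max(0, y - h):y, max(0, x - w // 2):x + w // 2]
            gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
            return pytesseract.image_to_string(gray).split()

        spoken = set()
        def announce(words):
            for word in words:
                if word not in spoken:           # speak / vibrate only for new words
                    spoken.add(word)
                    print('speak:', word)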

    Preprocessing for Images Captured by Cameras


    Named-Entity Recognition in Business Card Images

    We are surrounded by text everywhere: window signs, commercial logos and phone numbers plastered on trucks, flyers, take-away menus - and yet to capture and use all this information we essentially resort to typing these phone numbers and websites manually into a phone or computing device. We thought we should help change that, with the help of the mobile phone camera and OCR applications extracting the relevant textual information in these images. Basically, the problem can be seen as a two-step process: (1) extract characters/words from the image by OCR, and (2) classify the words as Name, Email, Phone No., etc. Our work was focused more on the first step: to reduce/minimize the time needed to perform it, given that we want to make it usable for mobile computing devices. However, computing on handheld devices involves a number of challenges. Because of the non-contact nature of digital cameras attached to handheld devices, acquired images very often suffer from skew and perspective distortion. Since we have to separate text from graphics/background, segmentation/binarization algorithms play a vital role in the process; we studied, analyzed and implemented existing standard algorithms. A number of thresholding techniques have previously been proposed using global and local techniques. OCR is done using Tesseract, an open-source OCR engine that was developed at HP between 1984 and 1994. The second step involves applying appropriate heuristics in order to achieve correct classification. Given a line of text, Named-Entity Recognition (NER) is in itself a different domain of research. We have come up with heuristics to identify named entities: the output of step 1 is given as input, and the system displays the information in the relevant field.
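
    Since the abstract names Tesseract for step 1 and heuristics for step 2 without giving them, here is a minimal sketch of the two-step pipeline: pytesseract (a Tesseract wrapper) for OCR, followed by simple regular-expression heuristics. The particular patterns and field names are illustrative assumptions, not the project's actual rules.

        # Minimal sketch of the two-step pipeline: OCR, then heuristic classification.
        # The regexes below are illustrative; the project's own heuristics may differ.
        import re
        import pytesseract
        from PIL import Image

        EMAIL = re.compile(r'[\w.+-]+@[\w-]+\.[\w.]+')
        PHONE = re.compile(r'\+?\d[\d\s().-]{7,}\d')
        URL   = re.compile(r'(www\.|https?://)\S+', re.I)

        def classify_card(image_path):
            fields = {'Email': [], 'Phone': [], 'Website': [], 'Other': []}
            text = pytesseract.image_to_string(Image.open(image_path))   # step 1: OCR
            for line in filter(None, (l.strip() for l in text.splitlines())):
                if EMAIL.search(line):                                    # step 2: classify
                    fields['Email'].append(EMAIL.search(line).group())
                elif URL.search(line):
                    fields['Website'].append(URL.search(line).group())
                elif PHONE.search(line):
                    fields['Phone'].append(PHONE.search(line).group())
                else:
                    fields['Other'].append(line)   # candidate name, title or address
            return fields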

    Print Culture, Digital Culture, Poetics and Hermeneutics: Discussion with J. Hillis Miller

    This paper is a response to Hillis Miller’s query on the author’s essay “Hillis Miller on the End of Literature.” The author basically agrees with Miller’s view on the shift from print culture to digital culture, explaining the special cultural context under which Chinese scholars emphasize the visual turn. Based on the rapid development of Chinese online literature, the author points out that print culture does not rival but coexists with digital culture. On the other hand, drawing on Aristotle’s Poetics and the insights of several leading figures of contemporary hermeneutics, the author contends that Miller’s dichotomy of poetics (form) and hermeneutics (content) is one-sided, since the two are compatible and integral, with concern for both content and form.

    DocMIR: An automatic document-based indexing system for meeting retrieval

    This paper describes the DocMIR system, which automatically captures, analyzes and indexes meetings, conferences, lectures, etc. by taking advantage of the documents projected during the events (e.g. slideshows, budget tables, figures, etc.). For instance, the system can automatically apply the above-mentioned procedures to a lecture and index the event according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter's computer nor any conscious intervention of the speaker throughout the presentation. The only material required by the system is the electronic presentation file of the speaker. Even if this file is not provided, the system still temporally segments the presentation and offers a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that record events synchronously. Once the recording is over, indexing is performed automatically by analyzing the content of the captured video containing the projected documents: the system detects scene changes, identifies the documents, computes their duration and extracts their textual content. Each captured image is identified from a repository containing all original electronic documents, captured audio-visual data and metadata created during post-production. The identification is based on documents' signatures, which hierarchically structure features from both the layout structure and the color distributions of the document images. Video segments are finally enriched with the textual content of the identified original documents, which further facilitates query and retrieval without using OCR. The signature-based indexing method proposed in this article is robust, works with low-resolution images and can be applied to several other applications, including real-time document recognition, multimedia IR and augmented reality systems.
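
    The document signatures are only characterized in prose (layout structure plus color distributions). As a hedged illustration of one indexing step, the sketch below detects slide changes in the captured video by comparing per-frame color histograms with OpenCV; the histogram signature and threshold are assumptions and omit the layout features DocMIR actually uses.

        # Hypothetical sketch of scene-change detection for projected documents:
        # compare coarse color-histogram signatures of consecutive frames and flag a
        # change when their correlation drops. Layout-structure features are omitted.
        import cv2

        def frame_signature(frame, bins=32):
            """Coarse color-distribution signature of one captured frame."""
            hist = cv2.calcHist([frame], [0, 1, 2], None, [bins] * 3,
                                [0, 256, 0, 256, 0, 256])
            return cv2.normalize(hist, hist).flatten()

        def detect_slide_changes(video_path, threshold=0.6):
            """Return timestamps (seconds) at which the projected document changes."""
            cap = cv2.VideoCapture(video_path)
            fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
            changes, prev, idx = [], None, 0
            while True:
                ok, frame = cap.read()
                if not ok:
                    break
                sig = frame_signature(frame)
                if prev is not None and cv2.compareHist(prev, sig, cv2.HISTCMP_CORREL) < threshold:
                    changes.append(idx / fps)
                prev, idx = sig, idx + 1
            return changes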

    Multibiometric security in wireless communication systems

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010. This thesis has aimed to explore an application of multibiometrics to secured wireless communications. The media of study for this purpose included Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess the performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition. First is the enrolment phase, in which a database of fingerprints watermarked with memorable texts, along with voice features based on the same texts, is created by sending them to the server through the wireless channel. Next is the verification stage, in which claimed users (those who claim to be genuine) are verified against the database; it consists of five steps. At the identification level, the user is first asked to present a fingerprint and a memorable word, the latter watermarked into the former, so that the system can authenticate the fingerprint, verify its validity, and retrieve the challenge for an accepted user. The following three steps then involve speaker recognition: the user responds to the challenge with text-dependent speech, the server authenticates the response, and finally the server accepts or rejects the user. In order to implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, an algorithm of five steps has been developed. The first three novel steps deal with fingerprint image enhancement (CLAHE with clip limit, standard deviation analysis and sliding neighborhood operations); they are followed by two further steps for embedding and extracting the watermark in the enhanced fingerprint image utilising the Discrete Wavelet Transform (DWT). In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reduced transmission time and reduced dependency of the data on the communication channel, together with no packet loss. Finally, the obtained results have verified the claims.
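
    The thesis describes the DWT embedding step in prose only. As a rough, non-authoritative illustration, the sketch below hides a short memorable word in the HH detail sub-band of an (already enhanced) fingerprint image using PyWavelets; the sub-band choice, strength factor and non-blind extraction are assumptions, not the thesis's algorithm.

        # Rough illustration of DWT-based watermark embedding, not the thesis's exact
        # scheme: the memorable word is converted to bits and added, with a small
        # strength factor, to the HH detail coefficients of the fingerprint image.
        import numpy as np
        import pywt

        def embed_watermark(fingerprint, word, alpha=8.0):
            """fingerprint: 2-D grayscale array; word: short ASCII string to hide."""
            ll, (lh, hl, hh) = pywt.dwt2(fingerprint.astype(np.float64), 'haar')
            bits = np.unpackbits(np.frombuffer(word.encode('ascii'), dtype=np.uint8))
            flat = hh.flatten()
            flat[:bits.size] += alpha * (2.0 * bits - 1.0)   # +alpha for 1, -alpha for 0
            return pywt.idwt2((ll, (lh, hl, flat.reshape(hh.shape))), 'haar')

        def extract_watermark(watermarked, original, n_chars):
            """Non-blind extraction: compare HH coefficients against the original."""
            _, (_, _, hh_w) = pywt.dwt2(watermarked.astype(np.float64), 'haar')
            _, (_, _, hh_o) = pywt.dwt2(original.astype(np.float64), 'haar')
            diff = (hh_w - hh_o).flatten()[:n_chars * 8]
            bits = (diff > 0).astype(np.uint8)
            return np.packbits(bits).tobytes().decode('ascii', errors='replace')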
