
    Named-Entity Recognition in Business Card Images

    We are surrounded by text everywhere: window signs, commercial logos and phone numbers plastered on trucks, flyers, take-away menus. Yet to capture and use all this information, we essentially resort to typing phone numbers and websites manually into a phone or other computing device. We thought we should help change that, using the mobile phone camera and OCR applications to extract the relevant textual information from these images. The problem can be seen as a two-step process: (1) extract characters and words from the image by OCR, and (2) classify the words as Name, Email, Phone No, etc. Our work focused on the first step, reducing the time needed to perform it, since we want the system to be usable on mobile computing devices. Computing on handheld devices, however, involves a number of challenges. Because of the non-contact nature of the digital cameras attached to handheld devices, acquired images very often suffer from skew and perspective distortion. Since we have to separate text from graphics and background, segmentation and binarization algorithms play a vital role in the process, so we studied, analyzed, and implemented existing standard algorithms; a number of thresholding techniques using both global and local methods have been proposed previously. OCR is done using Tesseract, an open-source OCR engine developed at HP between 1984 and 1994. The second step applies appropriate heuristics to achieve correct classification; given a line of text, Named-Entity Recognition (NER) is itself a distinct domain of research. We have come up with heuristics to identify named entities: the output of Step 1 is given as input, and the system displays the information in the relevant field.
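    The two steps above map naturally onto an OCR call followed by rule-based classification. Below is a minimal, hypothetical Python sketch, assuming the pytesseract wrapper and Pillow are installed; the regular expressions and file name are illustrative placeholders, not the project's exact rules.

import re

import pytesseract
from PIL import Image

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"(\+?\d[\d\s().-]{7,}\d)")
URL = re.compile(r"(https?://\S+|www\.\S+)")

def classify(line: str) -> str:
    """Assign a coarse entity label to one OCR'd line (Step 2 heuristic)."""
    if EMAIL.search(line):
        return "Email"
    if URL.search(line):
        return "Website"
    if PHONE.search(line):
        return "Phone No"
    return "Name/Other"   # fall-through; a real system would use richer cues

def read_card(path: str) -> dict:
    text = pytesseract.image_to_string(Image.open(path))  # Step 1: OCR
    return {ln: classify(ln) for ln in text.splitlines() if ln.strip()}

print(read_card("card.jpg"))   # placeholder image path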

    Accessible Shopping System for Blind and Visually Impaired Individuals

    Self-dependency is very important to disabled persons in their daily lives. We present a cost-effective prototype system that helps blind persons shop independently. This paper presents a camera-based label reader that reads the names on product labels to blind users. The proposed framework can be divided into image capturing, data processing, and audio output. In image capturing, a web camera is used to capture an image of the object or product packaging, and the captured image is sent to the image-processing platform. In data processing, the image is processed and the text is filtered out of the image. Finally, the filtered text is output to blind users in the form of voice. The Tesseract library is used to obtain plain text from the text region, and the Flite library is used to produce the audio output. The proposed framework is implemented on a Raspberry Pi board.
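    A minimal sketch of the capture-process-speak loop described above, assuming a Raspberry Pi with a USB webcam, OpenCV, pytesseract, and the flite command-line synthesizer; the device index and the Otsu thresholding choice are assumptions, not the paper's exact configuration.

import subprocess

import cv2
import pytesseract

cap = cv2.VideoCapture(0)          # webcam attached to the Pi (assumed index)
ok, frame = cap.read()
cap.release()
if not ok:
    raise RuntimeError("camera capture failed")

gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
# Simple global (Otsu) threshold to separate label text from the background.
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

text = pytesseract.image_to_string(binary).strip()   # filter text from image
if text:
    subprocess.run(["flite", "-t", text])            # speak the recognized label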

    Vision Based Extraction of Nutrition Information from Skewed Nutrition Labels

    An important component of a healthy diet is the comprehension and retention of nutritional information and an understanding of how different food items and nutritional constituents affect our bodies. In the U.S. and many other countries, nutritional information is primarily conveyed to consumers through nutrition labels (NLs), which can be found on all packaged food products. However, it can be challenging to utilize the information in these NLs, even for health-conscious consumers, who may not be familiar with nutritional terms or may find it difficult to integrate nutritional data collection into their daily activities due to lack of time, motivation, or training. It is therefore essential to automate this data collection and interpretation process with computer vision algorithms that extract nutritional information from NLs, because doing so improves the user's ability to engage in continuous nutritional data collection and analysis. To make nutritional data collection more manageable and enjoyable for users, we present a Proactive NUTrition Management System (PNUTS). PNUTS seeks to shift current research and clinical practices in nutrition management toward persuasion, automated nutritional information processing, and context-sensitive nutrition decision support. PNUTS consists of two modules. The first is a barcode scanning module that runs on smartphones and is capable of vision-based localization of One Dimensional (1D) Universal Product Code (UPC) and International Article Number (EAN) barcodes with relaxed pitch, roll, and yaw camera alignment constraints. The algorithm localizes barcodes in images by computing Dominant Orientations of Gradients (DOGs) of image segments and grouping smaller segments with similar DOGs into larger connected components. Connected components that pass given morphological criteria are marked as potential barcodes. The algorithm is implemented in a distributed, cloud-based system. The system's front end is a smartphone application that runs on Android smartphones with Android 4.2 or higher. The system's back end is deployed on a five-node Linux cluster where images are processed. The algorithm was evaluated on a corpus of 7,545 images extracted from 506 videos of bags, bottles, boxes, and cans in a supermarket. The DOG algorithm was coupled to our in-place scanner for 1D UPC and EAN barcodes. The scanner receives from the DOG algorithm the rectangular planar dimensions of a connected component and the component's dominant gradient orientation angle, referred to as the skew angle. The scanner draws several scan lines at that skew angle within the component to recognize the barcode in place without any rotations. The scanner coupled to the localizer was tested on the same corpus of 7,545 images. Laboratory experiments indicate that the system can localize and scan barcodes of any orientation in the yaw plane, of up to 73.28 degrees in the pitch plane, and of up to 55.5 degrees in the roll plane. The videos have been made public for all interested research communities to replicate our findings or to use them in their own research. The front-end Android application is available for free download at Google Play under the title NutriGlass. This module is also coupled to a comprehensive NL database from which nutritional information can be retrieved on demand. Currently our NL database consists of more than 230,000 products.
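    To make the DOG idea concrete, here is a hedged Python sketch: tile the image, compute each tile's dominant gradient orientation, and keep tiles whose gradients are strong and strongly aligned, as barcode bars are. Grouping the surviving tiles into connected components and applying the morphological criteria would follow; the tile size, thresholds, and file name are assumptions, not the published values.

import cv2
import numpy as np

def dominant_orientations(gray: np.ndarray, tile: int = 32):
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180        # orientation in [0, 180)
    h, w = gray.shape
    candidates = []
    for y in range(0, h - tile, tile):
        for x in range(0, w - tile, tile):
            m = mag[y:y + tile, x:x + tile]
            a = ang[y:y + tile, x:x + tile]
            strong = m > m.mean() + m.std()           # keep strong edges only
            if strong.sum() < 50:
                continue
            hist, edges = np.histogram(a[strong], bins=18, range=(0, 180))
            peak = hist.argmax()
            if hist[peak] / strong.sum() > 0.5:       # one dominant angle wins
                candidates.append((x, y, edges[peak]))
    return candidates   # (tile origin, dominant angle) tuples for grouping

gray = cv2.imread("shelf.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder image
print(len(dominant_orientations(gray)), "barcode-like tiles")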
The second module of PNUTS is an algorithm whose objective is to determine the text skew angle of an NL image without constraining the angle's magnitude. The horizontal, vertical, and diagonal matrices of the Two Dimensional (2D) Haar Wavelet Transform are used to identify 2D points with significant intensity changes. The set of points is bounded with a minimum-area rectangle whose rotation angle is the text's skew. The algorithm's performance is compared with the performance of five text skew detection algorithms on 1001 U.S. nutrition label images and 2200 single- and multi-column document images in multiple languages. To ensure the reproducibility of the reported results, the source code of the algorithm and the image data have been made publicly available. If the skew angle is estimated correctly, optical character recognition (OCR) techniques can be used to extract nutrition information.
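    A minimal sketch of the wavelet-based skew estimator, assuming PyWavelets and OpenCV: one level of the 2D Haar transform flags points with significant intensity change, and the minimum-area rectangle around those points yields the skew angle. The detail-coefficient threshold and the file name are assumptions.

import cv2
import numpy as np
import pywt

def estimate_skew(gray: np.ndarray) -> float:
    # One level of the 2D Haar DWT; keep the three detail matrices.
    _, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float32), "haar")
    energy = np.abs(cH) + np.abs(cV) + np.abs(cD)
    ys, xs = np.where(energy > energy.mean() + 2 * energy.std())
    pts = np.column_stack([xs, ys]).astype(np.float32)
    if len(pts) < 5:
        return 0.0
    (_, _), (_, _), angle = cv2.minAreaRect(pts)
    # minAreaRect's angle convention differs across OpenCV versions;
    # normalize to the smaller rotation from horizontal.
    if angle > 45:
        angle -= 90
    elif angle < -45:
        angle += 90
    return angle

gray = cv2.imread("label.png", cv2.IMREAD_GRAYSCALE)   # placeholder image
print("skew: %.1f degrees" % estimate_skew(gray))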

    FingerReader: A Wearable Device to Explore Printed Text on the Go

    Accessing printed text in a mobile context is a major challenge for the blind. A preliminary study with blind people reveals numerous difficulties with existing state-of-the-art technologies, including problems with alignment, focus, accuracy, mobility, and efficiency. In this paper, we present a finger-worn device, FingerReader, that assists blind users with reading printed text on the go. We introduce a novel computer vision algorithm for local-sequential text scanning that enables reading single lines or blocks of text, or skimming the text, with complementary, multimodal feedback. The system is implemented in a small finger-worn form factor that enables more manageable eyes-free operation with trivial setup. We offer findings from three studies performed to determine the usability of the FingerReader.
    SUTD-MIT International Design Centre

    Multibiometric security in wireless communication systems

    This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University, 05/08/2010. This thesis explores the application of multibiometrics to secured wireless communications. The media of study for this purpose included Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, based on user identification by fingerprint and further confirmation by verifying the user through text-dependent speaker recognition. First is the enrolment phase, in which a database of fingerprints watermarked with memorable texts, along with voice features based on the same texts, is created by sending them to the server over the wireless channel. Later comes the verification stage, at which claimed users, those who claim to be genuine, are verified against the database; it consists of five steps. Initially, at the identification level, a user is asked to present a fingerprint and a memorable word, with the word watermarked into the fingerprint image, so that the system can authenticate the fingerprint, verify its validity, and retrieve the challenge for an accepted user. The following three steps involve speaker recognition: the user responds to the challenge with text-dependent voice, the server authenticates the response, and finally the server accepts or rejects the user. To implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, a five-step algorithm has been developed. The first three novel steps concern fingerprint image enhancement (CLAHE with a clip limit, standard deviation analysis, and sliding neighborhood operations), followed by two further steps for embedding and extracting the watermark from the enhanced fingerprint image using the Discrete Wavelet Transform (DWT). In the speaker recognition stage, the limitations of this technique over wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reduced transmission time and reduced dependency of the data on the communication channel, together with no packet loss. Finally, the obtained results have verified these claims.
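    As an illustration of two pieces of the watermarking stage, the following Python sketch applies CLAHE with an explicit clip limit and additively embeds a short message into the horizontal DWT detail coefficients, assuming OpenCV and PyWavelets. The additive embedding rule and fixed strength are textbook simplifications, not the thesis's exact five-step algorithm.

import cv2
import numpy as np
import pywt

def embed_watermark(fingerprint: np.ndarray, message: str, strength=8.0):
    # Enhancement: CLAHE with an explicit clip limit.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(fingerprint)

    # Embedding: push the message bits into the horizontal detail band.
    cA, (cH, cV, cD) = pywt.dwt2(enhanced.astype(np.float32), "haar")
    bits = np.unpackbits(np.frombuffer(message.encode(), dtype=np.uint8))
    flat = cH.reshape(-1).copy()
    flat[: len(bits)] += strength * (2.0 * bits - 1.0)   # +/- strength per bit
    cH = flat.reshape(cH.shape)

    marked = pywt.idwt2((cA, (cH, cV, cD)), "haar")
    return np.clip(marked, 0, 255).astype(np.uint8)

fp = cv2.imread("fingerprint.png", cv2.IMREAD_GRAYSCALE)   # placeholder image
cv2.imwrite("fingerprint_marked.png", embed_watermark(fp, "memorable"))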

    Skip Trie Matching for Real-Time OCR Output Error Correction on Smartphones

    Many visually impaired individuals manage their daily activities with the help of smartphones. While there are many vision-based mobile applications to identify products, there is a relative dearth of applications for extracting useful nutrition information. In this report, we study the performance of existing OCR systems available for the Android platform and choose the best one to extract nutrition facts information from U.S. grocery store packages. We then provide approaches to improve the text strings produced by the Tesseract OCR engine on image segments of nutrition tables automatically extracted by an Android 2.3.6 smartphone application from real-time video streams of grocery products. We also present an algorithm, called Skip Trie Matching (STM), for real-time OCR output error correction on smartphones. The algorithm's performance is compared with Apache Lucene's spell checker; our evaluation indicates that the average run time of the STM algorithm is lower than Lucene's. (68 pages)
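    The STM algorithm itself is specified in the report; as a simplified illustration of the underlying idea, the sketch below walks a dictionary trie while allowing up to k character mismatches ("skips") and returns the dictionary words within that budget.

def build_trie(words):
    """Build a nested-dict trie; '$' marks end of word."""
    root = {}
    for w in words:
        node = root
        for ch in w:
            node = node.setdefault(ch, {})
        node["$"] = True
    return root

def correct(trie, word, k=1):
    """Return dictionary words reachable from `word` with at most k mismatches."""
    results = []

    def walk(node, i, skips, prefix):
        if skips > k:
            return                        # mismatch budget exhausted
        if i == len(word):
            if "$" in node:
                results.append(prefix)    # consumed the word at a valid end
            return
        for ch, child in node.items():
            if ch == "$":
                continue
            walk(child, i + 1, skips + (ch != word[i]), prefix + ch)

    walk(trie, 0, 0, "")
    return results

trie = build_trie(["calories", "calcium", "sodium", "protein"])
print(correct(trie, "cal0ries", k=1))     # -> ['calories']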

    Parking lot monitoring system using an autonomous quadrotor UAV

    The main goal of this thesis is to develop a drone-based parking lot monitoring system using low-cost hardware and open-source software. Similar to wall-mounted surveillance cameras, a drone-based system can monitor parking lots without affecting the flow of traffic while also offering the mobility of patrol vehicles. The Parrot AR Drone 2.0 is the quadrotor drone used in this work due to its modularity and cost efficiency. Video and navigation data (including GPS) are communicated to a host computer using a Wi-Fi connection. The host computer analyzes navigation data using a custom flight control loop to determine control commands to be sent to the drone. A new license plate recognition pipeline is used to identify license plates of vehicles from video received from the drone
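    The thesis's exact recognition pipeline is not reproduced here; the following hedged Python sketch shows one common shape such a pipeline takes, finding plate-like rectangular contours in a frame and running Tesseract on each crop. The aspect-ratio bounds, Tesseract configuration, and file name are assumptions.

import cv2
import pytesseract

def read_plates(frame):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(cv2.bilateralFilter(gray, 11, 17, 17), 30, 200)
    contours, _ = cv2.findContours(edges, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    plates = []
    for c in sorted(contours, key=cv2.contourArea, reverse=True)[:20]:
        x, y, w, h = cv2.boundingRect(c)
        if h == 0 or not (2.0 < w / h < 6.0) or w < 60:
            continue                      # not plate-shaped
        text = pytesseract.image_to_string(
            gray[y:y + h, x:x + w],
            config="--psm 7",             # treat the crop as one text line
        ).strip()
        if text:
            plates.append(text)
    return plates

frame = cv2.imread("parking_frame.jpg")   # placeholder frame from the drone feed
print(read_plates(frame))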

    Image-Based Mobile Service: Automatic Text Extraction and Translation

    We present a new mobile service for the translation of text from images taken by consumer-grade cell-phone cameras. Such a capability represents a new paradigm for users, where a simple image provides the basis for a service. The ubiquity and ease of use of cell-phone cameras enable acquisition and transmission of images anywhere and at any time a user wishes, delivering rapid and accurate translation over the phone's MMS and SMS facilities. Target text is extracted completely automatically, requiring no bounding box delineation or related user intervention. The service uses localization, binarization, text deskewing, and optical character recognition (OCR) in its analysis. Once the text is translated, an SMS message is sent to the user with the result. Further novelties include that no software installation is required on the handset, that any service provider or camera phone can be used, and that the entire service is implemented on the server side.
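    A minimal server-side sketch of the analysis chain named above (binarization, text deskewing, OCR), assuming OpenCV and pytesseract; the translation call is left as a placeholder for whatever machine translation backend the service uses.

import cv2
import numpy as np
import pytesseract

def extract_text(image_path: str) -> str:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Deskew: fit a minimum-area rectangle around the ink pixels.
    pts = cv2.findNonZero(binary)
    angle = cv2.minAreaRect(pts)[2]
    if angle > 45:                        # normalize OpenCV's angle convention
        angle -= 90
    elif angle < -45:
        angle += 90
    h, w = binary.shape
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    deskewed = cv2.warpAffine(binary, M, (w, h))

    return pytesseract.image_to_string(255 - deskewed).strip()

text = extract_text("sign.jpg")           # placeholder image path
# translated = some_mt_service.translate(text)   # hypothetical MT backend
print(text)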
