54 research outputs found

    Skip Trie Matching for Real-Time OCR Output Error Corrrection on Smartphones

    Get PDF
    Many Visually Impaired individuals are managing their daily activities with the help of smartphones. While there are many vision-based mobile applications to identify products, there is a relative dearth of applications for extracting useful nutrition information. In this report, we study the performance of existing OCR systems available for the Android platform, and choose the best to extract the nutrition facts information from U.S grocery store packages. We then provide approaches to improve the results of text strings produced by the Tesseract OCR engine on image segments of nutrition tables automatically extracted by an Android 2.3.6 smartphone application using real-time video streams of grocery products. We also present an algorithm, called Skip Trie Matching (STM), for real-time OCR output error correction on smartphones. The algorithm’s performance is compared with Apache Lucene’s spell checker. Our evaluation indicates that the average run time of the STM algorithm is lower than Lucene’s. (68 pages

    Modelling distances between genetically related languages using an extended weighted Levenshtein distance

    Get PDF
    Abstract: This article proposes the use of an extended weighted Levenshtein distance to model the time depth between parent and direct descendant languages and also the dialectal separation between sibling languages. The parent language is usually a proto-language, a hypothetical reconstructed language, whose precise date is usually conjectural. Phonology is used as an indicator of language difference, which is modelled by means of an extended weighted Levenshtein distance. This idea is applied specifically to the Iranian language family

    Synchronization using insertion/deletion correcting permutation codes

    Get PDF
    Abstract: In this paper, we present a fast synchronization coding scheme, which uses single insertion/deletion error correcting permutation codes. A possible application to M-ary FSK for the CENELEC A band power-line communications (PLC) is considered. Compared to conventional timing recovery schemes, no redundancies for preamble sequences and no processing delays from decision devices are needed

    JFLEG: A Fluency Corpus and Benchmark for Grammatical Error Correction

    Full text link
    We present a new parallel corpus, JHU FLuency-Extended GUG corpus (JFLEG) for developing and evaluating grammatical error correction (GEC). Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits to not only correct grammatical errors but also make the original text more native sounding. We describe the types of corrections made and benchmark four leading GEC systems on this corpus, identifying specific areas in which they do well and how they can improve. JFLEG fulfills the need for a new gold standard to properly assess the current state of GEC.Comment: To appear in EACL 2017 (short papers

    A simple way to estimate similarity between pairs of eye movement sequences

    Get PDF
    We propose a novel algorithm to estimate the similarity between a pair of eye movement sequences. The proposed algorithm relies on a straight-forward geometric representation of eye movement data. The algorithm is considerably simpler to implement and apply than existing similarity measures, and is particularly suited for exploratory analyses. To validate the algorithm, we conducted a benchmark experiment using realistic artificial eye movement data. Based on similarity ratings obtained from the proposed algorithm, we defined two clusters in an unlabelled set of eye movement sequences. As a measure of the algorithm's sensitivity, we quantified the extent to which these data-driven clusters matched two pre-defined groups (i.e., the 'real' clusters). The same analysis was performed using two other, commonly used similarity measures. The results show that the proposed algorithm is a viable similarity measure

    Evaluierung von signalnahen Spracherkennungssystemen für deutsche Spontansprache

    Get PDF

    Secure approximation of edit distance on genomic data

    Get PDF
    © 2017 The Author(s). Background: Edit distance is a well established metric to quantify how dissimilar two strings are by counting the minimum number of operations required to transform one string into the other. It is utilized in the domain of human genomic sequence similarity as it captures the requirements and leads to a better diagnosis of diseases. However, in addition to the computational complexity due to the large genomic sequence length, the privacy of these sequences are highly important. As these genomic sequences are unique and can identify an individual, these cannot be shared in a plaintext. Methods: In this paper, we propose two different approximation methods to securely compute the edit distance among genomic sequences. We use shingling, private set intersection methods, the banded alignment algorithm, and garbled circuits to implement these methods. We experimentally evaluate these methods and discuss both advantages and limitations. Results: Experimental results show that our first approximation method is fast and achieves similar accuracy compared to existing techniques. However, for longer genomic sequences, both the existing techniques and our proposed first method are unable to achieve a good accuracy. On the other hand, our second approximation method is able to achieve higher accuracy on such datasets. However, the second method is relatively slower than the first proposed method. Conclusion: The proposed algorithms are generally accurate, time-efficient and can be applied individually and jointly as they have complimentary properties (runtime vs. accuracy) on different types of datasets
    • …
    corecore