51 research outputs found

    Large vocabulary recognition for online Turkish handwriting with sublexical units

    We present a system for large-vocabulary recognition of online Turkish handwriting, using hidden Markov models. While using a traditional approach for the recognizer, we have identified and developed solutions for the main problems specific to Turkish handwriting recognition. First, since large amounts of Turkish handwriting data are not available, the system is trained and optimized using the large UNIPEN dataset of English handwriting before being extended to Turkish using a small Turkish dataset. The delayed strokes, which pose a significant source of variation in writing order due to the large number of diacritical marks in Turkish, are removed during preprocessing. Finally, as a solution to the high out-of-vocabulary rates encountered when using a fixed-size lexicon in general-purpose recognition, a lexicon is constructed from sublexical units (stems and endings) learned from a large Turkish corpus. A statistical bigram language model learned from the same corpus is also applied during the decoding process. The system obtains a 91.7% word recognition rate when tested on a small Turkish handwritten word dataset using a medium-sized (1,950-word) lexicon corresponding to the vocabulary of the test set, and 63.8% using a large, general-purpose lexicon (130,000 words). However, with the proposed stem+ending lexicon (12,500 words) and a bigram language model with lattice expansion, a 67.9% word recognition accuracy is obtained, surpassing the result obtained with the general-purpose lexicon while using a much smaller one.
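
    The core idea of decoding over sublexical units can be pictured with a small sketch: split words into stems and endings, build the lexicon from those units, and count unit bigrams for the language model. Everything below (the stem list, ending list, and toy words) is a hypothetical placeholder; the actual system learns its sublexical units and bigram statistics from a large Turkish corpus.

```python
from collections import Counter

# Hypothetical sublexical resources: the real system learns stems and endings
# from a large Turkish corpus instead of listing them by hand.
STEMS = {"ev", "kitap", "okul"}                  # toy stems
ENDINGS = {"", "ler", "lerde", "lar", "larda"}   # toy endings

def split_word(word):
    """Greedily split a word into a known stem and a known ending."""
    for i in range(len(word), 0, -1):
        stem, ending = word[:i], word[i:]
        if stem in STEMS and ending in ENDINGS:
            return stem, ending
    return word, ""        # fall back: treat the whole word as a stem

def build_sublexical_model(words):
    """Collect the stem+ending lexicon and bigram counts over sublexical units."""
    lexicon, bigrams, prev = set(), Counter(), "<s>"
    for word in words:
        stem, ending = split_word(word)
        units = [stem] + ([ending] if ending else [])
        lexicon.update(units)
        for unit in units:
            bigrams[(prev, unit)] += 1
            prev = unit
    return lexicon, bigrams

lexicon, bigrams = build_sublexical_model(["evlerde", "kitap", "okullarda"])
print(sorted(lexicon))          # ['ev', 'kitap', 'larda', 'lerde', 'okul']
print(bigrams.most_common(3))
```

    In the actual system the resulting sublexical lexicon is much smaller than a word lexicon of comparable coverage, which is exactly the trade-off the abstract reports.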

    Document image processing using irregular pyramid structure

    Ph.D. (Doctor of Philosophy)

    A large vocabulary online handwriting recognition system for Turkish

    Handwriting recognition in general, and online handwriting recognition in particular, has been an active research area for several decades. Most of the research has focused on English and, more recently, on other scripts such as Arabic and Chinese. There is a lack of research on recognition of Turkish text, and this work fills that gap with a state-of-the-art recognizer for the first time. It contains the design and implementation details of a complete system for the recognition of Turkish isolated words. Based on Hidden Markov Models, the system comprises pre-processing, feature extraction, optical modeling and language modeling modules. It considers the recognition of unconstrained handwriting with a limited vocabulary size first and then evolves into a large-vocabulary system. Turkish script has many similarities with other Latin scripts, such as English, which makes it possible to adapt strategies that work for them. However, some issues particular to Turkish need to be considered separately. Two challenging issues in the recognition of Turkish text are identified: delayed strokes, which introduce an extra source of variation in the sequence order of the handwritten input, and the high Out-of-Vocabulary (OOV) rate of Turkish when words are used as vocabulary units in the decoding process. This work examines these problems and alternative solutions in depth and proposes solutions suited specifically to Turkish script. For delayed stroke handling, a clear definition of delayed strokes is first developed, and several handling methods based on that definition are then evaluated extensively on the UNIPEN and Turkish datasets. The best results are obtained by removing all delayed strokes, with recognition accuracy increases of up to 2.13 and 2.03 percentage points over the English and Turkish baselines, respectively. The overall system performance is assessed as 86.1% with a 1,000-word lexicon and 83.0% with a 3,500-word lexicon on the UNIPEN dataset, and 91.7% on the Turkish dataset. Alternative decoding vocabularies are designed with grammatical sub-lexical units in order to solve the problem of the high OOV rate. Additionally, statistical bi-gram and tri-gram language models are applied during the decoding process. The best performance, 67.9%, is obtained with the large stem+ending vocabulary expanded with a bi-gram model on the Turkish dataset. This result is superior to the accuracy of the word-based vocabulary (63.8%) at the same coverage of 95% on the BOUN Web Corpus.
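
    The delayed-stroke removal step described above can be illustrated with a simple heuristic sketch: treat a stroke that jumps back horizontally over ink that has already been written as a delayed diacritic and drop it. The data layout and the threshold below are illustrative assumptions; the thesis develops its own, more careful definition of delayed strokes.

```python
# A stroke is a list of (x, y) pen positions, given in writing order.
def remove_delayed_strokes(strokes, tolerance=5.0):
    """Drop strokes that jump back horizontally over already-written ink.

    Heuristic sketch: in left-to-right writing, a stroke starting clearly to
    the left of the rightmost point written so far is assumed to be a delayed
    stroke (e.g. a dot or diacritic added after the word body).
    """
    kept = []
    rightmost = float("-inf")
    for stroke in strokes:
        start_x = stroke[0][0]
        if kept and start_x < rightmost - tolerance:
            continue                      # treat as delayed: skip it
        kept.append(stroke)
        rightmost = max(rightmost, max(x for x, _ in stroke))
    return kept

# Toy example: the third stroke is a dot added back over the first letter.
word = [
    [(0, 0), (5, 10), (10, 0)],           # first letter body
    [(12, 0), (20, 10), (28, 0)],         # second letter body
    [(4, 15), (5, 16)],                   # delayed dot over the first letter
]
print(len(remove_delayed_strokes(word)))  # 2
```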

    Table recognition in mathematical documents

    While a number of techniques have been developed for table recognition in ordinary text documents, these techniques are often ineffective when dealing with tables in mathematical documents, as tables containing mathematical structures can differ quite significantly from ordinary text tables. In fact, it is even difficult to clearly distinguish table recognition in mathematics from layout analysis of mathematical formulas. Likewise, it is not straightforward to adapt general layout analysis techniques to mathematical formulas. However, a reliable understanding of formula layout is often a necessary prerequisite to further semantic interpretation of the represented formulae. In this thesis, we present the necessary preprocessing steps towards a table recognition technique that specialises in tables in mathematical documents. It is based on our novel, robust line recognition technique for mathematical expressions, which is fully independent of understanding the content or specialist fonts of expressions. We also present a graph representation for complex mathematical table structures. A set of rewriting rules applied to the graph allows for reliable re-composition of cells in order to identify several valid table interpretations. We demonstrate the effectiveness of our technique by applying it to a set of mathematical tables from a standard textbook that has been manually ground-truthed.
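
    The cell-graph idea sketched in the abstract can be made concrete with a toy example: cells are nodes carrying bounding boxes, edges record horizontal adjacency, and a rewrite rule re-composes adjacent cells whose contents are fragments of one expression. The data structures and the single rule below are illustrative assumptions, not the thesis's actual graph model or rule set.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    x0: float; y0: float; x1: float; y1: float   # bounding box
    text: str

def horizontally_adjacent(a, b, gap=10.0):
    """True if b starts just right of a and the two boxes overlap vertically."""
    return 0 <= b.x0 - a.x1 <= gap and not (b.y1 < a.y0 or b.y0 > a.y1)

def build_adjacency(cells):
    """Edges of the cell graph: pairs of horizontally adjacent cells."""
    return [(i, j) for i, a in enumerate(cells)
                   for j, b in enumerate(cells)
                   if i != j and horizontally_adjacent(a, b)]

def merge_fragments(cells):
    """Rewrite rule (sketch): merge adjacent cells when the left one ends in
    an operator, i.e. its content is an incomplete expression fragment."""
    cells = list(cells)
    changed = True
    while changed:
        changed = False
        for i, j in build_adjacency(cells):
            if cells[i].text.rstrip().endswith(("+", "-", "=")):
                a, b = cells[i], cells[j]
                merged = Cell(a.x0, min(a.y0, b.y0), b.x1, max(a.y1, b.y1),
                              a.text + " " + b.text)
                cells = [c for k, c in enumerate(cells) if k not in (i, j)]
                cells.append(merged)
                changed = True
                break
    return cells

row = [Cell(0, 0, 40, 20, "x ="), Cell(45, 0, 90, 20, "y + 1")]
print([c.text for c in merge_fragments(row)])   # ['x = y + 1']
```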

    Feature Extraction Methods for Character Recognition

    Abstract not included.

    Methoden der lexikalischen Nachkorrektur OCR-erfasster Dokumente

    Machine reading, i.e. the conversion of printed documents via a pixel representation into a sequence of symbols, is performed almost error-free by today's commercially available OCR engines for many document classes. Nevertheless, for most OCR applications the rule remains: the fewer errors, the better. For example, a misrecognized name in a business letter can cause unnecessary costs in an automated incoming-mail distribution system through misrouting and the like. Lexical post-correction helps to detect remaining errors of OCR engines, to correct them, or to eliminate them through interactive correction. Besides being realized as a downstream, external component, lexical post-correction can also be integrated directly into an OCR engine. My contribution to lexical post-correction is organized into ten theses. Thesis T1: For the post-correction of OCR-read specialist texts, lexica derived from thematically related web documents can be employed profitably. Thesis T2: The vocabulary of a specialist text is insufficiently covered by large standard lexica; text extraction from thematically related web documents yields lexica with a higher coverage rate, and the frequency information from these web documents reflects that of the specialist text better than frequency information from standard corpora. Thesis T3: Automated queries to search engines provide suitable access to the relevant web documents of a subject area. Thesis T4: A fine-grained error classification allows the two main error sources of web-based post-correction to be located: false friends, i.e. errors that remain undetected because they are lexical, and unfortunate corrections towards orthographic or inflectional variants. Thesis T5: False friends are markedly reduced by combining several OCR engines. Thesis T6: Simple heuristics prevent the post-correction component from making unfortunate variant substitutions. Thesis T7: By unifying them into scores, diverse OCR post-correction aids such as word distance measures, frequency information and context information can be combined and used for candidate selection and threshold determination. Thesis T8: OCR post-correction is a multidimensional parameter optimization problem, involving for example the choice of scores, their combination and weighting, threshold determination, and lexicon selection; a graphical interface is well suited to examining the parameters and adjusting them on training data. Thesis T9: The software for parameter optimization of the post-correction of a single OCR engine's results can be reused for the combination of several OCR engines by again unifying the individual engine results into scores. Thesis T10: A word-to-word alignment, as required for ground-truth creation and for the combination of OCR engines, can be realized efficiently by generalizing the Levenshtein distance to the word level.
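
    Thesis T10 rests on lifting the Levenshtein distance from characters to words. A minimal sketch of that word-level alignment follows; the unit costs, the backtrace conventions and the toy sentence are illustrative assumptions, not the dissertation's exact formulation.

```python
def align_words(ocr_tokens, truth_tokens):
    """Word-level Levenshtein alignment: returns (distance, aligned pairs),
    where a pair (a, None) is a deletion and (None, b) an insertion."""
    n, m = len(ocr_tokens), len(truth_tokens)
    # dp[i][j] = edit distance between the first i OCR and first j truth tokens
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if ocr_tokens[i - 1] == truth_tokens[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete OCR token
                           dp[i][j - 1] + 1,        # insert truth token
                           dp[i - 1][j - 1] + sub)  # match / substitute
    # Backtrace to recover the aligned token pairs.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (
                0 if ocr_tokens[i - 1] == truth_tokens[j - 1] else 1):
            pairs.append((ocr_tokens[i - 1], truth_tokens[j - 1])); i -= 1; j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            pairs.append((ocr_tokens[i - 1], None)); i -= 1
        else:
            pairs.append((None, truth_tokens[j - 1])); j -= 1
    return dp[n][m], list(reversed(pairs))

dist, pairs = align_words("Dcr alte Mann".split(), "Der alte Mann".split())
print(dist, pairs)   # 1 [('Dcr', 'Der'), ('alte', 'alte'), ('Mann', 'Mann')]
```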

    Computer vision based classification of fruits and vegetables for self-checkout at supermarkets

    The field of machine learning, and in particular methods to improve the capability of machines to perform a wider variety of generalised tasks, is among the most rapidly growing research areas in today’s world. The current applications of machine learning and artificial intelligence can be divided into several significant fields, namely computer vision, data science, real-time analytics and Natural Language Processing (NLP). All these applications are being used to help computer-based systems operate more usefully in everyday contexts. Computer vision research is currently active in a wide range of areas such as the development of autonomous vehicles, object recognition, Content-Based Image Retrieval (CBIR), image segmentation and terrestrial analysis from space (e.g. crop estimation). Despite significant prior research, the area of object recognition still has many topics to be explored. This PhD thesis focuses on using advanced machine learning approaches to enable the automated recognition of fresh produce (i.e. fruits and vegetables) at supermarket self-checkouts. This type of complex classification task is one of the most recently emerging applications of advanced computer vision approaches and is a productive research topic in this field due to the limited means available for representing the features and the machine learning techniques for classification. Fruits and vegetables exhibit significant inter- and intra-class variance in weight, shape, size, colour and texture, which makes the classification challenging. Effective fruit and vegetable classification has significant importance in daily life, e.g. crop estimation, robotic harvesting, fruit quality assessment, etc. One potential application of this fruit and vegetable classification capability is supermarket self-checkouts. Increasingly, supermarkets are introducing self-checkouts in stores to make the checkout process easier and faster. However, there are a number of challenges with this, as not all goods can readily be sold with packaging and barcodes, for instance loose fresh items (e.g. fruits and vegetables). Adding barcodes to these types of items individually is impractical, and pre-packaging limits the freedom of choice when selecting fruits and vegetables and creates additional waste, hence reducing customer satisfaction. The current situation, which relies on customers correctly identifying produce themselves, leaves open the potential for incorrect billing, either due to inadvertent error or due to intentional fraudulent misclassification, resulting in financial losses for the store. To address this identified problem, the main goals of this PhD work are: (a) exploring the types of visual and non-visual sensors that could be incorporated into a self-checkout system for classification of fruits and vegetables, (b) determining a suitable feature representation method for fresh produce items available at supermarkets, (c) identifying optimal machine learning techniques for classification within this context and (d) evaluating our work relative to the state-of-the-art object classification results presented in the literature. An in-depth analysis of related computer vision literature and techniques is performed to identify and implement the possible solutions. A progressive process distribution approach is used for this project, where the task of computer vision based fruit and vegetable classification is divided into pre-processing and classification stages. Different classification techniques have been implemented and evaluated as possible solutions to this problem. Both visual and non-visual features of fruit and vegetables are exploited to perform the classification. Novel classification techniques have been carefully developed to deal with the complex and highly variant physical features of fruit and vegetables while taking advantage of both visual and non-visual features. The capability of the classification techniques is tested both individually and in ensembles to achieve higher effectiveness. Significant results have been obtained, from which it can be concluded that fruit and vegetable classification is a complex task with many challenges involved. It is also observed that a larger dataset can better capture the complex, variable features of fruit and vegetables. Complex multidimensional features can be extracted from larger datasets to generalise over a higher number of classes. However, development of a larger multiclass dataset is an expensive and time-consuming process. The effectiveness of classification techniques can be significantly improved by removing background occlusions and complexities. It is also worth mentioning that an ensemble of simple, less complicated classification techniques can achieve effective results even when applied to fewer features and a smaller number of classes. The combination of visual and non-visual features can reduce the difficulty a classification technique has in dealing with a higher number of classes with similar physical features. Classification of fruit and vegetables with similar physical features (e.g. colour and texture) requires careful estimation and hyper-dimensional embedding of visual features. Implementing rigorous classification penalties as a loss function can achieve this goal at the cost of time and computational requirements. There is a significant need to develop larger datasets for different fruit- and vegetable-related computer vision applications. More sophisticated loss-function penalties and discriminative hyper-dimensional feature embedding techniques can significantly improve the effectiveness of classification techniques for fruit and vegetable applications.
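
    As a concrete picture of combining visual and non-visual features in an ensemble, the sketch below concatenates a coarse colour histogram with a scale (weight) reading and trains a soft-voting ensemble of two standard classifiers. The features, classifiers and synthetic data are illustrative assumptions, not the pipeline developed in the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.svm import SVC

def features(image, weight_grams, bins=8):
    """Concatenate a coarse colour histogram (visual) with weight (non-visual)."""
    hist, _ = np.histogram(image, bins=bins, range=(0, 255), density=True)
    return np.append(hist, weight_grams / 1000.0)   # crude scaling of the weight

# Synthetic stand-ins for produce images (H x W x 3 arrays) and scale readings.
rng = np.random.default_rng(0)
apples  = [(rng.integers(100, 200, (32, 32, 3)), rng.normal(180, 20)) for _ in range(20)]
bananas = [(rng.integers(150, 255, (32, 32, 3)), rng.normal(120, 15)) for _ in range(20)]

X = np.array([features(img, w) for img, w in apples + bananas])
y = np.array([0] * len(apples) + [1] * len(bananas))   # 0 = apple, 1 = banana

# Soft-voting ensemble of two dissimilar base classifiers.
ensemble = VotingClassifier(
    estimators=[("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
                ("svm", SVC(probability=True, random_state=0))],
    voting="soft")
ensemble.fit(X, y)
print(ensemble.predict(X[:3]))   # predictions on the first few training samples
```

    Soft voting lets the two base models compensate for each other's weaknesses, which mirrors the abstract's observation that ensembles of simple classifiers can remain effective with modest feature sets.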

    Making Presentation Math Computable

    This open-access book addresses the issue of translating mathematical expressions from LaTeX to the syntax of Computer Algebra Systems (CAS). Over the past decades, especially in the domain of Science, Technology, Engineering, and Mathematics (STEM), LaTeX has become the de-facto standard for typesetting mathematical formulae in publications. Since scientists are generally required to publish their work, LaTeX has become an integral part of today's publishing workflow. On the other hand, modern research increasingly relies on CAS to simplify, manipulate, compute, and visualize mathematics. However, existing LaTeX import functions in CAS are limited to simple arithmetic expressions and are, therefore, insufficient for most use cases. Consequently, the workflow of experimenting and publishing in the sciences often includes time-consuming and error-prone manual conversions between presentational LaTeX and computational CAS formats. To address the lack of a reliable and comprehensive translation tool between LaTeX and CAS, this thesis makes the following three contributions. First, it provides an approach to semantically enhance LaTeX expressions with sufficient semantic information for translations into CAS syntaxes. Second, it demonstrates the first context-aware LaTeX-to-CAS translation framework, LaCASt. Third, the thesis provides a novel approach to evaluate the performance of LaTeX-to-CAS translations on large-scale datasets with automatic verification of equations in digital mathematical libraries.
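
    A toy illustration of the presentation-to-computation gap the book addresses: naive pattern rewriting from LaTeX into a generic CAS-style input syntax. This is a deliberately minimal sketch with hand-picked rules; it is not LaCASt, and it ignores exactly the contextual and semantic information that the book's approach supplies.

```python
import re

# A handful of purely syntactic rewrite rules from LaTeX to a generic
# CAS-style syntax; real translation needs semantic and contextual information.
RULES = [
    (r"\\frac\{([^{}]*)\}\{([^{}]*)\}", r"((\1)/(\2))"),
    (r"\\sqrt\{([^{}]*)\}",             r"sqrt(\1)"),
    (r"\\(sin|cos|tan|log|exp)",        r"\1"),
    (r"\\pi",                           r"Pi"),
    (r"\^\{([^{}]*)\}",                 r"^(\1)"),
    (r"\\cdot",                         r"*"),
]

def latex_to_cas(expr):
    """Apply the rewrite rules repeatedly until the expression stops changing."""
    previous = None
    while expr != previous:
        previous = expr
        for pattern, replacement in RULES:
            expr = re.sub(pattern, replacement, expr)
    return expr

print(latex_to_cas(r"\frac{x^{2}}{2} + \sqrt{\pi} \cdot \sin x"))
# ((x^(2))/(2)) + sqrt(Pi) * sin x
```

    Even on this tiny example the rules must be iterated (the inner exponent has to be rewritten before the fraction matches), which hints at why robust, context-aware translation is a substantial research problem rather than a string-rewriting exercise.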