
    Word Recognition in Nutrition Labels with Convolutional Neural Network

    As working hours increase, many people struggle to balance work and family life and pay little attention to a healthy diet. An important part of a healthy eating routine is the comprehension and retention of nutritional information and an understanding of how different foods and nutritional constituents affect our bodies. In the U.S. and many other countries, nutritional information is primarily conveyed to consumers through nutrition labels (NLs), which appear on all packaged food products in the form of a nutrition table. However, this information can be difficult to use even for health-conscious consumers, who may be unfamiliar with nutritional terms or find it hard to integrate nutritional information into their daily activities due to lack of time, motivation, or training. It is therefore essential to automate this information gathering and interpretation process with machine learning algorithms that extract nutritional information from NLs, because automation improves the consumer's ability to engage in continuous nutritional information gathering and analysis.

    Vision Based Extraction of Nutrition Information from Skewed Nutrition Labels

    An important component of a healthy diet is the comprehension and retention of nutritional information and an understanding of how different food items and nutritional constituents affect our bodies. In the U.S. and many other countries, nutritional information is primarily conveyed to consumers through nutrition labels (NLs), which can be found on all packaged food products. However, it can be challenging to utilize the information available in these NLs, even for health-conscious consumers, as they might not be familiar with nutritional terms or may find it difficult to integrate nutritional data collection into their daily activities due to lack of time, motivation, or training. It is therefore essential to automate this data collection and interpretation process with computer vision algorithms that extract nutritional information from NLs, because automation improves the user's ability to engage in continuous nutritional data collection and analysis. To make nutritional data collection more manageable and enjoyable for users, we present a Proactive NUTrition Management System (PNUTS). PNUTS seeks to shift current research and clinical practices in nutrition management toward persuasion, automated nutritional information processing, and context-sensitive nutrition decision support.

    PNUTS consists of two modules. The first is a barcode scanning module that runs on smartphones and is capable of vision-based localization of one-dimensional (1D) Universal Product Code (UPC) and International Article Number (EAN) barcodes with relaxed pitch, roll, and yaw camera alignment constraints. The algorithm localizes barcodes in images by computing Dominant Orientations of Gradients (DOGs) of image segments and grouping smaller segments with similar DOGs into larger connected components. Connected components that pass given morphological criteria are marked as potential barcodes. The algorithm is implemented in a distributed, cloud-based system: the front end is a smartphone application that runs on Android 4.2 or higher, and the back end is deployed on a five-node Linux cluster where the images are processed.

    The algorithm was evaluated on a corpus of 7,545 images extracted from 506 videos of bags, bottles, boxes, and cans in a supermarket. The DOG algorithm was coupled to our in-place scanner for 1D UPC and EAN barcodes. The scanner receives from the DOG algorithm the rectangular planar dimensions of a connected component and the component's dominant gradient orientation angle, referred to as the skew angle, and draws several scan lines at that angle within the component to recognize the barcode in place, without any rotations. The scanner coupled to the localizer was tested on the same corpus of 7,545 images. Laboratory experiments indicate that the system can localize and scan barcodes of any orientation in the yaw plane, of up to 73.28 degrees in the pitch plane, and of up to 55.5 degrees in the roll plane. The videos have been made public for all interested research communities to replicate our findings or to use them in their own research. The front-end Android application is available for free download on Google Play under the title NutriGlass. This module is also coupled to a comprehensive NL database from which nutritional information can be retrieved on demand; the database currently contains more than 230,000 products.
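    The abstract stops short of implementation detail; the sketch below is a minimal illustration of the dominant-orientations-of-gradients idea, assuming OpenCV and NumPy. The block size, orientation histogram resolution, thresholds, and the file name shelf.jpg are arbitrary placeholders, not values from the thesis.

import cv2
import numpy as np

def dominant_orientations(gray, block=32, n_bins=18, mag_thresh=40.0):
    """Estimate the dominant gradient orientation of each image block.

    Returns a (rows, cols) array of dominant orientation bin indices,
    or -1 for blocks with too little gradient energy.
    """
    gx = cv2.Sobel(gray, cv2.CV_32F, 1, 0)
    gy = cv2.Sobel(gray, cv2.CV_32F, 0, 1)
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0  # orientation, not direction

    h, w = gray.shape
    rows, cols = h // block, w // block
    dogs = np.full((rows, cols), -1, dtype=np.int32)
    for r in range(rows):
        for c in range(cols):
            sl = np.s_[r * block:(r + 1) * block, c * block:(c + 1) * block]
            m, a = mag[sl], ang[sl]
            if m.mean() < mag_thresh:
                continue  # flat block: no dominant orientation
            hist, _ = np.histogram(a, bins=n_bins, range=(0, 180), weights=m)
            dogs[r, c] = int(hist.argmax())
    return dogs

def barcode_candidates(dogs, min_blocks=6):
    """Group adjacent blocks that share a dominant orientation and keep
    large groups as potential barcode regions (in block coordinates)."""
    regions = []
    for o in range(int(dogs.max()) + 1):
        mask = (dogs == o).astype(np.uint8)
        n, labels = cv2.connectedComponents(mask)
        for lbl in range(1, n):
            ys, xs = np.nonzero(labels == lbl)
            if len(ys) >= min_blocks:  # crude morphological criterion
                regions.append((ys.min(), xs.min(), ys.max(), xs.max(), o))
    return regions

gray = cv2.imread("shelf.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder filename
print(barcode_candidates(dominant_orientations(gray)))

    A barcode region tends to produce many adjacent blocks whose gradient energy concentrates in a single orientation bin, which is why grouping blocks by shared dominant orientation surfaces barcode-like candidates.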
    The second module of PNUTS is an algorithm whose objective is to determine the text skew angle of an NL image without constraining the angle's magnitude. The horizontal, vertical, and diagonal matrices of the two-dimensional (2D) Haar Wavelet Transform are used to identify 2D points with significant intensity changes. The set of points is bounded with a minimum-area rectangle whose rotation angle is the text's skew. The algorithm's performance is compared with that of five text skew detection algorithms on 1,001 U.S. nutrition label images and 2,200 single- and multi-column document images in multiple languages. To ensure the reproducibility of the reported results, the source code of the algorithm and the image data have been made publicly available. If the skew angle is estimated correctly, optical character recognition (OCR) techniques can be used to extract the nutrition information.
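    As a rough illustration of this approach rather than the authors' exact method, the following sketch takes the detail matrices of a single-level 2D Haar transform (via PyWavelets), keeps the points whose coefficients signal a strong intensity change, and reads the skew off the minimum-area rectangle bounding them. The coefficient threshold and the file name are placeholders.

import cv2
import numpy as np
import pywt

def estimate_skew(gray, coeff_thresh=30.0):
    """Estimate text skew from the detail matrices of a 2D Haar DWT."""
    # Single-level 2D Haar transform: approximation plus horizontal,
    # vertical, and diagonal detail coefficient matrices.
    _, (cH, cV, cD) = pywt.dwt2(gray.astype(np.float32), "haar")
    # Keep points where any detail matrix signals a strong intensity change.
    energy = np.abs(cH) + np.abs(cV) + np.abs(cD)
    ys, xs = np.nonzero(energy > coeff_thresh)
    if len(xs) < 3:
        return 0.0  # too little evidence; assume no skew
    pts = np.column_stack((xs, ys)).astype(np.float32)
    # The rotation angle of the minimum-area bounding rectangle is the skew.
    # Note: minAreaRect's angle convention differs across OpenCV versions.
    (_, _), (w, h), angle = cv2.minAreaRect(pts)
    # Normalize so the rectangle's longer side defines the text direction.
    return angle if w >= h else angle - 90.0

gray = cv2.imread("label.png", cv2.IMREAD_GRAYSCALE)  # placeholder filename
print(f"estimated skew: {estimate_skew(gray):.1f} degrees")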

    Adaptive Methods for Robust Document Image Understanding

    A vast amount of digital document material is continuously being produced as part of major digitization efforts around the world. In this context, generic and efficient automatic solutions for document image understanding represent a stringent necessity. We propose a generic framework for document image understanding systems, usable for practically any document type available in digital form. Following the introduced workflow, we shift our attention to each of the following processing stages in turn: quality assurance, image enhancement, color reduction and binarization, skew and orientation detection, page segmentation, and logical layout analysis. We review the state of the art in each area, identify current deficiencies, point out promising directions, and give specific guidelines for future investigation. We address some of the identified issues by means of novel algorithmic solutions, with special focus on generality, computational efficiency, and the exploitation of all available sources of information. More specifically, we introduce the following original methods: fully automatic detection of color reference targets in digitized material, accurate foreground extraction from color historical documents, font enhancement for hot metal typeset prints, a theoretically optimal solution to the document binarization problem from both the computational complexity and the threshold selection points of view, layout-independent skew and orientation detection, a robust and versatile page segmentation method, a semi-automatic front page detection algorithm, and a complete framework for article segmentation in periodical publications. The proposed methods are experimentally evaluated on large datasets consisting of real-life heterogeneous document scans. The obtained results show that a document understanding system combining these modules is able to robustly process a wide variety of documents with good overall accuracy.
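    The abstract names a theoretically optimal binarization among its contributions without describing it; purely as a generic baseline for that stage (not the thesis's method), the snippet below applies Otsu's global thresholding, which likewise selects its threshold by an optimality criterion, namely minimal intra-class variance. The file names are placeholders.

import cv2

# Load a document scan in grayscale; the filename is a placeholder.
gray = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# Otsu's method picks the global threshold that minimizes the intra-class
# variance of the resulting foreground and background pixel populations.
thresh, binary = cv2.threshold(gray, 0, 255,
                               cv2.THRESH_BINARY + cv2.THRESH_OTSU)
print(f"selected threshold: {thresh}")
cv2.imwrite("scan_binarized.png", binary)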

    Arbitrary Keyword Spotting in Handwritten Documents

    Despite the existence of electronic media in today's world, a considerable amount of written communication is in paper form, such as books, bank cheques, and contracts. There is an increasing demand for the automation of information extraction, classification, search, and retrieval of documents. The goal of this research is to develop a complete methodology for the spotting of arbitrary keywords in handwritten document images.

    We propose a top-down approach to the spotting of keywords in document images. Our approach is composed of two major steps: segmentation and decision. In the former, we generate the word hypotheses. In the latter, we decide whether a generated word hypothesis is a specific keyword or not. We carry out the decision step through a two-level classification: first, we assign an input image to a keyword or non-keyword class; then we transcribe the image if it passes as a keyword. By reducing the problem from the image domain to the text domain, we address not only search in handwritten documents, but also classification and retrieval, without the need to transcribe the whole document image.

    The main contribution of this thesis is the development of a generalized minimum edit distance for handwritten words, together with a proof that this distance is equivalent to an Ergodic Hidden Markov Model (EHMM). To the best of our knowledge, this work is the first to present an exact 2D model for the temporal information in handwriting while satisfying practical constraints. Other contributions of this research include: 1) removal of page margins based on corner detection in projection profiles; 2) removal of noise patterns in handwritten images using expectation maximization and fuzzy inference systems; 3) extraction of text lines based on fast Fourier-based steerable filtering; 4) segmentation of characters based on skeletal graphs; and 5) merging of broken characters based on graph partitioning. Our experiments with a benchmark database of handwritten English documents and a real-world collection of handwritten French documents indicate that, even without any word- or document-level training, our results are comparable with those of two state-of-the-art word spotting systems for English and French documents.
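    The generalized distance itself is not reproduced in the abstract; for orientation, the sketch below shows the classical minimum edit distance it builds on, computed by dynamic programming over string prefixes. The thesis extends this kind of distance to handwritten word images and proves its equivalence to an ergodic HMM.

def edit_distance(a: str, b: str) -> int:
    """Standard Levenshtein distance via dynamic programming.

    dp[i][j] = minimum number of insertions, deletions, and substitutions
    needed to turn a[:i] into b[:j].
    """
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete all of a[:i]
    for j in range(n + 1):
        dp[0][j] = j          # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # deletion
                           dp[i][j - 1] + 1,         # insertion
                           dp[i - 1][j - 1] + cost)  # substitution / match
    return dp[m][n]

assert edit_distance("keyword", "keywords") == 1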

    Neural Networks for Document Image and Text Processing

    Nowadays, the main libraries and document archives are investing considerable effort in digitizing their collections. Indeed, most of them are scanning the documents and publishing the resulting images without their corresponding transcriptions, which seriously limits the possibilities for exploiting these documents. When a transcription is necessary, it is performed manually by human experts, which is an expensive and error-prone task; obtaining transcriptions of the required quality demands the intervention of human experts to review and correct the output of the recognition engines. To this end, it is extremely useful to provide interactive tools for obtaining and editing the transcription.

    Although text recognition is the final goal, several previous steps (known as preprocessing) are necessary in order to get a good transcription from a digitized image. Document cleaning, enhancement, and binarization (when needed) are the first stages of the recognition pipeline. Historical handwritten documents, in addition, exhibit degradations, stains, ink bleed-through, and other artifacts, so more sophisticated and elaborate methods are required when dealing with this kind of document, and in some cases even expert supervision is needed. Once the images have been cleaned, the main zones of the image have to be detected: those that contain text, and other parts such as images, decorations, and versal letters. Moreover, the relations among them and with the final text have to be detected. These preprocessing steps are critical for the final performance of the system, since an error at this point will be propagated through the rest of the transcription process. The ultimate goal of the Document Image Analysis pipeline is to produce the transcription of the text (Optical Character Recognition and Handwritten Text Recognition).

    In this thesis we aimed to improve the main stages of the recognition pipeline, from the scanned documents as input to the final transcription. We focused our effort on applying Neural Networks and deep learning techniques directly on the document images to extract suitable features for the different tasks addressed in this work: image cleaning and enhancement (document image binarization), layout extraction, text line extraction, text line normalization, and finally decoding (text line recognition). The work thus consists of incremental improvements across the several Document Image Analysis stages, but it also tackles some real challenges: historical manuscripts, documents without clear layouts, and heavily degraded documents.

    Neural Networks are a central topic of the work collected in this document. Different convolutional models have been applied to document image cleaning and enhancement. Connectionist models have also been used for text line extraction: first, for detecting interest points, combining them into text segments, and finally extracting the lines by means of aggregation techniques; and second, for pixel labeling to extract the main body area of the text and then the limits of the lines. For text line preprocessing, i.e., normalizing the text lines before recognizing them, similar models have been used to detect the main body area and then to height-normalize the images, giving more importance to the central area of the text. Finally, Convolutional Neural Networks and deep multilayer perceptrons have been combined with hidden Markov models to significantly improve our transcription engine.
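    As a minimal sketch of the pixel-labeling idea used here for cleaning and binarization, assuming PyTorch, the toy network below assigns an ink/background logit to every pixel; the architecture, loss, and random stand-in data are illustrative, not the models trained in the thesis.

import torch
import torch.nn as nn

class BinarizationNet(nn.Module):
    """Tiny fully convolutional network that emits an ink/background
    logit for every pixel. Illustrative architecture only."""
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, kernel_size=3, padding=1),  # per-pixel logit
        )

    def forward(self, x):
        return self.body(x)

net = BinarizationNet()
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3)
criterion = nn.BCEWithLogitsLoss()

# One dummy training step; random tensors stand in for (degraded, clean) pairs.
degraded = torch.rand(4, 1, 64, 64)
clean = (torch.rand(4, 1, 64, 64) > 0.5).float()
optimizer.zero_grad()
loss = criterion(net(degraded), clean)
loss.backward()
optimizer.step()

# At inference time, threshold the sigmoid output to binarize the page.
with torch.no_grad():
    binary = torch.sigmoid(net(degraded)) > 0.5
print(f"loss: {loss.item():.3f}, ink pixels: {int(binary.sum())}")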
    The suitability of all these approaches has been tested with different corpora for each of the stages addressed, giving competitive results for most of the methodologies presented.
    Pastor Pellicer, J. (2017). Neural Networks for Document Image and Text Processing [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/90443

    Character Recognition

    Character recognition is one of the most widely used pattern recognition technologies in practical applications. This book presents recent advances relevant to character recognition, from technical topics such as image processing, feature extraction, and classification, to new applications including human-computer interfaces. The goal of this book is to provide a reference source for academic research and for professionals working in the character recognition field.

    Automatic handwriter identification using advanced machine learning

    Handwriter identification is a challenging problem, especially for forensic investigation. The topic has received significant attention from the research community, and several handwriter identification systems have been developed for applications including forensic science, document analysis, and the investigation of historical documents. This work is part of an investigation to develop new tools and methods for Arabic palaeography, which is the study of handwritten material, particularly ancient manuscripts whose writers, dates, and/or places of origin are unknown. In particular, the main aim of this research project is to investigate and develop new techniques and algorithms for the classification and analysis of ancient handwritten documents to support palaeographic studies. Three contributions are proposed in this research.

    The first is the development of a text line extraction algorithm for colour and greyscale historical manuscripts. The idea uses a modified bilateral filtering approach to adaptively smooth the images while still preserving the edges through a nonlinear combination of neighboring image values. The proposed algorithm computes a median and a separating seam, and has been validated on both greyscale and colour historical documents using different datasets. The results obtained suggest that the proposed technique yields attractive results when compared against similar algorithms.

    The second contribution deploys a combination of oriented Basic Image Features and a grapheme codebook to improve recognition performance. The proposed algorithm can effectively extract a handwriter's most distinguishing patterns. The idea consists of judiciously combining multiscale feature extraction with the concept of graphemes, allowing the extraction of several discriminating features such as handwriting curvature, direction, wrinkliness, and various edge-based features. The technique was validated for identifying the handwriters of both Arabic and English writing captured as scanned images, using the IAM dataset for English handwriting and the ICFHR 2012 dataset for Arabic handwriting. The results clearly demonstrate the effectiveness of the proposed method when compared against similar techniques.

    The third contribution is an offline handwriter identification approach based on convolutional neural networks. First, the AlexNet architecture is employed to learn image features from the handwritten scripts, with the features taken from the fully connected layers of the model. Then, a Support Vector Machine (SVM) classifier is deployed to classify the writing styles of the various handwriters, so that test scripts can be represented by the trained CNN and classified by the SVM. The proposed approach was evaluated on Arabic historical datasets from the Islamic Heritage Project (IHP) and the Qatar National Library (QNL). The results demonstrate that the proposed model achieves superior performance when compared to similar methods.
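    As a rough sketch of the third contribution's pipeline, pretrained CNN features fed to an SVM, the code below uses torchvision's AlexNet (torchvision 0.13 or later assumed) as a fixed feature extractor and scikit-learn's SVC. The file names, writer labels, and hyperparameters are placeholders, not the thesis's configuration.

import torch
import torch.nn as nn
from torchvision import models, transforms
from sklearn.svm import SVC
from PIL import Image

# Pretrained AlexNet with the final classification layer removed, so a
# forward pass yields the 4096-d activations of the last hidden FC layer.
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT)
alexnet.classifier = nn.Sequential(*list(alexnet.classifier.children())[:-1])
alexnet.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # ImageNet statistics,
                         std=[0.229, 0.224, 0.225]),   # matching the weights
])

def fc_features(path):
    """Extract fully connected layer features for one script image."""
    img = Image.open(path).convert("RGB")
    with torch.no_grad():
        return alexnet(preprocess(img).unsqueeze(0)).squeeze(0).numpy()

# Placeholder training data: (image path, writer id) pairs.
train = [("w1_a.png", "writer1"), ("w1_b.png", "writer1"),
         ("w2_a.png", "writer2"), ("w2_b.png", "writer2")]
X = [fc_features(p) for p, _ in train]
y = [w for _, w in train]

svm = SVC(kernel="linear").fit(X, y)   # classify writing styles by writer
print(svm.predict([fc_features("unknown.png")]))  # placeholder test script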