2,234 research outputs found

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    Get PDF
    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (\eg, signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance

    Vision Based Extraction of Nutrition Information from Skewed Nutrition Labels

    Get PDF
    An important component of a healthy diet is the comprehension and retention of nutritional information and understanding of how different food items and nutritional constituents affect our bodies. In the U.S. and many other countries, nutritional information is primarily conveyed to consumers through nutrition labels (NLs) which can be found in all packaged food products. However, sometimes it becomes really challenging to utilize all this information available in these NLs even for consumers who are health conscious as they might not be familiar with nutritional terms or find it difficult to integrate nutritional data collection into their daily activities due to lack of time, motivation, or training. So it is essential to automate this data collection and interpretation process by integrating Computer Vision based algorithms to extract nutritional information from NLs because it improves the user’s ability to engage in continuous nutritional data collection and analysis. To make nutritional data collection more manageable and enjoyable for the users, we present a Proactive NUTrition Management System (PNUTS). PNUTS seeks to shift current research and clinical practices in nutrition management toward persuasion, automated nutritional information processing, and context-sensitive nutrition decision support. PNUTS consists of two modules, firstly a barcode scanning module which runs on smart phones and is capable of vision-based localization of One Dimensional (1D) Universal Product Code (UPC) and International Article Number (EAN) barcodes with relaxed pitch, roll, and yaw camera alignment constraints. The algorithm localizes barcodes in images by computing Dominant Orientations of Gradients (DOGs) of image segments and grouping smaller segments with similar DOGs into larger connected components. Connected components that pass given morphological criteria are marked as potential barcodes. The algorithm is implemented in a distributed, cloud-based system. The system’s front end is a smartphone application that runs on Android smartphones with Android 4.2 or higher. The system’s back end is deployed on a five node Linux cluster where images are processed. The algorithm was evaluated on a corpus of 7,545 images extracted from 506 videos of bags, bottles, boxes, and cans in a supermarket. The DOG algorithm was coupled to our in-place scanner for 1D UPC and EAN barcodes. The scanner receives from the DOG algorithm the rectangular planar dimensions of a connected component and the component’s dominant gradient orientation angle referred to as the skew angle. The scanner draws several scan lines at that skew angle within the component to recognize the barcode in place without any rotations. The scanner coupled to the localizer was tested on the same corpus of 7,545 images. Laboratory experiments indicate that the system can localize and scan barcodes of any orientation in the yaw plane, of up to 73.28 degrees in the pitch plane, and of up to 55.5 degrees in the roll plane. The videos have been made public for all interested research communities to replicate our findings or to use them in their own research. The front end Android application is available for free download at Google Play under the title of NutriGlass. This module is also coupled to a comprehensive NL database from which nutritional information can be retrieved on demand. Currently our NL database consists of more than 230,000 products. The second module of PNUTS is an algorithm whose objective is to determine the text skew angle of an NL image without constraining the angle’s magnitude. The horizontal, vertical, and diagonal matrices of the (Two Dimensional) 2D Haar Wavelet Transform are used to identify 2D points with significant intensity changes. The set of points is bounded with a minimum area rectangle whose rotation angle is the text’s skew. The algorithm’s performance is compared with the performance of five text skew detection algorithms on 1001 U.S. nutrition label images and 2200 single- and multi-column document images in multiple languages. To ensure the reproducibility of the reported results, the source code of the algorithm and the image data have been made publicly available. If the skew angle is estimated correctly, optical character recognition (OCR) techniques can be used to extract nutrition information

    CIDOC2VEC: Extracting Information from Atomized CIDOC-CRM Humanities Knowledge Graphs

    Get PDF
    The development of the field of digital humanities in recent years has led to the increased use of knowledge graphs within the community. Many digital humanities projects tend to model their data based on CIDOC-CRM ontology, which offers a wide array of classes appropriate for storing humanities and cultural heritage data. The CIDOC-CRM ontology model leads to a knowledge graph structure in which many entities are often linked to each other through chains of relations, which means that relevant information often lies many hops away from their entities. In this paper, we present a method based on graph walks and text processing to extract entity information and provide semantically relevant embeddings. In the process, we were able to generate similarity recommendations as well as explore their underlying data structure. This approach was then demonstrated on the Sphaera Dataset which was modeled according to the CIDOC-CRM data structure

    Word grouping in imaged documents using voronoi tessellation

    Get PDF
    Master'sMASTER OF SCIENC

    Image Processing Applications in Real Life: 2D Fragmented Image and Document Reassembly and Frequency Division Multiplexed Imaging

    Get PDF
    In this era of modern technology, image processing is one the most studied disciplines of signal processing and its applications can be found in every aspect of our daily life. In this work three main applications for image processing has been studied. In chapter 1, frequency division multiplexed imaging (FDMI), a novel idea in the field of computational photography, has been introduced. Using FDMI, multiple images are captured simultaneously in a single shot and can later be extracted from the multiplexed image. This is achieved by spatially modulating the images so that they are placed at different locations in the Fourier domain. Finally, a Texas Instruments digital micromirror device (DMD) based implementation of FDMI is presented and results are shown. Chapter 2 discusses the problem of image reassembly which is to restore an image back to its original form from its pieces after it has been fragmented due to different destructive reasons. We propose an efficient algorithm for 2D image fragment reassembly problem based on solving a variation of Longest Common Subsequence (LCS) problem. Our processing pipeline has three steps. First, the boundary of each fragment is extracted automatically; second, a novel boundary matching is performed by solving LCS to identify the best possible adjacency relationship among image fragment pairs; finally, a multi-piece global alignment is used to filter out incorrect pairwise matches and compose the final image. We perform experiments on complicated image fragment datasets and compare our results with existing methods to show the improved efficiency and robustness of our method. The problem of reassembling a hand-torn or machine-shredded document back to its original form is another useful version of the image reassembly problem. Reassembling a shredded document is different from reassembling an ordinary image because the geometric shape of fragments do not carry a lot of valuable information if the document has been machine-shredded rather than hand-torn. On the other hand, matching words and context can be used as an additional tool to help improve the task of reassembly. In the final chapter, document reassembly problem has been addressed through solving a graph optimization problem

    Extraction robuste de primitives géométriques 3D dans un nuage de points et alignement basé sur les primitives

    Get PDF
    Dans ce projet, nous étudions les problèmes de rétro-ingénierie et de contrôle de la qualité qui jouent un rôle important dans la fabrication industrielle. La rétro-ingénierie tente de reconstruire un modèle 3D à partir de nuages de points, qui s’apparente au problème de la reconstruction de la surface 3D. Le contrôle de la qualité est un processus dans lequel la qualité de tous les facteurs impliqués dans la production est abordée. En fait, les systèmes ci-dessus nécessitent beaucoup d’intervention de la part d’un utilisateur expérimenté, résultat souhaité est encore loin soit une automatisation complète du processus. Par conséquent, de nombreux défis doivent encore être abordés pour atteindre ce résultat hautement souhaitable en production automatisée. La première question abordée dans la thèse consiste à extraire les primitives géométriques 3D à partir de nuages de points. Un cadre complet pour extraire plusieurs types de primitives à partir de données 3D est proposé. En particulier, une nouvelle méthode de validation est proposée pour évaluer la qualité des primitives extraites. À la fin, toutes les primitives présentes dans le nuage de points sont extraites avec les points de données associés et leurs paramètres descriptifs. Ces résultats pourraient être utilisés dans diverses applications telles que la reconstruction de scènes on d’édifices, la géométrie constructive et etc. La seconde question traiée dans ce travail porte sur l’alignement de deux ensembles de données 3D à l’aide de primitives géométriques, qui sont considérées comme un nouveau descripteur robuste. L’idée d’utiliser les primitives pour l’alignement arrive à surmonter plusieurs défis rencontrés par les méthodes d’alignement existantes. Ce problème d’alignement est une étape essentielle dans la modélisation 3D, la mise en registre, la récupération de modèles. Enfin, nous proposons également une méthode automatique pour extraire les discontinutés à partir de données 3D d’objets manufacturés. En intégrant ces discontinutés au problème d’alignement, il est possible d’établir automatiquement les correspondances entre primitives en utilisant l’appariement de graphes relationnels avec attributs. Nous avons expérimenté tous les algorithmes proposés sur différents jeux de données synthétiques et réelles. Ces algorithmes ont non seulement réussi à accomplir leur tâches avec succès mais se sont aussi avérés supérieus aux méthodes proposées dans la literature. Les résultats présentés dans le thèse pourraient s’avérér utilises à plusieurs applications.In this research project, we address reverse engineering and quality control problems that play significant roles in industrial manufacturing. Reverse engineering attempts to rebuild a 3D model from the scanned data captured from a object, which is the problem similar to 3D surface reconstruction. Quality control is a process in which the quality of all factors involved in production is monitored and revised. In fact, the above systems currently require significant intervention from experienced users, and are thus still far from being fully automated. Therefore, many challenges still need to be addressed to achieve the desired performance for automated production. The first proposition of this thesis is to extract 3D geometric primitives from point clouds for reverse engineering and surface reconstruction. A complete framework to extract multiple types of primitives from 3D data is proposed. In particular, a novel validation method is also proposed to assess the quality of the extracted primitives. At the end, all primitives present in the point cloud are extracted with their associated data points and descriptive parameters. These results could be used in various applications such as scene and building reconstruction, constructive solid geometry, etc. The second proposition of the thesis is to align two 3D datasets using the extracted geometric primitives, which is introduced as a novel and robust descriptor. The idea of using primitives for alignment is addressed several challenges faced by existing registration methods. This alignment problem is an essential step in 3D modeling, registration and model retrieval. Finally, an automatic method to extract sharp features from 3D data of man-made objects is also proposed. By integrating the extracted sharp features into the alignment framework, it is possible implement automatic assignment of primitive correspondences using attribute relational graph matching. Each primitive is considered as a node of the graph and an attribute relational graph is created to provide a structural and relational description between primitives. We have experimented all the proposed algorithms on different synthetic and real scanned datasets. Our algorithms not only are successful in completing their tasks with good results but also outperform other methods. We believe that the contribution of them could be useful in many applications

    Video metadata extraction in a videoMail system

    Get PDF
    Currently the world swiftly adapts to visual communication. Online services like YouTube and Vine show that video is no longer the domain of broadcast television only. Video is used for different purposes like entertainment, information, education or communication. The rapid growth of today’s video archives with sparsely available editorial data creates a big problem of its retrieval. The humans see a video like a complex interplay of cognitive concepts. As a result there is a need to build a bridge between numeric values and semantic concepts. This establishes a connection that will facilitate videos’ retrieval by humans. The critical aspect of this bridge is video annotation. The process could be done manually or automatically. Manual annotation is very tedious, subjective and expensive. Therefore automatic annotation is being actively studied. In this thesis we focus on the multimedia content automatic annotation. Namely the use of analysis techniques for information retrieval allowing to automatically extract metadata from video in a videomail system. Furthermore the identification of text, people, actions, spaces, objects, including animals and plants. Hence it will be possible to align multimedia content with the text presented in the email message and the creation of applications for semantic video database indexing and retrieving

    Adaptive Layout for Interactive Documents

    Get PDF
    This thesis presents a novel approach to create automated layouts for rich illustrative material that could adapt according to the screen size and contextual requirements. The adaption not only considers global layout but also deals with the content and layout adaptation of individual illustrations in the layout. An unique solution has been developed that integrates constraint-based and force-directed techniques to create adaptive grid-based and non-grid layouts. A set of annotation layouts are developed which adapt the annotated illustrations to match the contextual requirements over time
    corecore