24 research outputs found

    Content Recognition and Context Modeling for Document Analysis and Retrieval

    Get PDF
    The nature and scope of available documents are changing significantly in many areas of document analysis and retrieval as complex, heterogeneous collections become accessible to virtually everyone via the web. The increasing level of diversity presents a great challenge for document image content categorization, indexing, and retrieval. Meanwhile, the processing of documents with unconstrained layouts and complex formatting often requires effective leveraging of broad contextual knowledge. In this dissertation, we first present a novel approach for document image content categorization, using a lexicon of shape features. Each lexical word corresponds to a scale and rotation invariant local shape feature that is generic enough to be detected repeatably and is segmentation free. A concise, structurally indexed shape lexicon is learned by clustering and partitioning feature types through graph cuts. Our idea finds successful application in several challenging tasks, including content recognition of diverse web images and language identification on documents composed of mixed machine printed text and handwriting. Second, we address two fundamental problems in signature-based document image retrieval. Facing continually increasing volumes of documents, detecting and recognizing unique, evidentiary visual entities (\eg, signatures and logos) provides a practical and reliable supplement to the OCR recognition of printed text. We propose a novel multi-scale framework to detect and segment signatures jointly from document images, based on the structural saliency under a signature production model. We formulate the problem of signature retrieval in the unconstrained setting of geometry-invariant deformable shape matching and demonstrate state-of-the-art performance in signature matching and verification. Third, we present a model-based approach for extracting relevant named entities from unstructured documents. In a wide range of applications that require structured information from diverse, unstructured document images, processing OCR text does not give satisfactory results due to the absence of linguistic context. Our approach enables learning of inference rules collectively based on contextual information from both page layout and text features. Finally, we demonstrate the importance of mining general web user behavior data for improving document ranking and other web search experience. The context of web user activities reveals their preferences and intents, and we emphasize the analysis of individual user sessions for creating aggregate models. We introduce a novel algorithm for estimating web page and web site importance, and discuss its theoretical foundation based on an intentional surfer model. We demonstrate that our approach significantly improves large-scale document retrieval performance

    AutoGraff: towards a computational understanding of graffiti writing and related art forms.

    Get PDF
    The aim of this thesis is to develop a system that generates letters and pictures with a style that is immediately recognizable as graffiti art or calligraphy. The proposed system can be used similarly to, and in tight integration with, conventional computer-aided geometric design tools and can be used to generate synthetic graffiti content for urban environments in games and in movies, and to guide robotic or fabrication systems that can materialise the output of the system with physical drawing media. The thesis is divided into two main parts. The first part describes a set of stroke primitives, building blocks that can be combined to generate different designs that resemble graffiti or calligraphy. These primitives mimic the process typically used to design graffiti letters and exploit well known principles of motor control to model the way in which an artist moves when incrementally tracing stylised letter forms. The second part demonstrates how these stroke primitives can be automatically recovered from input geometry defined in vector form, such as the digitised traces of writing made by a user, or the glyph outlines in a font. This procedure converts the input geometry into a seed that can be transformed into a variety of calligraphic and graffiti stylisations, which depend on parametric variations of the strokes

    Drawing, Handwriting Processing Analysis: New Advances and Challenges

    No full text
    International audienceDrawing and handwriting are communicational skills that are fundamental in geopolitical, ideological and technological evolutions of all time. drawingand handwriting are still useful in defining innovative applications in numerous fields. In this regard, researchers have to solve new problems like those related to the manner in which drawing and handwriting become an efficient way to command various connected objects; or to validate graphomotor skills as evident and objective sources of data useful in the study of human beings, their capabilities and their limits from birth to decline

    Incorporation of relational information in feature representation for online handwriting recognition of Arabic characters

    Get PDF
    Interest in online handwriting recognition is increasing due to market demand for both improved performance and for extended supporting scripts for digital devices. Robust handwriting recognition of complex patterns of arbitrary scale, orientation and location is elusive to date because reaching a target recognition rate is not trivial for most of the applications in this field. Cursive scripts such as Arabic and Persian with complex character shapes make the recognition task even more difficult. Challenges in the discrimination capability of handwriting recognition systems depend heavily on the effectiveness of the features used to represent the data, the types of classifiers deployed and inclusive databases used for learning and recognition which cover variations in writing styles that introduce natural deformations in character shapes. This thesis aims to improve the efficiency of online recognition systems for Persian and Arabic characters by presenting new formal feature representations, algorithms, and a comprehensive database for online Arabic characters. The thesis contains the development of the first public collection of online handwritten data for the Arabic complete-shape character set. New ideas for incorporating relational information in a feature representation for this type of data are presented. The proposed techniques are computationally efficient and provide compact, yet representative, feature vectors. For the first time, a hybrid classifier is used for recognition of online Arabic complete-shape characters based on the idea of decomposing the input data into variables representing factors of the complete-shape characters and the combined use of the Bayesian network inference and support vector machines. We advocate the usefulness and practicality of the features and recognition methods with respect to the recognition of conventional metrics, such as accuracy and timeliness, as well as unconventional metrics. In particular, we evaluate a feature representation for different character class instances by its level of separation in the feature space. Our evaluation results for the available databases and for our own database of the characters' main shapes confirm a higher efficiency than previously reported techniques with respect to all metrics analyzed. For the complete-shape characters, our techniques resulted in a unique recognition efficiency comparable with the state-of-the-art results for main shape characters

    Sketch interpretation using multiscale stochastic models of temporal patterns

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.Includes bibliographical references (p. 102-114).Sketching is a natural mode of interaction used in a variety of settings. For example, people sketch during early design and brainstorming sessions to guide the thought process; when we communicate certain ideas, we use sketching as an additional modality to convey ideas that can not be put in words. The emergence of hardware such as PDAs and Tablet PCs has enabled capturing freehand sketches, enabling the routine use of sketching as an additional human-computer interaction modality. But despite the availability of pen based information capture hardware, relatively little effort has been put into developing software capable of understanding and reasoning about sketches. To date, most approaches to sketch recognition have treated sketches as images (i.e., static finished products) and have applied vision algorithms for recognition. However, unlike images, sketches are produced incrementally and interactively, one stroke at a time and their processing should take advantage of this. This thesis explores ways of doing sketch recognition by extracting as much information as possible from temporal patterns that appear during sketching.(cont.) We present a sketch recognition framework based on hierarchical statistical models of temporal patterns. We show that in certain domains, stroke orderings used in the course of drawing individual objects contain temporal patterns that can aid recognition. We build on this work to show how sketch recognition systems can use knowledge of both common stroke orderings and common object orderings. We describe a statistical framework based on Dynamic Bayesian Networks that can learn temporal models of object-level and stroke-level patterns for recognition. Our framework supports multi-object strokes, multi-stroke objects, and allows interspersed drawing of objects - relaxing the assumption that objects are drawn one at a time. Our system also supports real-valued feature representations using a numerically stable recognition algorithm. We present recognition results for hand-drawn electronic circuit diagrams. The results show that modeling temporal patterns at multiple scales provides a significant increase in correct recognition rates, with no added computational penalties.by Tevfik Metin Sezgin.Ph.D

    Graphonomics and your Brain on Art, Creativity and Innovation : Proceedings of the 19th International Graphonomics Conference (IGS 2019 – Your Brain on Art)

    Get PDF
    [Italiano]: “Grafonomia e cervello su arte, creatività e innovazione”. Un forum internazionale per discutere sui recenti progressi nell'interazione tra arti creative, neuroscienze, ingegneria, comunicazione, tecnologia, industria, istruzione, design, applicazioni forensi e mediche. I contributi hanno esaminato lo stato dell'arte, identificando sfide e opportunità, e hanno delineato le possibili linee di sviluppo di questo settore di ricerca. I temi affrontati includono: strategie integrate per la comprensione dei sistemi neurali, affettivi e cognitivi in ambienti realistici e complessi; individualità e differenziazione dal punto di vista neurale e comportamentale; neuroaesthetics (uso delle neuroscienze per spiegare e comprendere le esperienze estetiche a livello neurologico); creatività e innovazione; neuro-ingegneria e arte ispirata dal cervello, creatività e uso di dispositivi di mobile brain-body imaging (MoBI) indossabili; terapia basata su arte creativa; apprendimento informale; formazione; applicazioni forensi. / [English]: “Graphonomics and your brain on art, creativity and innovation”. A single track, international forum for discussion on recent advances at the intersection of the creative arts, neuroscience, engineering, media, technology, industry, education, design, forensics, and medicine. The contributions reviewed the state of the art, identified challenges and opportunities and created a roadmap for the field of graphonomics and your brain on art. The topics addressed include: integrative strategies for understanding neural, affective and cognitive systems in realistic, complex environments; neural and behavioral individuality and variation; neuroaesthetics (the use of neuroscience to explain and understand the aesthetic experiences at the neurological level); creativity and innovation; neuroengineering and brain-inspired art, creative concepts and wearable mobile brain-body imaging (MoBI) designs; creative art therapy; informal learning; education; forensics

    Effective 3D Geometric Matching for Data Restoration and Its Forensic Application

    Get PDF
    3D geometric matching is the technique to detect the similar patterns among multiple objects. It is an important and fundamental problem and can facilitate many tasks in computer graphics and vision, including shape comparison and retrieval, data fusion, scene understanding and object recognition, and data restoration. For example, 3D scans of an object from different angles are matched and stitched together to form the complete geometry. In medical image analysis, the motion of deforming organs is modeled and predicted by matching a series of CT images. This problem is challenging and remains unsolved, especially when the similar patterns are 1) small and lack geometric saliency; 2) incomplete due to the occlusion of the scanning and damage of the data. We study the reliable matching algorithm that can tackle the above difficulties and its application in data restoration. Data restoration is the problem to restore the fragmented or damaged model to its original complete state. It is a new area and has direct applications in many scientific fields such as Forensics and Archeology. In this dissertation, we study novel effective geometric matching algorithms, including curve matching, surface matching, pairwise matching, multi-piece matching and template matching. We demonstrate its applications in an integrated digital pipeline of skull reassembly, skull completion, and facial reconstruction, which is developed to facilitate the state-of-the-art forensic skull/facial reconstruction processing pipeline in law enforcement

    Gaze-Based Human-Robot Interaction by the Brunswick Model

    Get PDF
    We present a new paradigm for human-robot interaction based on social signal processing, and in particular on the Brunswick model. Originally, the Brunswick model copes with face-to-face dyadic interaction, assuming that the interactants are communicating through a continuous exchange of non verbal social signals, in addition to the spoken messages. Social signals have to be interpreted, thanks to a proper recognition phase that considers visual and audio information. The Brunswick model allows to quantitatively evaluate the quality of the interaction using statistical tools which measure how effective is the recognition phase. In this paper we cast this theory when one of the interactants is a robot; in this case, the recognition phase performed by the robot and the human have to be revised w.r.t. the original model. The model is applied to Berrick, a recent open-source low-cost robotic head platform, where the gazing is the social signal to be considered

    Advances in Image Processing, Analysis and Recognition Technology

    Get PDF
    For many decades, researchers have been trying to make computers’ analysis of images as effective as the system of human vision is. For this purpose, many algorithms and systems have previously been created. The whole process covers various stages, including image processing, representation and recognition. The results of this work can be applied to many computer-assisted areas of everyday life. They improve particular activities and provide handy tools, which are sometimes only for entertainment, but quite often, they significantly increase our safety. In fact, the practical implementation of image processing algorithms is particularly wide. Moreover, the rapid growth of computational complexity and computer efficiency has allowed for the development of more sophisticated and effective algorithms and tools. Although significant progress has been made so far, many issues still remain, resulting in the need for the development of novel approaches
    corecore