105,127 research outputs found

    Optical Graph Recognition

    Get PDF
    Graphs are an important model for the representation of structural information between objects. One identifies objects and nodes as well as a binary relation between objects and edges. Graphs have many uses, e. g., in social sciences, life sciences and engineering. There are two primary representations: abstract and visual. The abstract representation is well suited for processing graphs by computers and is given by an adjacency list, an adjacency matrix or any abstract data structure. A visual representation is used by human users who prefer a picture. Common terms are diagram, scheme, plan, or network. The objective of Graph Drawing is to transform a graph into a visual representation called the drawing of a graph. The goal is a “nice” drawing. In this thesis we introduce Optical Graph Recognition. Optical Graph Recognition (OGR) reverses Graph Drawing and transforms a digital image of a graph into an abstract representation. Our approach consists of four phases: Preprocessing where we determine which pixels of an image are part of the graph, Segmentation where we recognize the nodes, Topology Recognition where we detect the edges and Postprocessing where we enrich the recognized graph with additional information. We apply established digital image processing methods and make use of the special property that the image contains nodes that are connected by edges. We have focused on developing algorithms that need as little parameters as possible or to automatically calibrate the parameters. Most false recognition results are caused by crossing edges as this makes tracing the edges difficult and can lead to other recognition errors. We have evaluated hand-drawn and computer-drawn graphs. Our algorithms have a very high recognition rate for computer-drawn graphs, e. g., from a set of 100000 computer-drawn graphs over 90% were correctly recognized. Most false recognition results where observed for hand-drawn graphs as they can include drawing errors and inaccuracies. For universal usability we have implemented a prototype called OGRup for mobile devices like smartphones or tablet computers. With our software it is possible to directly take a picture of a graph via a built in camera, recognize the graph, and then use the result for further processing. Furthermore, in order to gain more insight into the way a person draws a graph by hand, we have conducted a field study

    QCompere @ REPERE 2013

    No full text
    International audienceWe describe QCompere consortium submissions to the REPERE 2013 evaluation campaign. The REPERE challenge aims at gathering four communities (face recognition, speaker identification, optical character recognition and named entity detection) towards the same goal: multimodal person recognition in TV broadcast. First, four mono-modal components are introduced (one for each foregoing community) constituting the elementary building blocks of our various submissions. Then, depending on the target modality (speaker or face recognition) and on the task (supervised or unsupervised recognition), four different fusion techniques are introduced: they can be summarized as propagation-, classifier-, rule- or graph-based approaches. Finally, their performance is evaluated on REPERE 2013 test set and their advantages and limitations are discussed

    Graph Distillation for Action Detection with Privileged Modalities

    Full text link
    We propose a technique that tackles action detection in multimodal videos under a realistic and challenging condition in which only limited training data and partially observed modalities are available. Common methods in transfer learning do not take advantage of the extra modalities potentially available in the source domain. On the other hand, previous work on multimodal learning only focuses on a single domain or task and does not handle the modality discrepancy between training and testing. In this work, we propose a method termed graph distillation that incorporates rich privileged information from a large-scale multimodal dataset in the source domain, and improves the learning in the target domain where training data and modalities are scarce. We evaluate our approach on action classification and detection tasks in multimodal videos, and show that our model outperforms the state-of-the-art by a large margin on the NTU RGB+D and PKU-MMD benchmarks. The code is released at http://alan.vision/eccv18_graph/.Comment: ECCV 201

    A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units

    Full text link
    We address the design of a unified multilingual system for handwriting recognition. Most of multi- lingual systems rests on specialized models that are trained on a single language and one of them is selected at test time. While some recognition systems are based on a unified optical model, dealing with a unified language model remains a major issue, as traditional language models are generally trained on corpora composed of large word lexicons per language. Here, we bring a solution by con- sidering language models based on sub-lexical units, called multigrams. Dealing with multigrams strongly reduces the lexicon size and thus decreases the language model complexity. This makes pos- sible the design of an end-to-end unified multilingual recognition system where both a single optical model and a single language model are trained on all the languages. We discuss the impact of the language unification on each model and show that our system reaches state-of-the-art methods perfor- mance with a strong reduction of the complexity.Comment: preprin

    Semi-Supervised First-Person Activity Recognition in Body-Worn Video

    Get PDF
    Body-worn cameras are now commonly used for logging daily life, sports, and law enforcement activities, creating a large volume of archived footage. This paper studies the problem of classifying frames of footage according to the activity of the camera-wearer with an emphasis on application to real-world police body-worn video. Real-world datasets pose a different set of challenges from existing egocentric vision datasets: the amount of footage of different activities is unbalanced, the data contains personally identifiable information, and in practice it is difficult to provide substantial training footage for a supervised approach. We address these challenges by extracting features based exclusively on motion information then segmenting the video footage using a semi-supervised classification algorithm. On publicly available datasets, our method achieves results comparable to, if not better than, supervised and/or deep learning methods using a fraction of the training data. It also shows promising results on real-world police body-worn video
    • 

    corecore