Optical Graph Recognition
Graphs are an important model for representing structural information between objects. Objects are identified with nodes, and a binary relation between objects is identified with edges. Graphs have many uses, e. g., in the social sciences, life sciences and engineering. There are two primary representations: abstract and visual. The abstract representation is well suited for processing graphs by computer and is given by an adjacency list, an adjacency matrix or any other abstract data structure. A visual representation is used by human users, who prefer a picture; common terms are diagram, scheme, plan, or network. The objective of Graph Drawing is to transform a graph into a visual representation called the drawing of the graph. The goal is a "nice" drawing.
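To make the two abstract representations concrete, here is a minimal sketch that builds both an adjacency list and an adjacency matrix from the same edge list. It assumes an undirected graph whose nodes are numbered 0..n-1; the function names are illustrative and not taken from the thesis.

```python
from typing import Dict, List, Tuple

def to_adjacency_list(n: int, edges: List[Tuple[int, int]]) -> Dict[int, List[int]]:
    """Undirected adjacency list for nodes 0..n-1 (each edge stored in both directions)."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj

def to_adjacency_matrix(n: int, edges: List[Tuple[int, int]]) -> List[List[int]]:
    """Undirected 0/1 adjacency matrix for nodes 0..n-1 (symmetric by construction)."""
    m = [[0] * n for _ in range(n)]
    for u, v in edges:
        m[u][v] = 1
        m[v][u] = 1
    return m

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(to_adjacency_list(4, edges))
# {0: [1, 2], 1: [0, 2], 2: [1, 0, 3], 3: [2]}
print(to_adjacency_matrix(4, edges)[2])
# [1, 1, 0, 1]
```

The list is compact for sparse graphs, while the matrix answers "is there an edge between u and v?" in constant time.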
In this thesis we introduce Optical Graph Recognition. Optical Graph Recognition (OGR) reverses Graph Drawing and transforms a digital image of a graph into an abstract representation. Our approach consists of four phases: Preprocessing, where we determine which pixels of an image are part of the graph; Segmentation, where we recognize the nodes; Topology Recognition, where we detect the edges; and Postprocessing, where we enrich the recognized graph with additional information. We apply established digital image processing methods and make use of the special property that the image contains nodes that are connected by edges. We have focused on developing algorithms that need as few parameters as possible or that calibrate their parameters automatically. Most false recognition results are caused by crossing edges, as these make tracing the edges difficult and can lead to further recognition errors.
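The Preprocessing phase decides which pixels belong to the graph, and the abstract stresses parameters that calibrate themselves. The abstract does not say which thresholding method the thesis uses, so the sketch below swaps in Otsu's method, a standard global threshold that is computed from the image histogram instead of being set by hand. It assumes 8-bit grayscale pixels given as a flat list and dark ink on a light background; `otsu_threshold` and `binarize` are hypothetical names.

```python
from typing import List

def otsu_threshold(pixels: List[int]) -> int:
    """Otsu's method: pick t in 0..255 maximizing between-class variance.
    Class 0 is pixels <= t, class 1 is pixels > t."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg, w_bg = 0.0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]
        if w_bg == 0:
            continue  # no pixels at or below t yet
        w_fg = total - w_bg
        if w_fg == 0:
            break  # every pixel is already in class 0
        sum_bg += t * hist[t]
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        # Between-class variance (up to the constant factor 1 / total^2).
        between = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binarize(pixels: List[int], t: int) -> List[bool]:
    """Assumes dark strokes on light paper: True marks a pixel of the drawing."""
    return [p <= t for p in pixels]

image = [12, 15, 200, 210, 14, 220, 205, 11]
t = otsu_threshold(image)
print(binarize(image, t))
# [True, True, False, False, True, False, False, True]
```

A global threshold like this can struggle with uneven lighting in camera photos, where an adaptive (local) threshold is a common alternative.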
We have evaluated hand-drawn and computer-drawn graphs. Our algorithms have a very high recognition rate for computer-drawn graphs, e. g., from a set of 100,000 computer-drawn graphs, over 90% were correctly recognized. Most false recognition results were observed for hand-drawn graphs, as they can include drawing errors and inaccuracies. For universal usability we have implemented a prototype called OGRup for mobile devices such as smartphones and tablet computers. With our software it is possible to take a picture of a graph directly with a built-in camera, recognize the graph, and then use the result for further processing. Furthermore, in order to gain more insight into the way a person draws a graph by hand, we have conducted a field study.
QCompere @ REPERE 2013
We describe the QCompere consortium submissions to the REPERE 2013 evaluation campaign. The REPERE challenge aims at gathering four communities (face recognition, speaker identification, optical character recognition and named entity detection) towards the same goal: multimodal person recognition in TV broadcast. First, four mono-modal components are introduced (one for each of these communities), constituting the elementary building blocks of our various submissions. Then, depending on the target modality (speaker or face recognition) and on the task (supervised or unsupervised recognition), four different fusion techniques are introduced: they can be summarized as propagation-, classifier-, rule- or graph-based approaches. Finally, their performance is evaluated on the REPERE 2013 test set, and their advantages and limitations are discussed.
Graph Distillation for Action Detection with Privileged Modalities
We propose a technique that tackles action detection in multimodal videos
under a realistic and challenging condition in which only limited training data
and partially observed modalities are available. Common methods in transfer
learning do not take advantage of the extra modalities potentially available in
the source domain. On the other hand, previous work on multimodal learning only
focuses on a single domain or task and does not handle the modality discrepancy
between training and testing. In this work, we propose a method termed graph
distillation that incorporates rich privileged information from a large-scale
multimodal dataset in the source domain, and improves the learning in the
target domain where training data and modalities are scarce. We evaluate our
approach on action classification and detection tasks in multimodal videos, and
show that our model outperforms the state-of-the-art by a large margin on the
NTU RGB+D and PKU-MMD benchmarks. The code is released at
http://alan.vision/eccv18_graph/.
Comment: ECCV 201
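As a rough illustration of the distillation idea, the sketch below lets a student in one modality learn from the softened predictions of several source modalities, with each source weighted by a softmax-normalized edge score. This is a minimal sketch of the general technique under stated assumptions (per-modality teacher logits are available, `edge_scores` are hypothetical learnable scalars); it is not the paper's released code, and the paper's exact loss may differ.

```python
import math
from typing import List

def softmax(xs: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q); q must be strictly positive (softmax outputs are)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def graph_distillation_loss(student_logits: List[float],
                            teacher_logits_per_modality: List[List[float]],
                            edge_scores: List[float],
                            temperature: float = 2.0) -> float:
    """Weighted sum of soft-target distillation terms, one per source modality.
    edge_scores are hypothetical learnable scores turned into edge weights by softmax.
    The T^2 rescaling often used in Hinton-style distillation is omitted here."""
    weights = softmax(edge_scores)
    q = softmax(student_logits, temperature)
    loss = 0.0
    for w, t_logits in zip(weights, teacher_logits_per_modality):
        p = softmax(t_logits, temperature)
        loss += w * kl_divergence(p, q)
    return loss

student = [2.0, 0.5, -1.0]
teachers = [[-1.0, 0.5, 2.0], [2.0, 0.5, -1.0]]  # e.g. two source modalities
print(graph_distillation_loss(student, teachers, edge_scores=[0.0, 0.0]))
```

Letting the edge weights be learned means the student can lean on whichever source modality is most informative for it, rather than trusting all sources equally.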
A Unified Multilingual Handwriting Recognition System using multigrams sub-lexical units
We address the design of a unified multilingual system for handwriting
recognition. Most multilingual systems rest on specialized models, each trained
on a single language, one of which is selected at test time.
While some recognition systems are based on a unified optical model, dealing
with a unified language model remains a major issue, as traditional language
models are generally trained on corpora composed of large word lexicons per
language. Here, we bring a solution by considering language models based on
sub-lexical units, called multigrams. Dealing with multigrams strongly reduces
the lexicon size and thus decreases the language model complexity. This makes
it possible to design an end-to-end unified multilingual recognition system
where both a single optical model and a single language model are trained on
all the languages. We discuss the impact of the language unification on each
model and show that our system reaches the performance of state-of-the-art methods
while strongly reducing complexity.
Comment: preprin
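To see why sub-lexical units shrink the lexicon, the sketch below splits words into the fewest units from a small unit inventory using dynamic programming. The inventory is hand-picked for illustration; the abstract does not describe how the paper selects its multigrams, and `segment` and `max_len` are hypothetical names.

```python
from typing import List, Optional, Set

def segment(word: str, units: Set[str], max_len: int = 4) -> Optional[List[str]]:
    """Split `word` into the fewest units from `units` (each at most max_len chars).
    Returns None if the word cannot be covered by the inventory."""
    n = len(word)
    # best[i] holds a shortest segmentation of word[:i], or None if none exists.
    best: List[Optional[List[str]]] = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev = best[start]
            piece = word[start:end]
            if prev is not None and piece in units:
                cand = prev + [piece]
                current = best[end]
                if current is None or len(cand) < len(current):
                    best[end] = cand
    return best[n]

units = {"play", "work", "ing", "er", "s"}
words = ["playing", "player", "plays", "working", "worker", "works"]
for w in words:
    print(w, segment(w, units))
# playing ['play', 'ing'] ... works ['work', 's']
```

Here five units cover six words; with a real vocabulary the same units are reused across many words (and, for related languages, across languages), which is what keeps a single shared language model tractable.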
Semi-Supervised First-Person Activity Recognition in Body-Worn Video
Body-worn cameras are now commonly used for logging daily life, sports, and
law enforcement activities, creating a large volume of archived footage. This
paper studies the problem of classifying frames of footage according to the
activity of the camera-wearer with an emphasis on application to real-world
police body-worn video. Real-world datasets pose a different set of challenges
from existing egocentric vision datasets: the amount of footage of different
activities is unbalanced, the data contains personally identifiable
information, and in practice it is difficult to provide substantial training
footage for a supervised approach. We address these challenges by extracting
features based exclusively on motion information and then segmenting the video
footage using a semi-supervised classification algorithm. On publicly available
datasets, our method achieves results comparable to, if not better than,
supervised and/or deep learning methods using a fraction of the training data.
It also shows promising results on real-world police body-worn video.
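To illustrate how a handful of labeled frames can label the rest, the sketch below runs classic graph-based label propagation over a similarity graph of per-frame feature vectors. The abstract does not name its semi-supervised algorithm, so label propagation is swapped in here as a stand-in; the one-dimensional "motion" features, `sigma`, and the function names are hypothetical.

```python
import math
from typing import Dict, List

def gaussian_graph(features: List[List[float]], sigma: float) -> List[List[float]]:
    """Dense similarity graph: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), no self-loops."""
    n = len(features)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
                w[i][j] = math.exp(-d2 / (2.0 * sigma ** 2))
    return w

def label_propagation(features: List[List[float]], labels: Dict[int, int],
                      n_classes: int, sigma: float = 1.0, iters: int = 100) -> List[int]:
    """Spread known frame labels along graph edges; labeled frames stay clamped."""
    w = gaussian_graph(features, sigma)
    n = len(features)
    f = [[0.0] * n_classes for _ in range(n)]
    for i, c in labels.items():
        f[i][c] = 1.0
    for _ in range(iters):
        new_f = []
        for i in range(n):
            total = sum(w[i])
            if i in labels or total == 0.0:
                new_f.append(f[i])  # clamp labeled (and isolated) frames
            else:
                # Each unlabeled frame takes the similarity-weighted average of its neighbors.
                new_f.append([sum(w[i][j] * f[j][c] for j in range(n)) / total
                              for c in range(n_classes)])
        f = new_f
    return [max(range(n_classes), key=lambda c: f[i][c]) for i in range(n)]

motion = [[0.1], [0.2], [0.15], [5.0], [5.1], [4.9]]  # hypothetical per-frame features
print(label_propagation(motion, labels={0: 0, 3: 1}, n_classes=2))
# [0, 0, 0, 1, 1, 1]
```

The dense graph costs O(n^2) memory, so long videos would need a sparse nearest-neighbor graph or a low-rank approximation instead.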