Estimating the information gap between textual and visual representations
Photos, drawings, figures, etc. supplement textual information in
various kinds of media, for example, in web news or scientific publications. In this respect, the
intended effect of an image can be quite different, e.g., providing additional information,
focusing on certain details of surrounding text, or simply being a general illustration of a
topic. As a consequence, the semantic correlation between information of different modalities can
vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise
way. The variety of possible interrelations of textual and graphical information, and the question
of how they can be described and automatically estimated, have not been addressed yet by previous
work. In this paper, we present several contributions to close this gap. First, we introduce two
measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic
correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI
and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an
appropriate deep neural network model for the demanding task. The system has been evaluated on a
challenging test set, and the experimental results demonstrate the feasibility of the approach.
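In the paper, CMI and SC are predicted by a trained deep network rather than computed in closed form. As a loose, hypothetical illustration of scoring cross-modal relatedness from joint embeddings (the vectors and the cosine measure below are assumptions for illustration, not the paper's definitions):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings for an image and its accompanying text,
# e.g. produced by the two branches of a multimodal network.
image_emb = [0.2, 0.8, 0.1]
text_emb = [0.3, 0.7, 0.0]

sc_score = cosine_similarity(image_emb, text_emb)
```

A high score would suggest strong semantic correlation between the modalities; the learned measures in the paper capture richer relations than this toy similarity.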
The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes is comprised of a large, diverse set of stereo
video sequences recorded in streets from 50 different cities. 5000 of these
images have high quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material.
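As a hedged sketch of what working with pixel-level annotations of this kind involves (the label map and class ids below are made up, not actual Cityscapes data), one might tally per-class pixel frequencies like so:

```python
from collections import Counter

# Toy 4x4 pixel-level label map; in a Cityscapes-style annotation,
# each pixel carries a semantic class id (e.g. road, car, person).
# The class ids below are invented for illustration.
label_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
]

pixel_counts = Counter(c for row in label_map for c in row)
total = sum(pixel_counts.values())
class_frequencies = {c: n / total for c, n in pixel_counts.items()}
```

Such class-frequency statistics are one of the dataset characteristics an empirical study like the one described would analyse, e.g. to quantify class imbalance.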
Joint stage recognition and anatomical annotation of drosophila gene expression patterns
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo reveals the detailed spatio-temporal patterns of its expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To provide text-based pattern searching that facilitates related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated with a developmental stage term and anatomical ontology terms by domain experts. Due to the rapid increase in the number of such images and the inevitable bias in annotations by human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms.
Multimodal news article analysis
The intersection of Computer Vision and Natural Language Processing has been a hot topic of research in recent years, with results that were unthinkable only a few years ago. In view of this progress, we want to highlight online news articles as a potential next step for this area of research. The rich interrelations of text, tags, images or videos, as well as a vast corpus of general knowledge, make them an exciting benchmark for high-capacity models such as deep neural networks. In this paper we present a series of tasks and baseline approaches that leverage corpora such as the BreakingNews dataset.
OLT: A Toolkit for Object Labeling Applied to Robotic RGB-D Datasets
In this work we present the Object Labeling Toolkit
(OLT), a set of software components publicly available for
helping in the management and labeling of sequential RGB-D
observations collected by a mobile robot. Such a robot can be
equipped with an arbitrary number of RGB-D devices, possibly
integrating other sensors (e.g. odometry, 2D laser scanners,
etc.). OLT first merges the robot observations to generate a
3D reconstruction of the scene from which object segmentation
and labeling is conveniently accomplished. The annotated labels
are automatically propagated by the toolkit to each RGB-D
observation in the collected sequence, providing a dense labeling
of both intensity and depth images. The resulting objects’ labels
can be exploited for many robotic oriented applications, including
high-level decision making, semantic mapping, or contextual
object recognition. Software components within OLT are highly
customizable and expandable, facilitating the integration of
already-developed algorithms. To illustrate the toolkit suitability,
we describe its application to robotic RGB-D sequences taken in
a home environment.
Acknowledgements: Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech; Spanish grant program FPU-MICINN 2010; and the Spanish projects TAROTH: New developments toward a Robot at Home (DPI2011-25483) and PROMOVE: Advances in mobile robotics for promoting independent life of elders (DPI2014-55826-R).
Attribute-Graph: A Graph based approach to Image Ranking
We propose a novel image representation, termed Attribute-Graph, to rank
images by their semantic similarity to a given query image. An Attribute-Graph
is an undirected fully connected graph, incorporating both local and global
image characteristics. The graph nodes characterise objects as well as the
overall scene context using mid-level semantic attributes, while the edges
capture the object topology. We demonstrate the effectiveness of
Attribute-Graphs by applying them to the problem of image ranking. We benchmark
the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets,
which we have created in order to evaluate the ranking performance on complex
queries containing multiple objects. Our experimental evaluation shows that
modelling images as Attribute-Graphs results in improved ranking performance
over existing techniques.
Comment: In IEEE International Conference on Computer Vision (ICCV) 201
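A minimal sketch of such a fully connected, attribute-carrying graph, with made-up attribute vectors and object centres (the paper's node attributes are learned mid-level semantics, not these toy values):

```python
import itertools
import math

# Toy attribute-graph: each node stores an attribute vector and an
# object centre; every pair of nodes is joined by an edge whose weight
# here is simply the distance between centres, loosely mimicking the
# object topology captured by the edges.
nodes = {
    "person": {"attrs": [0.9, 0.1], "centre": (100, 200)},
    "dog": {"attrs": [0.2, 0.8], "centre": (160, 220)},
    "scene": {"attrs": [0.5, 0.5], "centre": (128, 128)},
}

edges = {
    frozenset((a, b)): math.dist(nodes[a]["centre"], nodes[b]["centre"])
    for a, b in itertools.combinations(nodes, 2)
}
```

With n nodes the fully connected undirected graph has n(n-1)/2 edges; ranking would then compare such graphs between a query image and candidate images.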
How Best to Hunt a Mammoth - Toward Automated Knowledge Extraction From Graphical Research Models
In the Information Systems (IS) discipline, central contributions of research projects are often represented in graphical research models, clearly illustrating constructs and their relationships. Although thousands of such representations exist, methods for extracting this source of knowledge are still at an early stage. We present a method that (1) extracts graphical research models from articles, (2) generates synthetic training data, (3) performs object detection with a neural network, (4) reconstructs the graph, and (5) stores the results in a designated research model format. We trained YOLOv7 on 20,000 generated diagrams and evaluated its performance on 100 manually reconstructed diagrams from the Senior Scholars' Basket. The results for extracting graphical research models show an F1-score of 0.82 for nodes, 0.72 for links, and an accuracy of 0.72 for labels, indicating the method's applicability for populating knowledge repositories that contribute to knowledge synthesis.
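For reference, the F1-score reported above is the harmonic mean of precision and recall. A minimal sketch (the precision and recall values below are made up, not taken from the paper):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# E.g. a detector with hypothetical precision 0.85 and recall 0.79
# on node bounding boxes:
node_f1 = f1_score(0.85, 0.79)
```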
LASS: a simple assignment model with Laplacian smoothing
We consider the problem of learning soft assignments of items to
categories given two sources of information: an item-category similarity
matrix, which encourages items to be assigned to categories they are similar to
(and to not be assigned to categories they are dissimilar to), and an item-item
similarity matrix, which encourages similar items to have similar assignments.
We propose a simple quadratic programming model that captures this intuition.
We give necessary conditions for its solution to be unique, define an
out-of-sample mapping, and derive a simple, effective training algorithm based
on the alternating direction method of multipliers. The model predicts
reasonable assignments from even a few similarity values, and can be seen as a
generalization of semisupervised learning. It is particularly useful when items
naturally belong to multiple categories, as for example when annotating
documents with keywords or pictures with tags, with partially tagged items, or
when the categories have complex interrelations (e.g. hierarchical) that are
unknown.
Comment: 20 pages, 4 figures. A shorter version appears in AAAI 201
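A rough sketch of the intuition behind this kind of model (not the paper's exact quadratic program; the similarity matrices and the smoothing weight below are made up): trade off agreement with item-category similarities against a Laplacian penalty that pushes similar items toward similar assignments.

```python
# Toy setup: 3 items, 2 categories.
# B[i][k]: similarity of item i to category k (made-up values).
B = [[1.0, 0.0],
     [0.8, 0.2],
     [0.0, 1.0]]
# W[i][j]: similarity between items i and j (made-up values).
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.2],
     [0.0, 0.2, 0.0]]

def objective(Z, lam=0.5):
    """Fit term (agree with the item-category similarities) plus a
    Laplacian smoothing term (similar items -> similar assignments).
    The pairwise sum over W equals tr(Z^T L Z) for the graph
    Laplacian L of W."""
    fit = -sum(B[i][k] * Z[i][k] for i in range(3) for k in range(2))
    smooth = sum(
        W[i][j] * sum((Z[i][k] - Z[j][k]) ** 2 for k in range(2))
        for i in range(3) for j in range(3)
    ) / 2.0
    return fit + lam * smooth

# Assignments that respect both similarity sources score lower
# (better) than assignments that contradict them.
good = objective([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
bad = objective([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
```

In the paper this objective is minimized over row-stochastic, nonnegative Z via ADMM; the sketch only evaluates candidate assignments to show what the two terms reward.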