Estimating the information gap between textual and visual representations
Photos, drawings, figures, etc. supplement textual information in
various kinds of media, for example, in web news or scientific publications. In this respect, the
intended effect of an image can be quite different, e.g., providing additional information,
focusing on certain details of surrounding text, or simply being a general illustration of a
topic. As a consequence, the semantic correlation between information of different modalities can
vary noticeably, too. Moreover, cross-modal interrelations are often hard to describe in a precise
way. The variety of possible interrelations of textual and graphical information, and the question
of how they can be described and automatically estimated, have not been addressed yet by previous
work. In this paper, we present several contributions to close this gap. First, we introduce two
measures to describe cross-modal interrelations: cross-modal mutual information (CMI) and semantic
correlation (SC). Second, a novel approach relying on deep learning is suggested to estimate CMI
and SC of textual and visual information. Third, three diverse datasets are leveraged to learn an
appropriate deep neural network model for the demanding task. The system has been evaluated on a
challenging test set, and the experimental results demonstrate the feasibility of the approach.
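In the paper, CMI and SC are predicted by a trained deep network rather than computed in closed form. As a loose, hypothetical illustration of scoring cross-modal relatedness from joint embeddings (the vectors and the cosine measure below are assumptions for illustration, not the paper's definitions):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

# Hypothetical embeddings for an image and its accompanying text,
# e.g. produced by the two branches of a multimodal network.
image_emb = [0.2, 0.8, 0.1]
text_emb = [0.3, 0.7, 0.0]

sc_score = cosine_similarity(image_emb, text_emb)
```

A high score would suggest strong semantic correlation between the modalities; the learned measures in the paper capture richer relations than this toy similarity.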
The Cityscapes Dataset for Semantic Urban Scene Understanding
Visual understanding of complex urban street scenes is an enabling factor for
a wide range of applications. Object detection has benefited enormously from
large-scale datasets, especially in the context of deep learning. For semantic
urban scene understanding, however, no current dataset adequately captures the
complexity of real-world urban scenes.
To address this, we introduce Cityscapes, a benchmark suite and large-scale
dataset to train and test approaches for pixel-level and instance-level
semantic labeling. Cityscapes is comprised of a large, diverse set of stereo
video sequences recorded in streets from 50 different cities. 5000 of these
images have high quality pixel-level annotations; 20000 additional images have
coarse annotations to enable methods that leverage large volumes of
weakly-labeled data. Crucially, our effort exceeds previous attempts in terms
of dataset size, annotation richness, scene variability, and complexity. Our
accompanying empirical study provides an in-depth analysis of the dataset
characteristics, as well as a performance evaluation of several
state-of-the-art approaches based on our benchmark.
Comment: Includes supplemental material.
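As a hedged sketch of what working with pixel-level annotations of this kind involves (the label map and class ids below are made up, not actual Cityscapes data), one might tally per-class pixel frequencies like so:

```python
from collections import Counter

# Toy 4x4 pixel-level label map; in a Cityscapes-style annotation,
# each pixel carries a semantic class id (e.g. road, car, person).
# The class ids below are invented for illustration.
label_map = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 1, 1],
    [2, 2, 2, 2],
]

pixel_counts = Counter(c for row in label_map for c in row)
total = sum(pixel_counts.values())
class_frequencies = {c: n / total for c, n in pixel_counts.items()}
```

Such class-frequency statistics are one of the dataset characteristics an empirical study like the one described would analyse, e.g. to quantify class imbalance.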
Joint stage recognition and anatomical annotation of drosophila gene expression patterns
Motivation: Staining the mRNA of a gene via in situ hybridization (ISH) during the development of a Drosophila melanogaster embryo reveals the detailed spatio-temporal patterns of its expression. Many related biological problems, such as the detection of co-expressed genes, co-regulated genes and transcription factor binding motifs, rely heavily on the analysis of these image patterns. To provide text-based pattern searching that facilitates related biological studies, the images in the Berkeley Drosophila Genome Project (BDGP) study are manually annotated with a developmental stage term and anatomical ontology terms by domain experts. Due to the rapid increase in the number of such images and the inevitable bias in annotations by human curators, it is necessary to develop an automatic method to recognize the developmental stage and annotate anatomical terms.
Multimodal news article analysis
The intersection of Computer Vision and Natural Language Processing has been a hot topic of research in recent years, with results that were unthinkable only a few years ago. In view of this progress, we want to highlight online news articles as a potential next step for this area of research. The rich interrelations of text, tags, images or videos, as well as a vast corpus of general knowledge, make them an exciting benchmark for high-capacity models such as deep neural networks. In this paper we present a series of tasks and baseline approaches that leverage corpora such as the BreakingNews dataset.
OLT: A Toolkit for Object Labeling Applied to Robotic RGB-D Datasets
In this work we present the Object Labeling Toolkit
(OLT), a set of software components publicly available for
helping in the management and labeling of sequential RGB-D
observations collected by a mobile robot. Such a robot can be
equipped with an arbitrary number of RGB-D devices, possibly
integrating other sensors (e.g. odometry, 2D laser scanners,
etc.). OLT first merges the robot observations to generate a
3D reconstruction of the scene from which object segmentation
and labeling is conveniently accomplished. The annotated labels
are automatically propagated by the toolkit to each RGB-D
observation in the collected sequence, providing a dense labeling
of both intensity and depth images. The resulting objects’ labels
can be exploited for many robotic oriented applications, including
high-level decision making, semantic mapping, or contextual
object recognition. Software components within OLT are highly
customizable and expandable, facilitating the integration of
already-developed algorithms. To illustrate the toolkit suitability,
we describe its application to robotic RGB-D sequences taken in
a home environment.
Acknowledgements: Universidad de Málaga, Campus de Excelencia Internacional Andalucía Tech; Spanish grant program FPU-MICINN 2010; and the Spanish projects TAROTH: New developments toward a Robot at Home (DPI2011-25483) and PROMOVE: Advances in mobile robotics for promoting independent life of elders (DPI2014-55826-R).
Attribute-Graph: A Graph based approach to Image Ranking
We propose a novel image representation, termed Attribute-Graph, to rank
images by their semantic similarity to a given query image. An Attribute-Graph
is an undirected fully connected graph, incorporating both local and global
image characteristics. The graph nodes characterise objects as well as the
overall scene context using mid-level semantic attributes, while the edges
capture the object topology. We demonstrate the effectiveness of
Attribute-Graphs by applying them to the problem of image ranking. We benchmark
the performance of our algorithm on the 'rPascal' and 'rImageNet' datasets,
which we have created in order to evaluate the ranking performance on complex
queries containing multiple objects. Our experimental evaluation shows that
modelling images as Attribute-Graphs results in improved ranking performance
over existing techniques.
Comment: In IEEE International Conference on Computer Vision (ICCV) 201
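A minimal sketch of such a fully connected, attribute-carrying graph, with made-up attribute vectors and object centres (the paper's node attributes are learned mid-level semantics, not these toy values):

```python
import itertools
import math

# Toy attribute-graph: each node stores an attribute vector and an
# object centre; every pair of nodes is joined by an edge whose weight
# here is simply the distance between centres, loosely mimicking the
# object topology captured by the edges.
nodes = {
    "person": {"attrs": [0.9, 0.1], "centre": (100, 200)},
    "dog": {"attrs": [0.2, 0.8], "centre": (160, 220)},
    "scene": {"attrs": [0.5, 0.5], "centre": (128, 128)},
}

edges = {
    frozenset((a, b)): math.dist(nodes[a]["centre"], nodes[b]["centre"])
    for a, b in itertools.combinations(nodes, 2)
}
```

With n nodes the fully connected undirected graph has n(n-1)/2 edges; ranking would then compare such graphs between a query image and candidate images.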
How Best to Hunt a Mammoth - Toward Automated Knowledge Extraction From Graphical Research Models
In the Information Systems (IS) discipline, central contributions of research projects are often represented in graphical research models, clearly illustrating constructs and their relationships. Although thousands of such representations exist, methods for extracting this source of knowledge are still at an early stage. We present a method that (1) extracts graphical research models from articles, (2) generates synthetic training data, (3) performs object detection with a neural network, (4) reconstructs the graph, and (5) stores the results in a designated research model format. We trained YOLOv7 on 20,000 generated diagrams and evaluated its performance on 100 manually reconstructed diagrams from the Senior Scholars' Basket. The results for extracting graphical research models show an F1-score of 0.82 for nodes, 0.72 for links, and an accuracy of 0.72 for labels, indicating the method's applicability for populating knowledge repositories that contribute to knowledge synthesis.
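For reference, the F1-score reported above is the harmonic mean of precision and recall. A minimal sketch (the precision and recall values below are made up, not taken from the paper):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# E.g. a detector with hypothetical precision 0.85 and recall 0.79
# on node bounding boxes:
node_f1 = f1_score(0.85, 0.79)
```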
LASS: a simple assignment model with Laplacian smoothing
We consider the problem of learning soft assignments of items to
categories given two sources of information: an item-category similarity
matrix, which encourages items to be assigned to categories they are similar to
(and to not be assigned to categories they are dissimilar to), and an item-item
similarity matrix, which encourages similar items to have similar assignments.
We propose a simple quadratic programming model that captures this intuition.
We give necessary conditions for its solution to be unique, define an
out-of-sample mapping, and derive a simple, effective training algorithm based
on the alternating direction method of multipliers. The model predicts
reasonable assignments from even a few similarity values, and can be seen as a
generalization of semisupervised learning. It is particularly useful when items
naturally belong to multiple categories, as for example when annotating
documents with keywords or pictures with tags, with partially tagged items, or
when the categories have complex interrelations (e.g. hierarchical) that are
unknown.
Comment: 20 pages, 4 figures. A shorter version appears in AAAI 201
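A rough sketch of the intuition behind this kind of model (not the paper's exact quadratic program; the similarity matrices and the smoothing weight below are made up): trade off agreement with item-category similarities against a Laplacian penalty that pushes similar items toward similar assignments.

```python
# Toy setup: 3 items, 2 categories.
# B[i][k]: similarity of item i to category k (made-up values).
B = [[1.0, 0.0],
     [0.8, 0.2],
     [0.0, 1.0]]
# W[i][j]: similarity between items i and j (made-up values).
W = [[0.0, 1.0, 0.0],
     [1.0, 0.0, 0.2],
     [0.0, 0.2, 0.0]]

def objective(Z, lam=0.5):
    """Fit term (agree with the item-category similarities) plus a
    Laplacian smoothing term (similar items -> similar assignments).
    The pairwise sum over W equals tr(Z^T L Z) for the graph
    Laplacian L of W."""
    fit = -sum(B[i][k] * Z[i][k] for i in range(3) for k in range(2))
    smooth = sum(
        W[i][j] * sum((Z[i][k] - Z[j][k]) ** 2 for k in range(2))
        for i in range(3) for j in range(3)
    ) / 2.0
    return fit + lam * smooth

# Assignments that respect both similarity sources score lower
# (better) than assignments that contradict them.
good = objective([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
bad = objective([[0.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
```

In the paper this objective is minimized over row-stochastic, nonnegative Z via ADMM; the sketch only evaluates candidate assignments to show what the two terms reward.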