
    Hard to Cheat: A Turing Test based on Answering Questions about Images

    Progress in language and image understanding by machines has sparked the interest of the research community in more open-ended, holistic tasks and refueled an old AI dream of building intelligent machines. We discuss a few prominent challenges that characterize such holistic tasks and argue for "question answering about images" as a particularly appealing instance of such a holistic task. In particular, we point out that it is a version of a Turing Test that is likely to be more robust to over-interpretations, and we contrast it with tasks like grounding and generation of descriptions. Finally, we discuss tools to measure progress in this field. Comment: Presented at the AAAI-15 Workshop: Beyond the Turing Test

    Learning Multi-Scale Representations for Material Classification

    The recent progress in sparse coding and deep learning has made unsupervised feature learning methods a strong competitor to hand-crafted descriptors. In computer vision, success stories of learned features have predominantly been reported for object recognition tasks. In this paper, we investigate whether and how feature learning can be used for material recognition. We propose two strategies to incorporate scale information into the learning procedure, resulting in a novel multi-scale coding procedure. Our results show that our learned features for material recognition outperform hand-crafted descriptors on the FMD and KTH-TIPS2 material classification benchmarks.
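    The multi-scale coding idea can be sketched roughly as follows. This is a minimal illustration and not the authors' procedure: dense patches are taken from an image pyramid, encoded per scale against an unsupervised codebook (plain k-means standing in here for sparse coding), and the per-scale histograms are concatenated into one image descriptor. Scale factors, patch size and codebook size are assumptions.

```python
# Minimal multi-scale patch-encoding sketch (k-means stands in for sparse coding).
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from skimage.transform import rescale
from skimage.util import view_as_windows

SCALES = [1.0, 0.5, 0.25]   # hypothetical pyramid levels
PATCH = 8                   # patch side length in pixels

def patches_at_scales(gray_image):
    """Collect flattened patches from each level of a simple image pyramid."""
    out = []
    for s in SCALES:
        img = rescale(gray_image, s, anti_aliasing=True)
        windows = view_as_windows(img, (PATCH, PATCH), step=4)
        out.append(windows.reshape(-1, PATCH * PATCH))
    return out

def fit_codebooks(training_images, n_atoms=64):
    """One codebook per scale, fitted on training patches (unsupervised)."""
    codebooks = []
    for level in range(len(SCALES)):
        data = np.vstack([patches_at_scales(img)[level] for img in training_images])
        codebooks.append(MiniBatchKMeans(n_clusters=n_atoms, n_init=3).fit(data))
    return codebooks

def encode_image(gray_image, codebooks):
    """Quantize patches per scale and concatenate the pooled, normalized histograms."""
    histograms = []
    for per_scale, km in zip(patches_at_scales(gray_image), codebooks):
        hist = np.bincount(km.predict(per_scale), minlength=km.n_clusters).astype(float)
        histograms.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(histograms)
```

    The resulting descriptor can then be fed to any off-the-shelf classifier (e.g. a linear SVM) over the material labels.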

    See the Difference: Direct Pre-Image Reconstruction and Pose Estimation by Differentiating HOG

    The Histogram of Oriented Gradients (HOG) descriptor has led to many advances in computer vision over the last decade and is still part of many state-of-the-art approaches. We realize that the associated feature computation is piecewise differentiable, and therefore many pipelines that build on HOG can be made differentiable. This lends itself to advanced introspection as well as opportunities for end-to-end optimization. We present our implementation of ∇HOG based on the auto-differentiation toolbox Chumpy and show applications to pre-image visualization and pose estimation, the latter extending the existing OpenDR differentiable-renderer pipeline. Both applications improve on the respective state-of-the-art HOG approaches.
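    To make the differentiability argument concrete, below is a hedged sketch of a simplified, smooth HOG-like descriptor written against PyTorch's autodiff rather than the Chumpy toolbox used in the paper; the soft orientation binning, cell size and the pre-image loop are illustrative assumptions, not the authors' implementation.

```python
# Sketch: a simplified, differentiable HOG-like descriptor plus a pre-image loop.
import torch
import torch.nn.functional as F

def soft_hog(image, cell=8, bins=9):
    """HOG-like cell histograms with soft orientation binning, so gradients flow."""
    kx = torch.tensor([[[[-1.0, 0.0, 1.0]]]])              # horizontal derivative kernel
    ky = kx.transpose(2, 3)                                 # vertical derivative kernel
    gx = F.conv2d(image, kx, padding=(0, 1))
    gy = F.conv2d(image, ky, padding=(1, 0))
    mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
    ang = torch.atan2(gy, gx)
    centers = torch.linspace(-torch.pi, torch.pi, bins + 1)[:-1]
    weights = torch.cos(ang - centers.view(1, bins, 1, 1)).clamp(min=0) ** 2
    return F.avg_pool2d(weights * mag, cell) * cell * cell  # per-cell orientation votes

# Pre-image style usage: optimise pixels until their descriptor matches a target.
image = torch.rand(1, 1, 64, 64, requires_grad=True)
target = soft_hog(torch.rand(1, 1, 64, 64))
optimizer = torch.optim.Adam([image], lr=0.05)
for _ in range(200):
    optimizer.zero_grad()
    loss = F.mse_loss(soft_hog(image), target)
    loss.backward()                                         # gradients reach the pixels
    optimizer.step()
```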

    GazeDPM: Early Integration of Gaze Information in Deformable Part Models

    An increasing number of works explore collaborative human-computer systems in which human gaze is used to enhance computer vision systems. For object detection, these efforts have so far been restricted to late-integration approaches that have inherent limitations, such as increased precision without an increase in recall. We propose an early-integration approach in a deformable part model, which constitutes a joint formulation over gaze and visual data. We show that our GazeDPM method improves over the state-of-the-art DPM baseline by 4% and over a recent method for gaze-supported object detection by 3% on the public POET dataset. Our approach additionally provides introspection of the learnt models, can reveal salient image structures, and allows us to investigate the interplay between gaze-attracting and gaze-repelling areas, the importance of view-specific models, and viewers' personal biases in gaze patterns. We finally study important practical aspects of our approach, such as the impact of using saliency maps instead of real fixations, the impact of the number of fixations, and the robustness to gaze estimation error.
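    The late- versus early-integration distinction can be illustrated with a deliberately simplified scorer in which a per-location linear model stands in for the deformable part filters; all variable names below are assumptions.

```python
# Late vs. early integration of a gaze map, reduced to per-location linear scoring.
import numpy as np

def late_integration(visual_feat, gaze_map, w_visual, weight=0.5):
    """Late integration: score the visual features first, add gaze evidence afterwards."""
    score = visual_feat @ w_visual            # (H, W, C) @ (C,) -> (H, W)
    return score + weight * gaze_map

def early_integration(visual_feat, gaze_map, w_joint):
    """Early integration: gaze becomes an extra feature channel that is scored jointly,
    so its weight (and interaction with appearance) is learnt rather than fixed."""
    joint = np.concatenate([visual_feat, gaze_map[..., None]], axis=-1)
    return joint @ w_joint                    # w_joint has C + 1 entries

# Toy usage with random features and a random fixation-density map.
H, W, C = 32, 32, 16
scores = early_integration(np.random.rand(H, W, C), np.random.rand(H, W), np.random.rand(C + 1))
```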

    Spatio-Temporal Image Boundary Extrapolation

    Boundary prediction in images and video has been a very active topic of research, and organizing visual information into boundaries and segments is believed to be a cornerstone of visual perception. While prior work has focused on predicting boundaries for observed frames, our work aims at predicting boundaries of future, unobserved frames. This requires our model to learn about the fate of boundaries and to extrapolate motion patterns. We experiment on an established real-world video segmentation dataset, which provides a testbed for this new task. We show for the first time spatio-temporal boundary extrapolation in this challenging scenario. Furthermore, we show long-term prediction of boundaries in situations where the motion is governed by the laws of physics. We successfully predict boundaries in a billiard scenario without any assumption of a strong parametric model or any object notion. We argue that our model, with minimalistic model assumptions, has derived a notion of 'intuitive physics' that can be applied to novel scenes.
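    A minimal sketch of the prediction setting (an assumed convolutional architecture, not the authors' model): a small network maps a short history of boundary maps to the next map, and long-term prediction simply feeds its own outputs back in, as in the billiard experiments.

```python
# Sketch: predict the next boundary map from a stack of past boundary maps.
import torch
import torch.nn as nn

class BoundaryExtrapolator(nn.Module):
    def __init__(self, history=4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(history, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv2d(32, 1, kernel_size=5, padding=2),
        )

    def forward(self, past_boundaries):                    # (B, history, H, W)
        return torch.sigmoid(self.net(past_boundaries))    # next boundary map in [0, 1]

def rollout(model, past, steps=10):
    """Long-term prediction by feeding predictions back into the history window."""
    frames = []
    for _ in range(steps):
        nxt = model(past)
        frames.append(nxt)
        past = torch.cat([past[:, 1:], nxt], dim=1)        # slide the history window
    return torch.stack(frames, dim=1)                      # (B, steps, 1, H, W)
```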

    Growth rates for persistently excited linear systems

    We consider a family of linear control systems $\dot{x} = Ax + \alpha Bu$, where $\alpha$ belongs to a given class of persistently exciting signals. We seek maximal $\alpha$-uniform stabilisation and destabilisation by means of linear feedbacks $u = Kx$. We extend previous results obtained for bidimensional single-input linear control systems to the general case as follows: if the pair $(A,B)$ verifies a certain Lie bracket generating condition, then the maximal rate of convergence of $(A,B)$ is equal to the maximal rate of divergence of $(-A,-B)$. We also provide more precise results in the general single-input case, where the above result is obtained under the sole assumption of controllability of the pair $(A,B)$.
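    For reference, the standard $(T,\mu)$-persistent-excitation setting can be written as follows; the notation and the precise definition of the maximal rates used in the paper may differ, so this is only an assumed formalization.

```latex
% Closed-loop persistently excited system (standard setting, assumed notation).
\[
  \dot{x}(t) = A\,x(t) + \alpha(t)\, B\,u(t), \qquad u(t) = K\,x(t),
\]
% where, for fixed 0 < \mu \le T, the signal \alpha is (T,\mu)-persistently exciting if
\[
  \alpha \colon [0,\infty) \to [0,1], \qquad
  \int_{t}^{t+T} \alpha(s)\,\mathrm{d}s \;\ge\; \mu \quad \text{for all } t \ge 0.
\]
% One common way to formalize the maximal \alpha-uniform rate of convergence: the best
% worst-case exponential decay achievable by a single feedback K over all admissible
% signals \alpha and all nonzero initial conditions.
\[
  \operatorname{rc}(A,B) \;=\; \sup_{K}\; \inf_{\alpha,\; x(0)\neq 0}\;
  \liminf_{t \to \infty}\Bigl(-\tfrac{1}{t}\,\log \lVert x(t)\rVert\Bigr).
\]
```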

    Not Using the Car to See the Sidewalk: Quantifying and Controlling the Effects of Context in Classification and Segmentation

    The importance of visual context in scene understanding tasks is well recognized in the computer vision community. However, it is unclear to what extent computer vision models for image classification and semantic segmentation depend on context to make their predictions. A model that relies overly on context will fail when it encounters objects in context distributions different from the training data, and it is therefore important to identify these dependencies before such models can be deployed in the real world. We propose a method to quantify the sensitivity of black-box vision models to visual context by editing images to remove selected objects and measuring the response of the target models. We apply this methodology to two tasks, image classification and semantic segmentation, and discover undesirable dependencies between objects and context, for example that "sidewalk" segmentation relies heavily on "cars" being present in the image. We propose an object-removal-based data augmentation solution to mitigate this dependency and increase the robustness of classification and segmentation models to contextual variations. Our experiments show that the proposed data augmentation helps these models improve their performance in out-of-context scenarios while preserving their performance on regular data. Comment: 14 pages (12 figures)
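    The measurement loop behind this idea can be sketched as follows; `model`, the naive mean-fill removal and the class-probability interface are illustrative stand-ins (the paper edits images with a dedicated object removal model):

```python
# Sketch: measure how much a prediction depends on a removed context object.
import numpy as np

def remove_object(image, mask, fill_value=None):
    """Naive object removal: overwrite masked pixels with the image mean
    (a real pipeline would use an inpainting / removal model instead)."""
    edited = image.copy()
    fill = image.mean(axis=(0, 1)) if fill_value is None else fill_value
    edited[mask] = fill
    return edited

def context_sensitivity(model, image, object_mask, target_class):
    """Drop in the target-class score after the context object is removed."""
    p_original = model(image)[target_class]
    p_edited = model(remove_object(image, object_mask))[target_class]
    return p_original - p_edited   # large positive value => strong context dependency
```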

    Ask Your Neurons: A Neural-based Approach to Answering Questions about Images

    We address a question answering task on real-world images that is set up as a Visual Turing Test. By combining the latest advances in image representation and natural language processing, we propose Neural-Image-QA, an end-to-end formulation of this problem in which all parts are trained jointly. In contrast to previous efforts, we face a multi-modal problem where the language output (answer) is conditioned on visual and natural language inputs (image and question). Our approach Neural-Image-QA doubles the performance of the previous best approach on this problem. We provide additional insights into the problem by analyzing how much information is contained in the language part alone, for which we provide a new human baseline. To study human consensus, which is related to the ambiguities inherent in this challenging task, we propose two novel metrics and collect additional answers, which extends the original DAQUAR dataset to DAQUAR-Consensus. Comment: ICCV'15 (Oral)
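    A compact sketch of a CNN + LSTM question-answering model in this spirit is given below; the layer sizes, the way the image feature conditions the LSTM, and the single-word answer head are simplifying assumptions (the actual Neural-Image-QA decodes multi-word answers).

```python
# Sketch: image-conditioned LSTM over the question, followed by an answer classifier.
import torch
import torch.nn as nn

class VisualQA(nn.Module):
    def __init__(self, vocab_size, answer_size, img_dim=4096, hidden=512, emb=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.img_proj = nn.Linear(img_dim, hidden)        # CNN feature -> LSTM state space
        self.lstm = nn.LSTM(emb, hidden, batch_first=True)
        self.answer = nn.Linear(hidden, answer_size)

    def forward(self, img_feat, question_tokens):
        # The projected CNN feature initialises the hidden state, so every word of
        # the question is read in the context of the image.
        h0 = torch.tanh(self.img_proj(img_feat)).unsqueeze(0)   # (1, B, hidden)
        c0 = torch.zeros_like(h0)
        _, (h_n, _) = self.lstm(self.embed(question_tokens), (h0, c0))
        return self.answer(h_n.squeeze(0))                # logits over the answer vocabulary
```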

    Long-Term On-Board Prediction of People in Traffic Scenes under Uncertainty

    Progress towards advanced systems for assisted and autonomous driving is leveraging recent advances in recognition and segmentation methods. Yet, we still face challenges in bringing reliable driving to inner cities, as these are composed of highly dynamic scenes observed from a moving platform at considerable speeds. Anticipation becomes a key element in order to react in a timely manner and to prevent accidents. In this paper we argue that it is necessary to predict at least 1 second ahead, and we thus propose a new model that jointly predicts ego motion and people trajectories over such large time horizons. We pay particular attention to modeling the uncertainty of our estimates arising from the non-deterministic nature of natural traffic scenes. Our experimental results show that it is indeed possible to predict people trajectories at the desired time horizons and that our uncertainty estimates are informative of the prediction error. We also show that both sequence modeling of trajectories and our novel method of long-term odometry prediction are essential for best performance. Comment: CVPR 2018
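    A minimal sketch of the uncertainty-aware part (assumed architecture and loss; the full model additionally predicts ego motion): the network outputs a mean and a log-variance per future time step and is trained with a Gaussian negative log-likelihood, so the predicted variance can flag unreliable forecasts.

```python
# Sketch: predict future positions together with a per-step variance estimate.
import torch
import torch.nn as nn

class TrajectoryPredictor(nn.Module):
    def __init__(self, horizon=30, hidden=128):
        super().__init__()
        self.encoder = nn.LSTM(2, hidden, batch_first=True)   # past (x, y) positions
        self.head = nn.Linear(hidden, horizon * 4)             # mean + log-variance per step
        self.horizon = horizon

    def forward(self, past_xy):                                 # (B, T_past, 2)
        _, (h, _) = self.encoder(past_xy)
        out = self.head(h.squeeze(0)).view(-1, self.horizon, 4)
        return out[..., :2], out[..., 2:]                       # mean, log-variance

def gaussian_nll(mean, log_var, target):
    """Negative log-likelihood; large predicted variance down-weights large errors."""
    return (0.5 * (log_var + (target - mean) ** 2 / log_var.exp())).mean()
```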