Grounding semantics in robots for Visual Question Answering
In this thesis I describe an operational implementation of an object detection and description system that is incorporated into an end-to-end Visual Question Answering system, and I evaluate it on two visual question answering datasets for compositional language and elementary visual reasoning.
Interval-valued and intuitionistic fuzzy mathematical morphologies as special cases of L-fuzzy mathematical morphology
Mathematical morphology (MM) offers a wide range of tools for image processing and computer vision. MM was originally conceived for the processing of binary images and later extended to gray-scale morphology. Extensions of classical binary morphology to gray-scale morphology include approaches based on fuzzy set theory that give rise to fuzzy mathematical morphology (FMM). From a mathematical point of view, FMM relies on the fact that the class of all fuzzy sets over a certain universe forms a complete lattice. Recall that complete lattices provide for the most general framework in which MM can be conducted.
The concept of L-fuzzy set generalizes not only the concept of fuzzy set but also the concepts of interval-valued fuzzy set and Atanassov’s intuitionistic fuzzy set. In addition, the class of L-fuzzy sets forms a complete lattice whenever the underlying set L constitutes a complete lattice. Based on these observations, we develop a general approach towards L-fuzzy mathematical morphology in this paper. We focus in particular on the construction of connectives for interval-valued and intuitionistic fuzzy mathematical morphologies that arise as special, isomorphic cases of L-fuzzy MM. As an application of these ideas, we generate a combination of some well-known medical image reconstruction techniques in terms of interval-valued fuzzy image processing.
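To make the lattice view concrete, the following is a minimal sketch of an interval-valued fuzzy dilation on a 1-D signal. It is an illustration only, not the construction from the paper: it assumes intervals are ordered componentwise, that the conjunction is built componentwise from the minimum t-norm, and that positions outside the signal contribute nothing. All function names (`iv_conj`, `iv_sup`, `iv_dilate`) are hypothetical.

```python
def iv_conj(u, v):
    """Componentwise conjunction on intervals, using the minimum t-norm (assumption)."""
    return (min(u[0], v[0]), min(u[1], v[1]))


def iv_sup(intervals):
    """Supremum in the lattice of intervals under the componentwise order."""
    return (max(i[0] for i in intervals), max(i[1] for i in intervals))


def iv_dilate(signal, se):
    """Dilate a list of intervals by a structuring element.

    signal: list of (lo, hi) membership intervals with 0 <= lo <= hi <= 1.
    se: dict mapping an integer offset to a (lo, hi) interval.
    Computes out[x] = sup over offsets of conj(se[offset], signal[x - offset]).
    """
    n = len(signal)
    out = []
    for x in range(n):
        # Start from the lattice bottom so the supremum is always defined.
        terms = [(0.0, 0.0)]
        for offset, weight in se.items():
            y = x - offset
            if 0 <= y < n:
                terms.append(iv_conj(weight, signal[y]))
        out.append(iv_sup(terms))
    return out


signal = [(0.0, 0.0), (0.5, 0.75), (0.0, 0.0)]
flat_se = {-1: (1.0, 1.0), 0: (1.0, 1.0), 1: (1.0, 1.0)}
dilated = iv_dilate(signal, flat_se)
```

Because the supremum and the conjunction both act componentwise, the lower and upper bounds are each dilated like an ordinary fuzzy signal, which is one simple way the interval-valued case reduces to operations on a complete lattice.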
Point-wise mutual information-based video segmentation with high temporal consistency
In this paper, we tackle the problem of temporally consistent boundary
detection and hierarchical segmentation in videos. While finding the best
high-level reasoning of region assignments in videos is the focus of much
recent research, temporal consistency in boundary detection has so far only
rarely been tackled. We argue that temporally consistent boundaries are a key
component to temporally consistent region assignment. The proposed method is
based on the point-wise mutual information (PMI) of spatio-temporal voxels.
Temporal consistency is established by an evaluation of PMI-based point
affinities in the spectral domain over space and time. Thus, the proposed
method is independent of any optical flow computation or previously learned
motion models. The proposed low-level video segmentation method outperforms the
learning-based state of the art in terms of standard region metrics.
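For readers unfamiliar with point-wise mutual information, the quantity is PMI(a, b) = log(P(a, b) / (P(a) P(b))). The sketch below estimates it from raw co-occurrence counts of paired feature values. This is a toy estimator under stated assumptions, not the method of the paper: the abstract does not say how the probabilities are estimated, and the function name `pmi_table` and the string-valued features are hypothetical.

```python
import math
from collections import Counter


def pmi_table(pairs):
    """Return PMI for every observed (a, b) pair, estimated from counts.

    pairs: list of (a, b) tuples, e.g. quantized features of two
    neighboring voxels (neighbors in space or in time).
    """
    joint = Counter(pairs)
    total = sum(joint.values())
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    table = {}
    for (a, b), n_ab in joint.items():
        p_ab = n_ab / total
        p_a = left[a] / total
        p_b = right[b] / total
        table[(a, b)] = math.log(p_ab / (p_a * p_b))
    return table


# Features that always co-occur get positive PMI ...
coupled = pmi_table([("x", "x")] * 4 + [("y", "y")] * 4)
# ... while statistically independent features get PMI of zero.
independent = pmi_table([(a, b) for a in "xy" for b in "xy"])
```

A high PMI between two neighboring voxels suggests they co-occur more often than chance and so plausibly belong to the same region; a low value suggests a boundary between them.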
Superpixels: An Evaluation of the State-of-the-Art
Superpixels group perceptually similar pixels to create visually meaningful
entities while heavily reducing the number of primitives for subsequent
processing steps. Because of these properties, superpixel algorithms have received
much attention since their naming in 2003. Today, publicly available
superpixel algorithms have turned into standard tools in low-level vision. As
such, and due to their quick adoption in a wide range of applications,
appropriate benchmarks are crucial for algorithm selection and comparison.
Until now, the rapidly growing number of algorithms as well as varying
experimental setups hindered the development of a unifying benchmark. We
present a comprehensive evaluation of 28 state-of-the-art superpixel algorithms
utilizing a benchmark focussing on fair comparison and designed to provide new
insights relevant for applications. To this end, we explicitly discuss
parameter optimization and the importance of strictly enforcing connectivity.
Furthermore, by extending well-known metrics, we are able to summarize
algorithm performance independent of the number of generated superpixels,
thereby overcoming a major limitation of available benchmarks. In addition, we
discuss runtime, robustness against noise, blur and affine transformations,
implementation details as well as aspects of visual quality. Finally, we
present an overall ranking of superpixel algorithms which redefines the
state-of-the-art and enables researchers to easily select appropriate
algorithms and the corresponding implementations which themselves are made
publicly available as part of our benchmark at
davidstutz.de/projects/superpixel-benchmark/
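As an illustration of what enforcing connectivity can mean in practice, the sketch below relabels a superpixel label map so that every output label covers exactly one 4-connected component. This is one simple strategy chosen for illustration, an assumption rather than the procedure used in the benchmark; the function name `enforce_connectivity` is hypothetical.

```python
from collections import deque


def enforce_connectivity(labels):
    """Relabel so that each label is a single 4-connected component.

    labels: non-empty list of equal-length rows of integer labels.
    Returns a new label map with consecutive labels starting at 0,
    assigned in row-major order of first appearance.
    """
    height, width = len(labels), len(labels[0])
    out = [[-1] * width for _ in range(height)]
    next_label = 0
    for i in range(height):
        for j in range(width):
            if out[i][j] != -1:
                continue  # already assigned by an earlier flood fill
            source = labels[i][j]
            out[i][j] = next_label
            queue = deque([(i, j)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if (0 <= rr < height and 0 <= cc < width
                            and out[rr][cc] == -1
                            and labels[rr][cc] == source):
                        out[rr][cc] = next_label
                        queue.append((rr, cc))
            next_label += 1
    return out


# Label 0 appears in two disconnected columns and is split into two labels.
relabeled = enforce_connectivity([[0, 1, 0],
                                  [0, 1, 0]])
```

Splitting disconnected pieces changes the number of superpixels, which is one reason a comparison that does not control for connectivity can be skewed.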
Predicting Future Instance Segmentation by Forecasting Convolutional Features
Anticipating future events is an important prerequisite towards intelligent
behavior. Video forecasting has been studied as a proxy task towards this goal.
Recent work has shown that to predict semantic segmentation of future frames,
forecasting at the semantic level is more effective than forecasting RGB frames
and then segmenting these. In this paper we consider the more challenging
problem of future instance segmentation, which additionally segments out
individual objects. To deal with a varying number of output labels per image,
we develop a predictive model in the space of fixed-sized convolutional
features of the Mask R-CNN instance segmentation model. We apply the "detection
head" of Mask R-CNN on the predicted features to produce the instance
segmentation of future frames. Experiments show that this approach
significantly improves over strong baselines based on optical flow and
repurposed instance segmentation architectures.