Visual Affordance and Function Understanding: A Survey
Robots are increasingly prevalent in the manufacturing, entertainment and
healthcare industries. Robot vision aims to equip robots with the ability to
discover information, understand it and interact with the environment. These
capabilities require an agent to effectively understand object affordances and
functionalities in complex visual domains. In this literature survey, we first
focus on visual affordances and summarize the state of the art as well as open
problems and research gaps. Specifically, we discuss sub-problems such as
affordance detection, categorization, segmentation and high-level reasoning.
Furthermore, we cover functional scene understanding and the prevalent
functional descriptors used in the literature. The survey also provides
necessary background to the problem, sheds light on its significance and
highlights the existing challenges for affordance and functionality learning.
Comment: 26 pages, 22 images
A Survey on Content-Aware Video Analysis for Sports
Sports data analysis is becoming increasingly large-scale, diversified, and
shared, but difficulty persists in rapidly accessing the most crucial
information. Previous surveys have focused on the methodologies of sports video
analysis from the spatiotemporal viewpoint instead of a content-based
viewpoint, and few of these studies have considered semantics. This study
develops a deeper interpretation of content-aware sports video analysis by
examining the insight offered by research into the structure of content under
different scenarios. On the basis of this insight, we provide an overview of
the themes particularly relevant to the research on content-aware systems for
broadcast sports. Specifically, we focus on the video content analysis
techniques applied in sportscasts over the past decade from the perspectives of
fundamentals and general review, a content hierarchical model, and trends and
challenges. Content-aware analysis methods are discussed with respect to
object-, event-, and context-oriented groups. In each group, the gap between
sensation and content excitement must be bridged using proper strategies. In
this regard, a content-aware approach is required to determine user demands.
Finally, the paper summarizes the future trends and challenges for sports video
analysis. We believe that our findings can advance the field of research on
content-aware video analysis for broadcast sports.
Comment: Accepted for publication in IEEE Transactions on Circuits and Systems
for Video Technology (TCSVT)
Improving Information Extraction from Images with Learned Semantic Models
Many applications require an understanding of an image that goes beyond the
simple detection and classification of its objects. In particular, a great deal
of semantic information is carried in the relationships between objects. We
have previously shown that the combination of a visual model and a statistical
semantic prior model can improve on the task of mapping images to their
associated scene description. In this paper, we review the model and compare it
to a novel conditional multi-way model for visual relationship detection, which
does not include an explicitly trained visual prior model. We also discuss
potential relationships between the proposed methods and memory models of the
human brain.
Computational models of attention
This chapter reviews recent computational models of visual attention. We
begin with models for the bottom-up or stimulus-driven guidance of attention to
salient visual items, which we examine in seven different broad categories. We
then examine more complex models which address the top-down or goal-oriented
guidance of attention towards items that are more relevant to the task at hand.
Rapid Probabilistic Interest Learning from Domain-Specific Pairwise Image Comparisons
A great deal of work aims to discover large general purpose models of image
interest or memorability for visual search and information retrieval. This
paper argues that image interest is often domain and user specific, and that
efficient mechanisms for learning about this domain-specific image interest as
quickly as possible, while limiting the amount of data-labelling required, are
often more useful to end-users. This work uses pairwise image comparisons to
reduce the labelling burden on these users, and introduces an image interest
estimation approach that performs similarly to recent data-hungry deep learning
approaches trained using pairwise ranking losses. Here, we use a Gaussian
process model to interpolate image interest inferred using a Bayesian ranking
approach over image features extracted using a pre-trained convolutional neural
network. Results show that fitting a Gaussian process in high-dimensional image
feature space is not only computationally feasible, but also effective across a
broad range of domains. The proposed probabilistic interest estimation approach
produces image interests paired with uncertainties that can be used to identify
images for which additional labelling is required and measure inference
convergence, allowing for sample efficient active model training. Importantly,
the probabilistic formulation allows for effective visual search and
information retrieval when limited labelled data is available.
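The pipeline this abstract describes, inferring image interest from pairwise comparisons and interpolating it over image features with a Gaussian process, can be sketched roughly as follows. This is a simplified stand-in, not the paper's method: maximum-likelihood Bradley-Terry ranking replaces the Bayesian ranking step, a plain RBF-kernel GP replaces the paper's model, and all function names are illustrative.

```python
import numpy as np

def bradley_terry_scores(n_items, comparisons, iters=200, lr=0.1):
    """Estimate latent interest scores from (winner, loser) pairs by
    gradient ascent on the Bradley-Terry log-likelihood. A simple
    stand-in for the paper's Bayesian ranking step."""
    s = np.zeros(n_items)
    for _ in range(iters):
        grad = np.zeros(n_items)
        for w, l in comparisons:
            p = 1.0 / (1.0 + np.exp(s[l] - s[w]))  # P(w beats l)
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        s += lr * grad
    return s - s.mean()

def gp_predict(X_train, y_train, X_test, length_scale=1.0, noise=1e-2):
    """RBF-kernel GP regression: interpolates inferred interest over
    image-feature space, returning predictive mean and variance (the
    variance is what flags images that need additional labelling)."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / length_scale ** 2)
    K = rbf(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = rbf(X_test, X_train)
    K_inv = np.linalg.inv(K)
    mean = Ks @ K_inv @ y_train
    var = np.diag(rbf(X_test, X_test) - Ks @ K_inv @ Ks.T)
    return mean, var
```

In practice the rows of `X_train` would be CNN features of labelled images, and test images with high predictive variance would be queried for new pairwise comparisons.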
A Bag of Words Approach for Semantic Segmentation of Monitored Scenes
This paper proposes a semantic segmentation method for outdoor scenes
captured by a surveillance camera. Our algorithm classifies each perceptually
homogenous region as one of the predefined classes learned from a collection of
manually labelled images. The proposed approach combines two different types of
information. First, color segmentation is performed to divide the scene into
perceptually similar regions. Then, the second step is based on SIFT keypoints
and uses the bag of words representation of the regions for the classification.
The prediction is done using a Naïve Bayesian Network as a generative
classifier. Compared to existing techniques, our method provides more compact
representations of scene contents and the segmentation result is more
consistent with human perception due to the combination of the color
information with the image keypoints. The experiments conducted on a publicly
available data set demonstrate the validity of the proposed method.
Comment: École Polytechnique de Montréal, iWatchLife In
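The two-step pipeline this abstract describes, visual-word histograms over a region's keypoints classified by a generative Naive Bayes model, can be sketched as follows. This is an illustrative sketch only: random 2-D vectors stand in for SIFT descriptors, a small k-means with farthest-point initialization builds the vocabulary, and all names are hypothetical.

```python
import numpy as np

def build_vocabulary(descriptors, k=8, iters=20):
    """k-means over local descriptors (SIFT in the paper) to form the
    visual-word vocabulary; farthest-point init, then Lloyd steps."""
    centers = [descriptors[0]]
    for _ in range(k - 1):
        d = ((descriptors[:, None, :] - np.array(centers)[None]) ** 2).sum(-1).min(1)
        centers.append(descriptors[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = ((descriptors[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            pts = descriptors[labels == j]
            if len(pts):
                centers[j] = pts.mean(0)
    return centers

def bow_histogram(descriptors, centers):
    """Bag-of-words representation of one region: count how many of its
    descriptors fall on each visual word."""
    words = ((descriptors[:, None, :] - centers[None]) ** 2).sum(-1).argmin(1)
    return np.bincount(words, minlength=len(centers)).astype(float)

class MultinomialNB:
    """Generative classifier over word counts (Laplace-smoothed), in the
    spirit of the paper's Naive Bayesian step."""
    def fit(self, H, y):
        self.classes = np.unique(y)
        self.log_prior = np.log(np.array([(y == c).mean() for c in self.classes]))
        counts = np.array([H[y == c].sum(0) + 1.0 for c in self.classes])
        self.log_lik = np.log(counts / counts.sum(1, keepdims=True))
        return self
    def predict(self, H):
        scores = H @ self.log_lik.T + self.log_prior
        return self.classes[scores.argmax(1)]
```

Each color-segmented region would contribute one histogram, and the Naive Bayes posterior assigns it to one of the manually labelled scene classes.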
Computational models: Bottom-up and top-down aspects
Computational models of visual attention have become popular over the past
decade, we believe primarily for two reasons: first, models make testable
predictions that can be explored by experimentalists as well as theoreticians;
second, models have practical and technological applications of interest to the
applied science and engineering communities. In this chapter, we take a
critical look at recent attention modeling efforts. We focus on {\em
computational models of attention} as defined by Tsotsos \& Rothenstein
\shortcite{Tsotsos_Rothenstein11}: Models which can process any visual stimulus
(typically, an image or video clip), which can possibly also be given some task
definition, and which make predictions that can be compared to human or animal
behavioral or physiological responses elicited by the same stimulus and task.
Thus, we here place less emphasis on abstract models, phenomenological models,
purely data-driven fitting or extrapolation models, or models specifically
designed for a single task or for a restricted class of stimuli. For
theoretical models, we refer the reader to a number of previous reviews that
address attention theories and models more generally
\cite{Itti_Koch01nrn,Paletta_etal05,Frintrop_etal10,Rothenstein_Tsotsos08,Gottlieb_Balan10,Toet11,Borji_Itti12pami}.
Recent Advances in Zero-shot Recognition
With the recent renaissance of deep convolutional neural networks, encouraging
breakthroughs have been achieved on supervised recognition tasks, where each
class has sufficient, fully annotated training data. However, scaling
recognition to a large number of classes with few or no training samples per
class remains an unsolved problem. One approach to scaling up recognition is to
develop models capable of recognizing unseen categories without any training
instances, i.e., zero-shot recognition/learning.
This article provides a comprehensive review of existing zero-shot recognition
techniques, covering aspects ranging from model representations to datasets and
evaluation settings. We also overview related recognition tasks, including
one-shot and open-set recognition, which can serve as natural extensions of
zero-shot recognition when a limited number of class samples becomes available
or when zero-shot recognition is deployed in a real-world setting.
Importantly, we highlight the limitations of existing approaches and point out
future research directions in this exciting new research area.
Comment: accepted by IEEE Signal Processing Magazine
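The core idea behind much of the zero-shot recognition this survey reviews, comparing an image's predicted attribute vector against attribute descriptions of unseen classes, can be illustrated in a few lines. This is a generic sketch, not any specific method from the survey: it assumes image embeddings already live in the attribute space, and the attribute values are invented for illustration.

```python
import numpy as np

def zero_shot_classify(img_attrs, class_attrs):
    """Pick the unseen class whose attribute vector is most similar
    (by cosine similarity) to the attributes predicted for the image.
    No training instances of these classes are ever seen."""
    names = list(class_attrs)
    A = np.array([class_attrs[n] for n in names], dtype=float)
    A /= np.linalg.norm(A, axis=1, keepdims=True)
    x = np.asarray(img_attrs, dtype=float)
    x /= np.linalg.norm(x)
    return names[int((A @ x).argmax())]
```

A real system would predict `img_attrs` with a network trained on seen classes; the attribute descriptions of unseen classes act as the bridge that makes recognition without training examples possible.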
Modeling and Inferring Human Intents and Latent Functional Objects for Trajectory Prediction
This paper is about detecting functional objects and inferring human
intentions in surveillance videos of public spaces. People in the videos are
expected to intentionally take the shortest paths, subject to obstacles, toward
functional objects where they can satisfy certain needs (e.g., a vending
machine can quench thirst), by following one of three possible intent behaviors: reach
a single functional object and stop, or sequentially visit several functional
objects, or initially start moving toward one goal but then change the intent
to move toward another. Since detecting functional objects in low-resolution
surveillance videos is typically unreliable, we call them "dark matter"
characterized by the functionality to attract people. We formulate the
Agent-based Lagrangian Mechanics wherein human trajectories are
probabilistically modeled as motions of agents in many layers of "dark-energy"
fields, where each agent can select a particular force field to affect its
motions, and thus define the minimum-energy Dijkstra path toward the
corresponding source "dark matter". For evaluation, we compiled and annotated a
new dataset. The results demonstrate our effectiveness in predicting human
intent behaviors and trajectories, and localizing functional objects, as well
as discovering distinct functional classes of objects by clustering human
motion behavior in the vicinity of functional objects.
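The "minimum-energy Dijkstra path" mentioned in this abstract can be made concrete with a small grid-world sketch. This is purely illustrative, not the paper's dark-energy field formulation: a 2-D grid of traversal costs stands in for the force field, with obstacles modeled as high-cost cells and the "dark matter" source as the goal.

```python
import heapq

def min_energy_path(cost, start, goal):
    """Dijkstra over a 2-D grid of per-cell traversal costs: returns the
    minimum-energy path from start to the goal ("dark matter" source)
    and its total cost. Obstacles are simply high-cost cells."""
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, (r, c) = heapq.heappop(pq)
        if (r, c) == goal:
            break
        if d > dist.get((r, c), float("inf")):
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr][nc]  # energy to enter the neighbor cell
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(pq, (nd, (nr, nc)))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    path.append(start)
    return path[::-1], dist[goal]
```

In the paper's setting, each hypothesized goal defines its own cost field, and the observed trajectory is scored against the minimum-energy path toward each candidate source.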
Vulnerable road user detection: state-of-the-art and open challenges
Correctly identifying vulnerable road users (VRUs), e.g. cyclists and
pedestrians, remains one of the most challenging environment perception tasks
for autonomous vehicles (AVs). This work surveys the current state-of-the-art
in VRU detection, covering topics such as benchmarks and datasets, object
detection techniques and relevant machine learning algorithms. The article
concludes with a discussion of remaining open challenges and promising future
research directions for this domain.