Deep Edge-Aware Saliency Detection
There has been profound progress in visual saliency thanks to deep learning
architectures; however, three major challenges still hinder detection
performance for scenes with complex compositions, multiple salient objects, and
salient objects of diverse scales. In particular, the output maps of existing
methods remain low in spatial resolution, causing blurred edges due to stride
and pooling operations; networks often neglect descriptive statistical and
handcrafted priors that have the potential to complement saliency detection
results; and deep features at different layers remain largely unexploited,
waiting to be effectively fused to handle multi-scale salient objects. In this
paper, we tackle these issues with a new fully
convolutional neural network that jointly learns salient edges and saliency
labels in an end-to-end fashion. Our framework first employs convolutional
layers that reformulate the detection task as a dense labeling problem, then
integrates handcrafted saliency features in a hierarchical manner into lower
and higher levels of the deep network to leverage available information for
multi-scale response, and finally refines the saliency map through dilated
convolutions by imposing context. In this way, the salient edge priors are
efficiently incorporated and the output resolution is significantly improved
while keeping the memory requirements low, leading to cleaner and sharper
object boundaries. Extensive experimental analyses on ten benchmarks
demonstrate that our framework achieves consistently superior performance and
attains robustness for complex scenes in comparison to the very recent
state-of-the-art approaches.
Comment: 13 pages, 11 figures
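The dilated-convolution refinement this abstract mentions (context imposed without further downsampling) can be illustrated with a minimal NumPy sketch; the function name, shapes, and zero-padding choice below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def dilated_conv2d(x, kernel, dilation=1):
    """2-D convolution with a dilated kernel and zero padding ('same' output size).

    Dilation spaces the kernel taps apart, so the receptive field grows
    without adding parameters or reducing spatial resolution.
    """
    kh, kw = kernel.shape
    # Effective footprint of the dilated kernel
    eh, ew = (kh - 1) * dilation + 1, (kw - 1) * dilation + 1
    ph, pw = eh // 2, ew // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x, dtype=float)
    for i in range(kh):
        for j in range(kw):
            di, dj = i * dilation, j * dilation
            out += kernel[i, j] * xp[di:di + x.shape[0], dj:dj + x.shape[1]]
    return out
```

With `dilation=2`, a 3×3 kernel covers a 5×5 neighborhood at the same parameter cost, which is how context can widen while the output stays at full resolution.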
Salient Object Detection with Semantic Priors
Salient object detection has increasingly become a popular topic in cognitive
and computational sciences, including computer vision and artificial
intelligence research. In this paper, we propose integrating \textit{semantic
priors} into the salient object detection process. Our algorithm consists of
three basic steps. Firstly, the explicit saliency map is obtained based on the
semantic segmentation refined by the explicit saliency priors learned from the
data. Next, the implicit saliency map is computed by a trained model that maps
the implicit saliency priors embedded in regional features to saliency values.
Finally, the explicit saliency map and the implicit map are adaptively fused to
form a pixel-accurate saliency map that uniformly
covers the objects of interest. We further evaluate the proposed framework on
two challenging datasets, namely ECSSD and HKU-IS. Extensive experimental
results demonstrate that our method outperforms other state-of-the-art methods.
Comment: accepted to IJCAI 201
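The adaptive fusion of the explicit and implicit maps can be illustrated with a simple per-pixel weighting rule; weighting each map by its distance from the uncertain value 0.5 is an assumption made for this sketch, not the paper's actual fusion scheme:

```python
import numpy as np

def fuse_maps(explicit, implicit, eps=1e-8):
    """Adaptively fuse two saliency maps in [0, 1].

    Each map gets a per-pixel weight proportional to how decisive it is
    (how far its value lies from the ambiguous midpoint 0.5).
    """
    we = np.abs(explicit - 0.5) + eps
    wi = np.abs(implicit - 0.5) + eps
    fused = (we * explicit + wi * implicit) / (we + wi)
    return np.clip(fused, 0.0, 1.0)
```

Where one map is confident and the other is ambiguous, the confident map dominates; where both agree, the value passes through unchanged.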
LCNN: Low-level Feature Embedded CNN for Salient Object Detection
In this paper, we propose a novel deep neural network framework embedded with
low-level features (LCNN) for salient object detection in complex images. We
utilise the advantage of convolutional neural networks to automatically learn
the high-level features that capture the structured information and semantic
context in the image. In order to better adapt a CNN model into the saliency
task, we redesign the network architecture based on the small-scale datasets.
Several low-level features that effectively capture contrast and spatial
information in salient regions are extracted and incorporated to complement the
learned high-level features at the output of the last fully connected layer.
The concatenated feature vector is then fed into a hinge-loss SVM detector in a
joint discriminative learning manner, and the final saliency score of each
region within the bounding box is obtained as a linear combination of the
detector's weights. Experiments on three challenging
benchmarks (MSRA-5000, PASCAL-S, ECSSD) demonstrate that our algorithm is
effective and superior to most low-level-oriented state-of-the-art methods in
terms of P-R curves, F-measure, and mean absolute error.
Bootstrapping Robotic Ecological Perception from a Limited Set of Hypotheses Through Interactive Perception
To solve its task, a robot needs to have the ability to interpret its
perceptions. In vision, this interpretation is particularly difficult and
relies on the understanding of the structure of the scene, at least to the
extent of its task and sensorimotor abilities. A robot with the ability to
build and adapt this interpretation process according to its own tasks and
capabilities would push back the limits of what robots can achieve in an
uncontrolled environment. A solution is to provide the robot with processes to
build such representations that are not specific to an environment or a
situation. Many works focus on object segmentation, recognition, and
manipulation. Defining an object solely on the basis of its visual appearance
is challenging given the wide range of possible objects and environments.
Therefore, current works make simplifying assumptions about the structure of a
scene. Such assumptions reduce the adaptivity of the object extraction process
to the environments in which the assumption holds. To limit such assumptions,
we introduce an exploration method aimed at identifying moveable elements in a
scene without considering the concept of object. By using the interactive
perception framework, we aim at bootstrapping the acquisition process of a
representation of the environment with a minimum of context specific
assumptions. The robotic system builds a perceptual map called relevance map
which indicates the moveable parts of the current scene. A classifier is
trained online to predict the category of each region (moveable or
non-moveable). It is also used to select a region with which to interact, with
the goal of minimizing the uncertainty of the classification. A specific
classifier is introduced to fit these needs: the collaborative mixture models
classifier. The method is tested on a set of scenarios of increasing
complexity, using both simulations and a PR2 robot.
Comment: 21 pages, 21 figures
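The uncertainty-driven selection of a region to interact with can be sketched as follows. The toy online centroid classifier below merely stands in for the paper's collaborative mixture models; all names and the distance-ratio probability are illustrative assumptions:

```python
import numpy as np

class OnlineCentroidClassifier:
    """Toy online classifier: one centroid per class (0 = non-moveable,
    1 = moveable), updated incrementally after each interaction."""
    def __init__(self, dim):
        self.centroids = {0: np.zeros(dim), 1: np.zeros(dim)}
        self.counts = {0: 0, 1: 0}

    def update(self, x, label):
        # Incremental mean update for the touched class
        self.counts[label] += 1
        c = self.centroids[label]
        self.centroids[label] = c + (x - c) / self.counts[label]

    def prob_moveable(self, x):
        if min(self.counts.values()) == 0:
            return 0.5                        # no evidence yet: fully uncertain
        d0 = np.linalg.norm(x - self.centroids[0])
        d1 = np.linalg.norm(x - self.centroids[1])
        return d0 / (d0 + d1 + 1e-12)         # nearer class-1 centroid -> higher

def pick_region_to_probe(clf, feats):
    """Probe the region whose predicted probability is closest to 0.5,
    i.e. the one whose outcome is most informative for the classifier."""
    probs = [clf.prob_moveable(f) for f in feats]
    return int(np.argmin(np.abs(np.asarray(probs) - 0.5)))
```

Interacting where the classifier is least certain is a standard active-learning heuristic and matches the stated goal of minimizing classification uncertainty.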
Likelihood-based Parameter Estimation and Comparison of Dynamical Cognitive Models
Dynamical models of cognition play an increasingly important role in driving
theoretical and experimental research in psychology. Therefore, parameter
estimation, model analysis and comparison of dynamical models are of essential
importance. Here we propose a maximum-likelihood approach for model analysis in
a fully dynamical framework that includes time-ordered experimental data. Our
methods can be applied to dynamical models for the prediction of discrete
behavior (e.g., movement onsets), in particular, we use a dynamical model of
saccade generation in scene viewing as a case study for our approach. For this
model, the likelihood function can be computed directly by numerical
simulation, which enables more efficient parameter estimation including
Bayesian inference to obtain reliable estimates and corresponding credible
intervals. Using hierarchical models, inference is possible even for
individual observers. Furthermore, our likelihood approach can be used to compare
different models. In our example, the dynamical framework is shown to
outperform non-dynamical statistical models. Additionally, the likelihood-based
evaluation differentiates between model variants that produced indistinguishable
predictions under previously used statistics. Our results indicate that the
likelihood approach is a promising framework for dynamical cognitive models.
Comment: 29 pages, 10 figures, to appear in Psychological Review as a
theoretical note
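Computing a likelihood "directly by numerical simulation" can be sketched generically for models that predict discrete outcomes: forward-simulate the model many times, turn outcome frequencies into probabilities, and evaluate the observed data under them. The Laplace smoothing and the coin-flip example are illustrative assumptions, not the saccade model of the paper:

```python
import numpy as np

def simulated_log_likelihood(simulate, theta, observed, n_sims=5000, seed=0):
    """Estimate the log-likelihood of observed discrete data by simulation.

    simulate(theta, rng) must return one discrete outcome; outcome
    frequencies across n_sims runs approximate the model's probabilities
    (with add-one smoothing so unseen outcomes keep nonzero mass).
    """
    rng = np.random.default_rng(seed)
    sims = np.array([simulate(theta, rng) for _ in range(n_sims)])
    values, counts = np.unique(sims, return_counts=True)
    freq = dict(zip(values, counts))
    k = len(values) + 1                              # smoothing denominator
    ll = 0.0
    for obs in observed:
        p = (freq.get(obs, 0) + 1) / (n_sims + k)    # Laplace-smoothed prob.
        ll += np.log(p)
    return ll
```

With a likelihood in hand, standard machinery (maximum-likelihood fitting, Bayesian inference, model comparison) applies directly, which is the point the abstract makes.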
Joint Reasoning for Multi-Faceted Commonsense Knowledge
Commonsense knowledge (CSK) supports a variety of AI applications, from
visual understanding to chatbots. Prior works on acquiring CSK, such as
ConceptNet, have compiled statements that associate concepts, like everyday
objects or activities, with properties that hold for most or some instances of
the concept. Each concept is treated in isolation from other concepts, and the
only quantitative measure (or ranking) of properties is a confidence score that
the statement is valid. This paper aims to overcome these limitations by
introducing a multi-faceted model of CSK statements and methods for joint
reasoning over sets of inter-related statements. Our model captures four
different dimensions of CSK statements: plausibility, typicality, remarkability
and salience, with scoring and ranking along each dimension. For example,
hyenas drinking water is typical but not salient, whereas hyenas eating
carcasses is salient. For reasoning and ranking, we develop a method with soft
constraints to couple the inference over concepts that are related in a
taxonomic hierarchy. The reasoning is cast as an integer linear program
(ILP), and we leverage the theory of reduced costs of a relaxed LP to compute
informative rankings. This methodology is applied to several large CSK
collections. Our evaluation shows that we can consolidate these inputs into
much cleaner and more expressive knowledge. Results are available at
https://dice.mpi-inf.mpg.de.
Comment: 11 pages
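One plausible taxonomic coupling rule (a child concept's plausibility should not exceed its parent's, since what holds for hyenas holds for some animals) can be illustrated with a crude soft repair. This is a hypothetical stand-in for the paper's ILP, whose actual constraints the abstract does not spell out:

```python
def soften_taxonomy(scores, parent_of, alpha=0.5):
    """Soft-constraint repair: if a child's plausibility exceeds its
    parent's, move both scores toward each other instead of hard-clipping.

    scores: {concept: plausibility}; parent_of: {child: parent};
    alpha controls how much of the violation the parent absorbs.
    """
    adjusted = dict(scores)
    for child, parent in parent_of.items():
        if parent in adjusted and adjusted[child] > adjusted[parent]:
            gap = adjusted[child] - adjusted[parent]
            adjusted[parent] += alpha * gap        # raise the parent
            adjusted[child] -= (1 - alpha) * gap   # lower the child
    return adjusted
```

A soft repair like this preserves the ranking signal in both scores, whereas a hard constraint would discard one of them; the paper's ILP generalizes this idea across many interrelated statements and dimensions.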
A Classifier-guided Approach for Top-down Salient Object Detection
We propose a framework for top-down salient object detection that
incorporates a tightly coupled image classification module. The classifier is
trained on novel category-aware sparse codes computed on object dictionaries
used for saliency modeling. A misclassification indicates that the
corresponding saliency model is inaccurate. Hence, the classifier selects
images for which the saliency models need to be updated. The category-aware
sparse coding produces better image classification accuracy as compared to
conventional sparse coding with a reduced computational complexity. A
saliency-weighted max-pooling is proposed to improve image classification,
which is further used to refine the saliency maps. Experimental results on
Graz-02 and PASCAL VOC-07 datasets demonstrate the effectiveness of salient
object detection. Although the role of the classifier is to support salient
object detection, we evaluate its performance in image classification and also
illustrate the utility of thresholded saliency maps for image segmentation.
Comment: To appear in Signal Processing: Image Communication, Elsevier.
Available online from April 201
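The saliency-weighted max-pooling step admits a very short sketch; the shapes and names below are assumptions for illustration:

```python
import numpy as np

def saliency_weighted_max_pool(features, saliency):
    """Pool local descriptors into one image-level vector.

    Each location's features are scaled by its saliency before the
    per-dimension maximum, so salient regions dominate the pooled code.
    features: (n_locations, d); saliency: (n_locations,) in [0, 1].
    """
    weighted = features * saliency[:, None]
    return weighted.max(axis=0)
```

Compared with plain max-pooling, background locations with strong but irrelevant activations are suppressed, which is why the pooled code helps classification.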
Global and Local Sensitivity Guided Key Salient Object Re-augmentation for Video Saliency Detection
Existing deep-learning-based saliency methods for still images do not consider
the weighting and highlighting of features extracted from different layers; all
features contribute equally to the final saliency decision. Such methods evenly
detect all "potentially significant regions" and are unable to highlight the
key salient object, resulting in detection failures in dynamic scenes. In this
paper, based on the fact that salient areas in videos
are relatively small and concentrated, we propose a \textbf{key salient object
re-augmentation method (KSORA) using top-down semantic knowledge and bottom-up
feature guidance} to improve detection accuracy in video scenes. KSORA includes
two sub-modules (WFE and KOS): WFE processes local salient feature selection
using bottom-up strategy, while KOS ranks each object in global fashion by
top-down statistical knowledge, and chooses the most critical object area for
local enhancement. The proposed KSORA can not only strengthen the saliency
value of the local key salient object but also ensure global saliency
consistency. Results on three benchmark datasets suggest that our model
improves detection accuracy on complex scenes. The strong performance of KSORA,
at a speed of 17 FPS on modern GPUs, has been verified by comparisons with ten
other state-of-the-art algorithms.
Comment: 6 figures, 10 pages
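The "local enhancement of the most critical object area" can be sketched as boosting the saliency values inside the top-ranked object box and renormalizing; the boost factor and the normalization choice are illustrative assumptions, not KSORA's actual enhancement:

```python
import numpy as np

def reaugment_key_object(saliency, boxes, object_scores, boost=1.5):
    """Amplify saliency inside the highest-ranked object box.

    saliency: (H, W) map; boxes: list of (y0, x0, y1, x1);
    object_scores: one ranking score per box (e.g., from top-down knowledge).
    """
    k = int(np.argmax(object_scores))          # most critical object
    y0, x0, y1, x1 = boxes[k]
    out = saliency.astype(float).copy()
    out[y0:y1, x0:x1] *= boost                 # local enhancement
    return out / max(out.max(), 1e-12)         # keep values in [0, 1]
```

Renormalizing after the boost means the rest of the map is relatively suppressed, so a single key object dominates, consistent with the observation that salient areas in videos are small and concentrated.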
Saliency detection by aggregating complementary background template with optimization framework
This paper proposes an unsupervised bottom-up saliency detection approach that
aggregates a complementary background template with refinement. Feature vectors
are extracted from each superpixel to cover regional color, contrast and
texture information. By using these features, a coarse detection of the salient
region is realized based on a background template obtained from different
combinations of boundary regions, instead of treating only the four boundaries
as background. Then, by ranking the relevance of the image nodes against
foreground cues extracted from the former saliency map, we obtain an improved
result. Finally, a smoothing operation is applied to refine the foreground-based
saliency map and improve the contrast between salient and non-salient regions
until a close-to-binary saliency map is reached. Experimental results show that
the proposed algorithm generates more accurate saliency maps and performs
favorably against state-of-the-art saliency detection methods on four
publicly available datasets.
Comment: 28 pages, 10 figures
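The background-template idea can be illustrated in a few lines: saliency as the distance of each pixel's feature from the mean feature of the image boundary. Using a single raw boundary-pixel mean (rather than superpixels and multiple boundary combinations, as the paper does) is a deliberate simplification:

```python
import numpy as np

def boundary_template_saliency(feat_map):
    """Coarse saliency from a background template.

    The image boundary is assumed to be mostly background; each pixel's
    saliency is its feature distance to the boundary mean, normalized.
    feat_map: (H, W, d) per-pixel features (e.g., color).
    """
    border = np.concatenate([feat_map[0], feat_map[-1],
                             feat_map[:, 0], feat_map[:, -1]])
    template = border.mean(axis=0)
    dist = np.linalg.norm(feat_map - template, axis=-1)
    return dist / max(dist.max(), 1e-12)
```

The paper's use of several boundary combinations guards against the failure mode of this sketch: a salient object touching one image border would contaminate a single pooled template.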
Computational models: Bottom-up and top-down aspects
Computational models of visual attention have become popular over the past
decade, we believe primarily for two reasons: first, models make testable
predictions that can be explored by experimentalists as well as theoreticians;
second, models have practical and technological applications of interest to the
applied science and engineering communities. In this chapter, we take a
critical look at recent attention modeling efforts. We focus on {\em
computational models of attention} as defined by Tsotsos \& Rothenstein
\shortcite{Tsotsos_Rothenstein11}: Models which can process any visual stimulus
(typically, an image or video clip), which can possibly also be given some task
definition, and which make predictions that can be compared to human or animal
behavioral or physiological responses elicited by the same stimulus and task.
Thus, we here place less emphasis on abstract models, phenomenological models,
purely data-driven fitting or extrapolation models, or models specifically
designed for a single task or for a restricted class of stimuli. For
theoretical models, we refer the reader to a number of previous reviews that
address attention theories and models more generally
\cite{Itti_Koch01nrn,Paletta_etal05,Frintrop_etal10,Rothenstein_Tsotsos08,Gottlieb_Balan10,Toet11,Borji_Itti12pami}
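A canonical bottom-up ingredient of the computational models discussed here is the center-surround operation. The box-blur version below is a simplified stand-in for the Gaussian-pyramid differences used in classic implementations; radii and names are illustrative:

```python
import numpy as np

def center_surround(intensity, c=1, s=3):
    """Bottom-up conspicuity map: |center - surround|.

    Center and surround are box-blurred versions of the intensity image
    at a fine radius c and a coarse radius s; locally distinctive regions
    (which differ from their surround) produce large responses.
    """
    def box_blur(img, r):
        p = np.pad(img, r, mode='edge')
        out = np.zeros_like(img, dtype=float)
        n = (2 * r + 1) ** 2
        for dy in range(2 * r + 1):
            for dx in range(2 * r + 1):
                out += p[dy:dy + img.shape[0], dx:dx + img.shape[1]]
        return out / n
    return np.abs(box_blur(intensity, c) - box_blur(intensity, s))
```

A uniform image yields zero everywhere, while an isolated bright spot responds strongly, which is the hallmark of the contrast-based saliency these models compute before any task definition is added.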