3,025 research outputs found
Disambiguating Multi–Modal Scene Representations Using Perceptual Grouping Constraints
In its early stages, the visual system suffers from a lot of ambiguity and noise that severely limits the performance of early vision algorithms. This article presents feedback mechanisms between early visual processes, such as perceptual grouping, stereopsis and depth reconstruction, that allow the system to reduce this ambiguity and improve early representation of visual information. In the first part, the article proposes a local perceptual grouping algorithm that — in addition to commonly used geometric information — makes use of a novel multi–modal measure between local edge/line features. The grouping information is then used to: 1) disambiguate stereopsis by enforcing that stereo matches preserve groups; and 2) correct the reconstruction error due to the image pixel sampling using a linear interpolation over the groups. The integration of mutual feedback between early vision processes is shown to reduce considerably ambiguity and noise without the need for global constraints
Children, Humanoid Robots and Caregivers
This paper presents developmental learning on a humanoid robot from human-robot interactions. We consider in particular teaching humanoids as children during the child's Separation and Individuation developmental phase (Mahler, 1979). Cognitive development during this phase is characterized both by the child's dependence on her mother for learning while becoming awareness of her own individuality, and by self-exploration of her physical surroundings. We propose a learning framework for a humanoid robot inspired on such cognitive development
Deep Learning for Free-Hand Sketch: A Survey
Free-hand sketches are highly illustrative, and have been widely used by
humans to depict objects or stories from ancient times to the present. The
recent prevalence of touchscreen devices has made sketch creation a much easier
task than ever and consequently made sketch-oriented applications increasingly
popular. The progress of deep learning has immensely benefited free-hand sketch
research and applications. This paper presents a comprehensive survey of the
deep learning techniques oriented at free-hand sketch data, and the
applications that they enable. The main contents of this survey include: (i) A
discussion of the intrinsic traits and unique challenges of free-hand sketch,
to highlight the essential differences between sketch data and other data
modalities, e.g., natural photos. (ii) A review of the developments of
free-hand sketch research in the deep learning era, by surveying existing
datasets, research topics, and the state-of-the-art methods through a detailed
taxonomy and experimental evaluation. (iii) Promotion of future work via a
discussion of bottlenecks, open problems, and potential research directions for
the community.Comment: This paper is accepted by IEEE TPAM
Cross-Paced Representation Learning with Partial Curricula for Sketch-based Image Retrieval
In this paper we address the problem of learning robust cross-domain
representations for sketch-based image retrieval (SBIR). While most SBIR
approaches focus on extracting low- and mid-level descriptors for direct
feature matching, recent works have shown the benefit of learning coupled
feature representations to describe data from two related sources. However,
cross-domain representation learning methods are typically cast into non-convex
minimization problems that are difficult to optimize, leading to unsatisfactory
performance. Inspired by self-paced learning, a learning methodology designed
to overcome convergence issues related to local optima by exploiting the
samples in a meaningful order (i.e. easy to hard), we introduce the cross-paced
partial curriculum learning (CPPCL) framework. Compared with existing
self-paced learning methods which only consider a single modality and cannot
deal with prior knowledge, CPPCL is specifically designed to assess the
learning pace by jointly handling data from dual sources and modality-specific
prior information provided in the form of partial curricula. Additionally,
thanks to the learned dictionaries, we demonstrate that the proposed CPPCL
embeds robust coupled representations for SBIR. Our approach is extensively
evaluated on four publicly available datasets (i.e. CUFS, Flickr15K, QueenMary
SBIR and TU-Berlin Extension datasets), showing superior performance over
competing SBIR methods
Learning to Generate and Refine Object Proposals
Visual object recognition is a fundamental and challenging
problem in computer vision. To build a practical recognition
system, one is first confronted with high computation complexity
due to an enormous search space from an image, which is caused by
large variations in object appearance, pose and mutual occlusion,
as well as other environmental factors. To reduce the search
complexity, a moderate set of image regions that are likely to
contain an object, regardless of its category, are usually first
generated in modern object recognition subsystems. These possible
object regions are called object proposals, object hypotheses or
object candidates, which can be used for down-stream
classification or global reasoning in many different vision tasks
like object detection, segmentation and tracking, etc.
This thesis addresses the problem of object proposal generation,
including bounding box and segment proposal generation, in
real-world scenarios. In particular, we investigate the
representation learning in object proposal generation with 3D
cues and contextual information, aiming to propose higher-quality
object candidates which have higher object recall, better
boundary coverage and lower number. We focus on three main
issues: 1) how can we incorporate additional geometric and
high-level semantic context information into the proposal
generation for stereo images? 2) how do we generate object
segment proposals for stereo images with learning representations
and learning grouping process? and 3) how can we learn a
context-driven representation to refine segment proposals
efficiently?
In this thesis, we propose a series of solutions to address each
of the raised problems. We first propose a semantic context and
depth-aware object proposal generation method. We design a set of
new cues to encode the objectness, and then train an efficient
random forest classifier to re-rank the initial proposals and
linear regressors to fine-tune their locations. Next, we extend
the task to the segment proposal generation in the same setting
and develop a learning-based segment proposal generation method
for stereo images. Our method makes use of learned deep features
and designed geometric features to represent a region and learns
a similarity network to guide the superpixel grouping process. We
also learn a ranking network to predict the objectness score for
each segment proposal. To address the third problem, we take a
transformation-based approach to improve the quality of a given
segment candidate pool based on context information. We propose
an efficient deep network that learns affine transformations to
warp an initial object mask towards nearby object region, based
on a novel feature pooling strategy. Finally, we extend our
affine warping approach to address the object-mask alignment
problem and particularly the problem of refining a set of segment
proposals. We design an end-to-end deep spatial transformer
network that learns free-form deformations (FFDs) to non-rigidly
warp the shape mask towards the ground truth, based on a
multi-level dual mask feature pooling strategy. We evaluate all
our approaches on several publicly available object recognition
datasets and show superior performance
Anomaly Detection in Autonomous Driving: A Survey
Nowadays, there are outstanding strides towards a future with autonomous
vehicles on our roads. While the perception of autonomous vehicles performs
well under closed-set conditions, they still struggle to handle the unexpected.
This survey provides an extensive overview of anomaly detection techniques
based on camera, lidar, radar, multimodal and abstract object level data. We
provide a systematization including detection approach, corner case level,
ability for an online application, and further attributes. We outline the
state-of-the-art and point out current research gaps.Comment: Daniel Bogdoll and Maximilian Nitsche contributed equally. Accepted
for publication at CVPR 2022 WAD worksho
Data-Driven Shape Analysis and Processing
Data-driven methods play an increasingly important role in discovering
geometric, structural, and semantic relationships between 3D shapes in
collections, and applying this analysis to support intelligent modeling,
editing, and visualization of geometric data. In contrast to traditional
approaches, a key feature of data-driven approaches is that they aggregate
information from a collection of shapes to improve the analysis and processing
of individual shapes. In addition, they are able to learn models that reason
about properties and relationships of shapes without relying on hard-coded
rules or explicitly programmed instructions. We provide an overview of the main
concepts and components of these techniques, and discuss their application to
shape classification, segmentation, matching, reconstruction, modeling and
exploration, as well as scene analysis and synthesis, through reviewing the
literature and relating the existing works with both qualitative and numerical
comparisons. We conclude our report with ideas that can inspire future research
in data-driven shape analysis and processing.Comment: 10 pages, 19 figure
- …