Semantic Visual Localization
Robust visual localization under a wide range of viewing conditions is a
fundamental problem in computer vision. Handling the difficult cases of this
problem is not only very challenging but also of high practical relevance,
e.g., in the context of life-long localization for augmented reality or
autonomous robots. In this paper, we propose a novel approach based on a joint
3D geometric and semantic understanding of the world, enabling it to succeed
under conditions where previous approaches failed. Our method leverages a novel
generative model for descriptor learning, trained on semantic scene completion
as an auxiliary task. The resulting 3D descriptors are robust to missing
observations by encoding high-level 3D geometric and semantic information.
Experiments on several challenging large-scale localization datasets
demonstrate reliable localization under extreme viewpoint, illumination, and
geometry changes.
Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
The most common paradigm for vision-based multi-object tracking is
tracking-by-detection, due to the availability of reliable detectors for
several important object categories such as cars and pedestrians. However,
future mobile systems will need a capability to cope with rich human-made
environments, in which obtaining detectors for every possible object category
would be infeasible. In this paper, we propose a model-free multi-object
tracking approach that uses a category-agnostic image segmentation method to
track objects. We present an efficient segmentation mask-based tracker which
associates pixel-precise masks reported by the segmentation. Our approach can
utilize semantic information whenever it is available for classifying objects
at the track level, while retaining the capability to track generic unknown
objects in the absence of such information. We demonstrate experimentally that
our approach achieves performance comparable to state-of-the-art
tracking-by-detection methods for popular object categories such as cars and
pedestrians. Additionally, we show that the proposed method can discover and
robustly track a large variety of other objects.
Comment: ICRA'18 submission
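The abstract says the tracker associates pixel-precise masks across frames but does not say how. As a minimal sketch, assuming a greedy match on mask intersection-over-union (IoU), the association step might look like the following. The function names, the set-of-pixels mask representation, and the `min_iou` threshold are all hypothetical and are not taken from the paper.

```python
def rect(r0, c0, r1, c1):
    """Toy mask: the set of (row, col) pixels in a half-open rectangle."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def mask_iou(a, b):
    """Intersection-over-union of two masks given as sets of pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def associate_masks(tracks, detections, min_iou=0.3):
    """Greedily match track masks to new masks in order of descending IoU.

    tracks: dict mapping track id -> mask from the previous frame.
    detections: list of masks from the current frame.
    Returns (matches, unmatched): matches maps track id -> detection index,
    unmatched lists detection indices that could start new tracks.
    The IoU criterion and threshold are assumptions for illustration.
    """
    candidates = sorted(
        ((mask_iou(t_mask, d_mask), t_id, d_idx)
         for t_id, t_mask in tracks.items()
         for d_idx, d_mask in enumerate(detections)),
        key=lambda c: c[0],
        reverse=True,
    )
    matches, used = {}, set()
    for iou, t_id, d_idx in candidates:
        if iou < min_iou:
            break  # all remaining candidates overlap even less
        if t_id in matches or d_idx in used:
            continue  # each track and each mask is matched at most once
        matches[t_id] = d_idx
        used.add(d_idx)
    unmatched = [i for i in range(len(detections)) if i not in used]
    return matches, unmatched


tracks = {1: rect(0, 0, 4, 4), 2: rect(10, 10, 14, 14)}
detections = [rect(10, 11, 14, 15), rect(0, 1, 4, 5), rect(20, 20, 22, 22)]
matches, unmatched = associate_masks(tracks, detections)
print(matches, unmatched)
```

Because the matching uses only mask overlap, it needs no category label, which is in the spirit of the category-agnostic tracking the abstract describes; an optimal assignment (for example the Hungarian method) could replace the greedy loop.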
Towards Scene Understanding with Detailed 3D Object Representations
Current approaches to semantic image and scene understanding typically employ
rather simple object representations such as 2D or 3D bounding boxes. While
such coarse models are robust and allow for reliable object detection, they
discard much of the information about objects' 3D shape and pose, and thus do
not lend themselves well to higher-level reasoning. Here, we propose to base
scene understanding on a high-resolution object representation. An object class
- in our case cars - is modeled as a deformable 3D wireframe, which enables
fine-grained modeling at the level of individual vertices and faces. We augment
that model to explicitly include vertex-level occlusion, and embed all
instances in a common coordinate frame, in order to infer and exploit
object-object interactions. Specifically, from a single view we jointly
estimate the shapes and poses of multiple objects in a common 3D frame. A
ground plane in that frame is estimated by consensus among different objects,
which significantly stabilizes monocular 3D pose estimation. The fine-grained
model, in conjunction with the explicit 3D scene model, further allows one to
infer part-level occlusions between the modeled objects, as well as occlusions
by other, unmodeled scene elements. To demonstrate the benefits of such
detailed object class models in the context of scene understanding we
systematically evaluate our approach on the challenging KITTI street scene
dataset. The experiments show that the model's ability to utilize image
evidence at the level of individual parts improves monocular 3D pose estimation
w.r.t. both location and (continuous) viewpoint.
Comment: International Journal of Computer Vision (appeared online on 4
November 2014). Online version:
http://link.springer.com/article/10.1007/s11263-014-0780-
Object Localization, Segmentation, and Classification in 3D Images
We address the problem of identifying objects of interest in 3D images as a set of related tasks: localizing objects within a scene, segmenting observed object instances from other scene elements, classifying detected objects into semantic categories, and estimating the 3D pose of detected objects within the scene. The increasing availability of 3D sensors motivates us to leverage large amounts of 3D data to train machine learning models for these tasks in 3D images. Recent advances in deep learning have allowed us to develop models that address these tasks and optimize them jointly, reducing the errors that can propagate when the tasks are solved independently.
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003 Vision Research 43 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest that Gestalt grouping is not used as a strategy in these tasks, and it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
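The reported statistic can be checked by hand. An F statistic with 1 numerator degree of freedom is the square of a Student t statistic with the same denominator degrees of freedom, so its p-value equals the two-tailed t p-value. The sketch below uses that identity with a simple Simpson integration of the t density; the function names and the step count are illustrative choices, not part of the study.

```python
import math


def t_pdf(x, nu):
    """Density of Student's t distribution with nu degrees of freedom."""
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2))
    return c * (1 + x * x / nu) ** (-(nu + 1) / 2)


def p_value_f_1_df(f_stat, df2, steps=2000):
    """Upper-tail p-value for F(1, df2), using F = t^2.

    Integrates the t density from 0 to sqrt(F) with Simpson's rule
    (steps must be even), then converts to a two-tailed probability.
    """
    t = math.sqrt(f_stat)
    h = t / steps
    total = t_pdf(0.0, df2) + t_pdf(t, df2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df2)
    area = total * h / 3  # P(0 <= T <= t)
    return 2 * (0.5 - area)


p = p_value_f_1_df(2.565, 4)
print(f"p = {p:.4f}")
```

For F(1,4)=2.565 the result should land close to the reported p=0.185, well above the conventional 0.05 threshold, which is consistent with the abstract's statement that the difference was not significant.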
From surfaces to objects: Recognizing objects using surface information and object models.
This thesis describes research on recognizing partially obscured objects using
surface information like Marr's 2½D sketch ([MAR82]) and surface-based geometrical
object models. The goal of the recognition process is to produce fully
instantiated object hypotheses, with either image evidence for each feature or
explanations for their absence, in terms of self or external occlusion.
The central point of the thesis is that using surface information should be
an important part of the image understanding process. This is because surfaces
are the features that directly link perception to the objects perceived (for
normal "camera-like" sensing) and because surfaces make explicit information
needed to understand and cope with some visual problems (e.g. obscured features).
Further, because surfaces are both the data and model primitive, detailed
recognition can be made both simpler and more complete.
Recognition input is a surface image, which represents surface orientation and
absolute depth. Segmentation criteria are proposed for forming surface patches
with constant curvature character, based on surface shape discontinuities which
become labeled segmentation boundaries.
Partially obscured object surfaces are reconstructed using stronger surface-based
constraints. Surfaces are grouped to form surface clusters, which are 3D
identity-independent solids that often correspond to model primitives. These are
used here as a context within which to select models and find all object features.
True three-dimensional properties of image boundaries, surfaces and surface
clusters are directly estimated using the surface data.
Models are invoked using a network formulation, where individual nodes
represent potential identities for image structures. The links between nodes are
defined by generic and structural relationships. They define indirect evidence relationships
for an identity. Direct evidence for the identities comes from the data
properties. A plausibility computation is defined according to the constraints inherent
in the evidence types. When a node acquires sufficient plausibility, the
model is invoked for the corresponding image structure.
Objects are primarily represented using a surface-based geometrical model.
Assemblies are formed from subassemblies and surface primitives, which are
defined using surface shape and boundaries. Variable affixments between assemblies
allow flexibly connected objects.
The initial object reference frame is estimated from model-data surface relationships,
using correspondences suggested by invocation. With the reference
frame, back-facing, tangential, partially self-obscured, totally self-obscured and
fully visible image features are deduced. From these, the oriented model is used
for finding evidence for missing visible model features. If no evidence is found,
the program attempts to find evidence to justify the features obscured by an unrelated
object. Structured objects are constructed using a hierarchical synthesis
process.
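The thesis abstract does not give the rule used to deduce feature visibility from the estimated reference frame. As a rough sketch, assuming the standard test on the angle between a surface's outward normal and the direction to the viewer, the orientation-based part of that deduction could be written as below. The function name, the "front-facing" label, and the tolerance are hypothetical; the self-obscured categories listed in the abstract would need an additional occlusion check that is not shown here.

```python
import math


def classify_surface(normal, view_dir, tangential_tol_deg=5.0):
    """Label a model surface by its orientation relative to the viewer.

    normal: outward surface normal, expressed in the camera frame.
    view_dir: vector pointing from the surface toward the camera.
    tangential_tol_deg: assumed tolerance around 90 degrees within which
        the surface is treated as seen edge-on.
    """
    dot = sum(n * v for n, v in zip(normal, view_dir))
    norm = math.sqrt(sum(n * n for n in normal)) * math.sqrt(
        sum(v * v for v in view_dir)
    )
    cos_angle = dot / norm
    # |cos(angle)| <= sin(tol) means the angle is within tol of 90 degrees.
    if abs(cos_angle) <= math.sin(math.radians(tangential_tol_deg)):
        return "tangential"
    return "front-facing" if cos_angle > 0 else "back-facing"


toward_camera = (0.0, 0.0, 1.0)
for normal in [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0)]:
    print(normal, classify_surface(normal, toward_camera))
```

Only surfaces that pass this orientation test would then be examined for self-occlusion or occlusion by unrelated objects.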
Fully completed hypotheses are verified using both existence and identity
constraints based on surface evidence.
Each of these processes is defined by its computational constraints and is
demonstrated on two test images. These test scenes are interesting because they
contain partially and fully obscured object features, a variety of surface and solid
types and flexibly connected objects. All modeled objects were fully identified
and analyzed to the level represented in their models and were also acceptably
spatially located.
Portions of this work have been reported elsewhere ([FIS83], [FIS85a], [FIS85b],
[FIS86]) by the author.