Model-Based Three-Dimensional Object Recognition and Localization Using Properties of Surface Curvatures.
The ability to recognize three-dimensional (3-D) objects accurately from range images is a fundamental goal of vision in robotics, and is important in automated manufacturing environments in industry. In contrast to the extensive work done in computer-aided design and manufacturing (CAD/CAM), the robotic process is primitive and ad hoc. This thesis defines and investigates a fundamental problem in robot vision systems: recognizing and localizing multiple free-form 3-D objects in range images. An effective and efficient approach is developed and implemented as a system, Free-form Object Recognition and Localization (FORL). The technique used for surface characterization is surface curvatures derived from the geometric models of objects; it uniquely defines surface shapes in conjunction with a knowledge representation scheme that is used in the search for the corresponding surfaces of an object. Model representation has a significant effect on model-based recognition; without using surface properties, many important industrial vision tasks would remain beyond the competence of machine vision. Knowledge about model surface shapes is automatically abstracted from CAD models, and the CAD models are also used directly in the vision process. The knowledge representation scheme eases the processes of acquisition, retrieval, modification and reasoning, so that the recognition and localization process is effective and efficient. Our approach is to recognize objects by hypothesizing and then locating them: the knowledge about object surface shapes is used to infer the hypotheses, and the CAD models are used to locate the objects. Localization therefore becomes a by-product of the recognition process, which is significant since localization of an object is necessary in robotic applications. One of the most important problems in 3-D machine vision is the recognition of objects from a partial view due to occlusion.
Our approach is surface-based and thus sensitive to neither noise nor occlusion. For the same reason, surface-based recognition also makes multiple-object recognition easier. Our approach uses appropriate strategies for recognition and localization of 3-D solids by using information from the CAD database, which makes the integration of robot vision systems with CAD/CAM systems a promising future direction.
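The abstract above characterizes surfaces by their curvatures. A common concrete scheme for this (the HK sign map of Besl and Jain; the thesis's exact scheme may differ, so this is an illustrative sketch) classifies a local surface patch by the signs of its mean curvature H and Gaussian curvature K:

```python
def surface_type(H, K, eps=1e-6):
    """Classify a local surface patch by the signs of its mean (H)
    and Gaussian (K) curvature -- the standard HK sign map often
    used for curvature-based surface characterization.
    Illustrative sketch; not necessarily the thesis's exact scheme."""
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {
        (0, 0):  "plane",
        (-1, 0): "ridge",
        (1, 0):  "valley",
        (-1, 1): "peak",
        (1, 1):  "pit",
        (0, -1): "minimal surface",
        (-1, -1): "saddle ridge",
        (1, -1): "saddle valley",
    }
    # (0, 1) cannot occur, since K <= H^2 forces H != 0 when K > 0.
    return table.get((h, k), "impossible")
```

Because these labels depend only on curvature signs, they are invariant to rigid motion, which is what makes them usable as model-derived shape signatures for matching scene surfaces to CAD-model surfaces.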
Learning the Semantics of Manipulation Action
In this paper we present a formal computational framework for modeling manipulation actions. The introduced formalism leads to semantics of manipulation action and has applications to both observing and understanding human manipulation actions as well as executing them with a robotic mechanism (e.g. a humanoid robot). It is based on a Combinatory Categorial Grammar. The goals of the introduced framework are to: (1) represent manipulation actions with both syntactic and semantic parts, where the semantic part employs λ-calculus; (2) enable a probabilistic semantic parsing schema to learn the λ-calculus representation of manipulation actions from an annotated action corpus of videos; (3) use (1) and (2) to develop a system that visually observes manipulation actions and understands their meaning, while it can reason beyond observations using propositional logic and axiom schemata. Experiments conducted on a publicly available large manipulation action dataset validate the theoretical framework and our implementation.
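The pairing of syntax with λ-calculus semantics can be sketched with curried functions: each constituent of a parse is a function, and CCG combination is function application. The predicate name `cut` and its argument order below are illustrative assumptions, not the paper's exact lexicon:

```python
# Semantic form of a manipulation verb as a curried lambda term:
#   "cut" : \obj. \tool. \agent. cut(agent, tool, obj)
# (names and argument order are illustrative, not the paper's lexicon)
cut = lambda obj: lambda tool: lambda agent: ("cut", agent, tool, obj)

# CCG-style composition of "human cuts bread with knife" is just
# successive function application of the lambda term to its arguments:
parse = cut("bread")("knife")("human")
print(parse)  # ('cut', 'human', 'knife', 'bread')
```

The resulting ground term is what a probabilistic semantic parser would be trained to produce from an annotated video corpus, and what a downstream reasoner can manipulate with propositional rules.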
Competition and selection during visual processing of natural scenes and objects
When a visual scene containing many discrete objects is presented to our retinae, only a subset of these objects will be explicitly represented in visual awareness. The number of objects accessing short-term visual memory might be even smaller. Finally, it is not known to what extent “ignored” objects (those that do not enter visual awareness) will be processed, or recognized. By combining free recall, forced-choice recognition and visual priming paradigms for the same natural visual scenes and subjects, we were able to estimate these numbers and provide insights as to the fate of objects that are not explicitly recognized in a single fixation. When presented for 250 ms with a scene containing 10 distinct objects, human observers can remember up to 4 objects with full confidence, and between 2 and 3 more when forced to guess. Importantly, the objects that subjects consistently failed to report elicited a significant negative priming effect when presented in a subsequent task, suggesting that their identity was represented in high-level cortical areas of the visual system before the corresponding neural activity was suppressed during attentional selection. These results shed light on the neural mechanisms of attentional competition and on representational capacity at different levels of the human visual system.
Multi-scale 3-D Surface Description: Open and Closed Surfaces
A novel technique for multi-scale smoothing of a free-form 3-D surface is presented. Complete triangulated models of 3-D objects are constructed automatically and, using a local parametrization technique, are then smoothed with a 2-D Gaussian filter. Our method for local parametrization makes use of semigeodesic coordinates as a natural and efficient way of sampling the local surface shape. The smoothing eliminates surface noise together with high-curvature regions such as sharp edges; as a result, sharp corners become rounded as the object is smoothed iteratively. Our technique for free-form 3-D multi-scale surface smoothing is independent of the underlying triangulation. It is also argued that the proposed technique is preferable to volumetric smoothing or level-set methods, since it is applicable to the incomplete surface data that arise under occlusion. Our technique was applied to closed as well as open 3-D surfaces, and the results are presented here.
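The core smoothing step can be illustrated in a much simpler form than the paper's semigeodesic parametrization: move each vertex toward a Gaussian-weighted average of its 1-ring neighbours, and iterate to increase the scale. This neighbourhood-averaging sketch is a stand-in for, not a reproduction of, the authors' method:

```python
import math

def gaussian_smooth(vertices, neighbors, sigma=1.0, iterations=1):
    """Iteratively move each mesh vertex toward a Gaussian-weighted
    average of itself and its 1-ring neighbours.  A simplified
    illustration of mesh smoothing, not the paper's semigeodesic
    parametrization + 2-D Gaussian convolution."""
    verts = [list(v) for v in vertices]
    for _ in range(iterations):
        new = []
        for i, v in enumerate(verts):
            wsum, acc = 0.0, [0.0, 0.0, 0.0]
            for j in list(neighbors[i]) + [i]:
                d2 = sum((a - b) ** 2 for a, b in zip(verts[j], v))
                w = math.exp(-d2 / (2 * sigma ** 2))  # Gaussian weight
                wsum += w
                for k in range(3):
                    acc[k] += w * verts[j][k]
            new.append([a / wsum for a in acc])
        verts = new
    return verts
```

As the abstract notes for the real method, each iteration rounds off sharp, high-curvature features: a spike vertex sitting above an otherwise flat neighbourhood is pulled back toward the plane of its neighbours.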
PIXOR: Real-time 3D Object Detection from Point Clouds
We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Computation speed is critical, as detection is a necessary component for safety. Existing approaches are, however, computationally expensive due to the high dimensionality of point clouds. We utilize the 3D data more efficiently by representing the scene from the Bird's Eye View (BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions. The input representation, network architecture, and model optimization are specially designed to balance high accuracy and real-time efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection benchmark and a large-scale 3D vehicle detection benchmark. On both datasets we show that the proposed detector surpasses other state-of-the-art methods notably in terms of Average Precision (AP), while still running at >28 FPS.
Comment: Update of CVPR 2018 paper: correct timing, fix typos, add acknowledgement.
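The BEV input representation PIXOR relies on amounts to discretizing the ground plane into a regular grid and marking cells that contain LiDAR returns. A minimal sketch (the ranges and resolution below are illustrative, not the paper's exact settings, and the real input also encodes height slices and reflectance):

```python
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                  resolution=0.1):
    """Discretize LiDAR points into a bird's-eye-view occupancy grid,
    the style of 2-D input PIXOR runs its pixel-wise network on.
    Ranges/resolution are illustrative, not the paper's settings."""
    x0, x1 = x_range
    y0, y1 = y_range
    h = int((x1 - x0) / resolution)   # rows: forward axis
    w = int((y1 - y0) / resolution)   # cols: lateral axis
    grid = np.zeros((h, w), dtype=np.float32)
    for x, y, *_ in points:           # ignore z, reflectance, etc.
        if x0 <= x < x1 and y0 <= y < y1:
            i = int((x - x0) / resolution)
            j = int((y - y0) / resolution)
            grid[i, j] = 1.0
    return grid
```

Once the scene is a fixed-size 2-D tensor, an ordinary dense convolutional network can emit a classification score and oriented-box parameters at every pixel, which is what makes the proposal-free, single-stage design fast.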
Objects2action: Classifying and localizing actions without any video example
The goal of this paper is to recognize actions in video without the need for examples. Different from traditional zero-shot approaches, we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow for transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach.
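The "convex combination of action and object affinities" can be made concrete with a toy sketch: embed a video as a detector-score-weighted convex combination of object word embeddings, then score an unseen action by similarity to its own embedding. The 2-D embeddings and object/action names below are made up for illustration:

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def zero_shot_action_score(video_obj_scores, obj_emb, action_emb):
    """Embed a video as a convex combination of object embeddings,
    weighted by object-detector scores, then compare it to the action's
    embedding.  A toy rendering of the objects2action idea; the
    embeddings and detector scores used here are fabricated."""
    objs, scores = zip(*video_obj_scores.items())
    w = np.array(scores, dtype=float)
    w = w / w.sum()                         # convex combination weights
    video_vec = sum(wi * obj_emb[o] for wi, o in zip(w, objs))
    return cos(video_vec, action_emb)
```

A video dominated by "ball" detections then scores higher for a ball-related action embedding than for an unrelated one, which is exactly the transfer mechanism that lets actions be labeled without any video training examples.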