
    Model-Based Three-Dimensional Object Recognition and Localization Using Properties of Surface Curvatures.

    The ability to recognize three-dimensional (3-D) objects accurately from range images is a fundamental goal of vision in robotics, and it is important in automated manufacturing environments in industry. In contrast to the extensive work done in computer-aided design and manufacturing (CAD/CAM), the robotic process remains primitive and ad hoc. This thesis defines and investigates a fundamental problem in robot vision systems: recognizing and localizing multiple free-form 3-D objects in range images. An effective and efficient approach is developed and implemented as a system, Free-form Object Recognition and Localization (FORL). Surfaces are characterized by curvatures derived from the geometric models of objects; in conjunction with a knowledge representation scheme, these uniquely define surface shapes and guide the search for the corresponding surfaces of an object. Model representation has a significant effect on model-based recognition, and without surface properties many important industrial vision tasks would remain beyond the competence of machine vision. Knowledge about model surface shapes is abstracted automatically from CAD models, and the CAD models are also used directly in the vision process. The knowledge representation scheme eases acquisition, retrieval, modification and reasoning, making recognition and localization effective and efficient. Our approach recognizes objects by first hypothesizing them and then locating them: knowledge about object surface shapes is used to infer hypotheses, and the CAD models are used to locate the objects. Localization thus becomes a by-product of recognition, which is significant because localizing an object is necessary in robotic applications. One of the most important problems in 3-D machine vision is recognizing objects from partial views due to occlusion. Our approach is surface-based and is therefore robust to both noise and occlusion; for the same reason, it also makes multiple-object recognition easier. Our approach uses appropriate strategies for the recognition and localization of 3-D solids using information from the CAD database, which makes the integration of robot vision systems with CAD/CAM systems a promising future direction.
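
    The abstract does not spell out how curvature values become shape labels; a standard technique of this kind, and plausibly what "surface curvatures uniquely define surface shapes" refers to, is the sign-based HK classification of mean curvature H and Gaussian curvature K (in the style of Besl and Jain). A minimal sketch; the function name and tolerance are our assumptions, and the thesis's actual scheme may differ:

    ```python
    def classify_surface_point(H: float, K: float, eps: float = 1e-6) -> str:
        """Label local surface shape from mean (H) and Gaussian (K) curvature.

        Standard HK sign classification; sign conventions vary by author.
        """
        h = 0 if abs(H) < eps else (1 if H > 0 else -1)
        k = 0 if abs(K) < eps else (1 if K > 0 else -1)
        table = {
            (-1,  1): "peak",
            (-1,  0): "ridge",
            (-1, -1): "saddle ridge",
            ( 0,  0): "flat",
            ( 0, -1): "minimal surface",
            ( 1,  1): "pit",
            ( 1,  0): "valley",
            ( 1, -1): "saddle valley",
        }
        # (0, 1) cannot occur on a real surface, since H**2 >= K.
        return table.get((h, k), "undefined")

    print(classify_surface_point(H=-0.2, K=0.05))  # -> "peak"
    ```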

    Learning the Semantics of Manipulation Action

    In this paper we present a formal computational framework for modeling manipulation actions. The introduced formalism leads to semantics of manipulation action and has applications both to observing and understanding human manipulation actions and to executing them with a robotic mechanism (e.g. a humanoid robot). It is based on a Combinatory Categorial Grammar. The goals of the introduced framework are to: (1) represent manipulation actions with both syntactic and semantic parts, where the semantic part employs λ-calculus; (2) enable a probabilistic semantic parsing schema to learn the λ-calculus representation of manipulation actions from an annotated action corpus of videos; (3) use (1) and (2) to develop a system that visually observes manipulation actions and understands their meaning, while reasoning beyond observations using propositional logic and axiom schemata. Experiments conducted on a publicly available large manipulation action dataset validate the theoretical framework and our implementation.
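
    As a toy illustration of the kind of λ-calculus semantics the paper describes (the predicate and argument names are invented, not taken from the paper's corpus), a transitive manipulation verb can be modeled as a function from its object to a function from its subject to a logical form, with CCG-style function application composing the parts:

    ```python
    # λobj. λsubj. cut(subj, obj) -- illustrative only.
    def cut(obj):
        return lambda subj: ("cut", subj, obj)

    # Applying the verb to its arguments, as in parsing "hand cut bread":
    logical_form = cut("bread")("hand")
    print(logical_form)  # ('cut', 'hand', 'bread')
    ```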

    Competition and selection during visual processing of natural scenes and objects

    When a visual scene containing many discrete objects is presented to our retinae, only a subset of these objects will be explicitly represented in visual awareness. The number of objects accessing short-term visual memory might be even smaller. Finally, it is not known to what extent “ignored” objects (those that do not enter visual awareness) will be processed, or recognized. By combining free recall, forced-choice recognition and visual priming paradigms for the same natural visual scenes and subjects, we were able to estimate these numbers and provide insights into the fate of objects that are not explicitly recognized in a single fixation. When presented for 250 ms with a scene containing 10 distinct objects, human observers can remember up to 4 objects with full confidence, and between 2 and 3 more when forced to guess. Importantly, the objects that subjects consistently failed to report elicited a significant negative priming effect when presented in a subsequent task, suggesting that their identity was represented in high-level cortical areas of the visual system before the corresponding neural activity was suppressed during attentional selection. These results shed light on the neural mechanisms of attentional competition and on representational capacity at different levels of the human visual system.

    Multi-scale 3-D Surface Description: Open and Closed Surfaces

    A novel technique for multi-scale smoothing of a free-form 3-D surface is presented. Complete triangulated models of 3-D objects are constructed automatically and, using a local parametrization technique, are then smoothed with a 2-D Gaussian filter. Our method for local parametrization makes use of semigeodesic coordinates as a natural and efficient way of sampling the local surface shape. The smoothing eliminates surface noise together with high-curvature regions such as sharp edges; sharp corners therefore become rounded as the object is smoothed iteratively. Our technique for free-form 3-D multi-scale surface smoothing is independent of the underlying triangulation. It is also argued that the proposed technique is preferable to volumetric smoothing or level-set methods, since it is applicable to the incomplete surface data that occur under occlusion. Our technique was applied to closed as well as open 3-D surfaces, and the results are presented here.
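
    A simplified sketch of the iterative Gaussian smoothing step, assuming a vertex array and precomputed one-ring neighbourhoods. Note this weights neighbours by Euclidean distance; the paper's method instead resamples each local patch in semigeodesic coordinates before applying the 2-D Gaussian, which is what makes the result independent of the triangulation:

    ```python
    import numpy as np

    def gaussian_smooth_mesh(vertices, neighbors, sigma=1.0, iterations=5):
        """Iteratively smooth a triangulated surface with Gaussian weights.

        vertices:  (n, 3) array of vertex positions
        neighbors: list of one-ring neighbour index lists, one per vertex
        """
        V = np.asarray(vertices, dtype=float)
        for _ in range(iterations):
            out = np.empty_like(V)
            for i, nbrs in enumerate(neighbors):
                pts = V[[i] + list(nbrs)]
                d2 = np.sum((pts - V[i]) ** 2, axis=1)
                w = np.exp(-d2 / (2.0 * sigma ** 2))
                out[i] = (w[:, None] * pts).sum(axis=0) / w.sum()
            V = out
        return V
    ```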

    PIXOR: Real-time 3D Object Detection from Point Clouds

    We address the problem of real-time 3D object detection from point clouds in the context of autonomous driving. Computation speed is critical, as detection is a necessary component for safety. Existing approaches are, however, computationally expensive due to the high dimensionality of point clouds. We utilize the 3D data more efficiently by representing the scene from the Bird's Eye View (BEV), and propose PIXOR, a proposal-free, single-stage detector that outputs oriented 3D object estimates decoded from pixel-wise neural network predictions. The input representation, network architecture, and model optimization are especially designed to balance high accuracy and real-time efficiency. We validate PIXOR on two datasets: the KITTI BEV object detection benchmark, and a large-scale 3D vehicle detection benchmark. On both datasets we show that the proposed detector surpasses other state-of-the-art methods, notably in terms of Average Precision (AP), while still running at >28 FPS. Comment: update of the CVPR 2018 paper: corrected timing, fixed typos, added acknowledgement.
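
    A minimal sketch of the BEV input representation: rasterizing a LiDAR point cloud into a binary occupancy volume over height slices, the style of 2-D-convolvable input a detector like PIXOR consumes. The ranges, resolution, and function name here are illustrative assumptions, not the paper's exact configuration:

    ```python
    import numpy as np

    def pointcloud_to_bev(points, x_range=(0.0, 70.0), y_range=(-40.0, 40.0),
                          z_range=(-2.5, 1.0), res=0.1, z_bins=35):
        """Rasterize an (N, 3) LiDAR point cloud into an (H, W, z_bins)
        Bird's Eye View occupancy grid at `res` metres per pixel."""
        H = int((y_range[1] - y_range[0]) / res)
        W = int((x_range[1] - x_range[0]) / res)
        dz = (z_range[1] - z_range[0]) / z_bins
        bev = np.zeros((H, W, z_bins), dtype=np.float32)
        xs = ((points[:, 0] - x_range[0]) / res).astype(int)
        ys = ((points[:, 1] - y_range[0]) / res).astype(int)
        zs = ((points[:, 2] - z_range[0]) / dz).astype(int)
        keep = (0 <= xs) & (xs < W) & (0 <= ys) & (ys < H) \
             & (0 <= zs) & (zs < z_bins)
        bev[ys[keep], xs[keep], zs[keep]] = 1.0  # mark occupied height cells
        return bev
    ```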

    Objects2action: Classifying and localizing actions without any video example

    Get PDF
    The goal of this paper is to recognize actions in video without the need for examples. Unlike traditional zero-shot approaches, we do not demand the design and specification of attribute classifiers and class-to-attribute mappings to allow transfer from seen classes to unseen classes. Our key contribution is objects2action, a semantic word embedding that is spanned by a skip-gram model of thousands of object categories. Action labels are assigned to an object encoding of unseen video based on a convex combination of action and object affinities. Our semantic embedding has three main characteristics to accommodate the specifics of actions. First, we propose a mechanism to exploit multiple-word descriptions of actions and objects. Second, we incorporate the automated selection of the most responsive objects per action. And finally, we demonstrate how to extend our zero-shot approach to the spatio-temporal localization of actions in video. Experiments on four action datasets demonstrate the potential of our approach.
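
    A sketch of the zero-shot scoring idea, assuming precomputed word embeddings and per-video object classifier scores. The variable names and the top-k selection are our assumptions about how the "most responsive objects" and the convex combination might fit together, not the paper's exact formulation:

    ```python
    import numpy as np

    def zero_shot_action_scores(object_probs, object_embs, action_embs, top_k=5):
        """Score unseen actions for one video from its object detections.

        object_probs: (n_objects,) object classifier scores for the video
        object_embs:  (n_objects, d) word embeddings of object names
        action_embs:  (n_actions, d) word embeddings of action names
        """
        # Cosine affinities between every action and every object.
        A = action_embs @ object_embs.T
        A /= np.linalg.norm(action_embs, axis=1, keepdims=True)
        A /= np.linalg.norm(object_embs, axis=1)
        scores = np.empty(len(action_embs))
        for i, aff in enumerate(A):
            idx = np.argsort(aff)[-top_k:]       # most responsive objects
            w = np.maximum(aff[idx], 0.0)        # clip so weights stay convex
            w /= w.sum() + 1e-12
            scores[i] = w @ object_probs[idx]    # convex combination
        return scores
    ```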