180,840 research outputs found

    Determining the 3-D structure and motion of objects using a scanning laser range sensor

    Get PDF
    In order for the EVAHR robot to autonomously track and grasp objects, its vision system must be able to determine the 3-D structure and motion of an object from a sequence of sensory images. This task is accomplished by the use of a laser radar range sensor which provides dense range maps of the scene. Unfortunately, the currently available laser radar range cameras use a sequential scanning approach which complicates image analysis. Although many algorithms have been developed for recognizing objects from range images, none are suited for use with single beam, scanning, time-of-flight sensors because all previous algorithms assume instantaneous acquisition of the entire image. This assumption is invalid since the EVAHR robot is equipped with a sequential scanning laser range sensor. If an object is moving while being imaged by the device, the apparent structure of the object can be significantly distorted due to the significant non-zero delay time between sampling each image pixel. If an estimate of the motion of the object can be determined, this distortion can be eliminated; but, this leads to the motion-structure paradox - most existing algorithms for 3-D motion estimation use the structure of objects to parameterize their motions. The goal of this research is to design a rigid-body motion recovery technique which overcomes this limitation. The method being developed is an iterative, linear, feature-based approach which uses the non-zero image acquisition time constraint to accurately recover the motion parameters from the distorted structure of the 3-D range maps. Once the motion parameters are determined, the structural distortion in the range images is corrected

    Three-Dimensional Motion Estimation of Objects for Video Coding

    Get PDF
    Three-dimensional (3-D) motion estimation is applied to the problem of motion compensation for video coding. We suppose that the video sequence consists of the perspective projections of a collection of rigid bodies which undergo a rototranslational motion. Motion compensation can be performed on the sequence once the shape of the objects and the motion parameters are determined. We show that the motion equations of a rigid body can be formulated as a nonlinear dynamic system whose state is represented by the motion parameters and by the scaled depths of the object feature points. An extended Kalman filter is used to estimate both the motion and the object shape parameters simultaneously. The inclusion of the shape parameters in the estimation procedure adds a set of constraints to the filter equations that appear to be essential for reliable motion estimation. Our experiments show that the proposed approach gives two advantages. First, the filter can give more reliable estimates in the presence of measurement noise in comparison with other motion estimators that separately compute motion and structure. Second, the filter can efficiently track abrupt motion changes. Moreover, the structure imposed by the model implies that the reconstructed motion is very natural as opposed to more common block-based schemes. Also, the parameterization of the model allows for a very efficient coding of the motion informatio

    Motion estimation using optical flow field

    Get PDF
    Over the last decade, many low-level vision algorithms have been devised for extracting depth from intensity images. Most of them are based on motion of the rigid observer. Translation and rotation are constants with respect to space coordinates. When multi-objects move and/or the objects change shape, the algorithms cannot be used. In this dissertation, we develop a new robust framework for the determination of dense 3-D position and motion fields from a stereo image sequence. The framework is based on unified optical flow field (UOFF). In the UOFF approach, a four frame mode is used to compute six dense 3-D position and velocity fields. Their accuracy depends on the accuracy of optical flow field computation. The approach can estimate rigid and/or nonrigid motion as well as observer and/or object(s) motion. Here, a novel approach to optical flow field computation is developed. The approach is named as correlation-feedback approach. It has three different features from any other existing approaches. They are feedback, rubber window, and special refinement. With those three features, error is reduced, boundary is conserved, subpixel estimation accuracy is increased, and the system is robust. Convergence of the algorithm is proved in general. Since the UOFF is based on each pixel, it is sensitive to noise or uncertainty at each pixel. In order to improve its performance, we applied two Kalman filters. Our analysis indicates that different image areas need different convergence rates, for instance. the areas along boundaries have faster convergence rate than an interior area. The first Kalman filter is developed to conserve moving boundary in optical How determination by applying needed nonhomogeneous iterations. The second Kalman filter is devised to compute 3-D motion and structure based on a stereo image sequence. Since multi-object motion is allowed, newly visible areas may be exposed in images. How to detect and handle the newly visible areas is addressed. The system and measurement noise covariance matrices, Q and R, in the two Kalman filters are analyzed in detail. Numerous experiments demonstrate the efficiency of our approach

    Recursive Motion Estimation of Range Image

    Get PDF
    ABSTRACT In this paper, we present an innovative recursive motion estimation technique that can take advantage of the in-depth resolution (range) to perform an accurate estimation of objects that have undergone 3-D translational and rotational movements. This approach iteratively aims at minimizing the error between the object in the current frame and its compensated object using estimated motion displacement from the previous range measurements. In addition, in order to use the range data on the non-rectangular grid in the Cartesian coordinate, we consider a combination of derivative filters and the transformation between the Cartesian coordinates and the sensor-centered coordinates. For sequences of moving range images we demonstrate the effectiveness of the proposed scheme. , which is more straightforward than the feature-based algorithm. In our approach, which falls in this category, we are mainly concerned with rigid motion where the structure of moving images is based on a single beam laser scanner technology. In this technology a deflection mirror assembly scans a beam over the scene. This type of technique has been widely used for many tactical and industria

    Automatic video segmentation employing object/camera modeling techniques

    Get PDF
    Practically established video compression and storage techniques still process video sequences as rectangular images without further semantic structure. However, humans watching a video sequence immediately recognize acting objects as semantic units. This semantic object separation is currently not reflected in the technical system, making it difficult to manipulate the video at the object level. The realization of object-based manipulation will introduce many new possibilities for working with videos like composing new scenes from pre-existing video objects or enabling user-interaction with the scene. Moreover, object-based video compression, as defined in the MPEG-4 standard, can provide high compression ratios because the foreground objects can be sent independently from the background. In the case that the scene background is static, the background views can even be combined into a large panoramic sprite image, from which the current camera view is extracted. This results in a higher compression ratio since the sprite image for each scene only has to be sent once. A prerequisite for employing object-based video processing is automatic (or at least user-assisted semi-automatic) segmentation of the input video into semantic units, the video objects. This segmentation is a difficult problem because the computer does not have the vast amount of pre-knowledge that humans subconsciously use for object detection. Thus, even the simple definition of the desired output of a segmentation system is difficult. The subject of this thesis is to provide algorithms for segmentation that are applicable to common video material and that are computationally efficient. The thesis is conceptually separated into three parts. In Part I, an automatic segmentation system for general video content is described in detail. Part II introduces object models as a tool to incorporate userdefined knowledge about the objects to be extracted into the segmentation process. Part III concentrates on the modeling of camera motion in order to relate the observed camera motion to real-world camera parameters. The segmentation system that is described in Part I is based on a background-subtraction technique. The pure background image that is required for this technique is synthesized from the input video itself. Sequences that contain rotational camera motion can also be processed since the camera motion is estimated and the input images are aligned into a panoramic scene-background. This approach is fully compatible to the MPEG-4 video-encoding framework, such that the segmentation system can be easily combined with an object-based MPEG-4 video codec. After an introduction to the theory of projective geometry in Chapter 2, which is required for the derivation of camera-motion models, the estimation of camera motion is discussed in Chapters 3 and 4. It is important that the camera-motion estimation is not influenced by foreground object motion. At the same time, the estimation should provide accurate motion parameters such that all input frames can be combined seamlessly into a background image. The core motion estimation is based on a feature-based approach where the motion parameters are determined with a robust-estimation algorithm (RANSAC) in order to distinguish the camera motion from simultaneously visible object motion. Our experiments showed that the robustness of the original RANSAC algorithm in practice does not reach the theoretically predicted performance. An analysis of the problem has revealed that this is caused by numerical instabilities that can be significantly reduced by a modification that we describe in Chapter 4. The synthetization of static-background images is discussed in Chapter 5. In particular, we present a new algorithm for the removal of the foreground objects from the background image such that a pure scene background remains. The proposed algorithm is optimized to synthesize the background even for difficult scenes in which the background is only visible for short periods of time. The problem is solved by clustering the image content for each region over time, such that each cluster comprises static content. Furthermore, it is exploited that the times, in which foreground objects appear in an image region, are similar to the corresponding times of neighboring image areas. The reconstructed background could be used directly as the sprite image in an MPEG-4 video coder. However, we have discovered that the counterintuitive approach of splitting the background into several independent parts can reduce the overall amount of data. In the case of general camera motion, the construction of a single sprite image is even impossible. In Chapter 6, a multi-sprite partitioning algorithm is presented, which separates the video sequence into a number of segments, for which independent sprites are synthesized. The partitioning is computed in such a way that the total area of the resulting sprites is minimized, while simultaneously satisfying additional constraints. These include a limited sprite-buffer size at the decoder, and the restriction that the image resolution in the sprite should never fall below the input-image resolution. The described multisprite approach is fully compatible to the MPEG-4 standard, but provides three advantages. First, any arbitrary rotational camera motion can be processed. Second, the coding-cost for transmitting the sprite images is lower, and finally, the quality of the decoded sprite images is better than in previously proposed sprite-generation algorithms. Segmentation masks for the foreground objects are computed with a change-detection algorithm that compares the pure background image with the input images. A special effect that occurs in the change detection is the problem of image misregistration. Since the change detection compares co-located image pixels in the camera-motion compensated images, a small error in the motion estimation can introduce segmentation errors because non-corresponding pixels are compared. We approach this problem in Chapter 7 by integrating risk-maps into the segmentation algorithm that identify pixels for which misregistration would probably result in errors. For these image areas, the change-detection algorithm is modified to disregard the difference values for the pixels marked in the risk-map. This modification significantly reduces the number of false object detections in fine-textured image areas. The algorithmic building-blocks described above can be combined into a segmentation system in various ways, depending on whether camera motion has to be considered or whether real-time execution is required. These different systems and example applications are discussed in Chapter 8. Part II of the thesis extends the described segmentation system to consider object models in the analysis. Object models allow the user to specify which objects should be extracted from the video. In Chapters 9 and 10, a graph-based object model is presented in which the features of the main object regions are summarized in the graph nodes, and the spatial relations between these regions are expressed with the graph edges. The segmentation algorithm is extended by an object-detection algorithm that searches the input image for the user-defined object model. We provide two objectdetection algorithms. The first one is specific for cartoon sequences and uses an efficient sub-graph matching algorithm, whereas the second processes natural video sequences. With the object-model extension, the segmentation system can be controlled to extract individual objects, even if the input sequence comprises many objects. Chapter 11 proposes an alternative approach to incorporate object models into a segmentation algorithm. The chapter describes a semi-automatic segmentation algorithm, in which the user coarsely marks the object and the computer refines this to the exact object boundary. Afterwards, the object is tracked automatically through the sequence. In this algorithm, the object model is defined as the texture along the object contour. This texture is extracted in the first frame and then used during the object tracking to localize the original object. The core of the algorithm uses a graph representation of the image and a newly developed algorithm for computing shortest circular-paths in planar graphs. The proposed algorithm is faster than the currently known algorithms for this problem, and it can also be applied to many alternative problems like shape matching. Part III of the thesis elaborates on different techniques to derive information about the physical 3-D world from the camera motion. In the segmentation system, we employ camera-motion estimation, but the obtained parameters have no direct physical meaning. Chapter 12 discusses an extension to the camera-motion estimation to factorize the motion parameters into physically meaningful parameters (rotation angles, focal-length) using camera autocalibration techniques. The speciality of the algorithm is that it can process camera motion that spans several sprites by employing the above multi-sprite technique. Consequently, the algorithm can be applied to arbitrary rotational camera motion. For the analysis of video sequences, it is often required to determine and follow the position of the objects. Clearly, the object position in image coordinates provides little information if the viewing direction of the camera is not known. Chapter 13 provides a new algorithm to deduce the transformation between the image coordinates and the real-world coordinates for the special application of sport-video analysis. In sport videos, the camera view can be derived from markings on the playing field. For this reason, we employ a model of the playing field that describes the arrangement of lines. After detecting significant lines in the input image, a combinatorial search is carried out to establish correspondences between lines in the input image and lines in the model. The algorithm requires no information about the specific color of the playing field and it is very robust to occlusions or poor lighting conditions. Moreover, the algorithm is generic in the sense that it can be applied to any type of sport by simply exchanging the model of the playing field. In Chapter 14, we again consider panoramic background images and particularly focus ib their visualization. Apart from the planar backgroundsprites discussed previously, a frequently-used visualization technique for panoramic images are projections onto a cylinder surface which is unwrapped into a rectangular image. However, the disadvantage of this approach is that the viewer has no good orientation in the panoramic image because he looks into all directions at the same time. In order to provide a more intuitive presentation of wide-angle views, we have developed a visualization technique specialized for the case of indoor environments. We present an algorithm to determine the 3-D shape of the room in which the image was captured, or, more generally, to compute a complete floor plan if several panoramic images captured in each of the rooms are provided. Based on the obtained 3-D geometry, a graphical model of the rooms is constructed, where the walls are displayed with textures that are extracted from the panoramic images. This representation enables to conduct virtual walk-throughs in the reconstructed room and therefore, provides a better orientation for the user. Summarizing, we can conclude that all segmentation techniques employ some definition of foreground objects. These definitions are either explicit, using object models like in Part II of this thesis, or they are implicitly defined like in the background synthetization in Part I. The results of this thesis show that implicit descriptions, which extract their definition from video content, work well when the sequence is long enough to extract this information reliably. However, high-level semantics are difficult to integrate into the segmentation approaches that are based on implicit models. Intead, those semantics should be added as postprocessing steps. On the other hand, explicit object models apply semantic pre-knowledge at early stages of the segmentation. Moreover, they can be applied to short video sequences or even still pictures since no background model has to be extracted from the video. The definition of a general object-modeling technique that is widely applicable and that also enables an accurate segmentation remains an important yet challenging problem for further research

    Learning Articulated Motions From Visual Demonstration

    Full text link
    Many functional elements of human homes and workplaces consist of rigid components which are connected through one or more sliding or rotating linkages. Examples include doors and drawers of cabinets and appliances; laptops; and swivel office chairs. A robotic mobile manipulator would benefit from the ability to acquire kinematic models of such objects from observation. This paper describes a method by which a robot can acquire an object model by capturing depth imagery of the object as a human moves it through its range of motion. We envision that in future, a machine newly introduced to an environment could be shown by its human user the articulated objects particular to that environment, inferring from these "visual demonstrations" enough information to actuate each object independently of the user. Our method employs sparse (markerless) feature tracking, motion segmentation, component pose estimation, and articulation learning; it does not require prior object models. Using the method, a robot can observe an object being exercised, infer a kinematic model incorporating rigid, prismatic and revolute joints, then use the model to predict the object's motion from a novel vantage point. We evaluate the method's performance, and compare it to that of a previously published technique, for a variety of household objects.Comment: Published in Robotics: Science and Systems X, Berkeley, CA. ISBN: 978-0-9923747-0-

    Unsupervised Learning of Complex Articulated Kinematic Structures combining Motion and Skeleton Information

    Get PDF
    In this paper we present a novel framework for unsupervised kinematic structure learning of complex articulated objects from a single-view image sequence. In contrast to prior motion information based methods, which estimate relatively simple articulations, our method can generate arbitrarily complex kinematic structures with skeletal topology by a successive iterative merge process. The iterative merge process is guided by a skeleton distance function which is generated from a novel object boundary generation method from sparse points. Our main contributions can be summarised as follows: (i) Unsupervised complex articulated kinematic structure learning by combining motion and skeleton information. (ii) Iterative fine-to-coarse merging strategy for adaptive motion segmentation and structure smoothing. (iii) Skeleton estimation from sparse feature points. (iv) A new highly articulated object dataset containing multi-stage complexity with ground truth. Our experiments show that the proposed method out-performs state-of-the-art methods both quantitatively and qualitatively

    A joint motion & disparity motion estimation technique for 3D integral video compression using evolutionary strategy

    Get PDF
    3D imaging techniques have the potential to establish a future mass-market in the fields of entertainment and communications. Integral imaging, which can capture true 3D color images with only one camera, has been seen as the right technology to offer stress-free viewing to audiences of more than one person. Just like any digital video, 3D video sequences must also be compressed in order to make it suitable for consumer domain applications. However, ordinary compression techniques found in state-of-the-art video coding standards such as H.264, MPEG-4 and MPEG-2 are not capable of producing enough compression while preserving the 3D clues. Fortunately, a huge amount of redundancies can be found in an integral video sequence in terms of motion and disparity. This paper discusses a novel approach to use both motion and disparity information to compress 3D integral video sequences. We propose to decompose the integral video sequence down to viewpoint video sequences and jointly exploit motion and disparity redundancies to maximize the compression. We further propose an optimization technique based on evolutionary strategies to minimize the computational complexity of the joint motion disparity estimation. Experimental results demonstrate that Joint Motion and Disparity Estimation can achieve over 1 dB objective quality gain over normal motion estimation. Once combined with Evolutionary strategy, this can achieve up to 94% computational cost saving

    Three dimensional transparent structure segmentation and multiple 3D motion estimation from monocular perspective image sequences

    Get PDF
    A three dimensional scene can be segmented using different cues, such as boundaries, texture, motion, discontinuities of the optical flow, stereo, models for structure, etc. We investigate segmentation based upon one of these cues, namely three dimensional motion. If the scene contain transparent objects, the two dimensional (local) cues are inconsistent, since neighboring points with similar optical flow can correspond to different objects. We present a method for performing three dimensional motion-based segmentation of (possibly) transparent scenes together with recursive estimation of the motion of each independent rigid object from monocular perspective images. Our algorithm is based on a recently proposed method for rigid motion reconstruction and a validation test which allows us to initialize the scheme and detect outliers during the motion estimation procedure. The scheme is tested on challenging real and synthetic image sequences. Segmentation is performed for the Ullmann's experiment of two transparent cylinders rotating about the same axis in opposite directions
    • 

    corecore