75,483 research outputs found

    Robust motion estimation using connected operators

    This paper discusses the use of connected operators for robust motion estimation. The proposed strategy involves a motion estimation step extracting the dominant motion, and a filtering step relying on connected operators that remove objects that do not follow the dominant motion. These two steps are iterated in order to obtain an accurate motion estimation and a precise definition of the objects following this motion. This strategy can be applied to the entire frame or to individual connected components. As a result, the complete motion-oriented segmentation and motion estimation of the frame can be achieved.
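    The iterate-estimate-filter loop described above can be sketched as follows. This is a minimal illustration, not the paper's method: the dominant motion is taken as the median of block motion vectors (a stand-in for a parametric estimator and for the connected-operator filtering), and blocks whose vectors deviate too far are removed before re-estimating. All names, the translational model, and the tolerance are illustrative assumptions.

```python
from statistics import median

def dominant_motion(vectors):
    """Median translation over the surviving block motion vectors."""
    return (median(v[0] for v in vectors), median(v[1] for v in vectors))

def estimate_and_filter(vectors, tol=1.0, max_iters=5):
    """Alternate dominant-motion estimation and outlier removal."""
    active = list(vectors)
    for _ in range(max_iters):
        dx, dy = dominant_motion(active)
        kept = [v for v in active
                if abs(v[0] - dx) <= tol and abs(v[1] - dy) <= tol]
        if len(kept) == len(active):   # converged: nothing was removed
            break
        active = kept
    return (dx, dy), active

# Blocks mostly translating by (2, 0); two blocks move differently.
field = [(2.0, 0.1), (2.1, -0.1), (1.9, 0.0), (8.0, 5.0), (2.0, 0.0), (-3.0, 2.0)]
motion, inliers = estimate_and_filter(field)
```

    After two iterations the two deviating blocks are discarded and the dominant translation settles at (2.0, 0.0), mirroring how the filtering step sharpens the motion estimate.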

    Three dimensional transparent structure segmentation and multiple 3D motion estimation from monocular perspective image sequences

    A three-dimensional scene can be segmented using different cues, such as boundaries, texture, motion, discontinuities of the optical flow, stereo, models for structure, etc. We investigate segmentation based upon one of these cues, namely three-dimensional motion. If the scene contains transparent objects, the two-dimensional (local) cues are inconsistent, since neighboring points with similar optical flow can correspond to different objects. We present a method for performing three-dimensional motion-based segmentation of (possibly) transparent scenes, together with recursive estimation of the motion of each independent rigid object, from monocular perspective images. Our algorithm is based on a recently proposed method for rigid motion reconstruction and a validation test which allows us to initialize the scheme and detect outliers during the motion estimation procedure. The scheme is tested on challenging real and synthetic image sequences. Segmentation is performed for Ullman's experiment of two transparent cylinders rotating about the same axis in opposite directions.

    Video object segmentation introducing depth and motion information

    We present a method to estimate the relative depth between objects in scenes of video sequences. The information for the estimation of the relative depth is obtained from the overlapping produced between objects when there is relative motion, as well as from motion coherence between neighbouring regions. A relaxation labelling algorithm is used to solve conflicts and assign every region to a depth level. The depth estimation is used in a segmentation scheme which uses grey-level information to produce a first segmentation. Regions of this partition are merged on the basis of their depth level.
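    The depth-ordering step described above can be illustrated with a small sketch: overlap during relative motion tells which region occludes which, and regions are then assigned depth levels consistent with that relation. Here a simple fixed-point ordering stands in for the paper's relaxation labelling, the occlusion relation is assumed acyclic, and all region names are illustrative.

```python
def depth_levels(regions, occludes):
    """occludes: set of (front, back) pairs observed from overlaps.
    Returns a region -> depth level map (0 = nearest to the camera)."""
    level = {r: 0 for r in regions}
    changed = True
    while changed:                     # assumes the relation is acyclic
        changed = False
        for front, back in occludes:
            if level[back] < level[front] + 1:
                level[back] = level[front] + 1   # push occluded region back
                changed = True
    return level

regions = ["person", "car", "background"]
occludes = {("person", "car"), ("car", "background")}
levels = depth_levels(regions, occludes)
```

    The person ends up at level 0, the car at 1, and the background at 2: each observed occlusion forces the occluded region at least one level behind its occluder.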

    A Cooperative Top-Down/Bottom-Up Technique for Motion Field Segmentation

    The segmentation of video sequences into regions undergoing coherent motion is one of the most useful processing steps for video analysis and coding. In this paper, we propose an algorithm that exploits the advantages of both top-down and bottom-up techniques for motion field segmentation. To remove camera motion, a global motion estimation and compensation is first performed. Local motion estimation is then carried out relying on a translational motion model. Starting from this motion field, a two-stage analysis based on affine models takes place. In the first stage, using a top-down segmentation technique, macro-regions with coherent affine motion are extracted. In the second stage, the segmentation of each macro-region is refined using a bottom-up approach based on motion vector clustering. In order to further improve the accuracy of the spatio-temporal segmentation, a Markov Random Field (MRF)-inspired motion-and-intensity-based refinement step is performed to adjust object boundaries.
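    The two-stage idea above can be sketched in miniature: camera motion is removed first, then the residual block vectors are grouped into coherently moving regions. A greedy threshold clustering with a purely translational model stands in for the paper's affine-model analysis; the global motion is assumed known, and all names and thresholds are illustrative.

```python
def compensate_global(vectors, global_motion):
    """Subtract the camera (global) motion from every block vector."""
    gx, gy = global_motion
    return [(vx - gx, vy - gy) for vx, vy in vectors]

def cluster_vectors(vectors, tol=0.5):
    """Greedily group vectors whose components agree within tol."""
    clusters = []   # each cluster: [centroid, [member indices]]
    for i, (vx, vy) in enumerate(vectors):
        for c in clusters:
            cx, cy = c[0]
            if abs(vx - cx) <= tol and abs(vy - cy) <= tol:
                c[1].append(i)
                break
        else:
            clusters.append([(vx, vy), [i]])
    return clusters

# Camera pans by (1, 1); one object moves independently to the right.
field = [(1.0, 1.0), (1.1, 0.9), (5.0, 0.0), (5.1, 0.1), (1.0, 1.1)]
residual = compensate_global(field, (1.0, 1.0))
groups = cluster_vectors(residual)
```

    After compensation, the static blocks collapse into one near-zero cluster and the moving object's blocks form a second one, which is the starting point the bottom-up refinement would then work from.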

    Joint Optical Flow and Temporally Consistent Semantic Segmentation

    The importance and demands of visual scene understanding have been steadily increasing along with the active development of autonomous systems. Consequently, there has been a large amount of research dedicated to semantic segmentation and dense motion estimation. In this paper, we propose a method for jointly estimating optical flow and temporally consistent semantic segmentation, which closely connects these two problem domains so that each leverages the other. Semantic segmentation provides information on plausible physical motion to its associated pixels, and accurate pixel-level temporal correspondences enhance the accuracy of semantic segmentation in the temporal domain. We demonstrate the benefits of our approach on the KITTI benchmark, where we observe performance gains for flow and segmentation. We achieve state-of-the-art optical flow results, and outperform all published algorithms by a large margin on challenging, but crucial, dynamic objects. Comment: 14 pages; accepted for the CVRSUAD workshop at ECCV 201

    Self-Supervised Relative Depth Learning for Urban Scene Understanding

    As an agent moves through the world, the apparent motion of scene elements is (usually) inversely proportional to their depth. It is natural for a learning agent to associate image patterns with the magnitude of their displacement over time: as the agent moves, faraway mountains don't move much; nearby trees move a lot. This natural relationship between the appearance of objects and their motion is a rich source of information about the world. In this work, we start by training a deep network, using fully automatic supervision, to predict relative scene depth from single images. The relative-depth training images are automatically derived from simple videos of cars moving through a scene, using recent motion segmentation techniques, and no human-provided labels. This proxy task of predicting relative depth from a single image induces features in the network that result in large improvements in a set of downstream tasks including semantic segmentation, joint road segmentation and car detection, and monocular (absolute) depth estimation, over a network trained from scratch. The improvement on the semantic segmentation task is greater than that produced by any other automatically supervised method. Moreover, for monocular depth estimation, our unsupervised pre-training method even outperforms supervised pre-training with ImageNet. In addition, we demonstrate benefits from learning to predict (unsupervised) relative depth in the specific videos associated with various downstream tasks. We adapt to the specific scenes in those tasks in an unsupervised manner to improve performance. In summary, for semantic segmentation, we present state-of-the-art results among methods that do not use supervised pre-training, and we even exceed the performance of supervised ImageNet pre-trained models for monocular depth estimation, achieving results that are comparable with state-of-the-art methods.
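    The cue this paper builds on, that apparent motion magnitude is (roughly) inversely proportional to depth for a translating camera, can be sketched in a few lines. The function and the flow values below are purely illustrative; it only shows how flow magnitude yields a relative-depth ordering, not how the paper derives its training labels.

```python
import math

def relative_depth(flow, eps=1e-6):
    """Larger flow -> smaller depth; returns 1/|flow| per pixel.
    eps guards against zero-magnitude flow vectors."""
    return [1.0 / (math.hypot(dx, dy) + eps) for dx, dy in flow]

# A nearby tree moves a lot; a faraway mountain barely moves.
flow = [(4.0, 0.0), (0.5, 0.0)]   # [tree, mountain]
depths = relative_depth(flow)
```

    The tree's large displacement maps to a small relative depth and the mountain's small displacement to a large one, which is exactly the ordering a network can be trained to reproduce from a single image.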

    Study on Segmentation and Global Motion Estimation in Object Tracking Based on Compressed Domain

    Object tracking is an interesting and necessary procedure for many real-time applications, but it is a challenging one because of sequences with abrupt motion, occlusion, cluttered backgrounds, and camera shake. In many video processing systems, the presence of moving objects limits the accuracy of Global Motion Estimation (GME). On the other hand, the inaccuracy of global motion parameter estimates affects the performance of motion segmentation. In the proposed method, we introduce a procedure for simultaneous object segmentation and GME from a block-based motion vector (MV) field: the motion vectors are first refined using the spatial and temporal correlation of motion, and an initial segmentation is produced using the motion vector differences after global motion estimation.
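    The spatial-correlation refinement mentioned above can be sketched as a median filter over the block MV field: each block's vector is replaced by the component-wise median of its 3x3 neighbourhood, which suppresses isolated noisy vectors while preserving coherent motion. The grid layout and the choice of a median filter are illustrative assumptions, not the paper's exact refinement.

```python
from statistics import median

def median_refine(field):
    """field: 2D grid (list of rows) of (dx, dy) block motion vectors."""
    rows, cols = len(field), len(field[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Clamp the 3x3 neighbourhood at the grid borders.
            neigh = [field[rr][cc]
                     for rr in range(max(0, r - 1), min(rows, r + 2))
                     for cc in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = (median(v[0] for v in neigh),
                         median(v[1] for v in neigh))
    return out

# A uniformly panning field with one noisy vector in the middle.
field = [[(1.0, 0.0)] * 3 for _ in range(3)]
field[1][1] = (9.0, -7.0)
refined = median_refine(field)
```

    The outlier at the centre is pulled back to the dominant (1.0, 0.0) pan, giving the cleaner MV field that the subsequent global motion estimation relies on.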

    Learning from Synthetic Humans

    Estimating human pose, shape, and motion from images and videos is a fundamental challenge with many applications. Recent advances in 2D human pose estimation use large amounts of manually labeled training data for learning convolutional neural networks (CNNs). Such data is time-consuming to acquire and difficult to extend. Moreover, manual labeling of 3D pose, depth, and motion is impractical. In this work we present SURREAL (Synthetic hUmans foR REAL tasks): a new large-scale dataset with synthetically generated but realistic images of people rendered from 3D sequences of human motion capture data. We generate more than 6 million frames together with ground-truth pose, depth maps, and segmentation masks. We show that CNNs trained on our synthetic dataset allow for accurate human depth estimation and human part segmentation in real RGB images. Our results and the new dataset open up new possibilities for advancing person analysis using cheap and large-scale synthetic data. Comment: Appears in: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). 9 pages.

    Region and graph-based motion segmentation

    Indexed in ISI. This paper describes an approach for integrating motion estimation and region clustering techniques with the purpose of obtaining precise multiple-motion segmentation. Motivated by the good results obtained in static segmentation, we propose a hybrid approach where motion segmentation is achieved within a region-based clustering approach, taking the initial result of a spatial pre-segmentation and extending it to include motion information. Motion vectors are first estimated with a multiscale variational method applied directly to the input images and then refined by incorporating the segmentation results into a region-based warping scheme. The complete algorithm facilitates obtaining spatially continuous segmentation maps that are closely related to actual object boundaries. A comparative study is made with some of the best-known motion segmentation algorithms.