45 research outputs found

    Learning to Segment Moving Objects in Videos

    We segment moving objects in videos by ranking spatio-temporal segment proposals according to "moving objectness": how likely they are to contain a moving object. In each video frame, we compute segment proposals using multiple figure-ground segmentations on per-frame motion boundaries. We rank them with a Moving Objectness Detector trained on image and motion fields to detect moving objects and discard over- and under-segmentations or background parts of the scene. We extend the top-ranked segments into spatio-temporal tubes using random walkers on motion affinities of dense point trajectories. Our final tube ranking consistently outperforms previous segmentation methods on the two largest video segmentation benchmarks currently available, for any number of proposals. Further, our per-frame moving object proposals increase the detection rate by up to 7% over previous state-of-the-art static proposal methods.
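
    The random-walker step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a small symmetric affinity matrix between point trajectories and two seed trajectories (one foreground, one background). The function name, the toy affinities, and the fixed iteration count are illustrative assumptions, not the authors' implementation.

```python
def random_walk_propagate(affinity, seeds, iterations=200):
    """Propagate seed labels over a graph by iterating x <- P x with seeds clamped.

    affinity: symmetric list of lists of non-negative weights (hypothetical
              motion affinities between point trajectories, zero diagonal).
    seeds:    dict mapping node index to 1.0 (foreground) or 0.0 (background).
    Returns, per node, the estimated probability that a random walker
    starting there reaches a foreground seed before a background seed.
    """
    n = len(affinity)
    x = [seeds.get(i, 0.5) for i in range(n)]  # unlabeled nodes start at 0.5
    for _ in range(iterations):
        new_x = []
        for i in range(n):
            if i in seeds:
                new_x.append(seeds[i])  # seed values stay fixed
                continue
            degree = sum(affinity[i])
            if degree == 0:
                new_x.append(x[i])  # isolated node: nothing to average
                continue
            # Affinity-weighted average of the neighbours' current values.
            new_x.append(sum(w * x[j] for j, w in enumerate(affinity[i])) / degree)
        x = new_x
    return x


# Toy chain of five trajectories; the weak 2-3 link mimics a motion boundary.
toy_affinity = [
    [0.0, 0.9, 0.0, 0.0, 0.0],
    [0.9, 0.0, 0.9, 0.0, 0.0],
    [0.0, 0.9, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.0, 0.9],
    [0.0, 0.0, 0.0, 0.9, 0.0],
]
foreground_prob = random_walk_propagate(toy_affinity, {0: 1.0, 4: 0.0})
```

    In this toy chain, trajectories 1 and 2 end up close to the foreground seed and trajectory 3 close to the background seed, because the weak link between 2 and 3 carries little probability mass.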

    Segmentation and Classification of Multimodal Imagery

    Segmentation and classification are two important computer vision tasks that transform input data into a compact representation that allows fast and efficient analysis. Several challenges exist in generating accurate segmentation or classification results. In a video, for example, objects often change appearance and are partially occluded, making it difficult to delineate an object from its surroundings. This thesis proposes video segmentation and aerial image classification algorithms to address some of these problems and provide accurate results. We developed a gradient-driven three-dimensional segmentation technique that partitions a video into spatiotemporal objects. The algorithm uses the local gradient computed at each pixel location, together with the global boundary map acquired through deep learning methods, to generate initial pixel groups by traversing from low- to high-gradient regions. A local clustering method is then employed to refine these initial pixel groups. The refined sub-volumes in the homogeneous regions of the video are selected as initial seeds and iteratively combined with adjacent groups based on intensity similarities. The volume growth is terminated at the color boundaries of the video. The over-segments obtained from the above steps are then merged hierarchically by a multivariate approach, yielding a final segmentation map for each frame. In addition, we implemented a streaming version of the above algorithm that requires less memory. The results illustrate that our proposed methodology compares favorably, on a qualitative and quantitative level, in segmentation quality and computational efficiency with the latest state-of-the-art techniques. We also developed a convolutional neural network (CNN)-based method to efficiently combine information from multisensor remotely sensed images for pixel-wise semantic classification. The CNN features obtained from multiple spectral bands are fused at the initial layers of deep neural networks as opposed to the final layers. The early fusion architecture has fewer parameters and thereby reduces the computational time and GPU memory during training and inference. We also introduce a composite architecture that fuses features throughout the network. The methods were validated on four different datasets: ISPRS Potsdam, Vaihingen, IEEE Zeebruges, and a Sentinel-1, Sentinel-2 dataset. For the Sentinel-1, -2 dataset, we obtain the ground truth labels for three classes from OpenStreetMap. Results on all the images show that early fusion, specifically after layer three of the network, achieves results similar to or better than a decision-level fusion mechanism. The performance of the proposed architecture is also on par with the state-of-the-art results.
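
    The claim that early fusion needs fewer parameters than decision-level fusion can be checked with simple arithmetic. The sketch below counts convolution weights for a toy network; the layer widths, band counts, and function names are hypothetical and do not describe the architecture used in the thesis.

```python
def conv_params(in_ch, out_ch, k=3):
    """Weights plus biases of one k x k convolution layer."""
    return in_ch * out_ch * k * k + out_ch


def stack_params(in_ch, widths, k=3):
    """Parameters of a plain stack of convolutions with the given output widths."""
    total = 0
    for out_ch in widths:
        total += conv_params(in_ch, out_ch, k)
        in_ch = out_ch
    return total


def fusion_params(band_counts, widths, n_classes, fuse_after):
    """One stream per sensor for `fuse_after` layers, then concatenation.

    fuse_after=0 concatenates the raw bands (the earliest possible fusion).
    Requires fuse_after < len(widths) so at least one shared layer remains.
    """
    assert 0 <= fuse_after < len(widths)
    per_sensor = sum(stack_params(b, widths[:fuse_after]) for b in band_counts)
    if fuse_after == 0:
        fused_in = sum(band_counts)
    else:
        fused_in = widths[fuse_after - 1] * len(band_counts)
    shared = stack_params(fused_in, widths[fuse_after:])
    classifier = conv_params(widths[-1], n_classes, k=1)  # 1 x 1 pixel classifier
    return per_sensor + shared + classifier


def decision_fusion_params(band_counts, widths, n_classes):
    """One complete network per sensor; class scores are averaged (no extra weights)."""
    return sum(
        stack_params(b, widths) + conv_params(widths[-1], n_classes, k=1)
        for b in band_counts
    )


# Hypothetical setup: two sensors with 2 and 4 bands, four layers, three classes.
bands, widths, classes = [2, 4], [16, 32, 64, 64], 3
early = fusion_params(bands, widths, classes, fuse_after=0)
mid = fusion_params(bands, widths, classes, fuse_after=3)
late = decision_fusion_params(bands, widths, classes)
```

    In this toy setting, input-level fusion needs roughly half the parameters of decision-level fusion, since only one stream is kept. How much is saved when fusing after a later layer depends on the chosen widths.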

    Scale-Adaptive Video Understanding.

    The recent rise of large-scale, diverse video data has ushered in a new era of high-level video understanding. It is increasingly critical for intelligent systems to extract semantics from videos. In this dissertation, we explore the use of supervoxel hierarchies as a type of video representation for high-level video understanding. The supervoxel hierarchies contain rich multiscale decompositions of video content, where various structures can be found at various levels. However, no single level of scale contains all the desired structures. It is therefore essential to adaptively choose the scales for subsequent video analysis. Thus, we present a set of tools to manipulate scales in supervoxel hierarchies, including both scale generation and scale selection methods. In our scale generation work, we evaluate a set of seven supervoxel methods in the context of what we consider to be a good supervoxel for video representation. We address a key limitation that has traditionally prevented supervoxel scale generation on long videos. We do so by proposing an approximation framework for streaming hierarchical scale generation that is able to generate multiscale decompositions for arbitrarily long videos using constant memory. Subsequently, we present two scale selection methods that are able to adaptively choose the scales according to application needs. The first method flattens the entire supervoxel hierarchy into a single segmentation that overcomes the limitation induced by trivial selection of a single scale. We show that the selection can be driven by various post hoc feature criteria. The second scale selection method combines the supervoxel hierarchy with a conditional random field for the task of labeling actors and actions in videos. We formulate the scale selection problem and the video labeling problem in a joint framework. Experiments on a novel large-scale video dataset demonstrate the effectiveness of the explicit consideration of scale selection in video understanding. Aside from the computational methods, we present a visual psychophysical study to quantify how well the actor and action semantics in high-level video understanding are retained in supervoxel hierarchies. The findings suggest that some semantics are well retained in the supervoxel hierarchies and can be used for further video analysis.
    PhD, Computer Science and Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/133202/1/cliangxu_1.pd

    Unsupervised brain anomaly detection in MR images

    Brain disorders are characterized by morphological deformations in shape and size of (sub)cortical structures in one or both hemispheres. These deformations cause deviations from the normal pattern of brain asymmetries, resulting in asymmetric lesions that directly affect the patient’s condition. Unsupervised methods aim to learn a model from unlabeled healthy images, so that an unseen image that breaks priors of this model, i.e., an outlier, is considered an anomaly. Consequently, they are generic in detecting any lesions, e.g., coming from multiple diseases, as long as these notably differ from healthy training images. This thesis addresses the development of solutions to leverage unsupervised machine learning for the detection and analysis of abnormal brain asymmetries related to anomalies in magnetic resonance (MR) images. First, we propose an automatic probabilistic-atlas-based approach for anomalous brain image segmentation. Second, we explore an automatic method for the detection of abnormal hippocampi from abnormal asymmetries based on deep generative networks and a one-class classifier. Third, we present a more generic framework to detect abnormal asymmetries in the entire brain hemispheres. Our approach extracts pairs of symmetric regions, called supervoxels, in both hemispheres of a test image under study. One-class classifiers then analyze the asymmetries present in each pair. Experimental results on 3D MR-T1 images from healthy subjects and patients with a variety of lesions show the effectiveness and robustness of the proposed unsupervised approaches for brain anomaly detection.
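
    The idea of learning normal asymmetry from healthy images and flagging outliers can be sketched as follows. The abstract does not say which one-class classifier is used, so a simple z-score rule stands in for it here. The asymmetry measure, the class name, and the toy intensities are all assumptions made for illustration.

```python
from statistics import mean, stdev


def asymmetry(left, right):
    """Mean absolute intensity difference between a region and its mirrored twin."""
    return sum(abs(a - b) for a, b in zip(left, right)) / len(left)


class ZScoreOneClass:
    """Stand-in one-class model: learns the healthy score distribution only."""

    def fit(self, healthy_scores):
        self.mu = mean(healthy_scores)
        self.sigma = stdev(healthy_scores)
        return self

    def is_anomalous(self, score, threshold=3.0):
        # Flag scores more than `threshold` standard deviations from the healthy mean.
        return abs(score - self.mu) > threshold * self.sigma


# Toy "healthy" region pairs: intensities in the left region and its mirror.
healthy_pairs = [
    ([10, 12, 11], [10, 11, 11]),
    ([20, 21, 19], [21, 21, 20]),
    ([15, 15, 16], [15, 16, 16]),
    ([30, 29, 31], [30, 30, 30]),
]
model = ZScoreOneClass().fit([asymmetry(l, r) for l, r in healthy_pairs])

lesion_flag = model.is_anomalous(asymmetry([10, 12, 11], [25, 30, 28]))
normal_flag = model.is_anomalous(asymmetry([12, 13, 12], [12, 12, 12]))
```

    Only healthy pairs are used for fitting, so the model never needs lesion labels. The strongly asymmetric test pair is flagged, while the near-symmetric one is not.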

    Efficient extraction of semantic information from medical images in large datasets using random forests

    Large datasets of unlabelled medical images are increasingly becoming available; however, only a small subset tends to be manually semantically labelled, as this is a tedious and extremely time-consuming task for large datasets. This thesis aims to tackle the problem of efficiently extracting semantic information in the form of image segmentations and organ localisations from large datasets of unlabelled medical images. To do so, we investigate the suitability of supervoxels and random classification forests for the task. The first contribution of this thesis is a novel method for efficiently estimating coarse correspondences between pairs of images that can handle difficult cases that exhibit large variations in fields of view. The proposed method adapts the random forest framework, which is a supervised learning algorithm, to work in an unsupervised manner by automatically generating labels for training via the use of supervoxels. The second contribution of this thesis is a method that extends our first contribution so that it can be applied efficiently to a large dataset of images. The proposed method is efficient and can be used to obtain correspondences between a large number of object-like supervoxels that are representative of organ structures in the images. The method is evaluated for the applications of organ-based image retrieval and weakly-supervised image segmentation using extremely minimal user input. While the method does not match the segmentation accuracy of current fully-supervised state-of-the-art methods for all organs in an abdominal CT dataset, it does provide a promising way of efficiently extracting and parsing a large dataset of medical images for the purpose of further processing.

    Supervoxel-Consistent Foreground Propagation in Video

    A major challenge in video segmentation is that the foreground object may move quickly in the scene at the same time as its appearance and shape evolve over time. While pairwise potentials used in graph-based algorithms help smooth labels between neighboring (super)pixels in space and time, they offer only a myopic view of consistency and can be misled by inter-frame optical flow errors. We propose a higher order supervoxel label consistency potential for semi-supervised foreground segmentation. Given an initial frame with manual annotation for the foreground object, our approach propagates the foreground region through time, leveraging bottom-up supervoxels to guide its estimates towards long-range coherent regions. We validate our approach on three challenging datasets and achieve state-of-the-art results.
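
    A higher order label consistency term of the kind described here can be illustrated with a small sketch. The abstract does not give the exact form of the potential, so the code below assumes a truncated penalty on the number of pixels that disagree with the majority label inside each supervoxel. The function name and the truncation parameter are hypothetical.

```python
from collections import Counter


def consistency_cost(labels, supervoxel_ids, truncation=0.5):
    """Sum over supervoxels of min(#minority pixels, truncation * size).

    labels:         per-pixel labels (e.g. 1 = foreground, 0 = background).
    supervoxel_ids: per-pixel id of the supervoxel containing that pixel.
    The cost is zero when every supervoxel carries a single label and is
    capped once a supervoxel is heavily mixed.
    """
    groups = {}
    for label, sv in zip(labels, supervoxel_ids):
        groups.setdefault(sv, []).append(label)
    cost = 0.0
    for members in groups.values():
        majority_count = Counter(members).most_common(1)[0][1]
        minority_count = len(members) - majority_count
        cost += min(minority_count, truncation * len(members))
    return cost


# Two supervoxels of four pixels each; one pixel in the first disagrees.
supervoxels = [0, 0, 0, 0, 1, 1, 1, 1]
mixed_cost = consistency_cost([1, 1, 1, 0, 0, 0, 0, 0], supervoxels)
clean_cost = consistency_cost([1, 1, 1, 1, 0, 0, 0, 0], supervoxels)
```

    Unlike a pairwise term, this cost looks at all pixels of a supervoxel at once, so a labeling that splits a supervoxel is penalized even when neighboring pixels mostly agree.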