    Incorporating Boltzmann Machine Priors for Semantic Labeling in Images and Videos

    Semantic labeling is the task of assigning category labels to regions in an image. For example, a scene may consist of regions corresponding to categories such as sky, water, and ground, or the parts of a face such as eyes, nose, and mouth. Semantic labeling is an important mid-level vision task for grouping and organizing image regions into coherent parts. Labeling these regions allows us to better understand the scene itself as well as properties of the objects in it, such as their parts, location, and interaction within the scene. Typical approaches to this task include the conditional random field (CRF), which is well suited to modeling local interactions among adjacent image regions. However, the CRF is limited in dealing with complex, global (long-range) interactions between regions in an image, and between frames in a video. This thesis presents approaches to modeling long-range interactions within images and videos, for use in semantic labeling. To model these long-range interactions, we incorporate priors based on the restricted Boltzmann machine (RBM). The RBM is a generative model which has demonstrated the ability to learn the shape of an object, and the conditional RBM (CRBM) is a temporal extension which can learn the motion of an object. Although the CRF is a good baseline labeler, we show how the RBM and CRBM can be added to the architecture to model both the global object shape within an image and the temporal dependencies of the object from previous frames in a video. We demonstrate the labeling performance of our models on the parts of complex face images from the Labeled Faces in the Wild database (for images) and the YouTube Faces Database (for videos). Our hybrid models produce results that are both quantitatively and qualitatively better than the baseline CRF alone for both images and videos.
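
    To make the hybrid energy concrete, here is a minimal NumPy sketch of a CRF energy (per-pixel unary costs plus a Potts pairwise term) augmented with an RBM shape prior through the standard RBM free energy. This is an illustrative reconstruction, not the thesis implementation: the lam weighting, parameter names, and the single-foreground-class assumption are all ours.

```python
import numpy as np

def rbm_free_energy(v, W, b_v, b_h):
    # F(v) = -b_v.v - sum_j log(1 + exp(W_j.v + b_h_j)); lower free energy
    # means the label map looks more like the shapes the RBM was trained on.
    return -(v @ b_v) - np.logaddexp(0.0, W @ v + b_h).sum()

def hybrid_energy(labels, unary, W, b_v, b_h, beta=1.0, lam=1.0):
    # CRF part: unary costs plus a Potts pairwise term that penalizes
    # label disagreement between 4-connected grid neighbors.
    h, w = labels.shape
    e_unary = unary[np.arange(h)[:, None], np.arange(w)[None, :], labels].sum()
    e_pair = beta * ((labels[:, 1:] != labels[:, :-1]).sum()
                     + (labels[1:, :] != labels[:-1, :]).sum())
    # Shape prior: RBM free energy of the binary foreground map.
    v = (labels.ravel() == 1).astype(float)
    return e_unary + e_pair + lam * rbm_free_energy(v, W, b_v, b_h)

# Toy usage with random parameters (illustrative only).
rng = np.random.default_rng(0)
H, W_img, K, n_hidden = 8, 8, 2, 16
labels = rng.integers(0, K, size=(H, W_img))
unary = rng.random((H, W_img, K))
W = rng.normal(scale=0.01, size=(n_hidden, H * W_img))
print(hybrid_energy(labels, unary, W, np.zeros(H * W_img), np.zeros(n_hidden)))
```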

    Multitemporal Very High Resolution from Space: Outcome of the 2016 IEEE GRSS Data Fusion Contest

    In this paper, the scientific outcomes of the 2016 Data Fusion Contest organized by the Image Analysis and Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society are discussed. The 2016 Contest was an open-topic competition based on a multitemporal and multimodal dataset, which included a temporal pair of very high resolution panchromatic and multispectral Deimos-2 images and a video captured by the Iris camera on board the International Space Station. The problems addressed and the techniques proposed by the participants in the Contest spanned a rather broad range of topics, mixing ideas and methodologies from remote sensing, video processing, and computer vision. In particular, the winning team developed a deep learning method to jointly address spatial scene labeling and temporal activity modeling using the available image and video data. The second-place team proposed a random field model to simultaneously perform coregistration of multitemporal data, semantic segmentation, and change detection. The methodological key ideas of both approaches and the main results of the corresponding experimental validation are discussed in this paper.

    Visual object category discovery in images and videos

    The current trend in visual recognition research is to place a strict division between the supervised and unsupervised learning paradigms, which is problematic for two main reasons. On the one hand, supervised methods require training data for each and every category that the system learns; training data may not always be available and is expensive to obtain. On the other hand, unsupervised methods must determine the optimal visual cues and distance metrics that distinguish one category from another in order to group images into semantically meaningful categories; however, for unlabeled data, these are unknown a priori. I propose a visual category discovery framework that transcends the two paradigms and learns accurate models with few labeled exemplars. The main insight is to automatically focus on the prevalent objects in images and videos, and to learn models from them for category grouping, segmentation, and summarization. To implement this idea, I first present a context-aware category discovery framework that discovers novel categories by leveraging context from previously learned categories. I devise a novel object-graph descriptor to model the interaction between a set of known categories and the unknown to-be-discovered categories, and group regions that have similar appearance and similar object-graphs. I then present a collective segmentation framework that simultaneously discovers the segmentations and groupings of objects by leveraging the shared patterns in the unlabeled image collection. It discovers an ensemble of representative instances for each unknown category, and builds top-down models from them to refine the segmentation of the remaining instances. Finally, building on these techniques, I show how to produce compact visual summaries for first-person egocentric videos that focus on the important people and objects. The system leverages novel egocentric and high-level saliency features to predict important regions in the video, and produces a concise visual summary driven by those regions. I compare against existing state-of-the-art methods for category discovery and segmentation on several challenging benchmark datasets. I demonstrate that visual concepts can be discovered more accurately by focusing on the prevalent objects in images and videos, and show clear advantages of departing from the status quo division between the supervised and unsupervised learning paradigms. The main impact of my thesis is that it lays the groundwork for building large-scale visual discovery systems that can automatically discover visual concepts with minimal human supervision.
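
    As a rough illustration of the object-graph idea, the sketch below builds a fixed-length descriptor for an unknown region from the known-category posteriors of its nearest regions, split into those above and below it. This is a simplification under our own assumptions (the names, nearest-neighbor selection, and zero-padding are illustrative), not the thesis's exact descriptor.

```python
import numpy as np

def object_graph_descriptor(region_id, centers, posteriors, n_neighbors=3):
    # Describe an unknown region by the known-category posteriors of its
    # nearest regions, ordered above vs. below it, so that regions with
    # similar surrounding context get similar descriptors.
    cy = centers[region_id, 0]
    d = np.linalg.norm(centers - centers[region_id], axis=1)
    d[region_id] = np.inf                    # exclude the region itself
    order = np.argsort(d)
    above = [i for i in order if centers[i, 0] < cy][:n_neighbors]
    below = [i for i in order
             if centers[i, 0] >= cy and i != region_id][:n_neighbors]
    K = posteriors.shape[1]
    parts = ([posteriors[i] for i in above]
             + [np.zeros(K)] * (n_neighbors - len(above)))
    parts += ([posteriors[i] for i in below]
              + [np.zeros(K)] * (n_neighbors - len(below)))
    return np.concatenate(parts)             # fixed-length context descriptor

# Toy usage: 10 regions, 4 known categories. Unknown regions with similar
# descriptors sit in similar context and are candidates for grouping.
rng = np.random.default_rng(0)
centers = rng.random((10, 2)) * 100
posteriors = rng.dirichlet(np.ones(4), size=10)
print(object_graph_descriptor(0, centers, posteriors).shape)  # (24,)
```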

    An improved dynamic graph tracking algorithm

    We propose several improvements to an existing baseline short-term visual tracking algorithm. The baseline tracker applies a dynamic graph representation to track the target: the target's local parts are used as nodes in the graph, while the connections between neighboring parts represent the graph edges. This flexible model of the target structure proves useful in the presence of extensive visual changes of the target throughout the sequence. A recent benchmark has shown that the tracker compares favorably in performance with other state-of-the-art trackers, with a notable weakness on input sequences with high variance in scene and object lighting. We have performed an in-depth analysis of the tracker and propose a list of improvements. For an unstable component in the tracker's implementation of foreground/background image segmentation, we propose an improvement which boosts accuracy in cases of rapid illumination change of the target. We also propose a dynamic adjustment of the aforementioned segmentation with respect to the size of the resulting foreground, which improves tracking reliability and reduces the number of tracking failures. The implemented improvements are analyzed on the VOT2015 benchmark. Fixing the unstable component yields improvements in cases of rapid illumination change and reduces the failure rate, while the dynamic segmentation adjustment improves tracking accuracy and robustness in the vast majority of cases, barring rapid illumination change.
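
    One way to picture the size-driven adjustment is the sketch below: threshold a foreground-probability map, then nudge the threshold until the foreground area falls in a plausible band around the expected target size. The update rule, tolerance, and all names here are assumptions for illustration, not the tracker's actual implementation.

```python
import numpy as np

def adaptive_foreground_segmentation(prob_map, expected_area,
                                     tol=0.5, max_iter=10):
    # Threshold the foreground-probability map, then adjust the threshold
    # until the foreground area is within a tolerance band of the expected
    # target size (all areas in pixels).
    thresh = 0.5
    for _ in range(max_iter):
        area = (prob_map > thresh).sum()
        if (1 - tol) * expected_area <= area <= (1 + tol) * expected_area:
            break                      # foreground size is plausible
        # Foreground too large -> raise threshold; too small -> lower it.
        thresh += 0.05 if area > expected_area else -0.05
        thresh = float(np.clip(thresh, 0.05, 0.95))
    return prob_map > thresh

# Toy usage: a Gaussian blob as the probability map, expected area ~200 px.
yy, xx = np.mgrid[0:64, 0:64]
prob = np.exp(-((yy - 32) ** 2 + (xx - 32) ** 2) / (2 * 8.0 ** 2))
mask = adaptive_foreground_segmentation(prob, expected_area=200)
print(mask.sum(), "foreground pixels")
```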

    Superpixel lattices

    Superpixels are small image segments that are used in popular approaches to object detection and recognition problems. The superpixel approach is motivated by the observation that pixels within small image segments can usually be attributed the same label. This allows a superpixel representation to produce discriminative features based on data-dependent regions of support. The reduced set of image primitives produced by superpixels can also be exploited to improve the efficiency of subsequent processing steps. However, it is common for the superpixel representation to have a different graph structure from the original pixel representation of the image. The first part of the thesis argues that a number of desirable properties of the pixel representation should be maintained by superpixels and that this is not possible with existing methods. We propose a new representation, the superpixel lattice, and demonstrate its advantages. The second part of the thesis investigates incorporating a priori information into superpixel segmentations. We learn a probabilistic model that describes the spatial density of object boundaries in the image. We demonstrate our approach on road scene data and show that our algorithm successfully exploits the spatial distribution of object boundaries to improve the superpixel segmentation. The third part of the thesis presents a globally optimal solution to our superpixel lattice problem in either the horizontal or vertical direction. The solution makes use of a Markov random field formulation in which the label field is guaranteed to be a set of ordered layers. We introduce an iterative algorithm that uses this framework to learn colour distributions across an image in an unsupervised manner. We conclude that our approach achieves comparable or better performance than competing methods and that it confers several additional advantages.
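
    To illustrate what a globally optimal cut in one direction looks like, the sketch below finds a minimum-cost top-to-bottom path through a boundary-cost map by dynamic programming, in the style of seam carving. Such paths can serve as the vertical cuts of a lattice; this is our own reduction for illustration, not the thesis's MRF formulation with ordered layers.

```python
import numpy as np

def optimal_vertical_path(cost):
    # Dynamic programming: acc[r, c] is the minimum cost of any path from
    # the top row to cell (r, c), moving down or diagonally by one column.
    h, w = cost.shape
    acc = cost.copy()
    for r in range(1, h):
        left = np.r_[np.inf, acc[r - 1, :-1]]    # neighbor one column left
        right = np.r_[acc[r - 1, 1:], np.inf]    # neighbor one column right
        acc[r] += np.minimum(np.minimum(left, acc[r - 1]), right)
    # Backtrack from the cheapest endpoint in the last row.
    path = np.empty(h, dtype=int)
    path[-1] = int(np.argmin(acc[-1]))
    for r in range(h - 2, -1, -1):
        c = path[r + 1]
        lo, hi = max(c - 1, 0), min(c + 2, w)
        path[r] = lo + int(np.argmin(acc[r, lo:hi]))
    return path  # path[r] = column of the cut in row r

# Toy usage: with strong image boundaries encoded as low cost, the
# globally optimal cut follows object edges through the image.
rng = np.random.default_rng(1)
print(optimal_vertical_path(rng.random((6, 5))))
```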