From image co-segmentation to discrete optimization in computer vision - the exploration on graphical model, statistical physics, energy minimization, and integer programming
This dissertation explores ideas and frameworks for solving discrete optimization problems in computer vision. Much of the work is inspired by the study of the image co-segmentation problem. Through research on this topic, the author became deeply familiar with the graphical-model and energy-minimization view of computer vision problems, that is, how to combine local information with neighborhood interaction information in a graphical system for inference, and came to the realization that many problems in and beyond computer vision can be solved in this way.
At the beginning of this dissertation, we first give a comprehensive background review of graphical models, energy minimization, and integer programming, as well as their connections with fundamental statistical physics. We aim to review the various concepts, models, and algorithms systematically and from a different perspective. For instance, we review the correspondence between the unary/binary energy terms commonly used in computer vision and those of the fundamental Ising model in statistical physics; we summarize several widely used discrete energy minimization algorithms in computer vision under a unified statistical-physics framework; and we stress the close connections between graphical-model energy minimization and integer programming, pointing out in particular the central role of Mixed-Integer Quadratic Programming in discrete optimization in and beyond computer vision.
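The unary/binary correspondence described above can be sketched in a few lines. The function names and the Potts-style pairwise term below are illustrative assumptions, not the dissertation's exact formulation; the point is only that a vision-style labeling energy and the Ising energy share the same structure, with binary labels x mapping to spins via s = 2x - 1.

```python
import numpy as np

def mrf_energy(labels, unary, pairwise_weight, edges):
    """Pairwise MRF energy as commonly used in vision: a unary (data) cost
    per node plus a Potts-style penalty on neighboring nodes that disagree.
    (Toy illustration; the Potts choice is an assumption.)"""
    e = sum(unary[i, labels[i]] for i in range(len(labels)))
    e += pairwise_weight * sum(1 for i, j in edges if labels[i] != labels[j])
    return e

def ising_energy(spins, h, coupling, edges):
    """Classical Ising energy E = -sum_i h_i s_i - J sum_{(i,j)} s_i s_j,
    with spins s_i in {-1, +1}."""
    e = -float(np.dot(h, spins))
    e -= coupling * sum(spins[i] * spins[j] for i, j in edges)
    return e
```

Minimizing either form trades off per-node evidence against neighborhood agreement, which is exactly the combination of local and interaction information discussed above.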
Moreover, we explore the relationship between integer programming and energy minimization experimentally. We test integer programming methods on randomly generated energy formulations (as they would appear in computer vision problems) and, conversely, energy minimization methods on the integer programming problem of graph K-coloring. This lets us compare the optimization performance of various methods, whether designed for energy minimization or for integer programming, on a single platform. We conclude that sharing methods across the two fields (energy minimization in computer vision and integer programming in applied mathematics) is highly beneficial.
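The K-coloring experiment can be made concrete: a proper K-coloring exists exactly when a Potts-style energy that counts monochromatic edges reaches zero, which is what lets energy minimization methods attack this integer program. A brute-force toy version (function names are ours; real instances would use the solvers discussed in the text):

```python
from itertools import product

def coloring_energy(coloring, edges):
    """Potts-style energy of a candidate coloring: the number of edges
    whose two endpoints received the same color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])

def min_coloring_energy(n_nodes, edges, k):
    """Exhaustive minimum over all k**n_nodes colorings (toy sizes only);
    a proper K-coloring exists iff this minimum is zero."""
    return min(coloring_energy(c, edges)
               for c in product(range(k), repeat=n_nodes))
```

For a triangle, the minimum energy is 1 with two colors and 0 with three, matching the well-known fact that a triangle is 3-chromatic.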
Building on the statistical-physics-inspired energy minimization framework above, we cast the task of density-based clustering in this form. Energy is defined in terms of inhomogeneity in local point density. A sequence of energy minima is found to recursively partition the points, yielding a hierarchical embedding of clusters that are increasingly homogeneous in density. Energy is expressed as the sum of a unary (data) term and a binary (smoothness) term. The only parameter the user must specify is a homogeneity criterion: the degree of acceptable fluctuation in density within a cluster. Thus we do not have to specify, for example, the number of clusters present, and disjoint clusters with the same density are identified separately. Experimental results show that our method can handle clusters of different shapes, sizes, and densities. We report the performance of our approach with the energy optimization algorithms ICM, loopy belief propagation, graph cuts, and mean-field. We also show that the family of commonly used spectral graph clustering algorithms (such as Normalized Cut) is a special case of our formulation that uses only the binary energy term and ignores the unary term.
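A minimal sketch of such a unary-plus-binary clustering energy, assuming a simple absolute-deviation unary term and a label-disagreement binary term (both concrete choices are ours, for illustration only):

```python
def clustering_energy(labels, density, cluster_density, edges, lam):
    """Toy density-homogeneity energy. Unary term: deviation of each
    point's local density from the density of its assigned cluster.
    Binary term: a penalty lam per neighboring pair with different
    labels. Both terms are illustrative stand-ins."""
    unary = sum(abs(density[i] - cluster_density[labels[i]])
                for i in range(len(labels)))
    binary = lam * sum(1 for i, j in edges if labels[i] != labels[j])
    return unary + binary
```

Dropping the unary term leaves only the pairwise smoothness cost, which is the sense in which purely spectral graph-cut style clustering appears as a special case of this formulation.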
After the discussions above on the general framework for discrete optimization in computer vision, the dissertation then focuses on the study of image co-segmentation, which was in fact carried out before the work described above. Image co-segmentation is the task of automatically discovering, locating, and segmenting an unknown common object in a set of images; it has become a popular research topic in computer vision in recent years. Its unsupervised nature is an important characteristic of the problem: the common object is a priori unknown. Moreover, the common object may be subject to viewpoint changes, lighting changes, occlusion, and deformation across the images, all of which make co-segmentation very challenging. In this part of the study we focus on image co-segmentation and propose several approaches to the problem.
Most existing co-segmentation methods focus on images with a very dominant common object and limited background interference. Such images are not realistic for the co-segmentation task: in practice we often encounter images with rich, complex content in which the common object is not dominant and appears alongside a large number of other objects. In this work we address image co-segmentation on precisely this kind of image, which many previous methods cannot handle properly.
Two distinct approaches are proposed in this work for image co-segmentation; the key difference lies in how the common object is discovered. The first is a "topology"-based approach (also called a "point-region" approach), while the second is a "sparse optimization"-based approach. Specifically, in the first approach we combine image key-point features with segment features to discover the common object, relying on the local topological consistency of both the key-point and segment layouts for robust recognition. The initial foreground (the common object) obtained in each image is then refined through graphical-model energy minimization based on a global appearance model extracted from the entire image dataset. The second approach is inspired by sparse optimization techniques: we use a sparse approximation scheme to find the optimal correspondence between the segments of two images as the initial estimate of the common object, based on linear additive features extracted from the segments. In both approaches we emphasize the exploitation of inter-image information in all steps of the algorithms; therefore, the common object need not be dominant or salient in any individual image, as long as it is "common" across the image set.
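The sparse approximation step can be illustrated with Orthogonal Matching Pursuit, one standard scheme for this kind of problem (the dissertation's exact formulation may differ): a segment's feature vector from one image is approximated by a sparse combination of segment features from another image, and the selected atoms indicate candidate correspondences.

```python
import numpy as np

def omp(D, y, n_nonzero):
    """Orthogonal Matching Pursuit sketch: greedily pick the dictionary
    column (atom) most correlated with the residual, then refit on the
    selected support by least squares. Columns of D should be normalized."""
    residual = y.astype(float).copy()
    support = []
    coef = np.zeros(D.shape[1])
    for _ in range(n_nonzero):
        idx = int(np.argmax(np.abs(D.T @ residual)))
        if idx not in support:
            support.append(idx)
        sol, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ sol
    coef[support] = sol
    return coef
```

Here the columns of `D` would hold one image's segment features and `y` a segment feature from the other image; the nonzero coefficients then point to the best-matching segments.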
Extensive experiments have been conducted in this study to validate the performance of the proposed approaches. We carry out experiments on widely used benchmark datasets for image co-segmentation, including the iCoseg dataset, the multi-view co-segmentation dataset, and the Oxford flower dataset. In addition, to better evaluate performance on rich, complex images with a non-dominant common object, we propose a new dataset in this work called richCoseg. Experiments are also conducted on this new dataset, and qualitative and quantitative comparisons with recent methods are provided.
Finally, this dissertation also briefly discusses some other vision problems the author has studied in previously published works.
Hyperspectral image representation and processing with binary partition trees
The optimal exploitation of the information provided by hyperspectral images requires the development of advanced image processing tools. Accordingly, under the title Hyperspectral Image Representation and Processing with Binary Partition Trees, this Ph.D. thesis proposes the construction and processing of a new region-based hierarchical hyperspectral image representation: the Binary Partition Tree (BPT). This representation can be interpreted as a set of hierarchical regions stored in a tree structure. Hence, the BPT captures (i) the decomposition of the image into coherent regions and (ii) the inclusion relations among the regions in the scene. Based on region-merging techniques, the construction of the BPT is investigated in this work by studying hyperspectral region models and the associated similarity metrics. Indeed, the very high dimensionality and complexity of the data require the definition of specific region models and similarity measures. Once the BPT is constructed, its fixed tree structure allows efficient, advanced, application-dependent techniques to be implemented on it. Application-dependent processing of the BPT is generally carried out through a specific pruning of the tree, and several pruning techniques are proposed and discussed for different applications. This Ph.D. thesis focuses in particular on segmentation, object detection, and classification of hyperspectral imagery. Experimental results on various hyperspectral data sets demonstrate the interest and good performance of the BPT representation.
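The region-merging construction can be sketched as follows; the mean-vector region model and the distance-based merge order are simplifying assumptions standing in for the hyperspectral region models and similarity metrics studied in the thesis.

```python
import numpy as np

def build_bpt(features, adjacency, dist):
    """Toy Binary Partition Tree construction by iterative region merging.
    features: {region_id: model vector}; adjacency: set of frozenset pairs
    of adjacent region ids; dist(a, b): dissimilarity (lower merges first).
    Returns the merge sequence [(child_a, child_b, parent_id), ...]."""
    features = dict(features)
    adjacency = set(adjacency)
    merges = []
    next_id = max(features) + 1
    while adjacency:
        pair = min(adjacency, key=lambda p: dist(*(features[r] for r in p)))
        a, b = tuple(pair)
        # Parent region model: the mean of the children (a simple stand-in
        # for the hyperspectral region models discussed in the text).
        features[next_id] = (features[a] + features[b]) / 2.0
        # Rewire adjacency: former neighbors of a or b now border the parent.
        adjacency = {frozenset(next_id if r in pair else r for r in p)
                     for p in adjacency if p != pair}
        adjacency = {p for p in adjacency if len(p) == 2}
        merges.append((a, b, next_id))
        del features[a], features[b]
        next_id += 1
    return merges
```

The recorded merge sequence is exactly the tree: each triple is an internal node with two children, and pruning the tree amounts to cutting this hierarchy at application-dependent nodes.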
From Fully-Supervised, Single-Task to Scarcely-Supervised, Multi-Task Deep Learning for Medical Image Analysis
Image analysis based on machine learning has gained prominence with the advent of deep learning, particularly in medical imaging. To address challenging image analysis tasks effectively, however, conventional deep neural networks require large corpora of annotated training data, which are unfortunately scarce in the medical domain, often rendering fully-supervised learning strategies ineffective. This thesis devises a series of novel deep learning methods for a variety of medical image analysis applications, ranging from fully-supervised, single-task learning to scarcely-supervised, multi-task learning that makes efficient use of annotated training data. Specifically, its main contributions include (1) fully-supervised, single-task learning for the segmentation of pulmonary lobes from chest CT scans and the analysis of scoliosis from spine X-ray images; (2) supervised, single-task, domain-generalized pulmonary segmentation in chest X-ray images and retinal vasculature segmentation in fundoscopic images; (3) largely-unsupervised, multiple-task learning via deep generative modeling for the joint synthesis and classification of medical image data; and (4) partly-supervised, multiple-task learning for the combined segmentation and classification of chest and spine X-ray images.
Going beyond semantic image segmentation, towards holistic scene understanding, with associative hierarchical random fields
In this thesis we exploit the generality and expressive power of the Associative Hierarchical Random Field (AHRF) graphical model to take its use beyond semantic image segmentation into object classes, towards a framework for holistic scene understanding. We provide a working definition of the holistic approach to scene understanding, which allows existing, disparate applications to be integrated into a unifying ensemble. We believe that modelling such an ensemble as an AHRF is both a principled and a pragmatic solution. We present a hierarchy that shows several methods for fusing applications together with the AHRF graphical model. Each of the three layers (feature, potential, and energy) subsumes its predecessor in generality, and together they give rise to many options for integration. With applications on street scenes we demonstrate an implementation of each layer. The first-layer application joins appearance and geometric features. For our second layer we implement a "things and stuff" co-junction using higher-order AHRF potentials for object detectors, with the goal of answering the classic questions: What? Where? And how many? A holistic approach to recognition-and-reconstruction is realised within our third layer by linking the energy-based formulations of both applications. Each application is evaluated qualitatively and quantitatively. In all cases our holistic approach shows improvement over baseline methods.
Hashing for Multimedia Similarity Modeling and Large-Scale Retrieval
In recent years, the amount of multimedia data such as images, texts, and videos on the Internet has been growing rapidly. Motivated by this trend, this thesis is dedicated to hashing-based solutions that reveal multimedia data correlations and support intra-media and inter-media similarity search over huge volumes of multimedia data. We start by investigating a hashing-based solution for audio-visual similarity modeling and apply it to the audio-visual sound source localization problem. We show that synchronized signals in the audio and visual modalities exhibit similar temporal change patterns in certain feature spaces. We propose a permutation-based random hashing technique that captures the temporal order dynamics of audio and visual features by hashing them along the temporal axis into a common Hamming space. In this way, the audio-visual correlation problem is transformed into a similarity search problem in the Hamming space. Our hashing-based audio-visual similarity modeling shows superior performance in the localization and segmentation of sounding objects in videos. The success of the permutation-based hashing method motivates us to generalize and formally define the supervised ranking-based hashing problem and to study its application to large-scale image retrieval. Specifically, we propose an effective supervised learning procedure to learn optimized ranking-based hash functions for large-scale similarity search. Compared with the randomized version, the optimized ranking-based hash codes are much more compact and discriminative. Moreover, the method can easily be extended to kernel space to discover more complex ranking structures that cannot be revealed in linear subspaces. Experiments on large image datasets demonstrate the effectiveness of the proposed method for image retrieval. We further study the ranking-based hashing method for the cross-media similarity search problem.
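The permutation-based idea can be sketched with a Winner-Take-All style hash, one common ranking-based scheme (the thesis's exact variant may differ): each random permutation selects k feature positions, and the code records which of them holds the maximum, so vectors with similar rank orderings produce matching code entries. Because the code depends only on rank order, it is invariant under any monotone increasing transform of the features.

```python
import numpy as np

def wta_hash(x, perms, k):
    """Winner-Take-All style permutation hash: for each permutation, look
    at the first k permuted entries of x and record the arg-max position.
    The resulting code depends only on rank order, not on magnitudes."""
    return np.array([int(np.argmax(x[p[:k]])) for p in perms])

# Illustrative setup (dimensions and code length are arbitrary choices).
rng = np.random.default_rng(0)
perms = [rng.permutation(8) for _ in range(16)]
x = rng.normal(size=8)
```

The Hamming distance between two such codes then approximates rank-correlation dissimilarity, which is what makes nearest-neighbor search in the common Hamming space meaningful.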
Specifically, we propose two optimization methods to jointly learn two groups of linear subspaces, one for each media type, so that the ranking orders of features in the different linear subspaces maximally preserve the cross-media similarities. Additionally, we develop this ranking-based hashing method in the cross-media context into a flexible hashing framework with a more general solution. We demonstrate through extensive experiments on several real-world datasets that the proposed cross-media hashing method achieves superior cross-media retrieval performance against several state-of-the-art algorithms. Lastly, to make better use of the supervisory label information, as well as to further improve the efficiency and accuracy of supervised hashing, we propose a novel multimedia discrete hashing framework that optimizes an instance-wise loss objective, rather than pairwise losses, using an efficient discrete optimization method. In addition, the proposed method decouples binary code learning and hash function learning into two separate stages, making it equally applicable to both single-media and cross-media search. Extensive experiments on both single-media and cross-media retrieval tasks demonstrate the effectiveness of the proposed method.