347 research outputs found

    Closed-Form Approximate CRF Training for Scalable Image Segmentation

    We present LS-CRF, a new method for training cyclic Conditional Random Fields (CRFs) from large datasets that is inspired by classical closed-form expressions for the maximum likelihood parameters of a generative graphical model with tree topology. Training a CRF with LS-CRF requires only solving a set of independent regression problems, each of which can be solved efficiently in closed form or by an iterative solver. This makes LS-CRF orders of magnitude faster than classical CRF training based on probabilistic inference, and at the same time more flexible and easier to implement than other approximate techniques, such as pseudolikelihood or piecewise training. We apply LS-CRF to the task of semantic image segmentation, showing that it achieves accuracy on par with other training techniques at higher speed, thereby allowing efficient CRF training from very large training sets. For example, training a linearly parameterized pairwise CRF on 150,000 images requires less than one hour on a modern workstation.
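    The key point of the abstract is that each subproblem has a closed-form solution, so no probabilistic inference is needed during training. A minimal sketch of one such independent regression subproblem, solved in closed form as ridge regression on synthetic data (the feature sizes, regularizer, and names here are illustrative assumptions, not taken from the paper):

    ```python
    import numpy as np

    def ridge_regression(X, Y, reg=1e-3):
        """Closed-form solution W = (X^T X + reg*I)^{-1} X^T Y."""
        d = X.shape[1]
        return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 8))            # features of 500 training factors
    W_true = rng.normal(size=(8, 3))         # ground-truth parameters, 3 labels
    Y = X @ W_true + 0.01 * rng.normal(size=(500, 3))

    W = ridge_regression(X, Y)               # one independent subproblem, no inference
    print(np.abs(W - W_true).max() < 0.05)   # parameters recovered up to noise
    ```

    Because every such subproblem is independent, a large training set can be split across factors (or machines) and each piece solved with one small linear solve, which is where the claimed speedup comes from.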

    Toward more scalable structured models

    While deep learning has achieved huge success across different disciplines, from computer vision and natural language processing to computational biology and the physical sciences, training such models is known to require significant amounts of data. One possible reason is that the structural properties of the data and problem are not modeled explicitly. Effectively exploiting the structure can help build more efficient and better-performing models. The complexity of the structure requires models with sufficient representational capability. However, increased structured-model complexity usually leads to increased inference complexity and trickier learning procedures. Also, making progress on real-world applications requires learning paradigms that circumvent the limitation of evaluating the partition function and scale to high-dimensional datasets. In this dissertation, we develop more scalable structured models, i.e., models with inference procedures that can handle complex dependencies between variables efficiently, and learning algorithms that operate in high-dimensional spaces. First, we extend Gaussian conditional random fields, traditionally unimodal and only capturing pairwise variable interactions, to model multi-modal distributions with high-order dependencies between the output-space variables, while enabling exact inference and incorporating external constraints at runtime. We show compelling results on the task of diverse gray-image colorization. Then, we introduce a reinforcement learning-based method for solving inference in models with general higher-order potentials, which are intractable with traditional techniques. We show promising results on semantic segmentation. Finally, we propose a new loss, max-sliced score matching (MSSM), for learning structured models at scale. We assess our method by estimating densities and scores for implicit distributions in variational and Wasserstein auto-encoders.
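    The exact-inference claim for Gaussian conditional random fields rests on a classical fact: the MAP of a Gaussian energy is the solution of a linear system. A toy sketch on a 1-D chain (the unary/pairwise precision parameters and the smoothing setup are assumptions for illustration, not the dissertation's model):

    ```python
    import numpy as np

    # Hypothetical 1-D chain Gaussian CRF: unary precision alpha pulls each
    # output y_i toward an observation x_i; pairwise precision beta couples
    # neighbors via terms beta*(y_i - y_{i+1})^2.
    def gcrf_map(x, alpha=1.0, beta=4.0):
        n = len(x)
        Q = alpha * np.eye(n)                # unary precision
        for i in range(n - 1):               # add pairwise precision entries
            Q[i, i] += beta; Q[i + 1, i + 1] += beta
            Q[i, i + 1] -= beta; Q[i + 1, i] -= beta
        b = alpha * x
        return np.linalg.solve(Q, b)         # exact MAP: argmin 1/2 y^T Q y - b^T y

    x = np.array([0., 0., 10., 0., 0.])      # noisy observation with a spike
    y = gcrf_map(x)
    print(y.round(2))                        # spike is smoothed over its neighbors
    ```

    Note that the row sums of the pairwise terms cancel, so the total mass of the signal is preserved while the spike is spread out; this is the kind of property that makes exact inference in Gaussian models cheap and predictable.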

    Multi-Modal Learning For Adaptive Scene Understanding

    Modern robotics systems typically possess sensors of different modalities. Segmenting scenes observed by the robot into a discrete set of classes is a central requirement for autonomy. Equally, when a robot navigates through an unknown environment, it is often necessary to adjust the parameters of the scene segmentation model to maintain the same level of accuracy in changing situations. This thesis explores efficient means of adaptive semantic scene segmentation in an online setting with the use of multiple sensor modalities. First, we devise a novel conditional random field (CRF) inference method for scene segmentation that incorporates global constraints, enforcing particular sets of nodes to be assigned the same class label. To do this efficiently, the CRF is formulated as a relaxed quadratic program whose maximum a posteriori (MAP) solution is found using a gradient-based optimization approach. These global constraints are useful, since they can encode "a priori" information about the final labeling. This new formulation also reduces the dimensionality of the original image-labeling problem. The proposed model is employed in an urban street scene understanding task. Camera data is used for the CRF-based semantic segmentation, while global constraints are derived from 3D laser point clouds. Second, an approach to learn CRF parameters without the need for manually labeled training data is proposed. The model parameters are estimated by optimizing a novel loss function using self-supervised reference labels, obtained from camera and laser information with a minimal amount of human supervision. Third, an approach is proposed that conducts the parameter optimization while increasing the model's robustness to non-stationary data distributions over long trajectories. We adopt stochastic gradient descent to achieve this goal, using a learning rate that can appropriately grow or diminish to gain adaptability to changes in the data distribution.
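    The relaxed-QP idea can be illustrated by relaxing each node's discrete label to a distribution on the probability simplex and running projected gradient ascent. This toy version uses a quadratic pairwise smoothness term and a standard Euclidean simplex projection; it is a sketch under those assumptions, not the thesis's exact objective or constraint set:

    ```python
    import numpy as np

    def project_simplex(v):
        """Euclidean projection of each row of v onto the probability simplex."""
        u = np.sort(v, axis=1)[:, ::-1]
        css = np.cumsum(u, axis=1) - 1.0
        k = np.arange(1, v.shape[1] + 1)
        rho = (u - css / k > 0).sum(axis=1)
        theta = css[np.arange(len(v)), rho - 1] / rho
        return np.maximum(v - theta[:, None], 0.0)

    def relaxed_map(unary, edges, lam=1.0, steps=200, lr=0.1):
        """Relaxed CRF MAP: ascend the unary score minus a quadratic
        disagreement penalty, keeping each node's label vector on the simplex."""
        mu = np.full_like(unary, 1.0 / unary.shape[1])
        for _ in range(steps):
            grad = unary.copy()
            for i, j in edges:               # gradient of -lam/2 * ||mu_i - mu_j||^2
                grad[i] -= lam * (mu[i] - mu[j])
                grad[j] -= lam * (mu[j] - mu[i])
            mu = project_simplex(mu + lr * grad)
        return mu

    unary = np.array([[2., 0.], [1.9, 2.], [0., 2.]])  # toy 3-node chain, 2 labels
    mu = relaxed_map(unary, edges=[(0, 1), (1, 2)])
    print(mu.argmax(axis=1))                 # middle node settles with its unary
    ```

    A global "same label" constraint of the kind the thesis describes could be imposed in this relaxation by tying a set of nodes to one shared distribution, which also shrinks the number of optimization variables.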

    Semantic image understanding: from pixel to word

    The aim of semantic image understanding is to reveal the semantic meaning behind image pixels. This thesis investigates problems related to semantic image understanding and makes the following contributions. Our first contribution is to propose the use of histogram matching in Multiple Kernel Learning. We treat the two-dimensional kernel matrix as an image and transfer the histogram matching algorithm from image processing to the kernel matrix. Experiments on various computer vision and machine learning datasets have shown that our method consistently boosts the performance of state-of-the-art MKL methods. Our second contribution is to advocate the segment-then-recognize strategy in pixel-level semantic image understanding. We have developed a new framework that integrates semantic segmentation with low-level segmentation for proposing object-consistent regions. We have also developed a novel method integrating semantic segmentation with interactive segmentation. We found this segment-then-recognize strategy also works well on medical image data, where we designed a novel polar-space random field model for proposing gland-like regions. In the realm of image-level semantic image understanding, our contribution is a novel way to utilize the random forest. Most previous works utilizing random forests store the posterior probabilities at each leaf node, and each random tree in the forest is considered independent of the others. In contrast, we store the training samples instead of the posterior probabilities at each leaf node. We consider the random forest as a whole and propose the concepts of semantic nearest neighbor and semantic similarity measure. Based on these two concepts, we devise novel methods for image annotation and image retrieval tasks.
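    The leaf-sample idea above can be sketched as follows: record which leaf each sample reaches in every tree, and score similarity by how often two samples share a leaf across the forest. The forest here is a deliberately simplified, hypothetical one (depth-one trees with random splits), not the thesis's construction:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    def build_forest(X, n_trees=50):
        """Toy forest of depth-one random trees: each tree splits one random
        feature at a random threshold, so every tree has exactly two leaves."""
        trees = []
        for _ in range(n_trees):
            f = rng.integers(X.shape[1])
            t = rng.uniform(X[:, f].min(), X[:, f].max())
            trees.append((f, t))
        return trees

    def leaf_ids(trees, X):
        """Leaf index of every sample in every tree: (n_samples, n_trees)."""
        return np.stack([(X[:, f] > t).astype(int) for f, t in trees], axis=1)

    # Two well-separated clusters: samples 0..19 near 0, samples 20..39 near 3.
    X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
    trees = build_forest(X)

    q = np.array([[2.9, 3.1]])               # query point near the second cluster
    # Semantic similarity: fraction of trees where query and sample share a leaf.
    sims = (leaf_ids(trees, X) == leaf_ids(trees, q)).mean(axis=1)
    print(sims.argmax() >= 20)               # semantic NN comes from cluster 2
    ```

    Treating the forest as a whole in this way lets the similarity score aggregate evidence from all trees at once, rather than averaging per-tree posteriors.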