Exploiting and improving tree-structured graphical models

Author: Choi Myung Jin
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student-submitted PDF version of thesis. Includes bibliographical references (p. 169-179).

Probabilistic models commonly assume that variables are independent of each other conditioned on a subset of other variables. Graphical models provide a powerful framework for encoding such conditional independence structure of a large collection of random variables. A special class of graphical models with significant theoretical and practical importance is the class of tree-structured graphical models. Tree models have several advantages: they can be easily learned given data, their structures are often intuitive, and inference in tree models is highly efficient. However, tree models make strong conditional independence assumptions, which limit their modeling power substantially. This thesis exploits the advantages of tree-structured graphical models and considers modifications to overcome their limitations.

To improve the modeling accuracy of tree models, we consider latent trees in which variables at some nodes represent the original (observed) variables of interest while others represent the latent variables added during the learning procedure. The appeal of such models is clear: the additional latent variables significantly increase the modeling power, and inference on trees is scalable with or without latent variables. We propose two computationally efficient and statistically consistent algorithms for learning latent trees, and compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree models.

We exploit the advantages of tree models in the application of modeling contextual information of an image. Object co-occurrences and spatial relationships can be important cues in recognizing and localizing object instances. We develop tree-based context models and demonstrate that their simplicity enables us to integrate many sources of contextual information efficiently. In addition to object recognition, we are interested in using context models to detect objects that are out of their normal context. This task requires precise and careful modeling of object relationships, so we use a latent tree for object co-occurrences. Many of the latent variables can be interpreted as scene categories, capturing higher-order dependencies among object categories.

Tree-structured graphical models have been widely used in multi-resolution (MR) modeling. In the last part of the thesis, we move beyond trees, and propose a new modeling framework that allows additional dependency structure at each scale of an MR tree model. We mainly focus on MR models with jointly Gaussian variables, and assume that variables at each scale have sparse covariance structure (as opposed to fully-uncorrelated structure in MR trees) conditioned on variables at other scales. We develop efficient inference algorithms that are partly based on inference on the embedded MR tree and partly based on local filtering at each scale. In addition, we present methods for learning such models given data at the finest scale by formulating a convex optimization problem.

by Myung Jin Choi. Ph.D.
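
The abstract's remark that tree models "can be easily learned given data" can be illustrated with the classical Chow-Liu procedure: estimate pairwise mutual information from samples, then keep a maximum-weight spanning tree. This is only a minimal sketch of that textbook method for fully observed discrete variables; it is not one of the latent-tree algorithms proposed in the thesis, and the function names and toy data below are illustrative assumptions.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(samples, i, j):
    """Empirical mutual information (in nats) between columns i and j."""
    n = len(samples)
    count_i = Counter(s[i] for s in samples)
    count_j = Counter(s[j] for s in samples)
    count_ij = Counter((s[i], s[j]) for s in samples)
    mi = 0.0
    for (a, b), c in count_ij.items():
        # p(a,b) * log( p(a,b) / (p(a) p(b)) ), written with raw counts
        mi += (c / n) * math.log(c * n / (count_i[a] * count_j[b]))
    return mi


def chow_liu_tree(samples):
    """Return the edges of a maximum-MI spanning tree (Kruskal's algorithm)."""
    d = len(samples[0])
    weighted = sorted(
        ((mutual_information(samples, i, j), i, j)
         for i, j in combinations(range(d), 2)),
        reverse=True,
    )
    parent = list(range(d))  # union-find forest used to detect cycles

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for _, i, j in weighted:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) does not close a cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree


# Toy data (hypothetical): x1 mostly copies x0, x2 mostly copies x1,
# and x2 is empirically independent of x0.
samples = [
    (0, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1),
    (1, 1, 1), (1, 1, 1), (1, 1, 0), (1, 0, 0),
]
tree = chow_liu_tree(samples)
```

On this toy data the recovered edges form the chain 0 - 1 - 2, since the pair (0, 2) has zero empirical mutual information.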

Recognizing point clouds using conditional random fields

Authors: Dellen Babette, Husain Syed Farzad, Torras Carme
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2014

Detecting objects in cluttered scenes is a necessary step for many robotic tasks and facilitates the interaction of the robot with its environment. Because of the availability of efficient 3D sensing devices such as the Kinect, methods for the recognition of objects in 3D point clouds have gained importance in recent years. In this paper, we propose a new supervised learning approach for the recognition of objects from 3D point clouds using Conditional Random Fields, a type of discriminative, undirected probabilistic graphical model. The various features and contextual relations of the objects are described by the potential functions in the graph. Our method allows for learning and inference from unorganized point clouds of arbitrary sizes and shows significant benefit in terms of computational speed during prediction when compared to a state-of-the-art approach based on constrained optimization. Peer reviewed. Postprint (author’s final draft).

CiteSeerX

Blending Learning and Inference in Structured Prediction

Authors: Hazan Tamir, McAllester David, Schwing Alexander, Urtasun Raquel
Publication date: 30/08/2013

In this paper we derive an efficient algorithm to learn the parameters of structured predictors in general graphical models. This algorithm blends the learning and inference tasks, which results in a significant speedup over traditional approaches, such as conditional random fields and structured support vector machines. For this purpose we utilize the structures of the predictors to describe a low dimensional structured prediction task which encourages local consistencies within the different structures while learning the parameters of the model. Convexity of the learning task provides the means to enforce the consistencies between the different parts. The inference-learning blending algorithm that we propose is guaranteed to converge to the optimum of the low dimensional primal and dual programs. Unlike many of the existing approaches, the inference-learning blending allows us to efficiently learn high-order graphical models over regions of any size and with a very large number of parameters. We demonstrate the effectiveness of our approach, while presenting state-of-the-art results in stereo estimation, semantic segmentation, shape reconstruction, and indoor scene understanding.

arXiv.org e-Print Archive

CiteSeerX

Learning Deep Structured Models

Authors: Chen Liang-Chieh, Schwing Alexander G., Urtasun Raquel, Yuille Alan L.
Publication date: 27/04/2015

Many problems in real-world applications involve predicting several random variables which are statistically related. Markov random fields (MRFs) are a great mathematical tool to encode such relationships. The goal of this paper is to combine MRFs with deep learning algorithms to estimate complex representations while taking into account the dependencies between the output random variables. Towards this goal, we propose a training algorithm that is able to learn structured models jointly with deep features that form the MRF potentials. Our approach is efficient as it blends learning and inference and makes use of GPU acceleration. We demonstrate the effectiveness of our algorithm in the tasks of predicting words from noisy images, as well as multi-class classification of Flickr photographs. We show that joint learning of the deep features and the MRF parameters results in significant performance gains. Comment: 11 pages including references.

arXiv.org e-Print Archive

CiteSeerX

Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

Authors: Anandkumar Anima, Nimmagadda Tejaswi
Publication date: 01/01/2015

Deep learning has shown state-of-the-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which leverages the strengths of multiple machine learning methods, namely deep learning, probabilistic models and kernel methods, to obtain state-of-the-art performance on Microsoft COCO, consisting of non-iconic images. We incorporate contextual information in natural images through a conditional latent tree probabilistic model (CLTM), where the object co-occurrences are conditioned on the extracted fc7 features from a pre-trained ImageNet CNN as input. We learn the CLTM tree structure using conditional pairwise probabilities for object co-occurrences, estimated through kernel methods, and we learn its node and edge potentials by training a new 3-layer neural network, which takes fc7 features as input. Object classification is carried out via inference on the learnt conditional tree model, and we obtain significant gain in precision-recall and F-measures on MS-COCO, especially for difficult object categories. Moreover, the latent variables in the CLTM capture scene information: the images with top activations for a latent node have common themes, such as grasslands or food scenes, and so on. In addition, we show that a simple k-means clustering of the inferred latent nodes alone significantly improves scene classification performance on the MIT-Indoor dataset, without the need for any retraining, and without using scene labels during training. Thus, we present a unified framework for multi-object classification and unsupervised scene understanding.

arXiv.org e-Print Archive

eScholarship - University of California

Caltech Authors

Probabilistic Label Relation Graphs with Ising Models

Authors: Deng Jia, Ding Nan, Murphy Kevin, Neven Hartmut
Publication date: 22/12/2015

We consider classification problems in which the label space has structure. A common example is hierarchical label spaces, corresponding to the case where one label subsumes another (e.g., animal subsumes dog). But labels can also be mutually exclusive (e.g., dog vs cat) or unrelated (e.g., furry, carnivore). To jointly model hierarchy and exclusion relations, the notion of a HEX (hierarchy and exclusion) graph was introduced in [7]. This combined a conditional random field (CRF) with a deep neural network (DNN), resulting in state-of-the-art results when applied to visual object classification problems where the training labels were drawn from different levels of the ImageNet hierarchy (e.g., an image might be labeled with the basic level category "dog", rather than the more specific label "husky"). In this paper, we extend the HEX model to allow for soft or probabilistic relations between labels, which is useful when there is uncertainty about the relationship between two labels (e.g., an antelope is "sort of" furry, but not to the same degree as a grizzly bear). We call our new model pHEX, for probabilistic HEX. We show that the pHEX graph can be converted to an Ising model, which allows us to use existing off-the-shelf inference methods (in contrast to the HEX method, which needed specialized inference algorithms). Experimental results show significant improvements in a number of large-scale visual object classification tasks, outperforming the previous HEX model. Comment: International Conference on Computer Vision (2015).
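
To make the hierarchy and exclusion relations concrete, the following minimal sketch enumerates the binary label assignments that satisfy hard constraints of both kinds, using the abstract's animal/dog/cat example. It is an illustration by brute force only: the function name and the pair-based encoding of relations are assumptions, and it implements neither the HEX inference algorithm nor the soft pHEX/Ising formulation, which relaxes these hard constraints.

```python
from itertools import product


def legal_states(labels, hierarchy, exclusion):
    """Enumerate 0/1 label assignments consistent with the given relations.

    hierarchy: pairs (parent, child); child = 1 requires parent = 1.
    exclusion: pairs (a, b); a and b may not both be 1.
    """
    states = []
    for bits in product((0, 1), repeat=len(labels)):
        y = dict(zip(labels, bits))
        hierarchy_ok = all(y[parent] >= y[child] for parent, child in hierarchy)
        exclusion_ok = all(not (y[a] and y[b]) for a, b in exclusion)
        if hierarchy_ok and exclusion_ok:
            states.append(y)
    return states


labels = ["animal", "dog", "cat"]
hierarchy = [("animal", "dog"), ("animal", "cat")]  # animal subsumes dog and cat
exclusion = [("dog", "cat")]                        # dog and cat are exclusive
states = legal_states(labels, hierarchy, exclusion)
```

Of the eight possible assignments, four remain: no label, animal only, animal with dog, and animal with cat.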

arXiv.org e-Print Archive

SEGCloud: Semantic Segmentation of 3D Point Clouds

Authors: Armeni Iro, Choy Christopher B., Gwak JunYoung, Savarese Silvio, Tchapmi Lyne P.
Publication date: 20/10/2017

3D semantic scene labeling is fundamental to agents operating in the real world. In particular, labeling raw 3D point sets from sensors provides fine-grained semantics. Recent works leverage the capabilities of Neural Networks (NNs), but are limited to coarse voxel predictions and do not explicitly enforce global consistency. We present SEGCloud, an end-to-end framework to obtain 3D point-level segmentation that combines the advantages of NNs, trilinear interpolation (TI) and fully connected Conditional Random Fields (FC-CRF). Coarse voxel predictions from a 3D Fully Convolutional NN are transferred back to the raw 3D points via trilinear interpolation. Then the FC-CRF enforces global consistency and provides fine-grained semantics on the points. We implement the latter as a differentiable Recurrent NN to allow joint optimization. We evaluate the framework on two indoor and two outdoor 3D datasets (NYU V2, S3DIS, KITTI, Semantic3D.net), and show performance comparable or superior to the state-of-the-art on all datasets. Comment: Accepted as a spotlight at the International Conference of 3D Vision (3DV 2017).
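
The step of transferring coarse voxel predictions back to raw points relies on trilinear interpolation, which can be sketched in a few lines. This is a simplified illustration under stated assumptions: it interpolates a single scalar stored at integer lattice positions of a nested list, whereas the paper interpolates per-class scores; the function name and grid layout are hypothetical and not taken from SEGCloud.

```python
import math


def trilinear(grid, x, y, z):
    """Interpolate scalar values stored at integer corners grid[i][j][k].

    The point (x, y, z) is given in voxel coordinates. The base corner is
    clamped so that all eight neighbours stay inside the grid, which needs
    at least two samples along every axis.
    """
    i0 = min(max(int(math.floor(x)), 0), len(grid) - 2)
    j0 = min(max(int(math.floor(y)), 0), len(grid[0]) - 2)
    k0 = min(max(int(math.floor(z)), 0), len(grid[0][0]) - 2)
    tx, ty, tz = x - i0, y - j0, z - k0  # fractional offsets within the cell

    value = 0.0
    for di in (0, 1):
        for dj in (0, 1):
            for dk in (0, 1):
                # Each corner is weighted by the volume of the opposite sub-box.
                weight = ((tx if di else 1.0 - tx)
                          * (ty if dj else 1.0 - ty)
                          * (tz if dk else 1.0 - tz))
                value += weight * grid[i0 + di][j0 + dj][k0 + dk]
    return value


# Toy 2x2x2 grid holding the linear function i + 2j + 4k, which trilinear
# interpolation reproduces exactly.
grid = [[[i + 2 * j + 4 * k for k in range(2)] for j in range(2)] for i in range(2)]
center_value = trilinear(grid, 0.5, 0.5, 0.5)
```

At the cell centre all eight corners receive weight 1/8, so `center_value` equals the mean of the corner values, 3.5.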

arXiv.org e-Print Archive