Sequence learning using deep neural networks with flexibility and interpretability
Throughout this thesis, I investigate two long-standing yet rarely explored sequence learning challenges under the Probabilistic Graphical Models (PGMs) framework: learning multi-timescale representations on a single sequence and learning higher-order dynamics between multiple sequences. The first challenge is tackled with Hidden Markov Models (HMMs), a type of directed PGM, under the reinforcement learning framework. I prove that the option framework [Sutton et al., 1999; Bacon et al., 2017; Zhang and Whiteson, 2019], one of the most promising Hierarchical Reinforcement Learning (HRL) frameworks and originally formulated as a Semi-Markov Decision Process (SMDP), has an equivalent Markov Decision Process (MDP) formulation. Based on this equivalence, a simple yet effective Skill-Action (SA) architecture is proposed. Our empirical studies on challenging robot simulation environments demonstrate that SA significantly outperforms all baselines in both infinite-horizon and transfer-learning settings. Because of its exceptional scalability, SA gives rise to a large-scale pre-training architecture for reinforcement learning. The second challenge is tackled with Markov Random Fields (MRFs), also known as undirected PGMs, under the supervised learning framework. I employ binary MRFs with weighted Lower Linear Envelope Potentials (LLEPs) to capture higher-order dependencies. I propose an exact inference algorithm under the graph-cuts framework and an efficient learning algorithm under the Latent Structural Support Vector Machines (LSSVMs) framework. To learn higher-order latent dynamics on time series, we layer multi-task recurrent neural networks (RNNs) on top of the MRFs and employ a sub-gradient algorithm for end-to-end training. We conduct thorough empirical studies on three popular Chinese stock market indexes, and the proposed method outperforms all baselines. To the best of our knowledge, the proposed technique is the first to investigate higher-order dynamics between stocks.
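A lower linear envelope potential of the kind used above is the pointwise minimum of a set of linear functions of the (weighted) fraction of clique variables taking a given label; with decreasing slopes the envelope is concave, so mixed assignments within a clique cost more than consistent ones. A minimal sketch (the slopes, intercepts, and weights below are illustrative, not taken from the thesis):

```python
def lower_linear_envelope(active_fraction, lines):
    """Evaluate a lower linear envelope: the pointwise minimum of
    linear functions a_k * x + b_k (concave when slopes decrease)."""
    return min(a * active_fraction + b for a, b in lines)

def clique_potential(labels, weights, lines):
    """Weighted fraction of clique variables labelled 1, fed through
    the envelope. labels: 0/1 ints; weights: non-negative floats."""
    total = sum(weights)
    active = sum(w for y, w in zip(labels, weights) if y == 1)
    return lower_linear_envelope(active / total, lines)

# Illustrative concave envelope (slopes 1.0, 0.0, -1.0): cost is zero
# when the clique is uniformly 0 or uniformly 1, and peaks in between.
LINES = [(1.0, 0.0), (0.0, 0.3), (-1.0, 1.0)]

print(clique_potential([0, 0, 0, 0], [1, 1, 1, 1], LINES))  # 0.0 (all agree on 0)
print(clique_potential([1, 1, 1, 1], [1, 1, 1, 1], LINES))  # 0.0 (all agree on 1)
print(clique_potential([1, 1, 0, 0], [1, 1, 1, 1], LINES))  # 0.3 (mixed, penalized)
```

Because the envelope is a minimum of linear functions, such potentials admit graph-cut-style inference and margin-based parameter learning, which is what makes them attractive in the work above.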
A learning framework for higher-order consistency models in multi-class pixel labeling problems
Recently, higher-order Markov random field (MRF) models have been successfully
applied to problems in computer vision, especially scene understanding problems.
One successful higher-order MRF model for scene understanding is the consistency model [Kohli and Kumar, 2010; Kohli et al., 2009], building on earlier work by Ladicky et al. [2009, 2013]; both employ higher-order potentials composed of lower linear envelope functions. In semantic image segmentation, which seeks to label each image pixel with a pre-defined object or background class, this model encourages consistent label assignments over segmented regions of an image. However, solving this MRF problem exactly is generally NP-hard, so efficient approximate inference algorithms are used instead. Furthermore, the lower linear envelope functions involve a large number of parameters, and the cross-validation typically used to tune pairwise MRF models is impractical at this scale. Yet few works have proposed efficient learning methods for the many parameters of these consistency models. In this thesis, we propose a unified inference and learning framework for the consistency model, investigating the following issues and presenting solutions for inference and learning with this higher-order MRF. First, we derive two variants of the consistency model for multi-class pixel labeling tasks. Our model defines an energy function that scores any given label assignment over an image. To perform maximum a posteriori (MAP) inference in this model, we minimize the energy function using move-making algorithms, which transform the higher-order problems into tractable pairwise problems. We then employ a max-margin framework to learn optimal parameters, providing a principled approach to searching the large parameter space. Second, we propose a novel use of the Gaussian mixture model (GMM) for encoding consistency constraints over large sets of pixels. Here, we use various over-segmentation methods to define coherent regions for the consistency potentials. In general, mean shift (MS) produces locally coherent regions, while the GMM provides globally coherent regions that need not be contiguous.
Our model exploits local and global information together and improves labeling accuracy on real data sets. Accordingly, we use multiple higher-order terms, one associated with each over-segmentation method; our learning framework allows us to deal with the large number of parameters these multiple higher-order terms introduce. Next, we explore a dual decomposition (DD) method for our multi-class consistency model. The dual decomposition MRF (DD-MRF) is an alternative method for optimizing the energy function: a complex MRF problem is decomposed into many easy subproblems, and we optimize the relaxed dual problem
using a projected subgradient method. At convergence, we expect a global optimum
in the dual space because it is a concave maximization problem. To optimize our
higher-order DD-MRF exactly, we propose an exact minimization algorithm for solving
the higher-order subproblems. Moreover, the minimization algorithm is much more efficient than graph-cuts. The dual decomposition approach also solves the
max-margin learning problem by minimizing the dual losses derived from DD-MRF.
Here, our minimization algorithm allows us to optimize the DD learning exactly and
efficiently, which in most cases finds better parameters than the previous learning
approach. Finally, we focus on improving the labeling accuracy of our higher-order model by combining mid-level features, which we call region features. The region features help customize the general envelope functions for individual segmented regions. By assigning specified weights to the envelope functions, we can choose subsets of highly likely labels for each segmented region. We train multiple classifiers with region features and aggregate them to improve prediction of the possible labels for each region. Importantly, introducing these region features does not change the preceding inference and learning algorithms.
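The projected-subgradient dual decomposition described above can be sketched on a toy instance: a single variable's energy is split between two slaves, each solved exactly by enumeration, and a subgradient step on the dual variables pushes the slaves toward agreement. The per-label energies and step schedule below are illustrative, not from the thesis; with only two slaves the projection onto the dual-feasible set is trivial, so this reduces to plain subgradient ascent:

```python
def dual_decomposition(f1, f2, iters=50):
    """Minimize f1(x) + f2(x) over a discrete label set by splitting
    the energy into two slaves coupled by dual variables lam."""
    n = len(f1)
    lam = [0.0] * n
    x1 = x2 = 0
    for t in range(iters):
        # Each slave subproblem is solved exactly by enumeration.
        x1 = min(range(n), key=lambda k: f1[k] + lam[k])
        x2 = min(range(n), key=lambda k: f2[k] - lam[k])
        if x1 == x2:          # slaves agree: dual subgradient is zero
            break
        step = 1.0 / (t + 1)  # diminishing step size
        lam[x1] += step       # subgradient ascent on the dual function
        lam[x2] -= step
    return x1, x2

# Illustrative per-label energies for one 3-label variable.
f1 = [1.0, 0.0, 2.0]
f2 = [2.0, 1.0, 0.0]
print(dual_decomposition(f1, f2))  # both slaves settle on label 1
```

When the slaves agree on a label x, the dual value equals the primal energy f1(x) + f2(x), so agreement here certifies a global optimum of this single-variable toy problem; real DD-MRF instances share many variables between subproblems and the dual is optimized the same way at much larger scale.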
Occlusion-Aware Object Localization, Segmentation and Pose Estimation
We present a learning approach for localization and segmentation of objects
in an image in a manner that is robust to partial occlusion. Our algorithm
produces a bounding box around the full extent of the object and labels pixels
in the interior that belong to the object. Like existing segmentation-aware
detection approaches, we learn an appearance model of the object and treat
regions that do not fit this model as potential occlusions. However, in
addition to the established use of pairwise potentials for encouraging local
consistency, we use higher-order potentials that capture information at the
level of image segments. We also propose an efficient loss function that
targets both localization and segmentation performance. Our algorithm achieves
13.52% segmentation error and 0.81 area under the false-positive per image vs.
recall curve on average over the challenging CMU Kitchen Occlusion Dataset.
This is a 42.44% decrease in segmentation error and a 16.13% increase in
localization performance compared to the state-of-the-art. Finally, we show
that the visibility labelling produced by our algorithm can make full 3D pose
estimation from a single image robust to occlusion. Comment: British Machine Vision Conference 2015 (poster).
Automatic Synchronization of Multi-User Photo Galleries
In this paper we address the problem of photo gallery synchronization, where
pictures related to the same event are collected by different users. Existing
solutions are usually based on unrealistic assumptions, like time consistency
across photo galleries, and often rely heavily on heuristics, therefore
limiting their applicability to real-world scenarios. We
propose a solution that achieves better generalization performance for the
synchronization task compared to the available literature. The method is
characterized by three stages: at first, deep convolutional neural network
features are used to assess the visual similarity among the photos; then, pairs
of similar photos are detected across different galleries and used to construct
a graph; eventually, a probabilistic graphical model is used to estimate the
temporal offset of each pair of galleries, by traversing the minimum spanning
tree extracted from this graph. The experimental evaluation is conducted on
four publicly available datasets covering different types of events,
demonstrating the strength of the proposed method. A thorough discussion of the obtained results is provided for a critical assessment of the synchronization quality. Comment: Accepted to IEEE Transactions on Multimedia.
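The final stage of the three-stage pipeline above can be sketched as follows: given pairwise temporal-offset estimates between galleries (with costs derived from confidence), build a graph, extract its minimum spanning tree, and traverse the tree to assign each gallery an offset relative to a reference gallery. Everything below is an illustrative reconstruction, not the authors' code; the gallery names, offsets, and edge costs are made up:

```python
from collections import defaultdict, deque

def synchronize(galleries, edges, root):
    """edges: (u, v, offset, cost), where offset = clock(v) - clock(u)
    and cost is low when the visual-similarity evidence is strong.
    Returns absolute offsets relative to `root`, propagated over the MST."""
    # Kruskal's MST with union-find over the gallery graph.
    parent = {g: g for g in galleries}
    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g
    tree = defaultdict(list)
    for u, v, off, _ in sorted(edges, key=lambda e: e[3]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree[u].append((v, off))
            tree[v].append((u, -off))  # traversing the edge backwards flips the sign
    # Breadth-first traversal of the MST accumulates offsets from the root.
    offsets, queue = {root: 0}, deque([root])
    while queue:
        u = queue.popleft()
        for v, off in tree[u]:
            if v not in offsets:
                offsets[v] = offsets[u] + off
                queue.append(v)
    return offsets

edges = [("A", "B", 5, 1), ("B", "C", 7, 1), ("C", "D", 8, 1),
         ("A", "C", 11, 5), ("B", "D", 16, 4)]
print(synchronize(["A", "B", "C", "D"], edges, "A"))
# {'A': 0, 'B': 5, 'C': 12, 'D': 20}
```

Restricting propagation to the minimum spanning tree means each gallery's offset is accumulated only along the most reliable chain of pairwise estimates, which is the design choice highlighted in the abstract.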
Statistical Approaches to Inferring Object Shape from Single Images
Depth inference is a fundamental problem of computer vision with a broad range of potential applications. Monocular depth inference techniques, particularly shape from shading, date back to as early as the 1940s, when the technique was first used to study the shape of the lunar surface. Since then there has been ample research into depth inference algorithms using monocular cues. Most of these are based on physical models of image formation and rely on a number of simplifying assumptions that do not hold for real-world and natural imagery. Very few make use of the rich statistical information contained in real-world images and their 3D information, though there have been a few notable exceptions. The study of natural scene statistics has concentrated on cluttered outdoor scenes; the statistics of scenes of single objects have been less studied, yet such scenes are an essential part of daily human interaction with the environment. Inferring the shape of single objects is an important computer vision problem that has captured the interest of many researchers over the past few decades and has applications in object recognition, robotic grasping, fault detection, and Content-Based Image Retrieval (CBIR). This thesis focuses on studying the statistical properties of single objects and their range images, which can benefit shape inference techniques. I acquired two databases, the Single Object Range and HDR (SORH) database and the Eton Myers Database of single objects, including laser-acquired depth, binocular stereo, photometric stereo, and High Dynamic Range (HDR) photography. Taking a data-driven approach, I studied the statistics of color and range images of real scenes of single objects, along with whole 3D objects, and uncovered some interesting trends in the data. The fractal structure of natural images was previously well established and thought to be a universal property.
However, my research showed that the fractal structure of single objects and surfaces is governed by a wholly different set of rules. Classical computer vision problems such as binocular and multi-view stereo, photometric stereo, shape from shading, and structure from motion all rely on accurate and complete models of which 3D shapes and textures are plausible in nature in order to avoid producing unlikely outputs. Bayesian approaches are common for these problems, and hopefully the findings on the statistics of single-object shape from this work and others will both inform new, more accurate Bayesian priors on shape and enable more efficient probabilistic inference procedures.
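The fractal structure mentioned above is commonly summarized by a power-law amplitude spectrum, A(f) proportional to f^(-alpha), so the fractal exponent can be read off as the slope of a log-log least-squares fit. A minimal sketch on exact synthetic data (the exponent and frequency range are arbitrary; real images require a radially averaged 2D spectrum):

```python
import math

def loglog_slope(freqs, amps):
    """Least-squares slope of log(amplitude) against log(frequency)."""
    xs = [math.log(f) for f in freqs]
    ys = [math.log(a) for a in amps]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

# Synthetic power-law spectrum with known exponent alpha = 1.2.
alpha = 1.2
freqs = range(1, 65)
amps = [f ** -alpha for f in freqs]
print(round(loglog_slope(freqs, amps), 6))  # recovers -alpha = -1.2
```

A spectrum that is not well fit by a single slope is one way the thesis's claim (that single objects follow different statistical rules than cluttered natural scenes) could manifest in practice.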