3 research outputs found

    Graphical Models: Modeling, Optimization, and Hilbert Space Embedding

    No full text
    Over the past two decades, graphical models have been widely used as a powerful tool for compactly representing distributions. Kernel methods, in turn, have been used extensively to build rich representations. This thesis combines graphical models with kernels to produce compact models with rich representational abilities, focusing on the following four areas.
    1. Conditional random fields for multi-agent reinforcement learning. Conditional random fields (CRFs) are graphical models for modeling the probability of labels given the observations. They have traditionally assumed that, conditioned on the training data, the label sequences of different training examples are independent and identically distributed (iid). We extended the use of CRFs to a class of temporal learning algorithms, namely policy gradient reinforcement learning (RL). Here the labels are no longer iid: they are actions that update the environment and affect the next observation. From an RL point of view, CRFs provide a natural way to model joint actions in a decentralized Markov decision process. Using tree sampling for inference, our experiments show that the RL methods employing CRFs clearly outperform those which do not model the proper joint policy.
    2. Bayesian online multi-label classification. Gaussian density filtering provides fast and effective inference for graphical models (Maybeck, 1982). Building on it, we propose a Bayesian online multi-label classification (BOMC) framework which learns a probabilistic model of the linear classifier. The training labels are incorporated to update the posterior over classifiers via a graphical model similar to TrueSkill (Herbrich et al., 2007). Using samples from the posterior, we label the test data by maximizing the expected F1-score. In our experiments, BOMC delivers significantly higher macro-averaged F1-scores than the state-of-the-art online maximum margin learners.
    3. Hilbert space embedding of distributions. Graphical models are also an essential tool in kernel measures of independence for non-iid data. Traditional information-theoretic measures often require density estimation, which makes them poorly suited to statistical estimation. Motivated by the fact that distributions often appear in machine learning via expectations, we can characterize the distance between distributions in terms of distances between means, especially means in reproducing kernel Hilbert spaces, which are called kernel embeddings. Under this framework, undirected graphical models further allow us to factorize the kernel embeddings onto cliques, which yields efficient measures of independence for non-iid data (Zhang et al., 2009).
    4. Optimization in maximum margin models for structured data. Maximum margin estimation for structured data is an important task in which graphical models also play a key role. These problems are special cases of regularized risk minimization, for which bundle methods (BMRM; Teo et al., 2007) are a state-of-the-art general-purpose solver. Smola et al. (2007) proved that BMRM requires O(1/epsilon) iterations to converge to an epsilon-accurate solution, and we further show that this rate matches the lower bound. Motivated by Nesterov (2003, 2005), we exploited the composite structure of the objective function and devised an algorithm for the structured loss that converges to an epsilon-accurate solution in O(1/sqrt(epsilon)) iterations.
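    The distance between distributions via their RKHS mean embeddings described in item 3 is, for two sample sets, the Maximum Mean Discrepancy (MMD). A minimal sketch of the (biased) empirical estimator with an RBF kernel — an illustration of the general idea, not the clique-factorized estimator of the thesis — might look like:

    ```python
    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        """RBF kernel matrix: k(a, b) = exp(-gamma * ||a - b||^2)."""
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    def mmd2(X, Y, gamma=1.0):
        """Biased estimate of squared MMD: ||mu_X - mu_Y||^2 in the RKHS,
        computed entirely through kernel evaluations (no density estimation)."""
        return (rbf_kernel(X, X, gamma).mean()
                + rbf_kernel(Y, Y, gamma).mean()
                - 2 * rbf_kernel(X, Y, gamma).mean())

    rng = np.random.default_rng(0)
    X = rng.normal(0.0, 1.0, size=(200, 2))   # samples from P
    Y = rng.normal(0.0, 1.0, size=(200, 2))   # samples from Q = P
    Z = rng.normal(3.0, 1.0, size=(200, 2))   # samples from a shifted Q
    print(mmd2(X, Y))  # near zero: same distribution
    print(mmd2(X, Z))  # clearly positive: different distributions
    ```

    With a characteristic kernel the population MMD is zero iff the two distributions coincide, which is what makes mean embeddings usable as independence measures.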

    Sequential Labeling with Structural SVM under Nondecomposable Losses

    Full text link
    © 2012 IEEE. Sequential labeling addresses the classification of sequential data, which are widespread in fields as diverse as computer vision, finance, and genomics. The model traditionally used for sequential labeling is the hidden Markov model (HMM), where the sequence of class labels to be predicted is encoded as a Markov chain. In recent years, HMMs have benefited from minimum-loss training approaches such as the structural support vector machine (SSVM), which in many cases has achieved higher classification accuracy. However, the loss functions available for training are restricted to decomposable cases, such as the 0-1 loss and the Hamming loss. In many practical cases, other loss functions, such as those based on the F1 measure, the precision/recall break-even point, and the average precision (AP), can describe desirable performance more effectively. For this reason, in this paper we propose a training algorithm for the SSVM that can minimize any loss based on the classification contingency table, together with a training algorithm that minimizes an AP loss. Experimental results over a set of diverse and challenging data sets (TUM Kitchen, CMU Multimodal Activity, and Ozone Level Detection) show that the proposed training algorithms achieve significant improvements in the F1 measure and AP compared with the conventional SSVM, and their performance is in line with or above that of other state-of-the-art sequential labeling approaches.
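    A loss "based on the classification contingency table" is one computed from the whole sequence's true/false positive and negative counts, so it does not decompose into per-frame terms the way the Hamming loss does. A minimal sketch of one such loss (1 − F1), purely illustrative of the nondecomposable losses the paper targets and not the paper's training algorithm itself:

    ```python
    def contingency_f1_loss(y_true, y_pred, positive=1):
        """1 - F1, computed from the contingency table of an entire label
        sequence. Unlike the Hamming loss, this cannot be written as a sum
        of per-position terms, so standard SSVM loss augmentation fails."""
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == positive and p == positive for t, p in pairs)
        fp = sum(t != positive and p == positive for t, p in pairs)
        fn = sum(t == positive and p != positive for t, p in pairs)
        if 2 * tp + fp + fn == 0:
            return 0.0  # no positives in truth or prediction: treat F1 as 1
        f1 = 2 * tp / (2 * tp + fp + fn)
        return 1.0 - f1

    # tp=1, fp=1, fn=1 -> F1 = 0.5 -> loss = 0.5
    print(contingency_f1_loss([1, 1, 0, 0], [1, 0, 1, 0]))
    ```

    Changing a single predicted frame changes tp/fp/fn and hence the loss of every other frame's contribution jointly, which is exactly why loss-augmented inference needs the specialized treatment the paper proposes.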

    Sequential Labeling With Structural SVM Under Nondecomposable Losses

    No full text