    Bundle methods for regularized risk minimization with applications to robust learning

    Supervised learning in general, and regularized risk minimization in particular, is about solving an optimization problem that is jointly defined by a performance measure and a set of labeled training examples. The outcome of learning, a model, is then used mainly for predicting the labels of unlabeled examples in the testing environment. In real-world scenarios, a typical learning process often involves solving a sequence of similar problems with different parameters before a final model is identified. For learning to be successful, the final model must be produced in a timely manner, and the model should be robust to (mild) irregularities in the testing environment. The purpose of this thesis is to investigate ways to speed up the learning process and improve the robustness of the learned model. We first develop a batch convex optimization solver, based on standard bundle methods, that is specialized to regularized risk minimization. The solver inherits two main properties of the standard bundle methods. Firstly, it is capable of solving both differentiable and non-differentiable problems, hence its implementation can be reused for different tasks with minimal modification. Secondly, the optimization is easily amenable to parallel and distributed computation settings; this makes the solver highly scalable in the number of training examples. However, unlike the standard bundle methods, the solver does not have extra parameters that need careful tuning. Furthermore, we prove that the solver has a faster convergence rate. In addition, the solver is very efficient at computing approximate regularization paths and at model selection. We also present a convex risk formulation for incorporating invariances and prior knowledge into the learning problem. This formulation generalizes many existing approaches for robust learning in the setting of insufficient or noisy training examples and covariate shift. 
Lastly, we extend a non-convex risk formulation for binary classification to structured prediction. Empirical results show that the model obtained with this risk formulation is robust to outliers in the training examples.
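
The cutting-plane idea behind bundle methods for regularized risk minimization can be illustrated with a minimal sketch. It is not the solver developed in the thesis. It assumes a scalar weight, an L2 regularizer, and the hinge loss, none of which are fixed by the abstract, and the names `hinge_risk`, `minimize_model`, and `bundle_minimize` are hypothetical. Each iteration adds a linear lower bound on the empirical risk, minimizes the regularized piecewise-linear model, and stops when the gap between the best objective seen and the model minimum is small.

```python
import numpy as np


def hinge_risk(w, x, y):
    """Return the empirical hinge risk at scalar w and one subgradient."""
    margins = 1.0 - y * x * w
    risk = np.mean(np.maximum(margins, 0.0))
    subgrad = np.mean(np.where(margins > 0, -y * x, 0.0))
    return float(risk), float(subgrad)


def minimize_model(planes, lam):
    """Minimize lam/2 * w**2 + max_i(a_i * w + b_i) exactly for scalar w.

    The minimizer is either a stationary point of one quadratic piece or a
    kink where two planes cross, so checking all such candidates suffices.
    """
    def model(w):
        return 0.5 * lam * w * w + max(a * w + b for a, b in planes)

    candidates = [-a / lam for a, _ in planes]
    for i, (a1, b1) in enumerate(planes):
        for a2, b2 in planes[i + 1:]:
            if a1 != a2:
                candidates.append((b2 - b1) / (a1 - a2))
    best = min(candidates, key=model)
    return best, model(best)


def bundle_minimize(x, y, lam=0.1, eps=1e-8, max_iter=100):
    """Cutting-plane loop: refine a piecewise-linear lower bound on the risk."""
    w, planes = 0.0, []
    best_w, best_obj = w, float("inf")
    for _ in range(max_iter):
        risk, grad = hinge_risk(w, x, y)
        obj = 0.5 * lam * w * w + risk
        if obj < best_obj:
            best_w, best_obj = w, obj
        # First-order lower bound of the convex risk around the current w.
        planes.append((grad, risk - grad * w))
        w, lower = minimize_model(planes, lam)
        if best_obj - lower <= eps:  # gap between upper and lower bound
            break
    return best_w, best_obj


# Toy 1-D data, made up for illustration only.
x = np.array([1.0, 2.0, -1.0, -2.0, 0.5])
y = np.array([1.0, 1.0, -1.0, -1.0, -1.0])
w_hat, obj_hat = bundle_minimize(x, y)
print(w_hat, obj_hat)
```

The same loop works for differentiable losses, since only a risk value and a (sub)gradient are needed at each step. In higher dimensions the inner minimization becomes a small quadratic program rather than the candidate enumeration used here.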

    Optimization of Markov Random Fields in Computer Vision

    A large variety of computer vision tasks can be formulated using Markov Random Fields (MRFs). Except in certain special cases, optimizing an MRF is intractable, due to a large number of variables and complex dependencies between them. In this thesis, we present new algorithms to perform inference in MRFs that are either more efficient (in terms of running time and/or memory usage) or more effective (in terms of solution quality) than the state-of-the-art methods. First, we introduce a memory-efficient max-flow algorithm for multi-label submodular MRFs. Such MRFs have been shown to be optimally solvable using max-flow, based on an encoding of the labels proposed by Ishikawa in which each variable $X_i$ is represented by $\ell$ nodes (where $\ell$ is the number of labels) arranged in a column. However, this method in general requires $2\,\ell^2$ edges for each pair of neighbouring variables. This makes it inapplicable to realistic problems with many variables and labels, due to its excessive memory requirement. By contrast, our max-flow algorithm stores $2\,\ell$ values per variable pair, requiring much less storage. Consequently, our algorithm makes it possible to optimally solve multi-label submodular problems involving large numbers of variables and labels on a standard computer. Next, we present a move-making style algorithm for multi-label MRFs with robust non-convex priors. In particular, our algorithm iteratively approximates the original MRF energy with an appropriately weighted surrogate energy that is easier to minimize. Furthermore, it guarantees that the original energy decreases at each iteration. To this end, we consider the scenario where the weighted surrogate energy is multi-label submodular (i.e., it can be optimally minimized by max-flow), and show that our algorithm then lets us handle a large variety of non-convex priors. 
Finally, we consider the fully connected Conditional Random Field (dense CRF) with Gaussian pairwise potentials, which has proven popular and effective for multi-class semantic segmentation. While the energy of a dense CRF can be minimized accurately using a Linear Programming (LP) relaxation, the state-of-the-art algorithm is too slow to be useful in practice. To alleviate this deficiency, we introduce an efficient LP minimization algorithm for dense CRFs. To this end, we develop a proximal minimization framework, where the dual of each proximal problem is optimized via block-coordinate descent. We show that each block of variables can be optimized in time linear in the number of pixels and labels. Consequently, our algorithm enables efficient and effective optimization of dense CRFs with Gaussian pairwise potentials. We evaluated all our algorithms on standard energy minimization datasets consisting of computer vision problems, such as stereo, inpainting and semantic segmentation. The experiments at the end of each chapter provide compelling evidence that all our approaches are either more efficient or more effective than all existing baselines.
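
To give a feel for the memory gap between the two per-pair costs quoted above ($2\,\ell^2$ edges versus $2\,\ell$ values), the following sketch does the arithmetic for a 4-connected image grid. The grid layout, the 4 bytes per stored value, and the function name `pairwise_storage` are assumptions made for illustration. Only the pairwise terms mentioned in the abstract are counted, not the full graph.

```python
def pairwise_storage(width, height, num_labels, bytes_per_value=4):
    """Compare pairwise storage on a 4-connected grid of width x height variables."""
    # Horizontal plus vertical neighbour pairs on the grid.
    num_pairs = (width - 1) * height + width * (height - 1)
    # Per-pair costs quoted in the abstract: 2 * l^2 edges vs. 2 * l values.
    ishikawa_edges = num_pairs * 2 * num_labels ** 2
    compact_values = num_pairs * 2 * num_labels
    return {
        "num_pairs": num_pairs,
        "ishikawa_edges": ishikawa_edges,
        "compact_values": compact_values,
        "ishikawa_bytes": ishikawa_edges * bytes_per_value,
        "compact_bytes": compact_values * bytes_per_value,
    }


# Hypothetical problem size: a 1000 x 1000 image with 256 labels.
stats = pairwise_storage(1000, 1000, 256)
print(stats["ishikawa_bytes"] / 1e9, "GB vs.", stats["compact_bytes"] / 1e9, "GB")
```

The ratio between the two counts is exactly the number of labels, so the saving grows linearly with the label set.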

    Holistic interpretation of visual data based on topology: semantic segmentation of architectural facades

    The work presented in this dissertation is a step towards effectively incorporating contextual knowledge in the task of semantic segmentation. To date, with a few exceptions in the field, the use of context has been confined to the genre of the scene, and research has been directed towards enhancing appearance descriptors. While this is unarguably important, recent studies show that computer vision has reached a near-human level of performance in relying on these descriptors when objects have stable, distinctive surface properties and imaging conditions are suitable. When these conditions are not met, humans exploit their knowledge about the intrinsic geometric layout of the scene to make local decisions. Computer vision lags behind when it comes to this asset. For this reason, we aim to bridge the gap by presenting algorithms for semantic segmentation of building facades that make use of topological aspects of the scene. We provide a classification scheme to carry out segmentation and recognition simultaneously. The algorithm solves a single optimization problem and yields a semantic interpretation of facades, relying on the modeling power of probabilistic graphs and efficient discrete combinatorial optimization tools. We also tackle the same problem of semantic facade segmentation with a neural network approach, attaining accuracy figures that are on par with the state of the art in a fully automated pipeline. We start from pixelwise classifications obtained via Convolutional Neural Networks (CNNs); these are then structurally validated through a cascade of Restricted Boltzmann Machines (RBMs) and a Multi-Layer Perceptron (MLP) that regenerates the most likely layout. A related problem in the domain of architectural modeling is geometric multi-model fitting. For it, we introduce a novel guided sampling algorithm based on Minimum Spanning Trees (MSTs), which surpasses other propagation techniques in terms of robustness to noise. 
We make a number of additional contributions, such as a measure of model deviation that captures variations among fitted models.


    Volume 1 establishes the foundations of this new field. It goes through all the steps from data collection, summarization, and clustering to the different aspects of resource-aware learning, i.e., hardware, memory, energy, and communication awareness. Machine learning methods are inspected with respect to their resource requirements and to how scalability can be enhanced on diverse computing architectures, ranging from embedded systems to large computing clusters.

    Efficient inference and learning in graphical models for multi-organ shape segmentation

    This thesis explores the use of discriminatively trained deformable contour models (DCMs) for shape-based segmentation in medical images. We make contributions on two fronts: in the learning problem, where the model is trained from a set of annotated images, and in the inference problem, whose aim is to segment an image given a model. We demonstrate the merit of our techniques on a large X-ray image segmentation benchmark, where we obtain systematic improvements in accuracy and speedups over the current state of the art. For learning, we formulate training the DCM scoring function as large-margin structured prediction and construct a training objective that aims at giving the highest score to the ground-truth contour configuration. We incorporate a loss function adapted to DCM-based structured prediction. In particular, we consider training with the Mean Contour Distance (MCD) performance measure. Using this loss function during training amounts to scoring each candidate contour according to its Mean Contour Distance to the ground-truth configuration. Training DCMs using structured prediction with the standard zero-one loss already outperforms the current state-of-the-art method [Seghers et al. 2007] on the considered medical benchmark [Shiraishi et al. 2000, van Ginneken et al. 2006]. We demonstrate that training with the MCD structured loss further improves over the generic zero-one loss results by a statistically significant amount. For inference, we propose efficient solvers adapted to combinatorial problems with discretized spatial variables. Our contributions are threefold. First, we consider inference for loopy graphical models, making no assumption about the underlying graph topology. We use an efficient decomposition-coordination algorithm to solve the resulting optimization problem: we decompose the model's graph into a set of open, chain-structured graphs. 
We employ the Alternating Direction Method of Multipliers (ADMM) to fix the potential inconsistencies of the individual solutions. Even though ADMM is an approximate inference scheme, we show empirically that our implementation delivers the exact solution for the considered examples. Second, we accelerate optimization of chain-structured graphical models by using the Hierarchical A* search algorithm of [Felzenszwalb & Mcallester 2007] coupled with the pruning techniques developed in [Kokkinos 2011a]. We achieve a speedup of one order of magnitude on average over the state-of-the-art technique based on Dynamic Programming (DP) coupled with Generalized Distance Transforms (GDTs) [Felzenszwalb & Huttenlocher 2004]. Third, we incorporate the Hierarchical A* algorithm in the ADMM scheme to guarantee an efficient optimization of the underlying chain-structured subproblems. The resulting algorithm is naturally adapted to solve the loss-augmented inference problem in structured prediction learning, and hence is used during both training and inference. In Appendix A, we consider the case of 3D data and develop an efficient method to find the mode of a 3D kernel density distribution. Our algorithm has guaranteed convergence to the global optimum, and scales logarithmically in the volume size by virtue of recursively subdividing the search space. We use this method to rapidly initialize 3D brain tumor segmentation, where we demonstrate substantial acceleration with respect to a standard mean-shift implementation. In Appendix B, we describe in more detail our extension of the Hierarchical A* search algorithm of [Felzenszwalb & Mcallester 2007] to inference on chain-structured graphs.


    Uncertainty in Artificial Intelligence: Proceedings of the Thirty-Fourth Conference

    Sequential decision making in artificial musical intelligence

    Over the past 60 years, artificial intelligence has grown from a largely academic field of research to a ubiquitous array of tools and approaches used in everyday technology. Despite its many recent successes and growing prevalence, certain meaningful facets of computational intelligence have not been as thoroughly explored. Such additional facets cover a wide array of complex mental tasks which humans carry out easily, yet are difficult for computers to mimic. A prime example of a domain in which human intelligence thrives, but machine understanding is still fairly limited, is music. Over the last decade, many researchers have applied computational tools to carry out tasks such as genre identification, music summarization, music database querying, and melodic segmentation. While these are all useful algorithmic solutions, we are still a long way from constructing complete music agents, able to mimic (at least partially) the complexity with which humans approach music. One key aspect which hasn't been sufficiently studied is that of sequential decision making in musical intelligence. This thesis strives to answer the following question: Can a sequential decision making perspective guide us in the creation of better music agents, and social agents in general? And if so, how? More specifically, this thesis focuses on two aspects of musical intelligence: music recommendation and human-agent (and more generally agent-agent) interaction in the context of music. The key contributions of this thesis are the design of better music playlist recommendation algorithms; the design of algorithms for tracking user preferences over time; new approaches for modeling people's behavior in situations that involve music; and the design of agents capable of meaningful interaction with humans and other agents in a setting where music plays a role (either directly or indirectly). 
Though motivated primarily by music-related tasks, and focusing largely on people's musical preferences, this thesis also establishes that insights from music-specific case studies can be applicable in other concrete social domains, such as different types of content recommendation. Showing the generality of insights from musical data in other contexts serves as evidence for the utility of music domains as testbeds for the development of general artificial intelligence techniques. Ultimately, this thesis demonstrates the overall usefulness of taking a sequential decision making approach in settings previously unexplored from this perspective.

    Scalable sequential alternating proximal methods for sparse structural SVMs and CRFs

    Structural Support Vector Machines (SSVMs) and Conditional Random Fields (CRFs) are popular discriminative methods used for classifying structured and complex objects like parse trees, image segments and part-of-speech tags. The datasets involved are very high-dimensional, and the models designed using typical training algorithms for SSVMs and CRFs are non-sparse. This non-sparse nature of the models results in slow inference. Thus, there is a need to devise new algorithms for sparse SSVM and CRF classifier design. The use of the elastic net and the L1 regularizer has already been explored for solving the primal CRF and SSVM problems, respectively, to design sparse classifiers. In this work, we focus on the dual elastic net regularized SSVM and CRF. By exploiting the weakly coupled structure of these convex programming problems, we propose a new sequential alternating proximal (SAP) algorithm to solve these dual problems. This algorithm works by sequentially visiting each training set example and solving a simple subproblem restricted to a small subset of variables associated with that example. Numerical experiments on various benchmark sequence labeling datasets demonstrate that the proposed algorithm scales well. Further, the classifiers designed are sparser than those designed by solving the respective primal problems and demonstrate comparable generalization performance. Thus, the proposed SAP algorithm is a useful alternative for sparse SSVM and CRF classifier design.
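
As background on why elastic net regularization leads to sparse models, the sketch below shows the standard proximal operator of the elastic net penalty, which sets small coefficients exactly to zero. This is a generic textbook operator, not the SAP algorithm of the paper (which works on the dual problems). The function name `prox_elastic_net` and the parameter names `step`, `l1`, and `l2` are assumptions made for illustration.

```python
import numpy as np


def prox_elastic_net(v, step, l1, l2):
    """Proximal operator of step * (l1 * ||w||_1 + (l2 / 2) * ||w||_2^2) at v."""
    # Soft-thresholding handles the L1 part and produces exact zeros.
    shrunk = np.sign(v) * np.maximum(np.abs(v) - step * l1, 0.0)
    # The quadratic part only rescales the surviving coefficients.
    return shrunk / (1.0 + step * l2)


v = np.array([3.0, -0.5, 1.0])
w = prox_elastic_net(v, step=1.0, l1=1.0, l2=1.0)
print(w, "non-zeros:", np.count_nonzero(w))
```

Coefficients whose magnitude does not exceed `step * l1` are zeroed, which is the mechanism behind the sparsity that speeds up inference.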