485 research outputs found

    Efficient Human Pose Estimation with Image-dependent Interactions

    Get PDF
    Human pose estimation from 2D images is one of the most challenging and computationally-demanding problems in computer vision. Standard models such as Pictorial Structures consider interactions between kinematically connected joints or limbs, leading to inference cost that is quadratic in the number of pixels. As a result, researchers and practitioners have restricted themselves to simple models which only measure the quality of limb-pair possibilities by their 2D geometric plausibility. In this talk, we propose novel methods which allow for efficient inference in richer models with data-dependent interactions. First, we introduce structured prediction cascades, a structured analog of binary cascaded classifiers, which learn to focus computational effort where it is needed, filtering out many states cheaply while ensuring the correct output is unfiltered. Second, we propose a way to decompose models of human pose with cyclic dependencies into a collection of tree models, and provide novel methods to impose model agreement. Finally, we develop a local linear approach that learns bases centered around modes in the training data, giving us image-dependent local models which are fast and accurate. These techniques allow for sparse and efficient inference on the order of minutes or seconds per image. As a result, we can afford to model pairwise interaction potentials much more richly with data-dependent features such as contour continuity, segmentation alignment, color consistency, optical flow and multiple modes. We show empirically that these richer models are worthwhile, obtaining significantly more accurate pose estimation on popular datasets

    Lifted Bayesian filtering in multi-entity systems

    Get PDF
    This thesis focuses on Bayesian filtering for systems that consist of multiple, interacting entites (e.g. agents or objects), which can naturally be described by Multiset Rewriting Systems (MRSs). The main insight is that the state space that is underling an MRS exhibits a certain symmetry, which can be exploited to increase inference efficiency. We provide an efficient, lifted filtering algorithm, which is able to achieve a factorial reduction in space and time complexity, compared to conventional, ground filtering.Diese Arbeit betrachtet Bayes'sche Filter in Systemen, die aus mehreren, interagierenden Entitäten (z.B. Agenten oder Objekten) bestehen. Die Systemdynamik solcher Systeme kann auf natürliche Art durch Multiset Rewriting Systems (MRS) spezifiziert werden. Die wesentliche Erkenntnis ist, dass der Zustandraum Symmetrien aufweist, die ausgenutzt werden können, um die Effizienz der Inferenz zu erhöhen. Wir führen einen effizienten, gelifteten Filter-Algorithmus ein, dessen Zeit- und Platzkomplexität gegenüber dem grundierten Algorithmus um einen faktoriellen Faktor reduziert ist

    Image partition and video segmentation using the Mumford-Shah functional

    Get PDF
    2010 - 2011The aim of this Thesis is to present an image partition and video segmentation procedure, based on the minimization of a modified version of Mumford-Shah functional. The Mumford-Shah functional used for image partition has been then extended to develop a video segmentation procedure. Differently by the image processing, in video analysis besides the usual spatial connectivity of pixels (or regions) on each single frame, we have a natural notion of “temporal” connectivity between pixels (or regions) on consecutive frames given by the optical flow. In this case, it makes sense to extend the tree data structure used to model a single image with a graph data structure that allows to handle a video sequence. The video segmentation procedure is based on minimization of a modified version of a Mumford-Shah functional. In particular the functional used for image partition allows to merge neighboring regions with similar color without considering their movement. Our idea has been to merge neighboring regions with similar color and similar optical flow vector. Also in this case the minimization of Mumford-Shah functional can be very complex if we consider each possible combination of the graph nodes. This computation becomes easy to do if we take into account a hierarchy of partitions constructed starting by the nodes of the graph.[edited by author]X n.s

    Learning Models for Discrete Optimization

    Get PDF
    We consider a class of optimization approaches that incorporate machine learning models into the algorithm structure. Our focus is on the algorithms that can learn the patterns in the search space in order to boost computational performance. The idea is to design optimization techniques that allow for computationally efficient tuning a priori. The final objective of this work is to provide efficient solvers that can be tuned for optimal performance in serial and parallel environments.This dissertation provides a novel machine learning model based on logistic regression and describes an implementation for scheduling problems. We incorporate the proposed learning model into a well-known optimization algorithm, tabu search, and demonstrate the potential of the underlying ideas. The dissertation also establishes a new framework for comparing optimization algorithms. This framework provides a comparison of algorithms that is statistically meaningful and intuitive. Using this framework, we demonstrate that the inclusion of the logistic regression model into the tabu search method provides significant boost of its performance. Finally, we study the parallel implementation of the algorithm and evaluate the algorithm performance when more connections between threads exist

    Combinatorial Solutions for Shape Optimization in Computer Vision

    Get PDF
    This thesis aims at solving so-called shape optimization problems, i.e. problems where the shape of some real-world entity is sought, by applying combinatorial algorithms. I present several advances in this field, all of them based on energy minimization. The addressed problems will become more intricate in the course of the thesis, starting from problems that are solved globally, then turning to problems where so far no global solutions are known. The first two chapters treat segmentation problems where the considered grouping criterion is directly derived from the image data. That is, the respective data terms do not involve any parameters to estimate. These problems will be solved globally. The first of these chapters treats the problem of unsupervised image segmentation where apart from the image there is no other user input. Here I will focus on a contour-based method and show how to integrate curvature regularity into a ratio-based optimization framework. The arising optimization problem is reduced to optimizing over the cycles in a product graph. This problem can be solved globally in polynomial, effectively linear time. As a consequence, the method does not depend on initialization and translational invariance is achieved. This is joint work with Daniel Cremers and Simon Masnou. I will then proceed to the integration of shape knowledge into the framework, while keeping translational invariance. This problem is again reduced to cycle-finding in a product graph. Being based on the alignment of shape points, the method actually uses a more sophisticated shape measure than most local approaches and still provides global optima. It readily extends to tracking problems and allows to solve some of them in real-time. I will present an extension to highly deformable shape models which can be included in the global optimization framework. This method simultaneously allows to decompose a shape into a set of deformable parts, based only on the input images. This is joint work with Daniel Cremers. In the second part segmentation is combined with so-called correspondence problems, i.e. the underlying grouping criterion is now based on correspondences that have to be inferred simultaneously. That is, in addition to inferring the shapes of objects, one now also tries to put into correspondence the points in several images. The arising problems become more intricate and are no longer optimized globally. This part is divided into two chapters. The first chapter treats the topic of real-time motion segmentation where objects are identified based on the observations that the respective points in the video will move coherently. Rather than pre-estimating motion, a single energy functional is minimized via alternating optimization. The main novelty lies in the real-time capability, which is achieved by exploiting a fast combinatorial segmentation algorithm. The results are furthermore improved by employing a probabilistic data term. This is joint work with Daniel Cremers. The final chapter presents a method for high resolution motion layer decomposition and was developed in combination with Daniel Cremers and Thomas Pock. Layer decomposition methods support the notion of a scene model, which allows to model occlusion and enforce temporal consistency. The contributions are twofold: from a practical point of view the proposed method allows to recover fine-detailed layer images by minimizing a single energy. This is achieved by integrating a super-resolution method into the layer decomposition framework. From a theoretical viewpoint the proposed method introduces layer-based regularity terms as well as a graph cut-based scheme to solve for the layer domains. The latter is combined with powerful continuous convex optimization techniques into an alternating minimization scheme. Lastly I want to mention that a significant part of this thesis is devoted to the recent trend of exploiting parallel architectures, in particular graphics cards: many combinatorial algorithms are easily parallelized. In Chapter 3 we will see a case where the standard algorithm is hard to parallelize, but easy for the respective problem instances

    Optimisation massivement multi-tâche sur grappes de calcul hétérogènes – Application aux problèmes de permutation

    Get PDF
    Branch-and-Bound (B&B) is a frequently used tree-search exploratory method for the exact resolution of combinatorial optimization problems (COPs). However, in practice, only small problem instances can be solved on a sequential computer, as B&B generates often generates a huge amount of subproblems to be evaluated. In order to solve large COPs, we revisit the design and implementation of massively parallel B&B on top of large heterogeneous clusters, integrating multi-core CPUs, many-core processors and GPUs.For the efficient storage and management of subproblems an original data structure (IVM) dedicated to permutation problems is used. Because of the highly irregular and unpredictable shape of the B&B tree, dynamic load balancing between parallel exploration processes is one of the main issues addressed in this thesis. Based on a compact encoding of the search space in the form of intervals, work stealing strategies for multi-core and GPU are proposed, as well as hierarchical approaches for load balancing in distributed memory multi-CPU/multi-GPU systems. Three permutation problems, the Flowshop Scheduling Problem (FSP), the Quadratic Assignment Problem (QAP) and the n-Queens puzzle problem are used as test-cases.The resolution, in 9 hours, of a FSP instance with an estimated sequential execution time of 22 years demonstrates the scalability of the proposed algorithms on a cluster composed of 36 GPUs.L'algorithme Branch-and-Bound (B&B) est une méthode de recherche arborescente fréquemment utilisé pour la résolution exacte de problèmes d'optimisation combinatoire (POC). Néanmoins, seules des petites instances peuvent être effectivement résolues sur une machine séquentielle, le nombre de sous-problèmes à évaluer étant souvent très grand. Visant la resolution de POC de grande taille, nous réexaminons la conception et l'implémentation d'algorithmes B&B massivement parallèles sur de larges plateformes hétérogènes de calcul, intégrant des processeurs multi-coeurs, many-cores et et processeurs graphiques (GPUs). Pour une représentation compacte en mémoire des sous-problèmes une structure de données originale (IVM), dédiée aux problèmes de permutation est utilisée. En raison de la forte irrégularité de l'arbre de recherche, l'équilibrage de charge dynamique entre processus d'exploration parallèles occupe une place centrale dans cette thèse. Basés sur un encodage compact de l'espace de recherche sous forme d'intervalles, des stratégies de vol de tâches sont proposées pour processeurs multi-core et GPU, ainsi une approche hiérarchique pour l'équilibrage de charge dans les systèmes multi-GPU et multi-CPU à mémoire distribuée. Trois problèmes d'optimisation définis sur l'ensemble des permutations, le problème d'ordonnancement Flow-Shop (FSP), d'affectation quadratique (QAP) et le problème des n-dames sont utilisés comme cas d'étude. La resolution en 9 heures d'une instance du FSP dont le temps de résolution séquentiel est estimé à 22 ans demontre la capacité de passage à l'échelle des algorithmes proposés sur une grappe de calcul composé de 36 GPUs

    Training issues and learning algorithms for feedforward and recurrent neural networks

    Get PDF
    Ph.DDOCTOR OF PHILOSOPH

    Learning understandable classifier models.

    Get PDF
    The topic of this dissertation is the automation of the process of extracting understandable patterns and rules from data. An unprecedented amount of data is available to anyone with a computer connected to the Internet. The disciplines of Data Mining and Machine Learning have emerged over the last two decades to face this challenge. This has led to the development of many tools and methods. These tools often produce models that make very accurate predictions about previously unseen data. However, models built by the most accurate methods are usually hard to understand or interpret by humans. In consequence, they deliver only decisions, and are short of any explanations. Hence they do not directly lead to the acquisition of new knowledge. This dissertation contributes to bridging the gap between the accurate opaque models and those less accurate but more transparent for humans. This dissertation first defines the problem of learning from data. It surveys the state-of-the-art methods for supervised learning of both understandable and opaque models from data, as well as unsupervised methods that detect features present in the data. It describes popular methods of rule extraction from unintelligible models which rewrite them into an understandable form. Limitations of rule extraction are described. A novel definition of understandability which ties computational complexity and learning is provided to show that rule extraction is an NP-hard problem. Next, a discussion whether one can expect that even an accurate classifier has learned new knowledge. The survey ends with a presentation of two approaches to building of understandable classifiers. On the one hand, understandable models must be able to accurately describe relations in the data. On the other hand, often a description of the output of a system in terms of its input requires the introduction of intermediate concepts, called features. Therefore it is crucial to develop methods that describe the data with understandable features and are able to use those features to present the relation that describes the data. Novel contributions of this thesis follow the survey. Two families of rule extraction algorithms are considered. First, a method that can work with any opaque classifier is introduced. Artificial training patterns are generated in a mathematically sound way and used to train more accurate understandable models. Subsequently, two novel algorithms that require that the opaque model is a Neural Network are presented. They rely on access to the network\u27s weights and biases to induce rules encoded as Decision Diagrams. Finally, the topic of feature extraction is considered. The impact on imposing non-negativity constraints on the weights of a neural network is considered. It is proved that a three layer network with non-negative weights can shatter any given set of points and experiments are conducted to assess the accuracy and interpretability of such networks. Then, a novel path-following algorithm that finds robust sparse encodings of data is presented. In summary, this dissertation contributes to improved understandability of classifiers in several tangible and original ways. It introduces three distinct aspects of achieving this goal: infusion of additional patterns from the underlying pattern distribution into rule learners, the derivation of decision diagrams from neural networks, and achieving sparse coding with neural networks with non-negative weights
    corecore