
    Screening Rules for Convex Problems

    We propose a new framework for deriving screening rules for convex optimization problems. Our approach covers a large class of constrained and penalized optimization formulations and works in two steps. First, given any approximate point, the structure of the objective function and the duality gap are used to gather information on the optimal solution. In the second step, this information is used to produce screening rules, i.e. to safely identify unimportant weight variables of the optimal solution. Our general framework yields a large variety of useful existing as well as new screening rules for many applications. For example, we provide new screening rules for general simplex- and L1-constrained problems, the Elastic Net, squared-loss Support Vector Machines, the minimum enclosing ball, as well as structured-norm regularized problems such as the group lasso.
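
    The two-step recipe can be made concrete on a familiar special case. Below is a minimal NumPy sketch of a duality-gap sphere test for the Lasso, in the spirit of this framework; the dual-feasible point, the gap radius, and the unit screening threshold are the textbook gap-safe choices and are not necessarily the paper's exact rule.

        import numpy as np

        def gap_safe_screen_lasso(X, y, beta, lam):
            # Dual-feasible point: rescale the residual so that
            # ||X^T theta||_inf <= 1 holds.
            r = y - X @ beta
            theta = r / max(lam, np.max(np.abs(X.T @ r)))
            # Duality gap between the Lasso primal and its dual.
            primal = 0.5 * r @ r + lam * np.sum(np.abs(beta))
            dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
            gap = max(primal - dual, 0.0)
            # The dual optimum lies within this radius of theta, so the
            # sphere test below certifies coordinates that are zero at the
            # optimum and can be safely discarded.
            radius = np.sqrt(2.0 * gap) / lam
            scores = np.abs(X.T @ theta) + radius * np.linalg.norm(X, axis=0)
            return scores < 1.0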

    Learning vector quantization for proximity data

    Hofmann D. Learning vector quantization for proximity data. Bielefeld: Universität Bielefeld; 2016.

    Prototype-based classifiers such as learning vector quantization (LVQ) offer intuitive and flexible classification and learning rules. However, classical techniques are restricted to vectorial data only and hence are not suited for more complex data structures. Therefore, a few extensions of diverse LVQ variants to more general data, characterized by pairwise similarities or dissimilarities only, have recently been proposed in the literature. In this contribution, we propose a novel extension of LVQ to similarity data which is based on the kernelization of an underlying probabilistic model: kernel robust soft LVQ (KRSLVQ). Relying on the notion of a pseudo-Euclidean embedding of proximity data, we put this specific approach as well as existing alternatives into a general framework which characterizes the fundamental ways to extend LVQ towards proximity data: the main characteristics are given by the choice of the cost function, the interface to the data in terms of similarities or dissimilarities, and the way in which optimization takes place. In particular, the latter highlights the difference between popular kernel approaches and so-called relational approaches. While KRSLVQ and its alternatives reach state-of-the-art results, these extensions have two drawbacks compared to their vectorial counterparts: (i) training has quadratic complexity because the methods depend on the full proximity matrix; (ii) prototypes are no longer given by vectors but are represented as an implicit linear combination of data points, i.e. the interpretability of the prototypes is lost. We investigate different techniques to deal with these challenges: We speed up training by means of a low-rank Nyström approximation of the Gram matrix. In benchmarks, this strategy is successful if the considered data are intrinsically low-dimensional, and we propose a quick check to efficiently test this property prior to training. We extend KRSLVQ by sparse approximations of the prototypes: instead of full coefficient vectors, a few exemplars represent each prototype and can be directly inspected by practitioners in the same way as data. We compare different paradigms for inferring a sparse approximation: sparsity priors while training, geometric approaches including orthogonal matching pursuit and core techniques, and heuristic approximations based on the coefficients or proximities. We demonstrate the performance of these LVQ techniques on benchmark data, reaching state-of-the-art results, discuss the behavior of the methods regarding quality, sparsity, and representativity, and propose different measures to quantitatively evaluate the performance of the approaches. Our findings appeared in international publication venues, including three journal articles [6, 9, 2], four conference papers [8, 5, 7, 1], and two workshop contributions [4, 3].

    References
    [1] A. Gisbrecht, D. Hofmann, and B. Hammer. Discriminative dimensionality reduction mappings. Advances in Intelligent Data Analysis, 7619: 126–138, 2012.
    [2] B. Hammer, D. Hofmann, F.-M. Schleif, and X. Zhu. Learning vector quantization for (dis-)similarities. Neurocomputing, 131: 43–51, 2014.
    [3] D. Hofmann. Sparse approximations for kernel robust soft LVQ. Mittweida Workshop on Computational Intelligence, 2013.
    [4] D. Hofmann, A. Gisbrecht, and B. Hammer. Discriminative probabilistic prototype based models in kernel space. New Challenges in Neural Computation, TR Machine Learning Reports, 2012.
    [5] D. Hofmann, A. Gisbrecht, and B. Hammer. Efficient approximations of kernel robust soft LVQ. Workshop on Self-Organizing Maps, 198: 183–192, 2012.
    [6] D. Hofmann, A. Gisbrecht, and B. Hammer. Efficient approximations of robust soft learning vector quantization for non-vectorial data. Neurocomputing, 147: 96–106, 2015.
    [7] D. Hofmann and B. Hammer. Kernel robust soft learning vector quantization. Artificial Neural Networks in Pattern Recognition, 7477: 14–23, 2012.
    [8] D. Hofmann and B. Hammer. Sparse approximations for kernel learning vector quantization. European Symposium on Artificial Neural Networks, 549–554, 2013.
    [9] D. Hofmann, F.-M. Schleif, B. Paaßen, and B. Hammer. Learning interpretable kernelized prototype-based models. Neurocomputing, 141: 84–96, 2014.
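
    To illustrate the Nyström speed-up used above, here is a minimal NumPy sketch; the kernel function, the number of landmarks m, and the toy RBF example are illustrative assumptions rather than the thesis's exact experimental setup.

        import numpy as np

        def nystroem(kernel, X, m, seed=0):
            # Rank-m approximation K ~ C @ pinv(W) @ C.T, which avoids
            # forming and storing the full n x n Gram matrix.
            rng = np.random.default_rng(seed)
            idx = rng.choice(len(X), size=m, replace=False)
            C = kernel(X, X[idx])        # n x m block against the landmarks
            W = C[idx, :]                # m x m block among the landmarks
            return C, np.linalg.pinv(W)  # use as (C @ W_pinv) @ C.T

        # Toy usage with an RBF kernel:
        rbf = lambda A, B: np.exp(-0.5 * ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1))
        X = np.random.default_rng(0).normal(size=(200, 3))
        C, W_pinv = nystroem(rbf, X, m=20)
        K_approx = C @ W_pinv @ C.T      # close to rbf(X, X) if the data are low-rank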

    Conditional Gradient Methods

    The purpose of this survey is to serve both as a gentle introduction to and a coherent overview of state-of-the-art Frank--Wolfe algorithms, also called conditional gradient algorithms, for function minimization. These algorithms are especially useful in convex optimization when linear optimization is cheaper than projection. The selection of material has been guided by the principle of highlighting crucial ideas as well as presenting new approaches that we believe might become important in the future, with ample citations even of older works imperative to the development of newer methods. Yet, our selection is sometimes biased, need not reflect the consensus of the research community, and we have certainly missed recent important contributions. After all, the research area of Frank--Wolfe is very active, making it a moving target. We apologize sincerely in advance for any such distortions, and we fully acknowledge: we stand on the shoulders of giants.
    Comment: 238 pages with many figures. The FrankWolfe.jl Julia package (https://github.com/ZIB-IOL/FrankWolfe.jl) provides state-of-the-art implementations of many Frank--Wolfe methods.
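
    For orientation, a minimal sketch of the basic conditional gradient iteration follows, using the l1 ball as the feasible set; the LMO and the 2/(t+2) step size are textbook defaults, and the sketch is independent of the FrankWolfe.jl package mentioned in the comment.

        import numpy as np

        def frank_wolfe(grad, lmo, x0, steps=100):
            # Projection-free update: each iterate is a convex combination
            # of vertices returned by the linear minimization oracle (LMO).
            x = x0.copy()
            for t in range(steps):
                s = lmo(grad(x))              # argmin_{s in C} <grad(x), s>
                gamma = 2.0 / (t + 2.0)       # standard open-loop step size
                x = (1.0 - gamma) * x + gamma * s
            return x

        # LMO for the l1 ball of radius tau: a signed, scaled basis vector.
        def lmo_l1(g, tau=1.0):
            s = np.zeros_like(g)
            i = np.argmax(np.abs(g))
            s[i] = -tau * np.sign(g[i])
            return s

        # Toy usage: minimize 0.5*||x - b||^2 over the unit l1 ball.
        b = np.array([0.5, -2.0, 1.0])
        x_star = frank_wolfe(lambda x: x - b, lmo_l1, np.zeros(3))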

    Definition and learning of logic-based kernels for categorical data, and application to collaborative filtering

    The continuous pursuit of better prediction quality has gradually led to the development of increasingly complex machine learning models, e.g., deep neural networks. Despite their great success in many domains, the black-box nature of these models makes them unsuitable for applications in which model understanding is at least as important as prediction accuracy, such as medical applications. On the other hand, more interpretable models, such as decision trees, are in general much less accurate. In this thesis, we try to merge the positive aspects of these two realities by injecting interpretable elements into complex methods. We focus on kernel methods, whose elegant framework decouples learning algorithms from data representations. In particular, the first main contribution of this thesis is the proposal of a new family of Boolean kernels, i.e., kernels defined on binary data, with the aim of creating interpretable feature spaces. Assuming binary input vectors, the core idea is to build embedding spaces in which the dimensions represent logical formulas (of a specific form) of the input variables. As a result, the solution of a kernel machine can be represented as a weighted sum of logical propositions, from which human-readable rules can be extracted. Our framework provides a constructive and efficient way to calculate Boolean kernels of different forms (e.g., disjunctive, conjunctive, DNF, CNF). We show that on binary classification tasks over categorical datasets the proposed kernels achieve state-of-the-art performance, and we provide some theoretical properties about the expressiveness of such kernels. The second main contribution is a new multiple kernel learning (MKL) algorithm that automatically learns the best representation, avoiding validation. We start from a theoretical result which states that, under mild conditions, any dot-product kernel can be seen as a linear non-negative combination of Boolean conjunctive kernels. From this combination, our MKL algorithm learns non-parametrically the best combination of the conjunctive kernels. The algorithm is designed to optimize the radius-margin ratio of the combined kernel, which has been shown to be an upper bound on the leave-one-out error. An extensive empirical evaluation on several binary classification tasks shows how our MKL technique is able to outperform state-of-the-art MKL approaches. A third contribution is the proposal of another kernel family for binary input data, which aims to overcome the limitations of the Boolean kernels. Here the focus is not exclusively on interpretability but also on expressivity: with this new framework, which we dubbed the propositional kernel framework, it is possible to build kernel functions that create feature spaces containing almost any kind of logical proposition. Finally, the last contribution is the application of the Boolean kernels to recommender systems, specifically to top-N recommendation tasks. We propose a novel kernel-based collaborative filtering method and apply our Boolean kernels on top of it. Empirical results on several collaborative filtering datasets show how less expressive kernels can alleviate the sparsity issue that is typical of this kind of application.
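
    As a concrete instance of such a Boolean kernel, below is a minimal sketch of the monotone conjunctive kernel of degree d on binary data, which counts the conjunctions of d variables true on both inputs; the combinatorial form C(<x, z>, d) is standard for this kernel, though the thesis's exact definitions and normalizations may differ.

        from math import comb

        import numpy as np

        def monotone_conjunctive_kernel(X, Z, d):
            # Entry (i, j) counts the d-variable conjunctions satisfied by
            # both binary rows X[i] and Z[j]: choose d co-active variables.
            dot = (X @ Z.T).astype(int)
            return np.vectorize(lambda s: comb(s, d))(dot)

        # Toy usage on 0/1 data:
        X = np.array([[1, 1, 0, 1], [0, 1, 1, 1]])
        print(monotone_conjunctive_kernel(X, X, d=2))  # [[3 1] [1 3]]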

    Discriminant feature extraction and selection for person-independent facial expression recognition

    This thesis develops new facial expression recognition techniques based on 2D/3D images or videos, with the purpose of improving the recognition efficiency and accuracy of the current state of the art. A fully automatic facial expression recognition system is designed, including real-time landmark detection, spatio-temporal feature extraction, hierarchical classification, and the identification of the most discriminant facial regions for expression recognition. Overall, the proposed system improves on the facial expression recognition state of the art.

    Ocean Remote Sensing with Synthetic Aperture Radar

    The ocean covers approximately 71% of the Earth's surface and 90% of the biosphere, and contains 97% of Earth's water. Synthetic Aperture Radar (SAR) can image the ocean surface in all weather conditions, day or night. SAR remote sensing for ocean and coastal monitoring has become a research hotspot in geoscience and remote sensing. This book, Progress in SAR Oceanography, provides an update on the current state of the science of ocean remote sensing with SAR. Overall, the book presents a variety of marine applications, such as oceanic surface and internal waves, wind, bathymetry, oil spills, coastline and intertidal zone classification, detection of ships and other man-made objects, as well as remotely sensed data assimilation. The book is aimed at a wide audience, ranging from graduate students, university teachers, and working scientists to policy makers and managers. Efforts have been made to highlight general principles as well as state-of-the-art technologies in the field of SAR oceanography.

    Quantum algorithms for machine learning and optimization

    The theories of optimization and machine learning answer foundational questions in computer science and lead to new algorithms for practical applications. While these topics have been extensively studied in the context of classical computing, their quantum counterparts are far from well understood. In this thesis, we explore algorithms that bridge the gap between the fields of quantum computing and machine learning. First, we consider general optimization problems with only function evaluations. For two core problems, namely general convex optimization and volume estimation of convex bodies, we give quantum algorithms as well as quantum lower bounds which together establish that the quantum speedups of both problems are polynomial compared to their classical counterparts. We then consider machine learning and optimization problems with input data stored explicitly as matrices. We first look at semidefinite programs and provide quantum algorithms with polynomial speedup compared to the classical state of the art. We then move to machine learning and give optimal quantum algorithms for linear and kernel-based classification. To complement our quantum algorithms, we also introduce a framework for quantum-inspired classical algorithms, showing that for low-rank matrix arithmetic there can only be a polynomial quantum speedup. Finally, we study statistical problems on quantum computers, with a focus on testing properties of probability distributions. We show that for testing various properties, including L1-distance, L2-distance, and Shannon and Rényi entropies, there are polynomial quantum speedups compared to the classical counterparts. We also extend these results to testing properties of quantum states.