
    Testing and Learning on Distributions with Symmetric Noise Invariance

    Kernel embeddings of distributions and the Maximum Mean Discrepancy (MMD), the resulting distance between distributions, are useful tools for fully nonparametric two-sample testing and learning on distributions. However, it is rare that all possible differences between samples are of interest -- discovered differences can be due to different types of measurement noise, data collection artefacts, or other irrelevant sources of variability. We propose distances between distributions which encode invariance to additive symmetric noise, aimed at testing whether the assumed true underlying processes differ. Moreover, we construct invariant features of distributions, leading to learning algorithms robust to the impairment of the input distributions with symmetric additive noise. Comment: 22 pages
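    The distances proposed above build on kernel mean embeddings and the MMD. As a point of reference, here is a minimal numpy sketch of the standard unbiased MMD^2 estimator with a Gaussian kernel -- the baseline, noise-sensitive statistic that the invariant distances are designed to replace; the kernel bandwidth and the toy samples are illustrative assumptions, not choices from the paper.

        import numpy as np

        def gaussian_kernel(X, Y, bandwidth=1.0):
            # Pairwise Gaussian (RBF) kernel matrix: k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2)).
            sq_dists = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
            return np.exp(-sq_dists / (2 * bandwidth**2))

        def mmd2_unbiased(X, Y, bandwidth=1.0):
            # Unbiased estimate of MMD^2 between samples X (n x d) and Y (m x d):
            # off-diagonal means of k within each sample minus twice the cross-sample mean.
            n, m = len(X), len(Y)
            Kxx = gaussian_kernel(X, X, bandwidth)
            Kyy = gaussian_kernel(Y, Y, bandwidth)
            Kxy = gaussian_kernel(X, Y, bandwidth)
            term_xx = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
            term_yy = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
            return term_xx + term_yy - 2 * Kxy.mean()

        # Example: the underlying processes coincide, but one sample carries extra symmetric noise.
        rng = np.random.default_rng(0)
        X = rng.normal(0.0, 1.0, size=(500, 1))
        Y = rng.normal(0.0, 1.0, size=(500, 1)) + rng.laplace(0.0, 0.5, size=(500, 1))
        print(mmd2_unbiased(X, Y))  # plain MMD reacts to the noise that the paper's distances ignore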

    Sticky Seeding in Discrete-Time Reversible-Threshold Networks

    When nodes can repeatedly update their behavior (as in agent-based models from computational social science or repeated-game play settings), the problem of optimal network seeding becomes very complex. For a popular spreading-phenomena model of binary-behavior updating based on thresholds of adoption among neighbors, we consider several planning problems in the design of \textit{Sticky Interventions}: when adoption decisions are reversible, the planner aims to find a Seed Set where temporary intervention leads to long-term behavior change. We prove that completely converting a network at minimum cost is $\Omega(\ln(\mathrm{OPT}))$-hard to approximate and that maximizing conversion subject to a budget is $(1-\frac{1}{e})$-hard to approximate. Optimization heuristics which rely on many objective function evaluations may still be practical, particularly in relatively sparse networks: we prove that the long-term impact of a Seed Set can be evaluated in $O(|E|^2)$ operations. For a more descriptive model variant in which some neighbors may be more influential than others, we show that under integer edge weights from $\{0,1,2,\dots,k\}$ objective function evaluation requires only $O(k|E|^2)$ operations. These operation bounds are based on improvements we give for bounds on time-steps-to-convergence under discrete-time reversible-threshold updates in networks. Comment: 19 pages, 2 figures
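    For intuition about the objective being optimized, the sketch below simulates discrete-time reversible-threshold updates from a temporary seed set on a small unweighted graph until the adoption profile stabilizes. It is a naive illustration of evaluating a Seed Set's long-term impact, not the $O(|E|^2)$ evaluation or the convergence bounds proved in the paper; the graph, thresholds, and seed set are made-up examples.

        def converge_thresholds(adj, thresholds, seed_set, max_steps=10_000):
            # adj: dict node -> set of neighbours; thresholds: dict node -> adopting neighbours needed.
            # Seeds adopt at step 0; afterwards every node (seeds included, since the intervention
            # is only temporary) re-evaluates synchronously: adopt iff enough neighbours adopt.
            state = {v: (v in seed_set) for v in adj}
            for _ in range(max_steps):
                new_state = {v: sum(state[u] for u in adj[v]) >= thresholds[v] for v in adj}
                if new_state == state:  # fixed point reached
                    return state
                state = new_state
            return state  # synchronous reversible-threshold dynamics may also settle into a 2-cycle

        # Tiny made-up example: a triangle a-b-c with a pendant node d, all thresholds equal to 1.
        adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
        thresholds = {"a": 1, "b": 1, "c": 1, "d": 1}
        final = converge_thresholds(adj, thresholds, seed_set={"a"})
        print(sum(final.values()), "nodes converted in the long term")  # seeding {a} converts all 4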

    Training Gaussian Mixture Models at Scale via Coresets

    How can we train a statistical mixture model on a massive data set? In this work we show how to construct coresets for mixtures of Gaussians. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size polynomial in dimension and the number of mixture components, while being independent of the data set size. Hence, one can harness computationally intensive algorithms to compute a good approximation on a significantly smaller data set. More importantly, such coresets can be efficiently constructed both in distributed and streaming settings and do not impose restrictions on the data generating process. Our results rely on a novel reduction of statistical estimation to problems in computational geometry and new combinatorial complexity results for mixtures of Gaussians. Empirical evaluation on several real-world datasets suggests that our coreset-based approach enables a significant reduction in training time with negligible approximation error.
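    One common way to realise such a coreset in practice is sensitivity-based importance sampling: points that a rough clustering represents poorly are kept with higher probability and reweighted by the inverse of that probability, so weighted statistics over the coreset approximate statistics over the full data. The numpy sketch below shows that generic scheme; the proposal distribution, its constants, and the example data are simplified illustrations, not the paper's exact construction or its guarantees.

        import numpy as np

        def coreset_sample(X, m, n_rough_centers=10, seed=0):
            # Importance-sample m points with probability proportional to a crude bound on each
            # point's contribution (uniform mass plus squared distance to the nearest of a few
            # random "rough" centers), then weight by 1 / (m * probability) so that weighted sums
            # over the coreset are unbiased estimates of sums over the full data set X.
            rng = np.random.default_rng(seed)
            n = len(X)
            centers = X[rng.choice(n, size=n_rough_centers, replace=False)]
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).min(axis=1)
            sensitivity = 0.5 / n + 0.5 * d2 / d2.sum()  # mix of uniform and distance-based terms
            prob = sensitivity / sensitivity.sum()
            idx = rng.choice(n, size=m, replace=True, p=prob)
            return X[idx], 1.0 / (m * prob[idx])

        # Example: compress 100k points into a weighted coreset of 1k before running (weighted) EM.
        rng = np.random.default_rng(1)
        X = np.concatenate([rng.normal(-3, 1, (50_000, 2)), rng.normal(3, 1, (50_000, 2))])
        C, w = coreset_sample(X, m=1_000)
        print(C.shape, w.sum() / len(X))  # the weights sum to roughly the original data set size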

    Learning Sparse High Dimensional Filters: Image Filtering, Dense CRFs and Bilateral Neural Networks

    Bilateral filters have widespread use due to their edge-preserving properties. The common use case is to manually choose a parametric filter type, usually a Gaussian filter. In this paper, we will generalize the parametrization and in particular derive a gradient descent algorithm so the filter parameters can be learned from data. This derivation allows us to learn high-dimensional linear filters that operate in sparsely populated feature spaces. We build on the permutohedral lattice construction for efficient filtering. The ability to learn more general forms of high-dimensional filters can be used in several diverse applications. First, we demonstrate the use in applications where single filter applications are desired for runtime reasons. Further, we show how this algorithm can be used to learn the pairwise potentials in densely connected conditional random fields and apply these to different image segmentation tasks. Finally, we introduce layers of bilateral filters in CNNs and propose bilateral neural networks for the use of high-dimensional sparse data. This view provides new ways to encode model structure into network architectures. A diverse set of experiments empirically validates the usage of general forms of filters.
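    As background for what is being generalised, the sketch below implements a brute-force Gaussian bilateral filter on a 1-D signal: each output value is a normalised average over all positions, weighted by a Gaussian in spatial distance times a Gaussian in intensity difference, which is what makes the filter edge-preserving. The paper replaces this fixed, hand-parametrised form with learnable sparse high-dimensional filters on the permutohedral lattice; the two bandwidths and the toy signal here are illustrative choices only.

        import numpy as np

        def bilateral_filter_1d(signal, sigma_spatial=2.0, sigma_range=0.1):
            # Naive O(n^2) bilateral filter: weight neighbour j of position i by a Gaussian in the
            # spatial distance |i - j| and a Gaussian in the intensity difference, then normalise.
            idx = np.arange(len(signal))
            spatial = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2 * sigma_spatial ** 2))
            rangew = np.exp(-((signal[:, None] - signal[None, :]) ** 2) / (2 * sigma_range ** 2))
            weights = spatial * rangew
            return (weights @ signal) / weights.sum(axis=1)

        # Example: a noisy step edge is smoothed on the flat parts while the edge itself survives.
        rng = np.random.default_rng(0)
        x = np.concatenate([np.zeros(50), np.ones(50)]) + 0.05 * rng.normal(size=100)
        y = bilateral_filter_1d(x)
        print(float(abs(y[49] - y[50])))  # the jump at the edge stays close to 1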

    One-class classifiers based on entropic spanning graphs

    One-class classifiers offer valuable tools to assess the presence of outliers in data. In this paper, we propose a design methodology for one-class classifiers based on entropic spanning graphs. Our approach also accounts for the possibility of processing non-numeric data by means of an embedding procedure. The spanning graph is learned on the embedded input data, and the resulting partition of vertices defines the classifier. The final partition is derived by exploiting a criterion based on mutual information minimization. Here, we compute the mutual information by using a convenient formulation provided in terms of the $\alpha$-Jensen difference. Once training is completed, in order to associate a confidence level with the classifier decision, a graph-based fuzzy model is constructed. The fuzzification process is based only on topological information of the vertices of the entropic spanning graph. As such, the proposed one-class classifier is suitable also for data characterized by complex geometric structures. We provide experiments on well-known benchmarks containing both feature vectors and labeled graphs. In addition, we apply the method to the protein solubility recognition problem by considering several representations for the input samples. Experimental results demonstrate the effectiveness and versatility of the proposed method with respect to other state-of-the-art approaches. Comment: Extended and revised version of the paper "One-Class Classification Through Mutual Information Minimization" presented at the 2016 IEEE IJCNN, Vancouver, Canada
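    To make the entropic-spanning-graph ingredient concrete, the sketch below builds a Euclidean minimum spanning tree over (already embedded) samples with scipy and reports its total power-weighted edge length, the kind of graph functional that underlies non-parametric Renyi-type entropy estimates. This illustrates only that building block, not the mutual-information-minimisation training, the vertex partitioning, or the fuzzy confidence model described in the paper; the data and the exponent are assumptions for the example.

        import numpy as np
        from scipy.spatial.distance import pdist, squareform
        from scipy.sparse.csgraph import minimum_spanning_tree

        def mst_length(X, gamma=1.0):
            # Total (power-weighted) edge length of the Euclidean minimum spanning tree over the
            # rows of X: shorter trees indicate more concentrated, lower-entropy samples.
            dists = squareform(pdist(X))
            mst = minimum_spanning_tree(dists)  # sparse matrix holding the n-1 selected edge weights
            return float(np.power(mst.data, gamma).sum())

        # Example: a tight cluster yields a much shorter spanning tree than a diffuse one.
        rng = np.random.default_rng(0)
        tight = rng.normal(0.0, 0.1, size=(200, 2))
        diffuse = rng.normal(0.0, 1.0, size=(200, 2))
        print(mst_length(tight), mst_length(diffuse))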

    Completeness results for quasi-categories of algebras, homotopy limits, and related general constructions

    Consider a diagram of quasi-categories that admit, and functors that preserve, limits or colimits of a fixed shape. We show that any weighted limit whose weight is a projective cofibrant simplicial functor is again a quasi-category admitting these (co)limits, and that they are preserved by the functors in the limit cone. In particular, the Bousfield-Kan homotopy limit of a diagram of quasi-categories admits any limits or colimits existing in and preserved by the functors in that diagram. In previous work, we demonstrated that the quasi-category of algebras for a homotopy coherent monad could be described as a weighted limit with projective cofibrant weight, so these results immediately provide us with important (co)completeness results for quasi-categories of algebras. These generalise most of the classical categorical results, except for a well-known theorem which shows that limits lift to the category of algebras for any monad, regardless of whether its functor part preserves those limits. The second half of this paper establishes this more general result in the quasi-categorical setting: showing that the monadic forgetful functor of the quasi-category of algebras for a homotopy coherent monad creates all limits that exist in the base quasi-category, without further assumption on the monad. This proof relies upon a more delicate and explicit analysis of the particular weight used to define quasi-categories of algebras. Comment: 33 pages; a sequel to arXiv:1306.5144 and arXiv:1310.8279; v3: final journal version with updated internal references to the new version of "Homotopy coherent adjunctions and the formal theory of monads"
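    For reference, the classical 1-categorical statement that the second half of the paper lifts to quasi-categories can be written as follows (a standard fact about monads, recorded here only to anchor the generalisation).

        % For any monad T on a category C, the forgetful functor from the Eilenberg-Moore
        % category of T-algebras creates every limit that exists in C, with no assumption
        % that the functor part of T preserves those limits.
        U^{T} \colon C^{T} \longrightarrow C \qquad \text{creates all limits that } C \text{ admits.}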

    Low Rank Approximation of Binary Matrices: Column Subset Selection and Generalizations

    Low rank matrix approximation is an important tool in machine learning. Given a data matrix, low rank approximation helps to find factors and patterns and provides concise representations for the data. Research on low rank approximation usually focuses on real matrices. However, in many applications data are binary (categorical) rather than continuous, which leads to the problem of low rank approximation of binary matrices. Here we are given a $d \times n$ binary matrix $A$ and a small integer $k$. The goal is to find two binary matrices $U$ and $V$ of sizes $d \times k$ and $k \times n$ respectively, so that the Frobenius norm of $A - UV$ is minimized. There are two models of this problem, depending on the definition of the dot product of binary vectors: the $\mathrm{GF}(2)$ model and the Boolean semiring model. Unlike low rank approximation of real matrices, which can be efficiently solved by Singular Value Decomposition, approximation of binary matrices is NP-hard even for $k=1$. In this paper, we consider the problem of Column Subset Selection (CSS), in which one low rank matrix must be formed by $k$ columns of the data matrix. We characterize the approximation ratio of CSS for binary matrices. For the $\mathrm{GF}(2)$ model, we show the approximation ratio of CSS is bounded by $\frac{k}{2}+1+\frac{k}{2(2^k-1)}$ and this bound is asymptotically tight. For the Boolean model, it turns out that CSS is no longer sufficient to obtain a bound. We then develop a Generalized CSS (GCSS) procedure in which the columns of one low rank matrix are generated from Boolean formulas operating bitwise on columns of the data matrix. We show the approximation ratio of GCSS is bounded by $2^{k-1}+1$, and the exponential dependency on $k$ is inherent. Comment: 38 pages
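    A brute-force reference implementation of Column Subset Selection in the $\mathrm{GF}(2)$ model is sketched below: for each choice of $k$ data columns, the best mod-2 combination for every column of $A$ is found by enumerating all $2^k$ coefficient vectors, and the outer loop tries every $k$-subset. For binary matrices the squared Frobenius error is just the number of mismatched entries. This exponential search only illustrates the objective; it is not the approximation analysis or the GCSS procedure of the paper, and the example matrix is made up.

        import numpy as np
        from itertools import combinations, product

        def css_gf2_bruteforce(A, k):
            # Exhaustive Column Subset Selection under the GF(2) model: pick k columns of A as U,
            # reconstruct each column of A by its best XOR-combination (mod-2 sum) of those columns,
            # and return the minimum number of mismatched entries with the chosen column indices.
            d, n = A.shape
            all_coeffs = np.array(list(product([0, 1], repeat=k)))  # every v in {0,1}^k
            best_cost, best_cols = None, None
            for cols in combinations(range(n), k):
                U = A[:, cols]                      # d x k candidate basis made of data columns
                recon = (all_coeffs @ U.T) % 2      # 2^k x d: all reconstructable columns U v
                cost = sum(int(((recon - A[:, j]) % 2).sum(axis=1).min()) for j in range(n))
                if best_cost is None or cost < best_cost:
                    best_cost, best_cols = cost, cols
            return best_cost, best_cols

        # Tiny example: approximate a random 4 x 5 binary matrix by k = 2 of its own columns.
        rng = np.random.default_rng(0)
        A = rng.integers(0, 2, size=(4, 5))
        print(css_gf2_bruteforce(A, k=2))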