
    Robust Sparse Coding via Self-Paced Learning

    Sparse coding (SC) is attracting more and more attention due to its comprehensive theoretical studies and its excellent performance in many signal processing applications. However, most existing sparse coding algorithms are nonconvex and are thus prone to getting stuck in bad local minima, especially when there are outliers and noisy data. To enhance learning robustness, in this paper we propose a unified framework named Self-Paced Sparse Coding (SPSC), which gradually includes matrix elements in SC learning from easy to complex. We also generalize the self-paced learning scheme to different levels of dynamic selection, on samples, features, and elements respectively. Experimental results on real-world data demonstrate the efficacy of the proposed algorithms.
    Comment: submitted to AAAI201
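    A minimal numpy sketch of the element-level self-paced idea described above, under our own assumptions (a fixed dictionary, an ISTA inner solver, and the usual hard 0/1 self-paced weights); it is an illustration of the scheme, not the authors' SPSC implementation:

```python
import numpy as np

def soft_threshold(Z, t):
    """Elementwise soft-thresholding operator used by ISTA."""
    return np.sign(Z) * np.maximum(np.abs(Z) - t, 0.0)

def self_paced_sparse_coding(X, D, gamma=0.1, lam0=0.5, mu=1.3,
                             outer_iters=5, inner_iters=100):
    """Toy self-paced sparse coding: alternate between (1) selecting 'easy'
    matrix elements whose squared residual is below the pace parameter lam,
    and (2) re-fitting sparse codes A on the selected elements via weighted
    ISTA. lam grows by the factor mu each round so harder elements are
    gradually included."""
    n_atoms = D.shape[1]
    A = np.zeros((n_atoms, X.shape[1]))
    lam = lam0
    step = 1.0 / (np.linalg.norm(D, 2) ** 2 + 1e-12)  # ISTA step size
    for _ in range(outer_iters):
        # (1) self-paced selection: weight 1 for easy elements, 0 otherwise
        R = X - D @ A
        V = (R ** 2 < lam).astype(float)
        # (2) sparse coding on the selected elements (weighted ISTA)
        for _ in range(inner_iters):
            grad = -D.T @ (V * (X - D @ A))
            A = soft_threshold(A - step * grad, step * gamma)
        lam *= mu  # admit harder elements next round
    return A

# Usage: recover sparse codes for noisy data containing a few gross outliers.
rng = np.random.default_rng(0)
D = rng.standard_normal((20, 30)); D /= np.linalg.norm(D, axis=0)
A_true = np.where(rng.random((30, 50)) < 0.1, rng.standard_normal((30, 50)), 0)
X = D @ A_true + 0.01 * rng.standard_normal((20, 50))
X[rng.random(X.shape) < 0.02] += 5.0  # outlier elements
A_hat = self_paced_sparse_coding(X, D)
```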

    Combinatorial rigidity of Incidence systems and Application to Dictionary learning

    Given a hypergraph $H$ with $m$ hyperedges and a set $Q$ of $m$ \emph{pinning subspaces}, i.e.\ globally fixed subspaces in Euclidean space $\mathbb{R}^d$, a \emph{pinned subspace-incidence system} is the pair $(H, Q)$, with the constraint that each pinning subspace in $Q$ is contained in the subspace spanned by the point realizations in $\mathbb{R}^d$ of vertices of the corresponding hyperedge of $H$. This paper provides a combinatorial characterization of pinned subspace-incidence systems that are \emph{minimally rigid}, i.e.\ those systems that are guaranteed to generically yield a locally unique realization. Pinned subspace-incidence systems have applications in the \emph{Dictionary Learning (aka sparse coding)} problem, i.e.\ the problem of obtaining a sparse representation of a given set of data vectors by learning \emph{dictionary vectors} upon which the data vectors can be written as sparse linear combinations. Viewing the dictionary vectors from a geometric perspective as the spanning set of a subspace arrangement, the result gives a tight bound on the number of dictionary vectors for sufficiently randomly chosen data vectors, and gives a way of constructing a dictionary that meets the bound. For less stringent restrictions on data, but a natural modification of the dictionary learning problem, a further dictionary learning algorithm is provided. Although there are recent rigidity based approaches for low rank matrix completion, we are unaware of prior application of combinatorial rigidity techniques in the setting of Dictionary Learning. We also provide a systematic classification of problems related to dictionary learning together with various algorithms, their assumptions and performance.
    Comment: arXiv admin note: text overlap with arXiv:1503.01837, arXiv:1402.734
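    For readers unfamiliar with the underlying problem, the short sketch below illustrates plain dictionary learning (sparse coding) with an off-the-shelf solver; it shows only the problem being referenced, not the paper's rigidity-based construction, and all parameter values are arbitrary:

```python
# Illustration of the dictionary learning problem itself: learn dictionary
# vectors such that each data vector is a sparse linear combination of them.
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 8))          # 200 data vectors in R^8
dl = DictionaryLearning(n_components=16,   # number of dictionary vectors
                        transform_algorithm="omp",
                        transform_n_nonzero_coefs=3,  # sparsity per data vector
                        random_state=0)
codes = dl.fit_transform(X)                # sparse coefficients, shape (200, 16)
dictionary = dl.components_                # dictionary vectors, shape (16, 8)
assert np.all((codes != 0).sum(axis=1) <= 3)
```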

    SPRIGHT: A Fast and Robust Framework for Sparse Walsh-Hadamard Transform

    We consider the problem of computing the Walsh-Hadamard Transform (WHT) of some $N$-length input vector in the presence of noise, where the $N$-point Walsh spectrum is $K$-sparse with $K = O(N^{\delta})$ scaling sub-linearly in the input dimension $N$ for some $0<\delta<1$. Over the past decade, there has been a resurgence in research related to the computation of the Discrete Fourier Transform (DFT) for some length-$N$ input signal that has a $K$-sparse Fourier spectrum. In particular, through a sparse-graph code design, our earlier work on the Fast Fourier Aliasing-based Sparse Transform (FFAST) algorithm computes the $K$-sparse DFT in time $O(K\log K)$ by taking $O(K)$ noiseless samples. Inspired by the coding-theoretic design framework, Scheibler et al. proposed the Sparse Fast Hadamard Transform (SparseFHT) algorithm that elegantly computes the $K$-sparse WHT in the absence of noise using $O(K\log N)$ samples in time $O(K\log^2 N)$. However, the SparseFHT algorithm explicitly exploits the noiseless nature of the problem, and is not equipped to deal with scenarios where the observations are corrupted by noise. Therefore, a question of critical interest is whether this coding-theoretic framework can be made robust to noise. Further, if the answer is yes, what is the extra price that needs to be paid for being robust to noise? In this paper, we show, quite interestingly, that there is {\it no extra price} that needs to be paid for being robust to noise other than a constant factor. In other words, we can maintain the same sample complexity $O(K\log N)$ and the computational complexity $O(K\log^2 N)$ as those of the noiseless case, using our SParse Robust Iterative Graph-based Hadamard Transform (SPRIGHT) algorithm.
    Comment: Part of our results was reported in ISIT 2014, titled "The SPRIGHT algorithm for robust sparse Hadamard Transforms."
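    As background, the dense fast Walsh-Hadamard transform below is the $O(N\log N)$ baseline that sparse algorithms such as SparseFHT and SPRIGHT aim to beat when the spectrum is $K$-sparse; it is a standard textbook routine, not the SPRIGHT algorithm itself:

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized), O(N log N) for N a power
    of two, computed with the usual butterfly recursion."""
    x = np.asarray(x, dtype=float).copy()
    n = len(x)
    assert n & (n - 1) == 0, "length must be a power of two"
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

# A spectrum with K = 2 nonzero Walsh coefficients out of N = 8.
spectrum = np.zeros(8); spectrum[[1, 6]] = [3.0, -2.0]
signal = fwht(spectrum) / 8          # inverse WHT (the transform is self-inverse up to 1/N)
recovered = fwht(signal)             # dense recovery of the sparse spectrum
assert np.allclose(recovered, spectrum)
```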

    Tight Hardness for Shortest Cycles and Paths in Sparse Graphs

    Fine-grained reductions have established equivalences between many core problems with $\tilde{O}(n^3)$-time algorithms on $n$-node weighted graphs, such as Shortest Cycle, All-Pairs Shortest Paths (APSP), Radius, Replacement Paths, Second Shortest Paths, and so on. These problems also have $\tilde{O}(mn)$-time algorithms on $m$-edge $n$-node weighted graphs, and such algorithms have wider applicability. Are these $mn$ bounds optimal when $m \ll n^2$? Starting from the hypothesis that the minimum weight $(2\ell+1)$-Clique problem in edge weighted graphs requires $n^{2\ell+1-o(1)}$ time, we prove that for all sparsities of the form $m = \Theta(n^{1+1/\ell})$, there is no $O(n^2 + mn^{1-\epsilon})$ time algorithm for $\epsilon>0$ for \emph{any} of the below problems: Minimum Weight $(2\ell+1)$-Cycle in a directed weighted graph, Shortest Cycle in a directed weighted graph, APSP in a directed or undirected weighted graph, Radius (or Eccentricities) in a directed or undirected weighted graph, Wiener index of a directed or undirected weighted graph, Replacement Paths in a directed weighted graph, Second Shortest Path in a directed weighted graph, and Betweenness Centrality of a given node in a directed weighted graph. That is, we prove hardness for a variety of sparse graph problems from the hardness of a dense graph problem. Our results also lead to new conditional lower bounds from several related hypotheses for unweighted sparse graph problems, including $k$-cycle, shortest cycle, Radius, Wiener index and APSP.
    Comment: Updated the [AR16] citatio
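    A worked instance of the stated parameterization (our illustration of the bound quoted in the abstract, not an additional claim):

```latex
% Take \ell = 2: the hypothesis is that minimum weight 5-Clique needs
% n^{5-o(1)} time, and the targeted sparsity and bound become
\[
  m = \Theta(n^{1+1/\ell}) = \Theta(n^{3/2}), \qquad
  mn^{1-\epsilon} = n^{5/2-\epsilon},
\]
% so none of the listed problems admits an O(n^2 + mn^{1-\epsilon})
% = O(n^{5/2-\epsilon})-time algorithm at this sparsity. Setting \ell = 1
% recovers the dense regime m = \Theta(n^2) under the minimum weight
% triangle hypothesis.
```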

    FAQ: Questions Asked Frequently

    We define and study the Functional Aggregate Query (FAQ) problem, which encompasses many frequently asked questions in constraint satisfaction, databases, matrix operations, probabilistic graphical models and logic. This is our main conceptual contribution. We then present a simple algorithm called "InsideOut" to solve this general problem. InsideOut is a variation of the traditional dynamic programming approach for constraint programming based on variable elimination. Our variation adds a couple of simple twists to basic variable elimination in order to deal with the generality of FAQ, to take full advantage of Grohe and Marx's fractional edge cover framework, and of the analysis of recent worst-case optimal relational join algorithms. As is the case with constraint programming and graphical model inference, to make InsideOut run efficiently we need to solve an optimization problem to compute an appropriate 'variable ordering'. The main technical contribution of this work is a precise characterization of when a variable ordering is 'semantically equivalent' to the variable ordering given by the input FAQ expression. Then, we design an approximation algorithm to find an equivalent variable ordering that has the best 'fractional FAQ-width'. Our results imply a host of known and a few new results in graphical model inference, matrix operations, relational joins, and logic. We also briefly explain how recent algorithms on beyond worst-case analysis for joins and those for solving SAT and #SAT can be viewed as variable elimination to solve FAQ over compactly represented input functions.
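    The sketch below is plain sum-product variable elimination, the classical special case that InsideOut builds on; the helper names and the toy counting query are our own, and this is not the InsideOut algorithm itself:

```python
from itertools import product

def variable_elimination(factors, domains, order):
    """Sum-product variable elimination over a set of factors.
    Each factor is a pair (vars_tuple, function over those variables)."""
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        # variables of the new factor created by eliminating `var`
        new_vars = tuple(sorted({v for vs, _ in touching for v in vs} - {var}))
        def make_new(touching=touching, new_vars=new_vars, var=var):
            def g(*args):
                assign = dict(zip(new_vars, args))
                total = 0
                for val in domains[var]:        # sum the eliminated variable out
                    assign[var] = val
                    prod_val = 1
                    for vs, fn in touching:     # product of the touching factors
                        prod_val *= fn(*(assign[v] for v in vs))
                    total += prod_val
                return total
            return g
        factors = rest + [(new_vars, make_new())]
    # all variables eliminated: multiply the remaining constant factors
    result = 1
    for vs, fn in factors:
        result *= fn()
    return result

# Toy counting query: number of tuples (a, b, c) with R(a, b) and S(b, c).
R = {(0, 1), (1, 1)}
S = {(1, 0), (1, 2)}
doms = {"a": [0, 1], "b": [0, 1], "c": [0, 1, 2]}
factors = [(("a", "b"), lambda a, b: (a, b) in R),
           (("b", "c"), lambda b, c: (b, c) in S)]
print(variable_elimination(factors, doms, order=["a", "c", "b"]))  # -> 4
```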

    Improved Constructions for Non-adaptive Threshold Group Testing

    The basic goal in combinatorial group testing is to identify a set of up to $d$ defective items within a large population of size $n \gg d$ using a pooling strategy. Namely, the items can be grouped together in pools, and a single measurement would reveal whether there are one or more defectives in the pool. The threshold model is a generalization of this idea where a measurement returns positive if the number of defectives in the pool reaches a fixed threshold $u > 0$, negative if this number is no more than a fixed lower threshold $\ell < u$, and may behave arbitrarily otherwise. We study non-adaptive threshold group testing (in a possibly noisy setting) and show that, for this problem, $O(d^{g+2} (\log d) \log(n/d))$ measurements (where $g := u-\ell-1$ and $u$ is any fixed constant) suffice to identify the defectives, and also present almost matching lower bounds. This significantly improves the previously known (non-constructive) upper bound $O(d^{u+1} \log(n/d))$. Moreover, we obtain a framework for explicit construction of measurement schemes using lossless condensers. The number of measurements resulting from this scheme is ideally bounded by $O(d^{g+3} (\log d) \log n)$. Using state-of-the-art constructions of lossless condensers, however, we obtain explicit testing schemes with $O(d^{g+3} (\log d) \mathrm{qpoly}(\log n))$ and $O(d^{g+3+\beta} \mathrm{poly}(\log n))$ measurements, for arbitrary constant $\beta > 0$.
    Comment: Revised draft of the full version. Contains various edits and a new lower bounds section. Preliminary version appeared in Proceedings of the 37th International Colloquium on Automata, Languages and Programming (ICALP), 201
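    A toy simulation of the threshold measurement model (with arbitrary behavior inside the gap) together with a brute-force consistency decoder for tiny instances; this only illustrates the model and is unrelated to the condenser-based constructions of the paper:

```python
import itertools, random

def threshold_measure(pool, defectives, u, ell):
    """Threshold group-testing measurement: positive if the pool contains at
    least u defectives, negative if at most ell, and arbitrary in between
    (modeled here as a coin flip). Noiseless outside the gap."""
    k = len(pool & defectives)
    if k >= u:
        return 1
    if k <= ell:
        return 0
    return random.randint(0, 1)  # arbitrary behavior inside the gap

def consistent_candidates(pools, outcomes, n, d, u, ell):
    """Brute-force decoder for tiny instances: return every set of at most d
    items that could have produced the observed outcomes."""
    out = []
    for size in range(d + 1):
        for cand in itertools.combinations(range(n), size):
            cand = set(cand)
            ok = True
            for pool, y in zip(pools, outcomes):
                k = len(pool & cand)
                if (k >= u and y == 0) or (k <= ell and y == 1):
                    ok = False
                    break
            if ok:
                out.append(cand)
    return out

# Tiny demo: n = 8 items, d = 2 defectives, thresholds ell = 0, u = 2 (gap g = 1).
random.seed(0)
n, d, u, ell = 8, 2, 2, 0
defectives = {2, 5}
pools = [set(random.sample(range(n), 4)) for _ in range(20)]
outcomes = [threshold_measure(p, defectives, u, ell) for p in pools]
# The true defective set is always among the consistent candidates.
print(consistent_candidates(pools, outcomes, n, d, u, ell))
```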

    Exact Learning from an Honest Teacher That Answers Membership Queries

    Consider a teacher that holds a function $f:X\to R$ from some class of functions $C$. The teacher receives from the learner an element $d$ in the domain $X$ (a query) and returns the value of the function at $d$, $f(d)\in R$. The learner's goal is to find $f$ with a minimum number of queries, optimal time complexity, and optimal resources. In this survey, we present some of the results known from the literature, different techniques used, some new problems, and open problems.
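    A small, self-contained instance of this query model (our toy example, not taken from the survey): exactly learning a threshold function on a finite domain from membership queries alone, using binary search:

```python
def learn_threshold_class(teacher, n):
    """Exactly learn a hidden function f(x) = 1 iff x >= t on {0, ..., n-1}
    using only membership queries; binary search needs O(log n) queries."""
    lo, hi = 0, n      # the unknown threshold t lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if teacher(mid):   # membership query: the value f(mid)
            hi = mid       # f(mid) = 1, so t <= mid
        else:
            lo = mid + 1   # f(mid) = 0, so t > mid
    return lo, queries

# The hidden function is f(x) = 1 iff x >= 37 over the domain {0, ..., 1023}.
t_hidden = 37
teacher = lambda x: int(x >= t_hidden)
t_learned, q = learn_threshold_class(teacher, 1024)
assert t_learned == t_hidden and q <= 11   # about log2(n) membership queries
```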

    Provable Bounds for Learning Some Deep Representations

    We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an $n$-node multilayer neural net that has degree at most $n^{\gamma}$ for some $\gamma <1$ and each edge has a random edge weight in $[-1,1]$. Our algorithm learns {\em almost all} networks in this class with polynomial running time. The sample complexity is quadratic or cubic depending upon the details of the model. The algorithm uses layerwise learning. It is based upon a novel idea of observing correlations among features and using these to infer the underlying edge structure via a global graph recovery procedure. The analysis of the algorithm reveals interesting structure of neural networks with random edge weights.
    Comment: The first 18 pages serve as an extended abstract and a 36 pages long technical appendix follow
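    A toy, single-layer illustration of the correlation idea mentioned above (our construction, not the paper's algorithm): observed units that share a hidden parent show markedly larger pairwise covariance, which is the kind of signal a layerwise structure-recovery procedure can exploit:

```python
import numpy as np

rng = np.random.default_rng(0)
m_hidden, n_obs, deg, T = 30, 60, 3, 20000
# Random sparse single-layer generative net: each observed unit has `deg`
# hidden parents with random weights in [-1, 1].
W = np.zeros((n_obs, m_hidden))
for i in range(n_obs):
    parents = rng.choice(m_hidden, size=deg, replace=False)
    W[i, parents] = rng.uniform(-1.0, 1.0, size=deg)

H = (rng.random((m_hidden, T)) < 0.1).astype(float)  # sparse 0/1 hidden layer
X = (W @ H > 0).astype(float)                        # thresholded observed layer

C = np.cov(X)                                        # pairwise covariances
share = (np.abs(W) @ np.abs(W).T) > 0                # True iff a common hidden parent
off = ~np.eye(n_obs, dtype=bool)
print("mean |cov|, shared hidden parent:   ", np.abs(C[share & off]).mean())
print("mean |cov|, no shared hidden parent:", np.abs(C[~share & off]).mean())
```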

    A Survey on Learning to Hash

    Nearest neighbor search is the problem of finding the data points in a database whose distances to a query point are the smallest. Learning to hash is one of the major solutions to this problem and has been widely studied recently. In this paper, we present a comprehensive survey of learning to hash algorithms and categorize them, according to the manner in which similarities are preserved, into pairwise similarity preserving, multiwise similarity preserving, implicit similarity preserving, and quantization, and discuss their relations. We treat quantization separately from pairwise similarity preserving because its objective function is very different, even though quantization, as we show, can be derived from preserving pairwise similarities. In addition, we present the evaluation protocols and a general performance analysis, and point out that the quantization algorithms perform superiorly in terms of search accuracy, search time cost, and space cost. Finally, we introduce a few emerging topics.
    Comment: To appear in IEEE Transactions On Pattern Analysis and Machine Intelligence (TPAMI)
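    For context, the sketch below is the simplest data-independent baseline (sign of random projections followed by Hamming ranking) against which learning-to-hash methods are compared; learned methods fit the projections or codebooks to the data instead. The function names and parameters are illustrative:

```python
import numpy as np

def train_hash(X, n_bits, rng):
    """Data-independent baseline: random projection directions.
    Learning-to-hash methods would fit these to the data instead."""
    return rng.standard_normal((X.shape[1], n_bits))

def encode(X, P):
    return (X @ P > 0).astype(np.uint8)     # n x n_bits binary codes

def hamming_search(query_code, db_codes, k=5):
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    return np.argsort(dists)[:k]            # indices of the k closest codes

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))         # database vectors
P = train_hash(X, n_bits=32, rng=rng)
codes = encode(X, P)
q = X[0] + 0.05 * rng.standard_normal(64)   # a query near item 0
neighbors = hamming_search(encode(q[None, :], P)[0], codes)
print(neighbors)                            # item 0 should rank near the top
```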

    Multi-view Vector-valued Manifold Regularization for Multi-label Image Classification

    In computer vision, image datasets used for classification are naturally associated with multiple labels and comprised of multiple views, because each image may contain several objects (e.g. pedestrian, bicycle and tree) and is properly characterized by multiple visual features (e.g. color, texture and shape). Currently available tools ignore either the label relationship or the view complementarity. Motivated by the success of the vector-valued function that constructs matrix-valued kernels to explore the multi-label structure in the output space, we introduce multi-view vector-valued manifold regularization (MV$^3$MR) to integrate multiple features. MV$^3$MR exploits the complementary property of different features and discovers the intrinsic local geometry of the compact support shared by different features under the theme of manifold regularization. We conducted extensive experiments on two challenging but popular datasets, PASCAL VOC'07 (VOC) and MIR Flickr (MIR), and validated the effectiveness of the proposed MV$^3$MR for image classification.
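    A minimal sketch of the manifold-regularization ingredient referenced here: build a k-NN graph Laplacian per feature view and combine the views, so a label-score vector can be penalized for varying non-smoothly over the shared geometry. This is our simplified illustration, not the MV$^3$MR objective:

```python
import numpy as np

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian of a symmetrized k-NN graph built from one
    feature view; manifold regularization penalizes f^T L f so that predictions
    vary smoothly over this graph."""
    n = X.shape[0]
    D2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise sq. distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(D2[i])[1:k + 1]                  # skip self
        W[i, nbrs] = 1.0
    W = np.maximum(W, W.T)                                  # symmetrize
    return np.diag(W.sum(1)) - W

# Toy combination of per-view Laplacians (the multi-view ingredient only,
# not the full MV^3MR objective): weight and sum the views.
rng = np.random.default_rng(0)
views = [rng.standard_normal((100, 16)), rng.standard_normal((100, 8))]  # e.g. color, texture
weights = [0.6, 0.4]
L = sum(w * knn_laplacian(V) for w, V in zip(weights, views))
f = rng.standard_normal(100)                # a candidate label-score vector
smoothness_penalty = f @ L @ f              # the manifold-regularization term
print(smoothness_penalty)
```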