17,231 research outputs found

    Compressive Mining: Fast and Optimal Data Mining in the Compressed Domain

    Full text link
    Real-world data typically contain repeated and periodic patterns. This suggests that they can be effectively represented and compressed using only a few coefficients of an appropriate basis (e.g., Fourier, Wavelets, etc.). However, distance estimation when the data are represented using different sets of coefficients is still a largely unexplored area. This work studies the optimization problems related to obtaining the \emph{tightest} lower/upper bound on Euclidean distances when each data object is potentially compressed using a different set of orthonormal coefficients. Our technique leads to tighter distance estimates, which translates into more accurate search, learning and mining operations \textit{directly} in the compressed domain. We formulate the problem of estimating lower/upper distance bounds as an optimization problem. We establish the properties of optimal solutions, and leverage the theoretical analysis to develop a fast algorithm to obtain an \emph{exact} solution to the problem. The suggested solution provides the tightest estimation of the L2L_2-norm or the correlation. We show that typical data-analysis operations, such as k-NN search or k-Means clustering, can operate more accurately using the proposed compression and distance reconstruction technique. We compare it with many other prevalent compression and reconstruction techniques, including random projections and PCA-based techniques. We highlight a surprising result, namely that when the data are highly sparse in some basis, our technique may even outperform PCA-based compression. The contributions of this work are generic as our methodology is applicable to any sequential or high-dimensional data as well as to any orthogonal data transformation used for the underlying data compression scheme.Comment: 25 pages, 20 figures, accepted in VLD

    Three Puzzles on Mathematics, Computation, and Games

    Full text link
    In this lecture I will talk about three mathematical puzzles involving mathematics and computation that have preoccupied me over the years. The first puzzle is to understand the amazing success of the simplex algorithm for linear programming. The second puzzle is about errors made when votes are counted during elections. The third puzzle is: are quantum computers possible?Comment: ICM 2018 plenary lecture, Rio de Janeiro, 36 pages, 7 Figure

    Next nearest neighbour Ising models on random graphs

    Full text link
    This paper develops results for the next nearest neighbour Ising model on random graphs. Besides being an essential ingredient in classic models for frustrated systems, second neighbour interactions interactions arise naturally in several applications such as the colour diversity problem and graphical games. We demonstrate ensembles of random graphs, including regular connectivity graphs, that have a periodic variation of free energy, with either the ratio of nearest to next nearest couplings, or the mean number of nearest neighbours. When the coupling ratio is integer paramagnetic phases can be found at zero temperature. This is shown to be related to the locked or unlocked nature of the interactions. For anti-ferromagnetic couplings, spin glass phases are demonstrated at low temperature. The interaction structure is formulated as a factor graph, the solution on a tree is developed. The replica symmetric and energetic one-step replica symmetry breaking solution is developed using the cavity method. We calculate within these frameworks the phase diagram and demonstrate the existence of dynamical transitions at zero temperature for cases of anti-ferromagnetic coupling on regular and inhomogeneous random graphs.Comment: 55 pages, 15 figures, version 2 with minor revisions, to be published J. Stat. Mec

    Separable Convex Optimization with Nested Lower and Upper Constraints

    Full text link
    We study a convex resource allocation problem in which lower and upper bounds are imposed on partial sums of allocations. This model is linked to a large range of applications, including production planning, speed optimization, stratified sampling, support vector machines, portfolio management, and telecommunications. We propose an efficient gradient-free divide-and-conquer algorithm, which uses monotonicity arguments to generate valid bounds from the recursive calls, and eliminate linking constraints based on the information from sub-problems. This algorithm does not need strict convexity or differentiability. It produces an ϵ\epsilon-approximate solution for the continuous problem in O(nlogmlognBϵ)\mathcal{O}(n \log m \log \frac{n B}{\epsilon}) time and an integer solution in O(nlogmlogB)\mathcal{O}(n \log m \log B) time, where nn is the number of decision variables, mm is the number of constraints, and BB is the resource bound. A complexity of O(nlogm)\mathcal{O}(n \log m) is also achieved for the linear and quadratic cases. These are the best complexities known to date for this important problem class. Our experimental analyses confirm the good performance of the method, which produces optimal solutions for problems with up to 1,000,000 variables in a few seconds. Promising applications to the support vector ordinal regression problem are also investigated
    corecore