
    Minimal Algorithmic Information Loss Methods for Dimension Reduction, Feature Selection and Network Sparsification

    We introduce a family of unsupervised, domain-free, and (asymptotically) model-independent algorithms based on the principles of algorithmic probability and information theory, designed to minimize the loss of algorithmic information; the family includes a lossless-compression-based lossy compression algorithm. The methods can select and coarse-grain data in an algorithmic-complexity fashion (without the use of popular compression algorithms) by collapsing regions that may procedurally be regenerated from a computable candidate model. We show that the method can preserve the salient properties of objects and perform dimension reduction, denoising, feature selection, and network sparsification. As a validation case, we demonstrate that the method preserves all the graph-theoretic indices measured on a well-known set of synthetic and real-world networks of very different nature, ranging from degree distribution and clustering coefficient to edge betweenness and degree and eigenvector centralities, achieving results equal to or significantly better than other data-reduction methods and some of the leading network sparsification methods. The methods (InfoRank, MILS) can also be applied to tasks such as image segmentation based on algorithmic probability.
    Comment: 23 pages in double column including Appendix; online implementation at http://complexitycalculator.com/MILS
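
    A minimal sketch of the deletion loop implied by the description above, assuming a symmetric 0/1 adjacency matrix: repeatedly delete the edge whose removal changes an information estimate of the object the least. The zlib-based estimator, the function names, and the edge-by-edge greedy loop are illustrative assumptions only; the paper's methods rely on algorithmic-probability estimates rather than popular compression algorithms.

import zlib

def complexity(adj):
    # Crude description-length proxy: compressed size of the adjacency matrix.
    flat = bytes(bit for row in adj for bit in row)
    return len(zlib.compress(flat, 9))

def mils_sparsify(adj, n_deletions):
    # Greedily delete the edge whose removal perturbs the estimate the least.
    adj = [row[:] for row in adj]
    n = len(adj)
    for _ in range(n_deletions):
        base = complexity(adj)
        best, best_loss = None, None
        for i in range(n):
            for j in range(i + 1, n):
                if adj[i][j]:
                    adj[i][j] = adj[j][i] = 0
                    loss = abs(complexity(adj) - base)
                    adj[i][j] = adj[j][i] = 1
                    if best_loss is None or loss < best_loss:
                        best, best_loss = (i, j), loss
        if best is None:
            break
        i, j = best
        adj[i][j] = adj[j][i] = 0
    return adj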

    Random Forests and Networks Analysis

    In the 1990s, D. Wilson~\cite{[Wi]} described a simple and efficient algorithm based on loop-erased random walks to sample uniform spanning trees and, more generally, weighted trees or forests spanning a given graph. This algorithm provides a powerful tool for analyzing structures on networks, and along this line of thinking, in recent works~\cite{AG1,AG2,ACGM1,ACGM2} we focused on applications of spanning rooted forests on finite graphs. The resulting main conclusions are reviewed in this paper by collecting related theorems, algorithms, heuristics and numerical experiments. A first foundational part on determinantal structures and efficient sampling procedures is followed by four main applications: 1) a random-walk-based notion of well-distributed points in a graph; 2) how to describe metastable dynamics in finite settings by means of Markov intertwining dualities; 3) coarse-graining schemes for networks and associated processes; 4) wavelet-like pyramidal algorithms for graph signals.
    Comment: Survey paper
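
    For readers unfamiliar with Wilson's algorithm mentioned above, the sketch below samples a uniform spanning tree of a connected undirected graph by loop-erased random walks. The adjacency-dictionary representation and function name are assumptions made for illustration, not the survey's notation.

import random

def wilson_spanning_tree(adj, root, seed=None):
    # adj: dict mapping each vertex to the list of its neighbours.
    # Returns the edge set of a uniformly random spanning tree, oriented towards `root`.
    rng = random.Random(seed)
    in_tree = {root}
    tree_edges = set()
    for start in adj:
        if start in in_tree:
            continue
        # Random walk from `start` until the current tree is hit; remembering
        # only the last exit taken from each vertex erases the loops.
        successor = {}
        v = start
        while v not in in_tree:
            successor[v] = rng.choice(adj[v])
            v = successor[v]
        # Retrace the loop-erased path and graft it onto the tree.
        v = start
        while v not in in_tree:
            in_tree.add(v)
            tree_edges.add((v, successor[v]))
            v = successor[v]
    return tree_edges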

    Distance Preserving Graph Simplification

    Large graphs are difficult to represent, visualize, and understand. In this paper, we introduce the "gate graph", a new approach to graph simplification. A gate graph provides a simplified topological view of the original graph. Specifically, we construct a gate graph from a large graph so that for any "non-local" vertex pair (with distance greater than some threshold) in the original graph, their shortest-path distance can be recovered by consecutive "local" walks through the gate vertices in the gate graph. We perform a theoretical investigation of the gate-vertex set discovery problem. We characterize its computational complexity and derive an upper bound on the minimum gate-vertex set using VC-dimension theory. We propose an efficient mining algorithm to discover a gate-vertex set with a guaranteed logarithmic bound. We further present a fast technique for pruning redundant edges in a gate graph. Detailed experimental results using both real and synthetic graphs demonstrate the effectiveness and efficiency of our approach.
    Comment: A short version of this paper will be published at ICDM'11, December 2011
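
    The recovery property that defines a gate graph can be made concrete with a small check: given a candidate gate-vertex set and a locality threshold, verify that every non-local pair's shortest-path distance is matched by consecutive local hops through gate vertices. The sketch below assumes an unweighted graph given as an adjacency dictionary and invents its own function names; it does not reproduce the paper's mining algorithm or its logarithmic guarantee.

from collections import deque

def bfs_distances(adj, source):
    # Unweighted shortest-path distances from `source`.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def recovers_distances(adj, gates, threshold):
    # Check that every non-local pair's distance equals the length of the best
    # walk that moves between gate vertices (and the endpoints) in hops of
    # length at most `threshold`.
    dist = {v: bfs_distances(adj, v) for v in adj}
    for s in adj:
        for t in adj:
            d = dist[s].get(t)
            if d is None or d <= threshold:
                continue  # local or unreachable pairs impose no constraint
            best = {s: 0}  # best known gate-walk length from s
            frontier = [s]
            while frontier:
                u = frontier.pop()
                for g in set(gates) | {t}:
                    hop = dist[u].get(g)
                    if hop is None or hop > threshold:
                        continue
                    if best[u] + hop < best.get(g, float("inf")):
                        best[g] = best[u] + hop
                        if g != t:
                            frontier.append(g)
            if best.get(t) != d:
                return False
    return True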

    On the Interaction Between Differential Privacy and Gradient Compression in Deep Learning

    While differential privacy and gradient compression are separately well-researched topics in machine learning, the study of the interaction between them is still relatively new. We perform a detailed empirical study of how the Gaussian mechanism for differential privacy and gradient compression jointly impact test accuracy in deep learning. The existing literature on gradient compression mostly evaluates compression in the absence of differential privacy guarantees and demonstrates that sufficiently high compression rates reduce accuracy. Similarly, the existing literature on differential privacy evaluates privacy mechanisms in the absence of compression and demonstrates that sufficiently strong privacy guarantees reduce accuracy. In this work, we observe that while gradient compression generally has a negative impact on test accuracy in non-private training, it can sometimes improve test accuracy in differentially private training. Specifically, we observe that when applying aggressive sparsification or rank reduction to the gradients, test accuracy is less affected by the Gaussian noise added for differential privacy. These observations are explained through an analysis of how differential privacy and compression affect the bias and variance of the estimated average gradient. We follow this study with a recommendation on how to improve test accuracy in the context of differentially private deep learning and gradient compression. We evaluate this proposal and find that it can reduce the negative impact of noise added by differential privacy mechanisms on test accuracy by up to 24.6%, and reduce the negative impact of gradient sparsification on test accuracy by up to 15.1%.
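
    The two mechanisms whose interaction is studied above can be sketched as a single gradient-processing step: per-example clipping with Gaussian noise (the Gaussian mechanism) followed by top-k sparsification of the averaged gradient. The ordering, clip norm, noise multiplier, and sparsity level below are illustrative assumptions, not the paper's configuration.

import numpy as np

def private_compressed_gradient(per_example_grads, clip_norm=1.0,
                                noise_multiplier=1.0, k_fraction=0.1, rng=None):
    # per_example_grads: array of shape (batch_size, dim).
    rng = np.random.default_rng() if rng is None else rng
    # 1) Clip each example's gradient so its contribution is bounded.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    clipped = per_example_grads * np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    # 2) Gaussian mechanism: add noise calibrated to the clipping bound.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=per_example_grads.shape[1])
    mean_grad = noisy_sum / per_example_grads.shape[0]
    # 3) Top-k sparsification: keep only the largest-magnitude coordinates.
    k = max(1, int(k_fraction * mean_grad.size))
    keep = np.argpartition(np.abs(mean_grad), -k)[-k:]
    sparse = np.zeros_like(mean_grad)
    sparse[keep] = mean_grad[keep]
    return sparse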