Minimal Algorithmic Information Loss Methods for Dimension Reduction, Feature Selection and Network Sparsification
We introduce a family of unsupervised, domain-free, and (asymptotically)
model-independent algorithms based on the principles of algorithmic probability
and information theory designed to minimize the loss of algorithmic
information, including a lossless-compression-based lossy compression
algorithm. The methods can select and coarse-grain data according to their
algorithmic complexity (without resorting to popular compression algorithms)
by collapsing regions that can procedurally be regenerated from a
computable candidate model. We show that the method can preserve the salient
properties of objects and perform dimension reduction, denoising, feature
selection, and network sparsification. As a validation case, we demonstrate that
the method preserves all the graph-theoretic indices measured on a well-known
set of synthetic and real-world networks of very different nature, ranging from
degree distribution and clustering coefficient to edge betweenness and degree
and eigenvector centralities, achieving equal or significantly better results
than other data-reduction approaches and some of the leading network
sparsification methods. The methods (InfoRank, MILS) can also be used for
tasks such as image segmentation based on algorithmic probability.
Comment: 23 pages in double column including Appendix; online implementation
at http://complexitycalculator.com/MILS
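
As a rough illustration of the idea (a minimal sketch, not the authors'
implementation): greedily delete the graph element whose removal perturbs an
estimate of the object's algorithmic complexity the least. Here zlib's
deflate length stands in for the algorithmic-probability (BDM-style)
estimator; the paper itself pointedly avoids popular compression algorithms,
so the proxy and all names below are assumptions.

import zlib
import numpy as np

def mils_sparsify(adj, n_edges_to_drop):
    """Greedy MILS-style edge deletion (toy version).

    adj: symmetric 0/1 NumPy adjacency matrix. At each step, remove the
    edge whose deletion changes the estimated complexity of the adjacency
    matrix the least, i.e. the edge carrying the least information.
    """
    def complexity(a):
        # Compressed length as a crude stand-in for algorithmic complexity.
        return len(zlib.compress(np.packbits(a.astype(bool)).tobytes()))

    adj = adj.copy()
    for _ in range(n_edges_to_drop):
        base = complexity(adj)
        best_edge, best_loss = None, None
        for i, j in zip(*np.triu_indices_from(adj, k=1)):
            if not adj[i, j]:
                continue
            adj[i, j] = adj[j, i] = 0          # tentatively delete the edge
            loss = abs(complexity(adj) - base)
            adj[i, j] = adj[j, i] = 1          # restore it
            if best_loss is None or loss < best_loss:
                best_edge, best_loss = (i, j), loss
        if best_edge is None:
            break  # no edges left
        i, j = best_edge
        adj[i, j] = adj[j, i] = 0
    return adj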
Random Forests and Networks Analysis
D. Wilson~\cite{[Wi]} in the 1990s described a simple and efficient
algorithm based on loop-erased random walks to sample uniform spanning trees
and more generally weighted trees or forests spanning a given graph. This
algorithm provides a powerful tool in analyzing structures on networks and
along this line of thinking, in recent works~\cite{AG1,AG2,ACGM1,ACGM2} we
focused on applications of spanning rooted forests on finite graphs. The
resulting main conclusions are reviewed in this paper by collecting related
theorems, algorithms, heuristics and numerical experiments. A first
foundational part on determinantal structures and efficient sampling procedures
is followed by four main applications: 1) a random-walk-based notion of
well-distributed points in a graph; 2) how to describe metastable dynamics in
finite settings by means of Markov intertwining dualities; 3) coarse-graining
schemes for networks and associated processes; 4) wavelet-like pyramidal
algorithms for graph signals.
Comment: Survey paper
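
For reference, the loop-erased random-walk sampler is Wilson's algorithm; a
short self-contained Python version for an unweighted connected graph follows
(the neighbors adjacency-dict interface is our own choice, not the survey's).

import random

def wilson_ust(vertices, neighbors, root=None):
    """Sample a uniform spanning tree via Wilson's algorithm.

    vertices: iterable of vertices; neighbors: dict vertex -> list of
    adjacent vertices (graph assumed connected and undirected).
    Returns a parent map encoding the tree edges (child -> parent).
    """
    vertices = list(vertices)
    if root is None:
        root = vertices[0]
    in_tree = {root}
    parent = {root: None}
    nxt = {}
    for v in vertices:
        u = v
        # Walk until the current tree is hit; overwriting nxt[u] on
        # revisits performs the loop erasure implicitly.
        while u not in in_tree:
            nxt[u] = random.choice(neighbors[u])
            u = nxt[u]
        # Retrace the loop-erased path and graft it onto the tree.
        u = v
        while u not in in_tree:
            parent[u] = nxt[u]
            in_tree.add(u)
            u = nxt[u]
    return parent

Weighted trees and rooted forests follow from the same scheme by walking with
weighted transition probabilities and adding an absorption (rooting) rate.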
Distance Preserving Graph Simplification
Large graphs are difficult to represent, visualize, and understand. In this
paper, we introduce the "gate graph", a new approach to graph simplification.
A gate graph provides a simplified topological view of the
original graph. Specifically, we construct a gate graph from a large graph so
that for any "non-local" vertex pair (whose distance exceeds a given threshold) in
the original graph, their shortest-path distance can be recovered by
consecutive "local" walks through the gate vertices in the gate graph. We
perform a theoretical investigation on the gate-vertex set discovery problem.
We characterize its computational complexity and derive an upper bound on the
size of the minimum gate-vertex set using VC-dimension theory. We propose an
efficient mining algorithm that discovers a gate-vertex set with a guaranteed
logarithmic approximation bound. We further present a fast technique for
pruning redundant edges in a
gate graph. The detailed experimental results using both real and synthetic
graphs demonstrate the effectiveness and efficiency of our approach.
Comment: A short version of this paper will be published at ICDM'11, December
2011
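
To make the distance-recovery requirement concrete, here is a hypothetical
brute-force verifier (not the paper's mining algorithm): given a candidate
gate set and a locality threshold t, it checks that every non-local pair's
shortest-path distance is recovered exactly by chaining "local" hops
(original distance at most t) through gate vertices.

import itertools
import networkx as nx

def verifies_gate_property(G, gates, t):
    """Check that `gates` recover all non-local distances in G at threshold t."""
    dist = dict(nx.all_pairs_shortest_path_length(G))
    for u, v in itertools.combinations(G.nodes, 2):
        d_uv = dist[u].get(v)
        if d_uv is None or d_uv <= t:
            continue  # only non-local pairs must be covered
        # Auxiliary graph on {u, v} plus the gates, with an edge for each
        # local hop (original distance <= t), weighted by that distance.
        nodes = set(gates) | {u, v}
        H = nx.Graph()
        H.add_nodes_from(nodes)
        for a, b in itertools.combinations(nodes, 2):
            d_ab = dist[a].get(b)
            if d_ab is not None and d_ab <= t:
                H.add_edge(a, b, weight=d_ab)
        # By the triangle inequality the recovered distance can only
        # overshoot, so exact recovery means equality.
        try:
            if nx.dijkstra_path_length(H, u, v) != d_uv:
                return False
        except nx.NetworkXNoPath:
            return False
    return True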
On the Interaction Between Differential Privacy and Gradient Compression in Deep Learning
While differential privacy and gradient compression are separately
well-researched topics in machine learning, the study of interaction between
these two topics is still relatively new. We perform a detailed empirical study
on how the Gaussian mechanism for differential privacy and gradient compression
jointly impact test accuracy in deep learning. The existing literature in
gradient compression mostly evaluates compression in the absence of
differential privacy guarantees, and demonstrates that sufficiently high
compression rates reduce accuracy. Similarly, existing literature in
differential privacy evaluates privacy mechanisms in the absence of
compression, and demonstrates that sufficiently strong privacy guarantees
reduce accuracy. In this work, we observe that while gradient compression generally
has a negative impact on test accuracy in non-private training, it can
sometimes improve test accuracy in differentially private training.
Specifically, we observe that when applying aggressive sparsification or rank
reduction to the gradients, test accuracy is less affected by the Gaussian
noise added for differential privacy. We explain these observations through
an analysis of how differential privacy and compression affect the bias and
variance of the average-gradient estimate. We follow this study with a
recommendation on how to improve test accuracy in the context of
differentially private deep learning and gradient compression. We evaluate this
proposal and find that it can reduce the negative impact of noise added by
differential privacy mechanisms on test accuracy by up to 24.6%, and reduce the
negative impact of gradient sparsification on test accuracy by up to 15.1%.
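
As a toy illustration of the interaction being studied (the pipeline order,
parameter names, and noise calibration below are our assumptions, not the
paper's recommendation): clip per-example gradients, average, add Gaussian
noise scaled to the clipping norm, then keep only the top-k coordinates.

import numpy as np

def private_sparsified_gradient(grads, clip_norm, noise_multiplier, k, rng):
    """Toy DP average-gradient estimate followed by top-k sparsification."""
    # Per-example clipping bounds each example's L2 contribution.
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in grads]
    avg = np.mean(clipped, axis=0)
    # Gaussian mechanism: noise scale tied to sensitivity / batch size.
    sigma = noise_multiplier * clip_norm / len(grads)
    noisy = avg + rng.normal(0.0, sigma, size=avg.shape)
    # Top-k sparsification: zero out all but the k largest-magnitude entries.
    keep = np.argsort(np.abs(noisy))[-k:]
    sparse = np.zeros_like(noisy)
    sparse[keep] = noisy[keep]
    return sparse

# Example: 32 per-example gradients of dimension 1000, keep 5% of coordinates.
rng = np.random.default_rng(0)
grads = [rng.normal(size=1000) for _ in range(32)]
g_hat = private_sparsified_gradient(grads, clip_norm=1.0,
                                    noise_multiplier=1.1, k=50, rng=rng)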