68 research outputs found
One-shot neural network pruning via spectral graph sparsification
Neural network pruning has gained significant attention for its potential to reduce the computational resources required for training and inference. A large body of research has shown that networks can be pruned both after training and at initialisation while maintaining competitive accuracy compared to dense networks. However, current methods rely on iteratively pruning or repairing the network to avoid over-pruning and layer collapse. Recent work has found that, by treating neural networks as a sequence of bipartite graphs, pruning can be studied through the lens of spectral graph theory. In this work, we therefore propose a novel pruning approach based on spectral sparsification, which aims to preserve the meaningful properties of a dense graph in a sparse subgraph by preserving the spectrum of the dense graph's adjacency matrix. We empirically validate and investigate our method, and show that one-shot pruning via spectral sparsification preserves performance at higher levels of sparsity than other one-shot methods. Additionally, we theoretically analyse our method with respect to local and global connectivity.
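As a rough illustration of the idea (a generic sketch, not the authors' implementation), a spectral sparsifier of a layer's bipartite weight graph can be built by scoring each edge with its weight times its effective resistance, computed from the Laplacian pseudoinverse in the style of Spielman and Srivastava, and keeping only the top-scoring edges in one shot. All sizes and variable names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dense layer: 6 inputs x 4 outputs, viewed as a weighted bipartite
# graph with 10 nodes and one edge per nonzero weight.
W = rng.normal(size=(6, 4))
n_in, n_out = W.shape
n = n_in + n_out

# Weighted adjacency and Laplacian of the bipartite graph (edge weight = |W_ij|).
A = np.zeros((n, n))
A[:n_in, n_in:] = np.abs(W)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

# Effective resistance of each edge via the Laplacian pseudoinverse,
# giving Spielman-Srivastava style importance scores w_e * R_e.
Lp = np.linalg.pinv(L)
scores = np.zeros_like(W)
for i in range(n_in):
    for j in range(n_out):
        u, v = i, n_in + j
        r_uv = Lp[u, u] + Lp[v, v] - 2 * Lp[u, v]
        scores[i, j] = np.abs(W[i, j]) * r_uv

# One-shot prune: keep only the k highest-importance edges.
k = 12
keep = np.argsort(scores, axis=None)[-k:]
mask = np.zeros(W.size, dtype=bool)
mask[keep] = True
W_sparse = np.where(mask.reshape(W.shape), W, 0.0)
print(int((W_sparse != 0).sum()))  # k surviving weights
```

In the full method the sampling is typically randomized with probabilities proportional to the scores; the deterministic top-k selection above keeps the sketch short.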
Sparse random hypergraphs: Non-backtracking spectra and community detection
We consider the community detection problem in a sparse uniform hypergraph, assuming that it is generated according to the Hypergraph Stochastic Block Model (HSBM). We prove that a spectral method based on the non-backtracking operator for hypergraphs works with high probability down to the generalized Kesten-Stigum detection threshold conjectured by Angelini et al. (2015). We characterize the spectrum of the non-backtracking operator for the sparse HSBM and provide an efficient dimension reduction procedure using the Ihara-Bass formula for hypergraphs. As a result, community detection for the sparse HSBM on n vertices can be reduced to an eigenvector problem for a non-normal matrix constructed from the adjacency matrix and the degree matrix of the hypergraph. To the best of our knowledge, this is the first provable and efficient spectral algorithm that achieves the conjectured threshold for HSBMs with blocks generated according to a general symmetric probability tensor.
Comment: 61 pages, 8 figures. To appear in Information and Inference
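For ordinary graphs, the analogous Ihara-Bass dimension reduction is easy to verify numerically: away from the eigenvalues +1 and -1, the 2m x 2m non-backtracking operator B shares its spectrum with the 2n x 2n non-normal matrix B' = [[0, D - I], [-I, A]] built from the adjacency and degree matrices. The hypergraph version proved in the paper is more involved; the sketch below is only the classical graph case.

```python
import numpy as np

# K4: n = 4 vertices, m = 6 edges, so B is 12 x 12 and B' is 8 x 8.
A = np.ones((4, 4)) - np.eye(4)
n = A.shape[0]
D = np.diag(A.sum(axis=1))

# Non-backtracking matrix on the 2m directed edges:
# B[(u,v),(x,w)] = 1 iff x == v and w != u (no immediate backtracking).
edges = [(u, v) for u in range(n) for v in range(n) if A[u, v]]
B = np.zeros((len(edges), len(edges)))
for a, (u, v) in enumerate(edges):
    for b, (x, w) in enumerate(edges):
        if x == v and w != u:
            B[a, b] = 1.0

# Reduced 2n x 2n matrix from the Ihara-Bass formula.
I = np.eye(n)
Bp = np.block([[np.zeros((n, n)), D - I], [-I, A]])

ev_B = np.linalg.eigvals(B)
ev_Bp = np.linalg.eigvals(Bp)

# Every eigenvalue of B' appears in the spectrum of B; the remaining
# 2(m - n) eigenvalues of B are +1 and -1.
gap = max(np.min(np.abs(ev_B - lam)) for lam in ev_Bp)
print(gap < 1e-6)
```

The practical payoff is the same as in the paper: an eigenvector problem on a matrix whose size scales with the number of vertices rather than the number of (directed) edges.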
Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning
The paper introduces the application of information geometry to describe the ground states of Ising models by utilizing parity-check matrices of cyclic and quasi-cyclic codes on toric and spherical topologies. The approach establishes a connection between machine learning and error-correcting coding, and has implications for the development of new embedding methods based on trapping sets. Statistical physics and number geometry are applied to optimize error-correcting codes, leading to these embedding and sparse factorization methods. The paper establishes a direct connection between DNN architecture and error-correcting coding by demonstrating how state-of-the-art architectures (ChordMixer, Mega, Mega-chunk, CDIL, ...) from the long-range arena can be equivalent to block and convolutional LDPC codes (Cage-graph, Repeat Accumulate). QC codes correspond to certain types of chemical elements, with the carbon element being represented by the mixed automorphism Shu-Lin-Fossorier QC-LDPC code. The connections between Belief Propagation and the Permanent, Bethe-Permanent, Nishimori Temperature, and Bethe-Hessian Matrix are elaborated upon in detail. The Quantum Approximate Optimization Algorithm (QAOA) used on the Sherrington-Kirkpatrick Ising model can be seen as analogous to the back-propagation loss function landscape in training DNNs. This similarity creates a comparable problem with trapping-set (TS) pseudo-codewords, resembling the belief propagation method. Additionally, the layer depth in QAOA corresponds to the number of decoding belief propagation iterations in the Wiberg decoding tree. Overall, this work has the potential to advance multiple fields, from information theory, DNN architecture design (sparse and structured prior graph topology), and efficient hardware design for quantum and classical DPU/TPU (graph, quantize, and shift-register architectures) to materials science and beyond.
Comment: 71 pages, 42 figures, 1 table, 1 appendix. arXiv admin note: text overlap with arXiv:2109.08184 by other authors
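Of the objects the abstract connects, the Bethe-Hessian is concrete enough to sketch. Using its standard definition from spectral community detection, H(r) = (r^2 - 1)I - rA + D with r near the square root of the average degree, eigenvectors of its smallest eigenvalues reveal community structure. The example below is a generic illustration of that standard construction, not this paper's specific machinery; all sizes and parameters are made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two planted communities of 30 nodes each; dense within, sparse across.
sizes, p_in, p_out = 30, 0.3, 0.02
n = 2 * sizes
labels = np.array([0] * sizes + [1] * sizes)
P = np.where(labels[:, None] == labels[None, :], p_in, p_out)
A = (rng.random((n, n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                       # symmetric simple graph

# Bethe-Hessian H(r) = (r^2 - 1) I - r A + D, r ~ sqrt(average degree).
D = np.diag(A.sum(axis=1))
r = np.sqrt(A.sum() / n)
H = (r**2 - 1) * np.eye(n) - r * A + D

# The eigenvector of the second-smallest eigenvalue splits the communities.
w, V = np.linalg.eigh(H)
guess = (V[:, 1] > 0).astype(int)
acc = max((guess == labels).mean(), (1 - guess == labels).mean())
print(acc)
```

At this signal strength the sign pattern of the second eigenvector recovers the planted partition almost exactly; closer to the detection threshold the Bethe-Hessian is precisely where such spectral methods start to fail.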
More than the sum of its parts – pattern mining, neural networks, and how they complement each other
In this thesis we explore pattern mining and deep learning. Often seen as orthogonal, we show that these fields complement each other and propose to combine them to gain from each other's strengths. We first show how to efficiently discover succinct and non-redundant sets of patterns that provide insight into data beyond conjunctive statements. We leverage the interpretability of such patterns to unveil how and which information flows through neural networks, as well as what characterizes their decisions. Conversely, we show how to combine continuous optimization with pattern discovery, proposing a neural network that directly encodes discrete patterns, which allows us to apply pattern mining at a scale orders of magnitude larger than previously possible. Large neural networks are, however, exceedingly expensive to train, for which 'lottery tickets' (small, well-trainable sub-networks in randomly initialized neural networks) offer a remedy. We identify theoretical limitations of strong tickets and overcome them by equipping these tickets with the property of universal approximation. To analyze whether limitations in ticket sparsity are algorithmic or fundamental, we propose a framework to plant and hide lottery tickets. With novel ticket benchmarks we then conclude that the limitation is likely algorithmic, encouraging further developments for which our framework offers means to measure progress.
In this thesis we concern ourselves with pattern mining and deep learning. Often regarded as opposites, we combine these fields to profit from the strengths of both. We first show how to efficiently discover succinct sets of patterns that give insights beyond conjunctive statements. We then use the interpretability of such patterns to understand how and which information flows through neural networks and what characterizes their decisions. Conversely, we combine continuous optimization with pattern discovery through a neural network that directly encodes discrete patterns, which permits pattern mining at scales orders of magnitude larger than previously possible. Training large neural networks is, however, extremely expensive, for which 'lottery tickets' (small, well-trainable subnetworks in randomly initialized neural networks) offer a solution. We show theoretical limitations of strong tickets and how to overcome them by equipping the tickets with the property of universal approximation. To answer whether limitations in ticket size are of an algorithmic or fundamental nature, we develop a framework for planting and hiding tickets that serve as model cases. Based on our results, we conclude that the limitations have algorithmic causes, which encourages further developments for which our framework enables the evaluation of progress.
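The lottery-ticket procedure referenced above is usually instantiated by magnitude pruning: keep the largest-magnitude weights after training, zero out the rest, and retrain the surviving subnetwork from its original initialization. A minimal, hypothetical sketch of that standard recipe follows (a toy layer, not the thesis's planting framework):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy 8 x 8 layer; the perturbation stands in for actual training.
W_init = rng.normal(size=(8, 8))
W_trained = W_init + 0.1 * rng.normal(size=(8, 8))

# Magnitude pruning: keep the (1 - sparsity) fraction of largest weights.
sparsity = 0.8
k = int(W_trained.size * (1 - sparsity))
thresh = np.sort(np.abs(W_trained), axis=None)[-k]
mask = np.abs(W_trained) >= thresh

# The 'ticket': the original initialization restricted to the mask,
# which would then be retrained with the mask held fixed.
ticket = W_init * mask
print(int(mask.sum()))
```

Whether such masks can be found at high sparsity is exactly the algorithmic-versus-fundamental question the planting framework in the thesis is designed to probe.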
- …