
    Exploiting Structural Properties in the Analysis of High-dimensional Dynamical Systems

    The physical and cyber domains with which we interact are filled with high-dimensional dynamical systems. In machine learning, for instance, the evolution of overparametrized neural networks can be seen as a dynamical system. In networked systems, numerous agents or nodes dynamically interact with each other. A deep understanding of these systems can enable us to predict their behavior, identify potential pitfalls, and devise effective solutions for optimal outcomes. In this dissertation, we will discuss two classes of high-dimensional dynamical systems with specific structural properties that aid in understanding their dynamic behavior. In the first scenario, we consider the training dynamics of multi-layer neural networks. The high dimensionality comes from overparametrization: a typical network has a large depth and hidden layer width. We are interested in the following question regarding convergence: Do network weights converge to an equilibrium point corresponding to a global minimum of our training loss, and how fast do they converge? The key to these questions is the symmetry of the weights, a critical property induced by the multi-layer architecture. Such symmetry leads to a set of time-invariant quantities, called weight imbalance, that restrict the training trajectory to a low-dimensional manifold defined by the weight initialization. A tailored convergence analysis is developed over this low-dimensional manifold, showing improved rate bounds for several multi-layer network models studied in the literature, leading to novel characterizations of the effect of weight imbalance on the convergence rate. In the second scenario, we consider large-scale networked systems with multiple weakly-connected groups. Such a multi-cluster structure leads to a time-scale separation between the fast intra-group interaction, due to high intra-group connectivity, and the slow inter-group oscillation, due to the weak inter-group connection.
We develop a novel frequency-domain network coherence analysis that captures both the coherent behavior within each group and the dynamical interaction between groups, leading to a structure-preserving model-reduction methodology for large-scale dynamic networks with multiple clusters under general node dynamics assumptions.

    LIPIcs, Volume 251, ITCS 2023, Complete Volume


    Estimating Higher-Order Mixed Memberships via the $\ell_{2,\infty}$ Tensor Perturbation Bound

    Higher-order multiway data is ubiquitous in machine learning and statistics and often exhibits community-like structures, where each component (node) along each different mode has a community membership associated with it. In this paper we propose the tensor mixed-membership blockmodel, a generalization of the tensor blockmodel positing that memberships need not be discrete, but instead are convex combinations of latent communities. We establish the identifiability of our model and propose a computationally efficient estimation procedure based on the higher-order orthogonal iteration algorithm (HOOI) for tensor SVD composed with a simplex corner-finding algorithm. We then demonstrate the consistency of our estimation procedure by providing a per-node error bound, which showcases the effect of higher-order structures on estimation accuracy. To prove our consistency result, we develop the $\ell_{2,\infty}$ tensor perturbation bound for HOOI under independent, possibly heteroskedastic, subgaussian noise that may be of independent interest. Our analysis uses a novel leave-one-out construction for the iterates, and our bounds depend only on spectral properties of the underlying low-rank tensor under nearly optimal signal-to-noise ratio conditions such that tensor SVD is computationally feasible. Whereas other leave-one-out analyses typically focus on sequences constructed by analyzing the output of a given algorithm with a small part of the noise removed, our leave-one-out constructions use both the previous iterates and the additional tensor structure to eliminate a potential additional source of error. Finally, we apply our methodology to real and simulated data, including applications to two flight datasets and a trade network dataset, demonstrating some effects not identifiable from the model with discrete community memberships.

    Fundamental Study of Photoluminescence-Shape Relationship of Fluorescent Nanodiamonds using Machine Learning Assisted Correlative Transmission Electron Microscopy and Photoluminescence Microscopy Method

    Luminescent nanoparticles have found wide application, ranging from lighting, displays, and sensors to biomedical diagnostics and imaging. Among these, fluorescent nanodiamonds (FNDs) containing nitrogen-vacancy (NV) color centers are emerging materials, particularly in biomedical and biological imaging applications, due to their room-temperature emission, excellent photo- and chemical stability, high biocompatibility, and versatile functionalization potential. The shape variation of nanoparticles has a decisive influence on their fluorescence. However, current related studies are limited by the lack of reliable statistical analysis of nanoparticle shape and the difficulty of achieving a precise correlation between shape/structure and optical measurements of large numbers of individual nanoparticles. Therefore, new methods are urgently needed to overcome these challenges and to assist in nanoparticle synthesis control and fluorescence performance optimization. In this thesis, a new correlative TEM and photoluminescence (PL) microscopy (TEMPL) method has been developed that combines measurements of the optical properties and the material structure at the exact same particle and sample area, so that an accurate correlation can be established to statistically study FND morphology/structure and PL properties at the single-nanoparticle level. Moreover, machine learning based methods have been developed for categorizing the 2D and 3D shapes of the large number of nanoparticles imaged in the TEMPL method. This ML-assisted TEMPL method has been applied to understand how PL correlates with the size and shape of FNDs at the single-particle level. In this thesis, a strong correlation between particle morphology and NV fluorescence in FND particles has been revealed: thin, flake-like particles produce enhanced fluorescence. The robustness of this trend is confirmed in FNDs with different surface oxidation treatments.
This finding offers guidance for fluorescence-optimized sensing applications of FNDs by controlling the shape of the particles during fabrication. Overall, the TEMPL methodology developed in the thesis provides a versatile and general way to study the relationship between shape and fluorescence in various nanoparticles and opens up the possibility of correlation methods between other characterisation techniques.

    Graph Neural Networks for Natural Language Processing

    By constructing graph-structured data from the input data, Graph Neural Networks (GNNs) enhance the performance of numerous Natural Language Processing (NLP) tasks. In this thesis, we mainly focus on two aspects of NLP: text classification and knowledge graph completion. TextGCN shows excellent performance in text classification by leveraging the graph structure of the entire corpus without using any external resources, especially under a limited labelled data setting. Two questions are explored: (1) Under the transductive semi-supervised setting, how can the documents be better utilized and the complex relationships between nodes be learned? (2) How can TextGCN be transformed into an inductive model while also reducing its time and space complexity? In detail, firstly, a comprehensive analysis was conducted on TextGCN and its variants. Secondly, we propose ME-GCN, a novel method for text classification that utilizes multi-dimensional edge features in a GNN for the first time. It uses corpus-trained word-based and document-based edge features for semi-supervised classification and has been shown to be effective through experiments on benchmark datasets under the limited labelled data setting. Thirdly, we introduce InducT-GCN, an inductive framework for GCN-based text classification that does not require additional resources. The framework introduces a novel approach to make transductive GCN-based text classification models inductive, improving performance and reducing time and space complexity. Most existing work on Temporal Knowledge Graph Completion (TKGC) overlooks the significance of explicit temporal information and fails to skip irrelevant snapshots based on the entity-related relation in the query. To address this, we introduce Re-Temp (Relation-Aware Temporal Representation Learning), a model that leverages explicit temporal embedding and a skip information flow after each timestamp to eliminate unnecessary information for prediction.

    LIPIcs, Volume 261, ICALP 2023, Complete Volume


    Constructing hypergraphs from temporal data

    A wide range of systems across the social and natural sciences produce temporal data consisting of interaction events among nodes in disjoint sets. Online shopping, for example, generates purchasing events of the form (user, product, time of purchase), and mutualistic interactions in plant-pollinator systems generate pollination events of the form (insect, plant, time of pollination). These data sets can be meaningfully modeled as temporal hypergraph snapshots in which multiple nodes within one set (e.g. online shoppers) share a hyperedge if they interacted with a common node in the opposite set (e.g. purchased the same product) within a given time window, allowing for the application of a range of hypergraph analysis techniques. However, it is often unclear how to choose the number and duration of these temporal snapshots, which have a strong influence on the final hypergraph representations. Here we propose a principled, efficient, nonparametric solution to this longstanding problem by extracting temporal hypergraph snapshots that optimally capture structural regularities in temporal event data according to the minimum description length principle. We demonstrate our methods on real and synthetic datasets, finding that they can recover planted artificial hypergraph structure in the presence of considerable noise and reveal meaningful activity fluctuations in human mobility data.

    Spherical and Hyperbolic Toric Topology-Based Codes On Graph Embedding for Ising MRF Models: Classical and Quantum Topology Machine Learning

    The paper introduces the application of information geometry to describe the ground states of Ising models by utilizing parity-check matrices of cyclic and quasi-cyclic codes on toric and spherical topologies. The approach establishes a connection between machine learning and error-correcting coding. This proposed approach has implications for the development of new embedding methods based on trapping sets. Statistical physics and number geometry are applied to optimize error-correcting codes, leading to these embedding and sparse factorization methods. The paper establishes a direct connection between DNN architecture and error-correcting coding by demonstrating how state-of-the-art architectures (ChordMixer, Mega, Mega-chunk, CDIL, ...) from the long-range arena can be equivalent to block and convolutional LDPC codes (Cage-graph, Repeat Accumulate). QC codes correspond to certain types of chemical elements, with the carbon element being represented by the mixed automorphism Shu-Lin-Fossorier QC-LDPC code. The connections between Belief Propagation and the Permanent, Bethe-Permanent, Nishimori Temperature, and Bethe-Hessian Matrix are elaborated upon in detail. The Quantum Approximate Optimization Algorithm (QAOA) used in the Sherrington-Kirkpatrick Ising model can be seen as analogous to the back-propagation loss function landscape in training DNNs. This similarity creates a comparable problem with TS pseudo-codewords, resembling the belief propagation method. Additionally, the layer depth in QAOA correlates to the number of decoding belief propagation iterations in the Wiberg decoding tree. Overall, this work has the potential to advance multiple fields, from Information Theory, DNN architecture design (sparse and structured prior graph topology), efficient hardware design for Quantum and Classical DPU/TPU (graph, quantize and shift register architect.) to Materials Science and beyond.
    Comment: 71 pages, 42 figures, 1 table, 1 appendix. arXiv admin note: text overlap with arXiv:2109.08184 by other authors

    New perspectives in statistical mechanics and high-dimensional inference

    The main purpose of this thesis is to go beyond two usual assumptions that accompany theoretical analysis in spin-glasses and inference: the i.i.d. (independent and identically distributed) hypothesis on the noise elements and the finite rank regime. The first one has been present since the birth of spin-glasses. The second one instead concerns the inference viewpoint. Disordered systems and Bayesian inference have a well-established relation, evidenced by their continuous cross-fertilization. The thesis makes use of techniques coming both from the rigorous mathematical machinery of spin-glasses, such as the interpolation scheme, and from Statistical Physics, such as the replica method. The first chapter contains an introduction to the Sherrington-Kirkpatrick and spiked Wigner models. The first is a mean field spin-glass where the couplings are i.i.d. Gaussian random variables. The second instead amounts to establishing the information theoretical limits in the reconstruction of a fixed low rank matrix, the “spike”, blurred by additive Gaussian noise. In chapters 2 and 3 the i.i.d. hypothesis on the noise is broken by assuming a noise with inhomogeneous variance profile. In spin-glasses this leads to multi-species models. The inferential counterpart is called spatial coupling. All the previous models are usually studied in the Bayes-optimal setting, where everything is known about the generating process of the data. In chapter 4 we instead study the spiked Wigner model where the prior on the signal to reconstruct is ignored. In chapter 5 we analyze the statistical limits of a spiked Wigner model where the noise is no longer Gaussian, but drawn from a random matrix ensemble, which makes its elements dependent. The thesis ends with chapter 6, where the challenging problem of high-rank probabilistic matrix factorization is tackled. Here we introduce a new procedure called "decimation" and we show that it is theoretically possible to perform matrix factorization through it.

    On learning the structure of clusters in graphs

    Graph clustering is a fundamental problem in unsupervised learning, with numerous applications in computer science and in analysing real-world data. In many real-world applications, we find that the clusters have a significant high-level structure. This is often overlooked in the design and analysis of graph clustering algorithms, which make strong simplifying assumptions about the structure of the graph. This thesis addresses the natural question of whether the structure of clusters can be learned efficiently and describes four new algorithmic results for learning such structure in graphs and hypergraphs. The first part of the thesis studies the classical spectral clustering algorithm and presents a tighter analysis of its performance. This result explains why it works under a much weaker and more natural condition than the ones studied in the literature, and helps to close the gap between the theoretical guarantees of the spectral clustering algorithm and its excellent empirical performance. The second part of the thesis builds on the theoretical guarantees of the previous part and shows that, when the clusters of the underlying graph have certain structures, spectral clustering with fewer than k eigenvectors is able to produce better output than classical spectral clustering in which k eigenvectors are employed, where k is the number of clusters. This presents the first work that discusses and analyses the performance of spectral clustering with fewer than k eigenvectors, and shows that general structures of clusters can be learned with spectral methods. The third part of the thesis considers efficient learning of the structure of clusters with local algorithms, whose runtime depends only on the size of the target clusters and is independent of the underlying input graph.
While the objective of classical local clustering algorithms is to find a cluster which is sparsely connected to the rest of the graph, this part of the thesis presents a local algorithm that finds a pair of clusters which are densely connected to each other. This result demonstrates that certain structures of clusters can be learned efficiently in the local setting, even in the massive graphs which are ubiquitous in real-world applications. The final part of the thesis studies the problem of learning densely connected clusters in hypergraphs. The developed algorithm is based on a new heat diffusion process, whose analysis extends a sequence of recent work on the spectral theory of hypergraphs. It allows the structure of clusters to be learned in datasets modelling higher-order relations of objects and can be applied to efficiently analyse many complex datasets occurring in practice. All of the presented theoretical results are further extensively evaluated on both synthetic and real-world datasets of different domains, including image classification and segmentation, migration networks, co-authorship networks, and natural language processing. These experimental results demonstrate that the newly developed algorithms are practical, effective, and immediately applicable for learning the structure of clusters in real-world data.