
    Correlation Detection in Trees for Planted Graph Alignment

    Motivated by the alignment of correlated sparse random graphs, we study the hypothesis testing problem of deciding whether two random trees are correlated or not. Based on this correlation detection problem, we propose MPAlign, a message-passing algorithm for graph alignment, which we prove succeeds in polynomial time at partial alignment whenever tree detection is feasible. As a result, our analysis of tree detection reveals new ranges of parameters for which partial alignment of sparse random graphs is feasible in polynomial time. We conjecture that the connection between partial graph alignment and tree detection runs deeper, and that the parameter range where tree detection is impossible, which we partially characterize, corresponds to a region where partial graph alignment is hard (not polytime feasible).
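    As a concrete illustration of the planted alignment setting behind this problem, the following minimal sketch samples a pair of correlated sparse graphs by subsampling a common Erdős–Rényi parent graph and hiding the correspondence with a random permutation. The construction is the standard subsampling model commonly used in this literature; the function name, parametrization, and values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def correlated_pair(n, lam, s):
    """Sample two correlated sparse graphs (adjacency matrices).

    Parent graph: Erdos-Renyi G(n, lam/(n*s)); each copy keeps every
    parent edge independently with probability s, so each marginal has
    mean degree ~ lam and the copies are correlated through the parent.
    The second copy is relabelled by a hidden permutation pi_star.
    """
    p = min(lam / (n * s), 1.0)
    upper = np.triu(rng.random((n, n)) < p, k=1)   # parent edges (upper triangle)
    keep1 = upper & (rng.random((n, n)) < s)        # subsample for copy 1
    keep2 = upper & (rng.random((n, n)) < s)        # subsample for copy 2
    A = (keep1 | keep1.T).astype(int)
    B = (keep2 | keep2.T).astype(int)
    pi_star = rng.permutation(n)                    # hidden ground-truth alignment
    B = B[np.ix_(pi_star, pi_star)]
    return A, B, pi_star

A, B, pi_star = correlated_pair(n=1000, lam=3.0, s=0.8)
```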

    Structural engineering of evolving complex dynamical networks

    Networks are ubiquitous in nature, and many natural and man-made systems can be modelled as networked systems. Complex networks, systems comprising a number of nodes connected through edges, have frequently been used to model large-scale systems from various disciplines such as biology, ecology, and engineering. Dynamical systems interacting through a network may exhibit collective behaviours such as synchronisation, consensus, opinion formation, flocking, and unusual phase transitions. The evolution of such collective behaviours depends strongly on the structure of the interaction network. Optimisation of the network topology to improve collective behaviours and network robustness can be achieved by intelligently modifying the network structure; here, this is referred to as "Engineering of the Network". Although coupled dynamical systems can develop spontaneous synchronous patterns if their coupling strength lies in an appropriate range, in some applications one needs to control a fraction of the nodes, known as driver nodes, to facilitate synchrony. This thesis addresses the problem of identifying the set of best drivers, i.e., those leading to the best pinning control performance. The eigen-ratio of the augmented Laplacian matrix, i.e., its largest eigenvalue divided by its second smallest one, is chosen as the controllability metric. The approach introduced in this thesis obtains the set of optimal drivers via sensitivity analysis of the eigen-ratio, which requires only a single computation of the eigenvector associated with the largest eigenvalue and is therefore applicable to large-scale networks. This leads to a new "controllability centrality" metric for each subset of nodes. Simulation results reveal the effectiveness of the proposed metric in correctly predicting the most important driver(s).

    Interactions in complex networks can also facilitate the propagation of undesired effects, such as node/edge failures, which may crucially affect the performance of collective behaviours. To study the effect of node failure on network synchronisation, an analytical metric is proposed that measures the effect of a node removal on any desired eigenvalue of the Laplacian matrix. Using this metric, which is based on the local multiplicity of each eigenvalue at each node, one can approximate the impact of any node removal on the spectrum of a graph. The metric is computationally efficient, as it needs only a single eigendecomposition of the Laplacian matrix. It also provides a reliable approximation of the "Laplacian energy" of a network. Simulation results verify the accuracy of this metric on networks with different topologies.

    This thesis also considers formation control as an application of network synchronisation and studies the "rigidity maintenance" problem, one of the major challenges in this field: preserving the rigidity of the sensing graph of a formation during motion, subject to constraints such as line-of-sight requirements, sensing ranges, and power limitations. By introducing a "Lattice of Configurations" for each node, a distributed rigidity maintenance algorithm is proposed that preserves the rigidity of the sensing network when failure of a sensing link would otherwise result in loss of rigidity. The proposed algorithm recovers rigidity by activating, almost always, the minimum number of new sensing links, and it respects the real-time constraints of practical formations. A sufficient condition for this recovery is proved and tested via numerical simulations. Building on the above results, a number of other areas and applications of network dynamics are studied and expounded upon in this thesis.
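    The eigen-ratio metric itself is easy to state in code. The sketch below builds an augmented Laplacian for a chosen driver set (graph Laplacian plus pinning gains on the drivers' diagonal entries, a standard construction in the pinning-control literature) and evaluates the ratio of the largest to the second-smallest eigenvalue, together with a brute-force driver search that the thesis's sensitivity analysis is designed to avoid. The gain value, toy graph, and function names are hypothetical.

```python
import numpy as np

def eigen_ratio(A, drivers, gain=10.0):
    """Eigen-ratio of the augmented Laplacian: largest eigenvalue divided
    by the second smallest (the controllability metric described in the
    abstract; smaller values indicate better pinning controllability)."""
    L = np.diag(A.sum(axis=1)) - A
    M = L.astype(float)
    M[drivers, drivers] += gain          # pinning gains on driver nodes
    w = np.linalg.eigvalsh(M)            # eigenvalues in ascending order
    return w[-1] / w[1]

def best_single_driver(A, gain=10.0):
    """Brute-force baseline: try every node as the sole driver.
    The thesis avoids this loop of n eigendecompositions via a
    sensitivity analysis needing only the top eigenvector."""
    scores = [eigen_ratio(A, [v], gain) for v in range(A.shape[0])]
    return int(np.argmin(scores)), scores

# hypothetical toy example: a path graph on 5 nodes
A = np.zeros((5, 5), int)
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1
v, scores = best_single_driver(A)
print("best single driver:", v)
```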

    A Family of Tractable Graph Distances

    Important data mining problems such as nearest-neighbor search and clustering admit theoretical guarantees when restricted to objects embedded in a metric space. Graphs are ubiquitous, and clustering and classification over graphs arise in diverse areas including, e.g., image processing and social networks. Unfortunately, the popular distance scores used in these applications, which scale to large graphs, are not metrics and thus come with no guarantees. Classic graph distances, e.g., the chemical and the CKS distance, are arguably natural and intuitive, and are indeed also metrics, but they are intractable: their computation does not scale to large graphs. We define a broad family of graph distances that includes both the chemical and the CKS distance, and we prove that these are all metrics. Crucially, we show that our family includes metrics that are tractable. Moreover, we extend these distances by incorporating auxiliary node attributes, which is important in practice, while maintaining both the metric property and tractability.
    Comment: Extended version of paper appearing in SDM 201
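    For intuition about the intractability being addressed, the following sketch computes a chemical-style distance by brute force: the minimum number of mismatched edges over all vertex alignments of two equally sized graphs. It is exact but factorial-time, which is exactly what the paper's tractable family of metrics avoids via relaxation; the toy graphs and function name are illustrative assumptions, not the paper's algorithm.

```python
import itertools
import numpy as np

def chemical_distance(A, B):
    """Minimum edge mismatch between two graphs on the same vertex set,
    minimising |A - P B P^T| over all vertex permutations P.
    Exact but exponential: only feasible for tiny graphs."""
    n = A.shape[0]
    best = np.inf
    for perm in itertools.permutations(range(n)):
        Bp = B[np.ix_(perm, perm)]                  # relabel B by perm
        best = min(best, np.abs(A - Bp).sum() / 2)  # each mismatch counted twice
    return best

# hypothetical toy graphs: a triangle vs a path on 3 nodes
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
B = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
print(chemical_distance(A, B))  # 1.0: they differ by one edge at best alignment
```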

    Multilayer Networks

    In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such "multilayer" features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize "traditional" network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary of terminology to relate the numerous existing concepts to each other, and provide a thorough discussion that compares, contrasts, and translates between related notions such as multilayer networks, multiplex networks, interdependent networks, networks of networks, and many others. We also survey and discuss existing data sets that can be represented as multilayer networks. We review attempts to generalize single-layer-network diagnostics to multilayer networks. We also discuss the rapidly expanding research on multilayer-network models and notions like community structure, connected components, tensor decompositions, and various types of dynamical processes on multilayer networks. We conclude with a summary and an outlook.
    Comment: Working paper; 59 pages, 8 figures
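    One common way to make the multilayer framework concrete is the supra-adjacency representation of a node-aligned multiplex network: intra-layer adjacency blocks on the diagonal, with inter-layer couplings between replicas of the same node. The sketch below assumes uniform coupling strength omega, one standard convention in this literature (not the only one); the function name and toy layers are illustrative.

```python
import numpy as np

def supra_adjacency(layers, omega=1.0):
    """Supra-adjacency matrix of a node-aligned multiplex network:
    intra-layer adjacency blocks on the diagonal, and coupling of
    strength omega between replicas of the same node across layers."""
    L, n = len(layers), layers[0].shape[0]
    S = np.zeros((L * n, L * n))
    for a in range(L):
        S[a*n:(a+1)*n, a*n:(a+1)*n] = layers[a]        # intra-layer edges
        for b in range(L):
            if a != b:                                  # couple node replicas
                S[a*n:(a+1)*n, b*n:(b+1)*n] = omega * np.eye(n)
    return S

# hypothetical two-layer multiplex on 3 nodes
A1 = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
A2 = np.array([[0, 0, 1], [0, 0, 1], [1, 1, 0]])
S = supra_adjacency([A1, A2], omega=0.5)   # 6x6 supra-adjacency matrix
```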

    Modularity-based approaches to community detection in multilayer networks with applications toward precision medicine

    Networks have become an important tool for the analysis of complex systems across many different disciplines, including computer science, biology, chemistry, the social sciences, and, importantly, cancer medicine. Networks in the real world typically exhibit many forms of higher-order organization. The subfield of network analysis known as community detection aims to provide tools for discovering and interpreting the global structure of a network based on the connectivity patterns of its edges. In this thesis, we provide an overview of methods for community detection in networks, with an emphasis on modularity-based approaches, and we discuss several caveats and drawbacks of currently available methods. We also review the success that network analyses have had in interpreting large-scale 'omics' data in the context of cancer biology. In the second and third chapters, we present CHAMP and multimodbp, two useful community detection tools that seek to overcome several of the deficiencies in modularity-based community detection. In the final chapter, we develop a networks-based significance test to address an important question in the field of oncology: are mutations in DNA damage repair genes associated with elevated levels of tumor mutational burden? We apply the tools of network analysis to this question and show how this approach yields new insight into the structure of the problem, revealing what we call the TMB Paradox. We close by demonstrating the clinical utility of our findings in predicting patient response to novel immunotherapies.
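    As background for the modularity-based approaches discussed here, the sketch below evaluates the resolution-parameterized modularity of a given node partition; it illustrates the objective these methods optimize, not CHAMP or multimodbp themselves, and the toy graph is an assumption for illustration.

```python
import numpy as np

def modularity(A, labels, gamma=1.0):
    """Newman-Girvan modularity with resolution parameter gamma:
    Q = (1/2m) * sum_ij [A_ij - gamma * k_i * k_j / (2m)] * delta(c_i, c_j).
    Modularity-based community detection searches for the labelling
    (partition) that maximises Q."""
    k = A.sum(axis=1)                    # node degrees
    two_m = k.sum()                      # 2m: twice the edge count
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return ((A - gamma * np.outer(k, k) / two_m) * same).sum() / two_m

# hypothetical example: two triangles joined by a single bridge edge
A = np.zeros((6, 6), int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
print(modularity(A, [0, 0, 0, 1, 1, 1]))  # ~0.357 for the natural split
```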

    Models for dynamic networks with metadata

    There is increasing understanding that many complex systems of interest, everything from the global economy, to social group dynamics, to biochemical processes in the brain, require holistic modelling rather than the consideration of units of the population in isolation. Network science techniques, which view the system as a set of vertices or nodes (the units of the population, e.g. individual people) and edges (the relationships between them, e.g. friendship), are one popular approach for doing so. Naturally, such complex systems express a wide array of important properties that we ought to account for when modelling them, beyond simply the presence or absence of a particular relationship. Most pertinently for this work, they evolve over time, i.e. they are dynamic, and the units of the population may have distinct properties, or attributes, which further differentiate them from each other. We define any such extra information we might possess outside of the simple node/edge paradigm to be metadata. Despite the potential utility of such metadata, it is only quite recently that methods have begun to jointly model network and metadata together. In this thesis, we provide a new class of models that do so, specifically with the purpose of finding groups in networks that change over time. We describe distinct versions of this class of models that allow the networks to be weighted and directed, and that avoid the potential issue of placing nodes with similar degrees in the same group. In addition to elaborating these models, we derive novel requirements for the efficient detectability of groups in the presence of metadata, and in the process explain why a recent paper claiming to do the same for a similar static model is flawed. The inference method we leverage to investigate detectability is also highly scalable, and we further accelerate it by proposing both a 'greedy' scheme and a recursive procedure that effectively provides a top-down hierarchy of the network groups. We conclude by using our models as one component of a larger method that provides an entirely novel means of estimating the influence of an author, using a causal framing of the problem that, to our knowledge, has not previously been explored in this context and that draws on recent ideas from the causal inference literature.
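    To fix ideas, here is a toy generative sketch of a dynamic stochastic blockmodel with node metadata: group labels persist between snapshots with some probability, edges follow within-group and between-group rates, and metadata is a noisy copy of the labels. All names and parameter choices are illustrative assumptions, not the thesis's exact model.

```python
import numpy as np

rng = np.random.default_rng(1)

def dynamic_sbm(n, K, T, p_in, p_out, stay=0.9, meta_acc=0.8):
    """Toy dynamic stochastic blockmodel with metadata.

    Each node's group label carries over between snapshots with
    probability `stay`; within-group edges appear with p_in and
    between-group edges with p_out; metadata agrees with the final
    labels with probability meta_acc."""
    z = rng.integers(K, size=n)                    # initial group labels
    snapshots, labels = [], []
    for _ in range(T):
        move = rng.random(n) >= stay               # which nodes resample
        z = np.where(move, rng.integers(K, size=n), z)
        P = np.where(z[:, None] == z[None, :], p_in, p_out)
        A = np.triu(rng.random((n, n)) < P, k=1)   # sample upper triangle
        snapshots.append((A | A.T).astype(int))
        labels.append(z.copy())
    meta = np.where(rng.random(n) < meta_acc, z, rng.integers(K, size=n))
    return snapshots, labels, meta

snaps, labels, meta = dynamic_sbm(n=200, K=3, T=5, p_in=0.1, p_out=0.01)
```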

    Robust Network Topology Inference and Processing of Graph Signals

    The abundance of large and heterogeneous systems is rendering contemporary data more pervasive, intricate, and non-regularly structured. With classical techniques struggling to deal with the irregular (non-Euclidean) domains on which such signals are defined, a popular approach at the heart of graph signal processing (GSP) is to (i) represent the underlying support via a graph and (ii) exploit the topology of this graph to process the signals at hand. In addition to the irregular structure of the signals, another critical limitation is that the observed data is prone to perturbations, which, in the context of GSP, may affect not only the observed signals but also the topology of the supporting graph. Ignoring the presence of perturbations, along with the couplings between the errors in the signal and the errors in their support, can drastically hinder estimation performance. While many GSP works have looked at perturbations in the signals, far fewer have looked at perturbations in the graph, and almost none at their joint effect. While this is not surprising (GSP is a relatively new field), we expect this to change in the upcoming years. Motivated by the previous discussion, the goal of this thesis is to advance toward a robust GSP paradigm in which algorithms are carefully designed to incorporate the influence of perturbations in the graph signals, in the graph support, or in both. To do so, we consider different types of perturbations, evaluate their disruptive impact on fundamental GSP tasks, and design robust algorithms to address them.
    Comment: Dissertation
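    A minimal example of the objects involved: a polynomial graph filter applied to a signal, and the output error induced when the supporting graph is perturbed by a single edge. This is a standard GSP construction, not the thesis's robust algorithms; the filter coefficients and toy cycle graph are assumptions for illustration.

```python
import numpy as np

def graph_filter(S, x, h):
    """Apply the polynomial graph filter H(S) = sum_k h[k] * S^k to the
    signal x, where S is the graph shift operator (e.g. adjacency or
    Laplacian), a standard construction in graph signal processing."""
    y = np.zeros_like(x, dtype=float)
    Sk = np.eye(S.shape[0])              # S^0
    for hk in h:
        y += hk * (Sk @ x)
        Sk = Sk @ S                      # advance to the next power of S
    return y

# toy support perturbation: drop one edge of a 6-node cycle
n = 6
S = np.roll(np.eye(n), 1, axis=1) + np.roll(np.eye(n), -1, axis=1)
S_pert = S.copy()
S_pert[0, 1] = S_pert[1, 0] = 0          # perturbed topology
x = np.ones(n)
h = [0.5, 0.3, 0.2]                      # assumed filter taps
err = np.linalg.norm(graph_filter(S, x, h) - graph_filter(S_pert, x, h))
print(err)  # output error caused by a single-edge graph perturbation
```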

    Large-Scale Intelligent Systems: From Network Dynamics to Optimization Algorithms

    The expansion of large-scale technological systems such as electrical grids, transportation networks, health care systems, telecommunication networks, the Internet (of things), and other societal networks has created numerous challenges and, at the same time, opportunities. These systems are often not yet as robust, efficient, sustainable, or smart as we would want them to be. Fueled by the massive amounts of data generated by all these systems, and by recent advances in making sense of data, there is a strong desire to make them more intelligent. However, developing large-scale intelligent systems is a multifaceted problem involving several major challenges. First, large-scale systems typically exhibit complex dynamics due to the large number of entities interacting over a network. Second, because the system is composed of many interacting entities that make decentralized (and often self-interested) decisions, one has to properly design incentives and markets for such systems. Third, the massive computational needs caused by the scale of the system necessitate performing computations in a distributed fashion, which in turn requires devising new algorithms. Finally, one has to create algorithms that can learn from copious amounts of data and generalize well. This thesis makes several contributions related to each of these four challenges.

    Analyzing and understanding the network dynamics exhibited in societal systems is crucial for developing systems that are robust and efficient. In Part I of this thesis, we study one of the most important families of network dynamics, namely that of epidemics, or spreading processes. Studying such processes is relevant for understanding and controlling the spread of, e.g., contagious diseases among people, ideas or fake news in online social networks, computer viruses in computer networks, or cascading failures in societal networks. We establish several results on the exact Markov chain model and the nonlinear "mean-field" approximations for various kinds of epidemics (i.e., SIS, SIRS, SEIRS, SIV, SEIV, and their variants).

    Designing incentives and markets for large-scale systems is critical for their efficient operation and for ensuring alignment between the agents' decentralized decisions and the global goals of the system. To that end, in Part II of this thesis, we study these issues in markets with non-convex costs as well as in networked markets, which are of vital importance for, e.g., the smart grid. We propose novel pricing schemes for such markets that satisfy all the desired market properties. We also reveal issues in the current incentives for distributed energy resources, such as renewables, and design optimization algorithms for the efficient management of aggregators of such resources.

    With the growing amounts of data generated by large-scale systems, and the fact that the data may already be dispersed across many units, it is becoming increasingly necessary to run computational tasks in a distributed fashion. Part III concerns developing algorithms for distributed computation. We propose a novel consensus-based algorithm for solving large-scale systems of linear equations, one of the most fundamental problems in linear algebra and a key step at the heart of many algorithms in scientific computing, machine learning, and beyond. In addition, to deal with the issue of heterogeneous delays in distributed computation caused by slow machines, we develop a new coded computation technique. In both cases, the proposed methods offer significant speed-ups relative to existing approaches.

    Over the past decade, deep learning methods have become the most successful learning algorithms in a wide variety of tasks. However, the reasons behind their success (as well as their failures in some respects) are largely unexplained. It is widely believed that the success of deep learning is due not just to the deep architecture of the models but also to the behavior of the optimization algorithms, such as stochastic gradient descent (SGD), used for training them. In Part IV of this thesis, we characterize several properties, such as minimax optimality and implicit regularization, of SGD and, more generally, of the family of stochastic mirror descent (SMD) algorithms. While SGD performs an implicit regularization, this regularization can be effectively controlled by using SMD with a proper choice of mirror, which in turn can improve the generalization error.
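    As a small illustration of the mean-field epidemic models studied in Part I, the sketch below integrates the N-intertwined mean-field SIS dynamics on a toy star graph. This is one of several approximations the thesis analyzes, not its main results; the graph and parameter values are assumptions for illustration.

```python
import numpy as np

def sis_mean_field(A, beta, delta, x0, T, dt=0.01):
    """Euler integration of the N-intertwined mean-field SIS model:
    dx_i/dt = -delta * x_i + beta * (1 - x_i) * sum_j A_ij * x_j,
    where x_i approximates node i's marginal infection probability,
    beta is the infection rate, and delta the recovery rate."""
    x = np.array(x0, dtype=float)
    for _ in range(int(T / dt)):
        x += dt * (-delta * x + beta * (1 - x) * (A @ x))
        x = np.clip(x, 0.0, 1.0)         # keep probabilities in [0, 1]
    return x

# hypothetical example: star graph, where the mean-field epidemic
# threshold is beta/delta ~ 1/lambda_max(A) = 1/sqrt(n - 1)
n = 20
A = np.zeros((n, n))
A[0, 1:] = A[1:, 0] = 1
x_inf = sis_mean_field(A, beta=0.5, delta=1.0, x0=0.1 * np.ones(n), T=50)
print(x_inf.round(3))  # endemic steady state, since 0.5 > 1/sqrt(19)
```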
