17 research outputs found

    Escape times for subgraph detection and graph partitioning

    We provide a rearrangement-based algorithm for fast detection of subgraphs of k vertices with long escape times for directed or undirected networks. Complementing other notions of densest subgraphs and graph cuts, our method is based on the mean hitting time required for a random walker to leave a designated set and hit the complement. We provide a new relaxation of this notion of hitting time on a given subgraph and use that relaxation to construct a fast subgraph detection algorithm and a generalization to K-partitioning schemes. Using a modification of the subgraph detector on each component, we propose a graph partitioner that identifies regions where random walks live for comparably large times. Importantly, our method implicitly respects the directed nature of the data for directed graphs while also being applicable to undirected graphs. We apply the partitioning method for community detection to a large class of model and real-world data sets. Comment: 22 pages, 10 figures, 1 table, comments welcome!
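The mean hitting time mentioned in this abstract can be illustrated with a small sketch. This is not the paper's rearrangement algorithm or its relaxation. It only computes the standard quantity the abstract builds on, namely the expected number of steps a random walker started inside a set needs to first reach the complement. The function name `mean_escape_times` and the convention that `A[i][j]` is the weight of the edge from i to j are assumptions made for illustration. The sketch also assumes every node has at least one outgoing edge and that the walker can leave the set from every starting node.

```python
import numpy as np

def mean_escape_times(A, subset):
    """Expected number of steps for a random walker started at each node of
    `subset` to first reach a node outside `subset`.

    A      : adjacency matrix, A[i][j] = weight of edge i -> j (assumed convention)
    subset : list of node indices forming the designated set
    """
    A = np.asarray(A, dtype=float)
    out_degree = A.sum(axis=1)
    P = A / out_degree[:, None]          # row-stochastic transition matrix
    S = list(subset)
    P_SS = P[np.ix_(S, S)]               # transitions that stay inside the set
    # First-step analysis: tau = 1 + P_SS @ tau, i.e. (I - P_SS) tau = 1
    return np.linalg.solve(np.eye(len(S)) - P_SS, np.ones(len(S)))

# Undirected path graph 0 - 1 - 2 - 3, designated set {0, 1}
A = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]
print(mean_escape_times(A, [0, 1]))
```

Because the transition matrix is built from out-degrees, the same code applies to directed graphs without modification. A set whose nodes all have large escape times is the kind of region the abstract describes as one where random walks "live" for a long time.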

    Multilayer Networks

    In most natural and engineered systems, a set of entities interact with each other in complicated patterns that can encompass multiple types of relationships, change in time, and include other types of complications. Such systems include multiple subsystems and layers of connectivity, and it is important to take such "multilayer" features into account to try to improve our understanding of complex systems. Consequently, it is necessary to generalize "traditional" network theory by developing (and validating) a framework and associated tools to study multilayer systems in a comprehensive fashion. The origins of such efforts date back several decades and arose in multiple disciplines, and now the study of multilayer networks has become one of the most important directions in network science. In this paper, we discuss the history of multilayer networks (and related concepts) and review the exploding body of work on such networks. To unify the disparate terminology in the large body of recent work, we discuss a general framework for multilayer networks, construct a dictionary of terminology to relate the numerous existing concepts to each other, and provide a thorough discussion that compares, contrasts, and translates between related notions such as multilayer networks, multiplex networks, interdependent networks, networks of networks, and many others. We also survey and discuss existing data sets that can be represented as multilayer networks. We review attempts to generalize single-layer-network diagnostics to multilayer networks. We also discuss the rapidly expanding research on multilayer-network models and notions like community structure, connected components, tensor decompositions, and various types of dynamical processes on multilayer networks. We conclude with a summary and an outlook. Comment: Working paper; 59 pages, 8 figures

    Asymptotically Normal Estimation of Local Latent Network Curvature

    Network data, commonly used throughout the physical, social, and biological sciences, consist of nodes (individuals) and the edges (interactions) between them. One way to represent the complex, high-dimensional structure in network data is to embed the graph into a low-dimensional geometric space. Curvature of this space, in particular, provides insights about structure in the graph, such as the propensity to form triangles or present tree-like structure. We derive an estimating function for curvature based on triangle side lengths and the midpoints between sides where the only input is a distance matrix and also establish asymptotic normality. We next introduce a novel latent distance matrix estimator for networks as well as an efficient algorithm to compute the estimate via solving iterative quadratic programs. We apply this method to the Los Alamos National Laboratory Unified Network and Host dataset and show how curvature estimates can be used to detect a red-team attack faster than naive methods, as well as discover non-constant latent curvature in coauthorship networks in physics. Comment: 77 pages

    Efficient Methods for Mining Subgraphs in a Single Large Graph

    Large and complex graphs are often used to simulate the complex relationships among objects in many applications in various fields, such as social networks, maps, computer networks, chemical structures, bioinformatics, computer vision, and web analysis. Frequent subgraph mining (FSM) is a vital problem that has attracted numerous researchers in recent years; among the proposed methods, MNI-based approaches such as the GraMi algorithm are considered state of the art. FSM plays an important role in various tasks, such as data mining, model analysis, and decision support systems. It is defined as finding all subgraphs whose number of occurrences in the dataset is greater than or equal to a given frequency threshold. In recent applications, such as social networks, the underlying graphs are very large. Algorithms for mining frequent subgraphs from a single large graph have therefore been developing rapidly, but all of them have huge search spaces and still need a lot of time and memory. For frequent subgraph mining, this thesis proposes a method to record the support of mined subgraphs; a sorting strategy to reduce the number of generated subgraphs; a parallel processing approach to reduce the mining time; and early pruning of invalid values in the domain to balance the search space. Our experiments on four real datasets (both directed and undirected graphs) show that the four proposed algorithms achieve better results with respect to search space, running time, and memory requirements. In addition, closed frequent subgraph mining was developed, which has many practical applications and is a fundamental premise for many studies. We propose a closed frequent subgraph mining algorithm based on GraMi to find all closed frequent subgraphs in a single large graph. Two strategies, early determination of closed frequent subgraphs and early pruning of non-closed subgraphs, are used to improve the performance of the proposed algorithm. All our experiments for closed frequent subgraph mining are performed on five real directed and undirected graph datasets, and the results show that the running time and memory requirements of our algorithm are better than those of the GraMi-based algorithm.
    460 - Department of Computer Science
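The frequency-threshold definition in this abstract depends on how occurrences are counted in a single large graph. A minimal sketch of the minimum image-based (MNI) support measure that MNI-based approaches rely on follows. It is not the thesis's algorithm and not GraMi itself: it assumes the embeddings of the pattern have already been enumerated, and the names `mni_support` and `is_frequent` are hypothetical.

```python
def mni_support(pattern_vertices, embeddings):
    """Minimum image-based support of a pattern in a single graph.

    pattern_vertices : non-empty iterable of pattern vertex ids
    embeddings       : list of dicts, each mapping every pattern vertex
                       to the graph vertex it is matched to
    """
    if not embeddings:
        return 0
    # Collect the distinct graph vertices each pattern vertex is mapped to
    images = {v: set() for v in pattern_vertices}
    for emb in embeddings:
        for v in images:
            images[v].add(emb[v])
    # The support is the size of the smallest image set
    return min(len(s) for s in images.values())

def is_frequent(pattern_vertices, embeddings, threshold):
    """A pattern is frequent if its support reaches the threshold."""
    return mni_support(pattern_vertices, embeddings) >= threshold

# Pattern: one edge a - b. Graph: a star with centre 0 and leaves 1, 2, 3,
# where 'a' can only match the centre and 'b' can only match a leaf.
embeddings = [{"a": 0, "b": 1}, {"a": 0, "b": 2}, {"a": 0, "b": 3}]
print(mni_support(["a", "b"], embeddings))
```

In the star example there are three embeddings, but pattern vertex `a` has only one distinct image, so the MNI support is 1 rather than 3. Taking the minimum over pattern vertices is what makes the measure anti-monotone, which is the property that allows pruning of the search space.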

    Topological Deep Learning: Going Beyond Graph Data

    Topological deep learning is a rapidly growing field that pertains to the development of deep learning models for data supported on topological domains such as simplicial complexes, cell complexes, and hypergraphs, which generalize many domains encountered in scientific computations. In this paper, we present a unifying deep learning framework built upon a richer data structure that includes widely adopted topological domains. Specifically, we first introduce combinatorial complexes, a novel type of topological domain. Combinatorial complexes can be seen as generalizations of graphs that maintain certain desirable properties. Similar to hypergraphs, combinatorial complexes impose no constraints on the set of relations. In addition, combinatorial complexes permit the construction of hierarchical higher-order relations, analogous to those found in simplicial and cell complexes. Thus, combinatorial complexes generalize and combine useful traits of both hypergraphs and cell complexes, which have emerged as two promising abstractions that facilitate the generalization of graph neural networks to topological spaces. Second, building upon combinatorial complexes and their rich combinatorial and algebraic structure, we develop a general class of message-passing combinatorial complex neural networks (CCNNs), focusing primarily on attention-based CCNNs. We characterize permutation and orientation equivariances of CCNNs, and discuss pooling and unpooling operations within CCNNs in detail. Third, we evaluate the performance of CCNNs on tasks related to mesh shape analysis and graph learning. Our experiments demonstrate that CCNNs have competitive performance as compared to state-of-the-art deep learning models specifically tailored to the same tasks. Our findings demonstrate the advantages of incorporating higher-order relations into deep learning models in different applications

    Knowledge Extraction from Textual Resources through Semantic Web Tools and Advanced Machine Learning Algorithms for Applications in Various Domains

    Nowadays there is a tremendous amount of unstructured data, often in the form of text, which is created and stored in a variety of forms in many domains, such as patients' health records, social network comments, and scientific publications. This volume of data represents an invaluable source of knowledge, but it is challenging for machines to mine. At the same time, novel tools and advanced methodologies have been introduced in several domains, improving the efficacy and efficiency of data-based services. Following this trend, this thesis shows how to parse data from text with Semantic Web based tools, feed the data into Machine Learning methodologies, and produce services or resources that facilitate the execution of some tasks. More precisely, the use of Semantic Web technologies powered by Machine Learning algorithms has been investigated in the Healthcare and E-Learning domains through methodologies not previously tried. Furthermore, this thesis investigates the use of some state-of-the-art tools to move data from texts to graphs for representing the knowledge contained in scientific literature. Finally, a Semantic Web ontology and novel heuristics for detecting insights from biological data in graph form are presented. The thesis contributes to the scientific literature in terms of results and resources. Most of the material presented in this thesis derives from research papers published in international journals or conference proceedings.

    Integration of multi-scale protein interactions for biomedical data analysis

    With the advancement of modern technologies, we observe an increasing accumulation of biomedical data about diseases. There is a need for computational methods to sift through and extract knowledge from the diverse data available in order to improve our mechanistic understanding of diseases and improve patient care. Biomedical data come in various forms, as exemplified by the various omics data. Existing studies have shown that each form of omics data gives only partial information on cell state, which has motivated the joint mining of multi-omics, multi-modal data to extract integrated system knowledge. The interactome is of particular importance as it enables the modelling of dependencies arising from molecular interactions. This thesis takes a special interest in the multi-scale protein interactome and its integration with computational models to extract relevant information from biomedical data. We define multi-scale interactions at different omics scales that involve proteins: pairwise protein-protein interactions, multi-protein complexes, and biological pathways. Using hypergraph representations, we motivate considering higher-order protein interactions, highlighting the complementary biological information contained in the multi-scale interactome. Based on those results, we further investigate how those multi-scale protein interactions can be used as either prior knowledge or auxiliary data to develop machine learning algorithms. First, we design a neural network using the multi-scale organization of proteins in a cell into biological pathways as prior knowledge and train it to predict a patient's diagnosis based on transcriptomics data. From the trained models, we develop a strategy to extract biomedical knowledge pertaining to the diseases investigated. Second, we propose a general framework based on Non-negative Matrix Factorization to integrate the multi-scale protein interactome with multi-omics data. We show that our approach outperforms the existing methods and provides biomedical insights and relevant hypotheses for specific cancer types.

    Discrete Mathematics and Symmetry

    Some of the most beautiful studies in Mathematics are related to Symmetry and Geometry. For this reason, we select here some contributions about such aspects and Discrete Geometry. Symmetry in a system means invariance of its elements under transformations. For network structures, symmetry means invariance of the adjacency of nodes under permutations of the node set. Graph isomorphism is an equivalence relation on the set of graphs, so it partitions the class of all graphs into equivalence classes. The underlying idea of isomorphism is that objects have the same structure if we ignore the individual character of their components. A set of graphs isomorphic to each other is called an isomorphism class of graphs. An automorphism of a graph G is an isomorphism from G onto itself. The family of all automorphisms of a graph G is a permutation group.
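The statement that the automorphisms of a graph form a permutation group can be checked directly on a small example. The sketch below is a brute-force illustration, not taken from the book. It assumes a simple undirected graph on vertices 0..n-1 and is only practical for very small n, since it tries all n! permutations. The function names `automorphisms` and `compose` are hypothetical.

```python
from itertools import permutations

def automorphisms(n, edges):
    """All vertex permutations of a simple undirected graph on 0..n-1
    that map the edge set onto itself."""
    edge_set = {frozenset(e) for e in edges}
    result = []
    for perm in permutations(range(n)):
        # Image of every edge under the permutation
        mapped = {frozenset((perm[u], perm[v])) for u, v in edges}
        if mapped == edge_set:           # adjacency is preserved
            result.append(perm)
    return result

def compose(p, q):
    """Permutation that applies q first, then p."""
    return tuple(p[q[i]] for i in range(len(p)))

# 4-cycle 0 - 1 - 2 - 3 - 0
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
auts = automorphisms(4, cycle_edges)
print(len(auts))

# Group check: the identity is present and the set is closed under composition
aut_set = set(auts)
print((0, 1, 2, 3) in aut_set)
print(all(compose(p, q) in aut_set for p in auts for q in auts))
```

For the 4-cycle this finds the eight symmetries of a square (four rotations and four reflections). Closure under composition together with finiteness is enough to guarantee that inverses are also in the set.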

    29th International Symposium on Algorithms and Computation: ISAAC 2018, December 16-19, 2018, Jiaoxi, Yilan, Taiwan
