    Deep Representation Learning for Social Network Analysis

    Social network analysis is an important problem in data mining. A fundamental step for analyzing social networks is to encode network data into low-dimensional representations, i.e., network embeddings, so that the network topology structure and other attribute information can be effectively preserved. Network representation learning facilitates further applications such as classification, link prediction, anomaly detection and clustering. In addition, techniques based on deep neural networks have attracted great interest over the past few years. In this survey, we conduct a comprehensive review of current literature in network representation learning utilizing neural network models. First, we introduce the basic models for learning node representations in homogeneous networks. We also introduce some extensions of the base models for tackling more complex scenarios, such as analyzing attributed networks, heterogeneous networks and dynamic networks. Then, we introduce the techniques for embedding subgraphs. After that, we present the applications of network representation learning. At the end, we discuss some promising research directions for future work.

    Towards combinatorial clustering: preliminary research survey

    The paper describes clustering problems from the combinatorial viewpoint. A brief systemic survey is presented including the following: (i) basic clustering problems (e.g., classification, clustering, sorting, clustering with an order over clusters), (ii) basic approaches to assessment of objects and object proximities (i.e., scales, comparison, aggregation issues), (iii) basic approaches to evaluation of local quality characteristics for clusters and total quality characteristics for clustering solutions, (iv) clustering as a multicriteria optimization problem, (v) a generalized modular clustering framework, (vi) basic clustering models/methods (e.g., hierarchical clustering, k-means clustering, minimum spanning tree based clustering, clustering as assignment, detection of clique/quasi-clique based clustering, correlation clustering, network communities based clustering). Special attention is given to the formulation of clustering as multicriteria optimization models. Combinatorial optimization models are used as auxiliary problems (e.g., assignment, partitioning, knapsack problem, multiple choice problem, morphological clique problem, searching for consensus/median for structures). Numerical examples illustrate problem formulations, solving methods, and applications. The material can be used as follows: (a) a research survey, (b) a foundation for designing the structure/architecture of composite modular clustering software, (c) a bibliography reference collection, and (d) a tutorial. Comment: 102 pages, 66 figures, 67 tables

    Measuring Two-Event Structural Correlations on Graphs

    Real-life graphs usually have various kinds of events happening on them, e.g., product purchases in online social networks and intrusion alerts in computer networks. The occurrences of events on the same graph could be correlated, exhibiting either attraction or repulsion. Such structural correlations can reveal important relationships between different events. Unfortunately, correlation relationships on graph structures are not well studied and cannot be captured by traditional measures. In this work, we design a novel measure for assessing two-event structural correlations on graphs. Given the occurrences of two events, we uniformly choose a sample of "reference nodes" from the vicinity of all event nodes and employ Kendall's tau rank correlation measure to compute the average concordance of event density changes. Significance can be efficiently assessed thanks to tau's property of being asymptotically normal under the null hypothesis. In order to compute the measure in large-scale networks, we develop a scalable framework using different sampling strategies. The complexity of these strategies is analyzed. Experiments on real graph datasets with both synthetic and real events demonstrate that the proposed framework is not only efficacious, but also efficient and scalable. Comment: VLDB201
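
As a rough illustration of the rank statistic named in this abstract, the following minimal sketch computes the plain Kendall's tau (the tau-a variant) between two lists of values. It is not the authors' measure: the reference-node sampling, the event density definition, and the significance test are not reproduced, and the function name `kendall_tau_a` and the example density values are assumptions made for illustration only.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical local densities of two events around four reference nodes.
density_a = [0.1, 0.4, 0.2, 0.8]
density_b = [0.2, 0.5, 0.1, 0.9]
tau = kendall_tau_a(density_a, density_b)
```

A value near 1 would suggest that the two densities rise and fall together across reference nodes (attraction), and a value near -1 the opposite (repulsion).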

    Mining Attribute-structure Correlated Patterns in Large Attributed Graphs

    In this work, we study the correlation between attribute sets and the occurrence of dense subgraphs in large attributed graphs, a task we call structural correlation pattern mining. A structural correlation pattern is a dense subgraph induced by a particular attribute set. Existing methods are not able to extract relevant knowledge regarding how vertex attributes interact with dense subgraphs. Structural correlation pattern mining combines aspects of frequent itemset and quasi-clique mining problems. We propose statistical significance measures that compare the structural correlation of attribute sets against their expected values using null models. Moreover, we evaluate the interestingness of structural correlation patterns in terms of size and density. An efficient algorithm that combines search and pruning strategies in the identification of the most relevant structural correlation patterns is presented. We apply our method to the analysis of three real-world attributed graphs: a collaboration, a music, and a citation network, verifying that it provides valuable knowledge in a feasible time. Comment: VLDB201

    Deep Learning on Graphs: A Survey

    Deep learning has been shown to be successful in a number of domains, ranging from acoustics and images to natural language processing. However, applying deep learning to the ubiquitous graph data is non-trivial because of the unique characteristics of graphs. Recently, substantial research efforts have been devoted to applying deep learning methods to graphs, resulting in beneficial advances in graph analysis techniques. In this survey, we comprehensively review the different types of deep learning methods on graphs. We divide the existing methods into five categories based on their model architectures and training strategies: graph recurrent neural networks, graph convolutional networks, graph autoencoders, graph reinforcement learning, and graph adversarial methods. We then provide a comprehensive overview of these methods in a systematic manner mainly by following their development history. We also analyze the differences and compositions of different methods. Finally, we briefly outline the applications in which they have been used and discuss potential future research directions. Comment: Accepted by Transactions on Knowledge and Data Engineering. 24 pages, 11 figures

    Graph Embedding Techniques, Applications, and Performance: A Survey

    Graphs, such as social networks, word co-occurrence networks, and communication networks, occur naturally in various real-world applications. Analyzing them yields insight into the structure of society, language, and different patterns of communication. Many approaches have been proposed to perform the analysis. Recently, methods which use the representation of graph nodes in vector space have gained traction in the research community. In this survey, we provide a comprehensive and structured analysis of various graph embedding techniques proposed in the literature. We first introduce the embedding task and its challenges such as scalability, choice of dimensionality, and features to be preserved, and their possible solutions. We then present three categories of approaches based on factorization methods, random walks, and deep learning, with examples of representative algorithms in each category and analysis of their performance on various tasks. We evaluate these state-of-the-art methods on a few common datasets and compare their performance against one another. Our analysis concludes by suggesting some potential applications and future directions. We finally present the open-source Python library we developed, named GEM (Graph Embedding Methods, available at https://github.com/palash1992/GEM), which provides all presented algorithms within a unified interface to foster and facilitate research on the topic. Comment: Submitted to Knowledge Based Systems for review

    SUMMARIZED: Efficient Framework for Analyzing Multidimensional Process Traces under Edit-distance Constraint

    Domains such as scientific workflows and business processes exhibit data models with complex relationships between objects. These relationships are typically represented as sequences, where each data item is annotated with multi-dimensional attributes. There is a need to analyze this data for operational insights. For example, in business processes, users are interested in clustering process traces into smaller subsets to discover less complex process models. This requires expensive computation of similarity metrics between sequence-based data. Related work on dimension reduction and embedding methods does not take into account the multi-dimensional attributes of data, and does not address the interpretability of data in the embedding space (i.e., by favoring vector-based representation). In this work, we introduce Summarized, a framework for efficient analysis of sequence-based multi-dimensional data using intuitive and user-controlled summarizations. We introduce summarization schemes that provide tunable trade-offs between the quality and efficiency of analysis tasks and derive an error model for summary-based similarity under an edit-distance constraint. Evaluations using real-world datasets show the effectiveness of our framework.

    GROUPS-NET: Group Meetings Aware Routing in Multi-Hop D2D Networks

    In the next generation cellular networks, device-to-device (D2D) communication is already considered a fundamental feature. A problem of multi-hop D2D networks is how to define forwarding algorithms that achieve, at the same time, high delivery ratio and low network overhead. In this paper we aim to understand group meetings' properties by looking at their structure and regularity with the final goal of applying such knowledge in the design of a forwarding algorithm for D2D multi-hop networks. We introduce a forwarding protocol, namely GROUPS-NET, which is aware of social group meetings and their evolution over time. Our algorithm is parameter-calibration free and does not require any knowledge about the social network structure of the system. In particular, unlike state-of-the-art algorithms, GROUPS-NET does not need community detection, which is a complex and expensive task. We validate our algorithm using different publicly available data sources. In real large-scale scenarios, our algorithm achieves approximately the same delivery ratio as the state-of-the-art solution with up to 40% less network overhead. Comment: arXiv admin note: text overlap with arXiv:1512.0482

    Node Embedding with Adaptive Similarities for Scalable Learning over Graphs

    Node embedding is the task of extracting informative and descriptive features over the nodes of a graph. The importance of node embeddings for graph analytics, as well as learning tasks such as node classification, link prediction and community detection, has led to increased interest in the problem and to a number of recent advances. Much like PCA in the feature domain, node embedding is an inherently unsupervised task; lacking metadata for validation, practical methods may require standardization and limiting the use of tunable hyperparameters. Finally, node embedding methods are faced with maintaining scalability in the face of large-scale real-world graphs of ever-increasing sizes. In the present work, we propose an adaptive node embedding framework that adjusts the embedding process to a given underlying graph, in a fully unsupervised manner. To achieve this, we adopt the notion of a tunable node similarity matrix that assigns weights on paths of different length. The design of the multilength similarities ensures that the resulting embeddings also inherit interpretable spectral properties. The proposed model is carefully studied, interpreted, and numerically evaluated using stochastic block models. Moreover, an algorithmic scheme is proposed for training the model parameters efficiently and in an unsupervised manner. We perform extensive node classification, link prediction, and clustering experiments on many real-world graphs from various domains, and compare with state-of-the-art scalable and unsupervised node embedding alternatives. The proposed method enjoys superior performance in many cases, while also yielding interpretable information on the underlying structure of the graph.
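
To make the idea of "a tunable node similarity matrix that assigns weights on paths of different length" concrete, here is a minimal sketch under stated assumptions: similarity is taken as a weighted sum of powers of the raw adjacency matrix, S = w_1 A + w_2 A^2 + ... + w_K A^K, where entry (i, j) of A^k counts walks of length k between nodes i and j. The abstract does not give the exact form the authors use (for instance, whether the adjacency matrix is normalized or how the weights are trained), so the function names and the fixed weights below are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def multilength_similarity(adj, weights):
    """Return sum_k weights[k-1] * adj^k for k = 1..len(weights)."""
    n = len(adj)
    sim = [[0.0] * n for _ in range(n)]
    # Start from the identity so the first multiplication yields adj^1.
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for w in weights:
        power = matmul(power, adj)
        for i in range(n):
            for j in range(n):
                sim[i][j] += w * power[i][j]
    return sim


# Path graph 0 - 1 - 2; weight 1.0 on length-1 walks, 0.5 on length-2 walks.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
sim = multilength_similarity(adj, [1.0, 0.5])
```

In this toy graph, nodes 0 and 2 are not adjacent but still receive a nonzero similarity through the single length-2 walk via node 1; changing the weights shifts how much such indirect connections count.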

    An Automated System for Discovering Neighborhood Patterns in Ego Networks

    Social network analysis has often focused on the topology of the network without considering the characteristics of the individuals involved in it. Less attention is given to studying the behavior of individuals, even though they are the basic entity of a graph. Given a mobile social network graph, what are good features to extract key information from the nodes? How many distinct neighborhood patterns exist for ego nodes? What clues does such information provide to study nodes over a long period of time? In this report, we develop an automated system in order to discover the occurrences of prototypical ego-centric patterns from data. We aim to provide a data-driven instrument to be used in behavioral sciences for graph interpretations. We analyze social networks derived from real-world data collected with smartphones. We select 13 well-known network measures, especially those concerned with ego graphs. We form eight feature subsets and then assess their performance using unsupervised clustering techniques to discover distinguishing ego-centric patterns. From clustering analysis, we discover that eight distinct neighborhood patterns have emerged. This categorization allows concise analysis of users' data as they change over time. The results provide a fine-grained analysis of the contribution of different feature sets to detecting unique clustering patterns. Last, as a case study, two datasets are studied over long periods to demonstrate the utility of this method. The study shows the effectiveness of the proposed approach in discovering important trends from data.