    LINE: Large-scale Information Network Embedding

    This paper studies the problem of embedding very large information networks into low-dimensional vector spaces, which is useful in many tasks such as visualization, node classification, and link prediction. Most existing graph embedding methods do not scale to real-world information networks, which usually contain millions of nodes. In this paper, we propose a novel network embedding method called "LINE," which is suitable for arbitrary types of information networks: undirected, directed, and/or weighted. The method optimizes a carefully designed objective function that preserves both the local and global network structures. An edge-sampling algorithm is proposed that addresses the limitation of classical stochastic gradient descent and improves both the effectiveness and the efficiency of the inference. Empirical experiments demonstrate the effectiveness of LINE on a variety of real-world information networks, including language networks, social networks, and citation networks. The algorithm is very efficient: it can learn the embedding of a network with millions of vertices and billions of edges in a few hours on a typical single machine. The source code of LINE is available online. Comment: WWW 201
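
    The edge-sampling idea mentioned in the abstract can be illustrated with a minimal sketch: edges are drawn with probability proportional to their weight and then treated as unweighted, so the weight no longer scales the gradient step. The function name `sample_edges` and the toy edge list are hypothetical, and `random.choices` stands in here for whatever sampling structure the authors actually use; this is not the released LINE code.

```python
import random
from collections import Counter

def sample_edges(edges, k, rng):
    """Draw k edges with probability proportional to weight.

    edges: list of (u, v, weight) triples (hypothetical toy format).
    Returns (u, v) pairs; the weight is dropped, so each sampled edge
    can be used as a unit-weight training example.
    """
    weights = [w for _, _, w in edges]
    picked = rng.choices(edges, weights=weights, k=k)
    return [(u, v) for u, v, _ in picked]

# Toy graph: the edge a-c is nine times heavier than a-b.
edges = [("a", "b", 1.0), ("a", "c", 9.0)]
rng = random.Random(42)
counts = Counter(sample_edges(edges, 10_000, rng))
```

    In this sketch the heavier edge simply appears more often in the sample, which is one way to keep per-step updates on a comparable scale when edge weights vary widely.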

    Structural Logistic Regression for Link Analysis

    We present Structural Logistic Regression, an extension of logistic regression to modeling relational data. It is an integrated approach to building regression models from data stored in relational databases in which potential predictors, both boolean and real-valued, are generated by structured search in the space of queries to the database, and then tested with statistical information criteria for inclusion in a logistic regression. Using statistics and relational representation allows modeling in noisy domains with complex structure. Link prediction is a task of high interest with exactly such characteristics. Be it in the domain of scientific citations, social networks or hypertext, the underlying data are extremely noisy and the features useful for prediction are not readily available in a flat file format. We propose the application of Structural Logistic Regression to building link prediction models, and present experimental results for the task of predicting citations made in scientific literature using relational data taken from the CiteSeer search engine. These data include the citation graph, authorship and publication venues of papers, as well as their word content.

    Factorization threshold models for scale-free networks generation

    Many real networks such as the World Wide Web, financial, biological, citation and social networks have a power-law degree distribution. Networks with this feature are also called scale-free. Several models for producing scale-free networks have been proposed, and most of them are based on the preferential attachment approach. We will offer a model with another explanation of the scale-free property. The main idea is to approximate the network's adjacency matrix by the product of the matrices $V$ and $V^T$, where $V$ is the matrix of vertices' latent features. This approach is called matrix factorization and is successfully used in the link prediction problem. To create a generative model of scale-free networks we will sample latent features $V$ from some probability distribution and try to generate a network's adjacency matrix. Entries in the generated matrix are dot products of latent features, which are real numbers. In order to create an adjacency matrix, we approximate entries with the Boolean domain $\{0, 1\}$. We have incorporated the threshold parameter $\theta$ into the model for discretization of a dot product. We have been influenced by the geographical threshold models, which were recently proven to give good results in scale-free network generation. The overview of our results is the following. First, we will describe our model formally. Second, we will tune the threshold $\theta$ in order to generate sparse growing networks. Finally, we will show that our model produces scale-free networks with a fixed power-law exponent equal to two. In order to generate oriented networks with tunable power-law exponents and to obtain other model properties, we will offer different modifications of our model. Some of our results will be demonstrated using computer simulation.
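
    The generation step described in this abstract (sample latent features, take dot products, discretize with a threshold) can be sketched as follows. The standard normal distribution, the function names, and the parameter values are assumptions made only for illustration; the abstract does not state which distribution is used, and this sketch is not expected to reproduce the paper's power-law result.

```python
import random

def sample_latent_features(n, d, rng):
    """Draw an n x d latent feature matrix V.

    ASSUMPTION: entries are i.i.d. standard normal; the abstract only
    says "some probability distribution".
    """
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]

def threshold_adjacency(V, theta):
    """Discretize V V^T into a symmetric 0/1 adjacency matrix.

    A[i][j] = 1 when the dot product of rows i and j is at least theta.
    Self-loops are left out (a choice made for this sketch).
    """
    n = len(V)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dot = sum(a * b for a, b in zip(V[i], V[j]))
            if dot >= theta:
                A[i][j] = A[j][i] = 1
    return A

rng = random.Random(0)
V = sample_latent_features(n=50, d=3, rng=rng)
A = threshold_adjacency(V, theta=2.0)
degrees = [sum(row) for row in A]  # degree sequence of the generated graph
```

    Raising `theta` makes the generated graph sparser, which is the role the abstract assigns to the threshold.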

    Prediction and modelling of complex social networks and their evolution.

    This thesis focuses on complex social networks in the context of computational approaches for their prediction and modelling. The increasing popularity and advancement of social networks paired with the availability of social network data enable empirical analysis, inference, prediction and modelling of social patterns. This data-driven approach towards social science is continuously evolving and is crucial for modelling and understanding of human social behaviour including predicting future social interactions for a wide range of applications. The main difference between traditional datasets and network datasets is the presence of the relational components (links) between instances (nodes) of the network. These links and nodes induce intricate local and global patterns, defining the topology of a network. The topology is ever evolving, determining the dynamics of such a networked system. The work presented in this thesis starts with an extensive analysis of three standard network models, in terms of their properties and self-interactions as well as the size and density of the resultant graphs. This analysis and understanding of the main network models are later utilised to develop a comprehensive network simulation framework. A set of novel nature-inspired link prediction approaches are then developed to predict the evolution of networks, based solely on their topologies. Building on top of these approaches, enhanced topological representations of networks are subsequently combined with node characteristics for the purpose of node classification. Finally, the proposed classification methods are extensively evaluated using simulated networks from our network simulation framework as well as two real-world citation networks. The link prediction approaches proposed in this research show that the topology of the network can be further exploited to improve the prediction of future relationships. Moreover, this research demonstrates the potential of blending state-of-the-art Machine Learning techniques with graph theory. To accelerate such advancements in the field of network science, this research also offers open-source software to provide high-quality synthetic datasets.

    Neural‑Brane: Neural Bayesian Personalized Ranking for Attributed Network Embedding

    Network embedding methodologies, which learn a distributed vector representation for each vertex in a network, have attracted considerable interest in recent years. Existing works have demonstrated that vertex representations learned through an embedding method provide superior performance in many real-world applications, such as node classification, link prediction, and community detection. However, most of the existing methods for network embedding only utilize topological information of a vertex, ignoring a rich set of nodal attributes (such as user profiles of an online social network, or textual contents of a citation network), which are abundant in all real-life networks. A joint network embedding that takes into account both attributional and relational information captures more complete network information and could further enrich the learned vector representations. In this work, we present Neural-Brane, a novel Neural Bayesian Personalized Ranking based Attributed Network Embedding. For a given network, Neural-Brane extracts latent feature representations of its vertices using a designed neural network model that unifies network topological information and nodal attributes. In addition, it utilizes a Bayesian personalized ranking objective, which exploits the proximity ordering between a similar node pair and a dissimilar node pair. We evaluate the quality of the vertex embeddings produced by Neural-Brane by solving node classification and clustering tasks on four real-world datasets. Experimental results demonstrate the superiority of our proposed method over state-of-the-art existing methods.
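
    The ranking objective mentioned in this abstract can be illustrated with the standard Bayesian personalized ranking loss for one triple: an anchor node, a similar node, and a dissimilar node. The sketch below scores pairs by a plain dot product of fixed vectors, which is an assumption for illustration; in Neural-Brane the embeddings come from a neural network that is not reproduced here, and the names `dot` and `bpr_loss` are hypothetical.

```python
import math

def dot(x, y):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def bpr_loss(u, pos, neg):
    """BPR loss for one triple: -log(sigmoid(<u,pos> - <u,neg>)).

    The loss is small when the similar node `pos` scores higher against
    the anchor `u` than the dissimilar node `neg`.
    Computed as softplus(-diff) in a numerically stable form.
    """
    diff = dot(u, pos) - dot(u, neg)
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))

# Toy embeddings (illustrative values only).
u, pos, neg = [1.0, 0.0], [1.0, 0.0], [0.0, 1.0]
loss_correct_order = bpr_loss(u, pos, neg)
loss_wrong_order = bpr_loss(u, neg, pos)
```

    A correctly ordered triple yields a loss below log 2 and a wrongly ordered one a loss above it, so minimizing the loss pushes similar pairs closer than dissimilar ones.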