13,828 research outputs found

    Multi-Scale Link Prediction

    The automated analysis of social networks has become an important problem due to the proliferation of social networks such as LiveJournal, Flickr and Facebook. The scale of these networks is massive and continues to grow rapidly. An important problem in social network analysis is proximity estimation, which infers the closeness of different users. Link prediction, in turn, is an important application of proximity estimation. However, many methods for computing proximity measures have high computational complexity and are thus prohibitive for large-scale link prediction problems. One way to address this problem is to estimate proximity measures via low-rank approximation. However, a single low-rank approximation may not be sufficient to represent the behavior of the entire network. In this paper, we propose Multi-Scale Link Prediction (MSLP), a link prediction framework that can handle massive networks. The basic idea of MSLP is to construct low-rank approximations of the network at multiple scales in an efficient manner. Based on this approach, MSLP combines predictions made at multiple scales to produce robust and accurate predictions. Experimental results on real-life datasets with more than a million nodes show the superior performance and scalability of our method. Comment: 20 pages, 10 figures
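
    A minimal sketch of the multi-scale idea (an illustration only, not the paper's MSLP algorithm: the two-scale hierarchy, the fixed rank and the equal combination weights are assumptions) blends a whole-network truncated-SVD proximity with per-cluster ones:

        import numpy as np
        from scipy.sparse.linalg import svds

        def low_rank_scores(A, rank):
            # Truncated-SVD approximation of A, used directly as a proximity matrix.
            # Assumes rank < min(A.shape); A may be a scipy.sparse adjacency matrix.
            u, s, vt = svds(A * 1.0, k=rank)
            return u @ np.diag(s) @ vt

        def multi_scale_scores(A, clusters, rank=16, weights=(0.5, 0.5)):
            # clusters: array assigning each node to a cluster id (one finer "scale").
            # Dense n x n matrices are formed here only for readability; a real
            # implementation would keep the thin factors instead.
            n = A.shape[0]
            coarse = low_rank_scores(A, rank)                 # whole-network scale
            fine = np.zeros((n, n))
            for c in np.unique(clusters):
                idx = np.where(clusters == c)[0]
                sub = A[idx][:, idx]
                k = min(rank, min(sub.shape) - 1)
                if k >= 1:
                    fine[np.ix_(idx, idx)] = low_rank_scores(sub, k)
            return weights[0] * coarse + weights[1] * fine    # rank candidate links by this score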

    Correlation Clustering with Low-Rank Matrices

    Correlation clustering is a technique for aggregating data based on qualitative information about which pairs of objects are labeled 'similar' or 'dissimilar.' Because the optimization problem is NP-hard, much of the previous literature focuses on finding approximation algorithms. In this paper we explore how to solve the correlation clustering objective exactly when the data to be clustered can be represented by a low-rank matrix. We prove in particular that correlation clustering can be solved in polynomial time when the underlying matrix is positive semidefinite with small constant rank, but that the task remains NP-hard in the presence of even one negative eigenvalue. Based on our theoretical results, we develop an algorithm for efficiently "solving" low-rank positive semidefinite correlation clustering by employing a procedure for zonotope vertex enumeration. We demonstrate the effectiveness and speed of our algorithm by using it to solve several clustering problems on both synthetic and real-world data.
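
    The leverage in the positive semidefinite case comes from the factored form A = Y Y^T: the quality of any candidate clustering can be evaluated from the thin factors alone, without materializing A. A small illustrative sketch of that evaluation (the paper's zonotope vertex enumeration of candidate clusterings is not reproduced here):

        import numpy as np

        def within_cluster_weight(Y, labels):
            # Sum of A_ij = <y_i, y_j> over unordered pairs i < j in the same cluster,
            # computed from the factor rows y_i of A = Y @ Y.T only.
            total = 0.0
            for c in np.unique(labels):
                Yc = Y[labels == c]
                s = Yc.sum(axis=0)                                       # sum of factor rows in cluster c
                total += (s @ s - np.einsum('ij,ij->', Yc, Yc)) / 2.0    # drop the i == j terms
            return total

        # Maximizing this quantity over clusterings is, up to an additive constant,
        # the same as maximizing agreements in weighted correlation clustering.
        rng = np.random.default_rng(0)
        Y = rng.normal(size=(6, 2))                                      # rank-2 factors for 6 objects
        print(within_cluster_weight(Y, np.array([0, 0, 0, 1, 1, 1])))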

    Low-rank Similarity Measure for Role Model Extraction

    Computing meaningful clusters of nodes is crucial for analyzing large networks. In this paper, we present a pairwise node similarity measure that allows us to extract roles, i.e., groups of nodes sharing similar flow patterns within a network. We propose a low-rank iterative scheme to approximate the similarity measure for very large networks. Finally, we show that our low-rank similarity score successfully extracts the different roles in random graphs and that its performance is similar to that of the pairwise similarity measure. Comment: 7 pages, 2 columns, 4 figures, conference paper for MTNS201
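
    As a rough illustration of what a low-rank iterative similarity scheme can look like (the recursion below is an assumed neighborhood-pattern-style iteration, not the paper's measure, and the full matrix is kept only for readability; a scalable version would carry the thin factors throughout):

        import numpy as np

        def iterative_role_similarity(A, rank=8, iters=20):
            # Nodes get a high score when their in- and out-neighborhoods contain
            # similar nodes; a rank-r truncation is applied after every step.
            n = A.shape[0]
            S = np.eye(n)
            for _ in range(iters):
                S = A @ S @ A.T + A.T @ S @ A        # propagate flow-pattern similarity
                S /= np.linalg.norm(S)               # Frobenius normalization
                U, sv, Vt = np.linalg.svd(S)
                S = U[:, :rank] @ np.diag(sv[:rank]) @ Vt[:rank]   # rank-r truncation
            return S                                 # large S[i, j]: i and j share a role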

    Optimization via Low-rank Approximation for Community Detection in Networks

    Community detection is one of the fundamental problems of network analysis, for which a number of methods have been proposed. Most model-based or criteria-based methods have to solve an optimization problem over a discrete set of labels to find communities, which is computationally infeasible. Some fast spectral algorithms have been proposed for specific methods or models, but only on a case-by-case basis. Here we propose a general approach for maximizing a function of a network adjacency matrix over discrete labels by projecting the set of labels onto a subspace approximating the leading eigenvectors of the expected adjacency matrix. This projection onto a low-dimensional space makes the feasible set of labels much smaller and the optimization problem much easier. We prove a general result about this method and show how to apply it to several previously proposed community detection criteria, establishing its consistency for label estimation in each case and demonstrating the fundamental connection between spectral properties of the network and various model-based approaches to community detection. Simulations and applications to real-world data are included to demonstrate that our method performs well for multiple problems over a wide range of parameters. Comment: 45 pages, 7 figures; added discussions about computational complexity and extension to more than two communities
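
    To make the recipe concrete, here is a simplified two-community sketch (the threshold sweep along the leading eigenvectors and the use of modularity as the criterion are illustrative choices, not the paper's exact construction):

        import numpy as np
        import networkx as nx

        def low_rank_community_search(G, criterion, k=2):
            # Restrict the label search to partitions obtained by thresholding the
            # k leading eigenvectors of the adjacency matrix, then score each candidate.
            # Assumes nodes are labeled 0..n-1 (e.g., via nx.convert_node_labels_to_integers).
            A = nx.to_numpy_array(G)
            vals, vecs = np.linalg.eigh(A)
            lead = vecs[:, np.argsort(vals)[-k:]]
            best_labels, best_score = None, -np.inf
            for j in range(k):
                order = np.argsort(lead[:, j])
                for cut in range(1, len(order)):
                    labels = np.zeros(len(order), dtype=int)
                    labels[order[cut:]] = 1
                    score = criterion(G, labels)
                    if score > best_score:
                        best_labels, best_score = labels, score
            return best_labels, best_score

        def modularity(G, labels):
            parts = [set(np.where(labels == c)[0]) for c in np.unique(labels)]
            return nx.algorithms.community.modularity(G, parts)

        # Usage: labels, q = low_rank_community_search(nx.karate_club_graph(), modularity)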

    Stability of graph communities across time scales

    The complexity of biological, social and engineering networks makes it desirable to find natural partitions into communities that can act as simplified descriptions and provide insight into the structure and function of the overall system. Although community detection methods abound, there is a lack of consensus on how to quantify and rank the quality of partitions. We show here that the quality of a partition can be measured in terms of its stability, defined via the clustered autocovariance of a Markov process taking place on the graph. Because the stability has an intrinsic dependence on the time scales of the graph, it allows us to compare and rank partitions at each time and also to establish the time spans over which partitions are optimal. Hence the Markov time acts effectively as an intrinsic resolution parameter that establishes a hierarchy of increasingly coarser clusterings. Within our framework we can then provide a unifying view of several standard partitioning measures: modularity and normalized cut size can be interpreted as one-step time measures, whereas Fiedler's spectral clustering emerges at long times. We apply our method to characterize the relevance and persistence of partitions over time for constructive and real networks, including hierarchical graphs and social networks. We also obtain reduced descriptions for atomic-level protein structures over different time scales. Comment: submitted; updated bibliography from v
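
    Under one common formulation of this stability measure (a hedged sketch assuming a continuous-time random walk with transition matrix D^-1 A and its usual stationary distribution; H is the community indicator matrix), the stability of a candidate partition at Markov time t can be evaluated as follows:

        import numpy as np
        from scipy.linalg import expm

        def markov_stability(A, labels, t):
            # Clustered autocovariance of a continuous-time random walk on the graph,
            # traced over the blocks of the candidate partition at Markov time t.
            # labels: numpy integer array assigning each node to a community.
            d = A.sum(axis=1)
            pi = d / d.sum()                              # stationary distribution
            M = A / d[:, None]                            # one-step transition matrix D^-1 A
            P_t = expm(t * (M - np.eye(len(d))))          # continuous-time kernel
            H = np.eye(labels.max() + 1)[labels]          # n x c community indicator matrix
            R = H.T @ (np.diag(pi) @ P_t - np.outer(pi, pi)) @ H
            return np.trace(R)                            # higher = the walk stays trapped longer

    Evaluating this quantity over a range of Markov times is what lets one rank partitions and identify the time spans over which each remains optimal; at one-step time the linearized measure reduces to modularity, as noted in the abstract.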

    Regression and Singular Value Decomposition in Dynamic Graphs

    Most real-world graphs are dynamic, i.e., they change over time. However, while problems such as regression and Singular Value Decomposition (SVD) have been studied for static graphs, they have not yet been investigated for dynamic graphs. In this paper, we introduce, motivate and study regression and SVD over dynamic graphs. First, we present the notion of an update-efficient matrix embedding, which defines the conditions sufficient for a matrix embedding to be used for the dynamic graph regression problem (under the l_2 norm). We prove that given an n × m update-efficient matrix embedding (e.g., the adjacency matrix), after an update operation in the graph the optimal solution of the graph regression problem for the revised graph can be computed in O(nm) time. We also study dynamic graph regression under least absolute deviation. Then, we characterize a class of matrix embeddings that can be used to efficiently update the SVD of a dynamic graph. For the adjacency matrix and the Laplacian matrix, we study those graph update operations for which the SVD (and low-rank approximation) can be updated efficiently.
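
    For intuition about incremental updates of this kind, here is a hedged sketch (not the paper's construction, and it does not achieve the O(nm) bound: the normal-equation quantities are maintained incrementally and the small m x m system is simply re-solved after each edge insertion):

        import numpy as np

        class DynamicGraphRegression:
            # Maintains G = M^T M and b = M^T y for the least-squares problem
            # min_x ||M x - y||_2, where M is the (adjacency-matrix) embedding.
            def __init__(self, M, y):
                self.M, self.y = M.astype(float), y.astype(float)
                self.G = self.M.T @ self.M
                self.b = self.M.T @ self.y

            def insert_edge(self, i, j, w=1.0):
                # Row i of the embedding gains weight w in column j (A[i, j] += w);
                # G changes by a rank-two correction, b by an O(m) correction.
                row_old = self.M[i].copy()
                self.M[i, j] += w
                delta = self.M[i] - row_old
                self.G += np.outer(delta, row_old) + np.outer(row_old, delta) + np.outer(delta, delta)
                self.b += delta * self.y[i]

            def solve(self):
                # Assumes G is nonsingular; re-solving the m x m system costs O(m^3).
                return np.linalg.solve(self.G, self.b)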

    Spatial Random Sampling: A Structure-Preserving Data Sketching Tool

    Random column sampling is not guaranteed to yield data sketches that preserve the underlying structures of the data and may not sample sufficiently from less-populated data clusters. Also, adaptive sampling can often provide accurate low-rank approximations, yet may fall short of producing descriptive data sketches, especially when the cluster centers are linearly dependent. Motivated by this, this paper introduces a novel randomized column sampling tool dubbed Spatial Random Sampling (SRS), in which data points are sampled based on their proximity to randomly sampled points on the unit sphere. The most compelling feature of SRS is that the probability of sampling from a given data cluster is proportional to the surface area the cluster occupies on the unit sphere, independently of the size of the cluster population. Although it is fully randomized, SRS is shown to provide descriptive and balanced data representations. The proposed idea addresses a pressing need in data science and holds the potential to inspire many novel approaches to the analysis of big data.
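
    The sampling rule itself is short to state; a minimal sketch follows (column-normalized data and Gaussian directions projected onto the unit sphere are the assumed conventions):

        import numpy as np

        def spatial_random_sampling(X, n_samples, seed=0):
            # X: d x n data matrix (columns are data points).
            # For each random direction on the unit sphere, keep the data point whose
            # normalized version has the largest inner product with that direction.
            rng = np.random.default_rng(seed)
            Xn = X / np.linalg.norm(X, axis=0, keepdims=True)     # project columns onto the sphere
            Phi = rng.normal(size=(X.shape[0], n_samples))
            Phi /= np.linalg.norm(Phi, axis=0, keepdims=True)     # random directions on the sphere
            return np.unique(np.argmax(Phi.T @ Xn, axis=1))       # indices of sampled columns

    Because the chance that a direction lands near a given cluster depends on the spherical region the cluster covers rather than on how many points it contains, small but well-separated clusters still tend to be represented in the sketch.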