Search CORE

220,799 research outputs found

Efficient Triangle Counting in Large Graphs via Degree-based Vertex Partitioning

Author: A. Hajnal
A. Magen
C. Papadimitriou
D. Knuth
F. Chung
H. Chernoff
H. Jowhari
J. Feigenbaum
J.H. Kim
M. Latapy
N. Alon
O. Frank
S. Wasserman
T. Schank
T. Schank
V.H. Vu
W. Johnson
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

The number of triangles is a computationally expensive graph statistic which is frequently used in complex network analysis (e.g., transitivity ratio), in various random graph models (e.g., exponential random graph model) and in important real world applications such as spam detection, uncovering of the hidden thematic structure of the Web and link recommendation. Counting triangles in graphs with millions and billions of edges requires algorithms which run fast, use small amount of space, provide accurate estimates of the number of triangles and preferably are parallelizable. In this paper we present an efficient triangle counting algorithm which can be adapted to the semistreaming model. The key idea of our algorithm is to combine the sampling algorithm of Tsourakakis et al. and the partitioning of the set of vertices into a high degree and a low degree subset respectively as in the Alon, Yuster and Zwick work treating each set appropriately. We obtain a running time

O \left(m + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right)

and an

\epsilon

approximation (multiplicative error), where

n

is the number of vertices,

m

the number of edges and

\Delta

the maximum number of triangles an edge is contained. Furthermore, we show how this algorithm can be adapted to the semistreaming model with space usage

O\left(m^{1/2}\log{n} + \frac{m^{3/2} \Delta \log{n}}{t \epsilon^2} \right)

and a constant number of passes (three) over the graph stream. We apply our methods in various networks with several millions of edges and we obtain excellent results. Finally, we propose a random projection based method for triangle counting and provide a sufficient condition to obtain an estimate with low variance.Comment: 1) 12 pages 2) To appear in the 7th Workshop on Algorithms and Models for the Web Graph (WAW 2010

arXiv.org e-Print Archive

CiteSeerX

Crossref

Beyond Triangles: A Distributed Framework for Estimating 3-profiles of Large Graphs

Author: Borokhovich Michael
Dimakis Alexandros G.
Elenberg Ethan R.
Shanmugam Karthikeyan
Publication venue
Publication date: 22/06/2015
Field of study

We study the problem of approximating the

3

-profile of a large graph.

3

-profiles are generalizations of triangle counts that specify the number of times a small graph appears as an induced subgraph of a large graph. Our algorithm uses the novel concept of

3

-profile sparsifiers: sparse graphs that can be used to approximate the full

3

-profile counts for a given large graph. Further, we study the problem of estimating local and ego

3

-profiles, two graph quantities that characterize the local neighborhood of each vertex of a graph. Our algorithm is distributed and operates as a vertex program over the GraphLab PowerGraph framework. We introduce the concept of edge pivoting which allows us to collect

2

-hop information without maintaining an explicit

2

-hop neighborhood list at each vertex. This enables the computation of all the local

3

-profiles in parallel with minimal communication. We test out implementation in several experiments scaling up to

640

cores on Amazon EC2. We find that our algorithm can estimate the

3

-profile of a graph in approximately the same time as triangle counting. For the harder problem of ego

3

-profiles, we introduce an algorithm that can estimate profiles of hundreds of thousands of vertices in parallel, in the timescale of minutes.Comment: To appear in part at KDD'1

arXiv.org e-Print Archive

CiteSeerX

Crawling Facebook for Social Network Analysis Purposes

Author: Catanese Salvatore
De Meo Pasquale
Ferrara Emilio
Fiumara Giacomo
Provetti Alessandro
Publication venue: ACM
Publication date: 01/01/2011
Field of study

We describe our work in the collection and analysis of massive data describing the connections between participants to online social networks. Alternative approaches to social network data collection are defined and evaluated in practice, against the popular Facebook Web site. Thanks to our ad-hoc, privacy-compliant crawlers, two large samples, comprising millions of connections, have been collected; the data is anonymous and organized as an undirected graph. We describe a set of tools that we developed to analyze specific properties of such social-network graphs, i.e., among others, degree distribution, centrality measures, scaling laws and distribution of friendship.\u

arXiv.org e-Print Archive

CiteSeerX

Crossref

CogPrints Cognitive Sciences Eprint Archive

Extraction and Analysis of Facebook Friendship Relations

Author: Catanese Salvatore
De Meo Pasquale
Ferrara Emilio
Fiumara Giacomo
Provetti Alessandro
Publication venue: Springer Verlag, London
Publication date: 01/01/2011
Field of study

Online Social Networks (OSNs) are a unique Web and social phenomenon, affecting tastes and behaviors of their users and helping them to maintain/create friendships. It is interesting to analyze the growth and evolution of Online Social Networks both from the point of view of marketing and other of new services and from a scientific viewpoint, since their structure and evolution may share similarities with real-life social networks. In social sciences, several techniques for analyzing (online) social networks have been developed, to evaluate quantitative properties (e.g., defining metrics and measures of structural characteristics of the networks) or qualitative aspects (e.g., studying the attachment model for the network evolution, the binary trust relationships, and the link prediction problem).\ud However, OSN analysis poses novel challenges both to Computer and Social scientists. We present our long-term research effort in analyzing Facebook, the largest and arguably most successful OSN today: it gathers more than 500 million users. Access to data about Facebook users and their friendship relations, is restricted; thus, we acquired the necessary information directly from the front-end of the Web site, in order to reconstruct a sub-graph representing anonymous interconnections among a significant subset of users. We describe our ad-hoc, privacy-compliant crawler for Facebook data extraction. To minimize bias, we adopt two different graph mining techniques: breadth-first search (BFS) and rejection sampling. To analyze the structural properties of samples consisting of millions of nodes, we developed a specific tool for analyzing quantitative and qualitative properties of social networks, adopting and improving existing Social Network Analysis (SNA) techniques and algorithms

CogPrints Cognitive Sciences Eprint Archive

FS^3: A Sampling based method for top-k Frequent Subgraph Mining

Author: Hasan Mohammad Al
Saha Tanay Kumar
Publication venue
Publication date: 02/09/2014
Field of study

Mining labeled subgraph is a popular research task in data mining because of its potential application in many different scientific domains. All the existing methods for this task explicitly or implicitly solve the subgraph isomorphism task which is computationally expensive, so they suffer from the lack of scalability problem when the graphs in the input database are large. In this work, we propose FS^3, which is a sampling based method. It mines a small collection of subgraphs that are most frequent in the probabilistic sense. FS^3 performs a Markov Chain Monte Carlo (MCMC) sampling over the space of a fixed-size subgraphs such that the potentially frequent subgraphs are sampled more often. Besides, FS^3 is equipped with an innovative queue manager. It stores the sampled subgraph in a finite queue over the course of mining in such a manner that the top-k positions in the queue contain the most frequent subgraphs. Our experiments on database of large graphs show that FS^3 is efficient, and it obtains subgraphs that are the most frequent amongst the subgraphs of a given size

arXiv.org e-Print Archive

CiteSeerX

IUPUIScholarWorks