5,580 research outputs found
Similarity-Based Classification in Partially Labeled Networks
We propose a similarity-based method, using the similarity between nodes, to
address the problem of classification in partially labeled networks. The basic
assumption is that two nodes are more likely to be categorized into the same
class if they are more similar. In this paper, we introduce ten similarity
indices, including five local ones and five global ones. Empirical results on
the co-purchase network of political books show that the similarity-based
method can give high accurate classification even when the labeled nodes are
sparse which is one of the difficulties in classification. Furthermore, we find
that when the target network has many labeled nodes, the local indices can
perform as good as those global indices do, while when the data is sparce the
global indices perform better. Besides, the similarity-based method can to some
extent overcome the unconsistency problem which is another difficulty in
classification.Comment: 13 pages,3 figures,1 tabl
Graph Summarization
The continuous and rapid growth of highly interconnected datasets, which are
both voluminous and complex, calls for the development of adequate processing
and analytical techniques. One method for condensing and simplifying such
datasets is graph summarization. It denotes a series of application-specific
algorithms designed to transform graphs into more compact representations while
preserving structural patterns, query answers, or specific property
distributions. As this problem is common to several areas studying graph
topologies, different approaches, such as clustering, compression, sampling, or
influence detection, have been proposed, primarily based on statistical and
optimization methods. The focus of our chapter is to pinpoint the main graph
summarization methods, but especially to focus on the most recent approaches
and novel research trends on this topic, not yet covered by previous surveys.Comment: To appear in the Encyclopedia of Big Data Technologie
Uncovering nodes that spread information between communities in social networks
From many datasets gathered in online social networks, well defined community
structures have been observed. A large number of users participate in these
networks and the size of the resulting graphs poses computational challenges.
There is a particular demand in identifying the nodes responsible for
information flow between communities; for example, in temporal Twitter networks
edges between communities play a key role in propagating spikes of activity
when the connectivity between communities is sparse and few edges exist between
different clusters of nodes. The new algorithm proposed here is aimed at
revealing these key connections by measuring a node's vicinity to nodes of
another community. We look at the nodes which have edges in more than one
community and the locality of nodes around them which influence the information
received and broadcasted to them. The method relies on independent random walks
of a chosen fixed number of steps, originating from nodes with edges in more
than one community. For the large networks that we have in mind, existing
measures such as betweenness centrality are difficult to compute, even with
recent methods that approximate the large number of operations required. We
therefore design an algorithm that scales up to the demand of current big data
requirements and has the ability to harness parallel processing capabilities.
The new algorithm is illustrated on synthetic data, where results can be judged
carefully, and also on a real, large scale Twitter activity data, where new
insights can be gained
Mining Frequent Graph Patterns with Differential Privacy
Discovering frequent graph patterns in a graph database offers valuable
information in a variety of applications. However, if the graph dataset
contains sensitive data of individuals such as mobile phone-call graphs and
web-click graphs, releasing discovered frequent patterns may present a threat
to the privacy of individuals. {\em Differential privacy} has recently emerged
as the {\em de facto} standard for private data analysis due to its provable
privacy guarantee. In this paper we propose the first differentially private
algorithm for mining frequent graph patterns.
We first show that previous techniques on differentially private discovery of
frequent {\em itemsets} cannot apply in mining frequent graph patterns due to
the inherent complexity of handling structural information in graphs. We then
address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling
based algorithm. Unlike previous work on frequent itemset mining, our
techniques do not rely on the output of a non-private mining algorithm.
Instead, we observe that both frequent graph pattern mining and the guarantee
of differential privacy can be unified into an MCMC sampling framework. In
addition, we establish the privacy and utility guarantee of our algorithm and
propose an efficient neighboring pattern counting technique as well.
Experimental results show that the proposed algorithm is able to output
frequent patterns with good precision
- …