Fast k-means based on KNN Graph
In the era of big data, k-means clustering has been widely adopted as a basic
processing tool in various contexts. However, its computational cost can be
prohibitively high when the data size and the number of clusters are large. It is
well known that the processing bottleneck of k-means lies in the operation of
seeking the closest centroid in each iteration. In this paper, a novel solution
to the scalability issue of k-means is presented. In the proposal, k-means
is supported by an approximate k-nearest neighbors graph. In the k-means
iteration, each data sample is only compared to the clusters in which its nearest
neighbors reside. Since the number of nearest neighbors we consider is much
less than k, the processing cost in this step becomes minor and independent of
k. The processing bottleneck is therefore overcome. The most interesting thing
is that the k-nearest neighbor graph is constructed by iteratively calling the fast
k-means itself. Compared with existing fast k-means variants, the proposed
algorithm achieves a speed-up of hundreds to thousands of times while maintaining
high clustering quality. When tested on 10 million 512-dimensional data samples, it
takes only 5.2 hours to produce 1 million clusters. In contrast, to fulfill the
same scale of clustering, it would take 3 years for traditional k-means.
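The assignment step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the random initialization, and the brute-force exact neighbor graph (standing in for the approximate graph that the paper builds by calling the fast k-means itself) are all assumptions made for illustration. The point it shows is that each sample is compared only with its current cluster and the clusters of its graph neighbors, so the per-sample cost depends on the number of neighbors rather than on k.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_knn(points, n_neighbors):
    """Exact kNN graph by brute force (assumption: a stand-in for the
    approximate graph used in the abstract)."""
    graph = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: sq_dist(p, points[j]))
        graph.append(others[:n_neighbors])
    return graph


def centroids_of(points, labels, k, previous=None):
    """Mean of each cluster; an empty cluster keeps its previous centroid
    (or the origin if there is none)."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    result = []
    for c in range(k):
        if counts[c]:
            result.append([s / counts[c] for s in sums[c]])
        else:
            result.append(list(previous[c]) if previous else [0.0] * dim)
    return result


def graph_kmeans(points, k, graph, n_iter=10, seed=0):
    """k-means in which each sample only considers its own cluster and the
    clusters of its graph neighbors (hypothetical helper)."""
    rng = random.Random(seed)
    labels = [rng.randrange(k) for _ in points]  # assumed random start
    centroids = centroids_of(points, labels, k)
    for _ in range(n_iter):
        new_labels = []
        for i, p in enumerate(points):
            # Candidate clusters: own cluster plus the neighbors' clusters.
            candidates = {labels[i]} | {labels[j] for j in graph[i]}
            best = min(sorted(candidates),
                       key=lambda c: sq_dist(p, centroids[c]))
            new_labels.append(best)
        labels = new_labels
        centroids = centroids_of(points, labels, k, centroids)
    return labels, centroids


def distortion(points, labels, centroids):
    """Sum of squared distances from each sample to its assigned centroid."""
    return sum(sq_dist(p, centroids[c]) for p, c in zip(points, labels))
```

Because a sample's current cluster is always among its candidates, the restricted assignment step cannot increase the distortion, which is the same monotonicity argument that holds for standard k-means.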
Popularity versus Similarity in Growing Networks
Popularity is attractive -- this is the formula underlying preferential
attachment, a popular explanation for the emergence of scaling in growing
networks. If new connections are made preferentially to more popular nodes,
then the resulting distribution of the number of connections that nodes have
follows power laws observed in many real networks. Preferential attachment has
been directly validated for some real networks, including the Internet.
Preferential attachment can also be a consequence of different underlying
processes based on node fitness, ranking, optimization, random walks, or
duplication. Here we show that popularity is just one dimension of
attractiveness. Another dimension is similarity. We develop a framework where
new connections, instead of preferring popular nodes, optimize certain
trade-offs between popularity and similarity. The framework admits a geometric
interpretation, in which popularity preference emerges from local optimization.
As opposed to preferential attachment, the optimization framework accurately
describes large-scale evolution of technological (Internet), social (web of
trust), and biological (E. coli metabolic) networks, predicting the probability
of new links in them with remarkable precision. The developed framework can
thus be used for predicting new links in evolving networks, and provides a
different perspective on preferential attachment as an emergent phenomenon.
Extraction and Analysis of Facebook Friendship Relations
Online Social Networks (OSNs) are a unique Web and social phenomenon, affecting tastes and behaviors of their users and helping them to maintain/create friendships. It is interesting to analyze the growth and evolution of Online Social Networks both from the point of view of marketing and new services and from a scientific viewpoint, since their structure and evolution may share similarities with real-life social networks. In social sciences, several techniques for analyzing (online) social networks have been developed, to evaluate quantitative properties (e.g., defining metrics and measures of structural characteristics of the networks) or qualitative aspects (e.g., studying the attachment model for the network evolution, the binary trust relationships, and the link prediction problem).
However, OSN analysis poses novel challenges both to Computer and Social scientists. We present our long-term research effort in analyzing Facebook, the largest and arguably most successful OSN today: it gathers more than 500 million users. Access to data about Facebook users and their friendship relations is restricted; thus, we acquired the necessary information directly from the front-end of the Web site, in order to reconstruct a sub-graph representing anonymous interconnections among a significant subset of users. We describe our ad-hoc, privacy-compliant crawler for Facebook data extraction. To minimize bias, we adopt two different graph mining techniques: breadth-first search (BFS) and rejection sampling. To analyze the structural properties of samples consisting of millions of nodes, we developed a specific tool for analyzing quantitative and qualitative properties of social networks, adopting and improving existing Social Network Analysis (SNA) techniques and algorithms.
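As a rough illustration of the breadth-first sampling mentioned in this abstract, the sketch below collects a sub-graph from an in-memory adjacency mapping. It is not the authors' crawler: the function name, the `max_nodes` budget, and the dictionary standing in for friend lists fetched from the site are assumptions, and the rejection-sampling technique is not covered.

```python
from collections import deque


def bfs_sample(neighbors, seed, max_nodes):
    """Breadth-first sample of at most max_nodes nodes starting from seed.

    neighbors: dict mapping a node to an iterable of adjacent nodes
               (assumption: stands in for friend lists read by a crawler).
    Returns the visit order and the set of undirected edges whose two
    endpoints both ended up in the sample.
    """
    visited = {seed}
    order = [seed]
    queue = deque([seed])
    edges = set()
    while queue:
        u = queue.popleft()
        for v in neighbors.get(u, ()):
            if v not in visited:
                if len(visited) >= max_nodes:
                    continue  # node budget exhausted: skip unseen nodes
                visited.add(v)
                order.append(v)
                queue.append(v)
            # Store each undirected edge once, with endpoints ordered.
            edges.add((min(u, v), max(u, v)))
    return order, edges


# Toy friendship graph with hypothetical anonymous user identifiers.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a", "d"],
    "d": ["b", "c", "e"],
    "e": ["d"],
}
order, edges = bfs_sample(graph, "a", max_nodes=4)
```

With a budget of four nodes, the sample stops before reaching "e", so only edges among "a", "b", "c" and "d" are kept.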
The role of human factors in stereotyping behavior and perception of digital library users: A robust clustering approach
To deliver effective personalization for digital library users, it is necessary to identify which human factors are most relevant in determining the behavior and perception of these users. This paper examines three key human factors (cognitive styles, levels of expertise and gender differences) and utilizes three individual clustering techniques (k-means, hierarchical clustering and fuzzy clustering) to understand user behavior and perception. Moreover, robust clustering, capable of correcting the bias of individual clustering techniques, is used to obtain a deeper understanding. The robust clustering approach produced results that highlighted the relevance of cognitive style for user behavior, i.e., cognitive style dominates and justifies each of the robust clusters created. We also found that perception was mainly determined by the level of expertise of a user. We conclude that robust clustering is an effective technique to analyze user behavior and perception.
Complex Networks
An outline of recent work on complex networks is given from the point of view
of a physicist. Motivation, achievements and goals are discussed with some of
the typical applications from a wide range of academic fields. An introduction
to the relevant literature and useful resources is also given. Comment: Review
for Contemporary Physics, 31 pages