Distribution of PageRank Mass Among Principle Components of the Web
We study the PageRank mass of principal components in a bow-tie Web Graph, as
a function of the damping factor c. Using a singular perturbation approach, we
show that the PageRank share of IN and SCC components remains high even for
very large values of the damping factor, in spite of the fact that it drops to
zero when c goes to one. However, a detailed study of the OUT component reveals
the presence of "dead-ends" (small groups of pages linking only to each other)
that receive an unfairly high ranking when c is close to one. We argue that
this problem can be mitigated by choosing c as small as 1/2.
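The effect described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical six-page toy graph (not taken from the paper) with one IN page, a three-page SCC, and a two-page "dead-end" group in OUT whose pages link only to each other. The function names `pagerank` and `dead_end_share` are illustrative, and the code uses plain power iteration with uniform teleportation rather than the singular perturbation analysis of the paper.

```python
# Toy bow-tie graph (an assumption made for illustration only).
# Page 5 is IN, pages 0-2 form the SCC, pages 3-4 are a "dead-end"
# group in OUT: they link only to each other.
WEB = [
    [1],     # 0 -> 1
    [2],     # 1 -> 2
    [0, 3],  # 2 -> 0 (closes the SCC cycle) and 2 -> 3 (into OUT)
    [4],     # 3 -> 4
    [3],     # 4 -> 3
    [0],     # 5 -> 0
]


def pagerank(out_links, c, tol=1e-12, max_iter=10000):
    """Power iteration for PageRank with damping factor c.

    Assumes every page has at least one out-link, so no special
    handling of dangling pages is needed.
    """
    n = len(out_links)
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Teleportation contributes (1 - c) / n to every page.
        new_rank = [(1.0 - c) / n] * n
        # Each page passes a fraction c of its rank, split evenly
        # among the pages it links to.
        for page, targets in enumerate(out_links):
            share = c * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share
        if sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol:
            return new_rank
        rank = new_rank
    return rank


def dead_end_share(c):
    """Total PageRank mass held by the dead-end group (pages 3 and 4)."""
    rank = pagerank(WEB, c)
    return rank[3] + rank[4]


if __name__ == "__main__":
    for c in (0.5, 0.85, 0.99):
        print(f"c = {c}: dead-end share = {dead_end_share(c):.3f}")
```

On this toy graph the share held by the two dead-end pages grows as c increases, because a walk that enters the group can leave it only by teleporting.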
Random Surfing Without Teleportation
In the standard Random Surfer Model, the teleportation matrix is necessary to
ensure that the final PageRank vector is well-defined. The introduction of this
matrix, however, results in serious problems and imposes fundamental
limitations to the quality of the ranking vectors. In this work, building on
the recently proposed NCDawareRank framework, we exploit the decomposition of
the underlying space into blocks, and we derive easy to check necessary and
sufficient conditions for random surfing without teleportation.
Comment: 13 pages. Published in the Volume: "Algorithms, Probability, Networks
and Games, Springer-Verlag, 2015". (The updated version corrects small
typos/errors)
LexRank: Graph-based Lexical Centrality as Salience in Text Summarization
We introduce a stochastic graph-based method for computing relative
importance of textual units for Natural Language Processing. We test the
technique on the problem of Text Summarization (TS). Extractive TS relies on
the concept of sentence salience to identify the most important sentences in a
document or set of documents. Salience is typically defined in terms of the
presence of particular important words or in terms of similarity to a centroid
pseudo-sentence. We consider a new approach, LexRank, for computing sentence
importance based on the concept of eigenvector centrality in a graph
representation of sentences. In this model, a connectivity matrix based on
intra-sentence cosine similarity is used as the adjacency matrix of the graph
representation of sentences. Our system, based on LexRank, ranked in first place
in more than one task in the recent DUC 2004 evaluation. In this paper we
present a detailed analysis of our approach and apply it to a larger data set
including data from earlier DUC evaluations. We discuss several methods to
compute centrality using the similarity graph. The results show that
degree-based methods (including LexRank) outperform both centroid-based methods
and other systems participating in DUC in most of the cases. Furthermore, the
LexRank with threshold method outperforms the other degree-based techniques
including continuous LexRank. We also show that our approach is quite
insensitive to the noise in the data that may result from an imperfect topical
clustering of documents.
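The idea of thresholded, similarity-based sentence centrality can be sketched as follows. This is a simplified illustration, not the authors' system. It assumes plain term-frequency cosine similarity on whitespace tokens, a self-loop on every sentence, and a PageRank-style damping term; the function names, the threshold of 0.2, the damping value of 0.85, and the three example sentences are all hypothetical choices.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def lexrank(sentences, threshold=0.2, damping=0.85, iterations=100):
    """Score sentences by centrality in a thresholded similarity graph."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Unweighted adjacency: an edge wherever similarity meets the threshold.
    adj = [
        [1.0 if cosine(bags[i], bags[j]) >= threshold else 0.0
         for j in range(n)]
        for i in range(n)
    ]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop, so every row has nonzero degree
    degree = [sum(row) for row in adj]

    # Power iteration on the row-normalized graph with damping.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = [(1.0 - damping) / n] * n
        for i in range(n):
            for j in range(n):
                new_scores[j] += damping * scores[i] * adj[i][j] / degree[i]
        scores = new_scores
    return scores


if __name__ == "__main__":
    docs = [
        "graph centrality comes from network science",
        "graph centrality ranks sentences for summaries",
        "extractive summaries pick salient sentences",
    ]
    for score, sentence in sorted(zip(lexrank(docs), docs), reverse=True):
        print(f"{score:.3f}  {sentence}")
```

In this example the middle sentence shares words with both of the others, which share none with each other, so it receives the highest score. An extractive summarizer built on this sketch would pick the top-scoring sentences.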
Clustering and Community Detection in Directed Networks: A Survey
Networks (or graphs) appear as dominant structures in diverse domains,
including sociology, biology, neuroscience and computer science. In most of the
aforementioned cases graphs are directed - in the sense that there is
directionality on the edges, making the semantics of the edges non-symmetric.
An interesting feature that real networks present is the clustering or
community structure property, under which the graph topology is organized into
modules commonly called communities or clusters. The essence here is that nodes
of the same community are highly similar while on the contrary, nodes across
communities present low similarity. Revealing the underlying community
structure of directed complex networks has become a crucial and
interdisciplinary topic with a plethora of applications. Therefore, naturally
there is a recent wealth of research production in the area of mining directed
graphs - with clustering being the primary method and tool for community
detection and evaluation. The goal of this paper is to offer an in-depth review
of the methods presented so far for clustering directed networks along with the
relevant necessary methodological background and also related applications. The
survey commences by offering a concise review of the fundamental concepts and
methodological base on which graph clustering algorithms capitalize. Then we
present the relevant work along two orthogonal classifications. The first one
is mostly concerned with the methodological principles of the clustering
algorithms, while the second one approaches the methods from the viewpoint
regarding the properties of a good cluster in a directed network. Further, we
present methods and metrics for evaluating graph clustering results,
demonstrate interesting application domains and provide promising future
research directions.
Comment: 86 pages, 17 figures. Physics Reports Journal (To Appear)
Complex Beauty
Complex systems and their underlying convoluted networks are ubiquitous; all
we need is an eye for them. They pose problems of organized complexity which
cannot be approached with a reductionist method. Complexity science and its
emergent sister network science both come to grips with the inherent complexity
of complex systems with a holistic strategy. The relevance of complexity,
however, transcends the sciences. Complex systems and networks are the focal
point of a philosophical, cultural and artistic turn of our tightly
interrelated and interdependent postmodern society. Here I take a different,
aesthetic perspective on complexity. I argue that complex systems can be
beautiful and can be the object of artification - the neologism refers to
processes in which something that is not regarded as art in the traditional
sense of the word is changed into art. Complex systems and networks are
powerful sources of inspiration for the generative designer, for the artful
data visualizer, as well as for the traditional artist. I finally discuss the
benefits of a cross-fertilization between science and art.