DEMON: a Local-First Discovery Method for Overlapping Communities
Community discovery in complex networks is an interesting problem with a
number of applications, especially in the knowledge extraction task in social
and information networks. However, many large networks often lack a particular
community organization at a global level. In these cases, traditional graph
partitioning algorithms fail to let the latent knowledge embedded in modular
structure emerge, because they impose a top-down global view of a network. We
propose here a simple local-first approach to community discovery, able to
unveil the modular organization of real complex networks. This is achieved by
democratically letting each node vote for the communities it sees surrounding
it in its limited view of the global system, i.e. its ego neighborhood, using a
label propagation algorithm; finally, the local communities are merged into a
global collection. We tested this intuition against the state-of-the-art
overlapping and non-overlapping community discovery methods, and found that our
new method clearly outperforms the others in the quality of the obtained
communities, evaluated by using the extracted communities to predict the
metadata about the nodes of several real world networks. We also show how our
method is deterministic, fully incremental, and has a limited time complexity,
so that it can be used on web-scale real networks.
Comment: 9 pages; Proceedings of the 18th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, Beijing, China, August 12-16, 201
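The local-first procedure described in the abstract above (take each node's ego neighborhood, run label propagation on it, then merge the local communities into a global collection) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the deterministic tie-breaking rule, the overlap-based merge test, and the `epsilon` and `min_size` parameters are all assumptions made for illustration.

```python
from collections import Counter


def label_propagation(nodes, adj, max_iters=100):
    """Label propagation on the subgraph induced by `nodes`.

    Assumption: nodes are visited in sorted order and ties go to the
    smallest label, which makes this sketch deterministic.
    """
    order = sorted(nodes)
    inside = set(order)
    labels = {n: n for n in order}
    for _ in range(max_iters):
        changed = False
        for n in order:
            counts = Counter(labels[m] for m in adj[n] if m in inside)
            if not counts:
                continue  # no neighbours inside the subgraph
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            new = labels[n] if labels[n] in candidates else min(candidates)
            if new != labels[n]:
                labels[n] = new
                changed = True
        if not changed:
            break
    groups = {}
    for n, lab in labels.items():
        groups.setdefault(lab, set()).add(n)
    return list(groups.values())


def merge(communities, candidate, epsilon):
    """Fold `candidate` into the first community it overlaps enough with.

    Hypothetical rule: merge when the shared nodes cover at least
    `epsilon` of the smaller of the two sets.
    """
    for i, existing in enumerate(communities):
        smaller = min(len(existing), len(candidate))
        if len(existing & candidate) / smaller >= epsilon:
            communities[i] = existing | candidate
            return
    communities.append(candidate)


def demon_sketch(adj, epsilon=0.5, min_size=3):
    """Local-first overlapping community discovery (illustrative only)."""
    communities = []
    for ego in sorted(adj):
        # Ego neighbourhood with the ego itself removed.
        neighbours = set(adj[ego]) - {ego}
        for local in label_propagation(neighbours, adj):
            # The ego joins every community it sees around itself.
            merge(communities, local | {ego}, epsilon)
    return [c for c in communities if len(c) >= min_size]


# Two triangles sharing node 2.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3}}
print(demon_sketch(graph))
```

On this toy graph the shared node ends up in both communities, which is the overlapping behaviour the abstract refers to.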
Flow-based Influence Graph Visual Summarization
Visually mining a large influence graph is appealing yet challenging. People
are amazed by pictures of newscasting graphs on Twitter and engaged by hidden
citation networks in academia, yet are often troubled by the poor
readability of the underlying visualization. Existing summarization methods
enhance the graph visualization with blocked views, but have an adverse effect on
the latent influence structure. How can we visually summarize a large graph to
maximize influence flows? In particular, how can we illustrate the impact of an
individual node through the summarization? Can we maintain the appealing graph
metaphor while preserving both the overall influence pattern and fine
readability?
To answer these questions, we first formally define the influence graph
summarization problem. Second, we propose an end-to-end framework to solve the
new problem. Our method not only highlights the flow-based influence
patterns in the visual summarization, but also inherently supports rich graph
attributes. Last, we present a theoretical analysis and report our experimental
results. Both demonstrate that our framework can effectively
approximate the proposed influence graph summarization objective while
outperforming previous methods in a typical scenario of visually mining
academic citation networks.
Comment: to appear in IEEE International Conference on Data Mining (ICDM),
Shen Zhen, China, December 201
Data Management and Mining in Astrophysical Databases
We analyse the issues involved in the management and mining of astrophysical
data. The traditional approach to data management in the astrophysical field is
not able to keep up with the increasing size of the data gathered by modern
detectors. An essential role in astrophysical research will be played by
automatic tools for information extraction from large datasets, i.e. data
mining techniques, such as clustering and classification algorithms. This calls
for an approach to data management based on data warehousing, emphasizing the
efficiency and simplicity of data access; efficiency is obtained using
multidimensional access methods and simplicity is achieved by properly handling
metadata. Clustering and classification techniques, on large datasets, pose
additional requirements: computational and memory scalability with respect to
the data size, interpretability and objectivity of clustering or classification
results. In this study we address some possible solutions.
Comment: 10 pages, Late
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, the relationship
between different cluster quality metrics is not precisely understood. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
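The three stand-alone quality metrics named in the abstract above (modularity, conductance, and coverage) can be computed from an edge list and a cluster assignment. The sketch below uses common textbook definitions for an undirected, unweighted graph without self-loops; the exact variants used in the paper may differ, and the function name `quality_metrics` and its argument layout are assumptions made for illustration.

```python
def quality_metrics(edges, membership):
    """Return (modularity, coverage, per-cluster conductance).

    edges:      list of (u, v) pairs, each undirected edge listed once
    membership: dict mapping every node to its cluster label
    """
    m = len(edges)
    degree = {}
    intra = {}  # cluster -> number of edges with both ends inside it
    cut = {}    # cluster -> number of edges leaving it
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        cu, cv = membership[u], membership[v]
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
        else:
            cut[cu] = cut.get(cu, 0) + 1
            cut[cv] = cut.get(cv, 0) + 1

    # Volume of a cluster: sum of the degrees of its nodes.
    volume = {}
    for node, d in degree.items():
        c = membership[node]
        volume[c] = volume.get(c, 0) + d

    # Coverage: fraction of edges that fall inside some cluster.
    coverage = sum(intra.values()) / m

    # Modularity: Q = sum_c [ L_c / m - (vol_c / 2m)^2 ].
    modularity = sum(
        intra.get(c, 0) / m - (vol / (2 * m)) ** 2
        for c, vol in volume.items()
    )

    # Conductance of cluster c: cut(c) / min(vol(c), vol(rest)).
    conductance = {
        c: cut.get(c, 0) / min(vol, 2 * m - vol)
        for c, vol in volume.items()
        if 0 < vol < 2 * m
    }
    return modularity, coverage, conductance


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(quality_metrics(edges, membership))
```

Lower conductance indicates a better-separated cluster, whereas higher modularity and coverage indicate a better clustering, so the three values are not directly comparable without a convention such as reporting one minus conductance.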
Perspectives in astrophysical databases
Astrophysics has become a domain extremely rich in scientific data. Data
mining tools are needed for information extraction from such large datasets.
This calls for an approach to data management emphasizing the efficiency and
simplicity of data access; efficiency is obtained using multidimensional access
methods and simplicity is achieved by properly handling metadata. Moreover,
clustering and classification techniques on large datasets pose additional
requirements in terms of computation and memory scalability and
interpretability of results. In this study we review some possible solutions