Community Detection in Networks with Node Attributes
Community detection algorithms are fundamental tools that allow us to uncover
organizational principles in networks. When detecting communities, there are
two possible sources of information one can use: the network structure, and the
features and attributes of nodes. Even though communities form around nodes
that have common edges and common attributes, typically, algorithms have only
focused on one of these two data modalities: community detection algorithms
traditionally focus only on the network structure, while clustering algorithms
mostly consider only node attributes. In this paper, we develop Communities
from Edge Structure and Node Attributes (CESNA), an accurate and scalable
algorithm for detecting overlapping communities in networks with node
attributes. CESNA statistically models the interaction between the network
structure and the node attributes, which leads to more accurate community
detection as well as improved robustness in the presence of noise in the
network structure. CESNA has a linear runtime in the network size and is able
to process networks an order of magnitude larger than comparable approaches.
Last, CESNA also helps with the interpretation of detected communities by
finding relevant node attributes for each community.
Comment: Published in the proceedings of IEEE ICDM '1
DEMON: a Local-First Discovery Method for Overlapping Communities
Community discovery in complex networks is an interesting problem with a
number of applications, especially in the knowledge extraction task in social
and information networks. However, many large networks often lack a particular
community organization at a global level. In these cases, traditional graph
partitioning algorithms fail to let the latent knowledge embedded in modular
structure emerge, because they impose a top-down global view of a network. We
propose here a simple local-first approach to community discovery, able to
unveil the modular organization of real complex networks. This is achieved by
democratically letting each node vote for the communities it sees surrounding
it in its limited view of the global system, i.e. its ego neighborhood, using a
label propagation algorithm; finally, the local communities are merged into a
global collection. We tested this intuition against the state-of-the-art
overlapping and non-overlapping community discovery methods, and found that our
new method clearly outperforms the others in the quality of the obtained
communities, evaluated by using the extracted communities to predict the
metadata about the nodes of several real world networks. We also show how our
method is deterministic, fully incremental, and has a limited time complexity,
so that it can be used on web-scale real networks.
Comment: 9 pages; Proceedings of the 18th ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, Beijing, China, August 12-16, 201
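The local-first procedure described in the abstract above (ego neighborhood, label propagation, merge) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the tie-breaking rule, the iteration cap, and the `min_overlap` merge threshold are all assumptions made for illustration, and the graph is assumed to be a dict mapping each node to a set of neighbors.

```python
from collections import Counter

def label_propagation(adj, nodes, max_rounds=100):
    """Label propagation on the subgraph induced by `nodes`.
    Ties are broken deterministically (keep current label, else smallest)."""
    nodes = set(nodes)
    order = sorted(nodes)
    labels = {n: n for n in order}
    for _ in range(max_rounds):  # cap is an assumption to guarantee termination
        changed = False
        for n in order:
            counts = Counter(labels[m] for m in adj[n] if m in nodes)
            if not counts:
                continue  # isolated inside the ego neighborhood
            best = max(counts.values())
            if counts.get(labels[n], 0) == best:
                continue  # current label is already a majority label
            labels[n] = min(l for l, c in counts.items() if c == best)
            changed = True
        if not changed:
            break
    groups = {}
    for n, l in labels.items():
        groups.setdefault(l, set()).add(n)
    return list(groups.values())

def merge_into(communities, new, min_overlap):
    """Merge `new` into the first community that covers at least
    `min_overlap` of the smaller set (hypothetical merge rule)."""
    for i, c in enumerate(communities):
        if len(c & new) >= min_overlap * min(len(c), len(new)):
            communities[i] = c | new
            return
    communities.append(new)

def local_first_communities(adj, min_overlap=1.0):
    """Each node 'votes' by clustering its ego neighborhood (ego removed),
    then rejoins every local community it found."""
    communities = []
    for ego in sorted(adj):
        for local in label_propagation(adj, adj[ego]):
            merge_into(communities, local | {ego}, min_overlap)
    return communities

# Two triangles sharing node 2: node 2 should end up in both communities.
example_graph = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3, 4}, 3: {2, 4}, 4: {2, 3},
}
example_result = local_first_communities(example_graph)
```

Removing the ego before clustering is what lets a hub such as node 2 see its neighbors split into separate groups, which is how overlapping membership arises in this sketch.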
Scalable SD Erlang Computation Model
This technical report presents the implementation of s_groups and semi-explicit placement in Scalable Distributed (SD) Erlang. The implementation is based on Erlang/OTP 17.4. The source code is available at https://github.com/release-project/otp/tree/17.4-rebased. We start with a discussion of the differences between distributed Erlang global groups and SD Erlang s_groups (Chapter 1). Then we discuss the implementation of s_groups and the features of the sixteen functions that were modified or introduced in the global and s_group modules (Chapter 2). After that we discuss semi-explicit placement, node attributes, and the choose_node/1 function (Chapter 3). These functions were unit tested (Chapter 4). Finally, we discuss future work (Chapter 5).
Query Modification in Object-oriented Database Federation
We discuss the modification of queries against an integrated view in a federation of object-oriented databases. We present a generalisation of existing algorithms for simple global query processing that works for arbitrarily defined integration classes. We then extend this algorithm to deal with object-oriented features such as queries involving path expressions and nesting. We show how properties of the OO style of modelling relationships through object references can be exploited to reduce the number of subqueries necessary to evaluate such queries.
On bicluster aggregation and its benefits for enumerative solutions
Biclustering involves the simultaneous clustering of objects and their
attributes, thus defining local two-way clustering models. Recently, efficient
algorithms were conceived to enumerate all biclusters in real-valued datasets.
In this case, the solution composes a complete set of maximal and non-redundant
biclusters. However, the ability to enumerate biclusters revealed a challenging
scenario: in noisy datasets, each true bicluster may become highly fragmented,
with a high degree of overlap among the fragments. This prevents a direct
analysis of the obtained results. To revert the fragmentation, we propose here two approaches
for properly aggregating the whole set of enumerated biclusters: one based on
single linkage and the other directly exploring the rate of overlapping. Both
proposals were compared with each other and with the actual state-of-the-art in
several experiments, and they not only significantly reduced the number of
biclusters but also consistently increased the quality of the solution.
Comment: 15 pages, will be published by Springer Verlag in the LNAI Series in
the book Advances in Data Mining
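As a rough illustration of aggregating fragmented biclusters by single linkage, the sketch below links any two biclusters whose overlap is high enough and merges each linked group into one bicluster. The abstract does not specify the similarity measure or threshold, so the cell-based `overlap_rate`, the default `threshold`, and all function names here are assumptions, not the authors' method.

```python
def overlap_rate(a, b):
    """Shared cells divided by the area of the smaller bicluster.
    A bicluster is a (rows, cols) pair of sets. Assumed measure."""
    (rows_a, cols_a), (rows_b, cols_b) = a, b
    shared = len(rows_a & rows_b) * len(cols_a & cols_b)
    smaller = min(len(rows_a) * len(cols_a), len(rows_b) * len(cols_b))
    return shared / smaller if smaller else 0.0

def aggregate_single_linkage(biclusters, threshold=0.5):
    """Single linkage: two biclusters end up in the same aggregate if a
    chain of pairwise overlaps >= threshold connects them."""
    parent = list(range(len(biclusters)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(biclusters)):
        for j in range(i + 1, len(biclusters)):
            if overlap_rate(biclusters[i], biclusters[j]) >= threshold:
                parent[find(i)] = find(j)

    merged = {}
    for i, (rows, cols) in enumerate(biclusters):
        agg_rows, agg_cols = merged.setdefault(find(i), (set(), set()))
        agg_rows |= rows
        agg_cols |= cols
    return [(frozenset(r), frozenset(c)) for r, c in merged.values()]

# Two overlapping fragments of one bicluster plus an unrelated one.
fragments = [
    ({0, 1, 2}, {0, 1}),
    ({1, 2, 3}, {0, 1}),
    ({7, 8}, {5, 6}),
]
aggregated = aggregate_single_linkage(fragments)
```

Taking the union of rows and columns is the simplest way to form an aggregate; it can include cells that no fragment covered, so a stricter rule may be preferable on very noisy data.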
Ordered community structure in networks
Community structure in networks is often a consequence of homophily, or
assortative mixing, based on some attribute of the vertices. For example,
researchers may be grouped into communities corresponding to their research
topic. This is possible if vertex attributes have discrete values, but many
networks exhibit assortative mixing by some continuous-valued attribute, such
as age or geographical location. In such cases, no discrete communities can be
identified. We consider how the notion of community structure can be
generalized to networks that are based on continuous-valued attributes: in
general, a network may contain discrete communities which are ordered according
to their attribute values. We propose a method of generating synthetic ordered
networks and investigate the effect of ordered community structure on the
spread of infectious diseases. We also show that community detection algorithms
fail to recover community structure in ordered networks, and evaluate an
alternative method using a layout algorithm to recover the ordering.
Comment: This is an extended preprint version that includes an extra example:
the college football network as an ordered (spatial) network. Further
improvements, not included here, appear in the journal version. Original
title changed (from "Ordered and continuous community structure in networks")
to match journal version