Community characterization of heterogeneous complex systems
We introduce an analytical statistical method to characterize the communities
detected in heterogeneous complex systems. By posing a suitable null
hypothesis, our method makes use of the hypergeometric distribution to assess
the probability that a given property is over-expressed in the elements of a
community with respect to all the elements of the investigated set. We apply
our method to two specific complex networks, namely a network of world movies
and a network of physics preprints. The characterization of the elements and of
the communities is done in terms of languages and countries for the movie
network and of journals and subject categories for papers. We find that our
method is able to characterize the identified communities clearly. Moreover, our
method works well for both large and small communities.
Comment: 8 pages, 1 figure and 2 tables
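The over-expression test described in this abstract can be illustrated with a minimal sketch. It assumes the usual one-sided hypergeometric tail: given N elements in the whole set, K of which carry a property, and a community of n elements containing k with that property, the p-value is the probability of observing k or more by chance. The function name and arguments are hypothetical and are not taken from the paper. In practice a multiple-comparison correction would also be needed when many properties and communities are tested.

```python
from math import comb


def overexpression_pvalue(N, K, n, k):
    """One-sided hypergeometric tail P(X >= k).

    N: total number of elements in the investigated set
    K: number of elements in the set that have the property
    n: number of elements in the community
    k: number of community elements that have the property
    (All names are illustrative assumptions.)
    """
    upper = min(n, K)  # k cannot exceed the community size or K
    total = comb(N, n)  # number of ways to draw the community
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 1000 movies, 50 of them in a given language,
    # and a community of 20 movies of which 10 are in that language.
    print(overexpression_pvalue(1000, 50, 20, 10))
```

A small p-value suggests the property is over-expressed in the community relative to the whole set.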
Identification of clusters of investors from their real trading activity in a financial market
We use statistically validated networks, a recently introduced method to
validate links in a bipartite system, to identify clusters of investors trading
in a financial market. Specifically, we investigate a special database that
allows us to track the trading activity of individual investors in Nokia stock. We
find that many statistically detected clusters of investors show a very high
degree of synchronization in the time when they decide to trade and in the
trading action taken. We investigate the composition of these clusters and we
find that several of them show an over-expression of specific categories of
investors.
Comment: 25 pages, 5 figures
Statistically validated networks in bipartite complex systems
Many complex systems present an intrinsic bipartite nature and are often
described and modeled in terms of networks [1-5]. Examples include movies and
actors [1, 2, 4], authors and scientific papers [6-9], email accounts and
emails [10], plants and animals that pollinate them [11, 12]. Bipartite
networks are often very heterogeneous in the number of relationships that the
elements of one set establish with the elements of the other set. When one
constructs a projected network with nodes from only one set, the system
heterogeneity makes it very difficult to identify preferential links between
the elements. Here we introduce an unsupervised method to statistically
validate each link of the projected network against a null hypothesis taking
into account the heterogeneity of the system. We apply our method to three
different systems, namely the set of clusters of orthologous genes (COG) in
completely sequenced genomes [13, 14], a set of daily returns of 500 US
financial stocks, and the set of world movies of the IMDb database [15]. In all
these systems, which differ in both size and level of heterogeneity, we find that
our method is able to detect network structures which are informative about the
system and are not simply an expression of its heterogeneity. Specifically, our
method (i) identifies the preferential relationships between the elements, (ii)
naturally highlights the clustered structure of investigated systems, and (iii)
allows links to be classified according to the type of statistically validated
relationships between the connected nodes.
Comment: Main text: 13 pages, 3 figures, and 1 table. Supplementary
information: 15 pages, 3 figures, and 2 tables
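As an illustration of validating a link of the projected network, here is a minimal sketch. It rests on two assumptions that this abstract does not state: that the null hypothesis is a hypergeometric one (as in the community-characterization abstract above), and that a Bonferroni correction sets the significance threshold. All function and variable names are hypothetical.

```python
from itertools import combinations
from math import comb


def cooccurrence_pvalue(N, n_a, n_b, n_ab):
    """P(X >= n_ab) when elements a and b, with n_a and n_b neighbors among
    N elements of the other set, choose their neighbors independently."""
    upper = min(n_a, n_b)
    tail = sum(comb(n_a, x) * comb(N - n_a, n_b - x)
               for x in range(n_ab, upper + 1))
    return tail / comb(N, n_b)


def validated_links(neighbors, N, alpha=0.01):
    """Return (a, b, p) for pairs whose co-occurrence passes the threshold.

    neighbors: dict mapping each element of one set to the set of its
               neighbors in the other set (illustrative data structure)
    N:         size of the other set
    alpha:     significance level before the (assumed) Bonferroni correction
    """
    pairs = list(combinations(sorted(neighbors), 2))
    if not pairs:
        return []
    threshold = alpha / len(pairs)  # Bonferroni: one test per candidate pair
    links = []
    for a, b in pairs:
        shared = len(neighbors[a] & neighbors[b])
        if shared == 0:
            continue  # no link in the projected network
        p = cooccurrence_pvalue(N, len(neighbors[a]), len(neighbors[b]), shared)
        if p < threshold:
            links.append((a, b, p))
    return links
```

Because the p-value depends on the degrees n_a and n_b, two highly connected elements need many shared neighbors before their link is validated, which is how a test of this kind accounts for heterogeneity.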