Generating, Visualizing and Evaluating High Quality Clusters for Information Organization
We present and analyze the star clustering algorithm. We discuss an implementation of this algorithm that supports browsing and document retrieval through information organization. We define three parameters for evaluating a clustering algorithm to measure the topic separation and topic aggregation achieved by the algorithm. In the absence of benchmarks, we present a method for randomly generating clustering data. Data from our user study shows evidence that the star algorithm is effective for organizing information.
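The abstract does not restate the algorithm itself. As it is usually described, the offline star algorithm builds a similarity graph whose edges link documents with similarity at least σ, then repeatedly promotes the highest-degree uncovered vertex to a star center and attaches its neighbours as satellites. The Python sketch below follows that description under our own assumptions: documents are sparse term-weight dicts, similarity is cosine, and `cosine`, `star_clusters` and the toy documents are illustrative names and data, not the authors' code.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity of two sparse term-weight vectors (dicts)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = sqrt(sum(w * w for w in u.values()))
    nv = sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def star_clusters(docs, sigma):
    """Offline star clustering sketch: returns (center, members) pairs.

    Satellites adjacent to several centers appear in several clusters,
    so the clusters may overlap.
    """
    n = len(docs)
    # Thresholded similarity graph G_sigma: link i and j iff sim >= sigma.
    adj = {i: set() for i in range(n)}
    for i, j in combinations(range(n), 2):
        if cosine(docs[i], docs[j]) >= sigma:
            adj[i].add(j)
            adj[j].add(i)
    # Visit vertices by decreasing degree (ties broken by index).
    order = sorted(range(n), key=lambda i: (-len(adj[i]), i))
    marked, clusters = set(), []
    for v in order:
        if v in marked:
            continue
        # An unmarked vertex becomes a star center; its neighbours are satellites.
        clusters.append((v, {v} | adj[v]))
        marked |= {v} | adj[v]
    return clusters


docs = [
    {"star": 1, "cluster": 1},
    {"star": 1, "cluster": 1, "graph": 1},
    {"labor": 1, "wage": 1},
    {"labor": 1, "market": 1},
]
print(star_clusters(docs, sigma=0.4))  # [(0, {0, 1}), (2, {2, 3})]
```

Because each cluster keeps all of its center's neighbours, a document adjacent to two centers ends up in both clusters; overlapping clusters are an intended property of the star cover.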
Variational Inference for Stochastic Block Models from Sampled Data
This paper deals with non-observed dyads during the sampling of a network and
the resulting issues in the inference of the Stochastic Block Model (SBM). We
review sampling designs and recover Missing At Random (MAR) and Not Missing At
Random (NMAR) conditions for the SBM. We introduce variants of the variational
EM algorithm for inferring the SBM under various sampling designs (MAR and
NMAR), all available as an R package. Model selection criteria based on the
Integrated Classification Likelihood are derived for selecting both the number
of blocks and the sampling design. We investigate the accuracy and the range of
applicability of these algorithms with simulations. We explore two real-world
networks from ethnology (seed circulation network) and biology (protein-protein
interaction network), where the interpretations depend considerably on the
sampling designs considered.
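Under MAR the sampling mechanism can be treated as ignorable, so a variational EM only has to drop unsampled dyads from both its steps. The Python sketch below illustrates this for a binary undirected SBM. It is not the authors' R package: it covers only the MAR case, omits NMAR designs and the ICL-based model selection, and its name `vem_sbm_mar`, the soft start from `init_labels`, the clipping constant `eps` and the fixed iteration count are our assumptions. In practice the starting labelling would come from, for example, spectral clustering or several random restarts; the toy run simply starts from the planted labels so that it stays deterministic.

```python
import math


def vem_sbm_mar(X, R, Q, init_labels, n_iter=50, eps=1e-6):
    """Variational EM sketch for a binary undirected SBM with MAR-missing dyads.

    X: symmetric 0/1 adjacency matrix (entries at unsampled dyads are ignored).
    R: symmetric 0/1 sampling mask, R[i][j] == 1 iff dyad (i, j) was observed.
    Q >= 2 blocks; init_labels gives a rough starting labelling.
    """
    n = len(X)
    # Soft initial memberships: 0.8 on the initial label, the rest spread evenly.
    tau = [[0.8 if q == init_labels[i] else 0.2 / (Q - 1) for q in range(Q)]
           for i in range(n)]
    observed = [(i, j) for i in range(n) for j in range(n) if i != j and R[i][j]]
    for _ in range(n_iter):
        # M-step: block proportions and block-to-block link probabilities,
        # estimated from observed dyads only.
        alpha = [sum(t[q] for t in tau) / n for q in range(Q)]
        pi = [[0.5] * Q for _ in range(Q)]
        for q in range(Q):
            for l in range(Q):
                num = sum(tau[i][q] * tau[j][l] * X[i][j] for i, j in observed)
                den = sum(tau[i][q] * tau[j][l] for i, j in observed)
                if den > 0:
                    pi[q][l] = min(max(num / den, eps), 1 - eps)  # keep logs finite
        # E-step: one parallel fixed-point update of the memberships.
        new_tau = []
        for i in range(n):
            logt = [math.log(max(alpha[q], eps)) for q in range(Q)]
            for j in range(n):
                if j == i or not R[i][j]:
                    continue  # unsampled dyads carry no information under MAR
                for q in range(Q):
                    for l in range(Q):
                        p = pi[q][l]
                        logt[q] += tau[j][l] * math.log(p if X[i][j] else 1 - p)
            top = max(logt)
            w = [math.exp(v - top) for v in logt]  # numerically stable softmax
            s = sum(w)
            new_tau.append([v / s for v in w])
        tau = new_tau
    labels = [max(range(Q), key=t.__getitem__) for t in tau]
    return tau, alpha, pi, labels


truth = [0, 0, 0, 0, 1, 1, 1, 1]
n = len(truth)
# Toy planted partition: complete within blocks, no links between them.
X = [[int(i != j and truth[i] == truth[j]) for j in range(n)] for i in range(n)]
R = [[1] * n for _ in range(n)]
for i, j in [(0, 1), (2, 5), (4, 7)]:  # three dyads left unsampled
    R[i][j] = R[j][i] = 0
tau, alpha, pi, labels = vem_sbm_mar(X, R, Q=2, init_labels=truth)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```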
Communities as Well Separated Subgraphs With Cohesive Cores: Identification of Core-Periphery Structures in Link Communities
Communities in networks are commonly considered as highly cohesive subgraphs
which are well separated from the rest of the network. However, cohesion and
separation often cannot be maximized at the same time, which is why a
compromise is sought by some methods. When a compromise is not suitable for the
problem to be solved it might be advantageous to separate the two criteria. In
this paper, we explore such an approach by defining communities as well
separated subgraphs which can have one or more cohesive cores surrounded by
peripheries. We apply this idea to link communities and present an algorithm
for constructing hierarchical core-periphery structures in link communities,
together with first test results.
Comment: 12 pages, 2 figures, submitted version of a paper accepted for the
7th International Conference on Complex Networks and Their Applications,
December 11-13, 2018, Cambridge, UK; revised version at
http://141.20.126.227/~qm/papers
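Two standard proxies make the tension between cohesion and separation concrete: internal edge density (cohesion) and conductance (separation, where lower is better). These are illustrative choices, not the criteria defined in the paper, and they are applied to node sets rather than to link communities to keep the sketch short. The function names and the toy graph are hypothetical.

```python
def internal_density(edges, S):
    """Cohesion proxy: share of node pairs inside S that are linked."""
    S = set(S)
    k = len(S)
    if k < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in S and v in S)
    return inside / (k * (k - 1) / 2)


def conductance(edges, S):
    """Separation proxy: cut size over the smaller volume (lower = better separated)."""
    S = set(S)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        a, b = u in S, v in S
        vol_in += a + b               # edge endpoints inside S
        vol_out += (not a) + (not b)  # edge endpoints outside S
        cut += a != b                 # edge crosses the boundary of S
    smaller = min(vol_in, vol_out)
    return cut / smaller if smaller else 0.0


# Toy graph: triangles {0,1,2} and {3,4,5} joined by (2,3), plus pendant node 6 on 2.
E = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5), (2, 6)]
core, core_plus = {0, 1, 2}, {0, 1, 2, 6}
print(internal_density(E, core), conductance(E, core))            # 1.0 0.25
print(internal_density(E, core_plus), conductance(E, core_plus))  # 0.6666666666666666 0.14285714285714285
```

Adding the pendant node 6 to the dense core {0, 1, 2} lowers cohesion but improves separation, so no single set maximizes both; treating {0, 1, 2} as a core and node 6 as its periphery mirrors the idea of keeping the two criteria apart.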
Analysis of Professional Trajectories using Disconnected Self-Organizing Maps
In this paper we address an important economic question. Is there, as
mainstream economic theory asserts, a homogeneous labor market with
mechanisms which govern supply and demand for work, producing an equilibrium
with its remarkable properties? Using the Panel Study of Income Dynamics (PSID)
collected over the period 1984-2003, we study the situations of American workers
with respect to employment. The data include all heads of household (men or
women) as well as the partners who are on the labor market, working or not.
They are extracted from the complete survey, and we compute a few relevant
features which characterize the workers' situations. To perform this analysis,
we suggest using a Self-Organizing Map (SOM, Kohonen algorithm) with specific
structure based on planar graphs, with disconnected components (called D-SOM),
especially interesting for clustering. We compare the results to those obtained
with a classical SOM grid and a star-shaped map (called SOS). Each component of
D-SOM takes the form of a string and corresponds to an organized cluster. From
this clustering, we study the trajectories of the individuals among the classes
by using the transition probability matrices for each period and the
corresponding stationary distributions. As a matter of fact, we find clear
evidence of heterogeneous parts, each one with high homogeneity, representing
situations well identified in terms of activity and wage levels and in degree
of stability in the workplace. These results and their interpretation in
economic terms contribute to the debate about flexibility which is commonly
seen as a way to obtain a better level of equilibrium on the labor market.
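The trajectory analysis rests on estimating class-to-class transition probabilities and the stationary distribution π satisfying π = πP. A minimal sketch, assuming class labels already produced by some clustering (the two classes and trajectories here are hypothetical toy data), pooling all consecutive years into one matrix instead of the per-period matrices used in the paper, and computing π by power iteration:

```python
def transition_matrix(trajectories, k):
    """Row-normalised class-to-class transition counts.

    trajectories: one sequence of class labels (0..k-1) per individual,
    one label per year. All years are pooled into a single matrix.
    """
    counts = [[0] * k for _ in range(k)]
    for seq in trajectories:
        for a, b in zip(seq, seq[1:]):  # consecutive years
            counts[a][b] += 1
    P = []
    for a, row in enumerate(counts):
        total = sum(row)
        if total:
            P.append([c / total for c in row])
        else:
            # Class never left in the data: treated as absorbing (an assumption).
            P.append([1.0 if b == a else 0.0 for b in range(k)])
    return P


def stationary(P, tol=1e-12, max_iter=10_000):
    """Stationary distribution pi = pi P by power iteration.

    Converges when the chain is irreducible and aperiodic.
    """
    k = len(P)
    pi = [1.0 / k] * k
    for _ in range(max_iter):
        nxt = [sum(pi[a] * P[a][b] for a in range(k)) for b in range(k)]
        if max(abs(x - y) for x, y in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


# Toy trajectories over two hypothetical classes.
traj = [[0, 0, 1, 1, 0], [1, 1, 1, 0, 0], [0, 1, 1, 1, 1]]
P = transition_matrix(traj, k=2)
print(P)              # [[0.5, 0.5], [0.25, 0.75]]
print(stationary(P))  # approximately [1/3, 2/3]
```

Power iteration is used for brevity; for larger class sets, solving the linear system π(P − I) = 0 with Σπ = 1 is the usual alternative.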
Characterizing the community structure of complex networks
Community structure is one of the key properties of complex networks and
plays a crucial role in their topology and function. While an impressive amount
of work has been done on the issue of community detection, very little
attention has so far been devoted to the investigation of communities in real
networks. We present a systematic empirical analysis of the statistical
properties of communities in large information, communication, technological,
biological, and social networks. We find that the mesoscopic organization of
networks of the same category is remarkably similar. This is reflected in
several characteristics of community structure, which can be used as
"fingerprints" of specific network categories. While community size
distributions are always broad, certain categories of networks consist mainly
of tree-like communities, while others have denser modules. Average path
lengths within communities initially grow logarithmically with community size,
but the growth saturates or slows down for communities larger than a
characteristic size. This behaviour is related to the presence of hubs within
communities, whose roles differ across categories. The community
embeddedness of nodes, measured in terms of the fraction of links within their
communities, also has a characteristic distribution for each category. Our findings
are verified using two fundamentally different community detection
methods.
Comment: 15 pages, 20 figures, 4 tables
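Node embeddedness and within-community path length are straightforward to compute once a partition is fixed. The sketch assumes an undirected edge list without self-loops and a node-to-community mapping produced by any detection method; the function names and the toy graph are illustrative, not taken from the paper.

```python
from collections import defaultdict, deque


def embeddedness(edges, community):
    """Fraction of each node's links that stay inside its own community."""
    deg, inside = defaultdict(int), defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if community[u] == community[v]:
            inside[u] += 1
            inside[v] += 1
    return {node: inside[node] / deg[node] for node in deg}


def avg_path_length(edges, members):
    """Mean shortest-path length over connected pairs of the induced subgraph."""
    members = set(members)
    adj = defaultdict(list)
    for u, v in edges:
        if u in members and v in members:
            adj[u].append(v)
            adj[v].append(u)
    total = pairs = 0
    for s in members:
        dist = {s: 0}  # breadth-first search from s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


E = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(embeddedness(E, community))
# {0: 1.0, 1: 1.0, 2: 0.6666666666666666, 3: 0.6666666666666666, 4: 1.0, 5: 1.0}
print(avg_path_length(E, [0, 1, 2]))  # 1.0 (a triangle)
```

Collecting these values over all nodes and communities gives the per-category distributions the abstract compares.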