Detecting Communities in Networks by Merging Cliques
Many algorithms have been proposed for detecting disjoint communities
(relatively densely connected subgraphs) in networks. One popular technique is
to optimize modularity, a measure of the quality of a partition in terms of the
number of intracommunity and intercommunity edges. Greedy approximate
algorithms for maximizing modularity can be very fast and effective. We propose
a new algorithm that starts by detecting disjoint cliques and then merges these
to optimize modularity. We show that this performs better than other similar
algorithms in terms of both modularity and execution speed. Comment: 5 pages, 7 figures
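As a rough illustration of the quality measure this abstract refers to, the sketch below computes the standard Newman-Girvan modularity of a given partition from counts of intracommunity edges and community degrees. It is not the clique-merging algorithm proposed in the paper; the function name `modularity` and the edge-list and dictionary inputs are assumptions made for this example.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition (undirected, unweighted graph).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    intra = defaultdict(int)   # edges with both endpoints in the community
    degree = defaultdict(int)  # summed degree of the community's nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1

    # Q = sum over communities of (intra edges / m) - (degree / 2m)^2
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}
q_two = modularity(toy_edges, two_groups)
q_one = modularity(toy_edges, one_group)
```

On this toy graph, splitting the two triangles scores higher than placing all nodes in one community, which is the kind of comparison a greedy merging procedure would rely on.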
Comparing Community Structure to Characteristics in Online Collegiate Social Networks
We study the structure of social networks of students by examining the graphs
of Facebook "friendships" at five American universities at a single point in
time. We investigate each single-institution network's community structure and
employ graphical and quantitative tools, including standardized pair-counting
methods, to measure the correlations between the network communities and a set
of self-identified user characteristics (residence, class year, major, and high
school). We review the basic properties and statistics of the pair-counting
indices employed and recall, in simplified notation, a useful analytical
formula for the z-score of the Rand coefficient. Our study illustrates how to
examine different instances of social networks constructed in similar
environments, emphasizes the array of social forces that combine to form
"communities," and leads to comparative observations about online social lives
that can be used to infer comparisons about offline social structures. In our
illustration of this methodology, we calculate the relative contributions of
different characteristics to the community structure of individual universities
and subsequently compare these relative contributions at different
universities, measuring for example the importance of common high school
affiliation to large state universities and the varying degrees of influence
common major can have on the social structure at different universities. The
heterogeneity of communities that we observe indicates that these networks
typically have multiple organizing factors rather than a single dominant one. Comment: Version 3 (17 pages, 5 multi-part figures), accepted in SIAM Review
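The pair-counting idea mentioned in this abstract can be sketched with the plain Rand coefficient: the fraction of node pairs on which two labelings agree (both place the pair together, or both place it apart). This is a minimal sketch only; it does not reproduce the paper's z-score formula, and the function name `rand_index` and the list-of-labels inputs are assumptions made for this example.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Rand coefficient between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items to form a pair")

    agree = 0
    total = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:  # pair treated alike by both labelings
            agree += 1
        total += 1
    return agree / total


# Hypothetical labels: detected communities versus a user characteristic.
communities = [0, 0, 1, 1]
same_grouping = ["x", "x", "y", "y"]
crossed_grouping = ["x", "y", "x", "y"]
r_same = rand_index(communities, same_grouping)
r_crossed = rand_index(communities, crossed_grouping)
```

The coefficient depends only on which items are grouped together, not on the label names, so a community partition can be compared directly with a categorical attribute such as class year.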
Observer-biased bearing condition monitoring: from fault detection to multi-fault classification
Bearings are simultaneously a fundamental component and one of the principal causes of failure in rotary machinery. This work focuses on the use of fuzzy clustering for bearing condition monitoring, i.e., fault detection and classification. The output of a clustering algorithm is a data partition (a set of clusters), which is merely a hypothesis on the structure of the data. This hypothesis requires validation by domain experts. In general, clustering algorithms allow only limited use of domain knowledge in the cluster formation process. In this study, a novel method allowing for interactive clustering in bearing fault diagnosis is proposed. The method resorts to shrinkage to generalize an otherwise unbiased clustering algorithm into a biased one. In this way, the method provides a natural and intuitive way to control the cluster formation process, allowing domain knowledge to be used to guide it. The domain expert can select a desirable level of granularity, ranging from fault detection to classification of a variable number of faults, and can select a specific region of the feature space for detailed analysis. Moreover, experimental results under realistic conditions show that the adopted algorithm outperforms the corresponding unbiased algorithm (fuzzy c-means), which is widely used for this type of problem. (C) 2016 Elsevier Ltd. All rights reserved. Grant number: 145602
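For orientation, the unbiased baseline named in this abstract, fuzzy c-means, assigns each point a graded membership in every cluster. The sketch below shows only the standard membership update for fixed centers; it does not implement the shrinkage-based biased method the paper proposes. The function name `fcm_memberships`, the fuzzifier default `m=2.0`, and the tuple inputs are assumptions made for this example.

```python
import math


def fcm_memberships(points, centers, m=2.0):
    """Standard fuzzy c-means membership update for fixed cluster centers.

    Returns rows u[i][k]: degree to which points[i] belongs to centers[k].
    Each row sums to 1. m > 1 is the fuzzifier.
    """
    if m <= 1.0:
        raise ValueError("fuzzifier m must be greater than 1")
    exponent = 2.0 / (m - 1.0)

    memberships = []
    for x in points:
        dists = [math.dist(x, c) for c in centers]
        zeros = [d == 0.0 for d in dists]
        if any(zeros):
            # Point coincides with a center: split membership among those centers.
            share = 1.0 / sum(zeros)
            memberships.append([share if z else 0.0 for z in zeros])
            continue
        # u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
        row = [1.0 / sum((dk / dj) ** exponent for dj in dists) for dk in dists]
        memberships.append(row)
    return memberships


# Hypothetical 2-D feature vectors and two cluster centers.
centers = [(0.0, 0.0), (1.0, 0.0)]
points = [(0.0, 0.0), (0.5, 0.0), (0.25, 0.0)]
u = fcm_memberships(points, centers)
```

A point midway between the two centers receives equal membership in both, while points nearer one center are weighted toward it; a full fuzzy c-means run alternates this update with recomputing the centers.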
Structural Equation Modeling and simultaneous clustering through the Partial Least Squares algorithm
The identification of different homogeneous groups of observations and their
appropriate analysis in PLS-SEM has become a critical issue in many application
fields. Usually, both SEM and PLS-SEM assume the homogeneity of all
units on which the model is estimated, and the segmentation approaches present
in the literature consist in estimating separate models for each segment of
statistical units, where the segments have been obtained by assigning the units to
segments defined a priori. However, these approaches are not fully acceptable
because no causal structure among the variables is postulated. In other words,
a modeling approach should be used, where the obtained clusters are homogeneous
with respect to the structural causal relationships. In this paper, a new
methodology for simultaneous non-hierarchical clustering and PLS-SEM is
proposed. This methodology is motivated by the fact that the sequential
approach of applying first SEM or PLS-SEM and then a clustering algorithm
such as K-means on the latent scores of the SEM/PLS-SEM may fail to find the
correct clustering structure existing in the data. A simulation study and an
application on real data are included to evaluate the performance of the
proposed methodology.
Preprocessing Solar Images while Preserving their Latent Structure
Telescopes such as the Atmospheric Imaging Assembly aboard the Solar Dynamics
Observatory, a NASA satellite, collect massive streams of high resolution
images of the Sun through multiple wavelength filters. Reconstructing
pixel-by-pixel thermal properties based on these images can be framed as an
ill-posed inverse problem with Poisson noise, but this reconstruction is
computationally expensive and there is disagreement among researchers about
what regularization or prior assumptions are most appropriate. This article
presents an image segmentation framework for preprocessing such images in order
to reduce the data volume while preserving as much thermal information as
possible for later downstream analyses. The resulting segmented images reflect
thermal properties but do not depend on solving the ill-posed inverse problem.
This allows users to avoid the Poisson inverse problem altogether or to tackle
it on each of ~10 segments rather than on each of ~10^7 pixels,
reducing computing time by a factor of ~10^6. We employ a parametric
class of dissimilarities that can be expressed as cosine dissimilarity
functions or Hellinger distances between nonlinearly transformed vectors of
multi-passband observations in each pixel. We develop a decision theoretic
framework for choosing the dissimilarity that minimizes the expected loss that
arises when estimating identifiable thermal properties based on segmented
images rather than on a pixel-by-pixel basis. We also examine the efficacy of
different dissimilarities for recovering clusters in the underlying thermal
properties. The expected losses are computed under scientifically motivated
prior distributions. Two simulation studies guide our choices of dissimilarity
function. We illustrate our method by segmenting images of a coronal hole
observed on 26 February 2015.
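The two dissimilarity families named in this abstract can be sketched in their textbook forms: cosine dissimilarity between two vectors and Hellinger distance between two nonnegative vectors normalized to sum to one. The paper's parametric class applies nonlinear transformations to the multi-passband vectors first; those transformations are not given here and are not reproduced. The function names and the example pixel vectors are assumptions made for this example.

```python
import math


def cosine_dissimilarity(x, y):
    """1 - cosine similarity between two nonzero vectors of equal length."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine dissimilarity is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)


def hellinger_distance(p, q):
    """Hellinger distance between nonnegative vectors, each scaled to sum to 1."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    if any(a < 0 for a in p) or any(b < 0 for b in q):
        raise ValueError("entries must be nonnegative")
    sum_p, sum_q = sum(p), sum(q)
    if sum_p == 0 or sum_q == 0:
        raise ValueError("vectors must have positive total")
    squared = sum(
        (math.sqrt(a / sum_p) - math.sqrt(b / sum_q)) ** 2 for a, b in zip(p, q)
    )
    return math.sqrt(0.5 * squared)


# Hypothetical two-passband intensity vectors for three pixels.
pixel_a = [1.0, 2.0]
pixel_b = [2.0, 4.0]   # same spectral shape as pixel_a, twice as bright
pixel_c = [1.0, 0.0]
d_cos_ab = cosine_dissimilarity(pixel_a, pixel_b)
d_hel_ab = hellinger_distance(pixel_a, pixel_b)
d_hel_extreme = hellinger_distance(pixel_c, [0.0, 1.0])
```

Both measures ignore overall brightness and compare only the relative distribution across passbands, which is why pixels `pixel_a` and `pixel_b` come out as essentially identical.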
Hierarchical information clustering by means of topologically embedded graphs
We introduce a graph-theoretic approach to extract clusters and hierarchies
in complex data-sets in an unsupervised and deterministic manner, without the
use of any prior information. This is achieved by building topologically
embedded networks containing the subset of most significant links and analyzing
the network structure. For a planar embedding, this method provides both the
intra-cluster hierarchy, which describes the way clusters are composed, and the
inter-cluster hierarchy, which describes how clusters gather together. We
discuss performance, robustness and reliability of this method by first
investigating several artificial data-sets, finding that it can significantly
outperform other established approaches. Then we show that our method can
successfully differentiate meaningful clusters and hierarchies in a variety of
real data-sets. In particular, we find that the application to gene expression
patterns of lymphoma samples uncovers biologically significant groups of genes
which play key roles in diagnosis, prognosis and treatment of some of the most
relevant human lymphoid malignancies. Comment: 33 Pages, 18 Figures, 5 Tables