On morphological hierarchical representations for image processing and spatial data clustering
Hierarchical data representations in the context of classification and data
clustering were put forward during the fifties. Recently, hierarchical image
representations have gained renewed interest for segmentation purposes. In this
paper, we briefly survey fundamental results on hierarchical clustering and
then detail recent paradigms developed for the hierarchical representation of
images in the framework of mathematical morphology: constrained connectivity
and ultrametric watersheds. Constrained connectivity can be viewed as a way to
constrain an initial hierarchy in such a way that a set of desired constraints
are satisfied. The framework of ultrametric watersheds provides a generic scheme
for computing any hierarchical connected clustering, in particular when such a
hierarchy is constrained. The suitability of this framework for solving
practical problems is illustrated with applications in remote sensing.
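As a rough illustration of the link between hierarchies and ultrametrics mentioned in this abstract, the sketch below builds a single-linkage hierarchy on a small edge-weighted graph and records, for every pair of nodes, the lowest threshold at which they fall into the same connected cluster. This is not the ultrametric watershed construction of the paper; it is plain single linkage, swapped in as a simpler stand-in, and the function name `single_linkage_ultrametric` and the toy graph are assumptions made for the example.

```python
from itertools import permutations


def single_linkage_ultrametric(n, edges):
    """Return {(u, v): merge_height} for all u < v in a connected graph.

    n     -- number of nodes, labelled 0..n-1
    edges -- list of (weight, u, v) tuples (hypothetical input format)
    """
    label = list(range(n))                 # current component label of each node
    members = {i: [i] for i in range(n)}   # component label -> its nodes
    ultra = {}
    # Process edges by increasing weight, as in Kruskal's algorithm.
    for w, u, v in sorted(edges):
        a, b = label[u], label[v]
        if a == b:
            continue  # already in the same cluster at a lower threshold
        # Every cross pair first becomes connected at threshold w.
        for x in members[a]:
            for y in members[b]:
                ultra[(min(x, y), max(x, y))] = w
        # Merge the smaller component into the larger one.
        if len(members[a]) < len(members[b]):
            a, b = b, a
        for y in members[b]:
            label[y] = a
        members[a].extend(members.pop(b))
    return ultra


# Toy path graph 0 - 1 - 2 - 3 with edge weights 1, 5, 2.
toy_edges = [(1, 0, 1), (5, 1, 2), (2, 2, 3)]
ultra = single_linkage_ultrametric(4, toy_edges)


def d(x, y):
    """Symmetric lookup of the merge height, with d(x, x) = 0."""
    return 0 if x == y else ultra[(min(x, y), max(x, y))]


# Ultrametric inequality: d(x, z) <= max(d(x, y), d(y, z)) for all triples.
is_ultrametric = all(
    d(x, z) <= max(d(x, y), d(y, z)) for x, y, z in permutations(range(4), 3)
)
```

Cutting the hierarchy at a threshold t groups exactly those nodes whose recorded height is at most t, which is the sense in which a hierarchy of connected clusterings and an ultrametric carry the same information.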
Approaches to conceptual clustering
Methods for Conceptual Clustering may be explicated in two lights. Conceptual Clustering methods may be viewed as extensions to techniques of numerical taxonomy, a collection of methods developed by social and natural scientists for creating classification schemes over object sets. Alternatively, conceptual clustering may be viewed as a form of learning by observation or concept formation, as opposed to methods of learning from examples or concept identification. In this paper we survey and compare a number of conceptual clustering methods along dimensions suggested by each of these views. The point we most wish to clarify is that conceptual clustering processes can be explicated as being composed of three distinct but inter-dependent subprocesses: the process of deriving a hierarchical classification scheme; the process of aggregating objects into individual classes; and the process of assigning conceptual descriptions to object classes. Each subprocess may be characterized along a number of dimensions related to search, thus facilitating a better understanding of the conceptual clustering process as a whole.
Statistical Mechanics of Community Detection
Starting from a general \textit{ansatz}, we show how community detection can
be interpreted as finding the ground state of an infinite range spin glass. Our
approach applies to weighted and directed networks alike. It contains the
\textit{ad hoc} introduced quality function from \cite{ReichardtPRL} and the
modularity as defined by Newman and Girvan \cite{Girvan03} as special
cases. The community structure of the network is interpreted as the spin
configuration that minimizes the energy of the spin glass with the spin states
being the community indices. We elucidate the properties of the ground state
configuration to give a concise definition of communities as cohesive subgroups
in networks that is adaptive to the specific class of network under study.
Further, we show how hierarchies and overlap in the community structure can be
detected. Computationally effective local update rules for optimization
procedures to find the ground state are given. We show how the \textit{ansatz}
may be used to discover the community around a given node without detecting all
communities in the full network and we give benchmarks for the performance of
this extension. Finally, we give expectation values for the modularity of
random graphs, which can be used in the assessment of statistical significance
of community structure.
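To make the modularity named in this abstract concrete, here is a minimal sketch of the Newman-Girvan modularity for an unweighted, undirected graph, written in the per-community form Q = sum over c of [e_c / m - (d_c / 2m)^2], where e_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The function name, the edge-list input format, and the toy graph are assumptions made for illustration; the paper's general spin-glass Hamiltonian, with its weighted and directed variants, is not reproduced here.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity of a partition.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping node -> community index (the "spin state")
    """
    m = len(edges)
    intra = defaultdict(int)       # e_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            intra[community[u]] += 1
    return sum(
        intra[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
one_group = {node: 0 for node in range(6)}

q_two = modularity(toy_edges, two_groups)  # 2 * (3/7 - 1/4) = 5/14
q_one = modularity(toy_edges, one_group)   # 7/7 - 1 = 0
```

On this toy graph the two-triangle partition scores 5/14 (about 0.357) and the trivial single-community partition scores 0. In the framing of the abstract, a higher modularity corresponds to a lower energy of the spin configuration.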
Communities in Networks
We survey some of the concepts, methods, and applications of community
detection, which has become an increasingly important area of network science.
To help ease newcomers into the field, we provide a guide to available
methodology and open problems, and discuss why scientists from diverse
backgrounds are interested in these problems. As a running theme, we emphasize
the connections of community detection to problems in statistical physics and
computational optimization.
Comment: survey/review article on community structure in networks; published
version is available at
http://people.maths.ox.ac.uk/~porterm/papers/comnotices.pd
Business-oriented Analysis of a Social Network of University Students
Despite the great interest that social networks have attracted in Business Science, they are rarely analyzed in a global and systematic way in this field: most authors focus on parts of the studied network, or on a few nodes considered individually. This could be explained by the fact that extracting social networks in practice is a difficult and costly task, since the specific relational data it requires are often difficult to access and therefore expensive. One may ask whether equivalent information could be extracted from less expensive individual data, i.e. data concerning single individuals rather than several. In this work, we tackle this problem through group detection. We gather both types of data from a population of students and estimate groups separately using individual and relational data, leading to sets of clusters and communities, respectively. We found no strong overlap between them, meaning that the two types of data do not convey the same information in this specific context and can therefore be considered complementary. However, a link, even if weak, exists and appears when we identify the most discriminant attributes relative to the communities. Implications for Business Science include community prediction using individual data.
Keywords: Social Networks; Business Science; Cluster Analysis; Community Detection; Community Comparison; Individual Data; Relational Data
Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale
Notions of community quality underlie network clustering. While studies
surrounding network clustering are increasingly common, a precise understanding
of the relationship between different cluster quality metrics is unknown. In
this paper, we examine the relationship between stand-alone cluster quality
metrics and information recovery metrics through a rigorous analysis of four
widely-used network clustering algorithms -- Louvain, Infomap, label
propagation, and smart local moving. We consider the stand-alone quality
metrics of modularity, conductance, and coverage, and we consider the
information recovery metrics of adjusted Rand score, normalized mutual
information, and a variant of normalized mutual information used in previous
work. Our study includes both synthetic graphs and empirical data sets of sizes
varying from 1,000 to 1,000,000 nodes.
We find significant differences among the results of the different cluster
quality metrics. For example, clustering algorithms can return a value of 0.4
out of 1 on modularity but score 0 out of 1 on information recovery. We find
conductance, though imperfect, to be the stand-alone quality metric that best
indicates performance on information recovery metrics. Our study shows that the
variant of normalized mutual information used in previous work cannot be
assumed to differ only slightly from traditional normalized mutual information.
Smart local moving is the best performing algorithm in our study, but
discrepancies between cluster evaluation metrics prevent us from declaring it
absolutely superior. Louvain performed better than Infomap in nearly all the
tests in our study, contradicting the results of previous work in which Infomap
was superior to Louvain. We find that although label propagation performs
poorly when clusters are less clearly defined, it scales efficiently and
accurately to large graphs with well-defined clusters.
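Two of the stand-alone quality metrics named in this abstract can be sketched in a few lines. The definitions below are the common textbook ones and are an assumption here, since the paper may aggregate or normalize them differently: coverage is the fraction of edges that fall inside a community, and the conductance of a community is the number of edges leaving it divided by the smaller of the total degree inside and the total degree outside. The function names and the toy graph are hypothetical.

```python
def coverage(edges, community):
    """Fraction of edges whose two endpoints share a community."""
    intra = sum(1 for u, v in edges if community[u] == community[v])
    return intra / len(edges)


def conductance(edges, community, label):
    """Cut size of community `label` over min(volume inside, volume outside)."""
    cut = vol_in = vol_out = 0
    for u, v in edges:
        in_u = community[u] == label
        in_v = community[v] == label
        if in_u != in_v:
            cut += 1  # edge crosses the community boundary
        # Each endpoint adds one unit of degree to its side's volume.
        for inside in (in_u, in_v):
            if inside:
                vol_in += 1
            else:
                vol_out += 1
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Toy graph: two triangles joined by a single bridge edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
good = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}   # one community per triangle
mixed = {0: 0, 1: 1, 2: 0, 3: 1, 4: 0, 5: 1}  # alternating labels

cov_good = coverage(toy_edges, good)        # 6 of 7 edges are internal
cov_mixed = coverage(toy_edges, mixed)      # 2 of 7 edges are internal
cond_good = conductance(toy_edges, good, 0) # 1 cut edge / volume 7
```

Lower conductance and higher coverage both indicate better-separated communities, so on this toy graph the triangle partition beats the alternating one on coverage. Information recovery metrics such as adjusted Rand score and normalized mutual information additionally require a ground-truth labeling and are not shown.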
Interpretable Clustering using Unsupervised Binary Trees
We herein introduce a new method of interpretable clustering that uses
unsupervised binary trees. It is a three-stage procedure, the first stage of
which entails a series of recursive binary splits to reduce the heterogeneity
of the data within the new subsamples. During the second stage (pruning),
consideration is given to whether adjacent nodes can be aggregated. Finally,
during the third stage (joining), similar clusters are joined together, even if
they do not descend from the same node originally. Consistency results are
obtained, and the procedure is used on simulated and real data sets.
Comment: 25 pages, 6 figures
Considerations about multistep community detection
The problem and implications of community detection in networks have attracted
considerable attention because of their important applications in both natural
and social sciences. A number of algorithms have been developed to solve this problem,
addressing either speed optimization or the quality of the partitions
calculated. In this paper we propose a multi-step procedure bridging the
fastest, but less accurate algorithms (coarse clustering), with the slowest,
most effective ones (refinement). By adopting a heuristic ranking of the nodes,
and classifying a fraction of them as `critical', a refinement step can be
restricted to this subset of the network, thus saving computational time.
Preliminary numerical results are discussed, showing improvement of the final
partition.
Comment: 12 pages