Clustering Sets of Objects Using Concepts-Objects Bipartite Graphs
In this paper we deal with data stated in the form of a binary relation between objects and properties. We propose an approach for clustering the objects and labeling them with characteristic subsets of properties. The approach is based on a parallel between formal concept analysis and graph clustering. The problem is made tricky by the fact that generally there is no partitioning of the objects that can be associated with a partitioning of properties. Indeed, a relevant partition of objects may exist, whereas it is not the case for properties. In order to obtain a conceptual clustering of the objects, we work with a bipartite graph relating objects with formal concepts. Experiments on artificial benchmarks and real examples show the effectiveness of the method, more particularly the fact that the results remain stable when an increasing number of properties are shared between objects of different clusters.
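To make the objects-concepts bipartite graph concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes the standard definition of a formal concept (an extent and an intent that determine each other), enumerates concepts by brute force, and is only suitable for tiny contexts. The names `formal_concepts`, `object_concept_edges`, and the toy `context` are hypothetical.

```python
from itertools import combinations


def formal_concepts(context):
    """Return all (extent, intent) pairs of a small object -> properties context.

    Brute force over object subsets: exponential, for illustration only.
    """
    objects = list(context)
    all_props = set().union(*context.values()) if context else set()
    concepts = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # Intent: properties shared by every object in the subset
            # (all properties when the subset is empty).
            intent = set(all_props)
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that has all properties of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            concepts.add((extent, frozenset(intent)))
    return concepts


def object_concept_edges(concepts):
    """Edges of the bipartite graph: an object is linked to each concept whose extent contains it."""
    return {(obj, (extent, intent)) for extent, intent in concepts for obj in extent}


# Toy binary relation (assumed data, not from the paper).
context = {"o1": {"a", "b"}, "o2": {"b", "c"}, "o3": {"c"}}
concepts = formal_concepts(context)
edges = object_concept_edges(concepts)
```

Any graph clustering method could then be run on `edges`; which method the paper uses is not stated in the abstract, so none is assumed here.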
Simultaneous Embeddability of Two Partitions
We study the simultaneous embeddability of a pair of partitions of the same
underlying set into disjoint blocks. Each element of the set is mapped to a
point in the plane and each block of either of the two partitions is mapped to
a region that contains exactly those points that belong to the elements in the
block and that is bounded by a simple closed curve. We establish three main
classes of simultaneous embeddability (weak, strong, and full embeddability)
that differ by increasingly strict well-formedness conditions on how different
block regions are allowed to intersect. We show that these simultaneous
embeddability classes are closely related to different planarity concepts of
hypergraphs. For each embeddability class we give a full characterization. We
show that (i) every pair of partitions has a weak simultaneous embedding, (ii)
it is NP-complete to decide the existence of a strong simultaneous embedding,
and (iii) the existence of a full simultaneous embedding can be tested in
linear time.
Comment: 17 pages, 7 figures, extended version of a paper to appear at GD 201
Soft clustering analysis of galaxy morphologies: A worked example with SDSS
Context: The huge and still rapidly growing number of galaxies in modern sky
surveys raises the need for an automated and objective classification method.
Unsupervised learning algorithms are of particular interest, since they
discover classes automatically. Aims: We briefly discuss the pitfalls of
oversimplified classification methods and outline an alternative approach
called "clustering analysis". Methods: We categorise different classification
methods according to their capabilities. Based on this categorisation, we
present a probabilistic classification algorithm that automatically detects the
optimal classes preferred by the data. We explore the reliability of this
algorithm in systematic tests. Using a small sample of bright galaxies from the
SDSS, we demonstrate the performance of this algorithm in practice. We are able
to disentangle the problems of classification and parametrisation of galaxy
morphologies in this case. Results: We give physical arguments that a
probabilistic classification scheme is necessary. The algorithm we present
produces reasonable morphological classes and object-to-class assignments
without any prior assumptions. Conclusions: There are sophisticated automated
classification algorithms that meet all necessary requirements, but a lot of
work is still needed on the interpretation of the results.
Comment: 18 pages, 19 figures, 2 tables, submitted to A
Clones in Graphs
Finding structural similarities in graph data, like social networks, is a
far-ranging task in data mining and knowledge discovery. A (conceptually)
simple reduction would be to compute the automorphism group of a graph.
However, this approach is ineffective in data mining since real world data does
not exhibit enough structural regularity. Here we step in with a novel approach
based on mappings that preserve the maximal cliques. For this we exploit the
well known correspondence between bipartite graphs and the data structure
formal context from Formal Concept Analysis. From there we utilize
the notion of clone items. The investigation of these is still an open problem
to which we add new insights with this work. Furthermore, we produce a
substantial experimental investigation of real world data. We conclude by
demonstrating the generalization of clone items to permutations.
Comment: 11 pages, 2 figures, 1 tabl
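As an illustration of clone items, the following Python sketch assumes the usual definition from the Formal Concept Analysis literature: two properties a and b are clones if swapping them in any closed property set (intent) yields another closed set. The abstract does not spell out the definition, so treat this as an assumption. The function names and the toy context are hypothetical, and the brute-force enumeration only suits very small contexts.

```python
from itertools import combinations


def intents(context):
    """All intents (closed property sets) of a small object -> properties context."""
    objects = list(context)
    all_props = frozenset().union(*context.values())
    closed = set()
    for r in range(len(objects) + 1):
        for subset in combinations(objects, r):
            # An intent is the set of properties shared by a set of objects.
            shared = set(all_props)
            for obj in subset:
                shared &= context[obj]
            closed.add(frozenset(shared))
    return closed


def swap(props, a, b):
    """Exchange the roles of properties a and b inside one property set."""
    result = set(props) - {a, b}
    if a in props:
        result.add(b)
    if b in props:
        result.add(a)
    return frozenset(result)


def are_clones(context, a, b):
    """True if swapping a and b maps every intent to an intent (assumed definition)."""
    closed = intents(context)
    return all(swap(s, a, b) in closed for s in closed)


# Toy context (assumed data): "a" and "b" play symmetric roles, "c" does not.
context = {"o1": {"a", "c"}, "o2": {"b", "c"}, "o3": {"c"}}
ab_are_clones = are_clones(context, "a", "b")
ac_are_clones = are_clones(context, "a", "c")
```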
Ground truth? Concept-based communities versus the external classification of physics manuscripts
Community detection techniques are widely used to infer hidden structures
within interconnected systems. Despite demonstrating high accuracy on
benchmarks, they reproduce the external classification for many real-world
systems with a significant level of discrepancy. A widely accepted reason
behind such outcome is the unavoidable loss of non-topological information
(such as node attributes) encountered when the original complex system is
represented as a network. In this article we emphasize that the observed
discrepancies may also be caused by a different reason: the external
classification itself. To this end we use scientific publication data which i)
exhibit a well defined modular structure and ii) hold an expert-made
classification of research articles. Having represented the articles and the
extracted scientific concepts both as a bipartite network and as its unipartite
projection, we applied modularity optimization to uncover the inner thematic
structure. The resulting clusters are shown to partly reflect the author-made
classification, although some significant discrepancies are observed. A
detailed analysis of these discrepancies shows that they carry essential
information about the system, mainly related to the use of similar techniques
and methods across different (sub)disciplines, that is otherwise omitted when
only the external classification is considered.
Comment: 15 pages, 2 figure
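The unipartite projection and the modularity score mentioned above can be sketched as follows. This is a hedged illustration, not the authors' pipeline: it assumes the projection weights two articles by their number of shared concepts, and it only evaluates the standard Newman modularity Q of a given partition rather than optimizing it. The names `project`, `modularity`, and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def project(article_concepts):
    """Weighted article-article projection: weight = number of shared concepts (assumed weighting)."""
    weights = {}
    for u, v in combinations(sorted(article_concepts), 2):
        shared = len(article_concepts[u] & article_concepts[v])
        if shared:
            weights[(u, v)] = shared
    return weights


def modularity(weights, community):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / (2 m))^2 ] on a weighted undirected graph."""
    total = sum(weights.values())  # m: total edge weight
    if total == 0:
        return 0.0
    strength = defaultdict(float)  # weighted degree of each node
    internal = defaultdict(float)  # L_c: edge weight inside each community
    for (u, v), w in weights.items():
        strength[u] += w
        strength[v] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    comm_strength = defaultdict(float)  # d_c: summed degree of each community
    for node, s in strength.items():
        comm_strength[community[node]] += s
    return sum(
        internal[c] / total - (comm_strength[c] / (2 * total)) ** 2
        for c in comm_strength
    )


# Toy article -> concepts data (assumed, not from the paper).
articles = {"p1": {"x", "y"}, "p2": {"x", "y"}, "p3": {"z"}, "p4": {"z"}}
weights = project(articles)
q_split = modularity(weights, {"p1": 0, "p2": 0, "p3": 1, "p4": 1})
q_single = modularity(weights, {"p1": 0, "p2": 0, "p3": 0, "p4": 0})
```

A modularity optimizer would search over partitions for the highest Q; comparing the resulting clusters with an external labeling is then a separate step.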
Visualizing and Interacting with Concept Hierarchies
Concept Hierarchies and Formal Concept Analysis are theoretically well
grounded and extensively tested methods. They rely on line diagrams called
Galois lattices for visualizing and analysing object-attribute sets. Galois
lattices are visually appealing and conceptually rich for experts. However, they
present important drawbacks due to their concept-oriented overall structure:
analysing what they show is difficult for non-experts, navigation is
cumbersome, interaction is poor, and scalability is a severe bottleneck for
visual interpretation even for experts. In this paper we introduce semantic
probes as a means to overcome many of these problems and extend usability and
application possibilities of traditional FCA visualization methods. Semantic
probes are visual user-centred objects which extract and organize reduced
Galois sub-hierarchies. They are simpler and clearer, and they provide better
navigation support through a rich set of interaction possibilities. Since
probe-driven sub-hierarchies are limited to the user's focus, scalability is under control
and interpretation is facilitated. After some successful experiments, several
applications are being developed, with the remaining problem of finding a
compromise between simplicity and conceptual expressivity.