Element-centric clustering comparison unifies overlaps and hierarchy
Clustering is one of the most universal approaches for understanding complex
data. A pivotal aspect of clustering analysis is quantitatively comparing
clusterings; clustering comparison is the basis for many tasks such as
clustering evaluation, consensus clustering, and tracking the temporal
evolution of clusters. In particular, the extrinsic evaluation of clustering
methods requires comparing the uncovered clusterings to planted clusterings or
known metadata. Yet, as we demonstrate, existing clustering comparison measures
have critical biases which undermine their usefulness, and no measure
accommodates both overlapping and hierarchical clusterings. Here we unify the
comparison of disjoint, overlapping, and hierarchically structured clusterings
by proposing a new element-centric framework: elements are compared based on
the relationships induced by the cluster structure, as opposed to the
traditional cluster-centric philosophy. We demonstrate that, in contrast to
standard clustering similarity measures, our framework does not suffer from
critical biases and naturally provides unique insights into how the clusterings
differ. We illustrate the strengths of our framework by revealing new insights
into the organization of clusters in two applications: the improved
classification of schizophrenia based on the overlapping and hierarchical
community structure of fMRI brain networks, and the disentanglement of various
social homophily factors in Facebook social networks. The universality of
clustering suggests far-reaching impact of our framework throughout all areas
of science
Hierarchical modularity in human brain functional networks
The idea that complex systems have a hierarchical modular organization
originates in the early 1960s and has recently attracted fresh support from
quantitative studies of large scale, real-life networks. Here we investigate
the hierarchical modular (or "modules-within-modules") decomposition of human
brain functional networks, measured using functional magnetic resonance imaging
(fMRI) in 18 healthy volunteers under no-task or resting conditions. We used a
customized template to extract networks with more than 1800 regional nodes, and
we applied a fast algorithm to identify nested modular structure at several
hierarchical levels. We used mutual information, 0 < I < 1, to estimate the
similarity of community structure of networks in different subjects, and to
identify the individual network that is most representative of the group.
Results show that human brain functional networks have a hierarchical modular
organization with a fair degree of similarity between subjects, I=0.63. The
largest 5 modules at the highest level of the hierarchy were medial occipital,
lateral occipital, central, parieto-frontal and fronto-temporal systems;
occipital modules demonstrated less sub-modular organization than modules
comprising regions of multimodal association cortex. Connector nodes and hubs,
with a key role in inter-modular connectivity, were also concentrated in
association cortical areas. We conclude that methods are available for
hierarchical modular decomposition of large numbers of high resolution brain
functional networks using computationally expedient algorithms. This could
enable future investigations of Simon's original hypothesis that hierarchy or
near-decomposability of physical symbol systems is a critical design feature
for their fast adaptivity to changing environmental conditions
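As an illustration of the similarity measure mentioned in this abstract, the following is a minimal sketch of a normalized mutual information between two partitions of the same nodes. The abstract only states that the measure lies between 0 and 1; the particular normalization used here, 2·I(A;B) / (H(A) + H(B)), and all function names are assumptions made for illustration, not necessarily the authors' exact formulation.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a partition given as one label per node."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same nodes."""
    n = len(a)
    count_a = Counter(a)
    count_b = Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def normalized_mutual_information(a, b):
    """Assumed normalization: 2 * I(A;B) / (H(A) + H(B)), which lies in [0, 1]."""
    if len(a) != len(b):
        raise ValueError("both partitions must label the same set of nodes")
    total_entropy = entropy(a) + entropy(b)
    if total_entropy == 0:
        # Both partitions put every node in a single module: treat as identical.
        return 1.0
    return 2 * mutual_information(a, b) / total_entropy


# Hypothetical module assignments of six nodes in two subjects.
subject_1 = [0, 0, 0, 1, 1, 1]
subject_2 = [1, 1, 1, 0, 0, 2]
print(normalized_mutual_information(subject_1, subject_2))
```

The score depends only on how nodes are grouped, not on the label values, so relabeling the modules of one subject leaves it unchanged.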
On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search
Databases of genomic documents contain substantial amounts of structured information in addition to the texts of titles and abstracts. Unstructured information retrieval techniques fail to take advantage of the structured information available. This paper describes a technique to
improve upon traditional retrieval methods by clustering the retrieval result set into two distinct clusters using additional structural information. Our hypothesis is that the relevant documents are to be found in the tighter of the two clusters, as suggested by van Rijsbergen's cluster
hypothesis. We present an experimental evaluation of these ideas based on the relevance judgments of the 2004 TREC workshop Genomics track, using the CLUTO clustering software
package
Beyond 2D-grids: a dependence maximization view on image browsing
Ideally, one would like to perform image search using an intuitive and friendly approach. Many existing image search engines, however, present users with sets of images arranged on the screen in a single default order, typically relevance to the query. While this certainly has its advantages, arguably a more flexible and intuitive way would be to sort images into arbitrary structures such as grids, hierarchies, or spheres so that images that are visually or semantically alike are placed together. This paper focuses on designing such a navigation system for image browsers. This is a challenging task because an arbitrary layout structure makes it difficult, if not impossible, to compute cross-similarities between images and structure coordinates, the main ingredient of traditional layout approaches. For this reason, we resort to a recently developed machine learning technique: kernelized sorting. It is a general technique for matching pairs of objects from different domains without requiring cross-domain similarity measures, and hence it elegantly allows sorting images into arbitrary structures. Moreover, we extend it so that some images can be preselected, for instance to form the tip of the hierarchy, allowing users to subsequently navigate through the search results in the lower levels in an intuitive way
Reconstructing Native Language Typology from Foreign Language Usage
Linguists and psychologists have long been studying cross-linguistic
transfer, the influence of native language properties on linguistic performance
in a foreign language. In this work we provide empirical evidence for this
process in the form of a strong correlation between language similarities
derived from structural features in English as Second Language (ESL) texts and
equivalent similarities obtained from the typological features of the native
languages. We leverage this finding to recover native language typological
similarity structure directly from ESL text, and perform prediction of
typological features in an unsupervised fashion with respect to the target
languages. Our method achieves 72.2% accuracy on the typology prediction task,
a result that is highly competitive with equivalent methods that rely on
typological resources. Comment: CoNLL 201
XML Schema Clustering with Semantic and Hierarchical Similarity Measures
With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and to discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistics and the context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis
Hierarchical classification for multiple, distributed web databases
The proliferation of online information resources increases the importance of effective and efficient distributed searching. Our research aims to provide an alternative hierarchical categorization and search capability based on a Bayesian network learning algorithm. Our proposed approach, which is grounded on automatic textual analysis of subject content of online web databases, attempts to address the database selection problem by first classifying web databases into a hierarchy of topic categories. The experimental results reported demonstrate that such a classification approach not only effectively reduces the class search space, but also helps to significantly improve the accuracy of classification performance
An experiment with ontology mapping using concept similarity
This paper describes a system for automatically mapping between concepts in different ontologies. The motivation for the research stems from the Diogene project, in which the project's own ontology covering the ICT domain is mapped to external ontologies, in order that their associated content can automatically be included in the Diogene system. An approach involving measuring the similarity of concepts is introduced, in which standard Information Retrieval indexing techniques are applied to concept descriptions. A matrix representing the similarity of concepts in two ontologies is generated, and a mapping is performed based on two parameters: the domain coverage of the ontologies, and their levels of granularity. Finally, some initial experimentation is presented which suggests that our approach meets the project's unique set of requirements
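The similarity matrix described in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: TF-IDF weighting with cosine similarity stands in for the "standard Information Retrieval indexing techniques", whitespace tokenization is used, and every function name and concept description is hypothetical. The mapping step based on domain coverage and granularity is not reproduced here.

```python
from collections import Counter
from math import log, sqrt


def tfidf_vectors(descriptions):
    """Build sparse TF-IDF vectors (term -> weight) for a list of texts."""
    tokenized = [text.lower().split() for text in descriptions]
    n = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        # Smoothed IDF (an assumption) keeps every weight strictly positive.
        vectors.append({
            term: count * (log((1 + n) / (1 + doc_freq[term])) + 1)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(descs_a, descs_b):
    """Rows: concepts of ontology A; columns: concepts of ontology B."""
    vectors = tfidf_vectors(list(descs_a) + list(descs_b))
    vecs_a, vecs_b = vectors[:len(descs_a)], vectors[len(descs_a):]
    return [[cosine(u, v) for v in vecs_b] for u in vecs_a]


# Hypothetical concept descriptions from two small ontologies.
ontology_a = ["wireless network protocol", "relational database query"]
ontology_b = ["database query language", "network routing protocol"]
matrix = similarity_matrix(ontology_a, ontology_b)
for row in matrix:
    print([round(value, 3) for value in row])
```

Each row can then be scanned for its highest-scoring column to propose a candidate mapping, which a threshold or further criteria would need to confirm.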
Hierarchical mutual information for the comparison of hierarchical community structures in complex networks
The quest for a quantitative characterization of community and modular
structure of complex networks produced a variety of methods and algorithms to
classify different networks. However, it is not clear if such methods provide
consistent, robust and meaningful results when considering hierarchies as a
whole. Part of the problem is the lack of a similarity measure for the
comparison of hierarchical community structures. In this work we contribute
the hierarchical mutual information, which is
a generalization of the traditional mutual information and allows the comparison of
hierarchical partitions and hierarchical community structures. The
normalized version of the hierarchical mutual information should behave
analogously to the traditional normalized mutual information. Here, the correct
behavior of the hierarchical mutual information is corroborated on an extensive
battery of numerical experiments. The experiments are performed on artificial
hierarchies, and on the hierarchical community structure of artificial and
empirical networks. Furthermore, the experiments illustrate some of the
practical applications of the hierarchical mutual information, namely the
comparison of different community detection methods and the study of the
consistency, robustness and temporal evolution of the hierarchical modular
structure of networks. Comment: 14 pages and 12 figure