137,761 research outputs found

    Element-centric clustering comparison unifies overlaps and hierarchy

    Full text link
    Clustering is one of the most universal approaches for understanding complex data. A pivotal aspect of clustering analysis is quantitatively comparing clusterings; clustering comparison is the basis for many tasks such as clustering evaluation, consensus clustering, and tracking the temporal evolution of clusters. In particular, the extrinsic evaluation of clustering methods requires comparing the uncovered clusterings to planted clusterings or known metadata. Yet, as we demonstrate, existing clustering comparison measures have critical biases which undermine their usefulness, and no measure accommodates both overlapping and hierarchical clusterings. Here we unify the comparison of disjoint, overlapping, and hierarchically structured clusterings by proposing a new element-centric framework: elements are compared based on the relationships induced by the cluster structure, as opposed to the traditional cluster-centric philosophy. We demonstrate that, in contrast to standard clustering similarity measures, our framework does not suffer from critical biases and naturally provides unique insights into how the clusterings differ. We illustrate the strengths of our framework by revealing new insights into the organization of clusters in two applications: the improved classification of schizophrenia based on the overlapping and hierarchical community structure of fMRI brain networks, and the disentanglement of various social homophily factors in Facebook social networks. The universality of clustering suggests far-reaching impact of our framework throughout all areas of science

    Hierarchical modularity in human brain functional networks

    Get PDF
    The idea that complex systems have a hierarchical modular organization originates in the early 1960s and has recently attracted fresh support from quantitative studies of large scale, real-life networks. Here we investigate the hierarchical modular (or "modules-within-modules") decomposition of human brain functional networks, measured using functional magnetic resonance imaging (fMRI) in 18 healthy volunteers under no-task or resting conditions. We used a customized template to extract networks with more than 1800 regional nodes, and we applied a fast algorithm to identify nested modular structure at several hierarchical levels. We used mutual information, 0 < I < 1, to estimate the similarity of community structure of networks in different subjects, and to identify the individual network that is most representative of the group. Results show that human brain functional networks have a hierarchical modular organization with a fair degree of similarity between subjects, I=0.63. The largest 5 modules at the highest level of the hierarchy were medial occipital, lateral occipital, central, parieto-frontal and fronto-temporal systems; occipital modules demonstrated less sub-modular organization than modules comprising regions of multimodal association cortex. Connector nodes and hubs, with a key role in inter-modular connectivity, were also concentrated in association cortical areas. We conclude that methods are available for hierarchical modular decomposition of large numbers of high resolution brain functional networks using computationally expedient algorithms. This could enable future investigations of Simon's original hypothesis that hierarchy or near-decomposability of physical symbol systems is a critical design feature for their fast adaptivity to changing environmental conditions

    On the use of clustering and the MeSH controlled vocabulary to improve MEDLINE abstract search

    Get PDF
    Databases of genomic documents contain substantial amounts of structured information in addition to the texts of titles and abstracts. Unstructured information retrieval techniques fail to take advantage of the structured information available. This paper describes a technique to improve upon traditional retrieval methods by clustering the retrieval result set into two distinct clusters using additional structural information. Our hypothesis is that the relevant documents are to be found in the tightest cluster of the two, as suggested by van Rijsbergen's cluster hypothesis. We present an experimental evaluation of these ideas based on the relevance judgments of the 2004 TREC workshop Genomics track, and the CLUTO software clustering package

    Beyond 2D-grids: a dependence maximization view on image browsing

    Get PDF
    Ideally, one would like to perform image search using an intuitive and friendly approach. Many existing image search engines, however, present users with sets of images arranged in some default order on the screen, typically the relevance to a query, only. While this certainly has its advantages, arguably, a more flexible and intuitive way would be to sort images into arbitrary structures such as grids, hierarchies, or spheres so that images that are visually or semantically alike are placed together. This paper focuses on designing such a navigation system for image browsers. This is a challenging task because arbitrary layout structure makes it difficult -- if not impossible -- to compute cross-similarities between images and structure coordinates, the main ingredient of traditional layouting approaches. For this reason, we resort to a recently developed machine learning technique: kernelized sorting. It is a general technique for matching pairs of objects from different domains without requiring cross-domain similarity measures and hence elegantly allows sorting images into arbitrary structures. Moreover, we extend it so that some images can be preselected for instance forming the tip of the hierarchy allowing to subsequently navigate through the search results in the lower levels in an intuitive way

    Reconstructing Native Language Typology from Foreign Language Usage

    Get PDF
    Linguists and psychologists have long been studying cross-linguistic transfer, the influence of native language properties on linguistic performance in a foreign language. In this work we provide empirical evidence for this process in the form of a strong correlation between language similarities derived from structural features in English as Second Language (ESL) texts and equivalent similarities obtained from the typological features of the native languages. We leverage this finding to recover native language typological similarity structure directly from ESL text, and perform prediction of typological features in an unsupervised fashion with respect to the target languages. Our method achieves 72.2% accuracy on the typology prediction task, a result that is highly competitive with equivalent methods that rely on typological resources.Comment: CoNLL 201

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    Get PDF
    With the growing popularity of XML as the data representation language, collections of the XML data are exploded in numbers. The methods are required to manage and discover the useful information from them for the improved document handling. We present a schema clustering process by organising the heterogeneous XML schemas into various groups. The methodology considers not only the linguistic and the context of the elements but also the hierarchical structural similarity. We support our findings with experiments and analysis

    An experiment with ontology mapping using concept similarity

    Get PDF
    This paper describes a system for automatically mapping between concepts in different ontologies. The motivation for the research stems from the Diogene project, in which the project's own ontology covering the ICT domain is mapped to external ontologies, in order that their associated content can automatically be included in the Diogene system. An approach involving measuring the similarity of concepts is introduced, in which standard Information Retrieval indexing techniques are applied to concept descriptions. A matrix representing the similarity of concepts in two ontologies is generated, and a mapping is performed based on two parameters: the domain coverage of the ontologies, and their levels of granularity. Finally, some initial experimentation is presented which suggests that our approach meets the project's unique set of requirements

    Hierarchical mutual information for the comparison of hierarchical community structures in complex networks

    Get PDF
    The quest for a quantitative characterization of community and modular structure of complex networks produced a variety of methods and algorithms to classify different networks. However, it is not clear if such methods provide consistent, robust and meaningful results when considering hierarchies as a whole. Part of the problem is the lack of a similarity measure for the comparison of hierarchical community structures. In this work we give a contribution by introducing the {\it hierarchical mutual information}, which is a generalization of the traditional mutual information, and allows to compare hierarchical partitions and hierarchical community structures. The {\it normalized} version of the hierarchical mutual information should behave analogously to the traditional normalized mutual information. Here, the correct behavior of the hierarchical mutual information is corroborated on an extensive battery of numerical experiments. The experiments are performed on artificial hierarchies, and on the hierarchical community structure of artificial and empirical networks. Furthermore, the experiments illustrate some of the practical applications of the hierarchical mutual information. Namely, the comparison of different community detection methods, and the study of the consistency, robustness and temporal evolution of the hierarchical modular structure of networks.Comment: 14 pages and 12 figure
    corecore