
    Several Remarks on Dissimilarities and Ultrametrics

    We investigate the relationships between tolerance relations, equivalence relations, and ultrametrics. The set of spheres associated to an ultrametric space has a tree structure that reflects a hierarchy on the set of equivalences associated to that space. We show that every ultrametric defined on a finite space is a linear combination of binary ultrametrics and we introduce the notion of ultrametricity for dissimilarities, which has applications in many data mining problems.
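
As a rough illustration of the objects this abstract refers to, the sketch below checks the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)) on a finite dissimilarity matrix and lists the closed balls of a given radius. The function names `is_ultrametric` and `closed_balls` are hypothetical and not taken from the paper; the code only assumes the standard textbook definition of an ultrametric.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Check that the square matrix d (list of lists) is an ultrametric.

    Assumes the standard definition: zero diagonal, positive off-diagonal
    entries, symmetry, and d[i][j] <= max(d[i][k], d[k][j]) for all triples.
    """
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if i != j and d[i][j] <= tol:
                return False  # distinct points must be at positive distance
            if abs(d[i][j] - d[j][i]) > tol:
                return False  # symmetry violated
    for i, j, k in permutations(range(n), 3):
        if d[i][j] > max(d[i][k], d[k][j]) + tol:
            return False  # strong triangle inequality violated
    return True


def closed_balls(d, r):
    """Return the distinct closed balls of radius r as sorted lists of indices.

    In an ultrametric space these balls partition the index set, and the
    partitions obtained for increasing r are nested, which gives the tree.
    """
    n = len(d)
    balls = {frozenset(j for j in range(n) if d[i][j] <= r) for i in range(n)}
    return sorted(sorted(ball) for ball in balls)


ultra = [[0, 1, 3],
         [1, 0, 3],
         [3, 3, 0]]
not_ultra = [[0, 1, 3],
             [1, 0, 1],
             [3, 1, 0]]
```

Here `closed_balls(ultra, 1)` groups points 0 and 1 together and leaves point 2 alone, while radius 3 merges everything into one ball, so the two levels are nested.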

    Axiomatic Construction of Hierarchical Clustering in Asymmetric Networks

    This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods for the determination of hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity parameter, induced by the given dissimilarity structures. Our construction of hierarchical clustering methods is based on defining admissible methods to be those methods that abide by the axioms of value - nodes in a network with two nodes are clustered together at the maximum of the two dissimilarities between them - and transformation - when dissimilarities are reduced, the network may become more clustered but not less. Several admissible methods are constructed and two particular methods, termed reciprocal and nonreciprocal clustering, are shown to provide upper and lower bounds in the space of admissible methods. Alternative clustering methodologies and axioms are further considered. Allowing the outcome of hierarchical clustering to be asymmetric, so that it matches the asymmetry of the original data, leads to the inception of quasi-clustering methods. The existence of a unique quasi-clustering method is shown. Allowing clustering in a two-node network to proceed at the minimum of the two dissimilarities generates an alternative axiomatic construction. There is a unique clustering method in this case too. The paper also develops algorithms for the computation of hierarchical clusters using matrix powers on a min-max dioid algebra and studies the stability of the methods proposed. We prove that most of the methods introduced in this paper are such that similar networks yield similar hierarchical clustering results. Algorithms are exemplified through their application to networks describing internal migration within states of the United States (U.S.) and the interrelation between sectors of the U.S. economy.
    Comment: This is a largely extended version of the previous conference submission under the same title. The current version contains the material in the previous version (published in ICASSP 2013) as well as material presented at the Asilomar Conference on Signals, Systems, and Computers 2013, GlobalSIP 2013, and ICML 2014. Also, unpublished material is included in the current version
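
The abstract mentions computing hierarchical clusters with matrix powers on a min-max dioid algebra. A minimal sketch of that idea follows, under stated assumptions: the formulas used for reciprocal and nonreciprocal clustering (symmetrize by the maximum and then take the (n-1)-th min-max power, versus take the (n-1)-th min-max power of the directed matrix and then symmetrize by the maximum) are the commonly cited definitions and are not quoted from this abstract. All function names are hypothetical.

```python
def minmax_product(a, b):
    """Matrix product in the min-max dioid: (a*b)[i][j] = min_k max(a[i][k], b[k][j])."""
    n = len(a)
    return [[min(max(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def minmax_power(a, p):
    """Return the p-th min-max power of a (p >= 1) by repeated multiplication."""
    result = a
    for _ in range(p - 1):
        result = minmax_product(result, a)
    return result


def reciprocal_ultrametric(a):
    """Assumed definition: symmetrize with max, then take min-max path costs."""
    n = len(a)
    sym = [[max(a[i][j], a[j][i]) for j in range(n)] for i in range(n)]
    return minmax_power(sym, max(n - 1, 1))


def nonreciprocal_ultrametric(a):
    """Assumed definition: directed min-max path costs, then symmetrize with max."""
    n = len(a)
    fwd = minmax_power(a, max(n - 1, 1))
    return [[max(fwd[i][j], fwd[j][i]) for j in range(n)] for i in range(n)]


# Directed 3-cycle: cheap in one direction (cost 1), expensive in the other (cost 5).
cycle = [[0, 1, 5],
         [5, 0, 1],
         [1, 5, 0]]
# Two-node network used to illustrate the axiom of value.
pair = [[0, 2],
        [3, 0]]
```

On `cycle`, the reciprocal output merges all nodes at resolution 5 and the nonreciprocal output merges them at resolution 1, which is consistent with the upper and lower bounds described in the abstract. On `pair`, both outputs cluster the two nodes at 3, the maximum of the two dissimilarities.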

    Directed binary hierarchies and directed ultrametrics

    Directed binary hierarchies have been introduced in order to give a reduced graphical representation of a family of association rules. This type of structure extends, in a very specific way, the structure underlying binary hierarchical classification. In this paper an accurate formalization of this new structure is studied. A binary directed hierarchy is defined as a set of ordered pairs of subsets of the initial individual set satisfying specific conditions. A new notion of directed ultrametricity is studied. The main result consists of establishing a bijective correspondence between a directed ultrametric space and a directed binary hierarchy. Moreover, an algorithm is proposed in order to transform a directed ultrametric structure into a graphical representation associated with a directed binary hierarchy.

    Directed binary hierarchies and directed ultrametrics

    Directed binary hierarchies were introduced to provide a directed graphical representation of a family of implicative association rules. Such a structure extends, in a very specific way, the one underlying binary hierarchical classification trees. We propose here a precise formalization of this new type of structure. A directed binary hierarchy is defined as a family of (ordered) pairs of subsets of the set to be organized, satisfying specific conditions. A new notion of directed binary ultrametricity is constructed. The main result establishes a bijective correspondence between a directed binary ultrametric structure and a directed binary hierarchy. In addition, an algorithm is proposed for passing from the ultrametric structure to the graphical structure of a valued directed binary tree.

    Metric Representations Of Networks

    The goal of this thesis is to analyze networks by first projecting them onto structured metric-like spaces -- governed by a generalized triangle inequality -- and then leveraging this structure to facilitate the analysis. Networks encode relationships between pairs of nodes; however, the relationship between two nodes can be independent of the other ones and need not be defined for every pair. This is not true for metric spaces, where the triangle inequality imposes conditions that must be satisfied by triads of distances and these must be defined for every pair of nodes. In general terms, this additional structure facilitates the analysis and algorithm design in metric spaces. In deriving metric projections for networks, an axiomatic approach is pursued where we encode as axioms intuitively desirable properties and then seek admissible projections satisfying these axioms. Although small variations are introduced throughout the thesis, the axioms of projection -- a network that already has the desired metric structure must remain unchanged -- and transformation -- when reducing dissimilarities in a network the projected distances cannot increase -- shape all of the axiomatic constructions considered. Notwithstanding their apparent weakness, the aforementioned axioms serve as a solid foundation for the theory of metric representations of networks. We begin by focusing on hierarchical clustering of asymmetric networks, which can be framed as a network projection problem onto ultrametric spaces. We show that the set of admissible methods is infinite but bounded in a well-defined sense and state additional desirable properties to further winnow the admissibility landscape. Algorithms for the clustering methods developed are also derived and implemented. We then shift focus to projections onto generalized q-metric spaces, a parametric family containing among others the (regular) metric and ultrametric spaces. 
A uniqueness result is shown for the projection of symmetric networks whereas for asymmetric networks we prove that all admissible projections are contained between two extreme methods. Furthermore, projections are illustrated via their implementation for efficient search and data visualization. Lastly, our analysis is extended to encompass projections of dioid spaces, natural algebraic generalizations of weighted networks.

    The Metric Nearness Problem

    Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric—principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness
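
The link to all pairs shortest paths mentioned in this abstract can be illustrated with a small sketch. It assumes that the special case in question is the "decrease-only" variant, where distances may only be lowered to restore the triangle inequality, and it uses the classical Floyd-Warshall recursion, not the triangle fixing or primal-dual algorithms developed in the paper. The function name `decrease_only_repair` is hypothetical.

```python
def decrease_only_repair(d):
    """Lower entries of d until every triangle inequality holds.

    Runs Floyd-Warshall on a copy of the dissimilarity matrix d (list of
    lists with zero diagonal and nonnegative entries). The result is the
    shortest-path closure of d, which is the largest matrix below d that
    satisfies d[i][j] <= d[i][k] + d[k][j] for all i, j, k.
    """
    n = len(d)
    m = [row[:] for row in d]  # work on a copy, leave the input untouched
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = m[i][k] + m[k][j]
                if via_k < m[i][j]:
                    m[i][j] = via_k  # shorten the violating side
    return m


# The entry 5 violates the triangle inequality because 1 + 2 < 5.
noisy = [[0, 1, 5],
         [1, 0, 2],
         [5, 2, 0]]
repaired = decrease_only_repair(noisy)
```

In this example only the violating entry between points 0 and 2 changes, dropping from 5 to 3; the other distances are already consistent and stay as measured.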

    Benchmarking in cluster analysis: A white paper

    To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing, should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have been frequently seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, discussion is given to the theoretical conceptual underpinnings of benchmarking in the field of cluster analysis by means of simulated as well as empirical data. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made