
24 research outputs found

Several Remarks on Dissimilarities and Ultrametrics

Author: D. A. Simovici
Publication venue: 'Scientific Annals of Computer Science'
Publication date: 01/06/2015

We investigate the relationships between tolerance relations, equivalence relations, and ultrametrics. The set of spheres associated with an ultrametric space has a tree structure that reflects a hierarchy on the set of equivalences associated with that space. We show that every ultrametric defined on a finite space is a linear combination of binary ultrametrics, and we introduce the notion of ultrametricity for dissimilarities, which has applications in many data mining problems.
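The claim about binary ultrametrics can be illustrated with a small sketch. It assumes the standard definition of an ultrametric (symmetric, zero on the diagonal, and d(x, z) <= max(d(x, y), d(y, z))) and takes a binary ultrametric to be one with values in {0, 1}. The threshold-based decomposition and the names `is_ultrametric` and `binary_decomposition` are illustrative choices, not necessarily the construction used in the paper.

```python
from itertools import permutations


def is_ultrametric(d, tol=1e-12):
    """Check symmetry, zero diagonal, and the strong triangle inequality."""
    n = len(d)
    for i in range(n):
        if abs(d[i][i]) > tol:
            return False
        for j in range(n):
            if abs(d[i][j] - d[j][i]) > tol:
                return False
    for i, j, k in permutations(range(n), 3):
        if d[i][k] > max(d[i][j], d[j][k]) + tol:
            return False
    return True


def binary_decomposition(d):
    """Split an ultrametric into (coefficient, 0/1 matrix) terms.

    Assumed construction: for each distinct positive value a of d, the
    term b(x, y) = 1 if d(x, y) >= a else 0 is weighted by the gap between
    a and the previous value, so the weighted sum of the terms rebuilds d.
    """
    n = len(d)
    levels = sorted({d[i][j] for i in range(n) for j in range(n)} - {0})
    terms, previous = [], 0
    for a in levels:
        b = [[1 if d[i][j] >= a else 0 for j in range(n)] for i in range(n)]
        terms.append((a - previous, b))
        previous = a
    return terms


d = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
terms = binary_decomposition(d)
rebuilt = [[sum(c * b[i][j] for c, b in terms) for j in range(3)]
           for i in range(3)]
```

Each 0/1 term is itself an ultrametric, because a pair at distance at least a cannot be joined through a third point by two links that are both shorter than a.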

Directory of Open Access Journals

Axiomatic Construction of Hierarchical Clustering in Asymmetric Networks

Authors: Carlsson Gunnar; Mémoli Facundo; Ribeiro Alejandro; Segarra Santiago
Publication date: 01/01/2013

This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods for the determination of hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity parameter, induced by the given dissimilarity structures. Our construction of hierarchical clustering methods is based on defining admissible methods to be those that abide by the axioms of value (nodes in a two-node network are clustered together at the maximum of the two dissimilarities between them) and transformation (when dissimilarities are reduced, the network may become more clustered but not less). Several admissible methods are constructed, and two particular methods, termed reciprocal and nonreciprocal clustering, are shown to provide upper and lower bounds in the space of admissible methods. Alternative clustering methodologies and axioms are further considered. Allowing the outcome of hierarchical clustering to be asymmetric, so that it matches the asymmetry of the original data, leads to the inception of quasi-clustering methods. The existence of a unique quasi-clustering method is shown. Allowing clustering in a two-node network to proceed at the minimum of the two dissimilarities generates an alternative axiomatic construction. There is a unique clustering method in this case too. The paper also develops algorithms for the computation of hierarchical clusters using matrix powers on a min-max dioid algebra and studies the stability of the methods proposed. We prove that most of the methods introduced in this paper are such that similar networks yield similar hierarchical clustering results. Algorithms are exemplified through their application to networks describing internal migration within states of the United States (U.S.) and the interrelation between sectors of the U.S. economy.

Comment: This is a substantially extended version of the previous conference submission under the same title. The current version contains the material in the previous version (published in ICASSP 2013) as well as material presented at the Asilomar Conference on Signals, Systems, and Computers 2013, GlobalSIP 2013, and ICML 2014. Unpublished material is also included in the current version.
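The matrix-power computation mentioned in the abstract can be sketched as follows. In a min-max dioid algebra, "addition" is min and "multiplication" is max, so the (n-1)-th power of a zero-diagonal dissimilarity matrix gives the smallest achievable "largest hop" over directed chains between two nodes. The definitions of `reciprocal` and `nonreciprocal` below are an assumed reading of those two methods that the abstract itself does not spell out, and all function names are hypothetical.

```python
def minmax_product(a, b):
    """Matrix product in the min-max dioid: min replaces +, max replaces *."""
    n = len(a)
    return [[min(max(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def minmax_power(a, p):
    """p-th dioid power of a (p >= 1) by repeated multiplication."""
    result = a
    for _ in range(p - 1):
        result = minmax_product(result, a)
    return result


def reciprocal(a):
    """Assumed reading: symmetrize with max first, then take min-max chains."""
    n = len(a)
    sym = [[max(a[i][j], a[j][i]) for j in range(n)] for i in range(n)]
    return minmax_power(sym, max(n - 1, 1))


def nonreciprocal(a):
    """Assumed reading: directed min-max chains first, then symmetrize with max."""
    n = len(a)
    f = minmax_power(a, max(n - 1, 1))
    return [[max(f[i][j], f[j][i]) for j in range(n)] for i in range(n)]


# Two-node network: both sketches merge the nodes at max(1, 2) = 2,
# matching the axiom of value stated in the abstract.
two = [[0, 1],
       [2, 0]]

# Directed 3-cycle: cheap one way (cost 1), expensive the other (cost 5).
cycle = [[0, 1, 5],
         [5, 0, 1],
         [1, 5, 0]]
```

On the directed cycle, the nonreciprocal sketch merges all nodes at resolution 1, since each node reaches every other by cheap directed chains, while the reciprocal sketch waits until resolution 5. This is consistent with the abstract's statement that the two methods bound the admissible ones, though it is only one example.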

arXiv.org e-Print Archive

Adelaide Research & Scholarship

Directed binary hierarchies and directed ultrametrics

Authors: Kuntz Pascale; Lerman Israël-César
Publication venue: HAL CCSD
Publication date: 01/01/2009

Directed binary hierarchies have been introduced to give a reduced graphical representation of a family of association rules. This type of structure extends, in a very specific way, the one underlying binary hierarchical classification. In this paper, a precise formalization of this new structure is studied. A directed binary hierarchy is defined as a set of ordered pairs of subsets of the initial set of individuals satisfying specific conditions. A new notion of directed ultrametricity is studied. The main result establishes a bijective correspondence between directed ultrametric spaces and directed binary hierarchies. Moreover, an algorithm is proposed to transform a directed ultrametric structure into a graphical representation associated with a directed binary hierarchy.

Directed binary hierarchies and directed ultrametrics

Authors: Kuntz Pascale; Lerman Israël-César
Publication venue: HAL CCSD
Publication date: 01/01/2009

Directed binary hierarchies were introduced to provide a directed graphical representation of a family of implicative association rules. Such a structure extends, in a very specific way, the one underlying binary hierarchical classification trees. We propose here a precise formalization of this new type of structure. A directed binary hierarchy is defined as a family of (ordered) pairs of subsets of the set to be organized, satisfying specific conditions. A new notion of directed binary ultrametricity is constructed. The fundamental result establishes a bijective correspondence between a directed binary ultrametric structure and a directed binary hierarchy. In addition, an algorithm is proposed to pass from the ultrametric structure to the graphical structure of a directed, valued binary tree.

Metric Representations Of Networks

Author: Segarra Santiago
Publication venue: ScholarlyCommons
Publication date: 01/01/2016

The goal of this thesis is to analyze networks by first projecting them onto structured metric-like spaces, governed by a generalized triangle inequality, and then leveraging this structure to facilitate the analysis. Networks encode relationships between pairs of nodes; however, the relationship between two nodes can be independent of the other ones and need not be defined for every pair. This is not true for metric spaces, where the triangle inequality imposes conditions that must be satisfied by triads of distances, and these must be defined for every pair of nodes. In general terms, this additional structure facilitates analysis and algorithm design in metric spaces. In deriving metric projections for networks, an axiomatic approach is pursued in which we encode intuitively desirable properties as axioms and then seek admissible projections satisfying these axioms. Although small variations are introduced throughout the thesis, the axioms of projection (a network that already has the desired metric structure must remain unchanged) and transformation (when reducing dissimilarities in a network, the projected distances cannot increase) shape all of the axiomatic constructions considered. Notwithstanding their apparent weakness, these axioms serve as a solid foundation for the theory of metric representations of networks. We begin by focusing on hierarchical clustering of asymmetric networks, which can be framed as a network projection problem onto ultrametric spaces. We show that the set of admissible methods is infinite but bounded in a well-defined sense, and we state additional desirable properties to further winnow the admissibility landscape. Algorithms for the clustering methods developed are also derived and implemented. We then shift focus to projections onto generalized q-metric spaces, a parametric family containing, among others, the (regular) metric and ultrametric spaces. A uniqueness result is shown for the projection of symmetric networks, whereas for asymmetric networks we prove that all admissible projections are contained between two extreme methods. Furthermore, projections are illustrated via their implementation for efficient search and data visualization. Lastly, our analysis is extended to encompass projections of dioid spaces, natural algebraic generalizations of weighted networks.
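The abstract does not state the generalized triangle inequality explicitly. A common form, assumed here, is the q-triangle inequality d(x, z)^q <= d(x, y)^q + d(y, z)^q, which reduces to the usual triangle inequality for q = 1 and to the ultrametric inequality as q tends to infinity. The checker below is a minimal sketch under that assumption, and `satisfies_q_triangle` is a hypothetical name.

```python
import math
from itertools import permutations


def satisfies_q_triangle(d, q, tol=1e-9):
    """Check d[i][k] <= ||(d[i][j], d[j][k])||_q for all distinct triples.

    q = 1 is the ordinary triangle inequality; q = math.inf is the
    ultrametric (max) inequality. This form of the inequality is an
    assumption and is not quoted from the thesis.
    """
    n = len(d)
    for i, j, k in permutations(range(n), 3):
        if q == math.inf:
            bound = max(d[i][j], d[j][k])
        else:
            bound = (d[i][j] ** q + d[j][k] ** q) ** (1.0 / q)
        if d[i][k] > bound + tol:
            return False
    return True


# Three collinear points: a regular metric, but no stronger structure.
line = [[0, 1, 2],
        [1, 0, 1],
        [2, 1, 0]]

# An ultrametric satisfies the inequality for every q.
ultra = [[0, 1, 3],
         [1, 0, 3],
         [3, 3, 0]]
```

Under this reading, larger q imposes a stricter condition, so the spaces are nested: every ultrametric is a q-metric for all q, and every q-metric with q >= 1 is a regular metric.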

ScholarlyCommons@Penn

The Metric Nearness Problem

Authors: Brickell Justin; Dhillon Inderjit S.; Sra Suvrit; Tropp Joel A.
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 23/04/2008

Metric nearness refers to the problem of optimally restoring metric properties to distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric data can be important in various settings, for example, in clustering, classification, metric-based indexing, query processing, and graph theoretic approximation algorithms. This paper formulates and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set of distances that satisfy the properties of a metric, principally the triangle inequality. For solving this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative projection method. An intriguing aspect of the metric nearness problem is that a special case turns out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and develops a new algorithm for the latter problem using a primal-dual method. Applications to graph clustering are provided as an illustration. We include experiments that demonstrate the computational superiority of triangle fixing over general purpose convex programming software. Finally, we conclude by suggesting various useful extensions and generalizations to metric nearness.
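The link to all pairs shortest paths can be illustrated with a short sketch. It assumes the special case in question is the "decrease-only" variant, in which dissimilarities may only be lowered: replacing each entry with the shortest-path distance in the complete weighted graph then yields the largest entrywise matrix below the input that satisfies the triangle inequality. The code uses the textbook Floyd-Warshall recurrence, not the paper's triangle fixing or primal-dual algorithms, and `shortest_path_closure` is a hypothetical name.

```python
def shortest_path_closure(d):
    """Floyd-Warshall on a dissimilarity matrix with a zero diagonal.

    Returns a new matrix m with m[i][j] <= d[i][j] in which every triple
    satisfies m[i][j] <= m[i][k] + m[k][j].
    """
    n = len(d)
    m = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = m[i][k] + m[k][j]
                if via_k < m[i][j]:
                    m[i][j] = via_k
    return m


# The entry 5 violates the triangle inequality, since 5 > 1 + 1.
noisy = [[0, 1, 5],
         [1, 0, 1],
         [5, 1, 0]]
repaired = shortest_path_closure(noisy)
```

Here only the offending entry changes, dropping from 5 to 2, while the entries that were already consistent are left as they were.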

Caltech Authors

Benchmarking in cluster analysis: A white paper

Authors: Boulesteix Anne-Laure; Dangl Rainer; Dean Nema; Guyon Isabelle; Hennig Christian; Leisch Friedrich; Steinley Douglas; Van Mechelen Iven
Publication date: 01/10/2018

To achieve scientific progress in terms of building a cumulative body of knowledge, careful attention to benchmarking is of the utmost importance. This means that proposals of new methods of data pre-processing, new data-analytic techniques, and new methods of output post-processing should be extensively and carefully compared with existing alternatives, and that existing methods should be subjected to neutral comparison studies. To date, benchmarking and recommendations for benchmarking have frequently been seen in the context of supervised learning. Unfortunately, there has been a dearth of guidelines for benchmarking in an unsupervised setting, with the area of clustering as an important subdomain. To address this problem, the theoretical and conceptual underpinnings of benchmarking in the field of cluster analysis, by means of simulated as well as empirical data, are discussed. Subsequently, the practicalities of how to address benchmarking questions in clustering are dealt with, and foundational recommendations are made.

arXiv.org e-Print Archive

Proceedings - University of Groningen

ARTS repository - University of Groningen

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

Enlighten

Dissertations of the University of Groningen