Several Remarks on Dissimilarities and Ultrametrics
We investigate the relationships between tolerance relations, equivalence relations, and ultrametrics. The set of spheres associated with an ultrametric space has a tree structure that reflects a hierarchy on the set of equivalences associated with that space. We show that every ultrametric defined on a finite space is a linear combination of binary ultrametrics, and we introduce the notion of ultrametricity for dissimilarities, which has applications in many data mining problems.
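The link between ultrametrics and nested equivalences can be illustrated with a small sketch. It assumes the standard definition of an ultrametric (symmetry, zero only on the diagonal, and the strong triangle inequality d(x,z) <= max(d(x,y), d(y,z))); the function names `is_ultrametric` and `classes_at` are hypothetical and not taken from the paper.

```python
from itertools import product


def is_ultrametric(d):
    """Check the (assumed standard) ultrametric axioms on a square matrix d."""
    n = len(d)
    for i, j in product(range(n), repeat=2):
        if d[i][j] != d[j][i]:
            return False  # not symmetric
        if (i == j) != (d[i][j] == 0):
            return False  # zero must occur exactly on the diagonal
    # Strong triangle inequality: d(x, z) <= max(d(x, y), d(y, z)).
    for i, j, k in product(range(n), repeat=3):
        if d[i][k] > max(d[i][j], d[j][k]):
            return False
    return True


def classes_at(d, r):
    """Blocks of the relation 'd(x, y) <= r'; a partition when d is an ultrametric."""
    seen, blocks = set(), []
    for i in range(len(d)):
        if i in seen:
            continue
        block = [j for j in range(len(d)) if d[i][j] <= r]
        seen.update(block)
        blocks.append(block)
    return blocks


d = [[0, 1, 3],
     [1, 0, 3],
     [3, 3, 0]]
print(is_ultrametric(d))                      # True
print([classes_at(d, r) for r in (0, 1, 3)])  # nested partitions
```

Raising the threshold r only merges blocks, so the partitions obtained at increasing thresholds are nested; this is the tree-like structure the abstract refers to.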
Axiomatic Construction of Hierarchical Clustering in Asymmetric Networks
This paper considers networks where relationships between nodes are
represented by directed dissimilarities. The goal is to study methods for the
determination of hierarchical clusters, i.e., a family of nested partitions
indexed by a connectivity parameter, induced by the given dissimilarity
structures. Our construction of hierarchical clustering methods is based on
defining admissible methods to be those methods that abide by the axioms of
value - nodes in a network with two nodes are clustered together at the maximum
of the two dissimilarities between them - and transformation - when
dissimilarities are reduced, the network may become more clustered but not
less. Several admissible methods are constructed and two particular methods,
termed reciprocal and nonreciprocal clustering, are shown to provide upper and
lower bounds in the space of admissible methods. Alternative clustering
methodologies and axioms are further considered. Allowing the outcome of
hierarchical clustering to be asymmetric, so that it matches the asymmetry of
the original data, leads to the inception of quasi-clustering methods. The
existence of a unique quasi-clustering method is shown. Allowing clustering in
a two-node network to proceed at the minimum of the two dissimilarities
generates an alternative axiomatic construction. There is a unique clustering
method in this case too. The paper also develops algorithms for the computation
of hierarchical clusters using matrix powers on a min-max dioid algebra and
studies the stability of the methods proposed. We proved that most of the
methods introduced in this paper are such that similar networks yield similar
hierarchical clustering results. Algorithms are exemplified through their
application to networks describing internal migration within states of the
United States (U.S.) and the interrelation between sectors of the U.S. economy.
Comment: This is a largely extended version of the previous conference
submission under the same title. The current version contains the material in
the previous version (published in ICASSP 2013) as well as material presented
at the Asilomar Conference on Signal, Systems, and Computers 2013, GlobalSIP
2013, and ICML 2014. Also, unpublished material is included in the current
version.
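The matrix-power computation mentioned in the abstract can be sketched as follows. In the min-max dioid, "addition" is min and "multiplication" is max, so the (n-1)-th power of a dissimilarity matrix with a zero diagonal gives the minimum over chains of the largest link in the chain. The abstract does not define the two methods, so the formulas here are assumptions: reciprocal clustering is taken to use chains whose links are small in both directions, and nonreciprocal clustering is taken to allow a different chain in each direction. All function names are hypothetical.

```python
def minmax_product(a, b):
    """Matrix product in the min-max dioid: (a*b)[i][j] = min_k max(a[i][k], b[k][j])."""
    n = len(a)
    return [[min(max(a[i][k], b[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def minmax_power(a, p):
    """p-th dioid power of a (p >= 1); a is assumed to have a zero diagonal."""
    result = a
    for _ in range(p - 1):
        result = minmax_product(result, a)
    return result


def reciprocal(a):
    """Assumed form: symmetrize with max first, then take the (n-1)-th dioid power."""
    n = len(a)
    sym = [[max(a[i][j], a[j][i]) for j in range(n)] for i in range(n)]
    return minmax_power(sym, max(n - 1, 1))


def nonreciprocal(a):
    """Assumed form: take directed chain costs first, then symmetrize with max."""
    n = len(a)
    fwd = minmax_power(a, max(n - 1, 1))
    return [[max(fwd[i][j], fwd[j][i]) for j in range(n)] for i in range(n)]


# Directed 3-cycle: cheap in one direction (1), expensive in the other (5).
a = [[0, 1, 5],
     [5, 0, 1],
     [1, 5, 0]]
print(reciprocal(a))     # off-diagonal entries are 5
print(nonreciprocal(a))  # off-diagonal entries are 1
```

On a two-node network both functions return the maximum of the two dissimilarities, in line with the axiom of value, and on the cycle above the nonreciprocal output is entrywise no larger than the reciprocal one.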
Directed binary hierarchies and directed ultrametrics
Directed binary hierarchies have been introduced in order to give a reduced graphical representation of a family of association rules. This type of structure extends, in a very specific way, the structure underlying binary hierarchical classification. In this paper an accurate formalization of this new structure is studied. A binary directed hierarchy is defined as a set of ordered pairs of subsets of the initial individual set satisfying specific conditions. A new notion of directed ultrametricity is studied. The main result consists of establishing a bijective correspondence between a directed ultrametric space and a directed binary hierarchy. Moreover, an algorithm is proposed in order to transform a directed ultrametric structure into a graphical representation associated with a directed binary hierarchy.
Directed binary hierarchies and directed ultrametrics
Directed binary hierarchies were introduced to provide a directed graphical representation of a family of implicative association rules. Such a structure extends, in a very specific way, the one underlying binary hierarchical classification trees. We propose here a precise formalization of this new type of structure. A directed binary hierarchy is defined as a family of (ordered) pairs of subsets of the set to be organized that satisfy specific conditions. A new notion of directed binary ultrametricity is constructed. The fundamental result establishes a bijective correspondence between a directed binary ultrametric structure and a directed binary hierarchy. In addition, an algorithm is proposed for passing from the ultrametric structure to the graphical structure of a valued directed binary tree.
Metric Representations Of Networks
The goal of this thesis is to analyze networks by first projecting them onto structured metric-like spaces -- governed by a generalized triangle inequality -- and then leveraging this structure to facilitate the analysis. Networks encode relationships between pairs of nodes; however, the relationship between two nodes can be independent of the other ones and need not be defined for every pair. This is not true for metric spaces, where the triangle inequality imposes conditions that must be satisfied by triads of distances and these must be defined for every pair of nodes. In general terms, this additional structure facilitates the analysis and algorithm design in metric spaces. In deriving metric projections for networks, an axiomatic approach is pursued where we encode as axioms intuitively desirable properties and then seek admissible projections satisfying these axioms. Although small variations are introduced throughout the thesis, the axioms of projection -- a network that already has the desired metric structure must remain unchanged -- and transformation -- when reducing dissimilarities in a network the projected distances cannot increase -- shape all of the axiomatic constructions considered. Notwithstanding their apparent weakness, the aforementioned axioms serve as a solid foundation for the theory of metric representations of networks.
We begin by focusing on hierarchical clustering of asymmetric networks, which can be framed as a network projection problem onto ultrametric spaces. We show that the set of admissible methods is infinite but bounded in a well-defined sense and state additional desirable properties to further winnow the admissibility landscape. Algorithms for the clustering methods developed are also derived and implemented. We then shift focus to projections onto generalized q-metric spaces, a parametric family containing among others the (regular) metric and ultrametric spaces. A uniqueness result is shown for the projection of symmetric networks whereas for asymmetric networks we prove that all admissible projections are contained between two extreme methods. Furthermore, projections are illustrated via their implementation for efficient search and data visualization. Lastly, our analysis is extended to encompass projections of dioid spaces, natural algebraic generalizations of weighted networks.
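A q-metric projection of a symmetric network can be sketched under two assumptions that the abstract does not state explicitly: that the generalized triangle inequality reads d(x,z)^q <= d(x,y)^q + d(y,z)^q (q = 1 gives regular metrics, q = infinity gives ultrametrics), and that the projection in question assigns to each pair the smallest q-norm of the edge weights along any connecting path. The function name `q_metric_projection` is hypothetical.

```python
import math


def q_metric_projection(w, q):
    """Smallest q-norm path cost between every pair (Floyd-Warshall style).

    w is a symmetric weight matrix with zero diagonal; q >= 1 or math.inf.
    This is an illustrative sketch, not the thesis's own algorithm.
    """
    n = len(w)
    if math.isinf(q):
        cost = [row[:] for row in w]          # path cost = largest edge
        combine = max
    else:
        cost = [[x ** q for x in row] for row in w]  # path cost = sum of w^q
        combine = lambda a, b: a + b
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = combine(cost[i][k], cost[k][j])
                if via_k < cost[i][j]:
                    cost[i][j] = via_k
    if math.isinf(q):
        return cost
    return [[x ** (1.0 / q) for x in row] for row in cost]


w = [[0, 3, 10],
     [3, 0, 4],
     [10, 4, 0]]
for q in (1, 2, math.inf):
    print(q, q_metric_projection(w, q)[0][2])
```

In this example the direct edge of weight 10 between nodes 0 and 2 is replaced by the cost of the two-edge path through node 1, which shrinks as q grows: the sum of the edges for q = 1, the Euclidean norm for q = 2, and the largest edge for q = infinity.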
The Metric Nearness Problem
Metric nearness refers to the problem of optimally restoring metric properties to
distance measurements that happen to be nonmetric due to measurement errors or otherwise. Metric
data can be important in various settings, for example, in clustering, classification, metric-based
indexing, query processing, and graph theoretic approximation algorithms. This paper formulates
and solves the metric nearness problem: Given a set of pairwise dissimilarities, find a “nearest” set
of distances that satisfy the properties of a metric—principally the triangle inequality. For solving
this problem, the paper develops efficient triangle fixing algorithms that are based on an iterative
projection method. An intriguing aspect of the metric nearness problem is that a special case turns
out to be equivalent to the all pairs shortest paths problem. The paper exploits this equivalence and
develops a new algorithm for the latter problem using a primal-dual method. Applications to graph
clustering are provided as an illustration. We include experiments that demonstrate the computational
superiority of triangle fixing over general purpose convex programming software. Finally, we
conclude by suggesting various useful extensions and generalizations to metric nearness.
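The connection to all pairs shortest paths can be illustrated with a short sketch. The abstract does not name the special case, so this example assumes it is the variant in which distances may only be decreased: replacing each dissimilarity by the shortest-path length between its endpoints then yields entries that satisfy the triangle inequality and never exceed the originals. The function name is hypothetical, and the plain Floyd-Warshall recursion shown here is not the paper's triangle fixing or primal-dual algorithm.

```python
def decrease_only_metric_nearness(d):
    """Shortest-path closure of a symmetric dissimilarity matrix (Floyd-Warshall).

    Every output entry is <= the corresponding input entry, and the output
    satisfies the triangle inequality. Assumes nonnegative entries, zero diagonal.
    """
    n = len(d)
    m = [row[:] for row in d]  # work on a copy
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if m[i][k] + m[k][j] < m[i][j]:
                    m[i][j] = m[i][k] + m[k][j]
    return m


d = [[0, 1, 5],
     [1, 0, 2],
     [5, 2, 0]]
print(decrease_only_metric_nearness(d))  # the entry 5 violates 5 <= 1 + 2 and drops to 3
```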
Benchmarking in cluster analysis: A white paper
To achieve scientific progress in terms of building a cumulative body of
knowledge, careful attention to benchmarking is of the utmost importance. This
means that proposals of new methods of data pre-processing, new data-analytic
techniques, and new methods of output post-processing, should be extensively
and carefully compared with existing alternatives, and that existing methods
should be subjected to neutral comparison studies. To date, benchmarking and
recommendations for benchmarking have been frequently seen in the context of
supervised learning. Unfortunately, there has been a dearth of guidelines for
benchmarking in an unsupervised setting, with the area of clustering as an
important subdomain. To address this problem, discussion is given to the
theoretical conceptual underpinnings of benchmarking in the field of cluster
analysis by means of simulated as well as empirical data. Subsequently, the
practicalities of how to address benchmarking questions in clustering are dealt
with, and foundational recommendations are made.