Search CORE

3 research outputs found

Measure based metrics for aggregated data

Author: Rayward-Smith V. J.
Publication venue: IOS Press
Publication date: 01/01/2011

Aggregated data arises commonly from surveys and censuses, where groups of individuals are studied as coherent entities. Aggregated data can take many forms, including sets, intervals, distributions and histograms. The data analyst needs to measure the similarity between such aggregated data items, and a range of metrics for doing so has been reported in the literature (e.g. the Jaccard metric for sets and the Wasserstein metric for histograms). In this paper, a unifying theory based on measure theory is developed that not only establishes that known metrics are essentially similar but also suggests new metrics.
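
To make the two metrics named in the abstract concrete, here is a minimal Python sketch of the standard Jaccard distance between finite sets and the 1-Wasserstein distance between two histograms defined on the same equally spaced bins. It is not the paper's measure-theoretic construction; it only illustrates that both quantities can be written in terms of a measure (counting measure for sets, the difference of cumulative distributions for histograms). The function names `jaccard_distance` and `wasserstein_1d`, and the convention that each bin's mass sits at its bin centre, are assumptions made for illustration.

```python
from itertools import accumulate


def jaccard_distance(a: set, b: set) -> float:
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B|.

    Equivalently |A Δ B| / |A ∪ B|, i.e. a ratio of counting measures.
    """
    union = a | b
    if not union:
        return 0.0  # convention (assumption): two empty sets are at distance 0
    return len(a ^ b) / len(union)


def wasserstein_1d(p: list[float], q: list[float], bin_width: float = 1.0) -> float:
    """1-Wasserstein distance between two histograms on the same equally spaced bins.

    Assumption: each bin's mass sits at its bin centre. On the real line,
    W1 is the integral of |F_p - F_q| over x, where F_p and F_q are the
    cumulative distribution functions; with point masses this integral
    reduces to bin_width times the sum of |cumulative differences|.
    """
    if len(p) != len(q):
        raise ValueError("histograms must share the same bins")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("histograms must have positive total mass")
    # Normalise counts to probabilities and accumulate them into CDF values.
    cdf_p = accumulate(x / total_p for x in p)
    cdf_q = accumulate(y / total_q for y in q)
    return bin_width * sum(abs(fp - fq) for fp, fq in zip(cdf_p, cdf_q))
```

For example, `jaccard_distance({1, 2, 3}, {2, 3, 4})` gives 0.5 (two of the four elements in the union are not shared), and `wasserstein_1d([1, 0, 0], [0, 0, 1])` gives 2.0, since all of the mass has to move two bins.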

University of East Anglia digital repository

Clustering an interval data set: are the main partitions similar to a priori partition?

Authors: Bacelar-Nicolau Helena; Nicolau Fernando C.; Silva Osvaldo; Sousa Áurea
Publication venue: International Journal of Current Research in Science, Engineering & Technology (IJCRSET)
Publication date: 01/11/2015

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this paper we compare the best partitions of data units (cities), obtained with different algorithms of Ascendant Hierarchical Cluster Analysis (AHCA) applied to a well-known data set from the symbolic data analysis literature (the “city temperature interval data set”), with an a priori partition of the cities given by a panel of human observers. The AHCA was based on the weighted generalized affinity coefficient with equal weights, and on the probabilistic coefficient associated with the asymptotic standardized weighted generalized affinity coefficient by the method of Wald and Wolfowitz. These similarity coefficients were combined with three aggregation criteria: one classical, Single Linkage (SL), and two probabilistic, AV1 and AVB, both within the scope of the VL methodology. To find the partition that best fits the underlying data, the partitions were evaluated with validation measures based on the similarity matrices. In general, satisfactory overall results were obtained with our methods, the best partitions being quite close to (or even coinciding with) the a priori partition provided by the panel of human observers.
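
For readers unfamiliar with the classical aggregation criterion mentioned in the abstract, here is a minimal Python sketch of agglomerative clustering with Single Linkage on a precomputed similarity matrix: at each step the two clusters whose most similar pair of members is most similar are merged, until the requested number of clusters remains. It does not compute the weighted generalized affinity coefficient or the probabilistic AV1/AVB criteria of the VL methodology. The function name `single_linkage` and the 4×4 matrix in the usage example are hypothetical illustrations, not the authors' code or the city temperature data.

```python
def single_linkage(sim: list[list[float]], k: int):
    """Agglomerate items until k clusters remain, using Single Linkage.

    sim is a symmetric similarity matrix (larger means more similar).
    The similarity between two clusters is the largest similarity between
    any member of one and any member of the other.
    Returns the final clusters and the sequence of merges performed.
    """
    n = len(sim)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of items")
    clusters = [{i} for i in range(n)]  # start with singletons
    merges = []
    while len(clusters) > k:
        best = None  # (similarity, index_a, index_b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                s = max(sim[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or s > best[0]:
                    best = (s, a, b)
        s, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), s))
        clusters[a] = clusters[a] | clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters, merges


if __name__ == "__main__":
    # Hypothetical similarities between four items (not real data).
    S = [
        [1.0, 0.9, 0.2, 0.1],
        [0.9, 1.0, 0.3, 0.2],
        [0.2, 0.3, 1.0, 0.8],
        [0.1, 0.2, 0.8, 1.0],
    ]
    clusters, merges = single_linkage(S, 2)
    print(clusters)
    print(merges)
```

With this matrix, items 0 and 1 merge first (similarity 0.9), then items 2 and 3 (0.8), leaving the two clusters {0, 1} and {2, 3}. The naive search over all cluster pairs is fine for small data sets; library routines such as SciPy's hierarchical clustering are better suited to larger ones.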

Repositório da Universidade dos Açores

Representações euclidianas de dados: uma abordagem para variáveis heterogéneas (Euclidean representations of data: an approach for heterogeneous variables)

Author: Dória Isabel Maria Tudela Reimão Pinto de França, 1952-
Publication date: 01/01/2008

Doctoral thesis, Medicine (Biomathematics), Universidade de Lisboa, Faculdade de Medicina, 2009. Available in the document.

Universidade de Lisboa: Repositório.UL