Methods of Hierarchical Clustering

Authors: Contreras Pedro; Murtagh Fionn
Publication date: 01/01/2011

We survey agglomerative hierarchical clustering algorithms and discuss efficient implementations that are available in R and other software environments. We look at hierarchical self-organizing maps and mixture models. We review grid-based clustering, focusing on hierarchical density-based approaches. Finally, we describe a recently developed, very efficient (linear-time) hierarchical clustering algorithm, which can also be viewed as a hierarchical grid-based algorithm. Comment: 21 pages, 2 figures, 1 table, 69 references
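As a rough illustration of the agglomerative idea surveyed in this abstract, the sketch below repeatedly merges the two closest clusters under single linkage. It is a naive cubic-time version written for readability, not one of the efficient implementations or the linear-time algorithm the survey describes; the function name `single_linkage` and the toy points are assumptions made for this example.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(points):
    """Naive agglomerative clustering with single linkage.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-link distance.
        best = None
        for a, b in combinations(clusters, 2):
            d = min(dist(points[i], points[j]) for i in a for j in b)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        # Replace the two clusters by their union and record the merge.
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((a, b, d))
    return merges


# Hypothetical toy data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
merges = single_linkage(points)
for a, b, d in merges:
    print(a, b, d)
```

The merge history is the information a dendrogram encodes: each entry records which clusters were joined and at what distance.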

arXiv.org e-Print Archive

Royal Holloway Research Online

Royal Holloway - Pure

Discriminative Link Prediction using Local Links, Node Features and Community Structure

Authors: Chakrabarti Soumen; De Abir; Ganguly Niloy
Publication date: 17/10/2013

A link prediction (LP) algorithm is given a graph, and has to rank, for each node, other nodes that are candidates for new linkage. LP is strongly motivated by social search and recommendation applications. LP techniques often focus on global properties (graph conductance, hitting or commute times, Katz score) or local properties (Adamic-Adar and many variations, or node feature vectors), but rarely combine these signals. Furthermore, neither of these extremes exploits link densities at the intermediate level of communities. In this paper we describe a discriminative LP algorithm that exploits two new signals. First, a co-clustering algorithm provides community-level link density estimates, which are used to qualify observed links with a surprise value. Second, links in the immediate neighborhood of the link to be predicted are not interpreted at face value, but through a local model of node feature similarities. These signals are combined into a discriminative link predictor. We evaluate the new predictor using five diverse data sets that are standard in the literature. We report significant accuracy boosts compared to standard LP methods (including Adamic-Adar and random walk). Apart from the new predictor, another contribution is a rigorous protocol for benchmarking and reporting LP algorithms, which reveals the regions of strengths and weaknesses of all the predictors studied here, and establishes the new proposal as the most robust. Comment: 10 pages, 5 figures
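To make the ranking task concrete, the sketch below scores candidate links with the standard Adamic-Adar index, one of the local baselines the abstract names. It is not the paper's discriminative predictor; the helper names `adamic_adar` and `rank_candidates` and the toy graph are assumptions made for this example.

```python
from math import log


def adamic_adar(adj, u, v):
    """Adamic-Adar score: sum of 1 / log(degree) over common neighbours.

    A common neighbour is linked to both u and v, so its degree is at
    least 2 and the logarithm is strictly positive.
    """
    common = adj[u] & adj[v]
    return sum(1.0 / log(len(adj[z])) for z in common)


def rank_candidates(adj, u):
    """Rank nodes not yet linked to u by descending Adamic-Adar score."""
    candidates = [v for v in adj if v != u and v not in adj[u]]
    # Ties are broken by node name so the ordering is deterministic.
    return sorted(candidates, key=lambda v: (-adamic_adar(adj, u, v), v))


# Hypothetical toy undirected graph.
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d"), ("c", "e")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

ranking = rank_candidates(adj, "a")
print(ranking)
```

Here node "d" shares two neighbours with "a" while "e" shares only one, so "d" is ranked first.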

arXiv.org e-Print Archive

Crossref

Local Variation as a Statistical Hypothesis Test

Authors: Baltaxe Michael; Lindenbaum Michael; Meer Peter
Publication venue: Springer Science and Business Media LLC
Publication date: 24/04/2015

The goal of image oversegmentation is to divide an image into several pieces, each of which should ideally be part of an object. One of the simplest and yet most effective oversegmentation algorithms is known as local variation (LV) (Felzenszwalb and Huttenlocher 2004). In this work, we study this algorithm and show that algorithms similar to LV can be devised by applying different statistical models and decisions, thus providing further theoretical justification and a well-founded explanation for the unexpectedly high performance of the LV approach. Some of these algorithms are based on statistics of natural images and on a hypothesis testing decision; we denote these algorithms probabilistic local variation (pLV). The best pLV algorithm, which relies on censored estimation, presents state-of-the-art results while keeping the same computational complexity as the LV algorithm.
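For orientation, the sketch below shows the merge rule commonly associated with the LV algorithm of Felzenszwalb and Huttenlocher: edges are visited in order of increasing weight, and two components are merged when the connecting edge is no heavier than either component's internal variation plus a size-dependent slack k / |C|. It runs on a hypothetical 1-D signal rather than an image, and it does not implement the pLV variants proposed in the paper; the function name `segment` and the value of `k` are assumptions made for this example.

```python
def segment(num_nodes, edges, k):
    """Graph-based segmentation with the local variation merge rule.

    edges: list of (weight, u, v) tuples.
    k: scale parameter; larger values favour larger components.
    Returns one component label (a representative node) per node.
    """
    parent = list(range(num_nodes))
    size = [1] * num_nodes
    internal = [0.0] * num_nodes  # heaviest edge merged into each component

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru == rv:
            continue
        threshold = min(internal[ru] + k / size[ru],
                        internal[rv] + k / size[rv])
        if w <= threshold:
            parent[rv] = ru
            size[ru] += size[rv]
            # Edges arrive in ascending order, so w is the heaviest so far.
            internal[ru] = w
    return [find(i) for i in range(num_nodes)]


# Hypothetical 1-D "image": two flat regions separated by a sharp step.
signal = [10, 11, 10, 50, 51, 52]
edges = [(abs(signal[i] - signal[i + 1]), i, i + 1)
         for i in range(len(signal) - 1)]
labels = segment(len(signal), edges, k=5.0)
print(labels)
```

The large jump between the third and fourth samples exceeds the merge threshold, so the signal is split into two segments.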

arXiv.org e-Print Archive

A Robust Clustering Method Using Compositional Data Restrictions: Studying Wood Properties in the Reforestation of Portugal

Authors: Chiroque-Solano Pamela M.; Moreira Guido A.
Publication venue: DigitalCommons@USU
Publication date: 19/05/2022

Classification of multivariate observations while preserving the data’s natural restriction is a challenge. Properties such as identifiability and interpretability, among others, must be taken into account when building a new approach. To avoid these complications, many transformation algorithms have been developed so that traditional models can be used. In this context, the aim of this work is to propose a robust probabilistic distance algorithm to classify compositional data. Based on the probabilistic distance (PD) clustering approach, the proposal identifies clusters by minimizing a joint distance function (JDF), which is part of a dissimilarity measure. This measure combines the PD clustering approach with the density of the Dirichlet distribution. This procedure allows us to create clusters and to determine the number of clusters while accommodating the data’s natural compositional restriction. This work was motivated by forestry in the context of restoration. The compositional dataset of Pinus nigra populations was analyzed via the proposed robust probabilistic distance clustering algorithm. The proposed method allows us to classify the new physical, chemical, and mechanical properties of P. nigra into clusters. The main results identify compositional clusters that support the recognition of wider areas. In addition, the results can inform decisions to extend sustainable forest management.

DigitalCommons@USU

A Short Survey on Data Clustering Algorithms

Author: Wong Ka-Chun
Publication date: 25/11/2015

With rapidly increasing data, clustering algorithms are important tools for data analytics in modern research. They have been successfully applied to a wide range of domains, for instance bioinformatics, speech recognition, and financial analysis. Formally speaking, given a set of data instances, a clustering algorithm is expected to divide the set into subsets that maximize the intra-subset similarity and the inter-subset dissimilarity, where a similarity measure is defined beforehand. In this work, state-of-the-art clustering algorithms are reviewed from design concept to methodology, and different clustering paradigms are discussed. Advanced clustering algorithms are also discussed. After that, the existing clustering evaluation metrics are reviewed. A summary with future insights is provided at the end.

arXiv.org e-Print Archive

Crossref