965 research outputs found

    Batch and median neural gas

    Neural Gas (NG) is a very robust clustering algorithm for Euclidean data. Unlike simple vector quantization, it does not suffer from the problem of local minima, and unlike the self-organizing map, it imposes no topological restrictions. Based on the cost function of NG, we introduce a batch variant of NG which converges much faster and which can be interpreted as an optimization of the cost function by the Newton method. This formulation has the additional benefit that, based on the notion of the generalized median in analogy to Median SOM, a variant for non-vectorial proximity data can be introduced. We prove convergence of the batch and median versions of NG, SOM, and k-means in a unified formulation, and we investigate the behavior of the algorithms in several experiments. Comment: In Special Issue after WSOM 05 Conference, 5-8 September, 2005, Pari

    Clustering Permutations: New Techniques with Streaming Applications

    10.4230/LIPIcs.ITCS.2023.3125

    A generic framework for median graph computation based on a recursive embedding approach

    The median graph has been shown to be a good choice for obtaining a representative of a set of graphs. However, its computation is a complex problem. Recently, graph embedding into vector spaces has been proposed to obtain approximations of the median graph. The problem with such an approach is how to go from a point in the vector space back to a graph in the graph space. The main contribution of this paper is a generalization of this previous method: a generic recursive procedure that recovers the graph corresponding to a point in the vector space while introducing only the amount of approximation inherent to the use of graph matching algorithms. To evaluate the proposed method, we compare it with the set median and with the other state-of-the-art embedding-based methods for median graph computation. The experiments are carried out using four different databases (one semi-artificial and three containing real-world data). Results show that the proposed approach obtains better medians, in terms of the sum of distances to the training graphs, than the previously existing methods. © 2011 Elsevier Inc. All rights reserved. This work has been supported by the Spanish research programmes Consolider Ingenio 2010 CSD2007-00018, TIN2006-15694-C02-02 and TIN2008-04998 and the fellowship RYC-2009-05031. Peer Reviewe
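
    The set median used as a baseline above can be illustrated with a short sketch: it is the member of the set whose sum of distances to all members is smallest. The code is a hedged illustration only. The names `set_median` and `hamming` are hypothetical, and equal-length strings with Hamming distance stand in for graphs with a graph edit distance, which the abstract does not specify.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")

def set_median(items: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the member of `items` minimizing the sum of distances to all members."""
    if not items:
        raise ValueError("items must be non-empty")
    # Ties are resolved in favor of the earliest candidate.
    return min(items, key=lambda cand: sum(dist(cand, other) for other in items))

def hamming(a: str, b: str) -> int:
    """Stand-in distance: number of differing positions in equal-length strings."""
    return sum(ch_a != ch_b for ch_a, ch_b in zip(a, b))

# Strings play the role of graphs in this toy example.
samples = ["abcd", "abcf", "abef", "xbcf"]
print(set_median(samples, hamming))
```

    Because the set median is restricted to members of the input set, a generalized median, which may lie outside the set, can achieve a lower sum of distances; that is the quantity the abstract uses for comparison.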

    Exploring the Use of Rasch Models to Construct Measures of Firms’ Profitability with Multiple Discretization Ratio-type Data

    Ratio-type data plays an important role in real-world data analysis. A large number of ratios have been created for different purposes, depending on time and people’s needs. When those ratios measure the same concept from different perspectives, a comprehensive score is needed to extract information from them. Therefore, this study adopts the logic of psychometrics to systematically conduct scale development on ratio-type data under the Rasch model. However, the ratio-type data must first be discretized for use in the Rasch model. This study therefore also explores the effect of different data discretization methods on scale development, using financial profitability ratios as a demonstration. Results show that retaining more ratio categories can benefit Rasch modeling because it better informs the model. The dynamic clustering algorithm k-median is a better method for extracting characteristic patterns of the ratio-type data and preparing the data for the Rasch model. This study illustrates that there is no single best discretization method for ratio-type data under the Rasch model. It is more reasonable to use the traditional algorithm if each ratio has a target benchmark, whereas the k-median clustering algorithm achieves good modeling results when benchmark information is lacking.
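
    To make the discretization step concrete, here is a minimal sketch of one-dimensional k-median clustering that turns a continuous ratio into ordered categories. It assumes each ratio is clustered on its own and that the cluster index serves as the category. The function name `k_median_1d`, the initialization, and the sample values are illustrative and do not come from the study.

```python
from statistics import median

def k_median_1d(values, k, max_iters=100):
    """Lloyd-style k-median on a list of numbers; returns (centers, labels)."""
    distinct = sorted(set(values))
    if k < 1 or k > len(distinct):
        raise ValueError("k must be between 1 and the number of distinct values")
    # Deterministic start: centers spread evenly over the sorted distinct values.
    if k == 1:
        centers = [median(values)]
    else:
        centers = [distinct[(i * (len(distinct) - 1)) // (k - 1)] for i in range(k)]

    def assign(current):
        # Each value goes to the nearest center (lowest index wins ties).
        return [min(range(k), key=lambda c: abs(v - current[c])) for v in values]

    for _ in range(max_iters):
        labels = assign(centers)
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            # The median minimizes the sum of absolute deviations in a cluster.
            updated.append(median(members) if members else centers[c])
        if updated == centers:
            break
        centers = updated
    return centers, assign(centers)

# Hypothetical profitability ratios for eight firms, discretized into 3 categories.
ratios = [0.01, 0.02, 0.03, 0.20, 0.22, 0.25, 0.90, 0.95]
centers, categories = k_median_1d(ratios, 3)
print(centers, categories)
```

    Since the initial centers are sorted and one-dimensional clusters are contiguous, the labels here are ordered from lowest to highest ratio, which is the form an ordinal Rasch analysis would expect.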