
4 research outputs found

Fast Algorithm and Implementation of Dissimilarity Self-Organizing Maps

Author: Aïcha El Golli
Brieuc Conan-Guez
Fabrice Rossi
Publication venue: Elsevier BV
Publication date: 01/01/2006

In many real-world applications, data cannot be accurately represented by vectors. In those situations, one possible solution is to rely on dissimilarity measures that enable sensible comparison between observations. Kohonen's Self-Organizing Map (SOM) has been adapted to data described only through their dissimilarity matrix. This algorithm provides both nonlinear projection and clustering of non-vector data. Unfortunately, the algorithm suffers from a high computational cost that makes it quite difficult to use with large data sets. In this paper, we propose a new algorithm that substantially reduces the theoretical cost of the dissimilarity SOM without changing its outcome (the results are exactly the same as those obtained with the original algorithm). Moreover, we introduce implementation methods that result in very short running times. Improvements deduced from the theoretical cost model are validated on simulated and real-world data (a word list clustering problem). We also demonstrate that the proposed implementation methods reduce the running time of the fast algorithm by a factor of up to 3 compared with a standard implementation.

arXiv.org e-Print Archive

CiteSeerX

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot
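
The abstract above claims a lower theoretical cost with exactly the same results. A minimal Python sketch of how such a reduction can arise is given here. It assumes the usual batch formulation of the dissimilarity SOM (prototypes are observations; each iteration assigns observations to the closest prototype, then picks for each unit the observation minimising a neighbourhood-weighted sum of dissimilarities). The abstract does not spell out the algorithm, so the function names, the toy data, and the partial-sum regrouping are illustrative assumptions, not the paper's exact method.

```python
def assign(D, protos):
    """Assignment step: each observation goes to the unit with the closest prototype."""
    return [min(range(len(protos)), key=lambda j: D[i][protos[j]])
            for i in range(len(D))]

def update_naive(D, clusters, h):
    """Representation step, direct form: O(N^2 * M) dissimilarity look-ups."""
    n, m = len(D), len(h)
    protos = []
    for j in range(m):
        costs = [sum(h[clusters[i]][j] * D[i][k] for i in range(n))
                 for k in range(n)]
        protos.append(min(range(n), key=costs.__getitem__))
    return protos

def update_partial_sums(D, clusters, h):
    """Same step regrouped by cluster: O(N^2 + N * M^2)."""
    n, m = len(D), len(h)
    # S[l][k] = sum of D[i][k] over observations i currently assigned to unit l
    S = [[0] * n for _ in range(m)]
    for i in range(n):
        row = S[clusters[i]]
        for k in range(n):
            row[k] += D[i][k]
    protos = []
    for j in range(m):
        costs = [sum(h[l][j] * S[l][k] for l in range(m)) for k in range(n)]
        protos.append(min(range(n), key=costs.__getitem__))
    return protos

# Toy data (assumption): 9 observations known only through integer dissimilarities.
points = [0, 1, 2, 10, 11, 12, 20, 21, 22]
D = [[abs(a - b) for b in points] for a in points]
# Hypothetical 1-D map with 3 units; an integer kernel keeps all sums exact.
h = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]
protos = [0, 3, 6]  # initial prototypes, given as indices of observations

clusters = assign(D, protos)
same = update_naive(D, clusters, h) == update_partial_sums(D, clusters, h)
print(same)  # prints True
```

Both update functions minimise the same quantity; the second only reorders the summation, which is why the selected prototypes are identical. Integer dissimilarities and weights are used so that the comparison is not affected by floating-point rounding.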

Accélération des cartes auto-organisatrices sur tableau de dissimilarités par séparation et évaluation [Speeding up self-organizing maps on dissimilarity tables by branch and bound]

Author: Conan-Guez Brieuc
Rossi Fabrice
Publication venue: Editions RNTI
Publication date: 01/01/2008

In this paper, a new implementation of the adaptation of Kohonen self-organising maps (SOM) to dissimilarity matrices is proposed. This implementation relies on the branch and bound principle to reduce the algorithm's running time. An important property of this new approach is that the resulting algorithm produces exactly the same results as the standard algorithm.

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Finding usage patterns from generalized weblog data

Author: Hasan Tahira
Publication date: 01/01/2009

Buried in the enormous, heterogeneous and distributed information contained in web server access logs is knowledge with great potential value. As websites continue to grow in number and complexity, web usage mining systems face two significant challenges: scalability and accuracy. This thesis develops a web data generalization technique and incorporates it into the web usage mining framework in an attempt to exploit this information-rich source of data for effective and efficient pattern discovery. Given a concept hierarchy on the web pages, generalization replaces actual page-clicks with their general concepts. Existing methods do this by taking a level-based cut through the concept hierarchy. This adversely affects the quality of mined patterns since, depending on the depth of the chosen level, either significant pages of user interest get coalesced, or many insignificant concepts are retained. We present a usage-driven concept ascension algorithm, which preserves only significant items, possibly at different levels in the hierarchy. Concept usage is estimated using a small stratified sample of the large weblog data. A usage threshold is then used to define the nodes to be pruned in the hierarchy for generalization. Our experiments on large real weblog data demonstrate improved performance in terms of quality and computation time of the pattern discovery process. Our algorithm yields an effective and scalable tool for web usage mining.

Concordia University Research Repository
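
The usage-driven generalization described in the abstract above can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: a concept's usage is taken to be the clicks on it plus those on all its descendants, a pruned page is replaced by its nearest ancestor that meets the threshold, and usage is counted on the full click list rather than on a stratified sample. The hierarchy, page names, and function names are hypothetical.

```python
from collections import Counter

def concept_usage(parent, clicks):
    """Usage of a node = clicks on it plus clicks on all of its descendants."""
    usage = Counter()
    for page, count in Counter(clicks).items():
        node = page
        while node is not None:
            usage[node] += count
            node = parent.get(node)  # None once the root has been counted
    return usage

def generalize(parent, clicks, threshold):
    """Replace each click by its nearest ancestor-or-self with enough usage."""
    usage = concept_usage(parent, clicks)

    def ascend(node):
        # Climb while the node is insignificant and a parent exists.
        while usage[node] < threshold and parent.get(node) is not None:
            node = parent[node]
        return node

    return [ascend(page) for page in clicks]

# Hypothetical concept hierarchy, given as a child -> parent map.
parent = {
    "site": None,
    "sports": "site", "news": "site",
    "football": "sports", "tennis": "sports",
    "politics": "news",
}
clicks = ["football"] * 5 + ["tennis", "politics"]

generalized = generalize(parent, clicks, threshold=3)
print(Counter(generalized))
```

With this toy log, the frequently visited page stays as it is, while the two rarely visited pages are lifted to ancestors at different depths, which is the behaviour a single level-based cut cannot produce.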