Search CORE

Combination of Cluster Method for Segmentation of Web Visitors

Author: Eddy P I Ketut, Santosa Budi, Suprapto Yoyon K., Yuhefizar Yuhefizar
Publication venue: Universitas Ahmad Dahlan
Publication date: 01/03/2013

Clustering is an important part of web usage mining for the purpose of segmenting visitors, and this segmentation is essential for web personalization or web modification. In this paper, we cluster web visitors by applying a combination of hierarchical and non-hierarchical clustering methods to web log data. The hierarchical clustering method is used to determine the number of clusters, and the non-hierarchical clustering method is used to form the clusters. The cluster analysis is preceded by data pre-processing and factor analysis. With this approach, web owners can find the access patterns of web visitors more effectively and gain new knowledge about visitor segmentation. Applying the method to ITS's web log data yields 6 clusters of web visitors, of which cluster 3 has the largest number of members. This information can help web management focus on the behavioral patterns of the members of cluster 3, either to personalize or to modify the website. The test results show that the method is feasible and efficient to apply.
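The two-stage idea in this abstract (a hierarchical pass to choose the number of clusters, then a non-hierarchical pass to form them) can be sketched in plain Python. This is a minimal illustration under our own assumptions, not the authors' implementation: centroid linkage, a largest-gap rule for cutting the tree, and k-means as the non-hierarchical method are all our choices, and the pre-processing and factor-analysis stages are left out. The toy points stand in for hypothetical factor scores of visitors.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def centroid(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean of a non-empty group of points."""
    n = len(points)
    return tuple(sum(p[d] for p in points) / n for d in range(len(points[0])))


def hierarchical_k(points: List[Point]) -> Tuple[int, List[List[Point]]]:
    """Stage 1: centroid-linkage agglomeration, used only to choose k.

    The tree is cut just before the largest jump in merge distance (an
    illustrative rule, assumed here). Needs at least 3 points.
    """
    clusters: List[List[Point]] = [[p] for p in points]
    snapshots = [[list(c) for c in clusters]]  # partition after 0, 1, 2, ... merges
    heights: List[float] = []
    while len(clusters) > 1:
        # Closest pair of clusters by centroid distance.
        h, i, j = min(
            (math.dist(centroid(clusters[a]), centroid(clusters[b])), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
        heights.append(h)
        snapshots.append([list(c) for c in clusters])
    gaps = [heights[m + 1] - heights[m] for m in range(len(heights) - 1)]
    m = max(range(len(gaps)), key=gaps.__getitem__)
    level = m + 1  # keep the partition after merges 0..m; merge m + 1 is the big jump
    return len(points) - level, snapshots[level]


def kmeans(points: List[Point], centers: List[Point],
           iters: int = 100) -> Tuple[List[int], List[Point]]:
    """Stage 2: Lloyd's k-means, seeded with the stage-1 cluster centroids."""
    def nearest(p: Point) -> int:
        return min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))

    for _ in range(iters):
        groups: List[List[Point]] = [[] for _ in centers]
        for p in points:
            groups[nearest(p)].append(p)
        new_centers = [centroid(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break  # assignments are stable
        centers = new_centers
    return [nearest(p) for p in points], centers


# Hypothetical 2-D visitor features (e.g. two factor scores per visitor).
visits = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 0.0), (9.1, 0.2), (8.9, 0.1)]
k, groups = hierarchical_k(visits)
labels, centers = kmeans(visits, [centroid(g) for g in groups])
```

Seeding k-means with the centroids of the hierarchical cut keeps the two stages consistent: the hierarchy fixes k, and k-means then refines the memberships.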

Journal of Education and Learning (EduLearn)

TELKOMNIKA (Telecommunication Computing Electronics and Control)

UAD Journal Management System

Incremental and hierarchical classification of a personal image collection on mobile devices

Author: Pigeau Antoine
Publication venue: Springer Verlag
Publication date: 01/01/2010

Browsing multimedia collections on mobile devices raises the need for new multimedia indexing solutions. In this paper, we focus on the management of personal image collections and propose a method to simplify browsing such a collection. The contributions are an incremental hierarchical algorithm, a method that provides a textual representation of the groups obtained, and an algorithm that builds a geo-temporal view of the collection. The proposed incremental hierarchical algorithm builds a temporal tree from the timestamp of each image. We opt here for a combination of a supervised clustering and an incremental algorithm based on a mixture model. Good properties of the hierarchy are determined automatically using the Integrated Completed Likelihood (ICL) criterion. Based on the events obtained, a textual representation is proposed and then used to improve our temporal classification by combining geographical and temporal information. Results are validated on several real user collections with our prototype, MyOwnLife.
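The abstract does not give enough detail to reproduce the mixture-model/ICL algorithm, so the sketch below swaps in a much simpler rule to show only how nested, time-based groups form a temporal tree that can be updated as photos arrive. The class name, the fixed gap thresholds (one hour, one day) and the toy timestamps are all hypothetical; in the paper, the hierarchy is selected automatically with the ICL rather than by hand-set thresholds.

```python
import bisect
from typing import Dict, List


class TemporalTree:
    """Hypothetical gap-threshold hierarchy over photo timestamps (in seconds).

    At each level, consecutive photos whose time gap is at most the threshold
    share an event. Larger thresholds give coarser levels, and every event at a
    finer level lies inside exactly one event at a coarser level, so the levels
    form a tree.
    """

    def __init__(self, thresholds: List[float]) -> None:
        self.thresholds = sorted(thresholds)  # finest level first
        self.stamps: List[float] = []

    def add(self, ts: float) -> None:
        """Incremental step: insert one new timestamp, keeping the list sorted."""
        bisect.insort(self.stamps, ts)

    def level(self, threshold: float) -> List[List[float]]:
        events: List[List[float]] = []
        for ts in self.stamps:
            if events and ts - events[-1][-1] <= threshold:
                events[-1].append(ts)  # close enough: same event
            else:
                events.append([ts])  # gap too large: start a new event
        return events

    def tree(self) -> Dict[float, List[List[float]]]:
        return {t: self.level(t) for t in self.thresholds}


HOUR, DAY = 3600, 86400
album = TemporalTree([HOUR, DAY])
for ts in [2 * DAY, 600, 0, 3 * HOUR, 1200, 2 * DAY + 600, 3 * HOUR + 600]:
    album.add(ts)  # photos may arrive in any order
levels = album.tree()
```

Here the hour level separates a morning burst, an afternoon burst and a later day, while the day level merges the two bursts of the first day.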

Clustering gene expression data with a penalized graph-based metric

Author: Ariel E Bayá, Pablo M Granitto
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: The search for cluster structure in microarray datasets is a fundamental problem for the so-called "-omic sciences". A difficult problem in clustering is how to handle data with a manifold structure, i.e., data that are not shaped as compact clouds of points but form arbitrary shapes or paths embedded in a high-dimensional space, as may be the case for some gene expression datasets.

Results: In this work, we introduce the Penalized k-Nearest-Neighbor-Graph (PKNNG) based metric, a new tool for evaluating distances in such cases. The new metric can be used in combination with most clustering algorithms. The PKNNG metric is based on a two-step procedure: first, it constructs the k-Nearest-Neighbor-Graph of the dataset of interest using a low value of k; then, it adds edges with a highly penalized weight to connect the subgraphs produced by the first step. We discuss several possible schemes for connecting the different subgraphs, as well as several penalization functions. We show clustering results on several public gene expression datasets and on simulated artificial problems to evaluate the behavior of the new metric.

Conclusions: In all cases, the PKNNG metric shows promising clustering results. Using the PKNNG metric can raise the performance of commonly used pairwise-distance-based clustering methods to the level of more advanced algorithms. A great advantage of the new procedure is that researchers do not need to learn a new method: they can simply compute distances with the PKNNG metric and then, for example, use hierarchical clustering to produce an accurate and highly interpretable dendrogram of their high-dimensional data.
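A rough sketch of the two-step PKNNG construction follows, assuming Euclidean input distances. Two details are our assumptions rather than statements of the abstract: components are joined by adding, for every pair of subgraphs, the single shortest edge between them (one of several possible connection schemes), and the penalty is d·exp(d/μ), with μ the mean edge length of the kNN graph. Distances are then geodesic (shortest-path) lengths on the augmented graph, which any pairwise-distance clustering method can consume.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pknng_distances(points: List[Sequence[float]], k: int = 2) -> List[List[float]]:
    """Sketch of a penalized kNN-graph metric (assumed penalty: d * exp(d / mu))."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    w = [[0.0 if i == j else math.inf for j in range(n)] for i in range(n)]

    # Step 1: symmetric kNN graph with a low k; edges keep their Euclidean length.
    for i in range(n):
        nbrs = sorted((j for j in range(n) if j != i), key=lambda j: d[i][j])[:k]
        for j in nbrs:
            w[i][j] = w[j][i] = d[i][j]
    knn_lengths = [w[i][j] for i, j in combinations(range(n), 2) if w[i][j] < math.inf]
    mu = sum(knn_lengths) / len(knn_lengths)

    # Label the connected components of the kNN graph (iterative DFS).
    comp = [-1] * n
    n_comp = 0
    for s in range(n):
        if comp[s] != -1:
            continue
        comp[s] = n_comp
        stack = [s]
        while stack:
            u = stack.pop()
            for v in range(n):
                if comp[v] == -1 and w[u][v] < math.inf:
                    comp[v] = n_comp
                    stack.append(v)
        n_comp += 1

    # Step 2: join every pair of components through their closest pair of
    # points, with a heavily penalized weight so paths avoid crossing gaps.
    for a, b in combinations(range(n_comp), 2):
        i, j = min(
            ((i, j) for i in range(n) for j in range(n) if comp[i] == a and comp[j] == b),
            key=lambda e: d[e[0]][e[1]],
        )
        w[i][j] = w[j][i] = d[i][j] * math.exp(d[i][j] / mu)

    # Geodesic distances on the augmented graph (Floyd-Warshall).
    for m in range(n):
        for i in range(n):
            for j in range(n):
                if w[i][m] + w[m][j] < w[i][j]:
                    w[i][j] = w[i][m] + w[m][j]
    return w


# Two parallel rows of points, 3 units apart; each row is 4 units long.
pts = [(float(x), 0.0) for x in range(5)] + [(float(x), 3.0) for x in range(5)]
D = pknng_distances(pts, k=2)
```

In the toy example, the point directly across the gap is closer in straight-line distance (3) than the far end of the same row (4), yet under the penalized metric the far end of the row is much closer, which is what lets a clustering method follow manifold-shaped structure.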

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

CONICET Digital

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Repositorio Hipermedial de la Universidad Nacional de Rosario

From Data Topology to a Modular Classifier

Author: Ennaji Abdel, Lecourtier Yves, Ribert Arnaud
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2003

This article describes an approach to designing a distributed and modular neural classifier. The approach introduces a new hierarchical clustering method that determines reliable regions in the representation space by exploiting supervised information. A multilayer perceptron is then associated with each detected cluster and charged with recognizing the elements of that cluster while rejecting all others. The resulting global classifier consists of a set of cooperating neural networks, completed by a k-nearest-neighbor classifier that handles the elements rejected by all the neural networks. Experimental results on the handwritten digit recognition problem are given, along with a comparison with non-modular neural and statistical classifiers.
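The routing logic of the modular classifier (per-cluster experts that may reject an input, with a k-nearest-neighbor fallback for inputs that every expert rejects) can be illustrated as below. This is a hypothetical sketch: each multilayer perceptron is replaced by a nearest-prototype expert with a rejection radius, and majority voting among accepting experts is our assumption, since the abstract does not say how their answers are combined. The hierarchical clustering that defines the regions is not shown; the clusters are given by hand.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], str]


@dataclass
class Expert:
    """Hypothetical stand-in for one per-cluster MLP: a nearest-prototype
    classifier that rejects inputs farther than `radius` from its cluster."""
    prototypes: List[Sample]
    radius: float

    def predict(self, x: Sequence[float]) -> Optional[str]:
        vec, label = min(self.prototypes, key=lambda s: math.dist(x, s[0]))
        return label if math.dist(x, vec) <= self.radius else None  # None means reject


def knn_predict(train: List[Sample], x: Sequence[float], k: int) -> str:
    """Fallback k-nearest-neighbor vote over the whole training set."""
    nearest = sorted(train, key=lambda s: math.dist(x, s[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def modular_predict(experts: List[Expert], train: List[Sample],
                    x: Sequence[float], k: int = 3) -> str:
    answers = [a for a in (e.predict(x) for e in experts) if a is not None]
    if answers:
        # Assumed combination rule when several experts accept: majority vote.
        return Counter(answers).most_common(1)[0][0]
    return knn_predict(train, x, k)  # every expert rejected x


train = [((0, 0), "a"), ((0, 1), "a"), ((5, 5), "b"), ((5, 6), "b"), ((10, 0), "c")]
experts = [Expert(train[0:2], radius=1.5), Expert(train[2:4], radius=1.5)]
```

For example, `modular_predict(experts, train, (9, 0), k=1)` is rejected by both experts and therefore falls through to the nearest-neighbor rule.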

arXiv.org e-Print Archive

HAL - Normandie Université

Crossref

Coping with new Challenges in Clustering and Biomedical Imaging

Author: Oswald Annahita
Publication venue: Ludwig-Maximilians-Universität München
Publication date: 21/07/2011

Recent years have seen a tremendous increase in data acquisition in scientific fields such as molecular biology, bioinformatics and biomedicine. Novel methods are therefore needed for the automatic processing and analysis of these large amounts of data. Data mining is the process of applying methods such as clustering or classification to large databases in order to uncover hidden patterns. Clustering is the task of partitioning the points of a data set into distinct groups so as to maximize intra-cluster similarity and minimize inter-cluster similarity. In contrast to unsupervised learning such as clustering, classification is a supervised learning problem: it aims to predict the group membership of data objects on the basis of rules learned from a training set in which group membership is known. Specialized methods have been proposed for hierarchical and partitioning clustering; however, these methods suffer from several drawbacks. In the first part of this work, new clustering methods are proposed that address problems of conventional clustering algorithms. ITCH (Information-Theoretic Cluster Hierarchies) is a hierarchical clustering method based on a hierarchical variant of the Minimum Description Length (MDL) principle; it finds hierarchies of clusters without requiring input parameters. As ITCH may converge only to a local optimum, we propose GACH (Genetic Algorithm for Finding Cluster Hierarchies), which combines the benefits of genetic algorithms with information theory and thereby explores the search space more effectively. Furthermore, we propose INTEGRATE, a novel clustering method for data with mixed numerical and categorical attributes. Supported by the MDL principle, our method integrates the information provided by heterogeneous numerical and categorical attributes and thus naturally balances the influence of both sources of information. A competitive evaluation illustrates that INTEGRATE is more effective than existing clustering methods for mixed-type data. Besides clustering methods for single data objects, we provide a solution for clustering different data sets that are represented by their skylines. The skyline operator is a well-established database primitive for finding the database objects that minimize two or more attributes under an unknown weighting between these attributes. In this thesis, we define a similarity measure, called SkyDist, for comparing the skylines of different data sets; it can be integrated directly into data mining tasks such as clustering or classification. The experiments show that SkyDist, combined with different clustering algorithms, can give useful insights into many applications.

In the second part, we focus on the analysis of high-resolution magnetic resonance images (MRI), which are clinically relevant and may allow early detection and diagnosis of several diseases. In particular, we propose a framework for the classification of Alzheimer's disease in MR images that combines the data mining steps of feature selection, clustering and classification. As a result, a set of highly selective features that discriminate Alzheimer's patients from healthy people has been identified. However, the analysis of high-dimensional MR images is extremely time-consuming. We therefore developed JGrid, a scalable distributed computing solution that allows large-scale analysis of MRI and thus an optimized prediction of diagnosis. In another study, we apply efficient motif discovery algorithms to task-fMRI scans in order to identify brain patterns that are characteristic of patients with somatoform pain disorder. We find groups of brain compartments that occur frequently within the brain networks and discriminate well between healthy and diseased people.
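The skyline operator described in the first part of the abstract is easy to state in code: a point belongs to the skyline if no other point is at least as good in every attribute and strictly better in at least one. The naive quadratic version below assumes all attributes are minimized; the hotel data are made up, and SkyDist itself is not shown.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if `a` is no worse than `b` in every attribute and strictly better
    in at least one (all attributes are minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
    """Naive O(n^2) skyline: keep every point that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical hotels as (price, distance to the beach); lower is better for both.
hotels = [(50, 8), (80, 2), (60, 5), (90, 3), (55, 9)]
best = skyline(hotels)
```

Here (90, 3) is dominated by (80, 2) and (55, 9) by (50, 8), so the skyline keeps the three remaining trade-offs, whatever weighting a user would put on price versus distance.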

Digitale Hochschulschriften der LMU

ClustGeo: an R package for hierarchical clustering with spatial constraints

Author: Chavent Marie, Kuentz-Simonet Vanessa, Labenne Amaury, Saracco Jérôme
Publication venue: Springer Science and Business Media LLC
Publication date: 13/12/2017

In this paper, we propose a Ward-like hierarchical clustering algorithm that includes spatial/geographical constraints. Two dissimilarity matrices $D_0$ and $D_1$ are given as input, along with a mixing parameter $\alpha \in [0,1]$. The dissimilarities can be non-Euclidean, and the weights of the observations can be non-uniform. The first matrix gives the dissimilarities in the "feature space" and the second gives the dissimilarities in the "constraint space". The criterion minimized at each stage is a convex combination of the homogeneity criterion calculated with $D_0$ and the homogeneity criterion calculated with $D_1$. The idea is then to determine a value of $\alpha$ that increases the spatial contiguity without degrading too much the quality of the solution based on the variables of interest, i.e., those of the feature space. This procedure is illustrated on a real dataset using the R package ClustGeo.
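The Ward-like criterion can be sketched as a naive agglomeration over the two dissimilarity matrices. Following our reading of the method, we assume that the homogeneity of a cluster C is its pseudo-inertia, the sum of w_i w_j d_ij^2 over all pairs in C divided by 2μ_C (with μ_C the total weight of C), and that each step merges the pair of clusters that least increases (1 - α) I_0 + α I_1. This is not the ClustGeo package itself (which is written in R), it is far too slow for real data, and the toy matrices are hypothetical.

```python
from itertools import combinations
from typing import List, Optional, Sequence

Matrix = Sequence[Sequence[float]]


def pseudo_inertia(cluster: List[int], D: Matrix, w: Sequence[float]) -> float:
    """Sum over i, j in the cluster of w_i * w_j * d_ij^2, divided by 2 * (cluster weight)."""
    mu = sum(w[i] for i in cluster)
    return sum(w[i] * w[j] * D[i][j] ** 2 for i in cluster for j in cluster) / (2 * mu)


def mixed(cluster: List[int], D0: Matrix, D1: Matrix,
          w: Sequence[float], alpha: float) -> float:
    """Convex combination of the feature-space and constraint-space criteria."""
    return (1 - alpha) * pseudo_inertia(cluster, D0, w) + alpha * pseudo_inertia(cluster, D1, w)


def ward_like(D0: Matrix, D1: Matrix, alpha: float, k: int,
              w: Optional[Sequence[float]] = None) -> List[List[int]]:
    """Greedy agglomeration down to k clusters; each step merges the pair whose
    union increases the mixed criterion the least."""
    n = len(D0)
    if w is None:
        w = [1.0 / n] * n  # assumed uniform observation weights
    clusters: List[List[int]] = [[i] for i in range(n)]

    def increase(pair) -> float:
        a, b = pair
        return (mixed(clusters[a] + clusters[b], D0, D1, w, alpha)
                - mixed(clusters[a], D0, D1, w, alpha)
                - mixed(clusters[b], D0, D1, w, alpha))

    while len(clusters) > k:
        a, b = min(combinations(range(len(clusters)), 2), key=increase)
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Toy data: observations 0 and 2 look alike, as do 1 and 3 (feature space),
# but 0 and 1 are neighbours, as are 2 and 3 (constraint space).
D0 = [[0, 5, 1, 5], [5, 0, 5, 1], [1, 5, 0, 5], [5, 1, 5, 0]]
D1 = [[0, 1, 5, 5], [1, 0, 5, 5], [5, 5, 0, 1], [5, 5, 1, 0]]
features_only = ward_like(D0, D1, alpha=0.0, k=2)
space_only = ward_like(D0, D1, alpha=1.0, k=2)
```

At α = 0 the constraint space is ignored and the partition follows the features; at α = 1 it follows the spatial neighbours. Intermediate values trade one against the other, which is the choice of α the abstract describes.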

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server

HAL Descartes

Oskar Bordeaux

Belief Hierarchical Clustering

Author: J. Schubert, J.C. Bezdek, L.M. Zouhal, M. Masson, P. Smets, T. Denœux
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

In the data mining field, many clustering methods have been proposed, yet their standard versions do not take uncertain databases into account. This paper presents a new approach to clustering uncertain data using a hierarchical clustering defined within the belief function framework. The main objective of belief hierarchical clustering is to allow an object to belong to one or several clusters. Each membership is associated with a degree of belief, and clusters are combined based on pignistic properties. Experiments with real uncertain data show that the proposed method can be considered a promising tool.
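The abstract says clusters are combined based on pignistic properties. Smets's pignistic transformation turns a mass function over sets of clusters into a probability over single clusters, BetP(ω) = Σ_{A ∋ ω} m(A) / (|A| (1 - m(∅))). The sketch below computes only this transformation; how the paper uses it to merge clusters is not reproduced, and the cluster names and masses are hypothetical.

```python
from typing import Dict, FrozenSet, Hashable


def pignistic(mass: Dict[FrozenSet[Hashable], float]) -> Dict[Hashable, float]:
    """Pignistic transformation: each focal set's mass is shared equally among
    its elements, after renormalizing away any mass on the empty set."""
    conflict = mass.get(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("all mass is on the empty set")
    betp: Dict[Hashable, float] = {}
    for focal, m in mass.items():
        if not focal:
            continue  # mass on the empty set is handled by the normalization
        share = m / (len(focal) * (1.0 - conflict))
        for omega in focal:
            betp[omega] = betp.get(omega, 0.0) + share
    return betp


# Hypothetical belief that one object belongs to cluster C1, C2 or C3.
m = {
    frozenset({"C1"}): 0.5,
    frozenset({"C1", "C2"}): 0.3,
    frozenset({"C1", "C2", "C3"}): 0.2,  # total ignorance among the clusters
}
betp = pignistic(m)
```

The mass on {C1, C2} is split evenly between those two clusters and the mass on the whole frame is split three ways, so C1 ends up with the largest pignistic probability.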

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

The role of human factors in stereotyping behavior and perception of digital library users: A robust clustering approach

Author: Chen SY, Frias-Martinez E, Liu X, Macredie RD
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 03/04/2007

To deliver effective personalization for digital library users, it is necessary to identify which human factors are most relevant in determining the behavior and perception of these users. This paper examines three key human factors (cognitive styles, levels of expertise and gender differences) and uses three individual clustering techniques (k-means, hierarchical clustering and fuzzy clustering) to understand user behavior and perception. Moreover, robust clustering, which can correct the bias of the individual clustering techniques, is used to obtain a deeper understanding. The robust clustering approach produced results that highlighted the relevance of cognitive style for user behavior, i.e., cognitive style dominates and justifies each of the robust clusters created. We also found that perception was mainly determined by the user's level of expertise. We conclude that robust clustering is an effective technique for analyzing user behavior and perception.
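The abstract does not spell out how robust clustering corrects the bias of the individual techniques. One common way to combine several base clusterings, shown here as an assumption rather than as the authors' procedure, is a co-association (consensus) matrix: users are grouped together only when enough of the base methods agree. The labels are hypothetical, and fuzzy memberships are assumed to have been hardened to each user's most likely cluster.

```python
from itertools import combinations
from typing import Dict, List, Sequence


def co_association(labelings: List[Sequence[int]]) -> List[List[float]]:
    """Fraction of base clusterings that put users i and j in the same cluster."""
    n, runs = len(labelings[0]), len(labelings)
    return [[sum(lab[i] == lab[j] for lab in labelings) / runs for j in range(n)]
            for i in range(n)]


def consensus_groups(labelings: List[Sequence[int]],
                     threshold: float = 1.0) -> List[List[int]]:
    """Link users whose co-association reaches `threshold`, then return the
    connected components (union-find) as consensus groups."""
    M = co_association(labelings)
    parent = list(range(len(M)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(len(M)), 2):
        if M[i][j] >= threshold - 1e-9:  # tolerance for fractions such as 2/3
            parent[find(i)] = find(j)
    groups: Dict[int, List[int]] = {}
    for i in range(len(M)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical labels for six users from three base methods.
kmeans_labels = [0, 0, 0, 1, 1, 1]
hier_labels = [2, 2, 2, 5, 5, 5]
fuzzy_labels = [0, 0, 1, 1, 1, 1]  # hardened fuzzy memberships
runs = [kmeans_labels, hier_labels, fuzzy_labels]
```

Only the agreement pattern matters, not the label values, so methods that number their clusters differently can be combined. Requiring unanimity isolates user 2, on whom the fuzzy run disagrees, while a two-out-of-three majority absorbs that disagreement.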

Brunel University Research Archive

Axiomatic Construction of Hierarchical Clustering in Asymmetric Networks

Author: Carlsson Gunnar, Mémoli Facundo, Ribeiro Alejandro, Segarra Santiago
Publication date: 01/01/2013

This paper considers networks where relationships between nodes are represented by directed dissimilarities. The goal is to study methods for determining hierarchical clusters, i.e., a family of nested partitions indexed by a connectivity parameter, induced by the given dissimilarity structures. Our construction of hierarchical clustering methods is based on defining admissible methods as those that abide by two axioms: value (in a network with two nodes, the nodes are clustered together at the maximum of the two dissimilarities between them) and transformation (when dissimilarities are reduced, the network may become more clustered but not less). Several admissible methods are constructed, and two particular methods, termed reciprocal and nonreciprocal clustering, are shown to provide upper and lower bounds in the space of admissible methods. Alternative clustering methodologies and axioms are further considered. Allowing the outcome of hierarchical clustering to be asymmetric, so that it matches the asymmetry of the original data, leads to quasi-clustering methods, and the existence of a unique quasi-clustering method is shown. Allowing clustering in a two-node network to proceed at the minimum of the two dissimilarities generates an alternative axiomatic construction, which also admits a unique clustering method. The paper also develops algorithms for computing hierarchical clusters using matrix powers on a min-max dioid algebra and studies the stability of the proposed methods. We prove that most of the methods introduced in this paper are stable, in the sense that similar networks yield similar hierarchical clustering results. The algorithms are illustrated by applying them to networks describing internal migration within states of the United States (U.S.) and the interrelation between sectors of the U.S. economy.
Comment: This is a largely extended version of the previous conference submission under the same title. The current version contains the material in the previous version (published in ICASSP 2013) as well as material presented at the Asilomar Conference on Signals, Systems, and Computers 2013, GlobalSIP 2013, and ICML 2014. Unpublished material is also included in the current version.
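The min-max dioid computation can be sketched directly. In this algebra, the matrix "product" replaces sum with min and multiplication with max, so the (n - 1)-th power of a dissimilarity matrix with a zero diagonal gives, for each pair of nodes, the smallest achievable largest link along a chain between them. Following our reading of the paper's definitions, reciprocal clustering first symmetrizes each pair to the larger of its two directed dissimilarities and then takes minimax chains, while nonreciprocal clustering takes minimax chains in each direction separately and then keeps the larger of the two. The example network is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def minmax_product(A: Matrix, B: Matrix) -> Matrix:
    """Dioid product: (A * B)[i][j] = min over k of max(A[i][k], B[k][j])."""
    n = len(A)
    return [[min(max(A[i][k], B[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def minmax_closure(A: Matrix) -> Matrix:
    """A^(n-1) in the min-max dioid: minimax chain costs (diagonal must be 0)."""
    P = A
    for _ in range(len(A) - 2):
        P = minmax_product(P, A)
    return P


def reciprocal(A: Matrix) -> Matrix:
    """Symmetrize each pair with max, then take minimax chains."""
    n = len(A)
    sym = [[max(A[i][j], A[j][i]) for j in range(n)] for i in range(n)]
    return minmax_closure(sym)


def nonreciprocal(A: Matrix) -> Matrix:
    """Take directed minimax chains, then keep the larger of the two directions."""
    n = len(A)
    directed = minmax_closure(A)
    return [[max(directed[i][j], directed[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical directed dissimilarities: cheap around the cycle 0 -> 1 -> 2 -> 0,
# expensive in the opposite direction.
A = [[0, 1, 5],
     [5, 0, 1],
     [1, 5, 0]]
R = reciprocal(A)
NR = nonreciprocal(A)
```

On this cyclic example every node can reach every other through cheap links in one direction only, so nonreciprocal clustering merges all nodes at level 1 while reciprocal clustering waits until level 5, consistent with the two methods bounding the admissible ones from below and above.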

arXiv.org e-Print Archive

Adelaide Research & Scholarship