1,298 research outputs found

    Partitioning Clustering Based on Support Vector Ranking


    Ontology Partitioning: Clustering Based Approach


    Partitioning clustering algorithms for protein sequence data sets

    Abstract
    Background: Genome-sequencing projects are currently producing an enormous number of new sequences, causing protein sequence databases to grow rapidly. The unsupervised classification of these data into functional groups or families, i.e. clustering, has become one of the principal research objectives in structural and functional genomics, and computer programs that automatically and accurately classify sequences into families have become a necessity. A significant number of methods have addressed the clustering of protein sequences, and most of them fall into three major groups: hierarchical, graph-based and partitioning methods. Among the various sequence clustering methods in the literature, hierarchical and graph-based approaches have been widely used. Although partitioning clustering techniques are used extensively in other fields, few applications have been reported for protein sequence clustering, and it has not been fully demonstrated whether partitioning methods can be applied to protein sequence data and whether they can be efficient compared with published clustering methods.
    Methods: We developed four partitioning clustering approaches that use the Smith-Waterman local-alignment algorithm to determine pairwise similarities between sequences. Four different sets of protein sequences were used as evaluation data sets for the proposed methods.
    Results: We show that these methods outperform several other published clustering methods in terms of correctly predicting a classifier and, especially, in terms of the correctness of the provided prediction. The software is available to academic users from the authors upon request.
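
    As a rough illustration of the workflow this abstract describes (pairwise Smith-Waterman similarities feeding a partitioning step), the sketch below uses Biopython's PairwiseAligner and scikit-learn-extra's KMedoids as stand-ins; the toy sequences and the choice of k-medoids are assumptions for the example, not the authors' four methods.

```python
# Sketch: pairwise Smith-Waterman similarities followed by a partitioning
# step (k-medoids here); illustrative only, not the authors' implementation.
import numpy as np
from Bio import Align
from Bio.Align import substitution_matrices
from sklearn_extra.cluster import KMedoids

# Toy sequences; a real run would load a curated protein family data set.
sequences = {
    "seqA1": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
    "seqA2": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVA",
    "seqB1": "MADEEKLPPGWEKRMSRSSGRVYYFNHITNASQ",
    "seqB2": "MADEEKLPPGWEKRMSRSSGRVYYFNHITNASA",
}
names = list(sequences)

# Smith-Waterman local alignment scores for every pair of sequences.
aligner = Align.PairwiseAligner()
aligner.mode = "local"
aligner.substitution_matrix = substitution_matrices.load("BLOSUM62")
aligner.open_gap_score = -10
aligner.extend_gap_score = -0.5

n = len(names)
sim = np.zeros((n, n))
for i in range(n):
    for j in range(i, n):
        sim[i, j] = sim[j, i] = aligner.score(sequences[names[i]],
                                              sequences[names[j]])

# Convert similarities into distances (simple max-shift normalisation).
dist = sim.max() - sim
np.fill_diagonal(dist, 0.0)

# Partitioning step over the precomputed distance matrix.
labels = KMedoids(n_clusters=2, metric="precomputed",
                  random_state=0).fit_predict(dist)
for name, label in zip(names, labels):
    print(name, "-> cluster", label)
```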

    Combining Clustering techniques and Formal Concept Analysis to characterize Interestingness Measures

    Formal Concept Analysis (FCA) is a data analysis method that makes it possible to discover hidden knowledge in data. One kind of hidden knowledge extracted from data is association rules. Different quality measures have been reported in the literature to extract only the relevant association rules; given a dataset, choosing a good quality measure remains a challenging task for the user. Starting from an evaluation matrix of quality measures according to semantic properties, this paper describes how FCA can highlight quality measures with similar behaviour in order to assist the user in this choice. The aim of this article is the discovery of clusters of interestingness measures (IM), which can validate those found by the agglomerative hierarchical (AHC) and partitioning (k-means) clustering methods. Then, based on the theoretical study of sixty-one interestingness measures according to nineteen properties, proposed in a recent study, FCA identifies several groups of measures.
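
    As a rough companion to this abstract, the sketch below groups a handful of interestingness measures by their property profiles with AHC and k-means; the measure names and the tiny binary property matrix are invented for illustration and do not reproduce the sixty-one measures or nineteen properties of the study.

```python
# Sketch: grouping interestingness measures by their semantic-property
# profiles with hierarchical (AHC) and partitioning (k-means) clustering.
# The small measure x property matrix below is purely illustrative.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.cluster import KMeans

measures = ["confidence", "lift", "conviction", "Jaccard", "Piatetsky-Shapiro"]
# Rows: measures, columns: binary semantic properties (e.g. symmetry,
# behaviour under independence, ...); the values here are hypothetical.
profiles = np.array([
    [1, 0, 1, 0, 1],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 1, 0, 1, 0],
])

# Agglomerative hierarchical clustering (AHC) on the property profiles.
Z = linkage(profiles, method="ward")
ahc_labels = fcluster(Z, t=2, criterion="maxclust")

# Partitioning counterpart: k-means with the same number of clusters.
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(profiles)

for m, a, k in zip(measures, ahc_labels, km_labels):
    print(f"{m:20s} AHC={a}  k-means={k}")
```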

    Partitioning-Clustering Techniques Applied to the Electricity Price Time Series

    Clustering is used to generate groupings of data from a large dataset, with the intention of representing the behaviour of a system as accurately as possible. In this sense, clustering is applied in this work to extract useful information from the electricity price time series. To be precise, two clustering techniques, K-means and Expectation Maximization, have been applied to the analysis of the price curves, demonstrating that these techniques effectively split the whole year into different groups of days according to the behaviour of their prices. This information is later used to predict prices over a short time horizon. The prices exhibited a remarkable resemblance among days within the same season and can be split into two major kinds of clusters: working days and festivities.
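
    A minimal sketch of the two techniques named above applied to daily price curves, assuming scikit-learn's KMeans and GaussianMixture (for Expectation Maximization); the synthetic "working day" and "festivity" profiles are invented for the example rather than taken from the paper's market data.

```python
# Sketch: grouping daily electricity-price curves with K-means and with
# Expectation Maximization (Gaussian mixture).  The prices are synthetic;
# a real study would load hourly market prices for a full year.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
hours = np.arange(24)

# Two synthetic daily profiles: "working day" (pronounced peaks) and
# "festivity" (flatter, cheaper curve), plus noise.
working = 50 + 20 * np.sin((hours - 7) / 24 * 2 * np.pi) ** 2
holiday = 35 + 5 * np.sin((hours - 7) / 24 * 2 * np.pi) ** 2
days = np.vstack([working + rng.normal(0, 2, 24) for _ in range(200)]
                 + [holiday + rng.normal(0, 2, 24) for _ in range(100)])

# K-means: each day is a 24-dimensional point (one price per hour).
km_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(days)

# Expectation Maximization via a Gaussian mixture model.
em_labels = GaussianMixture(n_components=2, random_state=0).fit_predict(days)

print("K-means cluster sizes:", np.bincount(km_labels))
print("EM cluster sizes:     ", np.bincount(em_labels))
```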

    A survey of kernel and spectral methods for clustering

    Clustering algorithms are a useful tool to explore data structures and have been employed in many disciplines. The focus of this paper is the partitioning clustering problem, with special interest in two recent approaches: kernel and spectral methods. The aim of this paper is to present a survey of kernel and spectral clustering methods, two approaches able to produce nonlinear separating hypersurfaces between clusters. The kernel clustering methods presented are kernel versions of many classical clustering algorithms, e.g. K-means, SOM and neural gas. Spectral clustering arises from concepts in spectral graph theory, and the clustering problem is configured as a graph cut problem in which an appropriate objective function has to be optimized. An explicit proof is reported that these two seemingly different paradigms share the same mathematical foundation and optimize the same objective. In addition, fuzzy kernel clustering methods are presented as extensions of the kernel K-means clustering algorithm.
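
    A small sketch of the nonlinear separation the survey emphasises, assuming scikit-learn's SpectralClustering on the standard two-moons toy set; it compares the graph-cut result with plain K-means and is only an illustration of the idea, not of any specific method from the survey.

```python
# Sketch: spectral clustering versus plain K-means on two interleaving
# half-moons, a standard example of clusters that are not linearly
# separable.  The nearest-neighbour graph provides the affinity whose
# Laplacian eigenvectors define the relaxed graph cut.
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=400, noise=0.06, random_state=0)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="nearest_neighbors",
                        n_neighbors=10, random_state=0).fit_predict(X)

print("K-means  adjusted Rand index:", adjusted_rand_score(y, km))
print("spectral adjusted Rand index:", adjusted_rand_score(y, sc))
```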

    A Modified Overlapping Partitioning Clustering Algorithm for Categorical Data Clustering

    Clustering is one of the important approaches to data analysis: it enables the grouping of unlabeled data by partitioning the data into clusters with similar patterns. Over the past decades, many clustering algorithms have been developed for various clustering problems. The overlapping partitioning clustering (OPC) algorithm, however, can only handle numerical data, and novel clustering algorithms have been studied extensively to overcome this limitation. By increasing the number of objects belonging to one cluster and the distance between cluster centres, this study aimed to cluster textual data without losing the algorithm's main functions. The study used the 20 Newsgroups dataset, which consists of approximately 20,000 textual documents. By introducing some modifications to the traditional algorithm, an acceptable level of homogeneity and completeness of the clusters was obtained. The modifications concerned the pre-processing phase and the data representation, along with the methods that influence the primary function of the algorithm. Subsequently, the results were evaluated and compared with those of the k-means algorithm on the training and test datasets. The results indicate that the modified algorithm can successfully handle categorical data and produce satisfactory clusters.
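
    Since the OPC algorithm itself is not available in standard libraries, the sketch below only illustrates the surrounding pipeline this abstract mentions: TF-IDF representation of 20 Newsgroups documents, a k-means baseline, and evaluation by homogeneity and completeness; the category subset and parameters are assumptions for the example.

```python
# Sketch: representation and evaluation pipeline for the comparison.
# OPC is not in standard libraries, so k-means stands in as the baseline;
# homogeneity and completeness score the resulting clusters.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics import homogeneity_score, completeness_score

cats = ["sci.space", "rec.sport.hockey", "talk.politics.mideast"]
data = fetch_20newsgroups(subset="train", categories=cats,
                          remove=("headers", "footers", "quotes"))

# Pre-processing / representation: TF-IDF vectors over the documents.
X = TfidfVectorizer(stop_words="english", max_features=5000).fit_transform(data.data)

labels = KMeans(n_clusters=len(cats), n_init=10, random_state=0).fit_predict(X)

print("homogeneity :", homogeneity_score(data.target, labels))
print("completeness:", completeness_score(data.target, labels))
```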