BiETech: Bicluster Ensemble Techniques

Author: Geeta Aggarwal et al.
Publication venue: Auricle Global Society of Education and Research
Publication date: 05/11/2023

Many biclustering algorithms have emerged in recent years, each trying to deliver good biclusters from gene expression data that satisfy a particular objective function. Users struggle to choose the best among these algorithms. Ensemble techniques come to the rescue of these users by aggregating all the solutions into a single solution that is more robust and stable than its constituent solutions. In this paper, we present two different ensemble techniques for biclustering solutions. One approach uses classifiers, and the other uses the concept of metaclustering to form the consensus. Experiments are performed on synthetic and real gene expression datasets, as biologists are interested in finding meaningful patterns in the expression of genes. The experiments show that both proposed approaches improve over the input solutions as well as over existing bicluster ensemble techniques.

International Journal on Recent and Innovation Trends in Computing and Communication

A Survey on Soft Subspace Clustering

Author: Choi Kup-Sze
Deng Zhaohong
Jiang Yizhang
Wang Jun
Wang Shitong
Publication venue: Elsevier BV
Publication date: 07/04/2016

Subspace clustering (SC) is a promising clustering technology that identifies clusters based on their associations with subspaces in high-dimensional spaces. SC can be classified into hard subspace clustering (HSC) and soft subspace clustering (SSC). While HSC algorithms have been extensively studied and are well accepted by the scientific community, SSC algorithms are relatively new but have gained more attention in recent years due to their better adaptability. In this paper, a comprehensive survey of existing SSC algorithms and recent developments is presented. The SSC algorithms are classified systematically into three main categories, namely, conventional SSC (CSSC), independent SSC (ISSC) and extended SSC (XSSC). The characteristics of these algorithms are highlighted and the potential future development of SSC is also discussed. Comment: This paper has been published in Information Sciences Journal in 201

arXiv.org e-Print Archive

The Hong Kong Polytechnic University Pao Yue-kong Library

PolyU Institutional Repository

Improving Supervised Classification Using Information Extraction

Author: A Puurula
D Rao
DD Lewis
E Gabrilovich
F Gullo
G Forman
G Tsoumakas
J Piskorski
K Crammer
M Atkinson
M Du
M Hall
R Grishman
RC Prati
S Dendamrongvit
S Huttunen
S Patwardhan
S Wang
W Zhang
Y Liu
Y Yang
Z Erenel
Publication venue: Springer International Publishing AG
Publication date: 01/01/2015

Peer reviewed

Crossref

Helsingin yliopiston digitaalinen arkisto

Meta-optimizations for Cluster Analysis

Author: Tomáš Bartoň
Publication venue: Czech Technical University in Prague. Computing and Information Centre.
Publication date: 30/05/2019

This dissertation thesis deals with advances in the automation of cluster analysis.

Digital Library of the Czech Technical University in Prague

Advancing data clustering via projective clustering ensembles

Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2011

Crossref

Overcoming uncertainty and the curse of dimensionality in data clustering

Author: Greco Sergio
Gullo Francesco
Palopoli Luigi
Publication date: 05/03/2014

Dottorato di Ricerca in Ingegneria dei Sistemi ed Informatica, XXII Ciclo, 2009.
Uncertainty and the curse of dimensionality are two crucial problems that usually affect data clustering. Uncertainty in data clustering may typically be considered at the data level or the clustering level. Data-level uncertainty is inherently present in the representation of several kinds of data objects from various application contexts (e.g., sensor networks, moving objects databases, biomedicine). This kind of uncertainty should be carefully taken into account in a clustering task in order to achieve adequate accuracy; unfortunately, traditional clustering methods are designed to work only on deterministic vectorial representations of data objects. Clustering uncertainty is related to the output of any clustering algorithm. Indeed, the ill-posed nature of clustering leads to clustering algorithms that cannot be generally valid for any input dataset, i.e., their output results are necessarily uncertain. Clustering ensembles has been recognized as a powerful solution to overcome clustering uncertainty. It aims to derive a single clustering solution (i.e., the consensus partition) from a set of clusterings representing different partitions of the same input dataset (i.e., the ensemble). A major weakness of the existing clustering ensembles methods is that they compute the consensus partition by equally considering all the solutions in the ensemble.
The curse of dimensionality in data clustering concerns all the issues that naturally arise from data objects represented by a large set of features and that are responsible for the poor accuracy and efficiency achieved by traditional clustering methods working on high-dimensional data. Classic approaches to the curse of dimensionality include global and local dimensionality reduction. Global techniques aim at reducing the dimensionality of the input dataset by applying the same algorithm(s) to all the input data objects. Local dimensionality reduction acts by considering subsets of the input dataset and performing dimensionality reduction specific to each such subset. Projective clustering is an effective class of methods falling into the category of local dimensionality reduction. It aims to discover clusters of objects along with the corresponding subspaces, following the principle that objects in the same cluster are close to each other if and only if they are projected onto the subspace associated with that cluster.
The focus of this thesis is on the development of proper techniques for overcoming the crucial problems of uncertainty and the curse of dimensionality arising from data clustering. This thesis provides the following main contributions.
Uncertainty. Uncertainty at a representation level is addressed by proposing: UK-medoids, a new partitional algorithm for clustering uncertain objects, designed to overcome efficiency and accuracy issues of some existing state-of-the-art methods; U-AHC, i.e., the first (agglomerative) hierarchical algorithm for clustering uncertain objects; and a methodology to exploit U-AHC for clustering microarray biomedical data with probe-level uncertainty. Clustering uncertainty is addressed by focusing on the problem of weighted consensus clustering, which aims to automatically determine weighting schemes to discriminate among clustering solutions in a given ensemble. In particular: three novel diversity-based, general schemes for weighting the individual clusterings in a given ensemble are proposed, i.e., Single-Weighting (SW), Group-Weighting (GW), and Dendrogram-Weighting (DW); and three algorithms, called WICE, WCCE, and WHCE, are defined to easily involve clustering weighting schemes into any clustering ensembles algorithm falling into one of the main classes of clustering ensembles approaches, i.e., instance-based, cluster-based, and hybrid.
The curse of dimensionality. Global dimensionality reduction is addressed by focusing on the time series data application context: the Derivative time series Segment Approximation (DSA) model is proposed as a new time series dimensionality reduction method designed for accurate and fast similarity detection and clustering; the Mass Spectrometry Data Analysis (MaSDA) system is presented, which mainly aims at analyzing mass spectrometry (MS) biomedical data by exploiting DSA to model such data according to a time series-based representation; and DSA is exploited for profiling low-voltage electricity customers. Regarding local dimensionality reduction, a unified view of projective clustering and clustering ensembles is provided. In particular: the novel Projective Clustering Ensembles (PCE) problem is addressed and formally defined according to two specific optimization formulations, i.e., two-objective PCE and single-objective PCE; and the MOEA-PCE and EM-PCE algorithms are proposed as novel heuristics to solve two-objective PCE and single-objective PCE, respectively.
Absolute accuracy and efficiency performance achieved by the proposed techniques, as well as the performance with respect to the prominent state-of-the-art methods, are evaluated by performing extensive sets of experiments on benchmark, synthetically generated, and real-world datasets.
Università della Calabria

Archivio Istituzionale dell'Università della Calabria