Search CORE

117,711 research outputs found

Optimal Clustering Framework for Hyperspectral Band Selection

Author: Li Xuelong
Wang Qi
Zhang Fahong
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 29/04/2019
Field of study

Band selection, by choosing a set of representative bands in hyperspectral image (HSI), is an effective method to reduce the redundant information without compromising the original contents. Recently, various unsupervised band selection methods have been proposed, but most of them are based on approximation algorithms which can only obtain suboptimal solutions toward a specific objective function. This paper focuses on clustering-based band selection, and proposes a new framework to solve the above dilemma, claiming the following contributions: 1) An optimal clustering framework (OCF), which can obtain the optimal clustering result for a particular form of objective function under a reasonable constraint. 2) A rank on clusters strategy (RCS), which provides an effective criterion to select bands on existing clustering structure. 3) An automatic method to determine the number of the required bands, which can better evaluate the distinctive information produced by certain number of bands. In experiments, the proposed algorithm is compared to some state-of-the-art competitors. According to the experimental results, the proposed algorithm is robust and significantly outperform the other methods on various data sets

arXiv.org e-Print Archive

Institutional Repository of Xi'an Institute of Optics and Precision Mechanics, CAS

Clustering by soft-constraint affinity propagation: Applications to gene-expression data

Author: Alizadeh
Blatt
Braunstein
Golub
M. Leone
M. Weigt
Pomeroy
Sumedha
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2007
Field of study

Motivation: Similarity-measure based clustering is a crucial problem appearing throughout scientific data analysis. Recently, a powerful new algorithm called Affinity Propagation (AP) based on message-passing techniques was proposed by Frey and Dueck \cite{Frey07}. In AP, each cluster is identified by a common exemplar all other data points of the same cluster refer to, and exemplars have to refer to themselves. Albeit its proved power, AP in its present form suffers from a number of drawbacks. The hard constraint of having exactly one exemplar per cluster restricts AP to classes of regularly shaped clusters, and leads to suboptimal performance, {\it e.g.}, in analyzing gene expression data. Results: This limitation can be overcome by relaxing the AP hard constraints. A new parameter controls the importance of the constraints compared to the aim of maximizing the overall similarity, and allows to interpolate between the simple case where each data point selects its closest neighbor as an exemplar and the original AP. The resulting soft-constraint affinity propagation (SCAP) becomes more informative, accurate and leads to more stable clustering. Even though a new {\it a priori} free-parameter is introduced, the overall dependence of the algorithm on external tuning is reduced, as robustness is increased and an optimal strategy for parameter selection emerges more naturally. SCAP is tested on biological benchmark data, including in particular microarray data related to various cancer types. We show that the algorithm efficiently unveils the hierarchical cluster structure present in the data sets. Further on, it allows to extract sparse gene expression signatures for each cluster.Comment: 11 pages, supplementary material: http://isiosf.isi.it/~weigt/scap_supplement.pd

arXiv.org e-Print Archive

CiteSeerX

Crossref

Constraint-based discriminative dimension selection for high-dimensional stream clustering

Author: Kitsana Waiyamai
Thanapat Kangkachit
Publication venue: 'Universitas Ahmad Dahlan, Kampus 3'
Publication date: 01/11/2018
Field of study

Clustering data streams is one of active research topic in data mining. However, runtime of the existing stream clustering algorithms increases and their performance drop in the face of large number of dimensions. Complexity of the stream clustering methods is increased when perform on data with large number of dimensions. In order to reduce the clustering complexity, one possible solution consists in determining the appropriate subset of cluster dimensions via dimension projection. SED-Stream is an efficient clustering algorithm that supports high dimension data streams. The aim of this paper is to increase performance of SED-Stream in terms of both clustering quality and execution-time. In order to improve the clustering process, background or domain expert knowledge are integrated as “constraints” in SEDC-Stream. The new algorithm, SEDC-Stream, supports the evolving characteristics of the dynamic constraints which are activation, fading, outdating and prioritization. SEDC-Stream algorithm is able to reduce cluster splitting time, and place new incoming points to their suitable clusters. Compared to SED-Stream on the three real-world streams datasets, SEDC-Stream is able to generate a better clustering performance in terms of both purity and f-measure

International Journal of Advances in Intelligent Informatics

Directory of Open Access Journals

International Journal of Advances in Intelligent Informatics (IJAIN)

Delete or merge regressors for linear model selection

Author: Maj-Kańska Aleksandra
Pokarowski Piotr
Prochenka Agnieszka
Publication venue: 'Institute of Mathematical Statistics'
Publication date: 01/01/2015
Field of study

We consider a problem of linear model selection in the presence of both continuous and categorical predictors. Feasible models consist of subsets of numerical variables and partitions of levels of factors. A new algorithm called delete or merge regressors (DMR) is presented which is a stepwise backward procedure involving ranking the predictors according to squared t-statistics and choosing the final model minimizing BIC. In the article we prove consistency of DMR when the number of predictors tends to infinity with the sample size and describe a simulation study using a pertaining R package. The results indicate significant advantage in time complexity and selection accuracy of our algorithm over Lasso-based methods described in the literature. Moreover, a version of DMR for generalized linear models is proposed

arXiv.org e-Print Archive

Crossref