Clustering Via Nonparametric Density Estimation: the R Package pdfCluster

Author: Azzalini Adelchi
Menardi Giovanna
Publication date: 28/01/2013

The R package pdfCluster performs cluster analysis based on a nonparametric estimate of the density of the observed variables. After summarizing the main aspects of the methodology, we describe the features and usage of the package, and finally illustrate its use with the aid of two datasets.
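To make the idea of clustering from a density estimate concrete, here is a toy one-dimensional sketch in Python. It is not the pdfCluster algorithm or its API: the function names, the Gaussian kernel, the bandwidth `h`, the threshold `level`, and the gap rule used for connectivity are all assumptions chosen for illustration.

```python
import math

def gaussian_kde(points, h):
    """Return a function x -> Gaussian kernel density estimate (bandwidth h)."""
    n = len(points)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / h) ** 2) for p in points)

    return density

def density_clusters(points, h, level):
    """Group the points whose estimated density exceeds `level`.

    Assumption for this sketch: two neighbouring high-density points belong
    to the same cluster when the gap between them is at most h. This is a
    crude 1D stand-in for "connected region of high density".
    """
    density = gaussian_kde(points, h)
    dense = sorted(p for p in points if density(p) > level)
    clusters = []
    for p in dense:
        if clusters and p - clusters[-1][-1] <= h:
            clusters[-1].append(p)   # close to the previous point: same cluster
        else:
            clusters.append([p])     # large gap: start a new cluster
    return clusters

# Two tight groups plus one isolated point at 2.6 (hypothetical data).
data = [0.0, 0.1, 0.2, 0.3, 5.0, 5.1, 5.2, 5.3, 2.6]
clusters = density_clusters(data, h=0.3, level=0.2)
print(clusters)
```

With these hypothetical values the isolated point falls below the density threshold and is left unassigned, while each tight group forms its own cluster.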

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

Journal of Statistical Software

Archivio istituzionale della ricerca - Università di Padova

Clustering via nonparametric density estimation: an application to microarray data.

Author: De Bin Riccardo
Risso Davide
Publication venue: Dipartimento di Scienze Statistiche
Publication date: 01/01/2010

Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables is much higher than the number of observations. Here, we present a novel approach to the clustering of microarray data via nonparametric density estimation, based on the following steps: (i) selection of relevant variables; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Applications to simulated and real data show promising results in comparison with those produced by two standard approaches, k-means and Mclust. In the simulation studies, our nonparametric approach performs comparably to models based on the normality assumption, even in Gaussian settings. On the other hand, in two real benchmark datasets, it outperforms the existing parametric approaches.
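The three steps can be sketched as a small pipeline. The abstract does not say how each step is carried out, so every concrete choice below is an assumption: variables are ranked by sample variance, the data are reduced to the first principal component by power iteration, and a largest-gap split stands in for the density-based clustering step. All function names and the data are hypothetical.

```python
import math

def variance(values):
    """Unbiased sample variance."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / (len(values) - 1)

def select_variables(rows, k):
    """Step (i): indices of the k highest-variance columns (assumed criterion)."""
    ranked = sorted(range(len(rows[0])),
                    key=lambda j: variance([r[j] for r in rows]),
                    reverse=True)
    return sorted(ranked[:k])

def first_component_scores(rows, iterations=200):
    """Step (ii): scores on the leading principal axis via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    w = [1.0] * d  # starting direction; assumed not orthogonal to the leading axis
    for _ in range(iterations):
        scores = [sum(c[j] * w[j] for j in range(d)) for c in centered]
        # Multiply by X^T X without forming the covariance matrix.
        w = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    return [sum(c[j] * w[j] for j in range(d)) for c in centered]

def split_at_largest_gap(scores):
    """Step (iii) stand-in: two groups separated by the widest gap in the scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    gaps = [scores[order[i + 1]] - scores[order[i]] for i in range(len(order) - 1)]
    cut = gaps.index(max(gaps))
    labels = [0] * len(scores)
    for rank, i in enumerate(order):
        labels[i] = 0 if rank <= cut else 1
    return labels

# Six observations, four variables; only columns 0 and 2 carry group structure.
rows = [
    [1.0, 0.50, 2.0, 0.10],
    [1.2, 0.52, 2.1, 0.11],
    [0.9, 0.49, 1.9, 0.09],
    [5.0, 0.51, 6.0, 0.10],
    [5.1, 0.50, 6.2, 0.12],
    [4.9, 0.48, 5.9, 0.10],
]
keep = select_variables(rows, k=2)
reduced = [[r[j] for j in keep] for r in rows]
labels = split_at_largest_gap(first_component_scores(reduced))
print(keep, labels)
```

Filtering first keeps the reduction step from being dominated by many uninformative variables, which is the concern when variables far outnumber observations.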

Archivio istituzionale della ricerca - Università di Padova

A novel approach to the clustering of microarray data via nonparametric density estimation

Author: Riccardo De Bin
Davide Risso
Publication venue: BioMed Central
Publication date: 01/01/2011

Background: Cluster analysis is a crucial tool in several biological and medical studies dealing with microarray data. Such studies pose challenging statistical problems due to dimensionality issues, since the number of variables can be much higher than the number of observations. Results: Here, we present a general framework to deal with the clustering of microarray data, based on a three-step procedure: (i) gene filtering; (ii) dimensionality reduction; (iii) clustering of observations in the reduced space. Via a nonparametric model-based clustering approach we obtain promising results both in simulated and real data. Conclusions: The proposed algorithm is a simple and effective tool for the clustering of microarray data, in an unsupervised setting.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Archivio istituzionale della ricerca - Università di Padova

Offline and Online Density Estimation for Large High-Dimensional Data

Author: Majdara Aref
Publication venue: Digital Commons @ Michigan Tech
Publication date: 01/01/2018

Density estimation has wide applications in machine learning and data analysis, including clustering, classification, multimodality analysis, bump hunting and anomaly detection. In high-dimensional spaces, the sparsity of data in local neighborhoods makes many parametric and nonparametric density estimation methods inefficient. This work presents the development of computationally efficient algorithms for high-dimensional density estimation, based on Bayesian sequential partitioning (BSP). A copula transform is used to separate the estimation of marginal and joint densities, with the purpose of reducing the computational complexity and estimation error. Using this separation, a parallel implementation of the density estimation algorithm on a 4-core CPU is presented. Some example applications of high-dimensional density estimation in density-based classification and clustering are also presented. Another challenge in density estimation arises with online sources of data, where data arrive over an open-ended and non-stationary stream. This calls for efficient algorithms for online density estimation. An online density estimator needs to provide up-to-date estimates of the density, subject to the available computing resources and the requirements of the application. In response to this, the BBSP method for online density estimation is introduced. It works by collecting and processing the data in blocks of fixed size, followed by a weighted averaging over the block-wise estimates of the density. The proper choice of block size is discussed via simulations for streams of synthetic and real datasets. Further, to improve efficiency in offline and online density estimation, a progressive update of the binary partitions in BBSP is proposed, which, as simulation results show, leads to improved accuracy as well as speed-up for various block sizes.
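The block-wise scheme described above (fixed-size blocks, one density estimate per block, weighted averaging of the block estimates) can be sketched as follows. This is not BBSP: a fixed-bin histogram replaces Bayesian sequential partitioning, and the class name, the `old_weight` parameter, and the exponential weighting rule are assumptions made for illustration.

```python
class BlockwiseHistogramDensity:
    """Online 1D density estimate built from fixed-size blocks of a stream."""

    def __init__(self, low, high, bins, block_size, old_weight=0.5):
        self.low, self.high, self.bins = low, high, bins
        self.block_size = block_size
        self.old_weight = old_weight      # weight kept by the previous estimate
        self.width = (high - low) / bins
        self.estimate = None              # density value per bin, once available
        self.buffer = []                  # samples of the block being collected

    def _block_density(self, block):
        """Histogram density of one block (integrates to 1 over [low, high])."""
        counts = [0] * self.bins
        for x in block:
            idx = min(int((x - self.low) / self.width), self.bins - 1)
            counts[idx] += 1
        return [c / (len(block) * self.width) for c in counts]

    def update(self, x):
        """Add one sample; refresh the estimate when a block is complete."""
        if not (self.low <= x <= self.high):
            return  # out-of-range samples are ignored in this sketch
        self.buffer.append(x)
        if len(self.buffer) == self.block_size:
            new = self._block_density(self.buffer)
            if self.estimate is None:
                self.estimate = new
            else:
                a = self.old_weight
                # A convex combination of two densities is still a density.
                self.estimate = [a * old + (1 - a) * cur
                                 for old, cur in zip(self.estimate, new)]
            self.buffer = []

    def pdf(self, x):
        """Current density estimate at x (0.0 outside the range or before any block)."""
        if self.estimate is None or not (self.low <= x <= self.high):
            return 0.0
        idx = min(int((x - self.low) / self.width), self.bins - 1)
        return self.estimate[idx]

# Hypothetical stream whose distribution shifts from low to high values.
est = BlockwiseHistogramDensity(low=0.0, high=1.0, bins=4, block_size=4)
for sample in [0.1, 0.1, 0.2, 0.3, 0.8, 0.9, 0.9, 0.6]:
    est.update(sample)
print(est.estimate)
```

A smaller `old_weight` lets the estimate follow a non-stationary stream more quickly, at the cost of noisier estimates; the block size involves a similar trade-off.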

Michigan Technological University

Signal detection in high energy physics via a semisupervised nonparametric approach

Author: Casa Alessandro
Menardi Giovanna
Publication date: 01/01/2017

Archivio istituzionale della ricerca - Università di Padova