A survey of popular R packages for cluster analysis
Cluster analysis is a set of statistical methods for discovering new group/class structure when exploring datasets. This article reviews the following popular libraries/commands in the R software language for applying different types of cluster analysis: from the stats library, the kmeans and hclust functions; the mclust library; the poLCA library; and the clustMD library. The packages/functions cover a variety of cluster analysis methods for continuous data, categorical data or a collection of the two. The contrasting methods in the different packages are briefly introduced and basic usage of the functions is discussed. The use of the different methods is compared and contrasted and then illustrated on example data. In the discussion, links to information on other available libraries for different clustering methods and extensions beyond basic clustering methods are given. The code for the worked examples in Section 2 is available at http://www.stats.gla.ac.uk/~nd29c/Software/ClusterReviewCode.
Non-parametric Bayesian modeling of complex networks
Modeling structure in complex networks using Bayesian non-parametrics makes
it possible to specify flexible model structures and infer the adequate model
complexity from the observed data. This paper provides a gentle introduction to
non-parametric Bayesian modeling of complex networks: Using an infinite mixture
model as running example we go through the steps of deriving the model as an
infinite limit of a finite parametric model, inferring the model parameters by
Markov chain Monte Carlo, and checking the model's fit and predictive
performance. We explain how advanced non-parametric models for complex networks
can be derived and point out relevant literature.
A Tutorial on Bayesian Nonparametric Models
A key problem in statistical modeling is model selection: how to choose a
model at an appropriate level of complexity. This problem appears in many
settings, most prominently in choosing the number of clusters in mixture models
or the number of factors in factor analysis. In this tutorial we describe
Bayesian nonparametric methods, a class of methods that side-steps this issue
by allowing the data to determine the complexity of the model. This tutorial is
a high-level introduction to Bayesian nonparametric methods and contains
several examples of their application. Comment: 28 pages, 8 figures
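
To illustrate how the data can determine the number of clusters, here is a minimal sketch of sampling from a Chinese restaurant process, a standard construction in Bayesian nonparametrics. The abstract does not name this process, so its use here is an assumption, and the function name `sample_crp`, the concentration parameter `alpha`, and the sample size are all chosen for illustration only.

```python
import random


def sample_crp(n_customers, alpha, seed=0):
    """Draw one seating arrangement from a Chinese restaurant process.

    alpha (> 0) is the concentration parameter: larger values tend to
    open more tables (clusters). All names here are illustrative.
    """
    rng = random.Random(seed)
    table_sizes = []   # table_sizes[k] = number of customers at table k
    assignments = []   # assignments[i] = table index of customer i
    for _ in range(n_customers):
        # An existing table k is chosen with weight table_sizes[k];
        # a new table is opened with weight alpha.
        weights = table_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(table_sizes):
            table_sizes.append(1)      # open a new table
        else:
            table_sizes[k] += 1        # join an existing table
        assignments.append(k)
    return assignments, table_sizes


assignments, table_sizes = sample_crp(n_customers=200, alpha=1.0)
print("number of clusters:", len(table_sizes))
print("cluster sizes:", table_sizes)
```

The number of occupied tables is not fixed in advance; it grows slowly with the number of customers, which is the sense in which the model's complexity adapts to the amount of data.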
Latent class analysis for segmenting preferences of investment bonds
Market segmentation is a key component of conjoint analysis which addresses consumer
preference heterogeneity. Members in a segment are assumed to be homogeneous in their
views and preferences when valuing an item but distinctly heterogeneous to members of other
segments. Latent class methodology is one of the several conjoint segmentation procedures
that overcome the limitations of aggregate analysis and a-priori segmentation. The main
benefit of Latent class models is that market segment membership and regression parameters
of each derived segment are estimated simultaneously. The Latent class model presented in
this paper uses mixtures of multivariate conditional normal distributions to analyze rating
data, where the likelihood is maximized using the EM algorithm. The application focuses on
customer preferences for investment bonds described by four attributes: currency, coupon
rate, redemption term and price. A number of demographic variables are used to generate
segments that are accessible and actionable.
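
As a rough illustration of maximizing a mixture likelihood with the EM algorithm, the sketch below fits a two-component univariate normal mixture. This is a deliberate simplification: the paper describes mixtures of multivariate conditional normal distributions, whereas this example has no covariates and a single dimension. The function names, the initialization, and the synthetic data are assumptions made for illustration and do not come from the paper.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_two_component(data, n_iter=100):
    """Fit a two-component normal mixture by EM (illustrative sketch)."""
    n = len(data)
    overall_mean = sum(data) / n
    overall_var = sum((x - overall_mean) ** 2 for x in data) / n
    # Crude initialization: means at the extremes, shared variance.
    means = [min(data), max(data)]
    variances = [overall_var, overall_var]
    weights = [0.5, 0.5]
    for _ in range(n_iter):
        # E-step: posterior probability that each point belongs to each segment.
        resp = []
        for x in data:
            p = [weights[k] * normal_pdf(x, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate segment sizes, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, 1e-6)  # guard against a collapsing variance
    return weights, means, variances


# Synthetic data from two well-separated groups (not data from the paper).
rng = random.Random(1)
data = [rng.gauss(0.0, 1.0) for _ in range(150)] + [rng.gauss(5.0, 1.0) for _ in range(150)]
weights, means, variances = em_two_component(data)
print("segment weights:", weights)
print("segment means:", means)
```

Segment membership (the posterior probabilities in the E-step) and the segment parameters (updated in the M-step) are estimated within the same iteration, which mirrors the simultaneous estimation the abstract highlights.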