Neighborhood Graphs, Stripes and Shadow Plots for Cluster Visualization
Centroid-based partitioning cluster analysis is a popular method for
segmenting data into more homogeneous subgroups. Visualization can
help tremendously to understand the positions of these subgroups
relative to each other in higher dimensional spaces and to assess
the quality of partitions. In this paper we present several
improvements on existing cluster displays using neighborhood graphs
with edge weights based on cluster separation and convex hulls of
inner and outer cluster regions. A new display called shadow-stars
can be used to diagnose pairwise cluster separation with respect to
the distribution of the original data. Artificial data and two case
studies with real data are used to demonstrate the techniques.
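As a rough illustration of how pairwise separation between centroid-based clusters might be quantified, the following Python sketch computes a per-point "shadow value" and aggregates it into directed edge weights. It assumes the definition s(x) = 2 d(x, c1) / (d(x, c1) + d(x, c2)), where c1 and c2 are the closest and second-closest centroids, and it averages over the size of the closest cluster. The abstract does not spell out either formula, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def shadow_values(points, centroids):
    """Return (closest, second_closest, shadow) for every point.

    Assumed definition: shadow = 2*d1 / (d1 + d2), which is 0 at the
    closest centroid and 1 halfway between the two nearest centroids.
    Needs at least two centroids.
    """
    out = []
    for p in points:
        ranked = sorted((math.dist(p, c), k) for k, c in enumerate(centroids))
        (d1, k1), (d2, k2) = ranked[0], ranked[1]
        shadow = 2 * d1 / (d1 + d2) if d1 + d2 > 0 else 0.0
        out.append((k1, k2, shadow))
    return out


def edge_weights(shadows):
    """Directed weight (i, j): summed shadow values of points whose closest
    centroid is i and second-closest is j, divided by the size of cluster i
    (an assumed normalization)."""
    sizes = defaultdict(int)
    sums = defaultdict(float)
    for k1, k2, s in shadows:
        sizes[k1] += 1
        sums[(k1, k2)] += s
    return {edge: total / sizes[edge[0]] for edge, total in sums.items()}
```

Under these assumptions, a weight near 0 suggests that cluster i is well separated from cluster j, while larger weights indicate that many points of cluster i lie close to the boundary with cluster j.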
R behind the scenes: Using S the (un)usual way
R is not only a program for analyzing and visualizing data, it is an
open and programmable software environment. It can not only easily
access other programs written in a wide variety of languages, but also
be accessed itself from other programs. As such, it can be seen as the
computational Swiss army knife of statistics. Connecting a program to
R can be surprisingly simple, and once the connection is established,
perhaps the largest existing collection of statistical methodology is
available through a unified interface. Embedding R can save a lot of
human time by automating routine tasks, but more importantly, it often
gives a simple way of making our methods accessible to a much wider
audience.
FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R
FlexMix implements a general framework for fitting discrete mixtures of regression models in the R statistical computing environment. Three variants of the EM algorithm can be used for parameter estimation; regressors and responses may be multivariate with arbitrary dimension; data may be grouped, e.g., to account for multiple observations per individual; the usual formula interface of the S language is used for convenient model specification; and a modular concept of driver functions allows interfacing with many different types of regression models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering. FlexMix provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models.
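The division of labour described above, with a generic E-step and a user-supplied M-step, can be sketched in a few lines of Python. This is not FlexMix code or its API: `linear_driver` and `fit_mixture` are hypothetical names, the driver handles only a single regressor with Gaussian errors, and only the plain EM variant is shown.

```python
import math


def linear_driver(x, y, w):
    """M-step for one component: weighted least squares for y = a + b*x.

    Returns the fitted parameters and a density function used by the E-step.
    """
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx if sxx > 0 else 0.0
    a = my - b * mx
    var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    sigma = max(math.sqrt(var), 1e-6)  # floor avoids a degenerate component

    def density(xi, yi):
        r = (yi - a - b * xi) / sigma
        return math.exp(-0.5 * r * r) / (sigma * math.sqrt(2 * math.pi))

    return {"intercept": a, "slope": b, "sigma": sigma}, density


def fit_mixture(x, y, posteriors, driver=linear_driver, n_iter=100):
    """Plain EM: the framework owns the E-step, the driver owns the M-step.

    posteriors[i][j] is the initial probability that point i belongs to
    component j. n_iter must be at least 1.
    """
    n, k = len(x), len(posteriors[0])
    post = [list(row) for row in posteriors]
    for _ in range(n_iter):
        # M-step: prior weights plus one driver call per component.
        priors = [sum(row[j] for row in post) / n for j in range(k)]
        fits = [driver(x, y, [row[j] for row in post]) for j in range(k)]
        # E-step: posterior probabilities from prior times component density.
        for i in range(n):
            joint = [priors[j] * fits[j][1](x[i], y[i]) for j in range(k)]
            total = sum(joint)
            if total > 0:  # keep the old posteriors if every density underflows
                post[i] = [v / total for v in joint]
    return priors, [f[0] for f in fits], post
```

Swapping in a different `driver` (for example, one fitting a Poisson regression) would change the component model without touching `fit_mixture`, which is the design idea the abstract describes.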
Modelling Background Noise in Finite Mixtures of Generalized Linear Regression Models
In this paper we show how only a few outliers can completely break down EM-estimation of mixtures of regression models. A simple yet very effective way of dealing with this problem is to use a component where all regression parameters are fixed to zero to model the background noise. This noise component can be easily defined for different types of generalized linear models, has a familiar interpretation as the empty regression model, and is not very sensitive with respect to its own parameters.
Visualizing Gene Clusters using Neighborhood Graphs in R
The visualization of cluster solutions in gene expression data analysis gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. Neighborhood graphs allow for visual assessment of relationships between adjacent clusters. The number of clusters in gene expression data is, for biological reasons, rather large. As a linear projection of the data into 2 dimensions does not scale well in the number of clusters, there is a need for new visualization techniques using non-linear arrangement of the clusters. The new visualization tool is implemented in the open source statistical computing environment R. It is demonstrated on microarray data from yeast.
Creating R Packages: A Tutorial
This tutorial gives a practical introduction to creating R packages. We discuss how object-oriented programming and S formulas can be used to give R code the usual look and feel, how to start a package from a collection of R functions, and how to test the code once the package has been created. As a running example we use functions for standard linear regression analysis which are developed from scratch.
Generating Correlated Ordinal Random Values
Ordinal variables appear in many fields of statistical research. Since working with simulated data is an accepted technique to improve models or test results, there is a need for providing correlated ordinal random values with certain properties like marginal distribution and correlation structure. The present paper describes two methods for generating such values: binary conversion and a mean mapping approach. The algorithms of the two methods are described and some examples of the outcomes are shown.
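To make the goal concrete, the Python sketch below generates pairs of ordinal values with prescribed marginal distributions by thresholding correlated standard normal draws. This is a simplified stand-in, not the binary conversion or mean mapping algorithm of the paper: it does not adjust the latent correlation, so the correlation of the resulting ordinal values will generally be smaller than `latent_rho`. All function and parameter names are hypothetical.

```python
import bisect
import math
import random
from statistics import NormalDist


def thresholds(marginal):
    """Normal quantiles that cut the real line into cells whose
    probabilities match the given marginal distribution.
    Every category needs a strictly positive probability."""
    cuts, cum = [], 0.0
    for p in marginal[:-1]:
        cum += p
        cuts.append(NormalDist().inv_cdf(cum))
    return cuts


def correlated_ordinal_pairs(n, marginal_a, marginal_b, latent_rho, seed=0):
    """Draw n pairs of ordinal values coded 1, 2, ..., one per marginal."""
    rng = random.Random(seed)
    cuts_a, cuts_b = thresholds(marginal_a), thresholds(marginal_b)
    pairs = []
    for _ in range(n):
        # Two standard normals with correlation latent_rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = latent_rho * z1 + math.sqrt(1 - latent_rho ** 2) * rng.gauss(0.0, 1.0)
        # The category is the index of the cell each draw falls into.
        a = bisect.bisect_right(cuts_a, z1) + 1
        b = bisect.bisect_right(cuts_b, z2) + 1
        pairs.append((a, b))
    return pairs
```

For example, `correlated_ordinal_pairs(1000, [0.2, 0.5, 0.3], [0.5, 0.5], 0.7)` yields a three-category and a two-category variable that are positively associated; hitting an exact target correlation would require the kind of correction the paper's methods provide.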
Estimates of capital stocks and capital productivity in Austrian manufacturing industries, 1978-1994
We present gross, net and productive capital stock estimates for 20 industries of the Austrian manufacturing sector based on the perpetual inventory method for the period 1969-1994. The estimation of the net capital stocks and the volume index of capital services follows an integrated method derived from the neoclassical theory of investment. Based on the estimates we calculate capital intensity and capital productivity measures for the 20 industries and provide estimates of capital productivity developments. We find that capital productivity decreased for only 5 of the 20 industries. The other industries showed in part marked increases in both capital and labor productivity.
Keywords: Capital Services, Capital Productivity, Austria, Manufacturing
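The core recursion of the perpetual inventory method can be illustrated with a short Python sketch. It assumes geometric depreciation at a constant rate, K_t = (1 - delta) * K_{t-1} + I_t, and measures capital productivity as output divided by capital stock. The depreciation and retirement patterns actually used in the paper are not given in the abstract, and all names and figures below are hypothetical.

```python
def perpetual_inventory(investment, depreciation_rate, initial_stock=0.0):
    """Net capital stock series from an investment series.

    Assumes geometric depreciation: each period the previous stock loses
    a fixed share and the period's investment is added.
    """
    stocks = []
    stock = initial_stock
    for inv in investment:
        stock = (1.0 - depreciation_rate) * stock + inv
        stocks.append(stock)
    return stocks


def capital_productivity(output, capital):
    """Output per unit of capital stock, period by period."""
    return [q / k for q, k in zip(output, capital)]


# Illustrative figures only: investment that exactly offsets depreciation
# keeps the stock constant.
stocks = perpetual_inventory([10.0, 10.0, 10.0], depreciation_rate=0.1,
                             initial_stock=100.0)
productivity = capital_productivity([50.0, 55.0, 60.0], stocks)
```

With a constant capital stock, any growth in output shows up directly as rising capital productivity, which is the kind of development the study tracks per industry.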
FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters
flexmix provides infrastructure for flexible fitting of finite mixture models in R using the expectation-maximization (EM) algorithm or one of its variants. The functionality of the package has been enhanced: concomitant variable models, as well as varying and constant parameters for the component-specific generalized linear regression models, can now be fitted. The application of the package is demonstrated on several examples, the implementation is described, and examples are given to illustrate how new drivers for the component-specific models and the concomitant variable models can be defined.
Exploratory Analysis of Benchmark Experiments -- An Interactive Approach
The analysis of benchmark experiments consists in large part of exploratory methods, especially visualizations. In Eugster et al. [2008] we presented a comprehensive toolbox including the bench plot. This plot visualizes the behavior of the algorithms on the individual drawn learning and test samples according to specific performance measures. In this paper we show that an interactive version of the bench plot can easily uncover details and relations unseen with the static version.