    Neighborhood Graphs, Stripes and Shadow Plots for Cluster Visualization

    Centroid-based partitioning cluster analysis is a popular method for segmenting data into more homogeneous subgroups. Visualization can help tremendously to understand the positions of these subgroups relative to each other in higher-dimensional spaces and to assess the quality of partitions. In this paper we present several improvements on existing cluster displays using neighborhood graphs with edge weights based on cluster separation and convex hulls of inner and outer cluster regions. A new display called shadow-stars can be used to diagnose pairwise cluster separation with respect to the distribution of the original data. Artificial data and two case studies with real data are used to demonstrate the techniques.

    R behind the scenes: Using S the (un)usual way

    R is not only a program for analyzing and visualizing data; it is also an open and programmable software environment. It can easily access other programs written in a wide variety of languages, and it can itself be accessed from other programs. As such, it can be seen as the computational Swiss army knife of statistics. Connecting a program to R can be surprisingly simple, and once the connection is established, perhaps the largest existing collection of statistical methodology is available through a unified interface. Embedding R can save a lot of human time by automating routine tasks, but more importantly, it often gives a simple way of making our methods accessible to a much wider audience.

    FlexMix: A General Framework for Finite Mixture Models and Latent Class Regression in R

    FlexMix implements a general framework for fitting discrete mixtures of regression models in the R statistical computing environment: three variants of the EM algorithm can be used for parameter estimation; regressors and responses may be multivariate with arbitrary dimension; data may be grouped, e.g., to account for multiple observations per individual; the usual formula interface of the S language is used for convenient model specification; and a modular concept of driver functions allows interfacing with many different types of regression models. Existing drivers implement mixtures of standard linear models, generalized linear models and model-based clustering. FlexMix provides the E-step and all data handling, while the M-step can be supplied by the user to easily define new models.
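
The division of labor described in this abstract (a fixed E-step, a pluggable M-step) can be illustrated with a minimal sketch. This is plain Python, not FlexMix or its R API: the names `e_step`, `weighted_line_fit`, and `fit_mixture` are hypothetical, the model is restricted to Gaussian simple linear regressions, and the data are synthetic.

```python
import math
import random


def normal_pdf(y, mean, sd):
    """Density of N(mean, sd^2) at y."""
    z = (y - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def e_step(x, y, comps, priors):
    """Return posterior component probabilities and the log-likelihood."""
    post, loglik = [], 0.0
    for xi, yi in zip(x, y):
        dens = [p * normal_pdf(yi, a + b * xi, sd)
                for p, (a, b, sd) in zip(priors, comps)]
        total = sum(dens)
        loglik += math.log(total)
        post.append([d / total for d in dens])
    return post, loglik


def weighted_line_fit(x, y, w):
    """Example M-step driver: weighted least squares for one component."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = my - b * mx
    var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return a, b, max(math.sqrt(var), 1e-6)  # floor guards against sd == 0


def fit_mixture(x, y, comps, n_iter=50, m_step=weighted_line_fit):
    """Plain EM; `comps` holds starting (intercept, slope, sd) tuples."""
    k = len(comps)
    priors = [1.0 / k] * k
    history = []  # log-likelihood at the start of each iteration
    for _ in range(n_iter):
        post, loglik = e_step(x, y, comps, priors)
        history.append(loglik)
        priors = [sum(p[j] for p in post) / len(x) for j in range(k)]
        comps = [m_step(x, y, [p[j] for p in post]) for j in range(k)]
    return comps, priors, history


# Synthetic data from two made-up regression lines (illustration only).
rng = random.Random(1)
x = [rng.uniform(0.0, 5.0) for _ in range(200)]
y = [(1.0 + 2.0 * xi if i % 2 == 0 else 10.0 - 2.0 * xi) + rng.gauss(0.0, 0.5)
     for i, xi in enumerate(x)]
start = [(0.0, 1.0, 2.0), (8.0, -1.0, 2.0)]
comps, priors, history = fit_mixture(x, y, start)
```

Passing a different function as `m_step` swaps the component model without touching the E-step, which is the design idea the abstract attributes to FlexMix's driver functions.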

    Modelling Background Noise in Finite Mixtures of Generalized Linear Regression Models

    In this paper we show how only a few outliers can completely break down EM-estimation of mixtures of regression models. A simple yet very effective way of dealing with this problem is to use a component where all regression parameters are fixed to zero to model the background noise. This noise component can be easily defined for different types of generalized linear models, has a familiar interpretation as the empty regression model, and is not very sensitive with respect to its own parameters.

    Visualizing Gene Clusters using Neighborhood Graphs in R

    The visualization of cluster solutions in gene expression data analysis gives practitioners an understanding of the cluster structure of their data and makes it easier to interpret the cluster results. Neighborhood graphs allow for visual assessment of relationships between adjacent clusters. For biological reasons, the number of clusters in gene expression data is rather large. As a linear projection of the data into 2 dimensions does not scale well in the number of clusters, there is a need for new visualization techniques using non-linear arrangement of the clusters. The new visualization tool is implemented in the open source statistical computing environment R. It is demonstrated on microarray data from yeast.

    Creating R Packages: A Tutorial

    This tutorial gives a practical introduction to creating R packages. We discuss how object-oriented programming and S formulas can be used to give R code the usual look and feel, how to start a package from a collection of R functions, and how to test the code once the package has been created. As a running example we use functions for standard linear regression analysis, which are developed from scratch.

    Generating Correlated Ordinal Random Values

    Ordinal variables appear in many fields of statistical research. Since working with simulated data is an accepted technique to improve models or test results, there is a need for providing correlated ordinal random values with certain properties, such as marginal distribution and correlation structure. The present paper describes two methods for generating such values: binary conversion and a mean mapping approach. The algorithms of the two methods are described and some examples of the outcomes are shown.
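
To make the task in this abstract concrete, here is a minimal sketch of one generic way to draw correlated ordinal pairs with given marginals: cutting a latent bivariate normal at quantiles of the target marginal distributions. It is not claimed to reproduce either of the paper's two methods. All function names are hypothetical, every category probability is assumed to be strictly positive, and the correlation of the ordinal output will generally differ from `latent_corr`.

```python
import random
from statistics import NormalDist


def thresholds(marginal_probs):
    """Latent-normal cut points for category probabilities that sum to 1."""
    cuts, cumulative = [], 0.0
    for p in marginal_probs[:-1]:  # the last category needs no upper cut
        cumulative += p
        cuts.append(NormalDist().inv_cdf(cumulative))
    return cuts


def to_category(z, cuts):
    """Map a latent value to a 1-based ordinal category."""
    return 1 + sum(z > c for c in cuts)


def correlated_ordinal_pairs(n, probs_x, probs_y, latent_corr, seed=0):
    """Draw n ordinal pairs by thresholding a correlated normal pair."""
    if not -1.0 <= latent_corr <= 1.0:
        raise ValueError("latent_corr must lie in [-1, 1]")
    rng = random.Random(seed)
    cuts_x, cuts_y = thresholds(probs_x), thresholds(probs_y)
    scale = (1.0 - latent_corr ** 2) ** 0.5
    pairs = []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        # z2 is standard normal with correlation latent_corr to z1.
        z2 = latent_corr * z1 + scale * rng.gauss(0.0, 1.0)
        pairs.append((to_category(z1, cuts_x), to_category(z2, cuts_y)))
    return pairs
```

For example, `correlated_ordinal_pairs(1000, [0.2, 0.3, 0.5], [0.2, 0.3, 0.5], 0.6)` yields pairs of categories in 1..3 whose marginal frequencies approximate the given probabilities.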

    Estimates of capital stocks and capital productivity in Austrian manufacturing industries, 1978-1994

    We present gross, net and productive capital stock estimates for 20 industries of the Austrian manufacturing sector based on the perpetual inventory method for the period 1969-1994. The estimation of the net capital stocks and the volume index of capital services follows an integrated method derived from the neoclassical theory of investment. Based on the estimates we calculate capital intensity and capital productivity measures for the 20 industries and provide estimates of capital productivity developments. We find that capital productivity decreased for only 5 of the 20 industries. The other industries showed in part marked increases in both capital and labor productivity.
    Keywords: Capital Services, Capital Productivity, Austria, Manufacturing
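
The perpetual inventory method named in this abstract accumulates a capital stock from past investment. A minimal sketch of its textbook form, K_t = (1 - δ) K_{t-1} + I_t, follows. The constant geometric depreciation rate, the function names, and the numbers are assumptions for illustration; the paper's integrated method for net stocks and capital services is not reproduced here.

```python
def perpetual_inventory(initial_stock, investments, depreciation_rate):
    """Return the capital stock at the end of each period.

    Applies K_t = (1 - delta) * K_{t-1} + I_t with a constant rate delta.
    """
    if not 0.0 <= depreciation_rate <= 1.0:
        raise ValueError("depreciation_rate must lie in [0, 1]")
    stocks = []
    stock = initial_stock
    for investment in investments:
        stock = (1.0 - depreciation_rate) * stock + investment
        stocks.append(stock)
    return stocks


def capital_productivity(outputs, stocks):
    """Output per unit of capital stock for each period."""
    return [y / k for y, k in zip(outputs, stocks)]


# Made-up figures, not data from the paper.
stocks = perpetual_inventory(100.0, [20.0, 20.0, 20.0], 0.25)
productivity = capital_productivity([190.0, 182.5, 176.875], stocks)
```

With these figures the stock falls from 95.0 to 91.25 to 88.4375, because investment of 20 per period does not offset 25% depreciation of the existing stock.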

    FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters

    flexmix provides infrastructure for flexible fitting of finite mixture models in R using the expectation-maximization (EM) algorithm or one of its variants. The functionality of the package has been enhanced: concomitant variable models as well as varying and constant parameters for the component-specific generalized linear regression models can now be fitted. The application of the package is demonstrated on several examples, the implementation is described, and examples are given to illustrate how new drivers for the component-specific models and the concomitant variable models can be defined.

    Exploratory Analysis of Benchmark Experiments -- An Interactive Approach

    The analysis of benchmark experiments consists in large part of exploratory methods, especially visualizations. In Eugster et al. [2008] we presented a comprehensive toolbox including the bench plot. This plot visualizes the behavior of the algorithms on the individual drawn learning and test samples according to specific performance measures. In this paper we show that an interactive version of the bench plot can easily uncover details and relations unseen with the static version.