106 research outputs found

    FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters

    flexmix provides infrastructure for flexible fitting of finite mixture models in R using the expectation-maximization (EM) algorithm or one of its variants. Version 2 extends the functionality of the package: concomitant variable models, as well as varying and constant parameters for the component-specific generalized linear regression models, can now be fitted. The application of the package is demonstrated on several examples, the implementation is described, and further examples illustrate how new drivers for the component-specific models and the concomitant variable models can be defined.
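
    To make the EM idea behind finite mixture fitting concrete, the following is a minimal sketch in Python. It is not the flexmix implementation or its API; it assumes a two-component univariate Gaussian mixture, and the function names (`normal_pdf`, `em_two_gaussians`), the initialisation, and the simulated data are illustrative assumptions only.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_gaussians(data, n_iter=200):
    """Fit a two-component univariate Gaussian mixture by plain EM (illustrative only)."""
    n = len(data)
    # Crude initialisation (an assumption of this sketch): means at the data extremes,
    # a common standard deviation, and equal component weights.
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    means = [min(data), max(data)]
    sds = [overall_sd, overall_sd]
    weights = [0.5, 0.5]

    for _ in range(n_iter):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for x in data:
            dens = [weights[k] * normal_pdf(x, means[k], sds[k]) for k in range(2)]
            total = sum(dens)
            resp.append([d / total for d in dens])

        # M-step: weighted updates of the weights, means and standard deviations.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            sds[k] = max(math.sqrt(var), 1e-6)  # guard against a collapsing component

    return weights, means, sds


# Simulated data: two well-separated clusters (hypothetical example values).
random.seed(1)
data = [random.gauss(0.0, 1.0) for _ in range(300)]
data += [random.gauss(6.0, 1.0) for _ in range(300)]

weights, means, sds = em_two_gaussians(data)
print("weights:", weights)
print("means:", means)
print("sds:", sds)
```

    The E-step computes posterior component probabilities and the M-step re-estimates the parameters from those weights. Replacing the weighted mean and variance updates with weighted regression fits would give the regression-mixture setting the abstract describes.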

    topicmodels: An R Package for Fitting Topic Models

    Topic models allow the probabilistic modeling of term frequency occurrences in documents. The fitted model can be used to estimate the similarity between documents as well as between a set of specified keywords using an additional layer of latent variables which are referred to as topics. The R package topicmodels provides basic infrastructure for fitting topic models based on data structures from the text mining package tm. The package includes interfaces to two algorithms for fitting topic models: the variational expectation-maximization algorithm provided by David M. Blei and co-authors and an algorithm using Gibbs sampling by Xuan-Hieu Phan and co-authors.

    Automatic Generation of Exams in R

    Package exams provides a framework for automatic generation of standardized statistical exams which is especially useful for large-scale exams. To employ the tools, users just need to supply a pool of exercises and a master file controlling the layout of the final PDF document. The exercises are specified in separate Sweave files (containing R code for data generation and LaTeX code for problem and solution description) and the master file is a LaTeX document with some additional control commands. This paper gives an overview of the main design aims and principles as well as strategies for adaptation and extension. Hands-on illustrations, based on example exercises and control files provided in the package, are presented to get new users started easily.

    Semi-parametric Regression under Model Uncertainty: Economic Applications

    Economic theory does not always specify the functional relationship between dependent and explanatory variables, or even isolate a particular set of covariates. This means that model uncertainty is pervasive in empirical economics. In this paper, we indicate how Bayesian semi-parametric regression methods in combination with stochastic search variable selection can be used to address two model uncertainties simultaneously: (i) the uncertainty with respect to the variables which should be included in the model and (ii) the uncertainty with respect to the functional form of their effects. The presented approach enables the simultaneous identification of robust linear and nonlinear effects. The additional insights gained are illustrated on applications in empirical economics, namely willingness to pay for housing, and cross-country growth regression.

    arules - A Computational Environment for Mining Association Rules and Frequent Item Sets

    Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
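
    As a rough illustration of what mining frequent itemsets involves, here is a small Apriori-style level-wise search in Python. It is a sketch under stated assumptions, not the arules interface or Borgelt's C implementation; the function name `frequent_itemsets`, the example baskets, and the support threshold are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({item for t in transactions for item in t})
    current = [frozenset([item]) for item in items]  # candidate 1-itemsets
    result = {}
    k = 1
    while current:
        # Keep only the candidates that reach the support threshold.
        frequent = {}
        for candidate in current:
            s = support(candidate)
            if s >= min_support:
                frequent[candidate] = s
        result.update(frequent)

        # Join frequent k-itemsets into (k+1)-candidates and prune any candidate
        # that has an infrequent k-subset (the Apriori property).
        k += 1
        candidates = set()
        for a, b in combinations(list(frequent), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in frequent for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        current = list(candidates)

    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk", "butter"},
]

result = frequent_itemsets(baskets, min_support=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)

# Confidence of the rule {bread} -> {milk}: support(bread, milk) / support(bread).
confidence = result[frozenset({"bread", "milk"})] / result[frozenset({"bread"})]
print("confidence(bread -> milk):", confidence)
```

    Association rules are then derived from the frequent itemsets by comparing supports, as in the confidence calculation at the end of the sketch.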

    Identifying Mixtures of Mixtures Using Bayesian Estimation

    The use of a finite mixture of normal distributions in model-based clustering makes it possible to capture non-Gaussian data clusters. However, identifying the clusters from the normal components is challenging and is in general achieved either by imposing constraints on the model or by using post-processing procedures. Within the Bayesian framework we propose a different approach based on sparse finite mixtures to achieve identifiability. We specify a hierarchical prior where the hyperparameters are carefully selected such that they reflect the cluster structure aimed at. In addition, this prior allows the model to be estimated using standard MCMC sampling methods. In combination with a post-processing approach which resolves the label switching issue and results in an identified model, our approach makes it possible to simultaneously (1) determine the number of clusters, (2) flexibly approximate the cluster distributions in a semi-parametric way using finite mixtures of normals and (3) identify cluster-specific parameters and classify observations. The proposed approach is illustrated in two simulation studies and on benchmark data sets.
