106 research outputs found
FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters
flexmix provides infrastructure for flexible fitting of finite mixture models in R using the expectation-maximization (EM) algorithm or one of its variants. The functionality of the package was enhanced. Now concomitant variable models as well as varying and constant parameters for the component specific generalized linear regression models can be fitted. The application of the package is demonstrated on several examples, the implementation described and examples given to illustrate how new drivers for the component specific models and the concomitant variable models can be defined.
topicmodels: An R Package for Fitting Topic Models
Topic models allow the probabilistic modeling of term frequency occurrences in documents. The fitted model can be used to estimate the similarity between documents as well as between a set of specified keywords using an additional layer of latent variables which are referred to as topics. The R package topicmodels provides basic infrastructure for fitting topic models based on data structures from the text mining package tm. The package includes interfaces to two algorithms for fitting topic models: the variational expectation-maximization algorithm provided by David M. Blei and co-authors and an algorithm using Gibbs sampling by Xuan-Hieu Phan and co-authors.
Automatic Generation of Exams in R
Package exams provides a framework for automatic generation of standardized statistical exams which is especially useful for large-scale exams. To employ the tools, users just need to supply a pool of exercises and a master file controlling the layout of the final PDF document. The exercises are specified in separate Sweave files (containing R code for data generation and LaTeX code for problem and solution description) and the master file is a LaTeX document with some additional control commands. This paper gives an overview of the main design aims and principles as well as strategies for adaptation and extension. Hands-on illustrations---based on example exercises and control files provided in the package---are presented to get new users started easily.
Semi-parametric Regression under Model Uncertainty: Economic Applications
Economic theory does not always specify the functional relationship between dependent and explanatory variables, or even isolate a particular set of covariates. This means that model uncertainty is pervasive in empirical economics. In this paper, we indicate how Bayesian semi-parametric regression methods in combination with stochastic search variable selection can be used to address two model uncertainties simultaneously: (i) the uncertainty with respect to the variables which should be included in the model and (ii) the uncertainty with respect to the functional form of their effects. The presented approach enables the simultaneous identification of robust linear and nonlinear effects. The additional insights gained are illustrated on applications in empirical economics, namely willingness to pay for housing, and cross-country growth regression
arules - A Computational Environment for Mining Association Rules and Frequent Item Sets
Mining frequent itemsets and association rules is a popular and well researched approach for discovering interesting relationships between variables in large databases. The R package arules presented in this paper provides a basic infrastructure for creating and manipulating input data sets and for analyzing the resulting itemsets and rules. The package also includes interfaces to two fast mining algorithms, the popular C implementations of Apriori and Eclat by Christian Borgelt. These algorithms can be used to mine frequent itemsets, maximal frequent itemsets, closed frequent itemsets and association rules.
Identifying Mixtures of Mixtures Using Bayesian Estimation
The use of a finite mixture of normal distributions in model-based clustering
allows to capture non-Gaussian data clusters. However, identifying the clusters
from the normal components is challenging and in general either achieved by
imposing constraints on the model or by using post-processing procedures.
Within the Bayesian framework we propose a different approach based on sparse
finite mixtures to achieve identifiability. We specify a hierarchical prior
where the hyperparameters are carefully selected such that they are reflective
of the cluster structure aimed at. In addition this prior allows to estimate
the model using standard MCMC sampling methods. In combination with a
post-processing approach which resolves the label switching issue and results
in an identified model, our approach allows to simultaneously (1) determine the
number of clusters, (2) flexibly approximate the cluster distributions in a
semi-parametric way using finite mixtures of normals and (3) identify
cluster-specific parameters and classify observations. The proposed approach is
illustrated in two simulation studies and on benchmark data sets.Comment: 49 page
Automatic Generation of Exams in R
Package exams provides a framework for automatic generation of standardized statistical exams which is especially useful for large-scale exams. To employ the tools, users just need to supply a pool of exercises and a master file controlling the layout of the final PDF document. The exercises are specified in separate Sweave files (containing R code for data generation and LaTeX code for problem and solution description) and the master file is a LaTeX document with some additional control commands. This paper gives an overview of the main design aims and principles as well as strategies for adaptation and extension. Hands-on illustrations - based on example exercises and control files provided in the package - are presented to get new users started easily
Amos-type bounds for modified Bessel function ratios.
(please take a look at the pdf
FlexMix Version 2: Finite Mixtures with Concomitant Variables and Varying and Constant Parameters
flexmix provides infrastructure for flexible fitting of finite mixture models in R using the expectation-maximization (EM) algorithm or one of its variants. The functionality of the package was enhanced. Now concomitant variable models as well as varying and constant parameters for the component specific generalized linear regression models can be fitted. The application of the package is demonstrated on several examples, the implementation described and examples given to illustrate how new drivers for the component specific models and the concomitant variable models can be defined
- …