Computationally Efficient Simulation of Queues: The R Package queuecomputer
Large networks of queueing systems model important real-world systems such as
MapReduce clusters, web-servers, hospitals, call centers and airport passenger
terminals. To model such systems accurately, we must infer queueing parameters
from data. Unfortunately, for many queueing networks there is no clear way to
proceed with parameter inference from data. Approximate Bayesian computation
could offer a straightforward way to infer parameters for such networks if we
could simulate data quickly enough.
We present a computationally efficient method for simulating from a very
general set of queueing networks with the R package queuecomputer. Remarkable
speedups of more than 2 orders of magnitude are observed relative to the
popular DES packages simmer and simpy. We replicate output from these packages
to validate the package.
The package is modular and integrates well with the popular R package dplyr.
Complex queueing networks with tandem, parallel and fork/join topologies can
easily be built with these two packages together. We show how to use this
package with two examples: a call center and an airport terminal.
Comment: Updated for queuecomputer_0.8.
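The abstract above does not spell out the simulation algorithm, so the following is only a generic sketch of how departure times in a first-come-first-served queue with several identical servers can be computed directly from arrival and service times, without a full discrete-event engine. The function name `queue_departures` and its arguments are hypothetical and are not the queuecomputer API.

```python
import heapq


def queue_departures(arrivals, services, n_servers=1):
    """Departure times for a first-come-first-served queue.

    Hypothetical helper for illustration; not the queuecomputer interface.
    arrivals[i] and services[i] are the arrival and service time of customer i.
    """
    # Serve customers in order of arrival.
    order = sorted(range(len(arrivals)), key=lambda i: arrivals[i])
    # Min-heap holding the time at which each server next becomes free.
    free_at = [0.0] * n_servers
    heapq.heapify(free_at)
    departures = [0.0] * len(arrivals)
    for i in order:
        server_free = heapq.heappop(free_at)    # earliest available server
        start = max(arrivals[i], server_free)   # wait if the server is busy
        departures[i] = start + services[i]
        heapq.heappush(free_at, departures[i])  # server is busy until departure
    return departures


if __name__ == "__main__":
    print(queue_departures([0.0, 1.0, 2.0], [3.0, 1.0, 1.0], n_servers=2))
```

Because each customer costs one heap pop and one push, the loop runs in O(n log K) time for n customers and K servers, which is the kind of cost that makes repeated simulation inside approximate Bayesian computation feasible.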
EMMIXcskew: an R Package for the Fitting of a Mixture of Canonical Fundamental Skew t-Distributions
This paper presents an R package EMMIXcskew for the fitting of the canonical
fundamental skew t-distribution (CFUST) and finite mixtures of this
distribution (FM-CFUST) via maximum likelihood (ML). The CFUST distribution
provides a flexible family of models to handle non-normal data, with parameters
for capturing skewness and heavy-tails in the data. It formally encompasses the
normal, t, and skew-normal distributions as special and/or limiting cases. A
few other versions of the skew t-distributions are also nested within the CFUST
distribution. In this paper, an Expectation-Maximization (EM) algorithm is
described for computing the ML estimates of the parameters of the FM-CFUST
model, and different strategies for initializing the algorithm are discussed
and illustrated. The methodology is implemented in the EMMIXcskew package, and
examples are presented using two real datasets. The EMMIXcskew package contains
functions to fit the FM-CFUST model, including procedures for generating
different initial values. Additional features include random sample generation
and contour visualization in 2D and 3D.
Bradley-Terry models in R: the BradleyTerry2 package
This is a short overview of the R add-on package BradleyTerry2, which facilitates the specification and fitting of Bradley-Terry logit, probit or cauchit models to pair-comparison data. Included are the standard 'unstructured' Bradley-Terry model, structured versions in which the parameters are related through a linear predictor to explanatory variables, and the possibility of an order or 'home advantage' effect or other 'contest-specific' effects. Model fitting is either by maximum likelihood, by penalized quasi-likelihood (for models which involve a random effect), or by bias-reduced maximum likelihood in which the first-order asymptotic bias of parameter estimates is eliminated. Also provided are a simple and efficient approach to handling missing covariate data, and suitably-defined residuals for diagnostic checking of the linear predictor.
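As an illustration of the standard 'unstructured' logit model mentioned above, in which player i beats player j with probability p_i / (p_i + p_j), the sketch below computes maximum-likelihood abilities with the classical minorization-maximization (Zermelo) iteration. This is an assumed, simplified stand-in: it is not how BradleyTerry2 fits models, and the function name `fit_bradley_terry` is hypothetical.

```python
def fit_bradley_terry(wins, n_iter=200):
    """Fit unstructured Bradley-Terry abilities by the MM (Zermelo) iteration.

    Hypothetical helper for illustration; not the BradleyTerry2 interface.
    wins[i][j] is the number of contests in which player i beat player j.
    Returns abilities p normalized to sum to 1, so that
    P(i beats j) = p[i] / (p[i] + p[j]).
    """
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            # Each opponent contributes (contests played) / (sum of abilities).
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        scale = sum(updated)
        p = [value / scale for value in updated]
    return p


if __name__ == "__main__":
    # Two players: the first wins 3 of 4 contests.
    print(fit_bradley_terry([[0, 3], [1, 0]]))
```

With only two players the estimate reduces to the observed win fraction, which gives a quick sanity check on the iteration.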
Learning Large-Scale Bayesian Networks with the sparsebn Package
Learning graphical models from data is an important problem with wide
applications, ranging from genomics to the social sciences. Nowadays, datasets
often have upwards of thousands (sometimes tens or hundreds of thousands) of
variables and far fewer samples. To meet this challenge, we have developed a
new R package called sparsebn for learning the structure of large, sparse
graphical models with a focus on Bayesian networks. While there are many
existing software packages for this task, this package focuses on the unique
setting of learning large networks from high-dimensional data, possibly with
interventions. As such, the methods provided place a premium on scalability and
consistency in a high-dimensional setting. Furthermore, in the presence of
interventions, the methods implemented here achieve the goal of learning a
causal network from data. Additionally, the sparsebn package is fully
compatible with existing software packages for network analysis.
Comment: To appear in the Journal of Statistical Software, 39 pages, 7 figures
mplot: An R Package for Graphical Model Stability and Variable Selection Procedures
The mplot package provides an easy to use implementation of model stability
and variable inclusion plots (Müller and Welsh 2010; Murray, Heritier, and
Müller 2013) as well as the adaptive fence (Jiang, Rao, Gu, and Nguyen 2008;
Jiang, Nguyen, and Rao 2009) for linear and generalised linear models. We
provide a number of innovations on the standard procedures and address many
practical implementation issues including the addition of redundant variables,
interactive visualisations and approximating logistic models with linear
models. An option is provided that combines our bootstrap approach with glmnet
for higher dimensional models. The plots and graphical user interface leverage
state of the art web technologies to facilitate interaction with the results.
The speed of implementation comes from the leaps package and cross-platform
multicore support.
Comment: 28 pages, 9 figures
Fitting Prediction Rule Ensembles with R Package pre
Prediction rule ensembles (PREs) are sparse collections of rules, offering
highly interpretable regression and classification models. This paper presents
the R package pre, which derives PREs through the methodology of Friedman and
Popescu (2008). The implementation and functionality of package pre are
described and illustrated through application on a dataset on the prediction of
depression. Furthermore, accuracy and sparsity of PREs are compared with those of
single trees, random forest and lasso regression in four benchmark datasets.
Results indicate that pre derives ensembles with predictive accuracy comparable
to that of random forests, while using a smaller number of variables for
prediction.
TMB: Automatic Differentiation and Laplace Approximation
TMB is an open source R package that enables quick implementation of complex
nonlinear random effect (latent variable) models in a manner similar to the
established AD Model Builder package (ADMB, admb-project.org). In addition, it
offers easy access to parallel computations. The user defines the joint
likelihood for the data and the random effects as a C++ template function,
while all the other operations are done in R; e.g., reading in the data. The
package evaluates and maximizes the Laplace approximation of the marginal
likelihood where the random effects are automatically integrated out. This
approximation, and its derivatives, are obtained using automatic
differentiation (up to order three) of the joint likelihood. The computations
are designed to be fast for problems with many random effects (~10^6) and
parameters (~10^3). Computation times using ADMB and TMB are compared on a
suite of examples ranging from simple models to large spatial models where the
random effects are a Gaussian random field. Speedups ranging from 1.5 to about
100 are obtained with increasing gains for large problems. The package and
examples are available at http://tmb-project.org
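For reference, the Laplace approximation described in the abstract above can be written out as follows. The symbols are chosen here for illustration: f(u, θ) denotes the negative joint log-likelihood of the data and the random effects u in R^n, and θ denotes the parameters.

```latex
% Marginal likelihood: the random effects u are integrated out.
L(\theta) = \int_{\mathbb{R}^n} \exp\bigl(-f(u,\theta)\bigr)\,du

% Inner optimization and Hessian of f with respect to u at the optimum.
\hat{u}(\theta) = \arg\min_{u} f(u,\theta), \qquad
H(\theta) = \left.\frac{\partial^2 f(u,\theta)}{\partial u\,\partial u^{\top}}\right|_{u=\hat{u}(\theta)}

% Laplace approximation of the marginal likelihood.
L^{*}(\theta) = (2\pi)^{n/2}\,\det\bigl(H(\theta)\bigr)^{-1/2}\,
\exp\bigl(-f(\hat{u}(\theta),\theta)\bigr)
```

The approximation is exact when f is quadratic in u (Gaussian random effects with a Gaussian response). Evaluating the gradient of L*(θ) involves third-order derivatives of f, which is consistent with the abstract's mention of automatic differentiation up to order three.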
Dynamic Model Averaging for Practitioners in Economics and Finance: The eDMA Package
Raftery, Karny, and Ettler (2010) introduce an estimation technique, which
they refer to as Dynamic Model Averaging (DMA). In their application, DMA is
used to predict the output strip thickness for a cold rolling mill, where the
output is measured with a time delay. Recently, DMA has also been shown to be useful
in macroeconomic and financial applications. In this paper, we present the eDMA
package for DMA estimation implemented in R. The eDMA package is especially
suited for practitioners in economics and finance, where typically a large
number of predictors are available. Our implementation is up to 133 times
faster than a standard implementation using a single-core CPU. Thus, with the
help of this package, practitioners are able to perform DMA on a standard PC
without resorting to large clusters, which are not easily available to all
researchers. We demonstrate the usefulness of this package through simulation
experiments and an empirical application using quarterly U.S. inflation data.
Comment: 21 pages, 5 figures, 2 tables
EMMIX-uskew: An R Package for Fitting Mixtures of Multivariate Skew t-distributions via the EM Algorithm
This paper describes an algorithm for fitting finite mixtures of unrestricted
Multivariate Skew t (FM-uMST) distributions. The package EMMIX-uskew implements
a closed-form expectation-maximization (EM) algorithm for computing the maximum
likelihood (ML) estimates of the parameters for the (unrestricted) FM-MST model
in R. EMMIX-uskew also supports visualization of fitted contours in two and
three dimensions, and random sample generation from a specified FM-uMST
distribution.
Finite mixtures of skew t-distributions have proven to be useful in modelling
heterogeneous data with asymmetric and heavy tail behaviour, for example,
datasets from flow cytometry. In recent years, various versions of mixtures
with multivariate skew t (MST) distributions have been proposed. However, these
models adopted some restricted characterizations of the component MST
distributions so that the E-step of the EM algorithm can be evaluated in closed
form. This paper focuses on mixtures with unrestricted MST components, and
describes an iterative algorithm for the computation of the ML estimates of its
model parameters.
The usefulness of the proposed algorithm is demonstrated in three
applications to real data sets. The first example illustrates the use of the
main function fmmst in the package by fitting an MST distribution to a bivariate
unimodal flow cytometric sample. The second example fits a mixture of MST
distributions to the Australian Institute of Sport (AIS) data, and demonstrates
that EMMIX-uskew can provide better clustering results than mixtures with
restricted MST components. In the third example, EMMIX-uskew is applied to
classify cells in a trivariate flow cytometric dataset. Comparisons with other
available methods suggest that the EMMIX-uskew result achieved a lower
misclassification rate with respect to the labels given by benchmark gating
analysis.
archivist: An R Package for Managing, Recording and Restoring Data Analysis Results
Everything that exists in R is an object [Chambers2016]. This article
examines what would be possible if we kept copies of all R objects that have
ever been created, including not only the objects but also their properties,
meta-data, relations with other objects and information about the context in
which they were created.
We introduce archivist, an R package designed to improve the management of
results of data analysis. Key functionalities of this package include: (i)
management of local and remote repositories which contain R objects and their
meta-data (objects' properties and relations between them); (ii) archiving R
objects to repositories; (iii) sharing and retrieving objects (and their
pedigree) by their unique hooks; (iv) searching for objects with specific
properties or relations to other objects; (v) verification of an object's identity
and the context of its creation.
The presented archivist package extends, in combination with packages such
as knitr and Sweave, the reproducible research paradigm by creating new ways to
retrieve and validate previously calculated objects. These new features give a
variety of opportunities such as: sharing R objects within reports or articles;
adding hooks to R objects in table or figure captions; interactive exploration
of object repositories; caching function calls with their results; retrieving
an object's pedigree (information about how the object was created); automated
tracking of the performance of considered models; and restoring R libraries to the
state in which the object was archived.
Comment: Submitted to JSS in 2015, conditionally accepted