Journal of Statistical Software
GLMcat: An R Package for Generalized Linear Models for Categorical Responses
In statistical modeling, there is a wide variety of generalized linear models for categorical response variables (nominal or ordinal responses); yet, no software embraces all of these models together in a unique and generic framework. We propose and present GLMcat, an R package for estimating generalized linear models under the unified specification (r, F, Z), where r represents the ratio of probabilities (reference, cumulative, adjacent, or sequential), F the cumulative distribution function used for the link, and Z the design matrix. All classical models (and their variations) for categorical data can be written as an (r, F, Z) triplet and can therefore be fitted with GLMcat. The functions in the package are intuitive and user-friendly. For each of the three components there are multiple alternatives, from which the user should carefully select those that best address the objectives of the analysis. The main strengths of the GLMcat package are the possibility of choosing from a large number of link functions (defined by the composition of F and r) and the simplicity of setting constraints on the linear predictor, either on the intercepts or on the slopes. This paper provides a methodological and practical guide for the appropriate selection of a model, considering the concordance between the nature of the data and the properties of the model.
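To make the (r, F, Z) idea concrete, the sketch below composes a logistic F with the cumulative ratio r to turn a linear predictor into category probabilities. The variable names and helper function are illustrative only and do not reflect the GLMcat interface.

```python
# A minimal sketch of the (r, F, Z) composition for the cumulative ratio with a
# logistic F: P(Y <= j | x) = F(alpha_j + x'beta); category probabilities then
# follow by differencing. Illustrative only; not the GLMcat API.
import numpy as np
from scipy.stats import logistic

def cumulative_probs(x, alpha, beta, F=logistic.cdf):
    """Category probabilities under a cumulative-ratio (proportional odds) model."""
    eta = alpha + x @ beta                  # one linear predictor per cutpoint
    cum = np.concatenate(([0.0], F(eta), [1.0]))
    return np.diff(cum)                     # P(Y = 1), ..., P(Y = J)

x = np.array([0.5, -1.2])                   # covariates for one observation
alpha = np.array([-1.0, 0.3, 1.5])          # increasing cutpoints (J = 4 categories)
beta = np.array([0.8, -0.4])                # slopes shared across categories
print(cumulative_probs(x, alpha, beta))     # nonnegative, sums to 1
```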
ebnm: An R Package for Solving the Empirical Bayes Normal Means Problem Using a Variety of Prior Families
The empirical Bayes normal means (EBNM) model is important to many areas of statistics, including (but not limited to) multiple testing, wavelet denoising, and gene expression analysis. There are several existing software packages that can fit EBNM models under different prior assumptions and using different algorithms. However, the differences across interfaces complicate direct comparisons, and a number of important prior assumptions do not yet have implementations. Motivated by these issues, we developed the R package ebnm, which provides a unified interface for efficiently fitting EBNM models using a variety of prior assumptions, including nonparametric approaches. In some cases, we incorporated existing implementations into ebnm; in others, we implemented new fitting procedures, with an emphasis on speed and numerical stability. We illustrate the use of ebnm in a detailed analysis of baseball statistics. By providing a unified and easily extensible interface, ebnm can facilitate development of new methods in statistics, genetics, and other areas; as an example, we briefly discuss the R package flashier, which harnesses ebnm for flexible and robust matrix factorization.
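As a rough illustration of the EBNM computation under one of the simplest prior families (a zero-mean normal prior), the sketch below estimates the prior variance by maximizing the marginal likelihood and then shrinks each observation toward zero. It shows the model being solved, not the ebnm package interface.

```python
# A sketch of the EBNM problem under a zero-mean normal prior: the prior
# variance is estimated by maximizing the marginal likelihood, and each
# observation is then replaced by its posterior mean. Illustrative only;
# this is not the ebnm package interface.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

rng = np.random.default_rng(1)
theta = rng.normal(0, 2, size=500)              # unknown "true" means
s = np.full(500, 1.0)                           # known standard errors
x = rng.normal(theta, s)                        # observed noisy estimates

def neg_marginal_loglik(log_a2):                # marginally, x_i ~ N(0, a^2 + s_i^2)
    a2 = np.exp(log_a2)
    return -norm.logpdf(x, scale=np.sqrt(a2 + s**2)).sum()

a2_hat = np.exp(minimize_scalar(neg_marginal_loglik, bounds=(-10.0, 10.0),
                                method="bounded").x)
posterior_mean = x * a2_hat / (a2_hat + s**2)   # empirical Bayes shrinkage
print(a2_hat, np.mean((posterior_mean - theta) ** 2) < np.mean((x - theta) ** 2))
```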
dame-flame: A Python Package Providing Fast Interpretable Matching for Causal Inference
dame-flame is a Python package for performing matching for observational causal inference on datasets containing discrete covariates. The package implements the dynamic almost matching exactly (DAME) and fast, large-scale almost matching exactly (FLAME) algorithms, which match treatment and control units on subsets of the covariates. The resulting matched groups are interpretable, because the matches are made directly on the covariates, and high quality, because machine learning, rather than human input, is used to determine which covariates are important to match on. The package provides several adjustable parameters to adapt the algorithms to specific applications, and it can calculate treatment effects after matching. The most recent source code of the implementation is available at https://github.com/almost-matching-exactly/DAME-FLAME-Python-Package
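The sketch below conveys the matching idea in plain pandas: units are grouped by exact values of a chosen covariate subset, and a treatment effect is averaged over groups containing both treated and control units. Column names are hypothetical, and the code omits the machine-learned choice of covariate subsets; it is not the dame-flame API.

```python
# Conceptual almost-exact matching on a fixed subset of discrete covariates,
# with a weighted average of within-group treated-vs-control contrasts.
# Column names are hypothetical; this is not the dame-flame API.
import pandas as pd

def matched_group_ate(df, covariates, treatment="treated", outcome="y"):
    """ATE from groups containing at least one treated and one control unit."""
    effects, weights = [], []
    for _, group in df.groupby(covariates):
        t = group.loc[group[treatment] == 1, outcome]
        c = group.loc[group[treatment] == 0, outcome]
        if len(t) and len(c):                      # usable matched group
            effects.append(t.mean() - c.mean())
            weights.append(len(group))
    return sum(e * w for e, w in zip(effects, weights)) / sum(weights)

df = pd.DataFrame({"x1": [0, 0, 1, 1, 0, 1], "x2": [1, 1, 1, 1, 0, 0],
                   "treated": [1, 0, 1, 0, 1, 0],
                   "y": [3.0, 1.0, 5.0, 2.0, 4.0, 1.5]})
print(matched_group_ate(df, ["x1", "x2"]))         # 2.5 on this toy data
```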
hibayes: An R Package to Fit Individual-Level, Summary-Level and Single-Step Bayesian Regression Models for Genomic Prediction and Genome-Wide Association Studies
With the rapid development of sequencing technology, the cost of individual genotyping has dropped dramatically, and genomic prediction and genome-wide association studies are now widely used to predict unknown phenotypes and to locate candidate genes for economically important traits in animals and plants and, increasingly, for human diseases. Developing new statistical models that improve prediction accuracy and mapping precision for traits with various genetic architectures has long been a central topic in these two research domains. The Bayesian regression model (BRM) has played a crucial role in the past decade and has been used widely in genetic analyses owing to its flexible assumptions about the unknown genetic architecture of complex traits. To fully utilize the available data from either a self-designed experimental population or a public database, statistical geneticists have continually extended the fitting capacity of BRMs, and a series of new methodologies have been proposed for different application scenarios. Here we introduce the R package hibayes, a software tool that fits individual-level, summary-level, and single-step Bayesian regression models. It includes the richest set of methods implemented to date, covering most of the functionality needed for genomic prediction and genome-wide association studies, and can therefore address a wide range of research problems while remaining easy to learn and flexible to use. We believe that hibayes will facilitate both academic research and the practical application of statistical genetics in humans, plants, and animals.
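As an illustration of the kind of individual-level Bayesian regression used in genomic prediction, the following sketch runs a small Gibbs sampler for a "Bayesian ridge" marker-effect model (phenotype = intercept + marker effects + error, with a common normal prior on all marker effects) on simulated genotypes. The priors and settings are arbitrary, and the code does not use the hibayes interface.

```python
# An illustrative Gibbs sampler for a Bayesian ridge marker-effect model.
# Conceptual example on simulated data; not the hibayes API.
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 50
X = rng.binomial(2, 0.3, size=(n, m)).astype(float)   # marker genotypes coded 0/1/2
y = 1.0 + X @ rng.normal(0, 0.3, size=m) + rng.normal(0, 1.0, size=n)

mu, b, sig2_b, sig2_e = y.mean(), np.zeros(m), 0.1, 1.0
post_b = np.zeros(m)
for it in range(2000):
    # marker effects: joint conjugate multivariate-normal update
    A = X.T @ X / sig2_e + np.eye(m) / sig2_b
    L = np.linalg.cholesky(A)
    b = (np.linalg.solve(A, X.T @ (y - mu) / sig2_e)
         + np.linalg.solve(L.T, rng.normal(size=m)))
    # intercept and variance components (normal / inverse-gamma full conditionals)
    resid = y - X @ b
    mu = rng.normal(resid.mean(), np.sqrt(sig2_e / n))
    resid -= mu
    sig2_b = 1 / rng.gamma(2 + m / 2, 1 / (1 + b @ b / 2))
    sig2_e = 1 / rng.gamma(2 + n / 2, 1 / (1 + resid @ resid / 2))
    if it >= 1000:
        post_b += b / 1000                             # posterior mean of marker effects
```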
Parsimoniously Fitting Large Multivariate Random Effects in glmmTMB
Multivariate random effects with unstructured variance-covariance matrices of large dimension q can be a major challenge to estimate. In this paper, we introduce a new implementation of a reduced-rank approach to fitting large-dimensional multivariate random effects by writing them as a linear combination of d < q latent variables. By adding reduced-rank functionality to the package glmmTMB, we extend the range of mixed models available to include random effects of dimensions that were previously infeasible. We apply the reduced-rank random effect to two examples, estimating a generalized latent variable model for multivariate abundance data and a random-slopes model.
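The sketch below illustrates the reduced-rank construction itself: a q-dimensional random effect is written as a loadings matrix times d < q latent variables, so its covariance has rank d and far fewer parameters than an unstructured covariance. The code is purely conceptual; in glmmTMB the structure is requested through a dedicated reduced-rank covariance term in the random-effects formula.

```python
# A numpy sketch of the reduced-rank construction: b = Lambda u, with a q x d
# loadings matrix Lambda and d < q latent variables u ~ N(0, I_d), so that
# cov(b) = Lambda Lambda' has rank d. Illustrative only; it does not call glmmTMB.
import numpy as np

rng = np.random.default_rng(42)
q, d = 30, 3
Lambda = np.tril(rng.normal(size=(q, d)))      # lower-triangular loadings (identifiable form)
u = rng.normal(size=(1000, d))                 # latent variables for 1000 groups
b = u @ Lambda.T                               # implied q-dimensional random effects

print(np.linalg.matrix_rank(np.cov(b, rowvar=False)))          # rank d = 3
print(q * (q + 1) // 2, "unstructured parameters vs",
      q * d - d * (d - 1) // 2, "reduced-rank parameters")     # 465 vs 87
```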
TSCI: Two Stage Curvature Identification for Causal Inference with Invalid Instruments in R
TSCI implements treatment effect estimation from observational data under invalid instruments in the R statistical computing environment. Existing instrumental variable approaches rely on arguably strong and untestable identification assumptions, which limits their practical application. TSCI does not require the classical instrumental variable identification conditions and is effective even if all instruments are invalid. TSCI implements a two-stage algorithm. In the first stage, machine learning is used to cope with nonlinearities and interactions in the treatment model. In the second stage, a space to capture the instrument violations is selected in a data-adaptive way. These violations are then projected out to estimate the treatment effect.
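To convey the two-stage logic, the following simplified sketch fits the treatment model with a random forest (stage one) and then regresses the outcome on the out-of-bag fitted treatment together with a fixed polynomial basis standing in for the violation space (stage two). The data, basis, and estimator are illustrative simplifications; TSCI selects the violation space data-adaptively and applies bias corrections omitted here, and none of this code uses the package's API.

```python
# A simplified two-stage illustration of estimation with an invalid instrument:
# stage one fits D = f(Z) flexibly; stage two regresses Y on the fitted
# treatment plus a violation basis (1, Z, Z^2), projecting the violation out.
# Conceptual only; not the TSCI API.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(3)
n = 2000
Z = rng.normal(size=n)                              # possibly invalid instrument
U = rng.normal(size=n)                              # unmeasured confounder
D = 2 * np.sin(2 * Z) + U + rng.normal(size=n)      # nonlinear treatment model
Y = 0.8 * D + 0.4 * Z**2 + U + rng.normal(size=n)   # 0.4 * Z^2 is the violation

# Stage 1: flexible treatment model, using out-of-bag predictions to limit overfitting
rf = RandomForestRegressor(n_estimators=500, min_samples_leaf=50,
                           oob_score=True, random_state=0)
rf.fit(Z.reshape(-1, 1), D)
D_hat = rf.oob_prediction_

# Stage 2: regress Y on the fitted treatment and the violation basis
V = np.column_stack([np.ones(n), Z, Z**2, D_hat])
coef = np.linalg.lstsq(V, Y, rcond=None)[0]
print("stage-2 coefficient on fitted treatment:", coef[-1])   # true effect is 0.8
```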
Learning Permutation Symmetry of a Gaussian Vector with gips in R
The study of hidden structures in data presents challenges in modern statistics and machine learning. We introduce the gips package in R, which identifies permutation subgroup symmetries in Gaussian vectors. gips serves two main purposes: exploratory analysis to discover hidden permutation symmetries and estimation of the covariance matrix under permutation symmetry. It is competitive with canonical methods for dimensionality reduction while providing a new interpretation of the results. gips implements a novel Bayesian model selection procedure within the class of Gaussian vectors invariant under a permutation subgroup, introduced in Graczyk, Ishi, Kołodziejek, and Massam (2022b, The Annals of Statistics).
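The sketch below shows what "covariance under permutation symmetry" means in the simplest case: averaging a sample covariance over a known cyclic permutation subgroup yields an invariant estimate with fewer free parameters. gips additionally selects the subgroup itself via Bayesian model selection; the code is conceptual and is not the gips interface.

```python
# Projecting a sample covariance onto the set of matrices invariant under a
# known cyclic permutation subgroup, by averaging over the group elements.
# Conceptual only; not the gips API (which also infers the subgroup).
import numpy as np

rng = np.random.default_rng(7)
p, n = 5, 40
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)                                  # unconstrained sample covariance

group = [np.roll(np.arange(p), k) for k in range(p)]         # cyclic subgroup of permutations
S_sym = sum(S[np.ix_(g, g)] for g in group) / len(group)     # invariant projection

shift = group[1]
print(np.allclose(S_sym, S_sym[np.ix_(shift, shift)]))       # True: invariance holds
```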
StatisticalProcessMonitoring.jl: A General Framework for Statistical Process Monitoring in Julia
Statistical process monitoring (SPM) control charts are widely used for monitoring the stability of sequential processes. Currently, there is no open-source software that provides a general and extensible implementation of control charts. StatisticalProcessMonitoring.jl is a novel Julia package that aims to address this gap, offering support for monitoring various types of data, such as univariate and multivariate observations, partially-observed data streams, and profiles. The package introduces an extensible SPM framework, allowing users to seamlessly design control charts for structured data types using the existing implementation. By providing a flexible implementation of control charts, StatisticalProcessMonitoring.jl offers fully-automated and efficient algorithms for determining control limits and tuning control chart hyperparameters. These algorithms can accommodate various commonly-used performance metrics based on the run length distribution. The package further leverages existing packages in the Julia ecosystem to offer users a range of optimization and plotting functionalities.
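As a language-agnostic illustration of two ideas mentioned above (a control chart and run-length-based calibration of its limit), the following sketch simulates the in-control average run length (ARL) of a one-sided CUSUM chart and bisects for the control limit that hits a target ARL. It mirrors the kind of functionality described, but it is written in Python and does not use the package's Julia API.

```python
# A one-sided CUSUM chart plus a crude simulation-based search for the control
# limit h achieving a target in-control ARL. Conceptual only.
import numpy as np

rng = np.random.default_rng(0)

def run_length(h, k=0.5, max_n=10_000):
    """Time until a one-sided CUSUM of in-control N(0, 1) data exceeds h."""
    c = 0.0
    for t in range(1, max_n + 1):
        c = max(0.0, c + rng.normal() - k)
        if c > h:
            return t
    return max_n

def arl(h, reps=500):
    return np.mean([run_length(h) for _ in range(reps)])

# crude bisection for the limit giving an in-control ARL of roughly 370
lo, hi = 1.0, 10.0
for _ in range(10):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if arl(mid) < 370 else (lo, mid)
print("approximate control limit:", round((lo + hi) / 2, 2))
```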
BayesMix: Bayesian Mixture Models in C++
We describe BayesMix, a C++ library for MCMC posterior simulation in general Bayesian mixture models. The goal of BayesMix is to provide computer scientists, statisticians, and practitioners with a self-contained ecosystem for performing inference on mixture models. The key idea of the library is extensibility, as we want users to be able to easily adapt our software to their specific Bayesian mixture models. In addition to the several models and MCMC algorithms for posterior inference included in the library, new users with little familiarity with mixture models and the related MCMC algorithms can extend our library with minimal coding effort. Our library is computationally very efficient when compared to competing software. Examples show that typical code runtimes are from two to 25 times faster than competitors for data dimensions from one to ten. We also provide Python (bayesmixpy) and R (bayesmixr) interfaces. Our library is publicly available on GitHub at https://github.com/bayesmix-dev/bayesmix/
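For readers unfamiliar with posterior simulation for mixtures, the sketch below is a self-contained Gibbs sampler for a small finite Gaussian mixture with conjugate priors. It illustrates the type of computation BayesMix performs; the library itself covers far richer model and algorithm choices, and the priors and settings here are arbitrary and do not reflect its API.

```python
# A minimal Gibbs sampler for a finite univariate Gaussian mixture with
# conjugate priors. Conceptual only; not the BayesMix/bayesmixpy API.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 0.7, 100)])
n, K = len(y), 3

w, mu, sig2 = np.full(K, 1 / K), rng.normal(0, 3, K), np.ones(K)
for it in range(1000):
    # component allocations
    logp = np.log(w) + norm.logpdf(y[:, None], mu, np.sqrt(sig2))
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    z = np.array([rng.choice(K, p=pi) for pi in p])
    # weights and component parameters (conjugate updates)
    counts = np.bincount(z, minlength=K)
    w = rng.dirichlet(1.0 + counts)
    for k in range(K):
        yk = y[z == k]
        prec = 1 / 9 + len(yk) / sig2[k]                     # prior mu_k ~ N(0, 3^2)
        mu[k] = rng.normal(yk.sum() / sig2[k] / prec, np.sqrt(1 / prec))
        sig2[k] = 1 / rng.gamma(2 + len(yk) / 2,
                                1 / (2 + ((yk - mu[k]) ** 2).sum() / 2))
```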
pyrichlet: A Python Package for Density Estimation and Clustering Using Gaussian Mixture Models
Bayesian nonparametric models have proven to be successful tools for clustering and density estimation. While there exists a rich ecosystem of implementations in R, only a few are available for Python. Here we develop a Python package called pyrichlet for Bayesian nonparametric density estimation and clustering using various state-of-the-art Gaussian mixture models that generalize the well-established Dirichlet process mixture, many of which are fairly new. Implementation is performed using Markov chain Monte Carlo techniques as well as variational Bayes methods. This article contains a detailed description of pyrichlet and examples of its usage with a real dataset.
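The sketch below shows the truncated stick-breaking construction underlying Dirichlet process Gaussian mixtures, the family the package's models generalize: Beta-distributed stick fractions define the weights, atoms come from a normal base measure, and the resulting random density is evaluated on a grid. It is a conceptual example, not the pyrichlet API.

```python
# Truncated stick-breaking construction of a Dirichlet process Gaussian
# mixture: w_k = v_k * prod_{j<k}(1 - v_j) with v_k ~ Beta(1, alpha).
# Conceptual only; not the pyrichlet API.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
alpha, K = 1.0, 50                       # concentration parameter and truncation level

v = rng.beta(1, alpha, K)
w = v * np.concatenate(([1.0], np.cumprod(1 - v)[:-1]))      # stick-breaking weights
mu = rng.normal(0, 3, K)                 # atoms drawn from the base measure N(0, 3^2)

grid = np.linspace(-8, 8, 400)
density = (w[:, None] * norm.pdf(grid, mu[:, None], 1.0)).sum(axis=0)
print(density.sum() * (grid[1] - grid[0]))   # ~1, minus the tiny truncated tail mass
```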