ebnm: An R Package for Solving the Empirical Bayes Normal Means Problem Using a Variety of Prior Families
The empirical Bayes normal means (EBNM) model is important to many areas of
statistics, including (but not limited to) multiple testing, wavelet denoising,
multiple linear regression, and matrix factorization. There are several
existing software packages that can fit EBNM models under different prior
assumptions and using different algorithms; however, the differences across
interfaces complicate direct comparisons. Further, a number of important prior
assumptions do not yet have implementations. Motivated by these issues, we
developed the R package ebnm, which provides a unified interface for
efficiently fitting EBNM models using a variety of prior assumptions, including
nonparametric approaches. In some cases, we incorporated existing
implementations into ebnm; in others, we implemented new fitting procedures
with a focus on speed and numerical stability. To demonstrate the capabilities
of the unified interface, we compare results using different prior assumptions
in two extended examples: the shrinkage estimation of baseball statistics; and
the matrix factorization of genetics data (via the new R package flashier). In
summary, ebnm is a convenient and comprehensive package for performing EBNM
analyses under a wide range of prior assumptions.
Comment: 43 pages, 19 figures
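
To make the EBNM problem concrete, here is a minimal sketch for the simplest prior family, a zero-mean normal prior. It assumes observations x_i ~ N(theta_i, s_i^2) with known standard errors s_i > 0 and a prior theta_i ~ N(0, sigma2). It estimates sigma2 by maximizing the marginal likelihood over a grid and then computes posterior means and standard deviations. This is not the ebnm package's interface or algorithm: the function names, the grid search, and the example data are all hypothetical and chosen for illustration only.

```python
import math


def log_marginal_likelihood(x, s, sigma2):
    """Log-likelihood of x under the marginal model x_i ~ N(0, sigma2 + s_i^2)."""
    total = 0.0
    for xi, si in zip(x, s):
        v = sigma2 + si ** 2  # marginal variance of x_i
        total += -0.5 * (math.log(2 * math.pi * v) + xi ** 2 / v)
    return total


def ebnm_normal_sketch(x, s, grid_size=200):
    """Illustrative EBNM fit with a N(0, sigma2) prior (hypothetical helper).

    Returns (sigma2_hat, posterior_means, posterior_sds).
    Requires every s_i > 0.
    """
    # Step 1 (empirical Bayes): estimate the prior variance from the data.
    # The likelihood is decreasing once sigma2 exceeds max(x_i^2), so the
    # maximizer must lie in [0, max(x_i^2)]; we search that interval on a grid.
    upper = max(xi ** 2 for xi in x)
    grid = [upper * k / grid_size for k in range(grid_size + 1)]
    sigma2 = max(grid, key=lambda g: log_marginal_likelihood(x, s, g))

    # Step 2: posterior for each theta_i under the estimated prior.
    # theta_i | x_i ~ N(w_i * x_i, w_i * s_i^2), w_i = sigma2 / (sigma2 + s_i^2).
    weights = [sigma2 / (sigma2 + si ** 2) for si in s]
    post_mean = [w * xi for w, xi in zip(weights, x)]
    post_sd = [math.sqrt(w) * si for w, si in zip(weights, s)]
    return sigma2, post_mean, post_sd


if __name__ == "__main__":
    # Made-up observations and standard errors, for illustration only.
    x = [2.0, -1.5, 0.3, 4.0]
    s = [1.0, 1.0, 1.0, 1.0]
    sigma2_hat, means, sds = ebnm_normal_sketch(x, s)
    print("estimated prior variance:", sigma2_hat)
    print("posterior means:", means)
    print("posterior sds:", sds)
```

Each posterior mean is the observation shrunk toward zero by the weight w_i, so noisier observations (larger s_i) are shrunk more. Other prior families change both steps but keep the same two-stage structure: estimate the prior, then compute posteriors.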
Empirical Bayes Matrix Factorization: Methods and Applications
Matrix factorization methods are commonly used to explore structure in multivariate data. When the structure can be expected to have a sparse representation, a sparsity-inducing method will often be preferred. Empirical Bayes matrix factorization (EBMF), a recent approach that uses the observed data to estimate priors, can adaptively model sparsity and thus yield representations with interpretable components while also performing well on inferential tasks. Further, since fitting the EBMF model can be reduced to solving a series of empirical Bayes normal means (EBNM) subproblems, which can be solved relatively easily for a wide variety of prior families (sparse and otherwise), the approach is very general.
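
The reduction from EBMF to EBNM can be sketched for the rank-one case. The notation below is assumed for illustration and does not come from the abstract: Y is an n x p data matrix, \ell and f are the loadings and factor vectors, \tau is a common residual precision, and \bar{f}_j and \overline{f_j^2} denote the current posterior first and second moments of f_j. Holding the factor fixed, the update for the loadings is an EBNM problem with "observations" x_i and standard errors s_i as defined in the last two lines; the update for the factor is symmetric.

```latex
\begin{align*}
  Y &= \ell f^\top + E, \qquad E_{ij} \sim \mathcal{N}(0, 1/\tau), \\
  \ell_i &\sim g_\ell \in \mathcal{G}_\ell, \qquad f_j \sim g_f \in \mathcal{G}_f, \\
  x_i &= \frac{\sum_{j} Y_{ij}\, \bar{f}_j}{\sum_{j} \overline{f_j^2}}, \qquad
  s_i^2 = \frac{1}{\tau \sum_{j} \overline{f_j^2}}, \\
  x_i \mid \ell_i &\sim \mathcal{N}(\ell_i, s_i^2), \qquad \ell_i \sim g_\ell .
\end{align*}
```

Under these assumptions, changing the prior family only changes which EBNM solver is called in each update, which is one way to read the claim that the approach is very general.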
The dissertation extends the reach of EBMF in several ways. The first chapter describes the R package ebnm, which I developed in order to provide a unified interface for efficiently solving the EBNM problem using a range of prior families. Existing packages are harnessed when practical; in other cases, solutions are implemented from scratch. The second chapter details my implementation of the EBMF algorithm, flashier, which was designed to handle much larger datasets and offer more flexibility than the original implementation of EBMF. In particular, I show that EBMF can yield insight into single-cell RNA sequencing data, outperforming more commonly used methods on tasks such as rare cell type detection even though the EBMF model is misspecified for count data. The final chapter considers data with an underlying tree-like structure, with particular attention to population genetics data. In addition to providing theory that helps to elucidate the kind of factorization that one should be looking for, I propose a tailored EBMF method that can successfully identify tree-like structure in both simulated and real datasets.