Journal of Statistical Software
    1576 research outputs found

    cubble: An R Package for Organizing and Wrangling Multivariate Spatio-Temporal Data

    Multivariate spatio-temporal data refers to multiple measurements taken across space and time. For many analyses, spatial and time components can be separately studied: for example, to explore the temporal trend of one variable for a single spatial location, or to model the spatial distribution of one variable at a given time. However, for some studies it is important to analyze different aspects of the spatio-temporal data simultaneously, for instance, temporal trends of multiple variables across locations. In order to facilitate the study of different portions or combinations of spatio-temporal data, we introduce a new class, cubble, with a suite of functions enabling easy slicing and dicing on different spatio-temporal components. The proposed cubble class ensures that all the components of the data are easy to access and manipulate while providing flexibility for data analysis. In addition, the cubble package facilitates visual and numerical explorations of the data while easing data wrangling and modelling. The cubble class and the tools implemented in the package are illustrated with examples from climate data analysis.

    melt: Multiple Empirical Likelihood Tests in R

    Empirical likelihood enables a nonparametric, likelihood-driven style of inference without relying on assumptions frequently made in parametric models. Empirical likelihood-based tests are asymptotically pivotal and thus avoid explicit studentization. This paper presents the R package melt that provides a unified framework for data analysis with empirical likelihood methods. A collection of functions is available to perform multiple empirical likelihood tests for linear and generalized linear models in R. The package melt offers an easy-to-use interface and flexibility in specifying hypotheses and calibration methods, extending the framework to simultaneous inferences. Hypothesis testing uses a projected gradient algorithm to solve constrained empirical likelihood optimization problems. The core computational routines are implemented in C++, with OpenMP for parallel computation.

    anomaly: Detection of Anomalous Structure in Time Series Data

    One of the contemporary challenges in anomaly detection is the ability to detect, and differentiate between, both point and collective anomalies within a data sequence or time series. The anomaly package has been developed to provide users with a choice of anomaly detection methods and, in particular, provides an implementation of the recently proposed collective and point anomaly family of anomaly detection algorithms. This article describes the methods implemented whilst also highlighting their application to simulated data as well as real data examples contained in the package.

    salmon: A Symbolic Linear Regression Package for Python

    One of the most attractive features of R is its linear modeling capabilities. We describe a Python package, salmon, that brings the best of R's linear modeling functionality to Python in a Pythonic way, by providing composable objects for specifying and fitting linear models. This object-oriented design also enables other features that enhance ease of use, such as automatic visualizations and intelligent model building.

    sparsegl: An R Package for Estimating Sparse Group Lasso

    The sparse group lasso is a high-dimensional regression technique that is useful for problems whose predictors have a naturally grouped structure and where sparsity is encouraged at both the group and individual predictor level. In this paper, we discuss a new R package for computing such regularized models. The intention is to provide highly optimized solution routines enabling analysis of very large datasets, especially in the context of sparse design matrices.
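
For orientation, the sparse group lasso is commonly written as the penalized least-squares problem below. This is the textbook formulation, not necessarily the exact parameterization used by sparsegl; the symbols (response y, design matrix X, groups g = 1, ..., G of size p_g, overall penalty λ, mixing weight α in [0, 1]) are assumptions introduced here for illustration.

```latex
\min_{\beta} \;
  \frac{1}{2n} \lVert y - X\beta \rVert_2^2
  + \lambda (1 - \alpha) \sum_{g=1}^{G} \sqrt{p_g}\, \lVert \beta_g \rVert_2
  + \lambda \alpha \lVert \beta \rVert_1
```

The group-wise Euclidean norm can set an entire group of coefficients to zero, while the l1 term encourages sparsity among individual predictors within the groups that remain.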

    DoubleML: An Object-Oriented Implementation of Double Machine Learning in R

    The R package DoubleML implements the double/debiased machine learning framework of Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, and Robins (2018). It provides functionalities to estimate parameters in causal models based on machine learning methods. The double machine learning framework consists of three key ingredients: Neyman orthogonality, high-quality machine learning estimation and sample splitting. Estimation of nuisance components can be performed by various state-of-the-art machine learning methods that are available in the mlr3 ecosystem. DoubleML makes it possible to perform inference in a variety of causal models, including partially linear and interactive regression models and their extensions to instrumental variable estimation. The object-oriented implementation of DoubleML enables high flexibility in model specification and makes the package easily extensible. This paper serves as an introduction to the double machine learning framework and the R package DoubleML. In reproducible code examples with simulated and real data sets, we demonstrate how DoubleML users can perform valid inference based on machine learning methods.
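
The three ingredients named in this abstract can be illustrated with a minimal Python sketch for a partially linear model y = theta * d + g(x) + e. This is not the DoubleML API: the function names, the simulated data, and the true effect of 0.5 are hypothetical, and ordinary least squares is swapped in as a stand-in for the machine learning learners the package would use.

```python
import numpy as np

def ols_fit_predict(x_train, y_train, x_test):
    """Stand-in learner (assumption): ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    return np.column_stack([np.ones(len(x_test)), x_test]) @ coef

def plr_cross_fit(y, d, x, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(x) + e."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        # Sample splitting: nuisance fits use only the other folds.
        y_res[test_idx] = y[test_idx] - ols_fit_predict(x[train], y[train], x[test_idx])
        d_res[test_idx] = d[test_idx] - ols_fit_predict(x[train], d[train], x[test_idx])
    # Neyman-orthogonal score: regress residualized y on residualized d.
    return float(d_res @ y_res / (d_res @ d_res))

# Simulated data with a known effect theta = 0.5 (hypothetical example).
rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=(n, 3))
d = x @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
y = 0.5 * d + x @ np.array([0.3, 0.8, -1.0]) + rng.normal(size=n)
theta_hat = plr_cross_fit(y, d, x)
print(theta_hat)
```

Because both nuisance predictions are made out of fold, errors in fitting them enter the final estimate only through their product, which is what makes the estimate robust to imperfect learners.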

    fHMM: Hidden Markov Models for Financial Time Series in R

    Hidden Markov models constitute a versatile class of statistical models for time series that are driven by hidden states. In financial applications, the hidden states can often be linked to market regimes such as bearish and bullish markets or recessions and periods of economic growth. To give an example, when the market is in a nervous state, corresponding stock returns often follow some distribution with relatively high variance, whereas calm periods are often characterized by a different distribution with relatively smaller variance. Hidden Markov models can be used to explicitly model the distribution of the observations conditional on the hidden states and the transitions between states, and thus help us to draw a comprehensive picture of market behavior. While various implementations of hidden Markov models are available, a comprehensive R package that is tailored to financial applications is still lacking. In this paper, we introduce the R package fHMM, which provides various tools for applying hidden Markov models to financial time series. It contains functions for fitting hidden Markov models to data, conducting simulation experiments, and decoding the hidden state sequence. Furthermore, functions for model checking, model selection, and state prediction are provided. In addition to basic hidden Markov models, hierarchical hidden Markov models are implemented, which can be used to jointly model multiple data streams that were observed at different temporal resolutions. The aim of the fHMM package is to give R users with an interest in financial applications access to hidden Markov models and their extensions.
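
The calm versus nervous example in this abstract can be made concrete with a small simulation. The sketch below is in Python and does not use fHMM; the function name, the transition matrix, and the state-dependent means and standard deviations are all hypothetical values chosen for illustration.

```python
import numpy as np

def simulate_two_state_hmm(n_steps, seed=0):
    """Simulate returns from a two-state Gaussian hidden Markov model."""
    rng = np.random.default_rng(seed)
    # Hypothetical parameters: state 0 = calm, state 1 = nervous.
    transition = np.array([[0.95, 0.05],
                           [0.10, 0.90]])
    means = np.array([0.001, -0.002])
    sds = np.array([0.005, 0.03])

    states = np.empty(n_steps, dtype=int)
    states[0] = rng.choice(2)  # uniform initial state
    for t in range(1, n_steps):
        # The next state depends only on the current state (Markov property).
        states[t] = rng.choice(2, p=transition[states[t - 1]])
    # Each return is drawn from the distribution of its hidden state.
    returns = rng.normal(means[states], sds[states])
    return states, returns

states, returns = simulate_two_state_hmm(5000)
print("sd in calm state:   ", returns[states == 0].std())
print("sd in nervous state:", returns[states == 1].std())
```

In practice only the returns are observed; fitting and decoding, as described in the abstract, aim to recover the parameters and the state sequence from them.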

    Modeling Big, Heterogeneous, Non-Gaussian Spatial and Spatio-Temporal Data Using FRK

    Non-Gaussian spatial and spatio-temporal data are becoming increasingly prevalent, and their analysis is needed in a variety of disciplines. FRK is an R package for spatial and spatio-temporal modeling and prediction with very large data sets that, to date, has only supported linear process models and Gaussian data models. In this paper, we describe a major upgrade to FRK that allows for non-Gaussian data to be analyzed in a generalized linear mixed model framework. These vastly more general spatial and spatio-temporal models are fitted using the Laplace approximation via the software TMB. The existing functionality of FRK is retained with this advance into non-Gaussian models; in particular, it allows for automatic basis-function construction, it can handle both point-referenced and areal data simultaneously, and it can predict process values at any spatial support from these data. This new version of FRK also allows for the use of a large number of basis functions when modeling the spatial process, and thus it is often able to achieve more accurate predictions than previous versions of the package in a Gaussian setting. We demonstrate innovative features in this new version of FRK, highlight its ease of use, and compare it to alternative packages using both simulated and real data sets.

    Generalized Plackett-Luce Likelihoods

    The hyper2 package provides functionality to work with extensions of the Bradley-Terry probability model, such as Plackett-Luce likelihoods that include team strengths and reified entities (monsters). The package allows one to use relatively natural R idiom to manipulate such likelihood functions. Here, I present a generalization of hyper2 in which multiple entities are constrained to have identical Bradley-Terry strengths. A new S3 class 'hyper3', along with associated methods, is motivated and introduced. Three datasets are analyzed, each analysis furnishing new insight, and each highlighting different capabilities of the package.
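
As background, the basic Plackett-Luce model assigns a probability to an observed order of finish by repeatedly choosing the next finisher in proportion to its strength among those remaining. The Python sketch below shows only this basic model, not the hyper2 or hyper3 classes or the equal-strength constraint described above; the function name and the strengths are hypothetical.

```python
def plackett_luce_probability(strengths, order):
    """Probability of observing `order` (best first) given positive strengths."""
    remaining = sum(strengths[item] for item in order)
    prob = 1.0
    for item in order:
        # The next finisher is chosen in proportion to its strength
        # among the competitors not yet placed.
        prob *= strengths[item] / remaining
        remaining -= strengths[item]
    return prob

strengths = {"a": 0.5, "b": 0.3, "c": 0.2}
p_abc = plackett_luce_probability(strengths, ["a", "b", "c"])
p_cba = plackett_luce_probability(strengths, ["c", "b", "a"])
print(p_abc, p_cba)
```

With these strengths, the order a, b, c has probability (0.5/1.0) * (0.3/0.5) * (0.2/0.2) = 0.3, and the reverse order has probability (0.2/1.0) * (0.3/0.8) * (0.5/0.5) = 0.075.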

    CRTFASTGEEPWR: A SAS Macro for Power of Generalized Estimating Equations Analysis of Multi-Period Cluster Randomized Trials with Application to Stepped Wedge Designs

    Multi-period cluster randomized trials (CRTs) are increasingly used for the evaluation of interventions delivered at the group level. While generalized estimating equations (GEE) are commonly used to provide population-averaged inference in CRTs, there is a lack of general methods and statistical software tools for power calculation based on multi-parameter, within-cluster correlation structures suitable for multi-period CRTs that can accommodate both complete and incomplete designs. A computationally fast, non-simulation procedure for determining statistical power is described for the GEE analysis of complete and incomplete multi-period cluster randomized trials. The procedure is implemented via a SAS macro, CRTFASTGEEPWR, which is applicable to binary, count and continuous responses and several correlation structures in multi-period CRTs. The SAS macro is illustrated in the power calculation of two complete and two incomplete stepped wedge cluster randomized trial scenarios under different specifications of marginal mean model and within-cluster correlation structure. The proposed GEE power method is quite general, as demonstrated in the SAS macro with numerous input options. The power procedure and macro can also be used in the planning of parallel and crossover CRTs in addition to cross-sectional and closed cohort stepped wedge trials.