
    Multi-dimensional Point Process Models in R

    A software package for fitting and assessing multi-dimensional point process models using the R statistical computing environment is described. Methods of residual analysis based on random thinning are discussed and implemented. Features of the software are demonstrated using data on wildfire occurrences in Northern Los Angeles County, California.

    Predicting Patient No-Shows in Community Health Clinics: A Case Study in Designing a Data Analytic Product

    The data science revolution has highlighted the varying roles that data analytic products can play in different industries and applications. There has been particular interest in using analytic products coupled with algorithmic prediction models to aid in human decision-making. However, detailed descriptions of the decision-making process that leads to the design and development of analytic products are lacking in the statistical literature, making it difficult to accumulate a body of knowledge that students interested in data science can consult to learn about this process. In this paper, we present a case study describing the development of an analytic product for predicting whether patients will show up for scheduled appointments at a community health clinic. We consider the stakeholders involved and their interests, along with the real-world analytical and technical trade-offs involved in developing and deploying the product. Our goal here is to highlight the decisions made and evaluate them in the context of possible alternatives. We find that although this case study has some unique characteristics, there are lessons to be learned that could translate to other settings and applications.

    Caching and Visualizing Statistical Analyses

    We present the cacher and CodeDepends packages for R, which provide tools for (1) caching and analyzing the code for statistical analyses and (2) distributing these analyses to others in an efficient manner over the web. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into “cache packages” for distribution over the web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis and reproduce, modify, or improve upon the results. In addition, readers can easily conduct alternate analyses of the data. The CodeDepends package provides complementary tools for analyzing and visualizing the code for a statistical analysis, and this functionality has been integrated into the cacher package. In this chapter we describe the cacher and CodeDepends packages and provide examples of how they can be used for reproducible research.

    Spatial Misalignment in Time Series Studies of Air Pollution and Health Data

    Time series studies of environmental exposures often involve comparing daily changes in a toxicant measured at a point in space with daily changes in an aggregate measure of health. Spatial misalignment of the exposure and response variables can bias the estimation of health risk, and the magnitude of this bias depends on the spatial variation of the exposure of interest. In air pollution epidemiology, there is an increasing focus on estimating the health effects of the chemical components of particulate matter. One issue raised by this new focus is the spatial misalignment error introduced by the lack of spatial homogeneity in many of the particulate matter components. Current approaches to estimating short-term health risks via time series modeling do not take into account the spatial properties of the chemical components and therefore could result in biased estimation of those risks. We present a spatial-temporal statistical model for quantifying spatial misalignment error and show how adjusted health risk estimates can be obtained using a regression calibration approach and a two-stage Bayesian model. We apply our methods to a database containing information on hospital admissions, air pollution, and weather for 20 large urban counties in the United States.

    The National Morbidity, Mortality, and Air Pollution Study Database in R

    The NMMAPS data package contains daily mortality, air pollution, and weather data originally assembled as part of the National Morbidity, Mortality, and Air Pollution Study (NMMAPS). The data have recently been updated and are available for 108 United States cities for the years 1987-2000. The package provides tools for building versions of the full database in a structured and reproducible manner. These database derivatives may be more suitable for particular analyses. We describe how to use the package to implement a multi-city time series analysis of mortality and PM(10). In addition, we demonstrate how to reproduce recent findings based on the NMMAPS data.