Multi-dimensional Point Process Models in R

Author: Roger D. Peng
Publication venue: Foundation for Open Access Statistics
Publication date: 01/01/2002

A software package for fitting and assessing multi-dimensional point process models using the R statistical computing environment is described. Methods of residual analysis based on random thinning are discussed and implemented. Features of the software are demonstrated using data on wildfire occurrences in Northern Los Angeles County, California.

Ezid

Crossref

Directory of Open Access Journals

eScholarship - University of California

Journal of Statistical Software

Flexible Distributed Lag Models using Random Functions with Application to Estimating Mortality Displacement from Heat-Related Deaths

Author: Peng Roger D
Publication venue: Collection of Biostatistics Research Archive
Publication date: 14/12/2011

Collection Of Biostatistics Research Archive

Predicting Patient No-Shows in Community Health Clinics: A Case Study in Designing a Data Analytic Product

Author: Peng Roger D.
Publication date: 26/10/2023

The data science revolution has highlighted the varying roles that data analytic products can play in different industries and applications. There has been particular interest in using analytic products coupled with algorithmic prediction models to aid in human decision-making. However, detailed descriptions of the decision-making process that leads to the design and development of analytic products are lacking in the statistical literature, making it difficult to accumulate a body of knowledge where students interested in the field of data science may look to learn about this process. In this paper, we present a case study describing the development of an analytic product for predicting whether patients will show up for scheduled appointments at a community health clinic. We consider the stakeholders involved and their interests, along with the real-world analytical and technical trade-offs involved in developing and deploying the product. Our goal here is to highlight the decisions made and evaluate them in the context of possible alternatives. We find that although this case study has some unique characteristics, there are lessons to be learned that could translate to other settings and applications.

arXiv.org e-Print Archive

Caching and Visualizing Statistical Analyses

Authors: Peng Roger D; Temple Lang Duncan
Publication venue: Collection of Biostatistics Research Archive
Publication date: 29/06/2009

We present the cacher and CodeDepends packages for R, which provide tools for (1) caching and analyzing the code for statistical analyses and (2) distributing these analyses to others in an efficient manner over the web. The cacher package takes objects created by evaluating R expressions and stores them in key-value databases. These databases of cached objects can subsequently be assembled into “cache packages” for distribution over the web. The cacher package also provides tools to help readers examine the data and code in a statistical analysis and reproduce, modify, or improve upon the results. In addition, readers can easily conduct alternate analyses of the data. The CodeDepends package provides complementary tools for analyzing and visualizing the code for a statistical analysis and this functionality has been integrated into the cacher package. In this chapter we describe the cacher and CodeDepends packages and provide examples of how they can be used for reproducible research.

Collection Of Biostatistics Research Archive

Spatial Misalignment in time series studies of air pollution and health data

Authors: Bell Michelle L; Peng Roger D
Publication venue: Collection of Biostatistics Research Archive
Publication date: 01/12/2008

Time series studies of environmental exposures often involve comparing daily changes in a toxicant measured at a point in space with daily changes in an aggregate measure of health. Spatial misalignment of the exposure and response variables can bias the estimation of health risk and the magnitude of this bias depends on the spatial variation of the exposure of interest. In air pollution epidemiology, there is an increasing focus on estimating the health effects of the chemical components of particulate matter. One issue that is raised by this new focus is the spatial misalignment error introduced by the lack of spatial homogeneity in many of the particulate matter components. Current approaches to estimating short-term health risks via time series modeling do not take into account the spatial properties of the chemical components and therefore could result in biased estimation of those risks. We present a spatial-temporal statistical model for quantifying spatial misalignment error and show how adjusted health risk estimates can be obtained using a regression calibration approach and a two-stage Bayesian model. We apply our methods to a database containing information on hospital admissions, air pollution, and weather for 20 large urban counties in the United States.

Collection Of Biostatistics Research Archive

Modeling Data Analytic Iteration With Probabilistic Outcome Sets

Authors: Hicks Stephanie C.; Peng Roger D.
Publication date: 01/02/2024

In 1977 John Tukey described how in exploratory data analysis, data analysts use tools, such as data visualizations, to separate their expectations from what they observe. In contrast to statistical theory, an underappreciated aspect of data analysis is that a data analyst must make decisions by comparing the observed data or output from a statistical tool to what the analyst previously expected from the data. However, there is little formal guidance for how to make these data analytic decisions as statistical theory generally omits a discussion of who is using these statistical methods. In this paper, we propose a model for the iterative process of data analysis based on the analyst's expectations, using what we refer to as expected and anomaly probabilistic outcome sets, and the concept of statistical information gain. Here, we extend the basic idea of comparing an analyst's expectations to what is observed in a data visualization to more general analytic situations. Our model posits that the analyst's goal is to increase the amount of information the analyst has relative to what the analyst already knows, through successive analytic iterations. We introduce two criteria--expected information gain and anomaly information gain--to provide guidance about analytic decision-making and ultimately to improve the practice of data analysis. Finally, we show how our framework can be used to characterize common situations in practical data analysis.

arXiv.org e-Print Archive