Partial identification for discrete data with nonignorable missing outcomes
Nonignorable missing outcomes are common in real-world datasets and often
require strong parametric assumptions to achieve identification. These
assumptions can be implausible or untestable, and so we may forgo them in
favour of partially identified models that narrow the set of a priori possible
values to an identification region. Here we propose a new nonparametric Bayes
method that allows for the incorporation of multiple clinically relevant
restrictions of the parameter space simultaneously. We focus on two common
restrictions, instrumental variables and the direction of missing data bias,
and investigate how these restrictions narrow the identification region for
parameters of interest. Additionally, we propose a rejection sampling algorithm
that allows us to quantify the evidence for these assumptions in the data. We
compare our method to a standard Heckman selection model in both simulation
studies and in an applied problem examining the effectiveness of cash-transfers
for people experiencing homelessness. Comment: 43 pages, 4 figures, 4 tables
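The rejection-sampling idea for quantifying evidence can be sketched in a few lines. This is a rough illustration, not the paper's actual model: the restriction (a non-negative direction of missing-data bias) and the toy Beta posterior standing in for the unrestricted posterior of a saturated model are both assumptions made here for concreteness.

```python
import numpy as np

rng = np.random.default_rng(0)

def satisfies_restriction(theta):
    """Illustrative restriction on the direction of missing-data bias:
    respondents' outcome probability is at least the nonrespondents'."""
    p_observed, p_missing = theta
    return p_observed >= p_missing

# Stand-in for draws from the unrestricted posterior of a saturated model:
# a toy Beta posterior over the two outcome probabilities.
draws = rng.beta(a=[8.0, 4.0], b=[4.0, 8.0], size=(10_000, 2))

accepted = np.array([satisfies_restriction(t) for t in draws])

# The acceptance rate estimates the posterior probability that the
# restriction holds, a measure of the evidence for the assumption.
evidence = accepted.mean()
print(f"posterior support for restriction: {evidence:.3f}")

# Accepted draws form a sample from the restricted (partially identified) posterior.
restricted_draws = draws[accepted]
```

A low acceptance rate would signal that the data offer little support for the imposed restriction.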
The Neuroscience Information Framework: A Data and Knowledge Environment for Neuroscience
With support from the Institutes and Centers forming the NIH Blueprint for Neuroscience Research, we have designed and implemented a new initiative for integrating access to and use of Web-based neuroscience resources: the Neuroscience Information Framework. The Framework arises from the expressed need of the neuroscience community for neuroinformatic tools and resources to aid scientific inquiry, builds upon prior development of neuroinformatics by the Human Brain Project and others, and directly derives from the Society for Neuroscience’s Neuroscience Database Gateway. Partnered with the Society, its Neuroinformatics Committee, and volunteer consultant-collaborators, our multi-site consortium has developed: (1) a comprehensive, dynamic inventory of Web-accessible neuroscience resources, (2) an extended and integrated terminology describing resources and contents, and (3) a framework accepting and aiding concept-based queries. Evolving instantiations of the Framework may be viewed at http://nif.nih.gov, http://neurogateway.org, and other sites as they come online.
Terminology for Neuroscience Data Discovery: Multi-tree Syntax and Investigator-Derived Semantics
Rao-Blackwellizing field-goal percentage
Shooting skill in the NBA is typically measured by field goal percentage (FG%): the number of makes out of the total number of shots. Even more advanced metrics like true shooting percentage are calculated by counting each player’s 2-point, 3-point, and free throw makes and misses, ignoring the spatiotemporal data now available (Kubatko et al. 2007). In this paper we aim to better characterize player shooting skill by introducing a new estimator based on post-shot release shot-make probabilities. Via the Rao-Blackwell theorem, we propose a shot-make probability model that conditions probability estimates on shot trajectory information, thereby reducing the variance of the new estimator relative to standard FG%. We obtain shooting information by using optical tracking data to estimate three factors for each shot: entry angle, shot depth, and left-right accuracy. Next, we use these factors to model shot-make probabilities for all shots in the 2014-15 season, and use these probabilities to produce a Rao-Blackwellized FG% estimator (RB-FG%) for each player. We present a variety of results derived from this shot trajectory data, and demonstrate that RB-FG% is better than raw FG% at predicting 3-point shooting and true-shooting percentages. Overall, we find that conditioning shot-make probabilities on spatial trajectory information stabilizes inference of FG%, creating the potential to estimate shooting statistics and related metrics earlier in a season than was previously possible.
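The Rao-Blackwell step can be illustrated with a minimal sketch. The data below are made up, and the make probabilities are hypothetical stand-ins for what a fitted trajectory model (using entry angle, depth, and left-right accuracy) might output:

```python
import numpy as np

def raw_fg_pct(makes):
    """Standard FG%: the average of binary make/miss outcomes."""
    return float(np.mean(makes))

def rb_fg_pct(make_probs):
    """Rao-Blackwellized FG%: the average of per-shot make probabilities
    conditioned on trajectory information. Averaging a conditional
    expectation rather than the raw 0/1 outcome yields an estimator with
    the same mean but lower variance."""
    return float(np.mean(make_probs))

# Toy data: 10 shots with binary outcomes, plus hypothetical
# trajectory-conditioned make probabilities for the same shots.
makes      = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
make_probs = np.array([0.9, 0.2, 0.7, 0.8, 0.4, 0.1, 0.6, 0.3, 0.8, 0.7])

print(raw_fg_pct(makes))      # 0.6
print(rb_fg_pct(make_probs))  # approximately 0.55
```

Because each probability carries more information than a single 0/1 outcome, the RB estimator stabilizes more quickly over a small number of shots, which is what enables earlier-season inference.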
Bayesian causal inference for discrete data
Causal inference provides a framework for estimating how a response changes when a given cause of interest changes. When all data are discrete we can use saturated nonparametric models to avoid unnecessary assumptions in our causal inference modelling, where we specify unique parameters for all possible combinations of treatments and confounders when estimating an outcome. Bayesian methods allow us to incorporate prior information into these saturated models, making them usable beyond simple settings with low dimensional confounders. In this thesis we propose two new nonparametric Bayes methods for causal inference based on saturated modelling.
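A saturated Bayesian model for a fully discrete setting can be sketched as one independent parameter per (treatment, confounder) cell. The counts, the Beta(1, 1) priors, and the uniform confounder distribution below are illustrative assumptions, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)

# Saturated model for a binary outcome: one free probability per
# (treatment, confounder) cell, each with a conjugate Beta(1, 1) prior.
# Illustrative cell data: (successes, failures) observed in each cell.
counts = {
    (0, 0): (12, 8), (0, 1): (5, 15),
    (1, 0): (18, 2), (1, 1): (9, 11),
}

# Conjugate update: the posterior in each cell is Beta(1 + s, 1 + f),
# so no Markov chain Monte Carlo is needed.
posterior = {
    cell: rng.beta(1 + s, 1 + f, size=5_000)
    for cell, (s, f) in counts.items()
}

# Average treatment effect E[Y | do(T=1)] - E[Y | do(T=0)], standardizing
# over a (here uniform) confounder distribution.
ate_draws = 0.5 * (posterior[(1, 0)] + posterior[(1, 1)]) \
          - 0.5 * (posterior[(0, 0)] + posterior[(0, 1)])
print(f"posterior mean ATE: {ate_draws.mean():.3f}")
```

Because every treatment-confounder combination gets its own parameter, no functional-form assumptions are imposed; the prior is what keeps the cell estimates usable when some cells contain little data.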
The first method combines a parametric model with a nonparametric saturated outcome model to estimate treatment effects in observational studies with longitudinal data. By conceptually splitting the data, we can combine these models while maintaining a conjugate framework, allowing us to avoid the use of Markov chain Monte Carlo methods. Approximations using the central limit theorem and random sampling allow our method to scale to high-dimensional confounders.
The second method uses prior restrictions of the parameter space of a saturated model to partially identify causal effect estimates in scenarios with nonignorable missing outcome data. We focus on two common restrictions, instrumental variables and the direction of missing data bias, and investigate how these restrictions narrow the identification region for parameters of interest. Additionally, we propose a rejection sampling algorithm that allows us to quantify the evidence for these assumptions in the data.
Saturated models require discrete data, so continuous data must be discretized before these methods can be used, which can introduce residual confounding. We conclude by proposing a new soft-thresholding technique for discretizing continuous confounders in the context of frequentist linear regression. We show that using a triangular distribution as the weighting function can reduce the bias induced by discretization, while maintaining the interpretability benefits typically associated with discrete variables.
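The abstract does not spell out the exact weighting scheme, but one plausible reading of soft thresholding with a triangular weighting function is to replace the hard 0/1 indicator with the CDF of a symmetric triangular distribution centred at the cut-point; the half-width parameter and this construction are assumptions of the sketch:

```python
import numpy as np

def hard_discretize(x, cutpoint):
    """Hard thresholding: a 0/1 indicator of exceeding the cut-point."""
    return (x >= cutpoint).astype(float)

def soft_discretize(x, cutpoint, half_width):
    """Soft thresholding: replace the indicator with the CDF of a symmetric
    triangular distribution centred at the cut-point, so membership rises
    smoothly over [cutpoint - half_width, cutpoint + half_width] instead of
    jumping at the cut-point."""
    w = half_width
    z = np.clip(x - cutpoint, -w, w)
    rising = (z + w) ** 2 / (2 * w ** 2)         # branch for z <= 0
    falling = 1 - (w - z) ** 2 / (2 * w ** 2)    # branch for z >= 0
    return np.where(z <= 0, rising, falling)

x = np.array([-1.5, -0.5, 0.0, 0.5, 1.5])
print(hard_discretize(x, cutpoint=0.0))                  # 0, 0, 1, 1, 1
print(soft_discretize(x, cutpoint=0.0, half_width=1.0))  # 0, 0.125, 0.5, 0.875, 1
```

Points far from the cut-point are classified as before, while points near it receive fractional membership, which is the mechanism by which a smooth weighting can reduce discretization bias.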