Evaluating Throwing Ability in Baseball
We present a quantitative analysis of throwing ability for major league
outfielders and catchers. We use detailed game event data to tabulate success
and failure events in outfielder and catcher throwing opportunities. We
attribute a run contribution to each success or failure; these contributions are tabulated for
each player in each season. We use four seasons of data to estimate the overall
throwing ability of each player using a Bayesian hierarchical model. This model
allows us to shrink individual player estimates towards an overall population
mean depending on the number of opportunities for each player. We use the
posterior distribution of player abilities from this model to identify players
with significant positive and negative throwing contributions.
Comment: Accepted for publication in the Journal of Quantitative Analysis in
Sport
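The shrinkage described in this abstract can be illustrated with a minimal sketch. It assumes a simple normal-normal model with known variances, which is a simplification chosen for illustration and not necessarily the model used in the paper. The function names and the parameters `within_var` and `between_var` are hypothetical.

```python
def shrinkage_weight(n_opportunities, within_var, between_var):
    """Weight placed on a player's own average (assumed normal-normal model).

    within_var:  assumed variance of a single opportunity's run value
    between_var: assumed variance of true ability across players
    """
    if n_opportunities == 0:
        return 0.0  # no data: rely entirely on the population mean
    data_precision = n_opportunities / within_var
    prior_precision = 1.0 / between_var
    return data_precision / (data_precision + prior_precision)


def shrunken_estimate(player_mean, n_opportunities, pop_mean,
                      within_var, between_var):
    """Posterior mean: a weighted average of player and population means."""
    w = shrinkage_weight(n_opportunities, within_var, between_var)
    return w * player_mean + (1.0 - w) * pop_mean


# Illustrative values only: a player averaging 0.2 runs per opportunity.
few = shrunken_estimate(0.2, 4, 0.0, within_var=1.0, between_var=0.25)
many = shrunken_estimate(0.2, 400, 0.0, within_var=1.0, between_var=0.25)
```

With few opportunities the estimate is pulled strongly toward the population mean; with many opportunities it stays close to the player's own average.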
Changes in the Distribution of Income Volatility
Recent research has documented a significant rise in the volatility (e.g.,
expected squared change) of individual incomes in the U.S. since the 1970s.
Existing measures of this trend abstract from individual heterogeneity,
effectively estimating an increase in average volatility. We decompose this
increase in average volatility and find that it is far from representative of
the experience of most people: there has been no systematic rise in volatility
for the vast majority of individuals. The rise in average volatility has been
driven almost entirely by a sharp rise in the income volatility of those
expected to have the most volatile incomes, identified ex-ante by large income
changes in the past. We document that the self-employed and those who
self-identify as risk-tolerant are much more likely to have such volatile
incomes; these groups have experienced much larger increases in income
volatility than the population at large. These results color the policy
implications one might draw from the rise in average volatility. While the
basic results are apparent from PSID summary statistics, providing a complete
characterization of the dynamics of the volatility distribution is a
methodological challenge. We resolve these difficulties with a Markovian
hierarchical Dirichlet process that builds on work from the non-parametric
Bayesian statistics literature.
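The distinction between a rise in average volatility and a rise for most individuals can be shown with a small sketch. It uses squared income changes as the volatility measure, as in the abstract, and compares the mean with the median and the top-decile mean. The numbers are made up for illustration and are not PSID data. The function names are hypothetical.

```python
from statistics import mean, median


def squared_changes(incomes):
    """Squared period-to-period changes for one person's income series."""
    return [(later - earlier) ** 2
            for earlier, later in zip(incomes, incomes[1:])]


def volatility_summary(volatilities):
    """Mean, median, and top-decile mean of individual volatilities."""
    ordered = sorted(volatilities)
    top_count = max(1, len(ordered) // 10)
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "top_decile_mean": mean(ordered[-top_count:]),
    }


# Hypothetical individual volatilities in two periods: nine of ten
# people are unchanged, and only the most volatile person changes.
early = volatility_summary([1.0] * 9 + [5.0])
late = volatility_summary([1.0] * 9 + [15.0])
```

In this toy example the mean rises while the median is unchanged, so the increase in the average is driven entirely by the upper tail.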
Estimating an NBA player's impact on his team's chances of winning
Traditional NBA player evaluation metrics are based on scoring differential
or some pace-adjusted linear combination of box score statistics like points,
rebounds, assists, etc. These measures treat performances with the outcome of
the game still in question (e.g. tie score with five minutes left) in exactly
the same way as they treat performances with the outcome virtually decided
(e.g. when one team leads by 30 points with one minute left). Because they
ignore the context in which players perform, these measures can result in
misleading estimates of how players help their teams win. We instead use a win
probability framework for evaluating the impact NBA players have on their
teams' chances of winning. We propose a Bayesian linear regression model to
estimate an individual player's impact, after controlling for the other players
on the court. We introduce several posterior summaries to derive rank-orderings
of players within their team and across the league. This allows us to identify
highly paid players with low impact relative to their teammates, as well as
players whose high impact is not captured by existing metrics.
Comment: To appear in the Journal of Quantitative Analysis of Spor
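A rough sketch of the regression idea in this abstract follows. It assumes independent zero-mean Gaussian priors on player effects and Gaussian noise with known variances, so the posterior mean has a closed form. The abstract does not state the prior, so this choice is an assumption, and the paper's actual model may differ. Each row of the hypothetical `design` matrix is one stretch of play: +1 for a home player on the court, -1 for an away player, 0 otherwise. The response is the change in the home team's win probability.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def posterior_mean_impacts(design, wp_changes, noise_var, prior_var):
    """Posterior mean of player effects under the assumed Gaussian model.

    Solves (X'X + (noise_var / prior_var) I) beta = X'y.
    """
    p = len(design[0])
    ridge = noise_var / prior_var
    xtx = [[sum(row[i] * row[j] for row in design)
            + (ridge if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y for row, y in zip(design, wp_changes))
           for i in range(p)]
    return solve_linear_system(xtx, xty)


# Toy data: player 0 (home) against player 1 (away), then roles swapped.
design = [[1, -1], [1, -1], [-1, 1]]
wp_changes = [0.1, 0.2, -0.3]
impacts = posterior_mean_impacts(design, wp_changes,
                                 noise_var=1.0, prior_var=1.0)
```

Ranking players by these posterior means is one simple summary; the abstract mentions several posterior summaries, which this sketch does not attempt to reproduce.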
Hierarchical Bayesian Modeling of Hitting Performance in Baseball
We have developed a sophisticated statistical model for predicting the
hitting performance of Major League baseball players. The Bayesian paradigm
provides a principled method for balancing past performance with crucial
covariates, such as player age and position. We share information across time
and across players by using mixture distributions to control shrinkage for
improved accuracy. We compare the performance of our model to current
sabermetric methods on a held-out season (2006), and discuss both successes and
limitations.
Bayesian variable selection and data integration for biological regulatory networks
A substantial focus of research in molecular biology is gene regulatory
networks: the set of transcription factors and target genes that control the
involvement of different biological processes in living cells. Previous
statistical approaches for identifying gene regulatory networks have used gene
expression data, ChIP binding data or promoter sequence data, but each of these
resources provides only partial information. We present a Bayesian hierarchical
model that integrates all three data types in a principled variable selection
framework. The gene expression data are modeled as a function of the unknown
gene regulatory network which has an informed prior distribution based upon
both ChIP binding and promoter sequence data. We also present a variable
weighting methodology for the principled balancing of multiple sources of prior
information. We apply our procedure to the discovery of gene regulatory
relationships in Saccharomyces cerevisiae (yeast), for which we can use several
external sources of information to validate our results. Our inferred
relationships show greater biological relevance on the external validation
measures than previous data integration methods. Our model also estimates
synergistic and antagonistic interactions between transcription factors, many
of which are validated by previous studies. We also evaluate the results from
our procedure for weighting multiple sources of prior information.
Finally, we discuss our methodology in the context of previous approaches to
data integration and Bayesian variable selection.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS130 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org