
    The Pythagorean Won-Loss Formula and Hockey: A Statistical Justification for Using the Classic Baseball Formula as an Evaluative Tool in Hockey

    Originally devised for baseball, the Pythagorean Won-Loss formula estimates the percentage of games a team should have won at a particular point in a season. For decades, this formula had no mathematical justification. In 2006, Steven Miller provided a statistical derivation by making some heuristic assumptions about the distributions of runs scored and allowed by baseball teams. We make a similar set of assumptions about hockey teams and show that the formula is just as applicable to hockey as it is to baseball. We hope that this work spurs research in the use of the Pythagorean Won-Loss formula as an evaluative tool for sports outside baseball. Comment: 21 pages, 4 figures; Forthcoming in The Hockey Research Journal: A Publication of the Society for International Hockey Research, 2012/1
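The estimate itself is simple to compute. Below is a minimal sketch (not the paper's code) applying the formula to hockey, with goals for and against in place of runs; the exponent of 2.0 is purely illustrative, as the paper derives its own value for hockey.

```python
# Pythagorean Won-Loss estimate adapted to hockey: GF^g / (GF^g + GA^g).
# The exponent here is an illustrative placeholder, not the paper's fitted value.

def pythagorean_pct(goals_for: float, goals_against: float, gamma: float = 2.0) -> float:
    """Estimated winning percentage from goals scored and allowed."""
    gf, ga = goals_for ** gamma, goals_against ** gamma
    return gf / (gf + ga)

# A team that scores 250 goals and allows 200 over a season:
print(round(pythagorean_pct(250, 200), 3))  # → 0.61
```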

    First Order Approximations of the Pythagorean Won-Loss Formula for Predicting MLB Teams' Winning Percentages

    We mathematically prove that an existing linear predictor of baseball teams' winning percentages (Jones and Tappin 2005) is simply a first-order approximation to Bill James' Pythagorean Won-Loss formula and can thus be written in terms of the formula's well-known exponent. We estimate the linear model on twenty seasons of Major League Baseball data and verify that the resulting coefficient estimate, with 95% confidence, is virtually identical to the empirically accepted value of 1.82. Our work thus helps explain why this simple and elegant model is such a strong linear predictor. Comment: 7 pages, 1 Table, Appendix with Alternative Proof; By the Numbers 21, 201
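The first-order relationship can be checked numerically: expanding W = RS^γ/(RS^γ + RA^γ) around RS = RA = R̄ gives W ≈ 0.5 + γ(RS − RA)/(4R̄). A sketch of that check (with a hypothetical league run average, not an estimate from the paper):

```python
# Sketch: the linear predictor is the first-order Taylor expansion of the
# Pythagorean formula around equal run averages, with slope gamma / (4 * rbar).
# RBAR is a hypothetical per-game run average, not a value from the paper.

GAMMA = 1.82   # empirically accepted Pythagorean exponent
RBAR = 4.5     # hypothetical average runs per game

def pythagorean(rs: float, ra: float, g: float = GAMMA) -> float:
    return rs ** g / (rs ** g + ra ** g)

def linear(rs: float, ra: float, g: float = GAMMA, rbar: float = RBAR) -> float:
    return 0.5 + g * (rs - ra) / (4 * rbar)

# Near rs == ra the two predictors agree closely:
for rs, ra in [(4.5, 4.5), (4.6, 4.4), (5.0, 4.0)]:
    print(rs, ra, round(pythagorean(rs, ra), 4), round(linear(rs, ra), 4))
```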

    Contributions to Bayesian Statistical Modeling in Public Policy Research

    This dissertation improves existing Bayesian statistical methodologies and applies these improvements to a variety of important public policy questions. The manuscript is divided into six chapters. The first chapter provides an overview of the dissertation. The second chapter improves existing Bayesian binary logistic regression methodologies using polynomial expansions as an alternative to existing Markov Chain Monte Carlo (MCMC) methods. Our improvements make the estimation technique quite useful for a variety of applications, and we demonstrate the methodology to be considerably faster than existing MCMC methods. These computational gains are especially valuable for models analyzing large data sets involving high-dimensional parameter spaces. We apply this methodology to a child poverty data set to analyze the potential causes of child poverty. The next chapter improves upon a well-known technique in semiparametric modeling known as density ratio estimation. This methodology is useful in principle; however, it suffers from one primary limitation: the technique has thus far been incapable of modeling individual-level heterogeneity. Modeling heterogeneity is important, as there is often no a priori reason to believe that different individuals (or observations) in a data set will behave in an identical manner. We ameliorate this limitation in the third chapter by adapting density ratio estimation methods to accommodate individual-level heterogeneity, and we apply this new methodology to an analysis of the efficacy of medical malpractice reform across the country. In the fourth chapter, we shift our focus toward improving Bayesian credible interval estimation via semiparametric density ratio estimation. We do so by applying an innovative adaptation of the methodology, known as out-of-sample fusion, to posterior samples from a hierarchical Bayesian linear model examining the efficacy of the welfare reform of the 1990s. In the fifth chapter, we extend this methodology to credible interval estimation for a hierarchical generalized linear model used to analyze terrorism data from a number of major conflicts across the globe, and we use our results to offer prescriptive suggestions regarding counterterrorism policy. The final chapter concludes the dissertation and offers a number of suggestions for further research. We emphasize that the modeling contributions presented in this dissertation are useful in myriad other applied problems beyond just the public policy applications presented here.

    Unfounded FUND: Yet Another EPA Model Not Ready for the Big Game

    - Using the OMB-mandated discount rate of 7 percent, the Climate Framework for Uncertainty, Negotiation and Distribution (FUND) model suggests an average social cost of carbon (SCC) of essentially zero dollars, suggesting no net economic damages of global warming.
    - Upon using the OMB-mandated discount rate in conjunction with updating the equilibrium climate sensitivity distribution, the model reduces its estimate of the SCC for 2020 by nearly $34 a ton (a drop of more than 102 percent).
    - The FUND model even allows negative estimates of the SCC. In some instances, the chance of the SCC's being negative is nearly 70 percent.
    - With such great sensitivity to assumptions producing results all over the map, the FUND model may remain an interesting academic exercise, but it is almost certainly not reliable enough to justify trillions of dollars' worth of additional economic regulations with which to burden the economy.
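The outsized role of the discount rate is easy to see even outside the model. A toy illustration (not the FUND model itself, and with made-up numbers): a constant stream of future climate damages shrinks dramatically in present value as the discount rate rises from 3 to 7 percent.

```python
# Toy present-value calculation illustrating discount-rate sensitivity.
# The damage stream and horizon are hypothetical, unrelated to FUND's internals.

def present_value(annual_damage: float, years: int, rate: float) -> float:
    """Discounted sum of a constant annual damage over the given horizon."""
    return sum(annual_damage / (1 + rate) ** t for t in range(1, years + 1))

damage = 1.0    # $1 of damages per year (hypothetical)
horizon = 200   # long climate-policy horizon
pv3 = present_value(damage, horizon, 0.03)
pv7 = present_value(damage, horizon, 0.07)
print(round(pv3, 2), round(pv7, 2))  # the 3% valuation is more than double the 7% one
```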

    Applications of Improvements to the Pythagorean Won-Loss Expectation in Optimizing Rosters

    Bill James' Pythagorean formula has for decades done an excellent job estimating a baseball team's winning percentage from very little data: if the average runs scored and allowed are denoted respectively by RS and RA, there is some exponent γ such that the winning percentage is approximately RS^γ / (RS^γ + RA^γ). One important consequence is to determine the value of different players to the team, as it allows us to estimate how many more wins we would have given a fixed increase in run production. We summarize earlier work on the subject and extend the earlier theoretical model of Miller (who estimated the run distributions as arising from independent Weibull distributions with the same shape parameter; this has been observed to describe the observed run data well). We now model runs scored and allowed as being drawn from independent Weibull distributions where the shape parameter is not necessarily the same, and then use the Method of Moments to solve a system of four equations in four unknowns. Doing so yields a predicted winning percentage that is consistently better than earlier models over the last 30 MLB seasons (1994 to 2023). This comes at a small cost, as we no longer have a closed-form expression but must evaluate a two-dimensional integral of two Weibull distributions and numerically estimate the solutions to the system of equations; as these are trivial to do with simple computational programs, it is well worth adopting this framework and avoiding the issues of implementing the Method of Least Squares or the Method of Maximum Likelihood.
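Under the generalized model, the winning percentage is P(RS > RA) for independent Weibull draws with different shape parameters. A minimal sketch of that quantity (hypothetical scale and shape values, with Monte Carlo standing in for the paper's two-dimensional integral):

```python
# Winning percentage as P(RS > RA) for independent Weibull runs scored/allowed
# with different shape parameters. Parameters below are illustrative, not
# Method-of-Moments estimates from MLB data.

import random

random.seed(0)

def win_pct(scale_s: float, shape_s: float,
            scale_a: float, shape_a: float, n: int = 200_000) -> float:
    wins = 0
    for _ in range(n):
        rs = random.weibullvariate(scale_s, shape_s)  # runs scored
        ra = random.weibullvariate(scale_a, shape_a)  # runs allowed
        wins += rs > ra
    return wins / n

print(round(win_pct(5.0, 1.8, 4.5, 1.6), 3))
```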

    Closed-Form Bayesian Inferences for the Logit Model via Polynomial Expansions

    Articles in the marketing and choice literatures have demonstrated the need for incorporating person-level heterogeneity into behavioral models (e.g., logit models for multiple binary outcomes, as studied here). However, the logit likelihood extended with a population distribution of heterogeneity does not yield closed-form inferences, and therefore numerical integration techniques (e.g., MCMC methods) are relied upon. We present an alternative: closed-form Bayesian inferences for the logit model, obtained by approximating the logit likelihood via a polynomial expansion and then positing a distribution of heterogeneity from a flexible family that is now conjugate and integrable. For problems where the response coefficients are independent, choosing the Gamma distribution leads to rapidly convergent closed-form expansions; if there are correlations among the coefficients, one can still obtain rapidly convergent closed-form expansions by positing a Multivariate Gamma distribution of heterogeneity. The solution then comes from the moment generating function of the Multivariate Gamma distribution, or in general from the multivariate heterogeneity distribution assumed. Closed-form Bayesian inferences, derivatives (useful for elasticity calculations), population distribution parameter estimates (useful for summarization), and starting values (useful for complicated algorithms) are hence directly available. Two simulation studies demonstrate the efficacy of our approach. Comment: 30 pages, 2 figures, corrected some typos. Appears in Quantitative Marketing and Economics vol 4 (2006), no. 2, 173--20

    Closed Form Bayesian Inferences for Binary Logistic Regression with Applications to American Voter Turnout

    Understanding the factors that influence voter turnout is a fundamentally important question in public policy and political science research. Bayesian logistic regression models are useful for incorporating individual-level heterogeneity to answer these and many other questions. When these questions involve incorporating individual-level heterogeneity for large data sets that include many demographic and ethnic subgroups, however, standard Markov Chain Monte Carlo (MCMC) sampling methods for estimating such models can be quite slow and impractical to perform in a reasonable amount of time. We present an innovative closed-form Empirical Bayesian approach that is significantly faster than MCMC methods, thus enabling the estimation of voter turnout models that had previously been considered computationally infeasible. Our results shed light on factors impacting voter turnout in the 2000, 2004, and 2008 presidential elections. We conclude with a discussion of these factors and the associated policy implications. We emphasize, however, that although our application is to the social sciences, our approach is fully generalizable to myriad other fields involving statistical models with binary dependent variables and high-dimensional parameter spaces.
