
To p, or not to p?: quantifying inferential decision errors to assess whether significance truly is significant

By James Spencer Abdey

Abstract

Empirical testing is centred on p-values. These summary statistics are used to assess the plausibility of a null hypothesis, and therein lies a flaw in their interpretation. Central to this research is accounting for the behaviour of p-values, through density functions, under the alternative hypothesis, H1. These densities are determined by a combination of the sample size and the parametric specification of H1. Several new contributions are presented to reflect p-value behaviour. By considering the likelihood of both hypotheses in parallel, it is possible to optimise the decision-making process. A framework for simultaneously testing the null and alternative hypotheses is outlined for various testing scenarios. To facilitate efficient empirical conclusions, a new set of critical value tables is presented which requires only the conventional p-value, avoiding the need for additional computation when applying this joint testing in practice. Both simple and composite forms of H1 are considered. Recognising the conflict between different schools of thought on hypothesis testing, a unified approach consolidating the advantages of each is offered. Again exploiting p-value distributions under various forms of H1, a revised conditioning statistic for conditional frequentist testing is developed, from which original p-value curves and surfaces are produced to further ease decision making. Finally, attention turns to multiple hypothesis testing. Estimation of multiple testing error rates is discussed, and a new estimator for the proportion of true null hypotheses, when simultaneously testing several independent hypotheses, is presented. Under certain conditions, this estimator is shown to be superior to an established estimator.
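To make the central idea concrete, the short sketch below is an illustration only, not material from the thesis: it simulates p-values from a one-sided z-test of H0: mu = 0 against a simple alternative H1: mu = delta. The helper pvalue_density_under_h1, the effect size delta and the sample size n are assumptions introduced here; the sketch merely shows how the p-value density moves away from uniformity as delta or n grows, which is the behaviour the abstract refers to.

    # Illustrative sketch (not from the thesis): p-values from a one-sided z-test.
    # Under H0 the p-value is Uniform(0, 1); under H1: mu = delta its density
    # depends on delta and the sample size n, concentrating near zero as either grows.
    import numpy as np
    from scipy.stats import norm

    def pvalue_density_under_h1(p, delta, n):
        """Density of the one-sided z-test p-value when the true mean is delta.

        Uses p = 1 - Phi(Z) with Z ~ N(sqrt(n) * delta, 1) under H1 (unit variance assumed).
        """
        z = norm.ppf(1.0 - p)                                # test statistic implied by p
        return norm.pdf(z - np.sqrt(n) * delta) / norm.pdf(z)

    rng = np.random.default_rng(0)
    n, delta = 25, 0.3                                       # assumed sample size and effect size
    z_h1 = rng.normal(np.sqrt(n) * delta, 1.0, size=100_000) # simulated test statistics under H1
    p_h1 = 1.0 - norm.cdf(z_h1)                              # corresponding p-values

    print("P(p < 0.05 | H1) ~", np.mean(p_h1 < 0.05))        # power of the test at alpha = 0.05
    print("density at p = 0.05:", pvalue_density_under_h1(0.05, delta, n))

Plotting p_h1 against a Uniform(0, 1) histogram, or evaluating pvalue_density_under_h1 over (0, 1) for several (delta, n) pairs, reproduces the kind of p-value density curves the abstract describes as depending on the sample size and the parametric specification of H1.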

Topics: HA Statistics, QA Mathematics
Year: 2009
OAI identifier: oai:etheses.lse.ac.uk:31
Provided by: LSE Theses Online

