2,275,815 research outputs found
Abandon Statistical Significance
We discuss problems the null hypothesis significance testing (NHST) paradigm
poses for replication and more broadly in the biomedical and social sciences as
well as how these problems remain unresolved by proposals involving modified
p-value thresholds, confidence intervals, and Bayes factors. We then discuss
our own proposal, which is to abandon statistical significance. We recommend
dropping the NHST paradigm--and the p-value thresholds intrinsic to it--as the
default statistical paradigm for research, publication, and discovery in the
biomedical and social sciences. Specifically, we propose that the p-value be
demoted from its threshold screening role and instead, treated continuously, be
considered along with currently subordinate factors (e.g., related prior
evidence, plausibility of mechanism, study design and data quality, real world
costs and benefits, novelty of finding, and other factors that vary by research
domain) as just one among many pieces of evidence. We have no desire to "ban"
p-values or other purely statistical measures. Rather, we believe that such
measures should not be thresholded and that, thresholded or not, they should
not take priority over the currently subordinate factors. We also argue that it
seldom makes sense to calibrate evidence as a function of p-values or other
purely statistical measures. We offer recommendations for how our proposal can
be implemented in the scientific publication process as well as in statistical
decision making more broadly.
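To make the recommendation concrete, here is a minimal sketch (simulated data and arbitrary names, not the authors' code) of reporting a p-value continuously alongside the effect estimate and interval, rather than converting it into a binary significant/not-significant verdict:

```python
# Minimal sketch (hypothetical example data): report the p-value continuously,
# alongside the effect estimate and a confidence interval, rather than turning
# it into a binary "significant / not significant" verdict.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(loc=0.3, scale=1.0, size=40)   # simulated outcomes
control = rng.normal(loc=0.0, scale=1.0, size=40)

t_stat, p_value = stats.ttest_ind(treatment, control)
effect = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment) +
             control.var(ddof=1) / len(control))
ci_low, ci_high = effect - 1.96 * se, effect + 1.96 * se   # normal-approx 95% CI

# No threshold screening: the p-value is just one piece of evidence, to be
# weighed with prior evidence, mechanism, design quality, and costs/benefits.
print(f"effect = {effect:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}], p = {p_value:.3f}")
```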
Self-play: statistical significance
Heinz recently completed a comprehensive experiment in self-play using the FRITZ chess engine to establish the "decreasing returns" hypothesis with specific levels of statistical confidence. This note revisits the results and recalculates the confidence levels of this and other hypotheses. These appear to be better than Heinz's initial analysis suggests.
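As a hedged illustration of the kind of recalculation involved (the actual FRITZ match scores are not reproduced here; all numbers below are placeholders), self-play games can be treated as draws from a score distribution and the confidence that the deeper search is genuinely stronger computed from a normal approximation:

```python
# Sketch with placeholder numbers (not Heinz's actual data): given a self-play
# match score, compute the one-sided confidence that the deeper-searching
# engine is genuinely stronger than the shallower one.
import math

games = 1000          # placeholder number of games
score = 0.55          # placeholder score rate for the deeper search (draws count 0.5)
draw_rate = 0.30      # placeholder draw fraction, which lowers per-game variance

# Per-game variance of the score under the null of equal strength (mean 0.5):
# wins/losses contribute 0.25 each, draws contribute 0.
var_per_game = (1 - draw_rate) * 0.25
se = math.sqrt(var_per_game / games)

z = (score - 0.5) / se
confidence = 0.5 * (1 + math.erf(z / math.sqrt(2)))   # one-sided normal CDF
print(f"z = {z:.2f}, confidence that the deeper search is stronger ~ {confidence:.4f}")
```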
Assessing statistical significance of periodogram peaks
The least-squares (or Lomb-Scargle) periodogram is a powerful tool which is
used routinely in many branches of astronomy to search for periodicities in
observational data. The problem of assessing statistical significance of
candidate periodicities for different periodograms is considered. Based on
results in extreme value theory, improved analytic estimations of false alarm
probabilities are given. They include an upper limit to the false alarm
probability (or a lower limit to the significance). These estimations are
tested numerically in order to establish regions of their practical
applicability.Comment: 7 pages, 6 figures, 1 table; To be published in MNRA
Statistical significance of communities in networks
Nodes in real-world networks are usually organized in local modules. These
groups, called communities, are intuitively defined as sub-graphs with a larger
density of internal connections than of external links. In this work, we
introduce a new measure aimed at quantifying the statistical significance of
single communities. Extreme and Order Statistics are used to predict the
statistics associated with individual clusters in random graphs. These
distributions allow us to define the significance of a community as the
probability that a generic clustering algorithm finds such a group in a random graph. The
method is successfully applied in the case of real-world networks for the
evaluation of the significance of their communities. The software to calculate the C-score
can be found at http://filrad.homelinux.org/cscor
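The following Monte Carlo sketch is not the C-score itself but a simpler, related null-model check (illustrative node set and parameters): it asks whether a given group has more internal edges than expected under degree-preserving randomizations of the graph, whereas the C-score additionally uses extreme and order statistics over the best group a clustering algorithm would find.

```python
# Monte Carlo sketch (not the C-score implementation): test whether a given
# node set has more internal edges than expected in degree-preserving
# randomizations of the graph.
import networkx as nx

def internal_edges(G, nodes):
    """Count edges with both endpoints inside `nodes`."""
    nodes = set(nodes)
    return sum(1 for u, v in G.edges() if u in nodes and v in nodes)

def community_pvalue(G, community, n_null=500, seed=0):
    observed = internal_edges(G, community)
    exceed = 0
    for i in range(n_null):
        H = G.copy()
        # Degree-preserving null model via double-edge swaps.
        nx.double_edge_swap(H, nswap=4 * H.number_of_edges(),
                            max_tries=10**6, seed=seed + i)
        if internal_edges(H, community) >= observed:
            exceed += 1
    return (exceed + 1) / (n_null + 1)   # add-one to avoid a zero p-value

G = nx.karate_club_graph()
community = [0, 1, 2, 3, 7, 13, 17, 19, 21]   # example node set
print(f"empirical p-value: {community_pvalue(G, community):.3f}")
```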
Statistical Significance of the Netflix Challenge
Inspired by the legacy of the Netflix contest, we provide an overview of what
has been learned---from our own efforts, and those of others---concerning the
problems of collaborative filtering and recommender systems. The data set
consists of about 100 million movie ratings (from 1 to 5 stars) involving some
480 thousand users and some 18 thousand movies; the associated ratings matrix
is about 99% sparse. The goal is to predict ratings that users will give to
movies; systems which can do this accurately have significant commercial
applications, particularly on the world wide web. We discuss, in some detail,
approaches to "baseline" modeling, singular value decomposition (SVD), as well
as kNN (nearest neighbor) and neural network models; temporal effects,
cross-validation issues, ensemble methods and other considerations are
discussed as well. We compare existing models in a search for new models, and
also discuss the mission-critical issues of penalization and parameter
shrinkage that arise when the dimension of a parameter space reaches into the
millions. Although much work on such problems has been carried out by the
computer science and machine learning communities, our goal here is to address
a statistical audience, and to provide a primarily statistical treatment of the
lessons that have been learned from this remarkable set of data. Published in Statistical
Science (http://www.imstat.org/sts/) at http://dx.doi.org/10.1214/11-STS368 by the
Institute of Mathematical Statistics (http://www.imstat.org).
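To make the baseline modeling and shrinkage concrete, here is a toy sketch (made-up ratings, not the Netflix data or the authors' code): user and movie offsets around the global mean are shrunk toward zero in proportion to how little data supports them, the kind of penalization needed once parameter counts grow very large.

```python
# Toy sketch of baseline modeling with shrinkage: predict
# rating ~ global mean + movie offset + user offset, where each offset is
# shrunk toward zero when few ratings support it (lambda acts like a prior
# sample size / regularization strength).
from collections import defaultdict

ratings = [  # (user, movie, stars) -- made-up example data
    ("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4),
    ("u2", "m3", 2), ("u3", "m2", 4), ("u3", "m3", 1),
]
lam_item, lam_user = 10.0, 15.0
mu = sum(r for _, _, r in ratings) / len(ratings)

# Movie offsets: shrunken mean residual per movie.
item_res = defaultdict(list)
for u, m, r in ratings:
    item_res[m].append(r - mu)
b_item = {m: sum(res) / (lam_item + len(res)) for m, res in item_res.items()}

# User offsets: shrunken mean residual after removing the movie offset.
user_res = defaultdict(list)
for u, m, r in ratings:
    user_res[u].append(r - mu - b_item[m])
b_user = {u: sum(res) / (lam_user + len(res)) for u, res in user_res.items()}

def predict(user, movie):
    return mu + b_item.get(movie, 0.0) + b_user.get(user, 0.0)

print(f"baseline prediction for (u1, m3): {predict('u1', 'm3'):.2f}")
```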
The Cult of Statistical Significance
This article takes issue with a recent book by Ziliak and McCloskey (2008) of the same title. Ziliak and McCloskey argue that statistical significance testing is a barrier rather than a booster for empirical research in many fields and should therefore be abandoned altogether. The present article argues that this is good advice in some research areas but not in others. Taking as examples all issues of the German Economic Review that have appeared so far and a recent epidemiological meta-analysis, it shows that there has indeed been a lot of misleading work in the context of significance testing, and that at the same time many promising avenues for fruitfully employing statistical significance tests, disregarded by Ziliak and McCloskey, have not been used.
Statistical significance of variables driving systematic variation
There are a number of well-established methods such as principal components
analysis (PCA) for automatically capturing systematic variation due to latent
variables in large-scale genomic data. PCA and related methods may directly
provide a quantitative characterization of a complex biological variable that
is otherwise difficult to precisely define or model. An unsolved problem in
this context is how to systematically identify the genomic variables that are
drivers of systematic variation captured by PCA. Principal components (and
other estimates of systematic variation) are directly constructed from the
genomic variables themselves, making measures of statistical significance
artificially inflated when using conventional methods due to over-fitting. We
introduce a new approach called the jackstraw that allows one to accurately
identify genomic variables that are statistically significantly associated with
any subset or linear combination of principal components (PCs). The proposed
method can greatly simplify complex significance testing problems encountered
in genomics and can be utilized to identify the genomic variables significantly
associated with latent variables. Using simulation, we demonstrate that our
method attains accurate measures of statistical significance over a range of
relevant scenarios. We consider yeast cell-cycle gene expression data, and show
that the proposed method can be used to straightforwardly identify
statistically significant genes that are cell-cycle regulated. We also analyze
gene expression data from post-trauma patients, allowing the gene expression
data to provide a molecularly-driven phenotype. We find a greater enrichment
for inflammatory-related gene sets compared to using a clinically defined
phenotype. The proposed method provides a useful bridge between large-scale
quantifications of systematic variation and gene-level significance analyses.
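A simplified numpy sketch of a jackstraw-style procedure is given below (not the authors' implementation; the data, statistic, and parameters are chosen for illustration): a few rows are permuted at a time, the leading principal component is re-estimated, and the permuted rows' association statistics serve as an empirical null for the observed ones.

```python
# Simplified jackstraw-style sketch: permute s rows at a time, re-estimate the
# top principal component, and use the permuted rows' statistics as a null.
import numpy as np

def jackstraw_pvalues(X, s=10, B=200, seed=0):
    """X: variables (rows) x samples (columns). Tests association with PC1."""
    rng = np.random.default_rng(seed)
    m, n = X.shape

    def pc1(mat):
        # Leading right-singular vector of the row-centered data matrix.
        return np.linalg.svd(mat - mat.mean(axis=1, keepdims=True),
                             full_matrices=False)[2][0]

    def assoc(mat, v):
        # Squared correlation of each row with the PC as the association statistic.
        mc = mat - mat.mean(axis=1, keepdims=True)
        vc = v - v.mean()
        num = mc @ vc
        den = np.sqrt((mc**2).sum(axis=1) * (vc**2).sum())
        return (num / den) ** 2

    observed = assoc(X, pc1(X))
    null = []
    for _ in range(B):
        Xb = X.copy()
        idx = rng.choice(m, size=s, replace=False)
        for i in idx:                       # permute s rows independently
            Xb[i] = rng.permutation(Xb[i])
        null.append(assoc(Xb, pc1(Xb))[idx])
    null = np.concatenate(null)
    # Empirical p-value: fraction of null statistics at least as large.
    return (1 + (null[None, :] >= observed[:, None]).sum(axis=1)) / (1 + null.size)

# Simulated example: 100 variables x 30 samples, first 20 variables load on a latent factor.
rng = np.random.default_rng(1)
latent = rng.normal(size=30)
X = rng.normal(size=(100, 30))
X[:20] += np.outer(rng.normal(1.0, 0.2, 20), latent)
p = jackstraw_pvalues(X)
print("median p, loaded vs unloaded variables:", np.median(p[:20]), np.median(p[20:]))
```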
- …