
    Caveats for using statistical significance tests in research assessments

    This paper raises concerns about the advantages of using statistical significance tests in research assessments, as has recently been suggested in the debate about proper normalization procedures for citation indicators. Statistical significance tests are highly controversial, and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice of such tests, their dichotomous application in decision making, the difference between statistical and substantive significance, the implausibility of most null hypotheses, the crucial assumption of randomness, as well as the utility of standard errors and confidence intervals for inferential purposes. We argue that applying statistical significance tests and mechanically adhering to their results is highly problematic and detrimental to critical thinking. We claim that the use of such tests does not provide any advantages in relation to citation indicators, interpretations of them, or the decision-making processes based upon them. On the contrary, their use may be harmful. Like many other critics, we generally believe that statistical significance tests are over- and misused in the social sciences, including scientometrics, and we encourage reform on these matters. Comment: Accepted version for the Journal of Informetrics.
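
    A minimal sketch of one issue the abstract singles out, the gap between statistical and substantive significance: with a very large sample, a negligible difference in a citation indicator still produces a very small p-value. The data are synthetic and the example is not from the paper; numpy and scipy are assumed.

        # Minimal sketch, not the paper's analysis: a substantively negligible
        # difference becomes "statistically significant" once the sample is large.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        n = 500_000                                        # citation datasets are often this large
        unit_a = rng.normal(loc=10.00, scale=5.0, size=n)  # mean citations, unit A
        unit_b = rng.normal(loc=10.05, scale=5.0, size=n)  # trivially higher mean, unit B

        t, p = stats.ttest_ind(unit_a, unit_b)
        print(f"observed mean difference = {unit_b.mean() - unit_a.mean():.3f}")
        print(f"p-value = {p:.1e}")                        # tiny p despite a trivial difference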

    Untenable nonstationarity: An assessment of the fitness for purpose of trend tests in hydrology

    The detection and attribution of long-term patterns in hydrological time series have been important research topics for decades. A significant portion of the literature regards such patterns as ‘deterministic components’ or ‘trends’, even though the complexity of hydrological systems does not allow easy deterministic explanations and attributions. Consequently, trend estimation techniques have been developed to make and justify statements about tendencies in the historical data, which are often used to predict future events. Testing trend hypotheses on observed time series is widespread in the hydro-meteorological literature, mainly due to the interest in detecting consequences of human activities on the hydrological cycle. This analysis usually relies on the application of null hypothesis significance tests (NHSTs) for slowly-varying and/or abrupt changes, such as Mann-Kendall, Pettitt, or similar, to summary statistics of hydrological time series (e.g., annual averages, maxima, minima, etc.). However, the reliability of this application has seldom been explored in detail. This paper discusses misuse, misinterpretation, and logical flaws of NHST for trends in the analysis of hydrological data from three different points of view: historic-logical, semantic-epistemological, and practical. Based on a review of NHST rationale, and basic statistical definitions of stationarity, nonstationarity, and ergodicity, we show that even if the empirical estimation of trends in hydrological time series is always feasible from a numerical point of view, it is uninformative and does not allow the inference of nonstationarity without assuming a priori additional information on the underlying stochastic process, according to deductive reasoning. This prevents the use of trend NHST outcomes to support nonstationary frequency analysis and modeling. We also show that the correlation structures characterizing hydrological time series might easily be underestimated, further compromising the attempt to draw conclusions about trends spanning the period of record. Moreover, even though adjusting procedures accounting for correlation have been developed, some of them are insufficient or are applied only to some tests, while others are theoretically flawed but still widely applied. In particular, using 250 unimpacted stream flow time series across the conterminous United States (CONUS), we show that the test results can dramatically change if the sequences of annual values are reproduced starting from daily stream flow records, whose larger sizes enable a more reliable assessment of the correlation structures.
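
    The point about underestimated correlation structures can be illustrated with a small Monte Carlo check. The sketch below is not the paper's code: it assumes numpy and scipy, approximates the Mann-Kendall test by Kendall's tau between time and value, and estimates how often a trend is "detected" in trend-free AR(1) series as the lag-1 correlation grows.

        # Hedged sketch: rejection rate of a Mann-Kendall-style trend test
        # (Kendall's tau between time and value) on trend-free AR(1) series.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(42)
        n_years, n_sim, alpha = 80, 500, 0.05

        def ar1(n, phi, rng):
            """Zero-mean, unit-variance AR(1) series with lag-1 correlation phi (no trend)."""
            x = np.empty(n)
            x[0] = rng.normal()
            for t in range(1, n):
                x[t] = phi * x[t - 1] + rng.normal() * np.sqrt(1 - phi**2)
            return x

        time = np.arange(n_years)
        for phi in (0.0, 0.3, 0.6):
            rejections = 0
            for _ in range(n_sim):
                tau, p = stats.kendalltau(time, ar1(n_years, phi, rng))
                rejections += p < alpha
            print(f"phi = {phi:.1f}: rejection rate = {rejections / n_sim:.2f} (nominal {alpha})")

    Even with no trend in the generating process, the rejection rate climbs well above the nominal level as the serial correlation increases, which is the kind of inflation the abstract warns about.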

    New Guidelines for Null Hypothesis Significance Testing in Hypothetico-Deductive IS Research

    The objective of this research perspectives article is to promote policy change among journals, scholars, and students with a vested interest in hypothetico-deductive information systems (IS) research. We are concerned about the design, analysis, reporting, and reviewing of quantitative IS studies that draw on null hypothesis significance testing (NHST). We observe that although debates about misinterpretations, abuse, and issues with NHST have persisted for about half a century, they remain largely absent in IS. We find this to be an untenable position for a discipline with a proud quantitative tradition. We discuss traditional and emergent threats associated with the application of NHST and examine how they manifest in recent IS scholarship. To encourage the development of new standards for NHST in hypothetico-deductive IS research, we develop a balanced account of possible actions that are implementable in the short or long term and that incentivize or penalize specific practices. To promote an immediate push for change, we also develop two sets of guidelines that IS scholars can adopt immediately.

    The Utility and Feasibility of Metric Calibration for Basic Psychological Research

    Inspired by the history of the development of instruments in the physical sciences, and by past psychology giants, the following dissertation aimed to advance basic psychological science by investigating the metric calibration of psychological instruments. The over-arching goal of the dissertation was to demonstrate that it is both useful and feasible to calibrate the metric of psychological instruments so as to render their metrics non-arbitrary. Concerning utility, a conceptual analysis was executed delineating four categories of proposed benefits of non-arbitrary metrics including (a) help in the interpretation of data, (b) facilitation of construct validity research, (c) contribution to theory development, and (d) facilitation of general accumulation of knowledge. With respect to feasibility, the metric calibration approach was successfully applied to instruments of seven distinct constructs commonly studied in psychology, across three empirical demonstration studies and re-analyses of other researchers’ data. Extending past research, metric calibration was achieved in these empirical demonstration studies by finding empirical linkages between scores of the measures and specifically configured theoretically-relevant behaviors argued to reflect particular locations (i.e., ranges) of the relevant underlying psychological dimension. More generally, such configured behaviors can serve as common reference points to calibrate the scores of different instruments, rendering the metric of those instruments non-arbitrary. Study 1 showed a meaningful metric mapping between scores of a frequently used instrument to measure need for cognition and probability of choosing to complete a cognitively effortful over a cognitively simpler task. Study 1 also found an interesting metric linkage between scores of a practically useful self-report measure of task persistence and actual persistence in an anagram persistence task. Study 2, set in the context of the debate of pan-cultural self-enhancement, found theoretically interesting metric mappings between a trait rating measure of self-enhancement often used in the debate and a specifically configured behavioral measure of self-enhancement (i.e., over-claiming of knowledge). Study 3 demonstrated the metric calibration approach for popular behavioral measures of risk-taking often used in experimental studies and found meaningful metric linkages to risky gambles in binary lottery choices involving the possibility of winning real money. Re-analyses of relevant datasets shared by other researchers also revealed meaningful metric mappings for instruments assessing extraversion, conscientiousness, and self-control. Gregariousness facet scores were empirically linked to number of social parties attended per month, Dutifulness facet scores (conscientiousness) were connected to maximum driving speed, and trait self-control scores were calibrated to GPA. In addition, to further demonstrate the utility of non-arbitrary metrics for basic psychological research, some of my preliminary metric calibration findings were applied to actual research findings from the literature. Limitations and obstacles of metric calibration and promising future directions are also discussed.
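
    As a rough illustration of the calibration idea described above, and not the dissertation's actual analysis, the sketch below links hypothetical need-for-cognition scale scores to the probability of choosing a cognitively effortful task via logistic regression, so that a raw scale score maps onto a behavioral reference point. The scores, the assumed linkage, and the use of scikit-learn are all assumptions for the example.

        # Hedged sketch: calibrate a (hypothetical) 1-5 need-for-cognition score
        # against a configured behavior (choosing the effortful task).
        import numpy as np
        from sklearn.linear_model import LogisticRegression

        rng = np.random.default_rng(1)
        n = 300
        nfc_score = rng.uniform(1, 5, size=n)            # hypothetical scale scores
        p_choose = 1 / (1 + np.exp(-(nfc_score - 3.0)))  # assumed true score-behavior linkage
        chose_effortful = rng.binomial(1, p_choose)      # observed binary choices

        model = LogisticRegression().fit(nfc_score.reshape(-1, 1), chose_effortful)
        for score in (2.0, 3.0, 4.0):
            prob = model.predict_proba([[score]])[0, 1]
            print(f"scale score {score:.1f} -> estimated P(choose effortful task) = {prob:.2f}")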

    Improving statistical practice in psychological research: Sequential tests of composite hypotheses

    Statistical hypothesis testing is an integral part of the scientific process. When employed to make decisions about hypotheses, it is important that statistical tests control the probabilities of decision errors. Conventional procedures that allow for error-probability control have limitations, however: they often require extremely large sample sizes, are bound to tests of point hypotheses, and typically require explicit assumptions about unknown nuisance parameters. As a consequence, the issue of proper error-probability control has frequently been neglected in statistical practice, resulting in a widespread reliance on questionable statistical rituals. In this thesis, I promote an alternative statistical procedure: the sequential probability ratio test (SPRT). In three articles, I implement, further develop, and examine three extensions of the SPRT to common hypothesis-testing situations in psychological research. In the first project, I show that the SPRT substantially reduces required sample sizes while reliably controlling error probabilities in the context of the common t-test situation. In a subsequent project, I build on the SPRT to develop a simple procedure that allows for statistical decisions with controlled error probabilities in the context of Bayesian t-tests. Thus, it allows for tests of distributional hypotheses and combines the advantages of frequentist and Bayesian hypothesis tests. Finally, I apply a procedure for sequential hypothesis tests without explicit assumptions about unknown nuisance parameters to a popular class of stochastic measurement models, namely, multinomial processing tree models. With that, I demonstrate how sequential analysis can improve the applicability of these models in substantive research. The procedures promoted herein not only extend the SPRT to common hypothesis-testing situations but also remedy a number of limitations of conventional hypothesis tests. With my dissertation, I aim to make these procedures available to psychologists, thus bridging the gap between the fields of statistical methods and substantive research. Thereby, I hope to contribute to the improvement of statistical practice in psychology and help restore public trust in the reliability of psychological research.
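
    For readers unfamiliar with the SPRT, the sketch below shows Wald's classic version for two simple hypotheses about a normal mean with known standard deviation; the thesis's extensions to t-tests, Bayesian t-tests, and multinomial processing tree models are not shown here. numpy and scipy are assumed.

        # Minimal sketch of Wald's SPRT: accumulate the log-likelihood ratio
        # observation by observation and stop once it crosses a decision bound.
        import numpy as np
        from scipy import stats

        def sprt(data, mu0, mu1, sigma, alpha=0.05, beta=0.05):
            """Sequentially test H0: mu = mu0 vs H1: mu = mu1; return decision and sample size used."""
            lower = np.log(beta / (1 - alpha))   # accept H0 at or below this bound
            upper = np.log((1 - beta) / alpha)   # accept H1 at or above this bound
            llr = 0.0
            for n, x in enumerate(data, start=1):
                llr += stats.norm.logpdf(x, mu1, sigma) - stats.norm.logpdf(x, mu0, sigma)
                if llr <= lower:
                    return "accept H0", n
                if llr >= upper:
                    return "accept H1", n
            return "no decision yet", len(data)

        rng = np.random.default_rng(7)
        sample = rng.normal(loc=0.5, scale=1.0, size=500)  # data generated under H1
        print(sprt(sample, mu0=0.0, mu1=0.5, sigma=1.0))   # typically stops well before n = 500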

    Chicago Man, K-T Man, and the Future of Behavioral Law and Economics

    Most law is aimed at shaping human behavior, encouraging that which is good for society and discouraging that which is bad. Nonetheless, for most of the history of our legal system, laws were passed, cases were decided, and academics pontificated about the law based on nothing more than common-sense assumptions about how people make decisions. A quarter century or more ago, the law and economics movement replaced these common-sense assumptions with a well-considered and expressly stated assumption: that man is a rational maximizer of his expected utilities. Based on this premise, law and economics has dominated interdisciplinary thought in the legal academy for the past thirty years. In the past decade it has become clear, however, that people simply do not make decisions as modeled by traditional law and economics. A mountain of experiments performed in psychology and related disciplines, much of it in the heuristics and biases tradition founded by psychologists Daniel Kahneman and Amos Tversky, demonstrates that people tend to deviate systematically from rational norms when they make decisions.

    Towards a New Paradigm for Statistical Evidence

    Many scientists now widely agree that the current paradigm of statistical significance should be abandoned or largely modified. In response to these calls for change, a Special Issue of Econometrics (MDPI) was proposed. This book is a collection of the articles that have been published in this Special Issue. These seven articles add new insights to the problem and propose new methods that lay a solid foundation for the new paradigm for statistical significance.

    Taking Behavioralism Too Seriously? The Unwarranted Pessimism of the New Behavioral Analysis of Law

    Legal scholars increasingly rely on a behavioral analysis of judgment and decision making to explain legal phenomena and argue for legal reforms. The main argument of this new behavioral analysis of the law is twofold: (1) all human cognition is beset by systematic flaws in the way that judgments and decisions are made, and these flaws lead to predictable irrational behaviors; and (2) these widespread and systematic nonrational tendencies bring into serious question the assumption of procedural rationality underlying much legal doctrine. This Article examines the psychological research relied on by legal behavioralists to form this argument and demonstrates that this research does not support the bleak and simple portrait of pervasive irrationality painted by these scholars. Careful scrutiny of the psychological research reveals greater adherence to norms of rationality than that implied by the legal behavioralists, and the methodological and interpretive limitations on this psychological research make extrapolation from experimental settings to real-world legal settings often inappropriate. Accordingly, this Article argues that legal scholars should exercise greater care and precision in their uses of psychological data to avoid advocating further legal reforms based on flawed understandings of psychological research.

    Understanding the phonetics of neutralisation: a variability-field account of vowel/zero alternations in a Hijazi dialect of Arabic

    This thesis throws new light on issues debated in the experimental literature on neutralisation. They concern the extent of phonetic merger (the completeness question) and the empirical validity of the phonetic effect (the genuineness question). Regarding the completeness question, I present acoustic and perceptual analyses of vowel/zero alternations in Bedouin Hijazi Arabic (BHA) that appear to result in neutralisation. The phonology of these alternations exemplifies two neutralisation scenarios bearing on the completeness question. Until now, these scenarios have been investigated separately within small-scale studies. Here I look more closely at both, testing hypotheses involving the acoustics-perception relation and the phonetics-phonology relation. I then discuss the genuineness question from an experimental and statistical perspective. Experimentally, I devise a paradigm that manipulates important variables claimed to influence the phonetics of neutralisation. Statistically, I reanalyse neutralisation data reported in the literature from Turkish and Polish. I apply different pre-analysis procedures which, I argue, can partly explain the mixed results in the literature. My inquiry into these issues leads me to challenge some of the discipline’s accepted standards for characterising the phonetics of neutralisation. My assessment draws on insights from different research fields including statistics, cognition, neurology, and psychophysics. I suggest alternative measures that are both cognitively and phonetically more plausible. I implement these within a new model of lexical representation and phonetic processing, the Variability Field Model (VFM). According to VFM, phonetic data are examined as just-noticeable-difference (jnd) based intervals rather than as single data points. This allows for a deeper understanding of phonetic variability. The model combines prototypical and episodic schemes and integrates linguistic, paralinguistic, and extra-linguistic effects. The thesis also offers a VFM-based analysis of a set of neutralisation data from BHA. In striving for a better understanding of the phonetics of neutralisation, the thesis raises important issues pertaining to the way we approach phonetic questions, generate and analyse data, and interpret and evaluate findings.
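
    As a loose illustration of the interval idea only, and not of the thesis's Variability Field Model, the sketch below treats two phonetic measurements as distinct only when they differ by more than one jnd; the duration values and the 10 ms jnd are placeholders, not empirical claims.

        # Loose illustration: compare measurements as jnd-based intervals, not points.
        def jnd_interval(value, jnd):
            """Interval of width one jnd centred on a measured value."""
            return (value - jnd / 2, value + jnd / 2)

        def perceptually_distinct(a, b, jnd):
            """Treat two measurements as distinct only if their jnd intervals do not overlap."""
            lo_a, hi_a = jnd_interval(a, jnd)
            lo_b, hi_b = jnd_interval(b, jnd)
            return hi_a < lo_b or hi_b < lo_a

        # Hypothetical vowel durations (ms) with a placeholder jnd of 10 ms.
        print(perceptually_distinct(62.0, 68.0, jnd=10.0))  # False: within one jnd, treated as merged
        print(perceptually_distinct(62.0, 80.0, jnd=10.0))  # True: more than one jnd apart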