
    Untenable nonstationarity: An assessment of the fitness for purpose of trend tests in hydrology

    The detection and attribution of long-term patterns in hydrological time series have been important research topics for decades. A significant portion of the literature regards such patterns as ‘deterministic components’ or ‘trends’ even though the complexity of hydrological systems does not allow easy deterministic explanations and attributions. Consequently, trend estimation techniques have been developed to make and justify statements about tendencies in the historical data, which are often used to predict future events. Testing trend hypotheses on observed time series is widespread in the hydro-meteorological literature, mainly due to the interest in detecting consequences of human activities on the hydrological cycle. This analysis usually relies on the application of some null hypothesis significance tests (NHSTs) for slowly-varying and/or abrupt changes, such as Mann-Kendall, Pettitt, or similar, to summary statistics of hydrological time series (e.g., annual averages, maxima, minima, etc.). However, the reliability of this application has seldom been explored in detail. This paper discusses misuse, misinterpretation, and logical flaws of NHST for trends in the analysis of hydrological data from three different points of view: historic-logical, semantic-epistemological, and practical. Based on a review of NHST rationale, and basic statistical definitions of stationarity, nonstationarity, and ergodicity, we show that even if the empirical estimation of trends in hydrological time series is always feasible from a numerical point of view, it is uninformative and does not allow the inference of nonstationarity without assuming a priori additional information on the underlying stochastic process, according to deductive reasoning. This prevents the use of trend NHST outcomes to support nonstationary frequency analysis and modeling.
    We also show that the correlation structures characterizing hydrological time series might easily be underestimated, further compromising the attempt to draw conclusions about trends spanning the period of record. Moreover, even though adjustment procedures accounting for correlation have been developed, some of them are insufficient or are applied only to some tests, while others are theoretically flawed but still widely applied. In particular, using 250 unimpacted stream flow time series across the conterminous United States (CONUS), we show that the test results can change dramatically if the sequences of annual values are reproduced starting from daily stream flow records, whose larger sizes enable a more reliable assessment of the correlation structures.
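    The pitfall described above, serial correlation mimicking a trend, is easy to reproduce. The sketch below is a minimal, illustrative Mann-Kendall implementation (normal approximation, no tie or autocorrelation correction); the AR(1) series and its parameters are invented for illustration, not taken from the paper:

```python
import math
import random

def mann_kendall(x):
    """Mann-Kendall trend test: S statistic, normal-approximation Z,
    and two-sided p-value. No tie or autocorrelation correction."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p

# A stationary AR(1) series with NO deterministic trend: persistence
# alone can make the test, which assumes independence, report a "trend".
random.seed(1)
phi, x = 0.8, [0.0]
for _ in range(99):
    x.append(phi * x[-1] + random.gauss(0.0, 1.0))
s, z, p = mann_kendall(x)
```

    Run over many such synthetic series, the uncorrected test tends to reject the no-trend null far more often than the nominal 5%, which is exactly the underestimated-correlation problem the abstract highlights.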

    Caveats for using statistical significance tests in research assessments

    This paper raises concerns about the advantages of using statistical significance tests in research assessments, as has recently been suggested in the debate about proper normalization procedures for citation indicators. Statistical significance tests are highly controversial and numerous criticisms have been leveled against their use. Based on examples from articles by proponents of the use of statistical significance tests in research assessments, we address some of the numerous problems with such tests. The issues specifically discussed are the ritual practice of such tests, their dichotomous application in decision making, the difference between statistical and substantive significance, the implausibility of most null hypotheses, the crucial assumption of randomness, as well as the utility of standard errors and confidence intervals for inferential purposes. We argue that applying statistical significance tests and mechanically adhering to their results is highly problematic and detrimental to critical thinking. We claim that the use of such tests does not provide any advantages in relation to citation indicators, interpretations of them, or the decision-making processes based upon them. On the contrary, their use may be harmful. Like many other critics, we generally believe that statistical significance tests are over- and misused in the social sciences, including scientometrics, and we encourage a reform on these matters. Comment: Accepted version for Journal of Informetrics.

    Bayesian Hypothesis Testing: An Alternative to Null Hypothesis Significance Testing (NHST) in Psychology and Social Sciences

    Since the mid-1950s, there has been a clear predominance of the Frequentist approach to hypothesis testing, both in psychology and in the social sciences. Despite its popularity in the field of statistics, Bayesian inference is barely known and used in psychology. Frequentist inference, and its null hypothesis significance testing (NHST), has been hegemonic through most of the history of scientific psychology. However, NHST has not been exempt from criticism. Therefore, the aim of this chapter is to introduce a Bayesian approach to hypothesis testing that may represent a useful complement, or even an alternative, to the current NHST. The advantages of this Bayesian approach over Frequentist NHST will be presented, providing examples that support its use in psychology and the social sciences. Conclusions are outlined.
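    As a concrete, minimal illustration of the kind of Bayesian alternative at issue (this toy binomial example and its uniform prior are my own, not drawn from the chapter), a Bayes factor compares how well two hypotheses predict the observed data rather than thresholding a p-value:

```python
from math import comb

def bf01_binomial(k, n):
    """Bayes factor for H0: theta = 0.5 against H1: theta ~ Uniform(0, 1),
    given k successes in n Bernoulli trials.

    Marginal likelihood under H0 is C(n, k) * 0.5**n; under H1 the Beta
    integral cancels the binomial coefficient, leaving 1 / (n + 1)."""
    return (comb(n, k) * 0.5 ** n) / (1.0 / (n + 1))

# 60 successes in 100 trials: an exact two-sided frequentist test gives
# p ≈ 0.057, yet the Bayes factor is close to 1, i.e. the data barely
# discriminate between the two hypotheses.
bf = bf01_binomial(60, 100)
```

    Unlike a significance verdict, the Bayes factor is a graded measure of evidence and can also quantify support *for* the null, as with 50/100 above, where it favours H0 by a factor of about 8.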

    Methodological Differences Between Psychological Fields and its Impact on Questionable Research Practices

    A recent development across research fields, including psychology, is that several studies have called into question the replicability of findings that were thought to be well established. This phenomenon, termed the replication crisis in psychology, is gaining acceptance as a legitimate concern. This paper explores the quality of research from three prominent psychology journals: the Journal of Experimental Psychology: General, the Journal of Personality and Social Psychology, and the Journal of Abnormal Psychology, across the years 1995, 2005, and 2015. The quality of research was assessed by creating individual p-distributions, similar to the methods of Masicampo & Lalande (2012). This paper uncovered evidence of the use of questionable research practices (QRPs) since 1995. Overall, the quality of each journal's research appeared to increase as the years progressed.
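    The p-distribution idea can be sketched simply: collect the p-values reported in a journal volume and bin them finely. The binning scheme below is illustrative only, not the exact procedure of Masicampo & Lalande (2012):

```python
def pvalue_histogram(p_values, width=0.005, upper=0.10):
    """Bin reported p-values into narrow intervals on [0, upper).
    A spike in the bin just below .05, relative to its neighbours,
    is the kind of irregularity read as a sign of QRPs."""
    counts = [0] * int(round(upper / width))
    for p in p_values:
        if 0.0 <= p < upper:
            counts[int(p / width)] += 1
    return counts

# Toy input; a real assessment would scrape every p-value in a volume.
counts = pvalue_histogram([0.012, 0.038, 0.048, 0.049, 0.051])
```

    With real data, an excess of values in the [0.045, 0.050) bin relative to adjacent bins suggests results nudged just under the significance threshold.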

    Trustworthiness of statistical inference

    We examine the role of trustworthiness and trust in statistical inference, arguing that it is the extent of trustworthiness in inferential statistical tools which enables trust in the conclusions. Certain tools, such as the p‐value and significance test, have recently come under renewed criticism, with some arguing that they damage trust in statistics. We argue the contrary, beginning from the position that the central role of these methods is to form the basis for trusted conclusions in the face of uncertainty in the data, and noting that it is the misuse and misunderstanding of these tools which damages trustworthiness and hence trust. We go on to argue that recent calls to ban these tools would tackle the symptom, not the cause, and themselves risk damaging the capability of science to advance, as well as risking feeding into public suspicion of the discipline of statistics. The consequence could be aggravated mistrust of our discipline and of science more generally. In short, the very proposals could work in quite the contrary direction from that intended. We make some alternative proposals for tackling the misuse and misunderstanding of these methods, and for how trust in our discipline might be promoted.

    A Fuzzy Take on the Logical Issues of Statistical Hypothesis Testing

    Statistical Hypothesis Testing (SHT) is a class of inference methods whereby one makes use of empirical data to test a hypothesis and often issue a judgment about whether or not to reject it. In this paper, we focus on the logical aspect of this strategy, which is largely independent of the adopted school of thought, at least within the various frequentist approaches. We identify SHT as taking the form of an unsound argument from Modus Tollens in classical logic, and, in order to rescue SHT from this difficulty, we propose that it can instead be grounded in t-norm based fuzzy logics. We reformulate the frequentists’ SHT logic by making use of a fuzzy extension of Modus Tollens to develop a model of truth valuation for its premises. Importantly, we show that it is possible to preserve the soundness of Modus Tollens by exploring the various conventions involved in constructing fuzzy negations and fuzzy implications (namely, the S and R conventions). We find that under the S convention, it is possible to conduct the Modus Tollens inference argument using Zadeh’s compositional extension and any possible t-norm. Under the R convention we find that this is not necessarily the case, but that by mixing R-implication with S-negation we can salvage the product t-norm, for example. In conclusion, we have shown that fuzzy logic is a legitimate framework to discuss and address the difficulties plaguing frequentist interpretations of SHT.
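    A crude, point-valued version of the soundness question can be checked numerically. The sketch below tests the inequality T(I(a, b), N(b)) ≤ N(a) on a grid; this is only a rough proxy for the paper's compositional treatment over fuzzy sets, and the t-norms and implications chosen are standard textbook ones, not necessarily the exact combinations the paper analyses:

```python
import itertools

# Common t-norms
t_min  = min
t_prod = lambda a, b: a * b
t_luk  = lambda a, b: max(0.0, a + b - 1.0)    # Lukasiewicz

neg = lambda a: 1.0 - a                        # standard fuzzy negation

# Implications: the S convention builds I(a, b) = S(N(a), b) from an
# s-norm; the R convention takes the residuum of a t-norm.
i_kd    = lambda a, b: max(neg(a), b)          # Kleene-Dienes (S)
i_luk   = lambda a, b: min(1.0, 1.0 - a + b)   # Lukasiewicz (S and R)
i_godel = lambda a, b: 1.0 if a <= b else b    # Goedel (R, residuum of min)

def mt_sound(t_norm, impl, steps=21):
    """Grid check of the fuzzy Modus Tollens inequality
    T(I(a, b), N(b)) <= N(a): a crude point-valued proxy for soundness."""
    grid = [i / (steps - 1) for i in range(steps)]
    return all(t_norm(impl(a, b), neg(b)) <= neg(a) + 1e-12
               for a, b in itertools.product(grid, grid))
```

    Under this toy check the Łukasiewicz pair passes while the Gödel residuum with the minimum t-norm does not, loosely echoing the paper's finding that the R convention does not automatically preserve soundness.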

    A Tool-Based View of Theories of Evidence

    Philosophical theories of evidence have been on offer, but they are mostly evaluated in terms of all-or-none desiderata: if a theory fails to meet one of the desiderata, it is not considered satisfactory. In this thesis, I aim to accomplish three missions. Firstly, I construct a new way of evaluating theories of evidence, which I call a tool-based view. Secondly, I analyse the nature of what I will call the various relevance-mediating vehicles that each theory of evidence employs. Thirdly, I articulate the comparative core of evidential reasoning in the historical sciences, one which is overlooked in major theories of evidence. On the first mission, I endorse a meta-thesis of pluralism about theories of evidence, namely a tool-based view. I regard a theory of evidence as a purpose-specific and setting-sensitive tool which has its own strengths, difficulties and limitations. Among the major theories of evidence I have reviewed, I focus on Achinstein’s explanationist theory, Cartwright’s argument theory and Reiss’s inferentialist account, scrutinising and evaluating them against the purposes they set out and the scope of their applications. On the second mission, I note that there is no such thing as intrinsically ‘being evidence’. Rather, I hold that relevance-mediating vehicles configure data, materials or claims in such ways that some of them are labelled evidence. I identify the relevance-mediating vehicles that the theories of evidence employ. On the final mission, I argue that the likelihoodist account is an appropriate tool for explaining evidential reasoning in poorly specified settings where likelihoods can be only imprecisely compared. Such settings, I believe, are typical in the historical sciences. Using the reconstruction of proto-sounds in historical linguistics as a case study, I formalise the rationale behind it by means of the law of likelihood.
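    The law of likelihood, and the imprecise comparisons the thesis emphasises, can be sketched with interval-valued likelihoods; the interval representation here is my own simplification for illustration, not the thesis's formalism:

```python
def law_of_likelihood(lik_h1, lik_h2):
    """Law of likelihood with interval-valued (imprecise) likelihoods:
    the data favour H1 over H2 only when the whole interval for
    P(data | H1) lies above the interval for P(data | H2)."""
    lo1, hi1 = lik_h1
    lo2, hi2 = lik_h2
    if lo1 > hi2:
        return "favours H1"
    if lo2 > hi1:
        return "favours H2"
    return "indeterminate"

# E.g. a reconstructed proto-sound that makes the attested daughter
# forms clearly more probable than a rival reconstruction is favoured;
# overlapping likelihood intervals leave the comparison open.
verdict = law_of_likelihood((0.4, 0.6), (0.1, 0.2))
```

    The third, indeterminate outcome is what distinguishes this imprecise comparison from a point-valued likelihood ratio, which always delivers a verdict.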

    How do researchers evaluate statistical evidence when drawing inferences from data?
