140 research outputs found

    Empirical Analysis of Factors Affecting Confirmation Bias Levels of Software Engineers

    Get PDF
    Confirmation bias is defined as the tendency of people to seek evidence that verifies a hypothesis rather than seeking evidence to falsify it. Due to the confirmation bias, defects may be introduced in a software product during requirements analysis, design, implementation and/or testing phases. For instance, testers may exhibit confirmatory behavior in the form of a tendency to make the code run rather than employing a strategic approach to make it fail. As a result, most of the defects that have been introduced in the earlier phases of software development may be overlooked leading to an increase in software defect density. In this paper, we quantify confirmation bias levels in terms of a single derived metric. However, the main focus of this paper is the analysis of factors affecting confirmation bias levels of software engineers. Identification of these factors can guide project managers to circumvent negative effects of confirmation bias, as well as providing guidance for the recruitment and effective allocation of software engineers. In this empirical study, we observed low confirmation bias levels among participants with logical reasoning and hypothesis testing skills

    The Search for Invariance: Repeated Positive Testing Serves the Goals of Causal Learning

    Get PDF
    Positive testing is characteristic of exploratory behavior, yet it seems to be at odds with the aim of information seeking. After all, repeated demonstrations of one’s current hypothesis often produce the same evidence and fail to distinguish it from potential alternatives. Research on the development of scientific reasoning and adult rule learning have both documented and attempted to explain this behavior. The current chapter reviews this prior work and introduces a novel theoretical account—the Search for Invariance (SI) hypothesis—which suggests that producing multiple positive examples serves the goals of causal learning. This hypothesis draws on the interventionist framework of causal reasoning, which suggests that causal learners are concerned with the invariance of candidate hypotheses. In a probabilistic and interdependent causal world, our primary goal is to determine whether, and in what contexts, our causal hypotheses provide accurate foundations for inference and intervention—not to disconfirm their alternatives. By recognizing the central role of invariance in causal learning, the phenomenon of positive testing may be reinterpreted as a rational information-seeking strategy

    Type I error rates of multi-arm multi-stage clinical trials: strong control and impact of intermediate outcomes

    Get PDF
    BACKGROUND: The multi-arm multi-stage (MAMS) design described by Royston et al. [Stat Med. 2003;22(14):2239-56 and Trials. 2011;12:81] can accelerate treatment evaluation by comparing multiple treatments with a control in a single trial and stopping recruitment to arms not showing sufficient promise during the course of the study. To increase efficiency further, interim assessments can be based on an intermediate outcome (I) that is observed earlier than the definitive outcome (D) of the study. Two measures of type I error rate are often of interest in a MAMS trial. Pairwise type I error rate (PWER) is the probability of recommending an ineffective treatment at the end of the study regardless of other experimental arms in the trial. Familywise type I error rate (FWER) is the probability of recommending at least one ineffective treatment and is often of greater interest in a study with more than one experimental arm. METHODS: We demonstrate how to calculate the PWER and FWER when the I and D outcomes in a MAMS design differ. We explore how each measure varies with respect to the underlying treatment effect on I and show how to control the type I error rate under any scenario. We conclude by applying the methods to estimate the maximum type I error rate of an ongoing MAMS study and show how the design might have looked had it controlled the FWER under any scenario. RESULTS: The PWER and FWER converge to their maximum values as the effectiveness of the experimental arms on I increases. We show that both measures can be controlled under any scenario by setting the pairwise significance level in the final stage of the study to the target level. In an example, controlling the FWER is shown to increase considerably the size of the trial although it remains substantially more efficient than evaluating each new treatment in separate trials. CONCLUSIONS: The proposed methods allow the PWER and FWER to be controlled in various MAMS designs, potentially increasing the uptake of the MAMS design in practice. The methods are also applicable in cases where the I and D outcomes are identical

    Decision-Making in Research Tasks with Sequential Testing

    Get PDF
    Background: In a recent controversial essay, published by JPA Ioannidis in PLoS Medicine, it has been argued that in some research fields, most of the published findings are false. Based on theoretical reasoning it can be shown that small effect sizes, error-prone tests, low priors of the tested hypotheses and biases in the evaluation and publication of research findings increase the fraction of false positives. These findings raise concerns about the reliability of research. However, they are based on a very simple scenario of scientific research, where single tests are used to evaluate independent hypotheses. Methodology/Principal Findings: In this study, we present computer simulations and experimental approaches for analyzing more realistic scenarios. In these scenarios, research tasks are solved sequentially, i.e. subsequent tests can be chosen depending on previous results. We investigate simple sequential testing and scenarios where only a selected subset of results can be published and used for future rounds of test choice. Results from computer simulations indicate that for the tasks analyzed in this study, the fraction of false among the positive findings declines over several rounds of testing if the most informative tests are performed. Our experiments show that human subjects frequently perform the most informative tests, leading to a decline of false positives as expected from the simulations. Conclusions/Significance: For the research tasks studied here, findings tend to become more reliable over time. We also find that the performance in those experimental settings where not all performed tests could be published turned out to be surprisingly inefficient. Our results may help optimize existing procedures used in the practice of scientific research and provide guidance for the development of novel forms of scholarly communication.Engineering and Applied SciencesPsycholog

    Rationality and the experimental study of reasoning

    Get PDF
    A survey of the results obtained during the past three decades in some of the most widely used tasks and paradigms in the experimental study of reasoning is presented. It is shown that, at first sight, human performance suffers from serious shortcomings. However, after the problems of communication between experimenter and subject are taken into account, which leads to clarify the subject's representation of the tasks, one observes a better performance, although still far from perfect. Current theories of reasoning, of which the two most prominent are very briefly outlined, agree in identifying the load in working memory as the main source of limitation in performance. Finally, a recent view on human rationality prompted by the foregoing results is described

    Bayesian inference for the information gain model

    Get PDF
    One of the most popular paradigms to use for studying human reasoning involves the Wason card selection task. In this task, the participant is presented with four cards and a conditional rule (e.g., “If there is an A on one side of the card, there is always a 2 on the other side”). Participants are asked which cards should be turned to verify whether or not the rule holds. In this simple task, participants consistently provide answers that are incorrect according to formal logic. To account for these errors, several models have been proposed, one of the most prominent being the information gain model (Oaksford & Chater, Psychological Review, 101, 608–631, 1994). This model is based on the assumption that people independently select cards based on the expected information gain of turning a particular card. In this article, we present two estimation methods to fit the information gain model: a maximum likelihood procedure (programmed in R) and a Bayesian procedure (programmed in WinBUGS). We compare the two procedures and illustrate the flexibility of the Bayesian hierarchical procedure by applying it to data from a meta-analysis of the Wason task (Oaksford & Chater, Psychological Review, 101, 608–631, 1994). We also show that the goodness of fit of the information gain model can be assessed by inspecting the posterior predictives of the model. These Bayesian procedures make it easy to apply the information gain model to empirical data. Supplemental materials may be downloaded along with this article from www.springerlink.com

    Of Black Swans and Tossed Coins: Is the Description-Experience Gap in Risky Choice Limited to Rare Events?

    Get PDF
    When faced with risky decisions, people tend to be risk averse for gains and risk seeking for losses (the reflection effect). Studies examining this risk-sensitive decision making, however, typically ask people directly what they would do in hypothetical choice scenarios. A recent flurry of studies has shown that when these risky decisions include rare outcomes, people make different choices for explicitly described probabilities than for experienced probabilistic outcomes. Specifically, rare outcomes are overweighted when described and underweighted when experienced. In two experiments, we examined risk-sensitive decision making when the risky option had two equally probable (50%) outcomes. For experience-based decisions, there was a reversal of the reflection effect with greater risk seeking for gains than for losses, as compared to description-based decisions. This fundamental difference in experienced and described choices cannot be explained by the weighting of rare events and suggests a separate subjective utility curve for experience

    A short educational intervention diminishes causal illusions and specific paranormal beliefs in undergraduates

    Get PDF
    Cognitive biases such as causal illusions have been related to paranormal and pseudoscientific beliefs and, thus, pose a real threat to the development of adequate critical thinking abilities. We aimed to reduce causal illusions in undergraduates by means of an educational intervention combining training-in-bias and training-in-rules techniques. First, participants directly experienced situations that tend to induce the Barnum effect and the confirmation bias. Thereafter, these effects were explained and examples of their influence over everyday life were provided. Compared to a control group, participants who received the intervention showed diminished causal illusions in a contingency learning task and a decrease in the precognition dimension of a paranormal belief scale. Overall, results suggest that evidence-based educational interventions like the one presented here could be used to significantly improve critical thinking skills in our students

    Chess databases as a research vehicle in psychology : modeling large data

    Get PDF
    The game of chess has often been used for psychological investigations, particularly in cognitive science. The clear-cut rules and well-defined environment of chess provide a model for investigations of basic cognitive processes, such as perception, memory, and problem solving, while the precise rating system for the measurement of skill has enabled investigations of individual differences and expertise-related effects. In the present study, we focus on another appealing feature of chess—namely, the large archive databases associated with the game. The German national chess database presented in this study represents a fruitful ground for the investigation of multiple longitudinal research questions, since it collects the data of over 130,000 players and spans over 25 years. The German chess database collects the data of all players, including hobby players, and all tournaments played. This results in a rich and complete collection of the skill, age, and activity of the whole population of chess players in Germany. The database therefore complements the commonly used expertise approach in cognitive science by opening up new possibilities for the investigation of multiple factors that underlie expertise and skill acquisition. Since large datasets are not common in psychology, their introduction also raises the question of optimal and efficient statistical analysis. We offer the database for download and illustrate how it can be used by providing concrete examples and a step-by-step tutorial using different statistical analyses on a range of topics, including skill development over the lifetime, birth cohort effects, effects of activity and inactivity on skill, and gender differences
    corecore