
    Theoretical, Measured and Subjective Responsibility in Aided Decision Making

    When humans interact with intelligent systems, their causal responsibility for outcomes becomes equivocal. We analyze the descriptive abilities of a newly developed responsibility quantification model (ResQu) to predict actual human responsibility and perceptions of responsibility in interactions with intelligent systems. In two laboratory experiments, participants performed a classification task, aided by classification systems with different capabilities. We compared the predicted theoretical responsibility values to the actual, measured responsibility participants took on and to their subjective rankings of responsibility. The model predictions were strongly correlated with both measured and subjective responsibility. A bias existed only when participants with poor classification capabilities relied less than optimally on a system with superior classification capabilities, assuming higher-than-optimal responsibility for themselves. The study implies that when humans interact with advanced intelligent systems whose capabilities greatly exceed their own, their comparative causal responsibility will be small, even if the human is formally assigned a major role. Simply putting a human into the loop does not assure that the human will meaningfully contribute to the outcomes. The results demonstrate the descriptive value of the ResQu model for predicting behavior and perceptions of responsibility by considering the characteristics of the human, the intelligent system, the environment, and some systematic behavioral biases. The ResQu model is a new quantitative method that can be used in system design and can guide policy and legal decisions regarding human responsibility in events involving intelligent systems.
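
    As a rough illustration of the claim that comparative causal responsibility shrinks once the system is the more capable agent, the Python sketch below simulates a binary classification task. The "responsibility proxy" (the share of trials in which the human's own judgment produces a final decision that departs from the system's recommendation) is an illustrative stand-in of ours, not the ResQu measure itself, and the accuracies are invented.

    import random

    def responsibility_proxy(p_human, p_system, trials=100_000, seed=1):
        """Share of trials where the human's judgment changes the outcome.

        Illustrative stand-in for 'measured responsibility' (NOT the ResQu
        formula). The decision maker rationally relies on whichever agent
        is more accurate, so the human's input matters only when the human
        is the more capable classifier.
        """
        rng = random.Random(seed)
        changed = 0
        for _ in range(trials):
            truth = rng.random() < 0.5
            sys_says = truth if rng.random() < p_system else not truth
            hum_says = truth if rng.random() < p_human else not truth
            final = hum_says if p_human > p_system else sys_says
            changed += final != sys_says
        return changed / trials

    # A 70%-accurate human paired with increasingly capable systems; with a
    # binary task the handoff is abrupt, but the qualitative point stands:
    # once the system is superior, the human no longer shapes outcomes.
    for p_sys in (0.6, 0.8, 0.95):
        print(p_sys, responsibility_proxy(p_human=0.70, p_system=p_sys))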

    Unbounded Human Learning: Optimal Scheduling for Spaced Repetition

    In the study of human learning, there is broad evidence that our ability to retain information improves with repeated exposure and decays with delay since last exposure. This plays a crucial role in the design of educational software, leading to a trade-off between teaching new material and reviewing what has already been taught. A common way to balance this trade-off is spaced repetition, which uses periodic review of content to improve long-term retention. Though spaced repetition is widely used in practice, e.g., in electronic flashcard software, there is little formal understanding of the design of these systems. Our paper addresses this gap in three ways. First, we mine log data from spaced repetition software to establish the functional dependence of retention on reinforcement and delay. Second, we use this memory model to develop a stochastic model for spaced repetition systems. We propose a queueing network model of the Leitner system for reviewing flashcards, along with a heuristic approximation that admits a tractable optimization problem for review scheduling. Finally, we empirically evaluate our queueing model through a Mechanical Turk experiment, verifying a key qualitative prediction of our model: the existence of a sharp phase transition in learning outcomes upon increasing the rate of new item introductions.
    Comment: Accepted to the ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), 2016
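
    To make the Leitner mechanics concrete, here is a minimal simulation sketch under assumptions of our own (an exponential forgetting curve whose memory strength doubles per deck, and review rates inversely proportional to deck number); the paper's actual queueing model and scheduling optimization are more elaborate.

    import math
    import random

    def simulate_leitner(n_items=50, n_decks=5, steps=2000, seed=0):
        """Toy Leitner-system simulation (illustrative assumptions only).

        Recall follows an assumed exponential forgetting curve
        P(recall) = exp(-delay / strength), with strength doubling per
        deck; lower decks are sampled for review more often, as in a
        physical Leitner box.
        """
        rng = random.Random(seed)
        deck = {i: 1 for i in range(n_items)}       # item -> deck (1 = least learned)
        last_seen = {i: 0 for i in range(n_items)}
        for t in range(1, steps + 1):
            # Review probability inversely proportional to deck number.
            item = rng.choices(list(deck), weights=[1 / deck[i] for i in deck])[0]
            strength = 2.0 ** deck[item]
            recalled = rng.random() < math.exp(-(t - last_seen[item]) / strength)
            deck[item] = min(deck[item] + 1, n_decks) if recalled else 1  # promote or demote
            last_seen[item] = t
        return sum(d == n_decks for d in deck.values()) / n_items

    print(f"fraction in the top deck: {simulate_leitner():.2f}")

    Raising n_items while holding steps fixed mimics a higher introduction rate; past some point, items churn back to deck 1 faster than they can be promoted, which has the flavor of the phase transition the abstract describes.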

    How much control is enough? Optimizing fun with unreliable input

    Brain-computer interfaces (BCI) provide a valuable new input modality within human-computer interaction systems, but like other body-based inputs, the system's recognition of input commands is far from perfect. This raises important questions, such as: What level of control should such an interface be able to provide? What is the relationship between actual and perceived control? And in the case of applications for entertainment, in which fun is an important part of the user experience, should we even aim for perfect control, or is the optimum elsewhere? In this experiment, the user plays a simple game in which a hamster has to be guided to the exit of a maze, while the amount of control the user has over the hamster is varied. Varying control through confusion matrices makes it possible to simulate the experience of using a BCI while using the traditional keyboard for input. After each session, the user filled out a short questionnaire on fun and perceived control. Analysis of the data showed that the perceived control of the user could largely be explained by the amount of control in the respective session. As expected, user frustration decreases with increasing control. Moreover, the results indicate that the relation between fun and control is not linear. Although fun initially increases with improved control, the level of fun drops again just before perfect control is reached. This offers new insights for developers wanting to incorporate some form of BCI in their games: to create a fun game, unreliable input can be used to pose a challenge for the user.
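
    The confusion-matrix manipulation can be sketched in a few lines. Everything here (four movement commands, a uniform off-diagonal error distribution) is an assumption for illustration, not the study's actual matrices.

    import random

    COMMANDS = ["up", "down", "left", "right"]

    def noisy_command(intended, control, rng):
        """Pass the intended command through with probability `control`;
        otherwise substitute a uniformly random other command, i.e. a
        confusion matrix with `control` on the diagonal."""
        if rng.random() < control:
            return intended
        return rng.choice([c for c in COMMANDS if c != intended])

    rng = random.Random(42)
    for level in (0.6, 0.8, 1.0):
        hits = sum(noisy_command("up", level, rng) == "up" for _ in range(10_000))
        print(f"control={level}: {hits / 10_000:.2f} of commands executed as intended")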

    Estimating the reliability of composite scores

    In situations where multiple tests are administered (such as GCSE subjects), scores from individual tests are frequently combined to produce a composite score. As part of the Ofqual reliability programme, this study, through a review of the literature, attempts to: look at the different approaches that are employed to form composite scores from component or unit scores; investigate the implications of the different approaches for the psychometric properties, particularly the reliability, of the composite scores; and identify procedures that are commonly used to estimate the reliability of composite scores.
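
    One standard procedure from this literature is Mosier's (1943) formula for the reliability of a weighted composite. The sketch below implements it; the weights, standard deviations, reliabilities, and correlations are all invented for illustration.

    import numpy as np

    def composite_reliability(weights, sds, reliabilities, corr):
        """Mosier's formula: 1 - (sum of weighted component error
        variances) / (variance of the weighted composite).

        Illustrative sketch; real GCSE components would need
        empirically estimated inputs.
        """
        w = np.asarray(weights, dtype=float)
        s = np.asarray(sds, dtype=float)
        r = np.asarray(reliabilities, dtype=float)
        cov = np.asarray(corr) * np.outer(s, s)      # component covariance matrix
        var_composite = w @ cov @ w                  # variance of the weighted sum
        error_var = np.sum(w**2 * s**2 * (1.0 - r))  # summed weighted error variances
        return 1.0 - error_var / var_composite

    # Two components, equal weights, correlated 0.7 -> about 0.89:
    print(composite_reliability([0.5, 0.5], [10, 12], [0.85, 0.80],
                                [[1.0, 0.7], [0.7, 1.0]]))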

    Informants in Organizational Marketing Research

    Organizational research frequently involves seeking judgmental data from multiple informants within organizations. Researchers are often faced with determining how many informants to survey, who those informants should be, and (if more than one) how best to aggregate responses when informants disagree. Using both recall and forecasting data from a laboratory study involving the MARKSTRAT simulation, we show that when multiple respondents disagree, responses aggregated using confidence-based or competence-based weights outperform those with data-based weights, which in turn provide significant gains in estimation accuracy over simply averaging respondent reports. We then illustrate how these results can be used to determine the best number of respondents for a market research task, as well as to provide an effective screening mechanism when seeking a single, best informant.
    Keywords: screening; marketing research; aggregation; organizational research; survey research
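
    A hedged sketch of the aggregation step the abstract compares: confidence-weighted averaging versus a simple mean. The reports and confidence values below are hypothetical, and the paper's competence-based and data-based weighting schemes are not reproduced here.

    import numpy as np

    def aggregate(reports, confidences=None):
        """Confidence-weighted aggregation of informant reports.

        Illustrative only: weights are self-reported confidences
        normalized to sum to one; with no confidences this reduces
        to a simple average.
        """
        reports = np.asarray(reports, dtype=float)
        if confidences is None:
            return float(reports.mean())
        w = np.asarray(confidences, dtype=float)
        w = w / w.sum()
        return float(w @ reports)

    # Three informants estimate a market share (in percent):
    print(aggregate([22, 31, 35]))                  # simple average -> 29.3
    print(aggregate([22, 31, 35], [0.2, 0.9, 0.8])) # weights favor confident informants -> 31.7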