
    Human-AI Interaction in the Presence of Ambiguity: From Deliberation-based Labeling to Ambiguity-aware AI

    Ambiguity, the quality of being open to more than one interpretation, permeates our lives. It takes different forms, including linguistic and visual ambiguity, arises for various reasons, and gives rise to disagreements among human observers that can be hard or impossible to resolve. As artificial intelligence (AI) is increasingly infused into complex domains of human decision making, it is crucial that the underlying AI mechanisms also support a notion of ambiguity. Yet existing AI approaches typically assume that there is a single correct answer for any given input and lack mechanisms to incorporate diverse human perspectives in various parts of the AI pipeline, including data labeling, model development and user interface design. This dissertation aims to shed light on how humans and AI can be effective partners when problems are ambiguous. To address this question, we begin by studying group deliberation as a tool to detect and analyze ambiguous cases in data labeling. We present three case studies that investigate group deliberation in the context of different labeling tasks, data modalities and types of human labeling expertise. First, we present CrowdDeliberation, an online platform for synchronous group deliberation in novice crowd work, and show how worker deliberation affects resolvability and accuracy in text classification tasks of varying subjectivity. We then translate our findings to the expert domain of medical image classification to demonstrate how imposing additional structure on deliberation arguments can improve the efficiency of the deliberation process without compromising its reliability. Finally, we present CrowdEEG, an online platform for collaborative annotation and deliberation of medical time series data, implementing an asynchronous and highly structured deliberation process.
Our findings from an observational study with 36 sleep health professionals help explain how disagreements arise and when they can be resolved through group deliberation. Beyond investigating group deliberation within data labeling, we also demonstrate how the resulting deliberation data can be used to support both human and artificial intelligence. To this end, we first present results from a controlled experiment with ten medical generalists, suggesting that reading deliberation data from medical specialists significantly improves generalists' comprehension and diagnostic accuracy on difficult patient cases. Second, we leverage deliberation data to simulate and investigate AI assistants that not only highlight ambiguous cases but also explain the underlying sources of ambiguity to end users in human-interpretable terms. We provide evidence suggesting that this form of ambiguity-aware AI can help end users triage and trust AI-provided data classifications. We conclude by outlining the main contributions of this dissertation and directions for future research.

    Avoiding the Common Wisdom Fallacy: The Role of Social Sciences in Constitutional Adjudication

    More than one hundred years ago, the U.S. Supreme Court started to refer to social science evidence in its judgments. However, this practice has not resonated with many constitutional courts outside the United States, in particular in continental Europe. This contribution has a twofold aim. First, it tries to show that legal reasoning in constitutional law is often based on empirical assumptions, so there is a strong need for the use of social sciences. However, constitutional courts often lack the necessary expertise to deal with empirical questions. Second, I will therefore discuss three potential strategies for making use of social science evidence: judges can interpret social facts on their own, they can afford a margin of appreciation to the legislator, or they can defer the question to social science experts. It will be argued that none of these strategies is satisfactory on its own, so courts will have to employ a combination of them. To illustrate the argument, I will discuss decisions from different jurisdictions, including the United States, Canada, Germany and South Africa.
    Keywords: proportionality, comparative law, Germany, uncertainty, margin of appreciation, constitutional law, Canada, South Africa, social sciences, empiricism

    Doctor of Philosophy

    Manual annotation of clinical texts is often used to generate reference standards that provide data for training and evaluating Natural Language Processing (NLP) systems. Manually annotating clinical texts is time-consuming, expensive, and requires considerable cognitive effort on the part of human reviewers. Furthermore, reference standards must be generated in ways that produce consistent and reliable data, and they must also be valid in order to adequately evaluate the performance of those systems. The amount of labeled data needed varies with the level of analysis, the complexity of the clinical use case, and the methods that will be used to develop automated systems for information extraction and classification. Methods that could reduce cost and manual workload, introduce task efficiencies, and reduce the amount of labeled data needed to train NLP tools for specific clinical use cases are an active area of research in the clinical NLP domain. This dissertation takes a mixed-methods approach, combining methodologies from cognitive science and artificial intelligence with manual annotation of clinical texts. Aim 1 of this dissertation identifies factors that affect manual annotation of clinical texts. These factors are further explored by evaluating approaches that may introduce efficiencies into manual review tasks in two NLP development areas: semantic annotation of clinical concepts, and identification of Protected Health Information (PHI) as defined by HIPAA. Both experiments integrate different priming mechanisms using noninteractive and machine-assisted methods. The main hypothesis of this research is that integrating pre-annotation or other machine-assisted methods into manual annotation workflows will improve the efficiency of manual annotation tasks without diminishing the quality of the generated reference standards.

    Reliability of causality assessment for drug, herbal and dietary supplement hepatotoxicity in the Drug‐Induced Liver Injury Network (DILIN)

    Background & Aims: Because there are no objective tests to diagnose drug‐induced liver injury (DILI), causality assessment is a matter of debate. Expert opinion is often used in research and industry, but its test–retest reliability is unknown. We aimed to determine the test–retest reliability of the expert opinion process used by the Drug‐Induced Liver Injury Network (DILIN). Methods: Three DILIN hepatologists adjudicate each suspected hepatotoxicity case to one of five categories representing levels of likelihood of DILI. Adjudication is based on retrospective assessment of gathered case data that include prospective follow‐up information. One hundred randomly selected DILIN cases were re‐assessed using the same process as the initial assessment, but by three different reviewers in 92% of cases. Results: The median time between assessments was 938 days (range 140–2352). Thirty‐one cases involved more than one agent. Weighted kappa statistics for overall case and individual agent category agreement were 0.60 (95% CI: 0.50–0.71) and 0.60 (0.52–0.68) respectively. Overall case adjudications were within one category of each other 93% of the time, while 5% differed by two categories and 2% differed by three categories. Fourteen per cent crossed the 50% likelihood threshold owing to competing diagnoses or atypical timing between drug exposure and injury. Conclusions: The DILIN expert opinion causality assessment method has moderate interobserver reliability but very good agreement within one category. A small but important proportion of cases could not be reliably diagnosed as ≥50% likely to be DILI.
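
The weighted kappa reported above credits partial agreement on the ordered five-category likelihood scale, so a one-category disagreement costs less than a three-category one. A minimal sketch of weighted Cohen's kappa for two sets of adjudications follows, together with the "within one category" agreement rate. It assumes categories are coded 0 to 4 and uses linear disagreement weights by default; the abstract does not state which weighting scheme DILIN used or how three-reviewer panels were reduced to one rating, and the helper names `weighted_kappa` and `within_one_category` are hypothetical.

```python
from collections import Counter
from typing import Sequence


def weighted_kappa(ratings_a: Sequence[int], ratings_b: Sequence[int],
                   n_categories: int, weighting: str = "linear") -> float:
    """Weighted Cohen's kappa for two raters on an ordinal scale coded 0..n_categories-1."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")
    n, k = len(ratings_a), n_categories

    def disagreement_weight(i: int, j: int) -> float:
        # 0 on the diagonal, 1 for the two most distant categories.
        d = abs(i - j) / (k - 1)
        return d if weighting == "linear" else d * d

    joint = Counter(zip(ratings_a, ratings_b))  # observed (a, b) pair counts
    marg_a, marg_b = Counter(ratings_a), Counter(ratings_b)
    observed = sum(disagreement_weight(i, j) * joint[(i, j)] / n
                   for i in range(k) for j in range(k))
    # Disagreement expected by chance if the raters' marginals were independent.
    expected = sum(disagreement_weight(i, j) * (marg_a[i] / n) * (marg_b[j] / n)
                   for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when chance disagreement is zero")
    return 1.0 - observed / expected


def within_one_category(ratings_a: Sequence[int], ratings_b: Sequence[int]) -> float:
    """Share of cases whose two ratings differ by at most one category."""
    return sum(abs(a - b) <= 1 for a, b in zip(ratings_a, ratings_b)) / len(ratings_a)


if __name__ == "__main__":
    # Hypothetical initial vs. repeat adjudications, not DILIN data.
    first = [0, 1, 1, 2, 3, 4, 2, 1]
    second = [0, 1, 2, 2, 3, 3, 4, 1]
    print(weighted_kappa(first, second, 5), within_one_category(first, second))
```

Linear weights treat each step on the scale as equally costly; switching `weighting` to "quadratic" penalizes large disagreements more heavily, which typically yields a different kappa for the same data.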

    The Parliament of the Experts

    In the administrative state, how should expert opinions be aggregated and used? If a panel of experts is unanimous on a question of fact, causation, or prediction, can an administrative agency rationally disagree, and on what grounds? If experts are split into a majority view and a minority view, must the agency follow the majority? Should reviewing courts limit agency discretion to select among the conflicting views of experts, or to depart from expert consensus? I argue that voting by expert panels is likely, on average, to be epistemically superior to the substantive judgment of agency heads in determining questions of fact, causation, or prediction. Nose counting of expert panels should generally be an acceptable basis for decision under the arbitrary and capricious or substantial evidence tests. Moreover, agencies should be obliged to follow the (super)majority view of an expert panel, even if the agency's own judgment is to the contrary, unless the agency can give an epistemically valid second-order reason for rejecting the panel majority's view.
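
The abstract does not spell out why a panel vote should outperform a single decision maker. One standard formalization, used here only as an illustrative assumption and not as the author's stated model, is Condorcet's jury theorem: if each expert independently answers a binary factual question correctly with the same probability p > 0.5, the probability that a simple majority is correct rises with panel size. The sketch below computes that probability as a binomial tail sum; the function name `majority_correct_probability` is hypothetical, and the model ignores supermajority rules and correlated errors.

```python
from math import comb


def majority_correct_probability(n_experts: int, p_correct: float) -> float:
    """Probability that a simple majority of independent experts is correct.

    Assumes (as in Condorcet's jury theorem) a binary question and that every
    expert is independently correct with the same probability p_correct.
    """
    if n_experts < 1 or n_experts % 2 == 0:
        raise ValueError("use an odd panel size so there are no ties")
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("p_correct must be a probability")
    needed = n_experts // 2 + 1  # smallest strict majority
    # Sum the binomial probabilities of `needed` or more correct votes.
    return sum(
        comb(n_experts, k) * p_correct**k * (1 - p_correct) ** (n_experts - k)
        for k in range(needed, n_experts + 1)
    )


if __name__ == "__main__":
    # Hypothetical competence level, not an empirical estimate.
    for size in (1, 3, 7, 15):
        print(size, round(majority_correct_probability(size, 0.6), 3))
```

The result is only as strong as its assumptions: with p below 0.5 the same formula shows the majority becoming less reliable as the panel grows, and correlated expert errors weaken the gain from adding members.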

    Instructional Designers as Reflective Practitioners: Developing Professional Identity through Reflection

    As the design thinking approach becomes more established in the instructional design (ID) discourse, the field will have to reconsider the professional identity of instructional designers. Rather than having designers passively follow models or processes, a professional identity rooted in design thinking calls for instructional designers to be dynamic agents of change who use reflective thinking to navigate the design space and develop solutions to ill-structured problems. Graduate programs in ID will also need to prepare students to manage the complexities they will encounter in professional practice, including the establishment of design precedents, reflective thinking skills, and the foundations of professional identity. This research explored the use of reflective writing assignments in an introductory ID graduate course. The results indicate that most students are able to engage in meaningful reflection in response to prompts concerning design concepts, experiences, and identity attributes, although no clear patterns of improvement emerged over time. Future research directions include the use of feedback and the structure of prompts (including the frequency of writing assignments and the wording of prompts) to support improved student performance.

    Improving outcome assessment for clinical trials in stroke

    Clinical trials are at the centre of advances in our understanding of stroke and its optimal treatment. In this thesis, the uses and properties of outcome assessment scales for stroke trials are described, with particular attention given to the modified Rankin Scale (mRS). Through a comprehensive literature review, I show that the mRS is the most frequently used functional outcome scale in clinical trials, but that its efficacy is potentially limited by inter-observer variability. Using a "mock" clinical trial design, I demonstrate that inter-observer mRS variability in contemporary practice is moderate (κ=0.57). Adding these data to a systematic review of published data confirms moderate inter-observer variability overall across ten trials (κ=0.46). Different strategies to improve mRS reliability are then described. I outline the development of a bespoke training package; international training scores across 2942 raters again confirm suboptimal reliability (κ=0.67). A pilot trial using endpoint committee review of video-recorded interviews demonstrates the feasibility of this approach. Attempts to improve reliability by deriving the mRS from data recorded in patients' hospital records are not successful (κ=0.34). In the final chapters, I present a novel methodology for describing stroke outcomes: "home-time". This measure shows good agreement with the mRS, except at the extremes of disability. Finally, to put the mRS in historical context, the career of John Rankin and the development of his eponymous scale are recounted.
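
The κ values above summarize chance-corrected agreement between observers who grade the same patients on the mRS (grades 0 to 6). A minimal sketch of unweighted Cohen's kappa for two raters follows: observed agreement minus the agreement expected from each rater's grade frequencies, scaled by the maximum possible improvement over chance. The abstract does not say whether the reported values are weighted or unweighted, or how data from more than two raters were pooled, so this is illustrative only, and `cohens_kappa` is a hypothetical helper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters grading the same patients."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of patients given the same mRS grade.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal grade frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[g] * counts_b[g] for g in counts_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical grade")
    return (p_observed - p_expected) / (1.0 - p_expected)


if __name__ == "__main__":
    # Hypothetical mRS grades from two observers, not trial data.
    observer_1 = [0, 1, 2, 2, 3, 4, 5, 6, 3, 1]
    observer_2 = [0, 2, 2, 3, 3, 4, 5, 6, 2, 1]
    print(cohens_kappa(observer_1, observer_2))
```

Because unweighted kappa counts a 0-versus-1 disagreement the same as a 0-versus-6 one, studies of ordinal scales like the mRS sometimes report a weighted variant as well.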