
    Human-AI Interaction in the Presence of Ambiguity: From Deliberation-based Labeling to Ambiguity-aware AI

    Ambiguity, the quality of being open to more than one interpretation, permeates our lives. It comes in different forms, including linguistic and visual ambiguity; arises for various reasons; and gives rise to disagreements among human observers that can be hard or impossible to resolve. As artificial intelligence (AI) is increasingly infused into complex domains of human decision making, it is crucial that the underlying AI mechanisms also support a notion of ambiguity. Yet existing AI approaches typically assume that there is a single correct answer for any given input, lacking mechanisms to incorporate diverse human perspectives in various parts of the AI pipeline, including data labeling, model development and user interface design. This dissertation aims to shed light on the question of how humans and AI can be effective partners in the presence of ambiguous problems. To address this question, we begin by studying group deliberation as a tool to detect and analyze ambiguous cases in data labeling. We present three case studies that investigate group deliberation in the context of different labeling tasks, data modalities and types of human labeling expertise. First, we present CrowdDeliberation, an online platform for synchronous group deliberation in novice crowd work, and show how worker deliberation affects resolvability and accuracy in text classification tasks of varying subjectivity. We then translate our findings to the expert domain of medical image classification to demonstrate how imposing additional structure on deliberation arguments can improve the efficiency of the deliberation process without compromising its reliability. Finally, we present CrowdEEG, an online platform for collaborative annotation and deliberation of medical time series data, implementing an asynchronous and highly structured deliberation process. Our findings from an observational study with 36 sleep health professionals help explain how disagreements arise and when they can be resolved through group deliberation. Beyond investigating group deliberation within data labeling, we also demonstrate how the resulting deliberation data can be used to support both human and artificial intelligence. To this end, we first present results from a controlled experiment with ten medical generalists, suggesting that reading deliberation data from medical specialists significantly improves generalists' comprehension and diagnostic accuracy on difficult patient cases. Second, we leverage deliberation data to simulate and investigate AI assistants that not only highlight ambiguous cases, but also explain the underlying sources of ambiguity to end users in human-interpretable terms. We provide evidence suggesting that this form of ambiguity-aware AI can help end users triage and trust AI-provided data classifications. We conclude by outlining the main contributions of this dissertation and directions for future research.
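    The dissertation describes these ambiguity-aware assistants at a system level rather than algorithmically. As a purely illustrative sketch of the general idea of flagging ambiguous cases from multi-annotator labels, one common approach scores each item's label distribution by its entropy; all names and the threshold below are hypothetical, not taken from the dissertation.

        from collections import Counter
        from math import log2

        def label_entropy(labels):
            # Shannon entropy of the empirical label distribution for one item.
            counts = Counter(labels)
            total = len(labels)
            return -sum((c / total) * log2(c / total) for c in counts.values())

        def flag_ambiguous(items, threshold=0.9):
            # Return ids of items whose annotators disagree strongly.
            # `items` maps an item id to its list of annotator labels;
            # the threshold is illustrative only.
            return [item_id for item_id, labels in items.items()
                    if label_entropy(labels) >= threshold]

        # Item "a" is unanimous; item "b" splits 2/2 and gets flagged.
        votes = {"a": ["cat", "cat", "cat", "cat"],
                 "b": ["cat", "dog", "cat", "dog"]}
        print(flag_ambiguous(votes))  # ['b']

    An ambiguity-aware assistant in the dissertation's sense would go further, explaining the sources of such disagreement in human-interpretable terms rather than merely scoring it.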

    Defining and Assessing Critical Thinking: toward an automatic analysis of HiEd students’ written texts

    The main goal of this PhD thesis is to test, through two empirical studies, the reliability of a method aimed at automatically assessing Critical Thinking (CT) manifestations in Higher Education students' written texts. The empirical studies were based on a critical review aimed at proposing a new classification for systematising different CT definitions and their related theoretical approaches. The review also investigates the relationship between the adopted CT definitions and CT assessment methods, and highlights the need to focus on open-ended measures for CT assessment and to develop automatic tools based on Natural Language Processing (NLP) techniques to overcome the current limitations of open-ended measures, such as reliability and scoring costs. Based on a rubric developed and implemented by the Center for Museum Studies – Roma Tre University (CDM) research group for the evaluation and analysis of CT levels within open-ended answers (Poce, 2017), an NLP prototype for the automatic measurement of CT indicators was designed. The first empirical study, carried out with a group of 66 university teachers, showed satisfactory reliability levels for the CT evaluation rubric, while the evaluation carried out by the prototype was not yet sufficiently reliable. The results were used to understand how and under what conditions the model works better. The second empirical investigation aimed to understand which NLP features are most associated with six CT sub-dimensions as assessed by human raters in essays written in Italian. The study used a corpus of 103 pre- and post-test essays by students who attended a Master's Degree module in "Experimental Education and School Assessment". Within the module, two activities were proposed to stimulate students' CT: Open Educational Resources (OER) assessment (mandatory and online) and OER design (optional and blended). The essays were assessed both by expert evaluators, considering six CT sub-dimensions, and by an algorithm that automatically calculates different kinds of NLP features. The study shows positive internal reliability and medium-to-high inter-coder agreement in the expert evaluation. Students' CT levels improved significantly in the post-test. Three NLP indicators significantly correlate with the CT total score: corpus length, syntax complexity, and an adapted measure of term frequency–inverse document frequency (tf-idf). The results collected during this PhD have both theoretical and practical implications for CT research and assessment. From a theoretical perspective, this thesis shows unexplored similarities among different CT traditions, perspectives, and study methods; these similarities could open up an interdisciplinary dialogue among experts and help build a shared understanding of CT. Automatic assessment methods can enhance the use of open-ended measures for CT assessment, especially in online teaching, by helping teachers and researchers deal with the growing amount of linguistic data produced within educational platforms (e.g., Learning Management Systems). To this end, it is pivotal to develop automatic methods for evaluating large amounts of data that would be impossible to analyse manually, providing teachers and evaluators with support for monitoring and assessing the competencies students demonstrate online.
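    The thesis does not publish its feature-extraction code. As a rough illustration of the three kinds of indicators it reports (corpus length, syntax complexity, and a tf-idf-based weight), here is a minimal sketch assuming scikit-learn is available; the mean-sentence-length proxy for syntax complexity and the averaged tf-idf weight are assumptions for illustration, not the thesis's actual measures.

        import re
        from sklearn.feature_extraction.text import TfidfVectorizer

        def essay_features(essays):
            # Compute illustrative per-essay indicators.
            tfidf = TfidfVectorizer()
            matrix = tfidf.fit_transform(essays)  # row i = tf-idf weights of essay i
            features = []
            for i, text in enumerate(essays):
                tokens = text.split()
                sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
                features.append({
                    "corpus_length": len(tokens),  # word count
                    "syntax_complexity": len(tokens) / max(len(sentences), 1),  # proxy: mean sentence length
                    "mean_tfidf": matrix[i].sum() / max(matrix[i].nnz, 1),  # mean weight of distinct terms
                })
            return features

        essays = ["Critical thinking requires evaluating evidence. It also requires weighing alternatives.",
                  "I think this is true."]
        for f in essay_features(essays):
            print(f)

    In the thesis, indicators of this kind were then correlated with the expert raters' CT scores.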

    Measuring Quality Standards, Assessment Practices, and Outcomes/Effectiveness of Competency-Based Education (CBE) Using Mixed Methods Research to Determine CBE's Vitality

    Competency-based education (CBE) has been around since the late 1800s but has recently re-emerged as a revamped pedagogy designed to respond to some of higher education's most pressing issues today: low degree attainment and problems of equity; lack of alignment between education and the job market; low and slow graduation rates; high tuition; and poor academic quality. Despite the promise of CBE to resolve these issues, the approach lacks much empirical support. The researcher summarized current research on CBE and identified three gaps in the literature: why CBE failed in the past (and how the reasons for those failures inform new CBE quality standards), assessment practices (and whether institutions follow these best practices), and reporting on student outcomes, including graduation, race/gender equity, and job placement, compared to traditional programs. These three gaps led to three research questions directed by three theoretical frameworks (Lewin's 3-Stage Theory of Change and Force Field Analysis, Biggs's constructive alignment theory, and Christensen's theory of disruptive innovation), as well as one conceptual framework (phenomenology) to tie the study together. The research questions were addressed using multiple research methods, including a rubric-based assessment, qualitative interviews, and statistical analyses. All the questions in this dissertation related to the overall purpose, which was to evaluate whether CBE will have vitality in American higher education today. Vital success was defined as two or more research questions having positive or successful results; failure was defined as fewer than two. Based on the results of RQ 1, RQ 2, and RQ 3, the competency-based education movement will likely fail again. However, it is hoped that this research will provide valuable information to those working in competency-based education so they may adjust their programs for better chances of vitality.

    A Delphi Study of Effective Practices for Developing Competency-Based Learning Models in Higher Education

    Currently, there is an increase in competency-based education programs at higher education institutions in response to student and employer needs. However, research is lacking on effective practices for developing competencies, assessments, and learning resources for these programs. The purpose of this qualitative Delphi study was to gather expert opinions about effective practices for developing competencies, assessments, and learning resources in competency-based programs in higher education. The conceptual framework was based on principles of andragogy, critical subjectivity, and social constructivism. Ten long-term specialists in developing competency-based programs in higher education served as participants. Data from three rounds of interviews were coded and categorized using Delphi methodology. Participants agreed on 18 principles of effective practice for developing competencies, 15 for developing assessments, and 16 for identifying and leveraging learning resources. Areas of disagreement related to competencies, assessments, and learning resources were also identified, with evidence that the variation in participants' rankings was due to the unique contexts of different higher education programs. This study contributes to positive social change by providing an emerging list of effective practices useful in developing programs that help students graduate sooner with both a degree and a skill set relevant to employers and to the students' future personal satisfaction.
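    The study reports panel agreement and ranking variation narratively. Delphi panels often quantify consensus across rounds with Kendall's coefficient of concordance (W); the sketch below is an illustrative implementation of that standard statistic, not code or data from this study.

        def kendalls_w(rankings):
            # Kendall's W for m raters each ranking the same n items.
            # `rankings` is a list of m lists of ranks 1..n; W ranges from
            # 0 (no agreement) to 1 (perfect agreement).
            m, n = len(rankings), len(rankings[0])
            rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
            mean = sum(rank_sums) / n
            s = sum((rs - mean) ** 2 for rs in rank_sums)
            return 12 * s / (m ** 2 * (n ** 3 - n))

        # Three hypothetical panelists rank four practices; the result
        # (about 0.78) would indicate fairly strong consensus.
        panel = [[1, 2, 3, 4],
                 [1, 3, 2, 4],
                 [2, 1, 3, 4]]
        print(kendalls_w(panel))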

    Best Practices in High Fidelity Patient Simulation to Enhance Higher Order Thinking Skills

    Undergraduate nursing education has begun to use expensive and time-intensive high-fidelity simulation activities without making full use of their ability to build higher-order thinking skills in students. Current research in high-fidelity patient simulation has tended to be subjective and to focus on critical thinking. However, reflective thinking habits of mind must be in place before full use can be made of critical thinking skills. A comprehensive search was undertaken of all literature on reflective thinking used in conjunction with simulated patient experiences by healthcare students. A guideline was created for nurse faculty that outlines current best practices in simulation for maximizing reflective thinking. Though the research on which the guideline was based has been mainly subjective, several analytical studies were found that supported the findings. Policy changes to incorporate reflective thinking and the associated activities were recommended for nursing students and continuing nursing education. Nurse researchers and educators should incorporate reflective thinking exercises into their simulated patient undertakings to maximize higher-order thinking skills.

    Teaching Non-Technological Skills for Successful Building Information Modeling (BIM) Projects

    Implementing Building Information Modeling (BIM) in construction projects has many potential benefits, but project issues can hinder realizing those benefits in practice. Although BIM involves using technology, more than four-fifths of the recurring issues in current BIM-based construction projects relate to people and processes (i.e., the non-technological elements of BIM). Therefore, in addition to the technological skills required for using BIM, educators should also prepare university graduates with the non-technological skills required for managing the people and processes of BIM. This research's objective is to develop a learning module that teaches the non-technological skills needed to address common people- and process-related issues in BIM-based construction projects. To achieve this objective, this research outlines the steps taken to create the learning module and to identify its impact on a BIM course. The contribution of this research lies in understanding the pedagogical value of the developed problem-based learning module and in documenting the module's development process.

    Establishing Content Validity of an Evaluation Rubric for Mobile Technology Applications Utilizing the Delphi Method

    The purpose of this research study was to establish content validity for an evaluation tool designed to measure the quality of mobile technology applications (Apps) for use in education settings. The rubric evaluation tool was developed by the researcher based on a review of the literature and consultation with recognized experts in the use of mobile technologies in education. This Delphi study was conducted in collaboration with over 90 Subject Matter Experts (SMEs) from around the world, who provided feedback electronically on the domains and score descriptors that comprise the tool developed for this investigation, the Evaluation Rubric for Mobile Apps. The findings established strong content validity for the Evaluation Rubric for Mobile Apps. Data from participants were used to refine the domains and score descriptors, resulting in an empirically validated, robust evaluation tool for educators to employ in their decision-making processes related to the use of mobile technology Apps in education settings. At the school and district level, this rubric has implications for ensuring that the limited funds available for technology purchases are used in the most effective and efficient manner. On a broader scale, researchers examining technology Apps in schools can employ the rubric in empirical studies to examine the impact of using high-quality Apps on teaching and learning.

    Simulation in medical education : a case study evaluating the efficacy of high-fidelity patient simulation

    High-fidelity patient simulation (HFPS) recreates clinical scenarios by combining mock patients and realistic environments to give learners the practical experience needed to meet the demands of modern clinical practice while ensuring patient safety. This research investigated the efficacy of HFPS in medical education through a case study of the Indiana University Bloomington Interprofessional Simulation Center. The goal was to understand the role of simulated learning in attaining clinical self-efficacy and how HFPS training impacts performance. Three research questions were addressed using a mixed methods study design. Clinical competence and self-efficacy were quantified among medical students at IUSM Bloomington, which uses HFPS, and compared with two IUSM campuses that did not incorporate this instructional intervention. Clinical competence was measured as performance on the Objective Structured Clinical Examination (OSCE), while self-efficacy was measured through a validated questionnaire. Although the effect of HFPS on the quantitative results was not definitive, general trends point to the ability of HFPS to recalibrate learners' perceived and actual performance. Additionally, perceptual data regarding HFPS from both medical students and medical residents were analyzed. Qualitative results revealed the utility of HFPS for acquiring the clinical mental framework of a physician and fundamental psychomotor skills, and for practicing communication and functioning as a healthcare team during interprofessional education simulations. Continued studies of HFPS are necessary to fully elucidate the value of this instructional adjunct; however, this study found positive outcomes of simulated learning for both medical students and medical residents, contributing to the existing HFPS literature.

    Identifying Issues for the Bright ICT Initiative: A Worldwide Delphi Study of IS Journal Editors and Scholars

    Information and communication technology (ICT) continues to change business as we know it. As ICT further integrates into our daily lives, it creates more opportunities to both help and hinder fundamental social problems throughout the world. In response to these growing and urgent societal needs, the Association for Information Systems approved the Bright ICT Initiative to extend IS research beyond a focus on business to take on the broader challenges of an ICT-enabled bright society. We conducted a Delphi study to provide guidance on where bright ICT-minded researchers might focus to produce their greatest impact, and we report our findings in this paper. The Delphi panel comprised 182 globally distributed IS journal editors who participated in a three-round consensus-building process via the Internet. Our results provide a framework of eleven research priority areas and specific research topics for those engaged in future-oriented, socially conscious IS research.

    Assessment of General Practitioners' Performance in Daily Practice

    The EURACT Performance Agenda (EUPA) of the European Academy of Teachers in General Practice/Family Medicine (EURACT) is the third in a series of papers, following the European Definition of General Practice/Family Medicine (WONCA Europe) in 2002, which identified 6 core competencies and 11 abilities every general practitioner (GP) should master, and the EURACT Educational Agenda in 2005, which provided a framework for teaching the core competencies by setting learning aims and monitoring their achievement. Performance (in contrast to competence) is understood as the level of actual performance in clinical care and communication with patients in daily practice. Small groups of EURACT Council members from 40 European countries have discussed and developed EUPA since 2007. EUPA is a general, uniform and basic agenda of the performance elements every GP masters in daily practice, applicable and adaptable to countries with different health systems. It deals with the process and results of actual work in daily practice, not with a teaching/learning situation. EUPA discusses in depth the psychometrics and edumetrics of performance assessment. Case vignettes of abilities in GPs' daily practice illustrate performance and its assessment in every chapter. Examples of common assessment tools are workplace-based assessment by a peer, feedback from patients or staff, and audit of medical records. EUPA can help shape various performance assessment activities held locally in general practice/family medicine, e.g. in continuing professional development cycles, re-certification/re-accreditation/licensing procedures, peer hospitation programmes, and practice audit programmes in quality management. It can give orientation for self-assessment by reflective practitioners in their continuing professional development. EUPA encourages general practitioners to initiate performance agendas adapted to their national health systems to further strengthen the role of general practice/family medicine in their country.