188 research outputs found
Human-AI Interaction in the Presence of Ambiguity: From Deliberation-based Labeling to Ambiguity-aware AI
Ambiguity, the quality of being open to more than one interpretation, permeates our lives. It comes in different forms including linguistic and visual ambiguity, arises for various reasons and gives rise to disagreements among human observers that can be hard or impossible to resolve. As artificial intelligence (AI) is increasingly infused into complex domains of human decision making it is crucial that the underlying AI mechanisms also support a notion of ambiguity. Yet, existing AI approaches typically assume that there is a single correct answer for any given input, lacking mechanisms to incorporate diverse human perspectives in various parts of the AI pipeline, including data labeling, model development and user interface design.
This dissertation aims to shed light on the question of how humans and AI can be effective partners in the presence of ambiguous problems. To address this question, we begin by studying group deliberation as a tool to detect and analyze ambiguous cases in data labeling. We present three case studies that investigate group deliberation in the context of different labeling tasks, data modalities and types of human labeling expertise.
First, we present CrowdDeliberation, an online platform for synchronous group deliberation in novice crowd work, and show how worker deliberation affects resolvability and accuracy in text classification tasks of varying subjectivity. We then translate our findings to the expert domain of medical image classification to demonstrate how imposing additional structure on deliberation arguments can improve the efficiency of the deliberation process without compromising its reliability. Finally, we present CrowdEEG, an online platform for collaborative annotation and deliberation of medical time series data, implementing an asynchronous and highly structured deliberation process. Our findings from an observational study with 36 sleep health professionals help explain how disagreements arise and when they can be resolved through group deliberation.
Beyond investigating group deliberation within data labeling, we also demonstrate how the resulting deliberation data can be used to support both human and artificial intelligence. To this end, we first present results from a controlled experiment with ten medical generalists, suggesting that reading deliberation data from medical specialists significantly improves generalists' comprehension and diagnostic accuracy on difficult patient cases. Second, we leverage deliberation data to simulate and investigate AI assistants that not only highlight ambiguous cases, but also explain the underlying sources of ambiguity to end users in human-interpretable terms. We provide evidence suggesting that this form of ambiguity-aware AI can help end users to triage and trust AI-provided data classifications.
We conclude by outlining the main contributions of this dissertation and directions for future research
Defining and Assessing Critical Thinking: toward an automatic analysis of HiEd students’ written texts
The main goal of this PhD thesis is to test, through two empirical studies, the reliability of a method
aimed at automatically assessing Critical Thinking (CT) manifestations in Higher Education students’
written texts. The empirical studies were based on a critical review aimed at proposing a new
classification for systematising different CT definitions and their related theoretical approaches. The
review also investigates the relationship between the different adopted CT definitions and CT
assessment methods. The review highlights the need to focus on open-ended measures for CT
assessment and to develop automatic tools based on Natural Language Processing (NLP) techniques
to overcome current limitations of open-ended measures, such as reliability and costs. Based on a
rubric developed and implemented by the Center for Museum Studies – Roma Tre University (CDM)
research group for the evaluation and analysis of CT levels within open-ended answers (Poce, 2017),
an NLP prototype for the automatic measurement of CT indicators was designed. The first empirical
study was carried out on a group of 66 university teachers. The study showed satisfactory reliability
levels of the CT evaluation rubric, while the evaluation carried out by the prototype was not yet
sufficiently reliable. The results were used to understand how and under what conditions the model
works better. The second empirical investigation was aimed at understanding which NLP features are
more associated with six CT sub-dimensions as assessed by human raters in essays written in the
Italian language. The study used a corpus of 103 pre-post essays written by university students who attended a
Master's Degree module in “Experimental Education and School Assessment”, with the aim of assessing students' CT levels.
Within the module, we proposed two activities to stimulate students' CT: Open Educational
Resources (OERs) assessment (mandatory and online) and OERs design (optional and blended). The
essays were assessed both by expert evaluators, considering six CT sub-dimensions, and by an
algorithm that automatically calculates different kinds of NLP features. The study shows a positive
internal reliability and a medium to high inter-coder agreement in expert evaluation. Students' CT
levels improved significantly in the post-test. Three NLP indicators significantly correlate with CT
total score: the Corpus Length, the Syntax Complexity, and an adapted measure of Term Frequency-
Inverse Document Frequency. The results collected during this PhD have both theoretical and
practical implications for CT research and assessment. From a theoretical perspective, this thesis
shows unexplored similarities among different CT traditions, perspectives, and study methods. These
similarities could be exploited to open up an interdisciplinary dialogue among experts and build up a
shared understanding of CT. Automatic assessment methods can enhance the use of open-ended
measures for CT assessment, especially in online teaching. Indeed, they can support teachers and
researchers to deal with the growing presence of linguistic data produced within educational
platforms. To this end, it is pivotal to develop automatic methods for the evaluation of large amounts
of data which would be impossible to analyse manually, providing teachers and evaluators with support for monitoring and assessing the competencies students demonstrate online
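The three NLP indicators reported above (Corpus Length, Syntax Complexity, and a tf-idf-based measure) can be sketched in a short, illustrative computation. This is a minimal sketch under stated assumptions: the function names, the mean-sentence-length proxy for syntax complexity, and the standard tf(t,d)·log(N/df(t)) weighting are illustrative choices, not the thesis's actual prototype.

```python
# Illustrative sketch of three text features of the kind the study reports
# correlating with Critical Thinking scores. All names and formulas here are
# assumptions for illustration, not the thesis's implementation.
import math
from collections import Counter

def corpus_length(text):
    """Token count of an essay (a simple proxy for 'Corpus Length')."""
    return len(text.split())

def syntax_complexity(text):
    """Mean sentence length in tokens (a crude 'Syntax Complexity' proxy)."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    return sum(len(s.split()) for s in sentences) / len(sentences)

def tfidf(essays):
    """Classic tf-idf weights per essay: tf(t, d) * log(N / df(t))."""
    docs = [Counter(e.lower().split()) for e in essays]
    n = len(docs)
    df = Counter()                      # document frequency of each term
    for d in docs:
        df.update(d.keys())
    return [
        {t: (count / sum(d.values())) * math.log(n / df[t])
         for t, count in d.items()}
        for d in docs
    ]
```

A term occurring in every essay receives weight 0 (its idf is log(1)), so the measure rewards vocabulary that distinguishes one essay from the rest of the corpus, which is one plausible reason such a weight could track lexical richness in CT-scored essays.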
Measuring Quality Standards, Assessment Practices, and Outcomes/Effectiveness of Competency-Based Education (CBE) Using Mixed Methods Research to Determine CBE’s Vitality
Competency-based education (CBE) has been around since the late 1800s but has recently served as a revamped pedagogy designed to respond to some of higher education’s most pressing issues today: low degree attainment and problems of equity; lack of alignment between education and the job market; low and slow graduation rates; high tuition; and poor academic quality. Despite the promises of CBE to resolve these issues, the approach lacks much empirical support. The researcher provided a summary of current research on CBE and identified gaps in the literature. Three gaps were identified: why CBE had failed in the past (and how the reasons for its previous failures are being used today in new CBE quality standards), literature on assessment practices (and how institutions are or are not following these best practices), and reporting on student outcomes, including graduation, race/gender equity, and job placement, compared to traditional programs. These three gaps led to the creation of three research questions directed by three theoretical frameworks (Lewin’s 3-Stage Theory of Change and Force Field Analysis, Biggs’ constructive alignment theory, and Christensen’s theory of disruptive innovation), as well as one conceptual framework (phenomenology) to tie the study together. The research questions were addressed using multiple research methods, including a rubric-based assessment, qualitative interviews, and statistical analyses. All the questions in this dissertation were related to the overall purpose, which was to evaluate whether CBE will have vitality in American higher education today. Vital success was defined as two or more research questions having positive or successful results; failure was defined as fewer than two research questions having positive or successful results. Based on the results of RQ 1, RQ 2, and RQ 3, the competency-based education movement will likely fail again.
However, it is hoped that this research will provide valuable information to those working in competency-based education so they may adjust their programs for better chances of vitality
A Delphi Study of Effective Practices for Developing Competency-Based Learning Models in Higher Education
Currently, there is an increase in competency-based education programs in higher education institutions in response to student and employer needs. However, research is lacking on effective practices for developing competencies, assessments, and learning resources for these programs. The purpose of this qualitative Delphi study was to gather expert opinions about effective practices for developing competencies, assessments, and learning resources in competency-based programs in higher education. The conceptual framework was based on principles of andragogy, critical subjectivity, and social constructivism. Ten long-term specialists in developing competency-based programs in higher education served as participants. Data from 3 rounds of interviews were coded and categorized using Delphi methodology. Eighteen principles for effective practices were agreed upon for developing competencies, 15 principles for effective practice were agreed upon for developing assessments, and 16 principles for effective practice were agreed upon for identifying and leveraging learning resources. Areas of disagreement related to competencies, assessments, and learning resources were identified, with evidence that the variation in rankings presented by participants was due to the unique contexts of different higher education programs. The research from this study contributes to positive social change by providing an emerging list of effective practices useful in developing programs that help students graduate sooner with both a degree and skill set relevant to employers and to their future personal satisfaction
Best Practices in High Fidelity Patient Simulation to Enhance Higher Order Thinking Skills
Undergraduate nursing education has begun to use very expensive and time intensive high fidelity simulation activities without making full use of the ability to build higher order thinking skills in students. Current research in high fidelity patient simulation has tended to be subjective and focus on critical thinking. However, reflective thinking habits of mind must be in place before full use can be made of critical thinking skills. A comprehensive search of all reflective thinking literature used in conjunction with simulated patient experiences by healthcare students was undertaken. A guideline
was created for nurse faculty to use that outlined current best practices in simulation to maximize reflective thinking. Though the research on which the guideline was based has been mainly subjective, several analytical studies were found that supported the findings. Policy changes to incorporate reflective thinking and the associated activities were recommended for nursing students and continuing nursing education. Nurse researchers and educators should incorporate reflective thinking exercises with their simulated patient undertakings to maximize higher order thinking skills
Teaching Non-Technological Skills for Successful Building Information Modeling (BIM) Projects
Implementing Building Information Modeling (BIM) in construction projects has many potential benefits, but project issues can hinder their realization in practice. Although BIM involves using technology, more than four-fifths of the recurring issues in current BIM-based construction projects are related to the people and processes (i.e., the non-technological elements of BIM). Therefore, in addition to the technological skills required for using BIM, educators should also prepare university graduates with the non-technological skills required for managing the people and processes of BIM. This research’s objective is to develop a learning module that teaches the non-technological skills for addressing common, people- and process-related issues in BIM-based construction projects. To achieve this objective, this research outlines the steps taken to create the learning module and to identify its impact on a BIM course. The contribution of this research lies in understanding the pedagogical value of the developed problem-based learning module and in documenting the learning module’s development process.
Doctoral Dissertation, Civil, Environmental and Sustainable Engineering, 201
Establishing Content Validity of an Evaluation Rubric for Mobile Technology Applications Utilizing the Delphi Method
Abstract
The purpose of this research study was to establish content validity for an evaluation tool designed to measure the quality of mobile technology applications (Apps) for use in education settings. The rubric evaluation tool was developed by the researcher based on a review of the literature and consultation with recognized experts in the use of mobile technologies in education. This Delphi study was conducted in collaboration with over 90 Subject Matter Experts (SMEs) from around the world, who provided feedback electronically on the domains and score descriptors that comprise the tool developed for this investigation: the Evaluation Rubric for Mobile Apps. The findings established strong content validity for the Evaluation Rubric for Mobile Apps. Data from participants were used to refine the domains and score descriptors, resulting in an empirically validated, robust evaluation tool for educators to employ in their decision-making processes related to the use of mobile technology Apps in education settings. At the school and district level, this rubric has implications for ensuring that the limited funds available for technology purchases are used in the most effective and efficient manner. On a broader scale, researchers examining technology Apps in schools can employ the rubric in empirical studies to examine the impact of using high-quality Apps on teaching and learning
Simulation in medical education : a case study evaluating the efficacy of high-fidelity patient simulation
Indiana University-Purdue University Indianapolis (IUPUI)
High-fidelity patient simulation (HFPS) recreates clinical scenarios by combining
mock patients and realistic environments to prepare learners with practical experience to
meet the demands of modern clinical practice while ensuring patient safety. This research
investigated the efficacy of HFPS in medical education through a case study of the
Indiana University Bloomington Interprofessional Simulation Center. The goal of this
research was to understand the role of simulated learning in attaining clinical self-efficacy
and how HFPS training impacts performance. Three research questions were
addressed to investigate HFPS in medical education using a mixed methods study design.
Clinical competence and self-efficacy were quantified among medical students at IUSM-Bloomington,
which utilized HFPS, compared with two IUSM campuses that did not incorporate
this instructional intervention. Clinical competence was measured as performance on the
Objective Structured Clinical Examination (OSCE), while self-efficacy of medical
students was measured through a validated questionnaire. Although the effect of HFPS
on quantitative results was not definitive, general trends suggest that HFPS can
recalibrate learners’ perceived and actual performance. Additionally, perceptual data
regarding HFPS from both medical students and medical residents were analyzed.
Qualitative results revealed the utility of HFPS for developing the clinical mental
framework of a physician, acquiring fundamental psychomotor skills, and practising
communication and teamwork as a healthcare team during interprofessional education simulations. Continued studies of HFPS are necessary to fully elucidate the value of this
instructional adjunct; however, positive outcomes of simulated learning for both medical
students and medical residents were observed in this study, contributing to the existing
HFPS literature
Identifying Issues for the Bright ICT Initiative: A Worldwide Delphi Study of IS Journal Editors and Scholars
Information and communication technology (ICT) continues to change business as we know it. As ICT further integrates into our daily lives, it creates more opportunities to both help and hinder fundamental social problems throughout the world. In response to these growing and urgent societal needs, the Association for Information Systems approved the Bright ICT Initiative to extend IS research beyond a focus on business to take on the broader challenges of an ICT-enabled bright society. We conducted a Delphi study to provide guidance on where bright ICT-minded researchers might focus to produce their greatest impact. In this paper, we report on our findings. The Delphi panel comprised 182 globally distributed IS journal editors who participated in a three-round consensus-building process via the Internet. Our results provide a framework of eleven research priority areas and specific research topics for those engaged in future-oriented, socially conscious IS research
Assessment of General Practitioners' Performance in Daily Practice
The EURACT Performance Agenda (EUPA) of the European Academy of Teachers in General Practice/Family Medicine (EURACT) is the third in a series of papers, following the European Definition of General Practice/Family Medicine (WONCA Europe, 2002), which identified 6 core competencies and 11 abilities every general practitioner (GP) should master, and the EURACT Educational Agenda (2005), which provided a framework to teach the core competencies by setting learning aims and monitoring their achievement. Performance (in contrast to competence) is understood as the level of actual performance in clinical care and communication with patients in daily practice. Small groups of EURACT Council members from 40 European countries have discussed and developed EUPA since 2007. EUPA is a general, uniform and basic agenda of performance elements every GP masters in daily practice, applicable and adaptable to different countries with different systems. It deals with the process and result of actual work in daily practice, not with a teaching/learning situation. EUPA discusses in depth the psychometrics and edumetrics of performance assessment. Case vignettes of abilities in GPs’ daily practice illustrate performance and its assessment in every chapter. Examples of common assessment tools are workplace-based assessment by a peer, feedback from patients or staff, and audit of medical records. EUPA can help to shape various performance assessment activities held locally in general practice/family medicine, e.g., in continuing professional development cycles, re-certification/re-accreditation/licensing procedures, peer hospitation programmes and practice audit programmes in quality management. It can give orientation for self-assessment for reflective practitioners in their continuing professional development.
The EURACT Performance Agenda (EUPA) encourages general practitioners to initialize performance agendas adapted to their national health system to further strengthen the role of general practice/family medicine in their country