4 research outputs found

    User or Labor: An Interaction Framework for Human-Machine Relationships in NLP

    Full text link
    The bridging research between Human-Computer Interaction and Natural Language Processing is developing quickly these years. However, there is still a lack of formative guidelines to understand the human-machine interaction in the NLP loop. When researchers crossing the two fields talk about humans, they may imply a user or labor. Regarding a human as a user, the human is in control, and the machine is used as a tool to achieve the human's goals. Considering a human as a laborer, the machine is in control, and the human is used as a resource to achieve the machine's goals. Through a systematic literature review and thematic analysis, we present an interaction framework for understanding human-machine relationships in NLP. In the framework, we propose four types of human-machine interactions: Human-Teacher and Machine-Learner, Machine-Leading, Human-Leading, and Human-Machine Collaborators. Our analysis shows that the type of interaction is not fixed but can change across tasks as the relationship between the human and the machine develops. We also discuss the implications of this framework for the future of NLP and human-machine relationships

    Annotation Imputation to Individualize Predictions: Initial Studies on Distribution Dynamics and Model Predictions

    Full text link
    Annotating data via crowdsourcing is time-consuming and expensive. Owing to these costs, dataset creators often have each annotator label only a small subset of the data. This leads to sparse datasets with examples that are marked by few annotators; if an annotator is not selected to label an example, their opinion regarding it is lost. This is especially concerning for subjective NLP datasets where there is no correct label: people may have different valid opinions. Thus, we propose using imputation methods to restore the opinions of all annotators for all examples, creating a dataset that does not leave out any annotator's view. We then train and prompt models with data from the imputed dataset (rather than the original sparse dataset) to make predictions about majority and individual annotations. Unfortunately, the imputed data provided by our baseline methods does not improve predictions. However, through our analysis of it, we develop a strong understanding of how different imputation methods impact the original data in order to inform future imputation techniques. We make all of our code and data publicly available.Comment: 12 pages, 5 figure

    Social and behavioral determinants of health in the era of artificial intelligence with electronic health records: A scoping review

    Full text link
    Background: There is growing evidence that social and behavioral determinants of health (SBDH) play a substantial effect in a wide range of health outcomes. Electronic health records (EHRs) have been widely employed to conduct observational studies in the age of artificial intelligence (AI). However, there has been little research into how to make the most of SBDH information from EHRs. Methods: A systematic search was conducted in six databases to find relevant peer-reviewed publications that had recently been published. Relevance was determined by screening and evaluating the articles. Based on selected relevant studies, a methodological analysis of AI algorithms leveraging SBDH information in EHR data was provided. Results: Our synthesis was driven by an analysis of SBDH categories, the relationship between SBDH and healthcare-related statuses, and several NLP approaches for extracting SDOH from clinical literature. Discussion: The associations between SBDH and health outcomes are complicated and diverse; several pathways may be involved. Using Natural Language Processing (NLP) technology to support the extraction of SBDH and other clinical ideas simplifies the identification and extraction of essential concepts from clinical data, efficiently unlocks unstructured data, and aids in the resolution of unstructured data-related issues. Conclusion: Despite known associations between SBDH and disease, SBDH factors are rarely investigated as interventions to improve patient outcomes. Gaining knowledge about SBDH and how SBDH data can be collected from EHRs using NLP approaches and predictive models improves the chances of influencing health policy change for patient wellness, and ultimately promoting health and health equity. Keywords: Social and Behavioral Determinants of Health, Artificial Intelligence, Electronic Health Records, Natural Language Processing, Predictive ModelComment: 32 pages, 5 figure

    Everyone’s Voice Matters: Quantifying Annotation Disagreement Using Demographic Information

    No full text
    In NLP annotation, it is common to have multiple annotators label the text and then obtain the ground truth labels based on major annotators’ agreement. However, annotators are individuals with different backgrounds and various voices. When annotation tasks become subjective, such as detecting politeness, offense, and social norms, annotators’ voices differ and vary. Their diverse voices may represent the true distribution of people’s opinions on subjective matters. Therefore, it is crucial to study the disagreement from annotation to understand which content is controversial from the annotators. In our research, we extract disagreement labels from five subjective datasets, then fine-tune language models to predict annotators’ disagreement. Our results show that knowing annotators’ demographic information (e.g., gender, ethnicity, education level), in addition to the task text, helps predict the disagreement. To investigate the effect of annotators’ demographics on their disagreement level, we simulate different combinations of their artificial demographics and explore the variance of the prediction to distinguish the disagreement from the inherent controversy from text content and the disagreement in the annotators’ perspective. Overall, we propose an innovative disagreement prediction mechanism for better design of the annotation process that will achieve more accurate and inclusive results for NLP systems. Our code and dataset are publicly available
    corecore