4 research outputs found
User or Labor: An Interaction Framework for Human-Machine Relationships in NLP
Bridging research between Human-Computer Interaction and Natural Language
Processing has been developing rapidly in recent years. However, there is still
a lack of formative guidelines for understanding human-machine interaction in
the NLP loop. When researchers working across the two fields talk about humans,
they may mean either a user or a laborer. When the human is regarded as a user,
the human is in control, and the machine is used as a tool to achieve the
human's goals. When the human is regarded as a laborer, the machine is in
control, and the human is used as a resource to achieve the machine's goals.
Through a systematic literature review
and thematic analysis, we present an interaction framework for understanding
human-machine relationships in NLP. In the framework, we propose four types of
human-machine interactions: Human-Teacher and Machine-Learner, Machine-Leading,
Human-Leading, and Human-Machine Collaborators. Our analysis shows that the
type of interaction is not fixed but can change across tasks as the
relationship between the human and the machine develops. We also discuss the
implications of this framework for the future of NLP and human-machine
relationships.
Annotation Imputation to Individualize Predictions: Initial Studies on Distribution Dynamics and Model Predictions
Annotating data via crowdsourcing is time-consuming and expensive. Owing to
these costs, dataset creators often have each annotator label only a small
subset of the data. This leads to sparse datasets with examples that are marked
by few annotators; if an annotator is not selected to label an example, their
opinion regarding it is lost. This is especially concerning for subjective NLP
datasets where there is no correct label: people may have different valid
opinions. Thus, we propose using imputation methods to restore the opinions of
all annotators for all examples, creating a dataset that does not leave out any
annotator's view. We then train and prompt models with data from the imputed
dataset (rather than the original sparse dataset) to make predictions about
majority and individual annotations. Unfortunately, the imputed data produced
by our baseline methods does not improve predictions. However, through our
analysis of it, we develop a strong understanding of how different imputation
methods impact the original data, which can inform future imputation
techniques. We make all of our code and data publicly available.
Comment: 12 pages, 5 figures
Social and behavioral determinants of health in the era of artificial intelligence with electronic health records: A scoping review
Background: There is growing evidence that social and behavioral determinants
of health (SBDH) have a substantial effect on a wide range of health outcomes.
Electronic health records (EHRs) have been widely employed to conduct
observational studies in the age of artificial intelligence (AI). However,
there has been little research into how to make the most of SBDH information
from EHRs. Methods: A systematic search was conducted in six databases to find
relevant peer-reviewed publications that had recently been published. Relevance
was determined by screening and evaluating the articles. Based on selected
relevant studies, a methodological analysis of AI algorithms leveraging SBDH
information in EHR data was provided. Results: Our synthesis was driven by an
analysis of SBDH categories, the relationship between SBDH and
healthcare-related statuses, and several NLP approaches for extracting SBDH
from clinical literature. Discussion: The associations between SBDH and health
outcomes are complicated and diverse; several pathways may be involved. Using
Natural Language Processing (NLP) technology to support the extraction of SBDH
and other clinical concepts simplifies the identification and extraction of
essential concepts from clinical data, efficiently unlocks unstructured data,
and helps resolve the issues associated with it. Conclusion:
Despite known associations between SBDH and disease, SBDH factors are rarely
investigated as interventions to improve patient outcomes. Gaining knowledge
about SBDH and how SBDH data can be collected from EHRs using NLP approaches
and predictive models improves the chances of influencing health policy change
for patient wellness, and ultimately promoting health and health equity.
Keywords: Social and Behavioral Determinants of Health, Artificial
Intelligence, Electronic Health Records, Natural Language Processing,
Predictive Models
Comment: 32 pages, 5 figures
Everyone’s Voice Matters: Quantifying Annotation Disagreement Using Demographic Information
In NLP annotation, it is common to have multiple annotators label the same text and then derive the ground-truth labels from the majority agreement among annotators. However, annotators are individuals with different backgrounds and distinct voices. When annotation tasks become subjective, such as detecting politeness, offense, and social norms, annotators' judgments diverge. Their diverse voices may represent the true distribution of people's opinions on subjective matters. It is therefore crucial to study annotation disagreement to understand which content annotators find controversial. In our research, we extract disagreement labels from five subjective datasets and then fine-tune language models to predict annotators' disagreement. Our results show that knowing annotators' demographic information (e.g., gender, ethnicity, education level), in addition to the task text, helps predict disagreement. To investigate the effect of annotators' demographics on their disagreement level, we simulate different combinations of artificial demographics and examine the variance of the predictions, distinguishing disagreement stemming from inherently controversial text content from disagreement rooted in annotators' perspectives. Overall, we propose an innovative disagreement prediction mechanism for better design of the annotation process, leading to more accurate and inclusive results for NLP systems. Our code and dataset are publicly available.
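To make the notion of a disagreement label concrete, here is one simple illustrative measure (the paper's exact definition may differ): the fraction of annotators whose label deviates from the majority label for an example.

```python
from collections import Counter

def disagreement_score(labels):
    """Fraction of annotators who disagree with the majority label.
    0.0 means full agreement; scores near 0.5 mark controversial items.
    An illustrative measure, not necessarily the paper's exact label."""
    counts = Counter(labels)
    majority_count = counts.most_common(1)[0][1]
    return 1.0 - majority_count / len(labels)

disagreement_score(["polite", "polite", "polite"])                    # -> 0.0
disagreement_score(["offensive", "neutral", "offensive", "neutral"])  # -> 0.5
```

A per-example score like this, computed from raw multi-annotator labels, is the kind of target a language model could then be fine-tuned to predict from the task text plus annotator demographics.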