610 research outputs found
ACI-BENCH: a Novel Ambient Clinical Intelligence Dataset for Benchmarking Automatic Visit Note Generation
Recent breakthroughs in generative models such as GPT-4 have
precipitated a re-imagined, ubiquitous use of these models in all applications.
One area that can benefit from improvements in artificial intelligence (AI) is
healthcare. The note generation task from doctor-patient encounters, and its
associated electronic medical record documentation, is one of the most arduous
and time-consuming tasks for physicians. It is also a natural
beneficiary of advances in generative models. However, with such advances,
benchmarking is more critical than ever. Whether studying model weaknesses or
developing new evaluation metrics, shared open datasets are an imperative part
of understanding the current state-of-the-art. Unfortunately, as clinic
encounter conversations are not routinely recorded and are difficult to
ethically share due to patient confidentiality, there are no sufficiently large
clinic dialogue-note datasets to benchmark this task. Here we present the
Ambient Clinical Intelligence Benchmark (ACI-BENCH) corpus, the largest dataset
to date tackling the problem of AI-assisted note generation from visit
dialogue. We also present the benchmark performances of several common
state-of-the-art approaches.
UMASS_BioNLP at MEDIQA-Chat 2023: Can LLMs generate high-quality synthetic note-oriented doctor-patient conversations?
This paper presents the UMASS_BioNLP team's participation in the MEDIQA-Chat 2023
shared task for Task-A and Task-C. We focus especially on Task-C and propose a
novel LLM cooperation system, named a doctor-patient loop, to generate
high-quality conversation datasets. The experimental results demonstrate that
our approaches yield reasonable performance as evaluated by automatic metrics
such as ROUGE, medical concept recall, BLEU, and Self-BLEU. Furthermore, we
conducted a comparative analysis between our proposed method and ChatGPT and
GPT-4. This analysis also investigates the potential of using cooperating
LLMs to generate high-quality datasets.
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
In recent years, machine learning models have rapidly become better at
generating clinical consultation notes; yet, there is little work on how to
properly evaluate the generated consultation notes to understand the impact
they may have on both the clinician using them and the patient's clinical
safety. To address this we present an extensive human evaluation study of
consultation notes where 5 clinicians (i) listen to 57 mock consultations, (ii)
write their own notes, (iii) post-edit a number of automatically generated
notes, and (iv) extract all the errors, both quantitative and qualitative. We
then carry out a correlation study with 18 automatic quality metrics and the
human judgements. We find that a simple, character-based Levenshtein distance
metric performs on par if not better than common model-based metrics like
BertScore. All our findings and annotations are open-sourced. Comment: To be published in proceedings of ACL 202
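To make the character-based Levenshtein metric mentioned in the abstract above concrete, here is a minimal sketch of how such a score could be computed between a generated note and a reference note. The function names `levenshtein` and `normalized_similarity` are hypothetical, and normalizing by the longer string's length is an assumption; the abstract does not say how the study normalized or aggregated the distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance: the minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to limit memory use.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_similarity(generated: str, reference: str) -> float:
    """Map the distance to [0, 1], where 1.0 means identical strings.
    Dividing by the longer length is an assumption made for this sketch."""
    longest = max(len(generated), len(reference))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(generated, reference) / longest
```

A score like this needs no trained model, which is part of why it is a useful baseline against model-based metrics.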
Human Evaluation and Correlation with Automatic Metrics in Consultation Note Generation
The authors would like to thank Rachel Young and Tom Knoll for supporting the team and hiring the evaluators, Vitalii Zhelezniak for his advice on revising the paper, and Kristian Boda for helping to set up the Stanza+Snomed fact-extraction system.
Reflections on the nature of measurement in language-based automated assessments of patients' mental state and cognitive function
Modern advances in computational language processing methods have enabled new approaches to the measurement of mental processes. However, the field has primarily focused on model accuracy in predicting performance on a task or a diagnostic category. Instead, the field should be more focused on determining which computational analyses align best with the targeted neurocognitive/psychological functions that we want to assess. In this paper we reflect on two decades of experience with the application of language-based assessment to patients' mental state and cognitive function by addressing the questions of what we are measuring, how it should be measured and why we are measuring the phenomena. We address the questions by advocating for a principled framework for aligning computational models to the constructs being assessed and the tasks being used, as well as defining how those constructs relate to patient clinical states. We further examine the assumptions that go into the computational models and the effects that model design decisions may have on the accuracy, bias and generalizability of models for assessing clinical states. Finally, we describe how this principled approach can further the goal of transitioning language-based computational assessments to part of clinical practice while gaining the trust of critical stakeholders.
A Survey on Biomedical Text Summarization with Pre-trained Language Model
The exponential growth of biomedical texts, such as biomedical literature and
electronic health records (EHRs), poses a major challenge for clinicians and
researchers to access clinical information efficiently. To address the problem,
biomedical text summarization has been proposed to support clinical information
retrieval and management, aiming at generating concise summaries that distill
key information from single or multiple biomedical documents. In recent years,
pre-trained language models (PLMs) have been the de facto standard of various
natural language processing tasks in the general domain. Most recently, PLMs
have been further investigated in the biomedical field and brought new insights
into the biomedical text summarization task. In this paper, we systematically
summarize recent advances that explore PLMs for biomedical text summarization,
to help understand recent progress, challenges, and future directions. We
categorize PLMs-based approaches according to how they utilize PLMs and what
PLMs they use. We then review available datasets, recent approaches and
evaluation metrics of the task. We finally discuss existing challenges and
promising future directions. To facilitate the research community, we line up
open resources including available datasets, recent approaches, codes,
evaluation metrics, and the leaderboard in a public project:
https://github.com/KenZLuo/Biomedical-Text-Summarization-Survey/tree/master. Comment: 19 pages, 6 figures, TKDE under review
A Survey on Semantic Processing Techniques
Semantic processing is a fundamental research domain in computational
linguistics. In the era of powerful pre-trained language models and large
language models, the advancement of research in this domain appears to be
decelerating. However, the study of semantics is multi-dimensional in
linguistics. The research depth and breadth of computational semantic
processing can be largely improved with new technologies. In this survey, we
analyzed five semantic processing tasks, namely word sense disambiguation,
anaphora resolution, named entity recognition, concept extraction, and
subjectivity detection. We study relevant theoretical research in these fields,
advanced methods, and downstream applications. We connect the surveyed tasks
with downstream applications because this may inspire future scholars to fuse
these low-level semantic processing tasks with high-level natural language
processing tasks. The review of theoretical research may also inspire new tasks
and technologies in the semantic processing domain. Finally, we compare the
different semantic processing techniques and summarize their technical trends,
application trends, and future directions. Comment: Published at Information Fusion, Volume 101, 2024, 101988, ISSN
1566-2535. The equal contribution mark is missing from the published version due
to the publication policies. Please contact Prof. Erik Cambria for details
Proceedings of the Seventh Italian Conference on Computational Linguistics CLiC-it 2020
On behalf of the Program Committee, a very warm welcome to the Seventh Italian Conference on Computational Linguistics (CLiC-it 2020). This edition of the conference is held in Bologna and organised by the University of Bologna. The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after six years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges.