ChatGPT as a Factual Inconsistency Evaluator for Abstractive Text Summarization
The performance of abstractive text summarization has been greatly boosted by
pre-trained language models recently. The main concern of existing abstractive
summarization methods is the factual inconsistency problem of their generated
summary. To alleviate the problem, many efforts have focused on developing
effective factuality evaluation metrics based on natural language inference and
question answering, among others. However, these metrics suffer from high computational
complexity and reliance on annotated data. Most recently, large language models
such as ChatGPT have shown strong ability in not only natural language
understanding but also natural language inference. In this paper, we study the
factual inconsistency evaluation ability of ChatGPT under the zero-shot setting
by evaluating it on the coarse-grained and fine-grained factuality evaluation
tasks including binary natural language inference (NLI), summary ranking, and
consistency rating. Experimental results show that ChatGPT outperforms previous
SOTA evaluation metrics on 6/9 datasets across three tasks, demonstrating its
great potential for assessing factual inconsistency in the zero-shot setting.
The results also highlight the importance of prompt design and the need for
future efforts to address ChatGPT's limitations on evaluation bias, wrong
reasoning, and hallucination.
Comment: ongoing work, 12 pages, 4 figures
LongDocFACTScore: Evaluating the Factuality of Long Document Abstractive Summarisation
Maintaining factual consistency is a critical issue in abstractive text
summarisation; however, it cannot be assessed by traditional automatic metrics
used for evaluating text summarisation, such as ROUGE scoring. Recent efforts
have been devoted to developing improved metrics for measuring factual
consistency using pre-trained language models, but these metrics have
restrictive token limits, and are therefore not suitable for evaluating long
document text summarisation. Moreover, there is limited research evaluating
whether existing automatic evaluation metrics are fit for purpose when applied
to long document data sets. In this work, we evaluate the efficacy of automatic
metrics at assessing factual consistency in long document text summarisation
and propose a new evaluation framework LongDocFACTScore. This framework allows
metrics to be extended to any length document. This framework outperforms
existing state-of-the-art metrics in its ability to correlate with human
measures of factuality when used to evaluate long document summarisation data
sets. Furthermore, we show LongDocFACTScore has performance comparable to
state-of-the-art metrics when evaluated against human measures of factual
consistency on short document data sets. We make our code and annotated data
publicly available: https://github.com/jbshp/LongDocFACTScore
Comment: 12 pages, 5 figures
A Survey on Biomedical Text Summarization with Pre-trained Language Model
The exponential growth of biomedical texts, such as biomedical literature and
electronic health records (EHRs), poses a major challenge for clinicians and
researchers to access clinical information efficiently. To address the problem,
biomedical text summarization has been proposed to support clinical information
retrieval and management, aiming at generating concise summaries that distill
key information from single or multiple biomedical documents. In recent years,
pre-trained language models (PLMs) have been the de facto standard of various
natural language processing tasks in the general domain. Most recently, PLMs
have been further investigated in the biomedical field and brought new insights
into the biomedical text summarization task. In this paper, we systematically
summarize recent advances that explore PLMs for biomedical text summarization,
to help understand recent progress, challenges, and future directions. We
categorize PLMs-based approaches according to how they utilize PLMs and what
PLMs they use. We then review available datasets, recent approaches and
evaluation metrics of the task. We finally discuss existing challenges and
promising future directions. To support the research community, we collect
open resources including available datasets, recent approaches, code,
evaluation metrics, and the leaderboard in a public project:
https://github.com/KenZLuo/Biomedical-Text-Summarization-Survey/tree/master
Comment: 19 pages, 6 figures, TKDE under review
MentalLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models
With the development of web technology, social media texts are becoming a
rich source for automatic mental health analysis. As traditional discriminative
methods bear the problem of low interpretability, the recent large language
models have been explored for interpretable mental health analysis on social
media, which aims to provide detailed explanations along with predictions. The
results show that ChatGPT can generate approaching-human explanations for its
correct classifications. However, LLMs still achieve unsatisfactory
classification performance in a zero-shot/few-shot manner. Domain-specific
finetuning is an effective solution, but faces two challenges: 1) a lack of
high-quality training data, and 2) the absence of open-source LLMs for interpretable
mental health analysis that would lower the finetuning cost. To alleviate these
problems, we build the first multi-task and multi-source interpretable mental
health instruction (IMHI) dataset on social media, with 105K data samples. The
raw social media data are collected from 10 existing sources covering 8 mental
health analysis tasks. We use expert-written few-shot prompts and collected
labels to prompt ChatGPT and obtain explanations from its responses. To ensure
the reliability of the explanations, we perform strict automatic and human
evaluations on the correctness, consistency, and quality of generated data.
Based on the IMHI dataset and LLaMA2 foundation models, we train MentalLLaMA,
the first open-source LLM series for interpretable mental health analysis with
instruction-following capability. We also evaluate the performance of
MentalLLaMA on the IMHI evaluation benchmark with 10 test sets, where their
correctness for making predictions and the quality of explanations are
examined. The results show that MentalLLaMA approaches state-of-the-art
discriminative methods in correctness and generates high-quality explanations.
Comment: Work in progress
Television Viewing Time in Hong Kong Adult Population: Associations with Body Mass Index and Obesity
Background: Obesity is increasing dramatically in the Asia-Pacific region, particularly in China. The population of Hong Kong was exposed to modernization far earlier than the rest of China, reflecting conditions that are likely to be replicated as other Chinese cities undergo rapid change. This study examined the relationship between television viewing and obesity in a Hong Kong sample. Information about the relationship between a key sedentary behavior, TV viewing, and obesity, and its moderation by demographic characteristics, may identify sectors of the population at highest risk for excess weight. Methods: Data were from the Hong Kong Family and Health Information Trends Survey (2009–2010), a population-based survey on the public's use of media for health information and family communication, conducted through telephone interviews with 3,016 Hong Kong adults (age ≥18 years). TV viewing time, body mass index (BMI), physical activity and other lifestyle variables were analyzed. Results: Viewing time was longer in women, and it increased with age but decreased with education level and vigorous physical activity (all P<0.01). Longer TV viewing time was significantly associated with higher BMI (coefficient B = 0.17, 95% CI: 0.11, 0.24) after adjusting for age, gender, employment status, marital status, education level, smoking activity and vigorous physical activity. This association was stronger in women than in men (coefficient B: 0.19 versus 0.15) and strongest in those aged 18 to 34 years (coefficient B = 0.35). Furthermore, each one-hour increase in daily TV viewing was associated with 10% greater odds of being obese. Conclusions: A significant socioeconomic gradient in television viewing time was observed. TV viewing time was positively associated with BMI and obesity. The TV viewing–BMI associations were strongest in women and young adults, suggesting vulnerable groups to target for obesity prevention by decreasing TV viewing.
Overview of the BioLaySumm 2023 Shared Task on Lay Summarization of Biomedical Research Articles
This paper presents the results of the shared task on Lay Summarisation of
Biomedical Research Articles (BioLaySumm), hosted at the BioNLP Workshop at ACL
2023. The goal of this shared task is to develop abstractive summarisation
models capable of generating "lay summaries" (i.e., summaries that are
comprehensible to non-technical audiences) in both a controllable and
non-controllable setting. There are two subtasks: 1) Lay Summarisation, where
the goal is for participants to build models for lay summary generation only,
given the full article text and the corresponding abstract as input; and 2)
Readability-controlled Summarisation, where the goal is for participants to
train models to generate both the technical abstract and the lay summary, given
an article's main text as input. In addition to overall results, we report on
the setup and insights from the BioLaySumm shared task, which attracted a total
of 20 participating teams across both subtasks.
Comment: Published at BioNLP@ACL2023
Deep learning based single image super-resolution : a survey
Single image super-resolution has attracted increasing attention and has a wide range of applications in satellite imaging, medical imaging, computer vision, security surveillance imaging, remote sensing, object detection, and recognition. Recently, deep learning techniques have emerged and blossomed, producing state-of-the-art results in many domains. Owing to their capability in feature extraction and mapping, they are well suited to predicting the high-frequency details lost in low-resolution images. In this paper, we give an overview of recent advances in deep learning-based models and methods that have been applied to single image super-resolution tasks. We also summarize, compare and discuss various models from the past and present for comprehensive understanding, and finally provide open problems and possible directions for future research.
Identifying neuropsychiatric disorders in the Medicare Current Beneficiary Survey: the benefits of combining health survey and claims data
FoodWise: Food Waste Reduction and Behavior Change on Campus with Data Visualization and Gamification
Food waste presents a substantial challenge with significant environmental
and economic ramifications, and its severity on campus environments is of
particular concern. In response to this, we introduce FoodWise, a
dual-component system tailored to inspire and incentivize campus communities to
reduce food waste. The system consists of a data storytelling dashboard that
graphically displays food waste information from university canteens, coupled
with a mobile web application that encourages users to log their food waste
reduction actions and rewards active participants for their efforts.
Deployed during a two-week food-saving campaign at The Hong Kong University
of Science and Technology (HKUST) in March 2023, FoodWise engaged over 200
participants from the university community, resulting in the logging of over
800 daily food-saving actions. Feedback collected post-campaign underscores the
system's efficacy in elevating user consciousness about food waste and
prompting behavioral shifts towards a more sustainable campus. This paper also
provides insights for enhancing our system, contributing to a broader discourse
on sustainable campus initiatives.
Strong structural and electronic coupling in metavalent PbS moire superlattices
Moire superlattices are twisted bilayer materials, in which the tunable
interlayer quantum confinement offers access to new physics and novel device
functionalities. Previously, moire superlattices were built exclusively using
materials with weak van der Waals interactions and synthesizing moire
superlattices with strong interlayer chemical bonding was considered to be
impractical. Here, using lead sulfide (PbS) as an example, we report a strategy
for synthesizing moire superlattices coupled by strong chemical bonding. We
use water-soluble ligands as a removable template to obtain free-standing
ultra-thin PbS nanosheets and assemble them into direct-contact bilayers with
various twist angles. Atomic-resolution imaging shows the moire periodic
structural reconstruction at the superlattice interface, due to the strong
metavalent coupling. Electron energy loss spectroscopy and theoretical
calculations collectively reveal the twist-angle-dependent electronic
structure, especially the emergent separation of flat bands at small twist
angles. The localized states of flat bands are similar to well-arranged quantum
dots, promising an application in devices. This study opens a new door to the
exploration of deep energy modulations within moire superlattices as an alternative
to van der Waals twistronics.