35 research outputs found

Evaluating NLG Evaluation Metrics: A Measurement Theory Perspective

Author: Lai Vivian
Liao Q. Vera
Xiao Ziang
Zhang Susu
Publication date: 24/05/2023

We address a fundamental challenge in Natural Language Generation (NLG) model evaluation: the design and validation of evaluation metrics. Recognizing the limitations of existing metrics and issues with human judgment, we propose using measurement theory, the foundation of test design, as a framework for conceptualizing and evaluating the validity and reliability of NLG evaluation metrics. This approach offers a systematic method for defining "good" metrics, developing robust metrics, and assessing metric performance. In this paper, we introduce core concepts in measurement theory in the context of NLG evaluation and key methods to evaluate the performance of NLG metrics. Through this framework, we aim to promote the design, evaluation, and interpretation of valid and reliable metrics, ultimately contributing to the advancement of robust and effective NLG models in real-world settings.

arXiv.org e-Print Archive

Supporting Qualitative Analysis with Large Language Models: Combining Codebook with GPT-3 for Deductive Coding

Author: Abdelghani Rania
Liao Q. Vera
Oudeyer Pierre-Yves
Xiao Ziang
Yuan Xingdi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/03/2023

Qualitative analysis of textual content unpacks rich and valuable information by assigning labels to the data. However, this process is often labor-intensive, particularly when working with large datasets. While recent AI-based tools demonstrate utility, researchers may not have readily available AI resources and expertise, and they are further challenged by the limited generalizability of those task-specific models. In this study, we explored the use of large language models (LLMs) in supporting deductive coding, a major category of qualitative analysis where researchers use pre-determined codebooks to label the data into a fixed set of codes. Instead of training task-specific models, a pre-trained LLM could be used directly for various tasks without fine-tuning through prompt learning. Using a curiosity-driven questions coding task as a case study, we found that, by combining GPT-3 with expert-drafted codebooks, our proposed approach achieved fair to substantial agreement with expert-coded results. We lay out challenges and opportunities in using LLMs to support qualitative coding and beyond. Comment: 28th International Conference on Intelligent User Interfaces (IUI '23 Companion), March 27–31, 2023, Sydney, NSW, Australia.

arXiv.org e-Print Archive

INRIA a CCSD electronic archive server
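
A note on the "fair to substantial" agreement reported in the GPT-3 deductive coding abstract above: that wording matches the Landis and Koch bands often applied to Cohen's kappa (0.21 to 0.40 fair, 0.41 to 0.60 moderate, 0.61 to 0.80 substantial). The abstract does not say which statistic the authors used, so the sketch below is only an assumption about how such agreement could be computed. The function name `cohen_kappa` and the example labels are hypothetical and are not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(labels_a)

    # Observed agreement: fraction of items given identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes, invented for illustration only.
expert = ["causal", "causal", "factual", "factual",
          "causal", "factual", "causal", "factual"]
model = ["causal", "causal", "factual", "causal",
         "causal", "factual", "factual", "factual"]

kappa = cohen_kappa(expert, model)
print(f"kappa = {kappa:.2f}")  # kappa = 0.50
```

In this toy case the raters agree on 6 of 8 items (0.75) while chance agreement is 0.50, giving a kappa of 0.50, which falls in the "moderate" band.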

ByteSized32: A Corpus and Challenge Task for Generating Task-Specific World Models Expressed as Text Games

Author: Côté Marc-Alexandre
Jansen Peter
Todd Graham
Wang Ruoyao
Xiao Ziang
Yuan Eric
Publication date: 24/05/2023

In this work we examine the ability of language models to generate explicit world models of scientific and common-sense reasoning tasks by framing this as a problem of generating text-based games. To support this, we introduce ByteSized32, a corpus of 32 highly-templated text games written in Python totaling 24k lines of code, each centered around a particular task, and paired with a set of 16 unseen text game specifications for evaluation. We propose a suite of automatic and manual metrics for assessing simulation validity, compliance with task specifications, playability, winnability, and alignment with the physical world. In a single-shot evaluation of GPT-4 on this simulation-as-code-generation task, we find it capable of producing runnable games in 27% of cases, highlighting the difficulty of this challenge task. We discuss areas of future improvement, including GPT-4's apparent capacity to perform well at simulating near canonical task solutions, with performance dropping off as simulations include distractors or deviate from canonical solutions in the action space. Comment: 10 pages.

arXiv.org e-Print Archive

Prevalence of insomnia symptoms and their associated factors in patients treated in outpatient clinics of four general hospitals in Guangzhou, China

Author: Dai Qing
Ke Xiao-Yin
Li Hai-Yan
Luo Xin-Ni
Ng Chee H
Ning Yu-Ping
Ungvari Gabor S
Zhang Chan-Juan
Zheng Wei
Ziang Yu-Tao
Publication venue: ResearchOnline@ND
Publication date: 01/01/2018

Background: Data on the prevalence of insomnia symptoms in medical outpatient clinics in China are lacking. This study examined the prevalence of insomnia symptoms and their socio-demographic correlates in patients treated at medical outpatient clinics affiliated with four general hospitals in Guangzhou, a large metropolis in southern China. Method: A total of 4399 patients were consecutively invited to participate in the study. Data on insomnia and its socio-demographic correlates were collected with standardized questionnaires. Results: The prevalence of any type of insomnia symptoms was 22.1% (95% confidence interval (CI): 20.9–23.3%); the prevalence of difficulty initiating sleep was 14.3%, difficulty maintaining sleep was 16.2%, and early morning awakening was 12.4%. Only 17.5% of the patients suffering from insomnia received sleeping pills. Multiple logistic regression analysis revealed that male gender, education level, rural residence, and being unemployed or retired were negatively associated with insomnia symptoms, while lacking health insurance, older age and more severe depressive symptoms were positively associated with insomnia symptoms. Conclusions: Insomnia symptoms are common in patients attending medical outpatient clinics in Guangzhou. Increasing awareness of sleep hygiene measures, regular screening and psychosocial and pharmacological interventions for insomnia are needed in China. Trial registration: ChiCTR-INR-16008066. Registered 8 March 2016

ResearchOnline@ND

Directory of Open Access Journals

University of Melbourne Institutional Repository

FigShare
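
The 95% confidence interval reported for overall insomnia prevalence in the Guangzhou abstract above (22.1%, CI 20.9–23.3%) can be approximately reproduced with a normal-approximation (Wald) interval. This is a sketch under two assumptions: that all 4399 invited patients form the denominator, and that a Wald interval was used. The abstract confirms neither, and `wald_interval` is an illustrative name.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    if not 0.0 <= p_hat <= 1.0 or n <= 0:
        raise ValueError("p_hat must be in [0, 1] and n must be positive")
    # Standard error of a sample proportion.
    se = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Figures taken from the abstract; the denominator is an assumption.
low, high = wald_interval(0.221, 4399)
print(f"{low:.1%} to {high:.1%}")  # 20.9% to 23.3%
```

The result agrees with the published interval to one decimal place, which is consistent with the assumed denominator but does not prove it.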

If I Hear You Correctly: Building and Evaluating Interview Chatbots with Active Listening Skills

Author: Araki Jun
Bauer Christine
Blei David M
Bordes Antoine
Cer Daniel
Decker Bert
DeVault David
Devlin Jacob
Friedman Jerome
Gebhard Patrick
Gordon Thomas
Grech Matt
Gupta Sambhav
Hu Zhiting
Jones Douglas
Keppel Geoffrey
Liu Shixia
Louw Stephen
McCord Michael C
Rehrek Radim
Rogers Carl R
Traum David
Wang Guoyin
Xiao Ziang
Zhou Hao
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/02/2020

Interview chatbots engage users in a text-based conversation to draw out their views and opinions. It is, however, challenging to build effective interview chatbots that can handle user free-text responses to open-ended questions and deliver an engaging user experience. As a first step, we are investigating the feasibility and effectiveness of using publicly available, practical AI technologies to build effective interview chatbots. To demonstrate feasibility, we built a prototype scoped to enable interview chatbots with a subset of active listening skills: the abilities to comprehend a user's input and respond properly. To evaluate the effectiveness of our prototype, we compared the performance of interview chatbots with or without active listening skills on four common interview topics in a live evaluation with 206 users. Our work presents practical design implications for building effective interview chatbots, hybrid chatbot platforms, and empathetic chatbots beyond interview tasks. Comment: Working draft. To appear in the ACM CHI Conference on Human Factors in Computing Systems (CHI 2020).

arXiv.org e-Print Archive

Crossref

The Ninth Visual Object Tracking VOT2021 Challenge Results

Author: Abdelpakey Mohamed
Bhat Goutam
Cerkezi Llukman
Cevikalp Hakan
Chen Shengyong
Chen Xin
Cheng Miao
Cheng Ziyi
Cirakman Ozgun
Cui Yutao
Dai Kenan
Danelljan Martin
Deng Qili
Dong Xingping
Drbohlav Ondrej
Du Daniel K.
Dunnhofer Matteo
Felsberg Michael
Feng Zhen-Hua
Feng Zhiyong
Fernández Gustavo
Fu Zhihong
Ge Shiming
Gorthi Rama Krishna
Gu Yuzhang
Gunsel Bilge
Guo Qing
Gurkan Filiz
Han Wencheng
Huang Yanyan
Häger Gustav
Jhang Shang-Jhih
Ji Rongrong
Jiang Cheng
Jiang Yingjie
Jin Chang Hyung
Juefei-Xu Felix
Jun Yin
Ke Xiao
Khan Fahad Shahbaz
Kim Byeong Hak
Kittler Josef
Kristan Matej
Kämäräinen Joni
Käpylä Jani
Lan Xiangyuan
Lawin Felix Järemo
Lee Jun Ha
Leibe Bastian
Leonardis Aleš
Li Hui
Li Jianhua
Li Xianxian
Li Yuezhou
Liu Bo
Liu Chang
Liu Jingen
Liu Li
Liu Qingjie
Lu Huchuan
Lu Wei
Luiten Jonathon
Lukežič Alan
Ma Jie
Ma Ziang
Martinel Niki
Matas Jiri
Mayer Christoph
Memarmoghadam Alireza
Micheloni Christian
Murali Dasari Mohana
Niu Yuzhen
Paudel Danda
Peng Houwen
Pflugfelder Roman
Qiu Shoumeng
Rajiv Aravindh
Rana Muhammad
Robinson Andreas
Saribas Hasan
Shao Ling
Shehata Mohamed
Shen Furao
Shen Jianbing
Simonato Kristian
Song Xiaoning
Tang Zhangyong
Timofte Radu
Torr Philip
Tsai Chi-Yi
Uzun Bedirhan
Van Gool Luc
Voigtlaender Paul
Wang Dong
Wang Guangting
Wang Liangliang
Wang Lijun
Wang Limin
Wang Linyuan
Wang Yong
Wang Yunhong
Wu Chenyan
Wu Gangshan
Wu Xiao-Jun
Xie Fei
Xu Tianyang
Xu Xiang
Xue Wanli
Yan Bin
Yan Song
Yang Jinyu
Yang Wankou
Yang Xiaoyun
Ye Yu
Yin Jun
Zhang Chengwei
Zhang Chunhui
Zhang Haitao
Zhang Kaihua
Zhang Kangkai
Zhang Xiaohan
Zhang Xiaolin
Zhang Xinyu
Zhang Zhibin
Zhang Zhongqun
Zhao Shaochuan
Zhen Ming
Zhong Bineng
Zhu Jiawen
Zhu Xue-Feng
Čehovin Zajc Luka
Publication date: 01/01/2021

Accepted version. Peer reviewed.

Trepo - Institutional Repository of Tampere University

What should I Ask: A Knowledge-driven Approach for Follow-up Questions Generation in Conversational Surveys

Author: Diesner Jana
Ge Yubin
Ji Heng
Karahalios Karrie
Sundaram Hari
Xiao Ziang
Publication date: 22/05/2022

Conversational surveys, where an agent asks open-ended questions through natural language interfaces, offer a new way to collect information from people. A good follow-up question in a conversational survey prompts high-quality information and delivers engaging experiences. However, generating high-quality follow-up questions on the fly is a non-trivial task. The agent needs to understand the diverse and complex participant responses, adhere to the survey goal, and generate clear and coherent questions. In this study, we propose a knowledge-driven follow-up question generation framework. The framework combines a knowledge selection module to identify salient topics in participants' responses and a generative model guided by selected knowledge entity-relation pairs. To investigate the effectiveness of the proposed framework, we build a new dataset for open-domain follow-up question generation and present a new set of reference-free evaluation metrics based on the Gricean maxims. Our experiments demonstrate that our framework outperforms a GPT-based baseline in both objective evaluation and human-expert evaluation.

arXiv.org e-Print Archive

An HBase-Based Optimization Model for Distributed Medical Data Storage and Retrieval

Author: Beiji Zou
Chengzhang Zhu
Han Wang
Meng Zeng
Yalong Xiao
Ziang Fan
Zixi Liu
Publication venue: 'MDPI AG'
Publication date: 01/02/2023

In medical services, the amount of data generated by medical devices is increasing explosively, and the requirements for accessing medical data are rising as well. Although HBase-based medical data storage solutions exist, they cannot meet the need for fast location of and diversified access to medical data. To improve retrieval speed, the recognition model S-TCR and the dynamic management algorithm SL-TCR, both based on the behavioral characteristics of access, were proposed to identify frequently accessed hot data and dynamically manage the data storage medium so as to maximize system access performance. To improve key search performance, an optimized secondary index strategy was proposed to reduce I/O overhead and optimize the search performance of non-primary key indexes. Comparative experiments were conducted on real medical data sets. The experimental results show that the optimized retrieval model can meet the needs of hot data access and diversified medical data retrieval.

Directory of Open Access Journals

Insufficient Fruit and Vegetable Intake and Low Potassium Intake Aggravate Early Renal Damage in Children: A Longitudinal Study

Author: Huidi Xiao
Jiawulan Zunong
Lifang Gao
Menglong Li
Nubiya Amaerjiang
Sten H. Vermund
Yifei Hu
Ziang Li
Publication venue: 'MDPI AG'
Publication date: 01/03/2022

Insufficient fruit and vegetable intake (FVI) and low potassium intake are associated with many non-communicable diseases, but the association with early renal damage in children is uncertain. We aimed to identify the associations of early renal damage with insufficient FVI and daily potassium intake in a general pediatric population. We conducted four waves of urine assays based on our child cohort (PROC) study from October 2018 to November 2019 in Beijing, China. We investigated FVI and other lifestyle status via questionnaire surveys and measured urinary potassium, β2-microglobulin (β2-MG), and microalbumin (MA) excretion to assess daily potassium intake and renal damage among 1914 primary school children. The prevalence of insufficient FVI (<4/d) was 48.6% (95% CI: 46.4%, 50.9%) and the estimated potassium intake at baseline was 1.63 ± 0.48 g/d. Short sleep duration, long screen time, lower estimated potassium intake, higher β2-MG and MA excretion were significantly more frequent in the insufficient FVI group. We generated linear mixed effects models and observed the bivariate associations of urinary β2-MG and MA excretion with insufficient FVI (β = 0.012, 95% CI: 0.005, 0.020; β = 0.717, 95% CI: 0.075, 1.359), and estimated potassium intake (β = −0.042, 95% CI: −0.052, −0.033; β = −1.778, 95% CI: −2.600, −0.956), respectively; after adjusting for age, sex, BMI, SBP, sleep duration, screen time and physical activity. In multivariate models, we observed that urinary β2-MG excretion increased with insufficient FVI (β = 0.011, 95% CI: 0.004, 0.018) and insufficient potassium intake (<1.5 g/d) (β = 0.031, 95% CI: 0.023, 0.038); and urinary MA excretion increased with insufficient FVI (β = 0.658, 95% CI: 0.017, 1.299) and insufficient potassium intake (β = 1.185, 95% CI: 0.492, 1.878). We visualized different quartiles of potassium intake showing different renal damage with insufficient FVI for interpretation and validation of the findings. 
Insufficient FVI and low potassium intake aggravate early renal damage in children, which underscores that healthy lifestyles, especially adequate FVI, should be advocated.

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

PubMed Central