Human evaluation and statistical analyses on machine reading comprehension, question generation and open-domain dialogue
Evaluation is a critical element in the development of many natural-language-based systems. In this thesis, we present critical analyses of the standard evaluation methodologies applied in three Natural Language Processing (NLP) domains: machine reading comprehension (MRC), question generation (QG), and open-domain dialogue. Systems for tasks like MRC are usually evaluated by comparing system outputs against hand-crafted references with automatic evaluation metrics; these metrics are mainly borrowed from well-developed NLP tasks such as machine translation and text summarization. The evaluation of QG and dialogue, meanwhile, is a known open problem, as such tasks lack references against which similarity could be computed, and human evaluation is indispensable when assessing the performance of systems for these tasks. However, human evaluation is not always valid because: i) it may cost too much and be hard to deploy when experts are involved; and ii) human assessors can lack reliability in a crowd-sourcing environment. To overcome the challenges of both automatic metrics and human evaluation, we first design crowd-sourced human evaluation methods tailored to each of the three target tasks. We then show that these human evaluation methods are reproducible, highly reliable, easy to deploy, and cost-effective. Additionally, with the data collected from our experiments, we measure the accuracy of existing automatic metrics and analyse the potential limitations and disadvantages of applying these metrics directly. Furthermore, in view of the specific features of the different tasks, we provide detailed statistical analyses of the collected data to discover their underlying trends, and offer suggestions on directions for improving systems in different respects.
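Since the reliability of crowd-sourced assessors is central to the argument above, one standard way to quantify inter-rater agreement is Cohen's kappa. The sketch below is an illustrative implementation of that statistic only, with made-up ratings; it is not the thesis's actual analysis pipeline:

```python
from collections import Counter

def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    assert len(ratings_a) == len(ratings_b)
    n = len(ratings_a)
    # Observed agreement: fraction of items both raters labelled identically.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement if the two raters labelled independently.
    freq_a = Counter(ratings_a)
    freq_b = Counter(ratings_b)
    p_e = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical judgements from two crowd workers on six system outputs.
a = ["good", "good", "bad", "good", "bad", "bad"]
b = ["good", "bad", "bad", "good", "bad", "good"]
print(round(cohens_kappa(a, b), 3))  # → 0.333
```

Values near 1 indicate strong agreement; values near 0 indicate agreement no better than chance, which is the failure mode crowd-sourced evaluation must guard against.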
QAScore -- An Unsupervised Unreferenced Metric for the Question Generation Evaluation
Question Generation (QG) aims to automate the task of composing questions for
a passage with a set of chosen answers found within the passage. In recent
years, the introduction of neural generation models has resulted in substantial
improvements of automatically generated questions in terms of quality,
especially compared to traditional approaches that employ manually crafted
heuristics. However, the metrics commonly applied in QG evaluations have been
criticized for their low agreement with human judgement. We therefore propose a
new reference-free evaluation metric that has the potential to provide a better
mechanism for evaluating QG systems, called QAScore. Instead of fine-tuning a
language model to maximize its correlation with human judgements, QAScore
evaluates a question by computing the cross entropy according to the
probability that the language model can correctly generate the masked words in
the answer to that question. Furthermore, we conduct a new crowd-sourcing human
evaluation experiment for the QG evaluation to investigate how QAScore and
other metrics can correlate with human judgements. Experiments show that
QAScore obtains a stronger correlation with the results of our proposed human
evaluation method compared to existing traditional word-overlap-based metrics
such as BLEU and ROUGE, as well as the existing pretrained-model-based metric
BERTScore.
Comment: 19 pages, 5 figures, 7 tables
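As described above, QAScore aggregates a pretrained language model's ability to recover masked answer words as a cross entropy. The following is a minimal sketch of that aggregation only: `toy_lm_prob` is a stand-in for the real masked language model, and all names are illustrative assumptions, not the paper's implementation:

```python
import math

def toy_lm_prob(masked_context, target_word):
    """Stand-in for a masked LM: returns P(target | context).
    A real QAScore-style metric would query a pretrained model here."""
    # Crude proxy: a word already present in the context is easier to predict.
    return 0.5 if target_word in masked_context else 0.05

def pseudo_cross_entropy(passage, question, answer_words):
    """Mask each answer word in turn and average -log P(word | rest).
    Lower is better: the passage and question make the answer predictable."""
    total = 0.0
    for i, word in enumerate(answer_words):
        context = passage + " " + question + " " + " ".join(
            w if j != i else "[MASK]" for j, w in enumerate(answer_words))
        total += -math.log(toy_lm_prob(context, word))
    return total / len(answer_words)

score = pseudo_cross_entropy(
    "Paris is the capital of France.",
    "What is the capital of France?",
    ["Paris"])
print(score < 1.0)  # the answer is recoverable from the context, so it scores well
```

Because the score needs no reference question, it can rank generated questions directly, which is the reference-free property the abstract emphasizes.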
Document-Level Machine Translation with Large Language Models
Large language models (LLMs) such as ChatGPT can produce coherent, cohesive,
relevant, and fluent answers for various natural language processing (NLP)
tasks. Taking document-level machine translation (MT) as a testbed, this paper
provides an in-depth evaluation of LLMs' ability at discourse modeling. The
study focuses on three aspects: 1) Effects of Discourse-Aware Prompts, where
we investigate the impact of different prompts on document-level translation
quality and discourse phenomena; 2) Comparison of Translation Models, where we
compare the translation performance of ChatGPT with commercial MT systems and
advanced document-level MT methods; 3) Analysis of Discourse Modelling
Abilities, where we further probe the discourse knowledge encoded in LLMs and
examine the impact of training techniques on discourse modeling. By evaluating
on a number of benchmarks, we surprisingly find that 1) leveraging its powerful
long-text modeling capabilities, ChatGPT outperforms commercial MT systems in
terms of human evaluation; 2) GPT-4 demonstrates a strong ability to explain
discourse knowledge, even though it may select incorrect translation
candidates in contrastive testing; and 3) ChatGPT and GPT-4 demonstrate
superior performance and show potential to become a new and promising paradigm
for document-level translation. This work highlights the challenges and
opportunities of discourse modeling for LLMs, which we hope can inspire the
future design and evaluation of LLMs.
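One plausible way to realize a discourse-aware prompt of the kind studied above is to carry earlier source and target sentences along as context, so the model can keep pronouns and terminology consistent across sentences. The sketch below is a hypothetical prompt builder under that assumption; the wording, function name, and language pair are illustrative, not the paper's exact prompts:

```python
def build_doc_prompt(src_sentences, translated_so_far, idx,
                     src_lang="Chinese", tgt_lang="English"):
    """Assemble a document-level MT prompt that includes the preceding
    source sentences and their translations as discourse context."""
    context_src = " ".join(src_sentences[:idx])
    context_tgt = " ".join(translated_so_far)
    return (
        f"Translate the next {src_lang} sentence into {tgt_lang}, "
        f"keeping pronouns and terminology consistent with the context.\n"
        f"Context ({src_lang}): {context_src}\n"
        f"Context ({tgt_lang}): {context_tgt}\n"
        f"Sentence: {src_sentences[idx]}\n"
        f"Translation:")

# Translating the second sentence of a two-sentence document.
p = build_doc_prompt(["句子一。", "句子二。"], ["Sentence one."], 1)
print("Sentence one." in p and "句子二。" in p)  # → True
```

Varying what goes into the two context fields (nothing, source only, source plus target) is one concrete way such a study can isolate the effect of discourse-aware prompting.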
Semantic-aware dynamic retrospective-prospective reasoning for event-level video question answering
Event-Level Video Question Answering (EVQA) requires complex reasoning across video events to obtain the visual information needed to provide optimal answers. However, despite significant progress in model performance, few studies have focused on using the explicit semantic connections between the question and visual information, especially at the event level. There is a need to use such semantic connections to facilitate complex reasoning across video frames. Therefore, we propose a semantic-aware dynamic retrospective-prospective reasoning approach for video-based question answering. Specifically, we explicitly use the Semantic Role Labeling (SRL) structure of the question in the dynamic reasoning process, where we decide to move to the next frame based on which part of the SRL structure (agent, verb, patient, etc.) of the question is currently in focus. We conduct experiments on a benchmark EVQA dataset, TrafficQA. Results show that our proposed approach achieves superior performance compared to previous state-of-the-art models. Our code is publicly available at https://github.com/lyuchenyang/Semantic-aware-VideoQA
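The frame-stepping idea above can be illustrated as a toy loop: the question's SRL roles are grounded one at a time, and the reader advances through frames only while the currently focused role is not found. The per-frame annotations here are hypothetical entity sets, not the model's actual visual features:

```python
def dynamic_reasoning(frames, srl_roles):
    """Walk video frames, focusing one SRL role of the question at a time;
    advance to the next frame only when the focused role cannot be
    grounded in the current frame's (hypothetical) detected entities."""
    focus, f, grounded = 0, 0, []
    while focus < len(srl_roles) and f < len(frames):
        role, filler = srl_roles[focus]
        if filler in frames[f]:   # role grounded in this frame
            grounded.append((role, f))
            focus += 1            # shift focus to the next SRL role
        else:
            f += 1                # step forward through the video
    return grounded

# Toy traffic clip: entities detected per frame, and the question's SRL roles.
frames = [{"car"}, {"car", "truck"}, {"truck", "crash"}]
roles = [("agent", "car"), ("verb", "crash"), ("patient", "truck")]
print(dynamic_reasoning(frames, roles))  # → [('agent', 0), ('verb', 2), ('patient', 2)]
```

The point of the sketch is only the control flow: which frame to attend to next is driven by the question's semantic structure rather than by a fixed scan over all frames.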
Is a video worth n × n Images? A highly efficient approach to transformer-based video question answering
Conventional Transformer-based Video Question Answering (VideoQA) approaches
generally encode frames independently through one or more image encoders,
followed by interaction between frames and the question. However, such a
scheme incurs significant memory use and inevitably slows down training and
inference. In this work, we present a highly efficient approach for VideoQA
based on existing vision-language pre-trained models, in which we concatenate
video frames into an n × n matrix and then convert it to one image. By doing
so, we reduce the use of the image encoder from n² passes to 1 while
maintaining the temporal structure of the original video. Experimental results
on MSRVTT and TrafficQA show that our proposed approach achieves
state-of-the-art performance with nearly 4× faster speed and only 30% of the
memory use. We show that by integrating our approach into VideoQA systems we
can achieve comparable, even superior, performance with a significant speed-up
for training and inference. We believe the proposed approach can facilitate
VideoQA-related research by reducing the computational requirements for those
who have limited access to budgets and resources. Our code is publicly
available at https://github.com/lyuchenyang/Efficient-VideoQA for research use.
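The frame-concatenation step above can be sketched with a toy grayscale version: n² equally sized frames are tiled into one n × n grid, so a single encoder pass sees all of them while temporal order is preserved left-to-right, top-to-bottom. This sketch works on nested lists of pixel values rather than real image tensors:

```python
def tile_frames(frames, n):
    """Concatenate n*n equally sized grayscale frames (2-D lists) into a
    single n x n grid image, preserving temporal order row by row."""
    assert len(frames) == n * n
    h = len(frames[0])  # rows per frame
    tiled = []
    for grid_row in range(n):
        for pixel_row in range(h):
            row = []
            for grid_col in range(n):
                # Append this frame's pixel row alongside its neighbours.
                row.extend(frames[grid_row * n + grid_col][pixel_row])
            tiled.append(row)
    return tiled

# Four 2x2 constant frames (values 0..3) tiled into one 4x4 image.
frames = [[[k] * 2] * 2 for k in range(4)]
image = tile_frames(frames, 2)
print(image)  # → [[0, 0, 1, 1], [0, 0, 1, 1], [2, 2, 3, 3], [2, 2, 3, 3]]
```

In a real pipeline the same tiling would be done on image tensors (or by pasting frames into one canvas) before the single image-encoder call that replaces n² separate calls.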
Azimuthal asymmetries in lepton-pair production at a fixed-target experiment using the LHC beams (AFTER)
A multi-purpose fixed-target experiment using the proton and lead-ion beams
of the LHC was recently proposed by Brodsky, Fleuret, Hadjidakis and Lansberg,
and here we concentrate our study on some issues related to the spin physics
part of this project (referred to as AFTER). We study the nucleon spin
structure through dilepton production processes in a fixed-target experiment
using the LHC proton beams, in the kinematical region reached with the 7 TeV
proton beams at the corresponding nucleon-nucleon center-of-mass energy. We
calculate and estimate the azimuthal asymmetries of unpolarized dilepton
production processes in the Drell--Yan continuum region and at the Z-pole. We
also calculate the azimuthal asymmetries of dilepton production processes with
the target proton or deuteron longitudinally or transversely polarized, in the
Drell--Yan continuum region and around the resonance region. We conclude that
it is feasible to measure these azimuthal asymmetries, and consequently the
three-dimensional or transverse-momentum-dependent parton distribution
functions (3dPDFs or TMDs), at this new AFTER facility.
Comment: 15 pages, 40 figures. Version accepted for publication in EPJ
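The nucleon-nucleon center-of-mass energy available to a fixed-target experiment with the 7 TeV LHC proton beams follows from standard kinematics, for a beam of energy E_beam striking a nucleon of mass m_N at rest:

```latex
\sqrt{s_{NN}} \approx \sqrt{2\, E_{\mathrm{beam}}\, m_N}
  = \sqrt{2 \times 7000~\mathrm{GeV} \times 0.938~\mathrm{GeV}}
  \approx 115~\mathrm{GeV}
```

This is the energy scale at which the asymmetry measurements discussed above would be performed.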
B_c meson rare decays in the light-cone quark model
We investigate two rare decays of the B_c meson in the framework of the
light-cone quark model (LCQM). The transition form factors are calculated in
the space-like region and then analytically continued to the time-like region
via an exponential parametrization. The branching ratios and longitudinal
lepton polarization asymmetries (LPAs) for the two decays are given and
compared with each other. The results are helpful for investigating the
structure of the B_c meson and for testing the unitarity of the CKM quark
mixing matrix. All these results can be tested in future experiments at the
LHC.
Comment: 9 pages, 11 figures, version accepted for publication in EPJ