338 research outputs found
Predicting Subjective Features from Questions on QA Websites using BERT
Community Question-Answering websites, such as StackOverflow and Quora,
expect users to follow specific guidelines in order to maintain content
quality. These systems mainly rely on community reports for assessing contents,
which has serious problems such as the slow handling of violations, the loss of
normal and experienced users' time, the low quality of some reports, and
discouraging feedback to new users. Therefore, with the overall goal of
providing solutions for automating moderation actions in Q&A websites, we aim
to provide a model to predict 20 quality or subjective aspects of questions in
QA websites. To this end, we used data gathered by the CrowdSource team at
Google Research in 2019 and a fine-tuned pre-trained BERT model on our problem.
Based on the evaluation by Mean-Squared-Error (MSE), the model achieved a value
of 0.046 after 2 epochs of training, which did not improve substantially in the
next ones. Results confirm that by simple fine-tuning, we can achieve accurate
models in little time and on less amount of data.Comment: 5 pages, 4 figures, 2 table
Development of a Corpus for Userbased Scientific Question Answering
Tese de mestrado, Bioinformática e Biologia Computacional, Universidade de Lisboa, Faculdade de Ciências, 2021In recent years Question & Answering (QA) tasks became particularly relevant in the research field of natural language understanding. However, the lack of good quality datasets has been an important limiting factor in the quest for better models. Particularly in the biomedical domain, the scarcity of gold standard labelled datasets has been a recognized obstacle given its idiosyncrasies and complexities often require the participation of skilled domain¬specific experts in producing such datasets. To address this issue, a method for automatically gather Question¬Answer pairs from online QA biomedical forums has been suggested yielding a corpus named BiQA. The authors describe several strategies to validate this new dataset but a human manual verification has not been conducted. With this in mind, this dissertation was set out with the objectives of performing a manual verification of a sample of 1200 questions of BiQA and also to expanding these questions, by adding features, into a new corpus of text ¬ BiQA2 ¬ with the goal of contributing with a new corpusfor biomedical QA research. Regarding the manual verification of BiQA, a methodology for its characterization was laid out and allowed the identification of an array of potential problems related to the nature of its questions and answers aptness for which possible improvement solutions were presented. Concomitantly, the proposed new BiQA2 corpus ¬ created upon the validated questions and answers from the perused samples from BiQA ¬ builds new features similar to those observed in other biomedical corpus such as the BioASQ dataset. Both BiQA and BiQA2 were applied to deep learning strategies previously submitted to the BioASQ competition to assess their performance as a source of training data. Although the results achieved with the models created using BiQA2 exhibit limited capability pertaining to the BioASQ challenge, they also show some potential to contribute positively to model training in tasks such as Document re-ranking and answering to ‘yes/no’ questions
DiPlomat: A Dialogue Dataset for Situated Pragmatic Reasoning
Pragmatic reasoning plays a pivotal role in deciphering implicit meanings
that frequently arise in real-life conversations and is essential for the
development of communicative social agents. In this paper, we introduce a novel
challenge, DiPlomat, aiming at benchmarking machines' capabilities on pragmatic
reasoning and situated conversational understanding. Compared with previous
works that treat different figurative expressions (e.g. metaphor, sarcasm) as
individual tasks, DiPlomat provides a cohesive framework towards general
pragmatic understanding. Our dataset is created through the utilization of
Amazon Mechanical Turk ( AMT ), resulting in a total of 4, 177 multi-turn
dialogues. In conjunction with the dataset, we propose two tasks, Pragmatic
Identification and Reasoning (PIR) and Conversational Question Answering (CQA).
Experimental results with state-of-the-art (SOTA) neural architectures reveal
several significant findings: 1) large language models ( LLMs) exhibit poor
performance in tackling this subjective domain; 2) comprehensive comprehension
of context emerges as a critical factor for establishing benign human-machine
interactions; 3) current models defect in the application of pragmatic
reasoning. As a result, we call on more attention to improve the ability of
context understanding, reasoning, and implied meaning modeling
Bringing order into the realm of Transformer-based language models for artificial intelligence and law
Transformer-based language models (TLMs) have widely been recognized to be a
cutting-edge technology for the successful development of deep-learning-based
solutions to problems and applications that require natural language processing
and understanding. Like for other textual domains, TLMs have indeed pushed the
state-of-the-art of AI approaches for many tasks of interest in the legal
domain. Despite the first Transformer model being proposed about six years ago,
there has been a rapid progress of this technology at an unprecedented rate,
whereby BERT and related models represent a major reference, also in the legal
domain. This article provides the first systematic overview of TLM-based
methods for AI-driven problems and tasks in the legal sphere. A major goal is
to highlight research advances in this field so as to understand, on the one
hand, how the Transformers have contributed to the success of AI in supporting
legal processes, and on the other hand, what are the current limitations and
opportunities for further research development.Comment: Please refer to the published version: Greco, C.M., Tagarelli, A.
(2023) Bringing order into the realm of Transformer-based language models for
artificial intelligence and law. Artif Intell Law, Springer Nature. November
2023. https://doi.org/10.1007/s10506-023-09374-
ReadMe++: Benchmarking Multilingual Language Models for Multi-Domain Readability Assessment
We present a systematic study and comprehensive evaluation of large language
models for automatic multilingual readability assessment. In particular, we
construct ReadMe++, a multilingual multi-domain dataset with human annotations
of 9757 sentences in Arabic, English, French, Hindi, and Russian collected from
112 different data sources. ReadMe++ offers more domain and language diversity
than existing readability datasets, making it ideal for benchmarking
multilingual and non-English language models (including mBERT, XLM-R, mT5,
Llama-2, GPT-4, etc.) in the supervised, unsupervised, and few-shot prompting
settings. Our experiments reveal that models fine-tuned on ReadMe++ outperform
those trained on single-domain datasets, showcasing superior performance on
multi-domain readability assessment and cross-lingual transfer capabilities. We
also compare to traditional readability metrics (such as Flesch-Kincaid Grade
Level and Open Source Metric for Measuring Arabic Narratives), as well as the
state-of-the-art unsupervised metric RSRS (Martinc et al., 2021). We will make
our data and code publicly available at: https://github.com/tareknaous/readme.Comment: We have added French and Russian as two new languages to the corpu
3rd International Conference on Advanced Research Methods and Analytics (CARMA 2020)
Research methods in economics and social sciences are evolving with the increasing availability of Internet and Big Data sources of information.As these sources, methods, and applications become more interdisciplinary, the 3rd International Conference on Advanced Research Methods and Analytics (CARMA) is an excellent forum for researchers and practitioners to exchange ideas and advances on how emerging research methods and sources are applied to different fields of social sciences as well as to discuss current and future challenges.Doménech I De Soria, J.; Vicente Cuervo, MR. (2020). 3rd International Conference on Advanced Research Methods and Analytics (CARMA 2020). Editorial Universitat Politècnica de València. http://hdl.handle.net/10251/149510EDITORIA
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Pretraining is the preliminary and fundamental step in developing capable
language models (LM). Despite this, pretraining data design is critically
under-documented and often guided by empirically unsupported intuitions. To
address this, we pretrain 28 1.5B parameter decoder-only models, training on
data curated (1) at different times, (2) with varying toxicity and quality
filters, and (3) with different domain compositions. First, we quantify the
effect of pretraining data age. A temporal shift between evaluation data and
pretraining data leads to performance degradation, which is not overcome by
finetuning. Second, we explore the effect of quality and toxicity filters,
showing a trade-off between performance on standard benchmarks and risk of
toxic generations. Our findings indicate there does not exist a
one-size-fits-all solution to filtering training data. We also find that the
effects of different types of filtering are not predictable from text domain
characteristics. Lastly, we empirically validate that the inclusion of
heterogeneous data sources, like books and web, is broadly beneficial and
warrants greater prioritization. These findings constitute the largest set of
experiments to validate, quantify, and expose many undocumented intuitions
about text pretraining, which we hope will help support more informed
data-centric decisions in LM development
- …