Mitigating Data Scarcity for Neural Language Models
In recent years, pretrained neural language models (PNLMs) have taken the field of natural language processing by storm, setting new benchmarks and achieving state-of-the-art performance. These models often rely heavily on annotated data, which is not always available. Data scarcity is common in specialized domains, such as medicine, and in low-resource languages that remain underexplored by AI research. In this dissertation, we focus on mitigating data scarcity for neural language models using data augmentation and neural ensemble learning techniques. In both research directions, we implement neural network algorithms and evaluate how well they help neural language models on downstream NLP tasks. Specifically, for data augmentation, we explore two techniques: 1) creating positive training data by moving an answer span around its original context and 2) using text simplification techniques to introduce a variety of writing styles into the original training data. Our results indicate that these simple and effective solutions considerably improve the performance of neural language models in low-resource NLP domains and tasks. For neural ensemble learning, we use a multi-label neural classifier to select the best prediction from a variety of individual pretrained neural language models trained for a low-resource medical text simplification task.
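The abstract describes the first augmentation technique in a single clause ("moving an answer span around its original context"). The following is only one possible reading, not the dissertation's implementation: it assumes SQuAD-style extractive QA examples (a context string plus an answer string), uses a naive regex sentence splitter, moves the sentence containing the answer to every other sentence position, and recomputes the answer's character offset so each new context remains a valid positive example. The helper names `split_sentences` and `move_answer_sentence` are hypothetical.

```python
import re
from typing import List, Tuple


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after ".", "!" or "?" followed by whitespace.
    return re.split(r"(?<=[.!?])\s+", text.strip())


def move_answer_sentence(context: str, answer: str) -> List[Tuple[str, int]]:
    """Hypothetical helper: return (new_context, answer_start) pairs, one for
    every position the answer-bearing sentence can move to, excluding its
    original position."""
    sentences = split_sentences(context)
    # Locate the first sentence that contains the answer text.
    idx = next((i for i, s in enumerate(sentences) if answer in s), None)
    if idx is None:
        raise ValueError("answer not found in any sentence of the context")
    answer_sentence = sentences[idx]
    others = sentences[:idx] + sentences[idx + 1:]

    variants = []
    for pos in range(len(others) + 1):
        if pos == idx:
            continue  # this position would reproduce the original context
        reordered = others[:pos] + [answer_sentence] + others[pos:]
        new_context = " ".join(reordered)
        # Recompute the character offset so the label still points at the answer.
        prefix = " ".join(others[:pos])
        sentence_start = len(prefix) + (1 if prefix else 0)
        answer_start = sentence_start + answer_sentence.index(answer)
        variants.append((new_context, answer_start))
    return variants
```

For example, `move_answer_sentence("Paris is in France. The sky is blue. Cats sleep a lot.", "France")` yields two new contexts in which "Paris is in France." appears second and third. Computing the offset from the sentence position, rather than searching the new context for the answer string, keeps the label correct even when the answer text also occurs in another sentence. A one-sentence context yields no variants.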
One-sided differentiability: a challenge for computer algebra systems
Computer Algebra Systems (CASs) are extremely powerful and widely used digital tools. For differentiation, CASs include a command that computes the derivative of a function of one variable (and the partial derivatives of functions of several variables). In this article, we focus on real-valued functions of one real variable. Since CASs usually compute the derivative of a real-valued function as a whole, the value of the computed derivative at points where the left and right derivatives differ (which we call conflicting points) should be something like "undefined". This is not always the case, however: the output can differ strongly depending on the chosen CAS. In this article, we analyse and compare how several well-known CASs behave when differentiating five functions, chosen by the authors, at their conflicting points. Finally, because CASs can compute one-sided limits, they can obtain the result directly in these cumbersome cases from the formal definition of the one-sided derivative; we also analyse and compare this approach for the selected CASs. This issue matters for teaching, since the topic is part of the Secondary Education curriculum and CASs are now very commonly used as auxiliary digital tools for teaching mathematics.
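The formal definition of the one-sided derivative referred to above can be written out as follows. The function $f(x) = |x|$ at $a = 0$ is our own illustrative choice; the abstract does not name the five functions the authors tested.

```latex
f'_-(a) = \lim_{h \to 0^-} \frac{f(a+h) - f(a)}{h},
\qquad
f'_+(a) = \lim_{h \to 0^+} \frac{f(a+h) - f(a)}{h}.

% Illustrative example (our choice): f(x) = |x| at a = 0
\frac{|0 + h| - |0|}{h} = \frac{|h|}{h} =
\begin{cases}
  -1, & h < 0,\\
  \phantom{-}1, & h > 0,
\end{cases}
\quad\Longrightarrow\quad
f'_-(0) = -1 \neq 1 = f'_+(0).
```

So $0$ is a conflicting point of $|x|$, and $f'(0)$ does not exist. If a CAS simplified the derivative of $|x|$ to a sign function and used the common convention $\operatorname{sign}(0) = 0$, evaluating that expression at $0$ would return $0$ rather than "undefined", which is exactly the kind of discrepancy the article examines.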
Automatic and Human-AI Interactive Text Generation
In this tutorial, we focus on text-to-text generation, a class of natural language generation (NLG) tasks that takes a piece of text as input and generates a revision that is improved according to some specific criteria (e.g., readability or linguistic style), while largely retaining the original meaning and length of the text. This includes many useful applications, such as text simplification, paraphrase generation, and style transfer. In contrast to text summarization and open-ended text completion (e.g., story generation), the text-to-text generation tasks we discuss in this tutorial are more constrained in terms of semantic consistency and targeted language styles. This level of control makes these tasks ideal testbeds for studying the ability of models to generate text that is both semantically adequate and stylistically appropriate. Moreover, these tasks are interesting from a technical standpoint, as they require complex combinations of lexical and syntactic transformations, stylistic control, and adherence to factual knowledge, all at once. With a special focus on text simplification and revision, this tutorial aims to provide an overview of state-of-the-art natural language generation research from four major aspects (Data, Models, Human-AI Collaboration, and Evaluation) and to discuss and showcase a few significant and recent advances: (1) the use of non-autoregressive approaches; (2) the shift from fine-tuning to prompting with large language models; (3) the development of new learnable metrics and fine-grained human evaluation frameworks; (4) a growing body of studies and datasets on non-English languages; and (5) the rise of interdisciplinary HCI+NLP+Accessibility research to create real-world writing assistant systems.
Comment: To appear at ACL 2024, Tutorial
CLARIN
The book provides a comprehensive overview of the Common Language Resources and Technology Infrastructure (CLARIN) for the humanities. It covers a broad range of CLARIN language resources and services, its underlying technological infrastructure, the achievements of national consortia, and the challenges that CLARIN will tackle in the future. The book was published 10 years after CLARIN was established as a European Research Infrastructure Consortium.
Proceedings of the 1st Symposium on Advances in Educational Technology, Volume 1: AET
Proceedings of the 1st Symposium on Advances in Educational Technology.
CLARIN: The Infrastructure for Language Resources
CLARIN, the "Common Language Resources and Technology Infrastructure", has established itself as a major player in the field of research infrastructures for the humanities. This volume provides a comprehensive overview of the organization, its members, its goals and its functioning, as well as of the tools and resources hosted by the infrastructure. The many contributors representing various fields, from computer science to law to psychology, analyse a wide range of topics, such as the technology behind the CLARIN infrastructure, the use of CLARIN resources in diverse research projects, the achievements of selected national CLARIN consortia, and the challenges that CLARIN has faced and will face in the future.
The book will be published in 2022, 10 years after the establishment of CLARIN as a European Research Infrastructure Consortium by the European Commission (Decision 2012/136/EU)
Proceedings of the Eighth Italian Conference on Computational Linguistics (CLiC-it 2021)
The eighth edition of the Italian Conference on Computational Linguistics (CLiC-it 2021) was held at Università degli Studi di Milano-Bicocca from 26th to 28th January 2022. After the 2020 edition, which was held fully virtually due to the Covid-19 health emergency, CLiC-it 2021 was the first opportunity for the Italian computational linguistics research community to meet in person after more than a year of full or partial lockdown.