10 research outputs found
Predicting suicide risk from online postings in Reddit: the UGent-IDLab submission to the CLPsych 2019 Shared Task A
This paper describes IDLab’s text classification systems submitted to Task A as part of the CLPsych 2019 shared task. The aim of this shared task was to develop automated systems that predict the degree of suicide risk of people based on their posts on Reddit. Bag-of-words features, emotion features, and post-level predictions are used to derive user-level predictions. Linear models and ensembles of these models are used to predict final scores. We find that predicting fine-grained risk levels is much more difficult than flagging potentially at-risk users. Furthermore, we do not find clear added value from building richer ensembles compared to simple baselines, given the available training data and the nature of the prediction task.
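The pipeline described in the abstract (bag-of-words post features, a linear post-level classifier, and aggregation into user-level predictions) can be sketched as follows. This is a minimal toy illustration, not the authors' code: the data, labels, and aggregation rule (maximum over post scores) are all assumptions for the example.

```python
# Toy sketch: bag-of-words features, a logistic-regression post-level
# classifier, and aggregation of post-level scores into a user score.
import math
from collections import Counter

def bow(text, vocab):
    """Bag-of-words count vector for one post."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def train_logreg(X, y, epochs=200, lr=0.5):
    """Plain SGD logistic regression (no regularization)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - yi  # gradient of log-loss w.r.t. z
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
            b -= lr * g
    return w, b

def predict(text, vocab, w, b):
    z = sum(wj * xj for wj, xj in zip(w, bow(text, vocab))) + b
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative toy training posts with binary at-risk labels.
posts = ["i feel hopeless and alone", "great game last night",
         "nothing matters anymore", "loved the new movie"]
labels = [1, 0, 1, 0]
vocab = sorted({w for p in posts for w in p.lower().split()})
w, b = train_logreg([bow(p, vocab) for p in posts], labels)

# User-level score: aggregate post-level probabilities (here: max).
user_posts = ["i feel so alone", "watched a game"]
user_score = max(predict(p, vocab, w, b) for p in user_posts)
print(round(user_score, 2))
```

The max-aggregation step is one plausible choice for deriving a user-level flag from post-level scores; the paper's actual feature set and aggregation are richer.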
Learning from Partially Annotated Data: Example-aware Creation of Gap-filling Exercises for Language Learning
Since performing exercises (including, e.g., practice tests) forms a crucial component of learning, and creating such exercises requires non-trivial effort from the teacher, there is great value in automatic exercise generation in digital tools for education. In this paper, we particularly focus on the automatic creation of gap-filling exercises for language learning, specifically grammar exercises. Since providing any annotation in this domain requires human expert effort, we aim to avoid it entirely and explore the task of converting existing texts into new gap-filling exercises, purely based on an example exercise, without explicit instruction or detailed annotation of the intended grammar topics. We contribute (i) a novel neural network architecture specifically designed for the aforementioned gap-filling exercise generation task, and (ii) a real-world benchmark dataset for French grammar. We show that our model for this French grammar gap-filling exercise generation task outperforms a competitive baseline classifier by 8 F1 percentage points, achieving an average F1 score of 82%. Our model implementation and the dataset are made publicly available to foster future research, thus offering a standardized evaluation and baseline solution for the proposed partially annotated data prediction task in grammar exercise creation.
Comment: 12 pages. Accepted at the 18th Workshop on Innovative Use of NLP for Building Educational Applications.
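To make the task setup concrete: the system sees one example exercise and must decide which tokens to blank in a new text, with no grammar-topic annotation. The sketch below is a deliberately naive baseline, not the paper's neural architecture: it simply blanks any word form that was gapped in the example.

```python
# Naive illustration of example-aware gap creation (NOT the paper's
# neural model): blank out, in a new text, the same word forms that
# were gapped in the example exercise. Data below is illustrative.
def make_gaps(new_text, example_gapped_tokens, blank="____"):
    gapped = {t.lower() for t in example_gapped_tokens}
    out = []
    for token in new_text.split():
        core = token.strip(".,;:!?").lower()  # ignore punctuation
        out.append(blank if core in gapped else token)
    return " ".join(out)

# Example exercise (French): "Je ____ au marché." with answer "vais".
example_gaps = ["vais", "va"]
print(make_gaps("Elle va au cinéma et je vais au parc.", example_gaps))
# → "Elle ____ au cinéma et je ____ au parc."
```

A real system must generalize beyond exact word forms (e.g., to other conjugations of the same grammar topic), which is precisely what the paper's learned model is for.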
Zero-Shot Cross-Lingual Sentiment Classification under Distribution Shift: an Exploratory Study
The brittleness of fine-tuned language model performance on out-of-distribution (OOD) test samples in unseen domains has been well studied for English, yet is unexplored for multilingual models. Therefore, we study generalization to OOD test data specifically in zero-shot cross-lingual transfer settings, analyzing the performance impact of both language and domain shifts between train and test data. We further assess the effectiveness of counterfactually augmented data (CAD) in improving OOD generalization in the cross-lingual setting, since CAD has been shown to be beneficial in a monolingual English setting. Finally, we propose two new approaches for OOD generalization that avoid the costly annotation process associated with CAD, by exploiting the power of recent large language models (LLMs). We experiment with three multilingual models (LaBSE, mBERT, and XLM-R) trained on English IMDb movie reviews, and evaluate on OOD test sets in 13 languages: Amazon product reviews, Tweets, and Restaurant reviews. Results echo the OOD performance decline observed in the monolingual English setting. Further, (i) counterfactuals from the original high-resource language do improve OOD generalization in the low-resource language, and (ii) our newly proposed cost-effective approaches reach similar or up to +3.1% better accuracy than CAD for Amazon and Restaurant reviews.
Comment: The 3rd Workshop on Multilingual Representation Learning (MRL@EMNLP2023).
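The evaluation protocol in this abstract (train on one language/domain, evaluate zero-shot on shifted test sets, report the accuracy drop) can be sketched generically. Everything below is a toy stand-in: the cue-word "model" and the tiny datasets are illustrative assumptions, not the paper's models or benchmarks.

```python
# Sketch of the OOD evaluation protocol: a model trained on one
# (language, domain) pair is scored on shifted test sets, and we
# report each set's accuracy delta versus the in-distribution set.
def accuracy(model, dataset):
    return sum(model(x) == y for x, y in dataset) / len(dataset)

def ood_report(model, in_dist, shifted_sets):
    base = accuracy(model, in_dist)
    return {name: round(accuracy(model, ds) - base, 3)
            for name, ds in shifted_sets.items()}

# Toy sentiment "model": positive iff a cue word appears.
cues = {"great", "super", "excelente"}
model = lambda text: int(any(w in cues for w in text.lower().split()))

in_dist = [("great movie", 1), ("boring film", 0)]  # English movies
shifted = {
    "es_products": [("excelente producto", 1), ("muy malo", 0)],
    "en_restaurants": [("tasty food", 1), ("awful service", 0)],
}
print(ood_report(model, in_dist, shifted))
# → {'es_products': 0.0, 'en_restaurants': -0.5}
```

Negative deltas mark the OOD performance decline the paper measures; in the real study the model is a fine-tuned multilingual encoder and the shifted sets span 13 languages and three domains.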
CAW-coref: Conjunction-Aware Word-level Coreference Resolution
State-of-the-art coreference resolution systems depend on multiple LLM calls per document and are thus prohibitively expensive for many use cases (e.g., information extraction with large corpora). The leading word-level coreference system (WL-coref) attains 96.6% of these SOTA systems' performance while being much more efficient. In this work, we identify a routine yet important failure case of WL-coref: dealing with conjoined mentions such as 'Tom and Mary'. We offer a simple yet effective solution that improves performance on the OntoNotes test set by 0.9 F1, shrinking the gap between efficient word-level coreference resolution and expensive SOTA approaches by 34.6%. Our Conjunction-Aware Word-level coreference model (CAW-coref) and code are available at https://github.com/KarelDO/wl-coref.
Comment: Accepted at CRAC 2023.
Learning to Reuse Distractors to support Multiple Choice Question Generation in Education
Multiple choice questions (MCQs) are widely used in digital learning systems, as they allow for automating the assessment process. However, due to the increased digital literacy of students and the advent of social media platforms, MCQ tests are widely shared online, and teachers are continuously challenged to create new questions, which is an expensive and time-consuming task. A particularly sensitive aspect of MCQ creation is devising relevant distractors, i.e., wrong answers that are not easily identifiable as being wrong. This paper studies how a large existing set of manually created answers and distractors for questions over a variety of domains, subjects, and languages can be leveraged to help teachers in creating new MCQs, through the smart reuse of existing distractors. We built several data-driven models based on context-aware question and distractor representations, and compared them with static feature-based models. The proposed models are evaluated with automated metrics and in a realistic user test with teachers. Both automatic and human evaluations indicate that context-aware models consistently outperform a static feature-based approach. For our best-performing context-aware model, on average 3 of the 10 distractors shown to teachers were rated as high-quality distractors. We create a performance benchmark, and make it public, to enable comparison between different approaches and to introduce a more standardized evaluation of the task. The benchmark contains a test set of 298 educational questions covering multiple subjects and languages, and a 77k multilingual pool of distractor vocabulary for future research.
Comment: 24 pages and 4 figures. Accepted for publication in IEEE Transactions on Learning Technologies.
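The core idea of distractor reuse (score candidates from an existing pool against a new question and surface the best ones) can be sketched with a trivial similarity measure. Token-level Jaccard overlap below is a crude stand-in for the paper's learned context-aware representations, and the question and pool are invented examples.

```python
# Sketch of distractor reuse: rank pool distractors by similarity to
# a new question, excluding the correct answer. Jaccard token overlap
# stands in for learned context-aware representations.
def tokens(text):
    return set(text.lower().replace("?", "").split())

def rank_distractors(question, answer, pool, k=3):
    q = tokens(question)
    candidates = (d for d in pool if d.lower() != answer.lower())
    scored = sorted(
        candidates,
        key=lambda d: len(q & tokens(d)) / len(q | tokens(d)),
        reverse=True,
    )
    return scored[:k]

pool = ["the Loire river", "the Alps", "the Seine river", "Mount Fuji"]
top = rank_distractors("Which river flows through Paris?",
                       "the Seine river", pool, k=2)
print(top)
# → ['the Loire river', 'the Alps']
```

A learned model would instead embed the question and each pool distractor and rank by embedding similarity, which is what lets it beat static feature-based baselines in the paper's evaluation.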
EduQG: a multi-format multiple-choice dataset for the educational domain
Natural language processing technology has made significant progress in recent years, fuelled by increasingly powerful general language models. This has also inspired a sizeable body of work targeted specifically towards the educational domain, where the creation of questions (both for assessment and practice) is a laborious and expensive effort. Thus, automatic Question-Generation (QG) solutions have been proposed and studied. Yet, according to a recent survey of the educational QG community's progress, a common baseline dataset unifying multiple domains and question forms (e.g., multiple choice vs. fill-the-gap), including readily available baseline models to compare against, is largely missing. This is the gap we aim to fill with this paper. In particular, we introduce a high-quality dataset in the educational domain, containing over 3,000 entries, comprising (i) multiple-choice questions, (ii) the corresponding answers (including distractors), and (iii) associated passages from the course material used as sources for the questions. Each question is phrased in two forms, normal and cloze (i.e., fill-the-gap), and correct answers are linked to source documents with sentence-level annotations. Thus, our versatile dataset can be used for both question and distractor generation, as well as to explore new challenges such as question format conversion. Furthermore, 903 questions are accompanied by their cognitive complexity level as per Bloom's taxonomy. All questions have been generated by educational experts rather than crowd workers to ensure they maintain educational and learning standards. Our analysis and experiments suggest distinguishable differences between our dataset and commonly used ones for question generation for educational purposes. We believe this new dataset can serve as a valuable resource for research and evaluation in the educational domain.
The dataset and baselines are made available to support further research in question generation for education (https://github.com/hadifar/question-generation)