Empowering Active Learning to Jointly Optimize System and User Demands
Existing approaches to active learning maximize the system performance by
sampling unlabeled instances for annotation that yield the most efficient
training. However, when active learning is integrated with an end-user
application, this can lead to frustration for participating users, as they
spend time labeling instances that they would not otherwise be interested in
reading. In this paper, we propose a new active learning approach that jointly
optimizes the seemingly counteracting objectives of the active learning system
(training efficiently) and the user (receiving useful instances). We study our
approach in an educational application, which particularly benefits from this
technique as the system needs to rapidly learn to predict the appropriateness
of an exercise to a particular user, while the users should receive only
exercises that match their skills. We evaluate multiple learning strategies and
user types with data from real users and find that our joint approach better
satisfies both objectives when alternative methods lead to many unsuitable
exercises for end users.

Comment: To appear as a long paper in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Download our code and simulated user models at https://github.com/UKPLab/acl2020-empowering-active-learnin
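The joint objective described above can be illustrated with a minimal sketch: rank unlabeled instances by a weighted combination of a system score (informativeness for training) and a user score (suitability for the learner). The function name, the scores, and the simple convex combination below are illustrative assumptions, not the paper's actual scoring.

```python
import numpy as np

def joint_acquisition(model_uncertainty, user_suitability, trade_off=0.5):
    """Hypothetical joint acquisition score: rank unlabeled instances by a
    convex combination of the system objective (informativeness for training)
    and the user objective (suitability for the current learner).

    model_uncertainty : per-instance uncertainty scores in [0, 1]
    user_suitability  : per-instance suitability scores in [0, 1]
    trade_off         : weight on the system objective (1.0 = pure active learning)
    """
    scores = trade_off * model_uncertainty + (1.0 - trade_off) * user_suitability
    return np.argsort(-scores)  # instance indices, best first

# Toy usage with three candidate exercises.
uncertainty = np.array([0.9, 0.2, 0.6])   # how much the model would learn
suitability = np.array([0.1, 0.9, 0.7])   # how well the exercise matches the user's skill
print(joint_acquisition(uncertainty, suitability, trade_off=0.5))  # -> [2 1 0]
```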
Constrained Generation and Adaptive Selection of C-Tests
Increasing globalization and immigration are driving the importance of multilingual proficiency. Being able to communicate across different languages is already one of the key competencies that can define success, and various institutions such as the European Council or the United Nations High Commissioner for Refugees predict that this trend will intensify even further with climate change and rising refugee numbers. Despite this growing need, proficient human translators remain in short supply, while existing automated solutions fall far short of the requirements. For instance, current translation tools have been shown to perform substantially worse in low-resource languages or in specialized domains such as legal or medical texts, causing real-world harm when used uncritically. Large language models (LLMs) still exhibit biases and hallucinations, rendering them unreliable. At the same time, the continuing shortage of teachers widens the gap in language learning opportunities. While self-directed learning and intelligent tutoring systems (ITS) have the potential to alleviate some of these issues, research in this area suffers from limited available data, a consequence of proprietary software and data protection regulations. This calls for methods that can learn efficiently from little user feedback.
The goal of this thesis is to provide new language learning opportunities by devising methods that reduce the workload of teachers and empower learners to learn in a self-directed manner. For evaluation we use C-Tests, a type of gap-filling exercise that is similar to cloze tests, but less ambiguous. In the first part of this thesis, we develop novel methods for generating C-Tests. In contrast to previous works, our methods, which are based on heuristics and constrained optimization, can generate C-Tests with a specific target difficulty. Moreover, our method based on mixed-integer programming allows teachers to pose specific constraints that are guaranteed to be satisfied, resulting in C-Tests that better suit their needs.
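As a rough illustration of the constrained-optimization idea, the sketch below selects gap positions with a mixed-integer program using the PuLP library. The per-gap difficulty estimates, the adjacency constraint, and the deviation-based objective are placeholder assumptions, not the formulation used in the thesis.

```python
# A minimal sketch (not the thesis's exact formulation) of selecting C-Test gaps
# with a mixed-integer program, using the CBC solver bundled with PuLP.
import pulp

words = ["globalization", "drives", "the", "need", "for", "language", "skills"]
difficulty = [0.8, 0.5, 0.1, 0.3, 0.1, 0.6, 0.4]  # assumed per-gap difficulty estimates
target, n_gaps = 1.5, 3                           # teacher-specified requirements

prob = pulp.LpProblem("ctest_gap_selection", pulp.LpMinimize)
x = [pulp.LpVariable(f"gap_{i}", cat="Binary") for i in range(len(words))]
dev = pulp.LpVariable("deviation", lowBound=0)

# Objective: stay as close as possible to the target difficulty.
total = pulp.lpSum(d * xi for d, xi in zip(difficulty, x))
prob += dev
prob += total - target <= dev
prob += target - total <= dev

# Hard constraints that are guaranteed to hold in any feasible solution.
prob += pulp.lpSum(x) == n_gaps          # exactly n_gaps gaps
for i in range(len(words) - 1):
    prob += x[i] + x[i + 1] <= 1         # no two adjacent gaps

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([w for w, xi in zip(words, x) if pulp.value(xi) > 0.5])
```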
In the second part of this thesis, we devise a new sampling method to interactively train a C-Test selection model. We draw inspiration from active learning, which aims to improve model training by annotating only those instances that presumably help the model most (model objective). At first glance, active learning seems ill-suited to educational scenarios, as it tends to select instances that are more difficult to annotate, or, in our case, C-Tests that do not suit a learner's current proficiency. Conversely, selecting only instances that suit the learner's current proficiency, ideally with high certainty (user objective), results in feedback that is uninformative for the model. We show that it is indeed possible to sample instances that optimize both objectives, and that the resulting C-Tests benefit model and learner more than sampling instances for each objective individually.
Finally, we explore interactive data annotation as a scenario that could benefit from our joint sampling strategy. We first develop an application that showcases interactive data annotation in a setting where domain experts annotate data to ease their own work. We then show that annotation studies in general comprise a learning process, and devise annotation curricula, a method for reordering the instances to be annotated that significantly reduces annotation time.
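A toy sketch of the curriculum idea is shown below, assuming a simple difficulty proxy (sentence length); the actual ordering heuristics used in the thesis may differ.

```python
# Order the instances to be annotated from (heuristically) easy to hard.
# The difficulty proxy here is a placeholder, not the thesis's heuristic.
def annotation_curriculum(instances, difficulty=lambda text: len(text.split())):
    return sorted(instances, key=difficulty)

docs = [
    "A long and winding sentence that takes noticeably more effort to judge.",
    "Short and clear.",
    "A medium-length example sentence.",
]
for doc in annotation_curriculum(docs):
    print(doc)
```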
Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings
Training and inference on edge devices often require an efficient setup due to computational limitations. While pre-computing data representations and caching them on a server can mitigate extensive edge-device computation, this leads to two challenges. First, the storage required on the server scales linearly with the number of instances. Second, sending large amounts of data to an edge device requires substantial bandwidth. To reduce the memory footprint of pre-computed data representations, we propose a simple yet effective approach that uses randomly initialized hyperplane projections. To further reduce their size by up to 98.96%, we quantize the resulting floating-point representations into binary vectors. Despite the greatly reduced size, we show that the embeddings remain effective for training models across various English and German sentence classification tasks, retaining 94-99% of the performance of their floating-point counterparts.
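A minimal NumPy sketch of the described pipeline: project pre-computed embeddings onto randomly initialized hyperplanes, keep only the sign of each projection, and pack the bits for storage. The dimensions below are illustrative, not the configuration evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 768)).astype(np.float32)      # pre-computed float32 embeddings
hyperplanes = rng.normal(size=(768, 1024)).astype(np.float32)  # random hyperplanes, never trained

binary = (embeddings @ hyperplanes > 0)   # one bit per output dimension (sign of the projection)
packed = np.packbits(binary, axis=1)      # 8 bits per byte for storage and transfer

float_bytes = embeddings.nbytes           # 768 dims * 4 bytes per sentence
packed_bytes = packed.nbytes              # 1024 bits / 8 = 128 bytes per sentence
print(1 - packed_bytes / float_bytes)     # ~0.958, i.e. roughly 96% smaller in this toy setup
```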
Transformers with Learnable Activation Functions
Activation functions can significantly reduce the topological complexity of the input data and thereby improve model performance. Selecting a suitable activation function is thus an essential step in
neural model design. However, the choice of activation function is seldom
discussed or explored in Transformer-based language models. Their activation
functions are chosen beforehand and then remain fixed from pre-training to
fine-tuning. As a result, the inductive biases they impose on models cannot be adjusted during this long life cycle. Moreover, subsequently developed models (e.g., RoBERTa, BART, and GPT-3) often follow prior work (e.g., BERT) in using the same activation function without justification. In this paper, we investigate the effectiveness of using the Rational Activation Function (RAF), a learnable activation function, in the Transformer architecture. In contrast to
conventional, predefined activation functions, RAFs can adaptively learn
optimal activation functions during training according to input data. Our
experiments show that the RAF-based Transformer (RAFT) achieves a lower validation perplexity than a vanilla BERT with the GELU function. We further evaluate RAFT
on downstream tasks in low- and full-data settings. Our results show that RAFT
outperforms the counterpart model across the majority of tasks and settings.
For instance, RAFT outperforms vanilla BERT on the GLUE benchmark by 5.71 points on average in the low-data scenario (where 100 training examples are available) and by 2.05 points on SQuAD in the full-data setting. Analysis of the
shapes of learned RAFs further unveils that they substantially vary between
different layers of the pre-trained model and mostly look very different from
conventional activation functions. RAFT opens a new research direction for
analyzing and interpreting pre-trained models according to the learned
activation functions.

Comment: Accepted to Findings of EACL 2023
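The following PyTorch sketch shows one way a learnable rational activation of the form f(x) = P(x) / Q(x) could look. The polynomial degrees, initialization, and the safe-denominator variant are assumptions and may differ from the RAF configuration used in the paper.

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Sketch of a learnable rational activation f(x) = P(x) / Q(x), where the
    polynomial coefficients are trained jointly with the rest of the network."""

    def __init__(self, num_degree=5, den_degree=4):
        super().__init__()
        self.numerator = nn.Parameter(torch.randn(num_degree + 1) * 0.1)
        self.denominator = nn.Parameter(torch.randn(den_degree) * 0.1)

    def forward(self, x):
        # P(x) = sum_i a_i x^i
        powers_p = torch.stack([x ** i for i in range(len(self.numerator))], dim=-1)
        p = (powers_p * self.numerator).sum(dim=-1)
        # Q(x) = 1 + |sum_j b_j x^j|; the absolute value keeps Q positive and avoids poles.
        powers_q = torch.stack([x ** (i + 1) for i in range(len(self.denominator))], dim=-1)
        q = 1.0 + (powers_q * self.denominator).sum(dim=-1).abs()
        return p / q

act = RationalActivation()
print(act(torch.linspace(-2, 2, 5)).shape)  # torch.Size([5])
```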
Tumor necrosis factor-like weak inducer of apoptosis (TWEAK) promotes glioma cell invasion through induction of NF-κB-inducing kinase (NIK) and noncanonical NF-κB signaling
BACKGROUND: High-grade gliomas are one of the most invasive and therapy-resistant cancers. We have recently shown that noncanonical NF-κB/RelB signaling is a potent driver of tumorigenesis and invasion in the aggressive, mesenchymal subtype of glioma. However, the relevant signals that induce activation of noncanonical NF-κB signaling in glioma and its function relative to the canonical NF-κB pathway remain elusive. METHODS: The ability of tumor necrosis factor (TNF)-like weak inducer of apoptosis (TWEAK) to regulate NF-κB signaling and promote tumor progression was investigated in both established and primary high-grade glioma tumor lines using a three-dimensional (3-D) collagen invasion assay. The roles of specific NF-κB proteins in regulating glioma cell invasion and expression of Matrix Metalloproteinase 9 (MMP9) in response to TWEAK were evaluated using shRNA-mediated loss-of-function studies. The ability of NF-κB-inducing kinase (NIK) to promote glioma growth in vivo was investigated using an orthotopic xenograft mouse model. RESULTS: In glioma cells that display elevated noncanonical NF-κB signaling, loss of RelB attenuates invasion without affecting RelA expression or phosphorylation and RelB is sufficient to promote invasion in the absence of RelA. The cytokine TWEAK preferentially activates the noncanonical NF-κB pathway through induction of p100 processing to p52 and nuclear accumulation of both RelB and p52 without activating the canonical NF-κB pathway. Moreover, TWEAK, but not TNFα, significantly increases NIK mRNA levels. TWEAK also promotes noncanonical NFκB-dependent MMP9 expression and glioma cell invasion. Finally, expression of NIK is sufficient to increase gliomagenesis in vivo. CONCLUSIONS: Our data establish a key role for NIK and noncanonical NF-κB in mediating TWEAK-induced, MMP-dependent glioma cell invasion. The findings also demonstrate that TWEAK induces noncanonical NF-κB signaling and signal-specific regulation of NIK mRNA expression. Together, these studies reveal the important role of noncanonical NF-κB signaling in regulating glioma invasiveness and highlight the therapeutic potential of targeting activation of NIK in this deadly disease. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12943-014-0273-1) contains supplementary material, which is available to authorized users