Empowering Active Learning to Jointly Optimize System and User Demands
Existing approaches to active learning maximize the system performance by
sampling unlabeled instances for annotation that yield the most efficient
training. However, when active learning is integrated with an end-user
application, this can lead to frustration for participating users, as they
spend time labeling instances that they would not otherwise be interested in
reading. In this paper, we propose a new active learning approach that jointly
optimizes the seemingly counteracting objectives of the active learning system
(training efficiently) and the user (receiving useful instances). We study our
approach in an educational application, which particularly benefits from this
technique as the system needs to rapidly learn to predict the appropriateness
of an exercise to a particular user, while the users should receive only
exercises that match their skills. We evaluate multiple learning strategies and
user types with data from real users and find that our joint approach better
satisfies both objectives when alternative methods lead to many unsuitable
exercises for end users.

Comment: To appear as a long paper in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Download our code and simulated user models at https://github.com/UKPLab/acl2020-empowering-active-learnin
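The joint objective described above can be illustrated with a minimal sketch: rank unlabeled instances by a weighted combination of a system score (informativeness for training) and a user score (suitability for the learner). The function name, the scores, and the simple convex combination below are illustrative assumptions, not the paper's actual scoring.

```python
import numpy as np

def joint_acquisition(model_uncertainty, user_suitability, trade_off=0.5):
    """Hypothetical joint acquisition score: rank unlabeled instances by a
    convex combination of the system objective (informativeness for training)
    and the user objective (suitability for the current learner).

    model_uncertainty : per-instance uncertainty scores in [0, 1]
    user_suitability  : per-instance suitability scores in [0, 1]
    trade_off         : weight on the system objective (1.0 = pure active learning)
    """
    scores = trade_off * model_uncertainty + (1.0 - trade_off) * user_suitability
    return np.argsort(-scores)  # instance indices, best first

# Toy usage with three candidate exercises.
uncertainty = np.array([0.9, 0.2, 0.6])   # how much the model would learn
suitability = np.array([0.1, 0.9, 0.7])   # how well the exercise matches the user's skill
print(joint_acquisition(uncertainty, suitability, trade_off=0.5))  # -> [2 1 0]
```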
Constrained Generation and Adaptive Selection of C-Tests
Increasing globalization and immigration are driving the importance of multilingual proficiency. Being able to communicate across different languages is already one of the key competencies that can define success, and various institutions such as the European Council or the United Nations High Commissioner for Refugees predict that this trend will intensify even further with climate change and rising refugee numbers. Despite this growing need, proficient human translators remain in short supply, while existing automated solutions fall far short of the requirements. For instance, current translation tools have been shown to perform substantially worse in low-resource languages or in specialized domains such as legal or medical texts, causing real-world harm when used uncritically. Large language models (LLMs) still exhibit biases and hallucinations, rendering them unreliable. At the same time, the continuing shortage of teachers widens the gap in language learning opportunities. While self-directed learning and intelligent tutoring systems (ITS) have the potential to alleviate some of these issues, research in this area suffers from limited available data, a consequence of proprietary software and data protection regulations. This calls for methods that can learn efficiently from little user feedback.
The goal of this thesis is to provide new language learning opportunities by devising methods that reduce the workload of teachers and empower learners to learn in a self-directed manner. For evaluation we use C-Tests, a type of gap-filling exercise that is similar to cloze tests, but less ambiguous. In the first part of this thesis, we develop novel methods for generating C-Tests. In contrast to previous works, our methods, which are based on heuristics and constrained optimization, can generate C-Tests with a specific target difficulty. Moreover, our method based on mixed-integer programming allows teachers to pose specific constraints that are guaranteed to be satisfied, resulting in C-Tests that better suit their needs.
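As a rough illustration of the constrained-optimization idea, the sketch below selects gap positions with a mixed-integer program using the PuLP library. The per-gap difficulty estimates, the adjacency constraint, and the deviation-based objective are placeholder assumptions, not the formulation used in the thesis.

```python
# A minimal sketch (not the thesis's exact formulation) of selecting C-Test gaps
# with a mixed-integer program, using the CBC solver bundled with PuLP.
import pulp

words = ["globalization", "drives", "the", "need", "for", "language", "skills"]
difficulty = [0.8, 0.5, 0.1, 0.3, 0.1, 0.6, 0.4]  # assumed per-gap difficulty estimates
target, n_gaps = 1.5, 3                           # teacher-specified requirements

prob = pulp.LpProblem("ctest_gap_selection", pulp.LpMinimize)
x = [pulp.LpVariable(f"gap_{i}", cat="Binary") for i in range(len(words))]
dev = pulp.LpVariable("deviation", lowBound=0)

# Objective: stay as close as possible to the target difficulty.
total = pulp.lpSum(d * xi for d, xi in zip(difficulty, x))
prob += dev
prob += total - target <= dev
prob += target - total <= dev

# Hard constraints that are guaranteed to hold in any feasible solution.
prob += pulp.lpSum(x) == n_gaps          # exactly n_gaps gaps
for i in range(len(words) - 1):
    prob += x[i] + x[i + 1] <= 1         # no two adjacent gaps

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([w for w, xi in zip(words, x) if pulp.value(xi) > 0.5])
```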
In the second part of this thesis, we devise a new sampling method to interactively train a C-Test selection model. We draw inspiration from active learning, which aims to improve model training by annotating only those instances that presumably help the model most (model objective). At first glance, active learning seems ill-suited to educational scenarios, as it tends to select instances that are more difficult to annotate, or, in our case, C-Tests that do not suit a learner's current proficiency. Conversely, selecting only instances that suit the learner's current proficiency, ideally with high certainty (user objective), results in feedback that is uninformative for the model. We show that it is indeed possible to sample instances that optimize both objectives, and that the resulting C-Tests benefit model and learner more than sampling instances for each objective individually.
Finally, we explore interactive data annotation as a scenario that could benefit from our joint sampling strategy. We first develop an application that showcases interactive data annotation in a setting where domain experts annotate data to ease their own work. We then show that annotation studies in general comprise a learning process, and devise annotation curricula, a method for reordering the instances to be annotated that significantly reduces annotation time.
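A toy sketch of the curriculum idea is shown below, assuming a simple difficulty proxy (sentence length); the actual ordering heuristics used in the thesis may differ.

```python
# Order the instances to be annotated from (heuristically) easy to hard.
# The difficulty proxy here is a placeholder, not the thesis's heuristic.
def annotation_curriculum(instances, difficulty=lambda text: len(text.split())):
    return sorted(instances, key=difficulty)

docs = [
    "A long and winding sentence that takes noticeably more effort to judge.",
    "Short and clear.",
    "A medium-length example sentence.",
]
for doc in annotation_curriculum(docs):
    print(doc)
```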
Rediscovering Hashed Random Projections for Efficient Quantization of Contextualized Sentence Embeddings
Training and inference on edge devices often require an efficient setup due to computational limitations. While pre-computing data representations and caching them on a server can mitigate extensive edge-device computation, this leads to two challenges. First, the storage required on the server scales linearly with the number of instances. Second, sending large amounts of data to an edge device requires substantial bandwidth. To reduce the memory footprint of pre-computed data representations, we propose a simple yet effective approach that uses randomly initialized hyperplane projections. To further reduce their size by up to 98.96%, we quantize the resulting floating-point representations into binary vectors. Despite the greatly reduced size, we show that the embeddings remain effective for training models across various English and German sentence classification tasks, retaining 94-99% of the performance of their floating-point counterparts.
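A minimal NumPy sketch of the described pipeline: project pre-computed embeddings onto randomly initialized hyperplanes, keep only the sign of each projection, and pack the bits for storage. The dimensions below are illustrative, not the configuration evaluated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(4, 768)).astype(np.float32)      # pre-computed float32 embeddings
hyperplanes = rng.normal(size=(768, 1024)).astype(np.float32)  # random hyperplanes, never trained

binary = (embeddings @ hyperplanes > 0)   # one bit per output dimension (sign of the projection)
packed = np.packbits(binary, axis=1)      # 8 bits per byte for storage and transfer

float_bytes = embeddings.nbytes           # 768 dims * 4 bytes per sentence
packed_bytes = packed.nbytes              # 1024 bits / 8 = 128 bytes per sentence
print(1 - packed_bytes / float_bytes)     # ~0.958, i.e. roughly 96% smaller in this toy setup
```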
Transformers with Learnable Activation Functions
Activation functions can significantly reduce the topological complexity of the input data and thereby improve model performance. Selecting a suitable activation function is thus an essential step in
neural model design. However, the choice of activation function is seldom
discussed or explored in Transformer-based language models. Their activation
functions are chosen beforehand and then remain fixed from pre-training to
fine-tuning. As a result, the inductive biases they impose on models cannot be adjusted during this long life cycle. Moreover, subsequently developed models (e.g., RoBERTa, BART, and GPT-3) often follow prior work (e.g., BERT) in using the same activation function without justification. In this paper, we investigate the effectiveness of using the Rational Activation Function (RAF), a learnable activation function, in the Transformer architecture. In contrast to
conventional, predefined activation functions, RAFs can adaptively learn
optimal activation functions during training according to input data. Our
experiments show that the RAF-based Transformer (RAFT) achieves a lower validation perplexity than a vanilla BERT with the GELU function. We further evaluate RAFT
on downstream tasks in low- and full-data settings. Our results show that RAFT
outperforms the counterpart model across the majority of tasks and settings.
For instance, RAFT outperforms vanilla BERT on the GLUE benchmark by 5.71 points on average in the low-data scenario (where 100 training examples are available) and by 2.05 points on SQuAD in the full-data setting. Analysis of the
shapes of learned RAFs further unveils that they substantially vary between
different layers of the pre-trained model and mostly look very different from
conventional activation functions. RAFT opens a new research direction for
analyzing and interpreting pre-trained models according to the learned
activation functions.

Comment: Accepted to Findings of EACL 2023
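The following PyTorch sketch shows one way a learnable rational activation of the form f(x) = P(x) / Q(x) could look. The polynomial degrees, initialization, and the safe-denominator variant are assumptions and may differ from the RAF configuration used in the paper.

```python
import torch
import torch.nn as nn

class RationalActivation(nn.Module):
    """Sketch of a learnable rational activation f(x) = P(x) / Q(x), where the
    polynomial coefficients are trained jointly with the rest of the network."""

    def __init__(self, num_degree=5, den_degree=4):
        super().__init__()
        self.numerator = nn.Parameter(torch.randn(num_degree + 1) * 0.1)
        self.denominator = nn.Parameter(torch.randn(den_degree) * 0.1)

    def forward(self, x):
        # P(x) = sum_i a_i x^i
        powers_p = torch.stack([x ** i for i in range(len(self.numerator))], dim=-1)
        p = (powers_p * self.numerator).sum(dim=-1)
        # Q(x) = 1 + |sum_j b_j x^j|; the absolute value keeps Q positive and avoids poles.
        powers_q = torch.stack([x ** (i + 1) for i in range(len(self.denominator))], dim=-1)
        q = 1.0 + (powers_q * self.denominator).sum(dim=-1).abs()
        return p / q

act = RationalActivation()
print(act(torch.linspace(-2, 2, 5)).shape)  # torch.Size([5])
```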
Tumor necrosis factor-like weak inducer of apoptosis (TWEAK) promotes glioma cell invasion through induction of NF-κB-inducing kinase (NIK) and noncanonical NF-κB signaling
BACKGROUND: High-grade gliomas are one of the most invasive and therapy-resistant cancers. We have recently shown that noncanonical NF-κB/RelB signaling is a potent driver of tumorigenesis and invasion in the aggressive, mesenchymal subtype of glioma. However, the relevant signals that induce activation of noncanonical NF-κB signaling in glioma and its function relative to the canonical NF-κB pathway remain elusive. METHODS: The ability of tumor necrosis factor (TNF)-like weak inducer of apoptosis (TWEAK) to regulate NF-κB signaling and promote tumor progression was investigated in both established and primary high-grade glioma tumor lines using a three-dimensional (3-D) collagen invasion assay. The roles of specific NF-κB proteins in regulating glioma cell invasion and expression of Matrix Metalloproteinase 9 (MMP9) in response to TWEAK were evaluated using shRNA-mediated loss-of-function studies. The ability of NF-κB-inducing kinase (NIK) to promote glioma growth in vivo was investigated using an orthotopic xenograft mouse model. RESULTS: In glioma cells that display elevated noncanonical NF-κB signaling, loss of RelB attenuates invasion without affecting RelA expression or phosphorylation and RelB is sufficient to promote invasion in the absence of RelA. The cytokine TWEAK preferentially activates the noncanonical NF-κB pathway through induction of p100 processing to p52 and nuclear accumulation of both RelB and p52 without activating the canonical NF-κB pathway. Moreover, TWEAK, but not TNFα, significantly increases NIK mRNA levels. TWEAK also promotes noncanonical NFκB-dependent MMP9 expression and glioma cell invasion. Finally, expression of NIK is sufficient to increase gliomagenesis in vivo. CONCLUSIONS: Our data establish a key role for NIK and noncanonical NF-κB in mediating TWEAK-induced, MMP-dependent glioma cell invasion. The findings also demonstrate that TWEAK induces noncanonical NF-κB signaling and signal-specific regulation of NIK mRNA expression. Together, these studies reveal the important role of noncanonical NF-κB signaling in regulating glioma invasiveness and highlight the therapeutic potential of targeting activation of NIK in this deadly disease. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12943-014-0273-1) contains supplementary material, which is available to authorized users