Few-Shot and Zero-Shot Learning for Historical Text Normalization
Historical text normalization often relies on small training datasets. Recent
work has shown that multi-task learning can lead to significant improvements by
exploiting synergies with related datasets, but there has been no systematic
study of different multi-task learning architectures. This paper evaluates
63 multi-task learning configurations for sequence-to-sequence-based historical
text normalization across ten datasets from eight languages, using
autoencoding, grapheme-to-phoneme mapping, and lemmatization as auxiliary
tasks. We observe consistent, significant improvements across languages when
training data for the target task is limited, but minimal or no improvements
when training data is abundant. We also show that zero-shot learning
outperforms the simple, but relatively strong, identity baseline.
Comment: Accepted at DeepLo-201
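A minimal sketch of what such a hard-parameter-sharing sequence-to-sequence setup can look like is given below: a character-level encoder shared across tasks, with one decoder per task (normalization plus the auxiliary tasks named above). The vocabulary sizes, dimensions, and sharing scheme are illustrative assumptions, not any of the paper's 63 evaluated configurations.

```python
import torch
import torch.nn as nn

# Illustrative MTL sketch: one shared character-level GRU encoder,
# one GRU decoder per task. Dimensions and vocab sizes are assumptions.

class SharedEncoder(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, x):
        _, h = self.rnn(self.embed(x))
        return h  # final hidden state, shared across all tasks

class TaskDecoder(nn.Module):
    def __init__(self, vocab_size=100, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, y_in, h0):
        out, _ = self.rnn(self.embed(y_in), h0)
        return self.out(out)  # per-step output logits

encoder = SharedEncoder()
decoders = {task: TaskDecoder() for task in
            ["normalize", "autoencode", "g2p", "lemmatize"]}
criterion = nn.CrossEntropyLoss()
params = list(encoder.parameters()) + [p for d in decoders.values()
                                       for p in d.parameters()]
optimizer = torch.optim.Adam(params, lr=1e-3)

def training_step(task, src, tgt_in, tgt_out):
    """One MTL update: encode with the shared encoder, decode with the
    decoder belonging to the sampled task."""
    optimizer.zero_grad()
    logits = decoders[task](tgt_in, encoder(src))
    loss = criterion(logits.reshape(-1, logits.size(-1)), tgt_out.reshape(-1))
    loss.backward()
    optimizer.step()
    return loss.item()
```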
When is multitask learning effective? Semantic sequence prediction under varying data conditions
Multitask learning has been applied successfully to a range of tasks, mostly
morphosyntactic. However, little is known about when MTL works and whether there
are data characteristics that help determine its success. In this paper we
evaluate a range of semantic sequence labeling tasks in a MTL setup. We examine
different auxiliary tasks, among them a novel setup, and correlate their
impact with data-dependent conditions. Our results show that MTL is not always
effective: significant improvements are obtained for only 1 out of 5 tasks.
When successful, auxiliary tasks with compact and more uniform label
distributions are preferable.
Comment: In EACL 201
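To make the "compact and more uniform label distribution" criterion concrete, here is a small illustrative sketch that scores an auxiliary task's label distribution by normalized entropy; the exact statistics used in the paper may differ, and the toy label sets below are invented, not data from the paper.

```python
import math
from collections import Counter

def label_entropy(labels):
    """Normalized Shannon entropy of a task's label distribution.
    Values near 1 indicate a uniform distribution; values near 0 indicate
    a distribution dominated by a few labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    h = -sum(p * math.log(p, 2) for p in probs)
    return h / math.log(len(counts), 2) if len(counts) > 1 else 0.0

# Toy auxiliary label sequences (invented for illustration):
aux_uniform = ["NOUN", "VERB", "ADJ", "ADV"] * 25
aux_skewed = ["O"] * 95 + ["B-EVENT"] * 5
print(label_entropy(aux_uniform))  # 1.0  -> uniform label distribution
print(label_entropy(aux_skewed))   # ~0.29 -> dominated by a single label
```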
Multi-Task Learning for Argumentation Mining in Low-Resource Settings
We investigate whether and where multi-task learning (MTL) can improve
performance on NLP problems related to argumentation mining (AM), in particular
argument component identification. Our results show that MTL performs
particularly well (and better than single-task learning) when little training
data is available for the main task, a common scenario in AM. Our findings
challenge previous assumptions that conceptualizations across AM datasets are
divergent and that MTL is difficult for semantic or higher-level tasks.
Comment: Accepted at NAACL 201
Integrating lexical and prosodic features for automatic paragraph segmentation
Spoken documents, such as podcasts or lectures, are a growing presence in everyday life. Being able to automatically
identify their discourse structure is an important step toward understanding what a spoken document is about. Moreover,
finer-grained units, such as paragraphs, are highly desirable for presenting and analyzing spoken content. However, little
work has been done on discourse-based speech segmentation below the level of broad topics. In order to examine how
discourse transitions are cued in speech, we investigate automatic paragraph segmentation of TED talks using lexical
and prosodic features. Experiments using Support Vector Machines, AdaBoost, and Neural Networks show that models
using supra-sentential prosodic features and induced cue words perform better than those based on the type of lexical
cohesion measures often used in broad topic segmentation. Moreover, combining a wide range of individually weak
lexical and prosodic predictors improves performance, and modelling contextual information using recurrent neural
networks outperforms other approaches by a large margin. Our best results come from using late fusion methods that
integrate representations generated by separate lexical and prosodic models while allowing interactions between these
feature streams rather than treating them as independent information sources. Application to ASR outputs shows that
adding prosodic features, particularly using late fusion, can significantly mitigate the performance degradation caused by
transcription errors.
The second author was funded by the EU's Horizon 2020 Research and Innovation Programme under the GA H2020-RIA-645012 and the Spanish Ministry of Economy and Competitiveness Juan de la Cierva program. The other authors were funded by the University of Edinburgh.
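A rough sketch of the late-fusion idea described above: two stream-specific recurrent encoders, one for lexical and one for prosodic features, whose representations are combined by a small fusion network that lets the streams interact, rather than concatenating raw features early. The dimensions, layer choices, and PyTorch framing are assumptions for illustration, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """Encodes one feature stream (lexical or prosodic) over a window of
    sentences with a recurrent layer, mirroring the context modelling idea."""
    def __init__(self, in_dim, hid_dim=64):
        super().__init__()
        self.rnn = nn.LSTM(in_dim, hid_dim, batch_first=True)

    def forward(self, x):            # x: (batch, sentences, in_dim)
        out, _ = self.rnn(x)
        return out[:, -1, :]         # representation of the current sentence

class LateFusionSegmenter(nn.Module):
    def __init__(self, lex_dim=300, pros_dim=20, hid_dim=64):
        super().__init__()
        self.lexical = StreamEncoder(lex_dim, hid_dim)
        self.prosodic = StreamEncoder(pros_dim, hid_dim)
        self.fusion = nn.Sequential(
            nn.Linear(2 * hid_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, 1))   # probability of a paragraph break

    def forward(self, lex_feats, pros_feats):
        z = torch.cat([self.lexical(lex_feats),
                       self.prosodic(pros_feats)], dim=-1)
        return torch.sigmoid(self.fusion(z)).squeeze(-1)

# Toy usage with random features: 8 examples, a 5-sentence context window.
model = LateFusionSegmenter()
p_break = model(torch.randn(8, 5, 300), torch.randn(8, 5, 20))
print(p_break.shape)  # torch.Size([8])
```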