Is Supervised Syntactic Parsing Beneficial for Language Understanding? An Empirical Investigation
Traditional NLP has long held (supervised) syntactic parsing necessary for
successful higher-level language understanding. The recent advent of end-to-end
neural language learning, self-supervised via language modeling (LM), and its
success on a wide range of language understanding tasks, however, call this
belief into question. In this work, we empirically investigate the usefulness of
supervised parsing for semantic language understanding in the context of
LM-pretrained transformer networks. Relying on the established fine-tuning
paradigm, we first couple a pretrained transformer with a biaffine parsing
head, aiming to infuse explicit syntactic knowledge from Universal Dependencies
(UD) treebanks into the transformer. We then fine-tune the model for language
understanding (LU) tasks and measure the effect of the intermediate parsing
training (IPT) on downstream LU performance. Results from both monolingual
English and zero-shot language transfer experiments (with intermediate
target-language parsing) show that explicit formalized syntax, injected into
transformers through intermediate supervised parsing, has very limited and
inconsistent effect on downstream LU performance. Our results, coupled with our
analysis of transformers' representation spaces before and after intermediate
parsing, make a significant step towards providing answers to an essential
question: how (un)availing is supervised parsing for high-level semantic
language understanding in the era of large neural models?
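To make the setup concrete, here is a minimal sketch of intermediate parsing
training (IPT), assuming PyTorch and the HuggingFace transformers library. The
arc scorer follows the standard biaffine design (Dozat and Manning style); all
class names, dimensions, and the encoder checkpoint are illustrative
assumptions, not the authors' code.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class BiaffineArcScorer(nn.Module):
    """Scores, for every token pair (i, j), how likely token j is the
    syntactic head of token i (dependency labels omitted for brevity)."""
    def __init__(self, hidden_size: int, arc_dim: int = 512):
        super().__init__()
        self.dep_mlp = nn.Sequential(nn.Linear(hidden_size, arc_dim), nn.ReLU())
        self.head_mlp = nn.Sequential(nn.Linear(hidden_size, arc_dim), nn.ReLU())
        # Biaffine weight; the extra column carries the bias on the head side.
        self.U = nn.Parameter(torch.randn(arc_dim, arc_dim + 1) * 0.01)

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        deps = self.dep_mlp(states)                               # (B, T, d)
        heads = self.head_mlp(states)                             # (B, T, d)
        ones = torch.ones(heads.size(0), heads.size(1), 1, device=states.device)
        heads = torch.cat([heads, ones], dim=-1)                  # (B, T, d+1)
        return (deps @ self.U) @ heads.transpose(1, 2)            # (B, T, T)

encoder = AutoModel.from_pretrained("bert-base-cased")
parser = BiaffineArcScorer(encoder.config.hidden_size)

# Stage 1 (IPT): fine-tune encoder + parser on UD treebank arcs with
# cross-entropy over gold head indices, e.g.:
#   arc_scores = parser(encoder(**batch).last_hidden_state)
#   loss = nn.functional.cross_entropy(arc_scores.flatten(0, 1),
#                                      gold_heads.flatten())
# Stage 2: discard the parsing head, attach a task head (e.g., a classifier),
# and fine-tune the same encoder on the downstream LU task as usual.
```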
Adversarial Propagation and Zero-Shot Cross-Lingual Transfer of Word Vector Specialization
Semantic specialization is the process of fine-tuning pre-trained
distributional word vectors using external lexical knowledge (e.g., WordNet) to
accentuate a particular semantic relation in the specialized vector space.
While post-processing specialization methods are applicable to arbitrary
distributional vectors, they are limited to updating only the vectors of words
occurring in external lexicons (i.e., seen words), leaving the vectors of all
other words unchanged. We propose a novel approach to specializing the full
distributional vocabulary. Our adversarial post-specialization method
propagates the external lexical knowledge to the full distributional space. We
exploit words seen in the resources as training examples for learning a global
specialization function. This function is learned by combining a standard
L2-distance loss with an adversarial loss: the adversarial component produces
more realistic output vectors. We show the effectiveness and robustness of the
proposed method across three languages and on three tasks: word similarity,
dialog state tracking, and lexical simplification. We report consistent
improvements over distributional word vectors and vectors specialized by other
state-of-the-art specialization frameworks. Finally, we also propose a
cross-lingual transfer method for zero-shot specialization which successfully
specializes a full target distributional space without any lexical knowledge in
the target language and without any bilingual data.
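As a concrete illustration of the stated objective (a standard L2-distance
loss combined with an adversarial loss), the hedged PyTorch sketch below
trains a global specialization function G on seen words while a discriminator
D pushes G's outputs toward the distribution of genuinely specialized vectors.
The architectures, dimensionality, and loss weight lam are assumptions for
illustration, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

dim = 300  # assumed embedding dimensionality
G = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, dim))  # specializer
D = nn.Sequential(nn.Linear(dim, 512), nn.ReLU(), nn.Linear(512, 1))    # discriminator

opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(x_dist, y_spec, lam=0.5):
    """One update on a batch of seen words: x_dist are distributional
    vectors, y_spec their lexically specialized counterparts."""
    real = torch.ones(len(y_spec), 1)
    fake = torch.zeros(len(y_spec), 1)

    # Discriminator: tell genuinely specialized vectors from G's outputs.
    d_loss = bce(D(y_spec), real) + bce(D(G(x_dist).detach()), fake)
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: L2 distance to the specialized target plus an adversarial
    # term that makes the outputs look like real specialized vectors.
    out = G(x_dist)
    g_loss = ((out - y_spec) ** 2).sum(dim=1).mean() + lam * bce(D(out), real)
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()

# After training, G can be applied to every word vector, seen or unseen,
# which is what specializes the full distributional vocabulary.
```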
University of Mannheim @ CLSciSumm-17: Citation-Based Summarization of Scientific Articles Using Semantic Textual Similarity
The number of publications is growing rapidly, and it is essential to enable fast access to and analysis of relevant articles. In this paper, we describe a set of methods based on measuring semantic textual similarity, which we use to semantically analyze and summarize publications through other publications that cite them. We report the performance of our approach in the context of the third CL-SciSumm shared task and
show that our system compares favorably to competing systems in terms of the produced summaries.
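One plausible reading of this pipeline in code: score each sentence of the
cited paper by its semantic similarity to the citing sentences (citances) and
keep the top-scoring ones as the summary. The sentence-transformers model
below stands in for the system's actual similarity measures and is purely an
assumption.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder STS model

def summarize(cited_sentences, citing_sentences, k=5):
    """Return the k cited-paper sentences most similar to the citances."""
    ref = model.encode(cited_sentences, normalize_embeddings=True)
    cit = model.encode(citing_sentences, normalize_embeddings=True)
    # Cosine similarity of every cited-paper sentence to every citance;
    # score each sentence by its best-matching citance.
    scores = (ref @ cit.T).max(axis=1)
    top = np.argsort(-scores)[:k]
    return [cited_sentences[i] for i in sorted(top)]  # keep document order
```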
Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval
State-of-the-art neural (re)rankers are notoriously data-hungry; given the
lack of large-scale training data in languages other than English, they are
rarely used in multilingual and cross-lingual retrieval settings. Current
approaches therefore typically transfer rankers trained on English data to
other languages and cross-lingual setups by means of multilingual encoders:
they fine-tune all the parameters of a pretrained massively multilingual
Transformer (MMT, e.g., multilingual BERT) on English relevance judgments and
then deploy it in the target language. In this work, we show that two
parameter-efficient approaches to cross-lingual transfer, namely Sparse
Fine-Tuning Masks (SFTMs) and Adapters, allow for a more lightweight and more
effective zero-shot transfer to multilingual and cross-lingual retrieval tasks.
We first train language adapters (or SFTMs) via Masked Language Modelling and
then train retrieval (i.e., reranking) adapters (SFTMs) on top while keeping
all other parameters fixed. At inference, this modular design allows us to
compose the ranker by applying the task adapter (or SFTM) trained with source
language data together with the language adapter (or SFTM) of a target
language. Besides improved transfer performance, these two approaches offer
faster ranker training, with only a fraction of parameters being updated
compared to full MMT fine-tuning. We benchmark our models on the CLEF-2003
benchmark, showing that our parameter-efficient methods outperform standard
zero-shot transfer with full MMT fine-tuning, while enabling modularity and
reducing training times. Further, using Swahili and Somali as examples, we show
that, for low(er)-resource languages, our parameter-efficient neural re-rankers
can improve the rankings of a competitive machine translation-based ranker
- …
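A minimal sketch of the modular composition described above, using
AdapterHub's adapters library as one plausible implementation; the adapter
paths are placeholders rather than released artifacts, and SFTMs would be
applied analogously by composing sparse parameter masks instead of adapter
layers.

```python
from adapters import AutoAdapterModel
from adapters.composition import Stack

model = AutoAdapterModel.from_pretrained("bert-base-multilingual-cased")

# Language adapters, each trained with masked language modelling on
# monolingual text while the MMT body stays frozen (paths are placeholders).
lang_sw = model.load_adapter("adapters/lang_sw")
lang_en = model.load_adapter("adapters/lang_en")

# Reranking adapter, trained on English relevance judgments with the English
# language adapter stacked underneath and all other parameters kept fixed.
rerank = model.load_adapter("adapters/rerank_en")

# Zero-shot transfer: swap in the Swahili language adapter at inference and
# reuse the English-trained reranking adapter on top of it.
model.set_active_adapters(Stack(lang_sw, rerank))
```

Because only the small adapter modules (or sparse masks) are updated during
training, this design is what yields the faster ranker training and the
modularity noted in the abstract.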