    When silver glitters more than gold: Bootstrapping an Italian part-of-speech tagger for Twitter

    We bootstrap a state-of-the-art part-of-speech tagger to tag Italian Twitter data, in the context of the Evalita 2016 PoSTWITA shared task. We show that training the tagger on native Twitter data enriched with small amounts of specifically selected gold data and additional silver-labelled data scraped from Facebook yields better results than using large amounts of manually annotated data from a mix of genres.
    Comment: Proceedings of the 5th Evaluation Campaign of Natural Language Processing and Speech Tools for Italian (EVALITA 2016)
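
    A minimal sketch of the data mixing described above, assuming two-column CoNLL-style files ("token<TAB>tag"); all file names are hypothetical:

        # Read a two-column CoNLL file into sentences of (token, tag) pairs.
        def read_conll(path):
            sent, sents = [], []
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if not line:
                        if sent:
                            sents.append(sent)
                            sent = []
                    else:
                        token, tag = line.split("\t")
                        sent.append((token, tag))
            if sent:
                sents.append(sent)
            return sents

        # Native Twitter gold data, a small selected slice of out-of-domain
        # gold data, and automatically labelled ("silver") Facebook data.
        train = read_conll("twitter_gold.conll")
        train += read_conll("selected_gold.conll")
        train += read_conll("facebook_silver.conll")
        # `train` can now be fed to any off-the-shelf sequence tagger.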

    The Impact of Annotation on the Performance of Protein Tagging in Biomedical Text

    In this paper we discuss five different corpora annotated for protein names. We present several within- and cross-dataset protein tagging experiments showing that different annotation schemes severely affect the portability of statistical protein taggers. By means of a detailed error analysis we identify crucial annotation issues that future annotation projects should take into careful consideration.
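
    A skeleton of the within-/cross-dataset experiment grid, with corpus names and the train/evaluate functions as hypothetical placeholders:

        corpora = ["corpus_a", "corpus_b", "corpus_c", "corpus_d", "corpus_e"]

        def train_tagger(corpus):
            ...  # train a statistical protein tagger on one annotated corpus

        def entity_f1(tagger, corpus):
            ...  # entity-level F1 of the tagger on another corpus

        # Diagonal cells are within-dataset results; off-diagonal cells
        # measure portability across annotation schemes.
        results = {}
        for src in corpora:
            tagger = train_tagger(src)
            for tgt in corpora:
                results[(src, tgt)] = entity_f1(tagger, tgt)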

    IT5: Large-scale Text-to-text Pretraining for Italian Language Understanding and Generation

    The T5 model and its unified text-to-text paradigm contributed to advancing the state-of-the-art for many natural language processing tasks. While some multilingual variants of the T5 model have recently been introduced, they were found to provide suboptimal performance for languages other than English compared to monolingual variants. Motivated by these findings, we introduce IT5, the first family of encoder-decoder transformer models pretrained specifically on Italian. We perform a thorough cleaning of a web-crawled Italian corpus comprising more than 40 billion words and use it to pretrain three IT5 models of different sizes. The performance of the IT5 models and their multilingual counterparts is then evaluated on a broad range of natural language understanding and generation benchmarks for Italian. We find the monolingual IT5 models to provide the best scale-to-performance ratio across tested models, consistently outperforming their multilingual counterparts and setting a new state-of-the-art for most Italian conditional language generation tasks.
    Comment: 13 pages, 7 tables, 1 figure. Code and checkpoints available: https://github.com/gsarti/it
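
    A minimal usage sketch with Hugging Face transformers; the checkpoint id "gsarti/it5-base" is assumed from the linked repository, and the base models are pretrained only, so they need task-specific fine-tuning before the output is meaningful:

        from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

        # Checkpoint id assumed; see the repository linked above.
        tokenizer = AutoTokenizer.from_pretrained("gsarti/it5-base")
        model = AutoModelForSeq2SeqLM.from_pretrained("gsarti/it5-base")

        inputs = tokenizer("riassumi: Il modello IT5 ...", return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=40)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))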

    Syntactic Features and Word Similarity for Supervised Metonymy Resolution

    We present a supervised machine learning algorithm for metonymy resolution, which exploits the similarity between examples of conventional metonymy. We show that syntactic head-modifier relations are a high-precision feature for metonymy recognition but suffer from data sparseness.
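
    A minimal sketch of the feature idea, assuming spaCy for dependency parsing and scikit-learn for the classifier (the toy examples and labels are illustrative):

        import spacy
        from sklearn.feature_extraction import DictVectorizer
        from sklearn.linear_model import LogisticRegression

        nlp = spacy.load("en_core_web_sm")  # requires the model to be installed

        def head_modifier_features(text, target):
            # Dependency relation and head lemma of the target word.
            feats = {}
            for tok in nlp(text):
                if tok.text == target:
                    feats["dep"] = tok.dep_
                    feats["head"] = tok.head.lemma_
            return feats

        data = [("England won the match.", "England", "metonymic"),
                ("She travelled to England.", "England", "literal")]
        vec = DictVectorizer()
        X = vec.fit_transform(head_modifier_features(s, t) for s, t, _ in data)
        clf = LogisticRegression().fit(X, [label for _, _, label in data])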

    Multi-Figurative Language Generation

    Figurative language generation is the task of reformulating a given text in the desired figure of speech while still being faithful to the original context. We take the first step towards multi-figurative language modelling by providing a benchmark for the automatic generation of five common figurative forms in English. We train mFLAG by employing a scheme for multi-figurative language pre-training on top of BART and a mechanism for injecting the target figurative information into the encoder; this enables the generation of text with the target figurative form from another figurative form, without parallel figurative-figurative sentence pairs. Our approach outperforms all strong baselines. We also offer some qualitative analysis and reflections on the relationship between the different figures of speech.
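
    A minimal sketch of the control-token idea, conditioning generation on the target figure of speech; the tag names are hypothetical and the actual mFLAG injection mechanism differs in detail:

        from transformers import BartForConditionalGeneration, BartTokenizer

        tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
        model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

        # Hypothetical tags for the five figurative forms.
        figures = ["<hyperbole>", "<idiom>", "<irony>", "<metaphor>", "<simile>"]
        tokenizer.add_special_tokens({"additional_special_tokens": figures})
        model.resize_token_embeddings(len(tokenizer))

        # After fine-tuning on figurative data, the tag steers the rewrite:
        inputs = tokenizer("<hyperbole> The food was good.", return_tensors="pt")
        outputs = model.generate(**inputs, max_new_tokens=30)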

    Breeding Fillmore's Chickens and Hatching the Eggs: Recombining Frames and Roles in Frame-Semantic Parsing

    Frame-semantic parsers traditionally predict predicates, frames, and semantic roles in a fixed order. This paper explores the 'chicken-or-egg' problem of interdependencies between these components theoretically and practically. We introduce a flexible BERT-based sequence labeling architecture that allows for predicting frames and roles independently from each other or combining them in several ways. Our results show that our setups can approximate more complex traditional models' performance, while allowing for a clearer view of the interdependencies between the pipeline's components, and of how frame and role prediction models make different use of BERT's layers.
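
    A minimal sketch of a BERT-based sequence labeling setup in transformers; the label inventory is a toy placeholder, and the classification head is untrained until fine-tuned:

        from transformers import AutoTokenizer, AutoModelForTokenClassification

        labels = ["O", "B-FRAME", "B-ROLE", "I-ROLE"]  # toy label set
        tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
        model = AutoModelForTokenClassification.from_pretrained(
            "bert-base-cased",
            num_labels=len(labels),
            id2label=dict(enumerate(labels)),
        )

        inputs = tokenizer("She bred chickens in the yard.", return_tensors="pt")
        logits = model(**inputs).logits      # shape: (1, seq_len, num_labels)
        predictions = logits.argmax(-1)[0]   # one label id per wordpiece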

    To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging

    Does normalization help Part-of-Speech (POS) tagging accuracy on noisy, non-canonical data? To the best of our knowledge, little is known about the actual impact of normalization in a real-world scenario, where gold error detection is not available. We investigate the effect of automatic normalization on POS tagging of tweets. We also compare normalization to strategies that leverage large amounts of unlabeled data kept in its raw form. Our results show that normalization helps, but does not add consistently beyond just word embedding layer initialization. The latter approach yields a tagging model that is competitive with a Twitter state-of-the-art tagger.
    Comment: In WNUT 2017
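
    A minimal sketch of the raw-data strategy, assuming gensim for the embeddings and PyTorch for the tagger's lookup layer (the tweet file is hypothetical):

        import numpy as np
        import torch
        from gensim.models import Word2Vec

        # Train word embeddings on unlabelled, un-normalized tweets.
        sentences = [line.split() for line in open("raw_tweets.txt", encoding="utf-8")]
        w2v = Word2Vec(sentences, vector_size=100, min_count=2)

        # Use them to initialise (not freeze) the tagger's embedding layer.
        weights = np.vstack([w2v.wv[w] for w in w2v.wv.index_to_key])
        embedding = torch.nn.Embedding.from_pretrained(
            torch.tensor(weights, dtype=torch.float), freeze=False)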