122 research outputs found

Generating High-Quality Surface Realizations Using Data Augmentation and Factored Sequence Models

Author: Elder Henry
Hokamp Chris
Publication date: 01/01/2018

This work presents a new state of the art in reconstruction of surface realizations from obfuscated text. We identify the lack of sufficient training data as the major obstacle to training high-performing models, and solve this issue by generating large amounts of synthetic training data. We also propose preprocessing techniques which make the structure contained in the input features more accessible to sequence models. Our models were ranked first on all evaluation metrics in the English portion of the 2018 Surface Realization shared task.

arXiv.org e-Print Archive

Crossref

Predictive performance comparisons of different feature extraction methods in a financial column corpus

Author: Andrea Sciandra
Riccardo Ferretti
Publication venue: Pearson
Publication date: 01/01/2022

This work concerns the processing of a corpus made up of a weekly financial column. Specifically, we focused on document-level index extraction and textual feature extraction. Moreover, we compared several feature extraction methods to evaluate their predictive capacity. Results confirm the hypothesis that vectors derived from word embeddings do not improve predictive power compared to other feature extraction methods, but remain a fundamental resource for capturing semantics in texts.

Archivio istituzionale della ricerca - Università di Modena e Reggio Emilia

Archivio istituzionale della ricerca - Università di Padova

Russian word sense induction by clustering averaged word embeddings

Author: Kutuzov Andrey
Publication date: 01/01/2018

The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018). Our team was ranked 2nd for the wiki-wiki dataset (containing mostly homonyms) and 5th for the bts-rnc and active-dict datasets (containing mostly polysemous words) among all 19 participants. The method we employed was extremely naive. It consisted of representing contexts of ambiguous words as averaged word embedding vectors, using off-the-shelf pre-trained distributional models. Then, these vector representations were clustered with mainstream clustering techniques, thus producing the groups corresponding to the ambiguous word senses. As a side result, we show that word embedding models trained on small but balanced corpora can be superior to those trained on large but noisy data - not only in intrinsic evaluation, but also in downstream tasks like word sense induction. Comment: Proceedings of the 24th International Conference on Computational Linguistics and Intellectual Technologies (Dialogue-2018)
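
The approach summarized in this abstract (average the embeddings of the words around an ambiguous word, then cluster the resulting context vectors) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy 2-dimensional vectors in `EMB`, the English example contexts, and the plain k-means routine are hypothetical stand-ins, not the pre-trained Russian models or the clustering algorithms used in the paper.

```python
from math import dist

# Hypothetical toy 2-d "embeddings"; a real system would load
# pre-trained distributional vectors instead.
EMB = {
    "river": (0.9, 0.1), "water": (0.8, 0.2), "shore": (1.0, 0.0),
    "money": (0.1, 0.9), "loan": (0.0, 1.0), "account": (0.2, 0.8),
}


def mean(vectors):
    """Component-wise average of a non-empty list of equal-length vectors."""
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))


def context_vector(tokens):
    """Represent a context as the average embedding of its known words."""
    known = [EMB[t] for t in tokens if t in EMB]
    if not known:
        raise ValueError("no context word has an embedding")
    return mean(known)


def kmeans(points, k, iters=10):
    """Plain k-means, seeded with the first k points so the run is deterministic."""
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = mean(members)
    return labels


# Four hypothetical contexts of the ambiguous English word "bank".
contexts = [
    ["river", "shore"],
    ["money", "loan"],
    ["water", "river"],
    ["account", "loan"],
]
vectors = [context_vector(c) for c in contexts]
labels = kmeans(vectors, k=2)
print(labels)
```

In this toy setting the two "river" contexts end up in one cluster and the two "money" contexts in the other; each cluster is then read as one induced sense.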

arXiv.org e-Print Archive

NORA - Norwegian Open Research Archives

MaskParse@Deskin at SemEval-2019 Task 1: Cross-lingual UCCA Semantic Parsing using Recursive Masked Sequence Tagging

Author: Damnati Geraldine
Heinecke Johannes
Marzinotto Gabriel
Publication venue: HAL CCSD
Publication date: 06/06/2019

This paper describes our recursive system for SemEval-2019 Task 1: Cross-lingual Semantic Parsing with UCCA. Each recursive step consists of two parts. We first perform semantic parsing using a sequence tagger to estimate the probabilities of the UCCA categories in the sentence. Then, we apply a decoding policy which interprets these probabilities and builds the graph nodes. Parsing is done recursively: we perform a first inference on the sentence to extract the main scenes and links, and then we recursively apply our model on the sentence using a masking feature that reflects the decisions made in previous steps. The process continues until the terminal nodes are reached. We chose a standard neural tagger and focused on our recursive parsing strategy and on the cross-lingual transfer problem to develop a robust model for the French language, using only a few training samples.

arXiv.org e-Print Archive

HAL AMU

Neural Surface Realization for Italian

Author: Basile Valerio
Mazzei Alessandro
Publication venue: CEUR
Publication date: 01/01/2018

Institutional Research Information System University of Turin

Viable Dependency Parsing as Sequence Labeling

Author: Gómez-Rodríguez Carlos
Strzyz Michalina
Vilares David
Publication date: 01/01/2019

We recast dependency parsing as a sequence labeling problem, exploring several encodings of dependency trees as labels. While dependency parsing by means of sequence labeling had been attempted in existing work, results suggested that the technique was impractical. We show instead that with a conventional BiLSTM-based model it is possible to obtain fast and accurate parsers. These parsers are conceptually simple, not needing traditional parsing algorithms or auxiliary structures. However, experiments on the PTB and a sample of UD treebanks show that they provide a good speed-accuracy tradeoff, with results competitive with more complex approaches. Comment: Camera-ready version to appear at NAACL 2019 (final peer-reviewed manuscript). 8 pages (incl. appendix)
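
To make the idea of encoding a dependency tree as one label per token concrete, here is a minimal sketch of a relative head-offset encoding. It is an illustrative assumption, not necessarily one of the encodings the paper evaluates: the function names `encode` and `decode`, the label format `(offset, relation)`, and the example sentence are all hypothetical.

```python
def encode(heads, deprels):
    """Turn a dependency tree into one label per token.

    heads[i] is the 1-based index of the head of token i+1, with 0 for the root.
    Each label is (offset, relation), where offset = head position - token position.
    Offset 0 is reserved for the root, since no token can be its own head.
    """
    labels = []
    for position, (head, rel) in enumerate(zip(heads, deprels), start=1):
        offset = 0 if head == 0 else head - position
        labels.append((offset, rel))
    return labels


def decode(labels):
    """Recover the head indices and relations from the per-token labels."""
    heads, deprels = [], []
    for position, (offset, rel) in enumerate(labels, start=1):
        heads.append(0 if offset == 0 else position + offset)
        deprels.append(rel)
    return heads, deprels


# Hypothetical example: "She reads books", with "reads" as the root.
heads = [2, 0, 2]
deprels = ["nsubj", "root", "obj"]

labels = encode(heads, deprels)
print(labels)          # [(1, 'nsubj'), (0, 'root'), (-1, 'obj')]
print(decode(labels))  # ([2, 0, 2], ['nsubj', 'root', 'obj'])
```

Because each token receives exactly one label, any standard sequence tagger can in principle be trained to predict these labels, and decoding needs no dedicated parsing algorithm. A predicted label sequence is not guaranteed to form a valid tree, so a practical system would need some post-processing, which this sketch omits.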

arXiv.org e-Print Archive

Repositorio da Universidade da Coruña

Crossref

The Tenuousness of lemmatization in lexicon-based sentiment analysis

Author: Basile Valerio
Bosco Cristina
Giuliano Gabrieli
Marco Vassallo
Publication venue: CEUR
Publication date: 01/01/2019

Institutional Research Information System University of Turin