21 research outputs found
The Perfect Recipe: Add SUGAR, Add Data
We present the FBK participation at the EVALITA 2018 Shared Task "SUGAR - Spoken Utterances Guiding Chef's Assistant Robots". The task has two peculiar and challenging characteristics: first, the amount of available training data is very limited; second, the training data consists of pairs [audio-utterance, system-action], without any intermediate representation. Given these characteristics, we experimented with two different approaches: (i) designing and implementing a neural architecture that can use as little training data as possible, and (ii) using a state-of-the-art tagging system and then augmenting the initial training set with synthetically generated data. In the paper we present the two approaches and show the results obtained by their respective runs
FBK-HLT: An Application of Semantic Textual Similarity for Answer Selection in Community Question Answering
This paper describes our system, FBK-HLT, and reports its performance in SemEval 2015 Task #3, "Answer Selection in Community Question Answering", for English, on both subtasks. We submitted two runs with different classifiers that combine typical features (lexical similarity, string similarity, word n-grams, etc.) with machine translation evaluation metrics and some ad hoc features (e.g., user overlap, spam filtering). We outperform the baseline system and achieve interesting results on both subtasks
What's in a Food Name: Knowledge Induction from Gazetteers of Food Main Ingredient
We investigate head-noun identification in complex noun-compounds (e.g. table is the head-noun in three legs table with white marble top). The task is highly relevant in several application scenarios, including utterance interpretation for dialogue systems, particularly in the context of e-commerce applications, where tens of thousands of product descriptions for several domains and different languages have to be analyzed. We define guidelines for data annotation and propose a supervised neural model that achieves 0.79 F1 on Italian food noun-compounds, which we consider an excellent result given both the minimal supervision required and the high linguistic complexity of the domain
FBK-HLT: An Effective System for Paraphrase Identification and Semantic Similarity in Twitter
This paper describes our system, FBK-HLT, and reports its performance in SemEval 2015 Task #1, "Paraphrase and Semantic Similarity in Twitter", on both subtasks. We submitted two runs with different classifiers that combine typical features (lexical similarity, string similarity, word n-grams, etc.) with machine translation metrics and edit distance features. We outperform the baseline system and achieve a result very competitive with the best system on the first subtask. In the end, we ranked 4th out of 18 teams participating in the subtask "Paraphrase Identification"
FBK-HLT: A New Framework for Semantic Textual Similarity
This paper describes our system, FBK-HLT, and reports its performance in SemEval 2015 Task #2, "Semantic Textual Similarity", English subtask. We submitted three runs with different hypotheses for combining typical features (lexical similarity, string similarity, word n-grams, etc.) with syntactic structure features, resulting in different feature sets. The results on both the STS 2014 and STS 2015 datasets support our hypothesis that an STS system should take syntactic information into consideration. We outperform the best system on the STS 2014 datasets and achieve a result very competitive with the best system on the STS 2015 datasets
Proceedings of the Fifth Italian Conference on Computational Linguistics CLiC-it 2018
On behalf of the Program Committee, a very warm welcome to the Fifth Italian Conference on Computational Linguistics (CLiC-it 2018). This edition of the conference is held in Torino. The conference is locally organised by the University of Torino and hosted in its prestigious main lecture hall "Cavallerizza Reale". The CLiC-it conference series is an initiative of the Italian Association for Computational Linguistics (AILC) which, after five years of activity, has clearly established itself as the premier national forum for research and development in the fields of Computational Linguistics and Natural Language Processing, where leading researchers and practitioners from academia and industry meet to share their research results, experiences, and challenges
Predicting Correlations Between Lexical Alignments and Semantic Inferences
While there is a strong intuition that word alignments (e.g. synonymy, hyperonymy) play a relevant role in recognizing text-to-text semantic inferences (e.g. textual entailment, semantic similarity), this intuition is often not reflected in system performance, and there is a general need for a deeper understanding of the role of lexical resources. This paper provides an empirical analysis of the dependencies between datasets, lexical resources and algorithms that are commonly used in text-to-text inference tasks. We define a resource impact index, based on lexical alignments between pairs of texts, and show that this index is significantly correlated with the performance of different textual entailment algorithms. The result is an operational, algorithm-independent procedure for predicting the performance of a class of available RTE algorithms
Comparing Machine Learning and Deep Learning Approaches on NLP Tasks for the Italian Language
We present a comparison between deep learning and traditional machine learning methods for various NLP tasks in Italian. We carried out experiments using available datasets (e.g., from the Evalita shared tasks) on two sequence tagging tasks (i.e., named entity recognition and nominal entity recognition) and four classification tasks (i.e., lexical relations among words, semantic relations among sentences, sentiment analysis and text classification). We show that deep learning approaches outperform traditional machine learning algorithms in sequence tagging, while for classification tasks that rely heavily on semantics, approaches based on feature engineering are still competitive. We think that a similar analysis could be carried out for other languages to provide an assessment of machine learning / deep learning models across different languages