14 research outputs found

    Representing Verbs with Visual Argument Vectors

    Is it possible to use images to model verb semantic similarities? Starting from this core question, we developed two textual distributional semantic models and a visual one. We found it particularly interesting and challenging to investigate this part of speech, since verbs are rarely analysed in research on multimodal distributional semantics. After the creation of the visual and textual distributional spaces, the three models were evaluated against SimLex-999, a gold standard resource. Through this evaluation, we demonstrate that, using visual distributional models, it is possible to extract meaningful information and to effectively capture the semantic similarity between verbs.

    Using Embeddings for Both Entity Recognition and Linking in Tweets

    The paper describes our submissions to the task on Named Entity rEcognition and Linking in Italian Tweets (NEEL-IT) at Evalita 2016. Our approach relies on a Named Entity tagging technique that exploits both character-level and word-level embeddings. Character-based embeddings make it possible to learn the idiosyncrasies of the language used in tweets. Using a full-blown Named Entity tagger makes it possible to recognize a wider range of entities than those well known through their presence in a Knowledge Base or gazetteer. Our submissions achieved the first, second and fourth top official scores.

    Becoming JILDA

    The difficulty in finding useful dialogic data to train a conversational agent is an open issue even nowadays, when chatbots and spoken dialogue systems are widely used. For this reason we decided to build JILDA, a novel data collection of chat-based dialogues, produced by Italian native speakers and related to the job-offer domain. JILDA is the first dialogue collection related to this domain for the Italian language. Because of its collection modalities, we believe that JILDA can be a useful resource not only for the Italian research community, but also for the international one.

    MATILDA - Multi-AnnoTator multi-language Interactive Light-weight Dialogue Annotator

    Dialogue Systems are becoming ubiquitous in various forms and shapes - virtual assistants (Siri, Alexa, etc.), chat-bots, customer support, chit-chat systems, just to name a few. The advances in language models and their publication have democratised advanced NLP. However, data remains a crucial bottleneck. Our contribution to this essential pillar is MATILDA, to the best of our knowledge the first multi-annotator, multi-language dialogue annotation tool. MATILDA allows the creation of corpora, the management of users, the annotation of dialogues, the quick adaptation of the user interface to any language and the resolution of inter-annotator disagreement. We evaluate the tool on ease of use, annotation speed and inter-annotation resolution for both experts and novices and conclude that this tool not only supports the full pipeline for dialogue annotation, but also allows non-technical people to easily use it. We are completely open-sourcing the tool at https://github.com/wluper/matilda and provide a tutorial video.

    MATE, a Meta Layer Between Natural Language and Database

    Nowadays, knowledge of query languages like SQL is mandatory for accessing the most relevant information concerning a company’s business, stored in richly structured databases. Advances in Natural Language Processing and Deep Learning research have made it possible to develop different models for converting natural language questions into formalized queries. Although the performance of these models is very satisfactory on internationally established benchmarks for the English language, it is necessary to investigate their portability with respect to the types of databases and the natural languages used for formulating the questions. For this reason, we developed Mate (Meta lAyer between naTural language and databasE), a framework for interfacing humans and databases that facilitates natural language access to the data stored in databases. Mate is developed with the aim of making information accessible in a very simple way, from the perspective of a human-centered AI.

    Aumentare i vettori tramite modelli multimodali: perché per descrivere un verbo servono le immagini (Augmenting vectors with multimodal models: why images are needed to describe a verb)

    This thesis proposes a visual distributional semantic model for the analysis of verbs, one of the most complex grammatical classes to represent distributionally, yet one able to provide elaborate information about events and actions. The research project aims to demonstrate (i) that the meaning of a verb can be obtained by decomposing and distinguishing the nouns that co-occur with the verb in subject or object function, and (ii) that the meaning of a verb can be described effectively through the use of visual resources.

    Training conversational agents to understand complex dialogues

    Nowadays, conversational agents are inspiring the academic and non-academic world thanks to the engaging interaction they establish with the user. However, finding valuable data to train a system able to converse in as human-like a way as possible is not a trivial task. This is even more challenging for the Italian language, for which only a few dialogic datasets are available. This thesis expressly addresses this challenge, proposing JILDA (Job Interview Labelled Dialogues Assembly), a new Italian dialogue dataset for the job-offer domain, and demonstrating its practical application for the training of a conversational agent able to understand syntactically and semantically complex data. JILDA dialogues, after being annotated via MATILDA, a new annotation tool developed in collaboration with Wluper, are used to train the Natural Language Understanding module of a conversational agent, as this is an essential component of any dialogue system. Three of the most recent pretrained LMs are benchmarked: Italian BERT, Multilingual BERT, and AlBERTo. Based on an analysis of the performance obtained, JILDA 2.0 was developed, an updated version of the resource that represents a first step in improving NLU for Italian dialogues. Finally, this thesis frames the research topic within a global ethical framework, considering the ethical issues which emerge in human-machine interaction, the gender biases embedded in Embodied Conversational Agents (ECAs) and their impacts on modern society.

    PARAD-it: Eliciting Italian Paradigmatic Relations with Crowdsourcing

    In this paper, we present a new dataset of semantically related Italian word pairs. The dataset consists of nouns, adjectives and verbs together with their synonyms, antonyms and hypernyms. The data have been collected with crowdsourcing from a pool of Italian native speakers. The dataset, the first of its kind, is useful not only to evaluate computational models of Italian semantic relations, but also for linguistic and psycholinguistic investigations of the mental lexicon.

    Toward Data-Driven Collaborative Dialogue Systems: The JILDA Dataset

    Today’s goal-oriented dialogue systems are designed to operate in restricted domains and with the implicit assumption that the user goals fit the domain ontology of the system. Under these assumptions dialogues exhibit only limited collaborative phenomena. However, this is not necessarily true in more complex scenarios, where user and system need to collaborate to align their knowledge of the domain in order to improve the conversation and achieve their goals. To foster research on data-driven collaborative dialogues, in this paper we present JILDA, a fully annotated dataset of chat-based, mixed-initiative Italian dialogues related to the job-offer domain. As far as we know, JILDA is the first dialogic corpus completely annotated in this domain. The analysis carried out on the semantic annotations clearly shows the naturalness and greater complexity of JILDA’s dialogues. In fact, the new dataset offers a large number of examples of pragmatic phenomena, such as proactivity (i.e., providing information not explicitly requested) and grounding, which are rarely investigated in AI conversational agents based on neural architectures. In conclusion, the annotated JILDA corpus, given its innovative characteristics, represents a new challenge for conversational agents and an important resource for tackling more complex scenarios, thus advancing the state of the art in this field.