
    An investigation on the impact of natural language on conversational recommendations

    In this paper, we investigate the combination of Virtual Assistants and Conversational Recommender Systems (CoRSs) by designing and implementing a framework named ConveRSE for building chatbots that can recommend items from different domains and interact with the user through natural language. A user experiment was carried out to understand how natural language influences both the interaction cost and the recommendation accuracy of a CoRS. Experimental results show that natural language can indeed improve the user experience, but some critical aspects of the interaction should be mitigated appropriately.

    Analysis of lexical semantic changes in corpora with the diachronic engine

    With the growing availability of digitized diachronic corpora, the need for tools that can take the diachronic component of corpora into account becomes ever more pressing. Recent work on diachronic embeddings shows that computational approaches to the diachronic analysis of language are promising, but they are not user-friendly for people without a technical background. This paper presents the Diachronic Engine, a system for the diachronic analysis of the lexical features of corpora. The Diachronic Engine computes word frequencies, concordances, and collocations while taking the temporal dimension into account. It can also compute temporal word embeddings and time series that can be exploited for lexical semantic change detection.
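
    The abstract does not spell out how the time series feeds change detection, so here is a minimal sketch of one common recipe: compare a word's vector in each time slice against its vector in the first slice via cosine similarity, and watch for drops. The function names and the random vectors are illustrative stand-ins, not the Diachronic Engine's actual API.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two vectors.
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def change_series(slices, word):
    # Similarity of the word's vector in each time slice to its vector
    # in the first slice; a marked drop hints at semantic change.
    base = slices[0][word]
    return [cosine(base, emb[word]) for emb in slices]

# Toy data: three time slices with random 50-dimensional vectors.
rng = np.random.default_rng(0)
slices = [{"rete": rng.normal(size=50)} for _ in range(3)]
print(change_series(slices, "rete"))
```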

    A comparative study of approaches for the diachronic analysis of the Italian language

    In recent years, there has been a significant increase in interest in lexical semantic change detection. Many approaches, datasets, and evaluation strategies exist for detecting semantic drift. Most of these approaches rely on diachronic word embeddings. Some are created as a post-processing of static word embeddings, while others produce dynamic word embeddings in which vectors share the same geometric space across all time slices. The large majority of methods use English as the target language for diachronic analysis, while other languages remain under-explored. In this work, we compare state-of-the-art approaches in computational historical linguistics to evaluate the pros and cons of each model, and we present the results of an in-depth analysis conducted on an Italian diachronic corpus. Specifically, several approaches based on both static and dynamic embeddings are implemented and evaluated using the Kronos-It dataset. We train all word embeddings on the Italian Google n-gram corpus. The main result of the evaluation is that all approaches fail to significantly reduce the number of false-positive change points, which confirms that lexical semantic change detection is still a challenging task.
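
    One widely used member of the post-processing family for static embeddings mentioned above is orthogonal Procrustes alignment: the embedding matrix of one time slice is rotated onto another so that word vectors become directly comparable. A minimal sketch follows, with random matrices standing in for trained embeddings; the paper's exact models are not reproduced here.

```python
import numpy as np

def procrustes_align(source, target):
    # Rotate source onto target with the orthogonal matrix R = U V^T
    # (from the SVD of source^T target) that minimizes
    # ||source @ R - target|| in the Frobenius norm.
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)

rng = np.random.default_rng(0)
emb_t0 = rng.normal(size=(5000, 100))  # vocabulary x dimensions, slice t0
emb_t1 = rng.normal(size=(5000, 100))  # same vocabulary order, slice t1
aligned_t1 = procrustes_align(emb_t1, emb_t0)
```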

    A deep learning model for the analysis of medical reports in ICD-10 clinical coding task

    The practice of assigning a uniquely identifiable and easily traceable code to the pathologies mentioned in medical diagnoses adds value to the way health data are archived to build each patient's clinical history. Unfortunately, the enormous number of possible pathologies and medical conditions has led to extremely large international classifications that are difficult to consult even for a human being. This difficulty makes the annotation of diagnoses with ICD-10 codes cumbersome and rarely performed. To support this operation, we propose a classification model that analyzes medical diagnoses written in natural language and automatically assigns one or more international reference codes. The model has been evaluated on a dataset released in Spanish for the eHealth challenge (CodiEsp) of the international conference CLEF 2020, but it could be extended to any language written in Latin characters. The model is based on a two-step classification process built on BERT and BiLSTM. Although its accuracy is still far from sufficient to dispense with the opinion of a licensed physician, the results show the feasibility of the task and are a starting point for future studies in this direction.
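
    The abstract names the two steps but not their wiring; under stated assumptions, the sketch below shows the overall shape: a cheap first step narrows the very large ICD-10 label space to a few candidate codes, and a second step keeps the candidates whose score clears a threshold. TF-IDF similarity stands in for both the BERT pre-filter and the BiLSTM classifier so the example stays self-contained, and the code dictionary is a toy fragment.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy fragment of the ICD-10 glossary; the real one has tens of
# thousands of codes.
CODE_DESCRIPTIONS = {
    "J18.9": "pneumonia unspecified organism",
    "I10": "essential primary hypertension",
    "E11.9": "type 2 diabetes mellitus without complications",
}

def assign_codes(diagnosis, top_k=2, threshold=0.1):
    codes, texts = zip(*CODE_DESCRIPTIONS.items())
    vec = TfidfVectorizer().fit(list(texts) + [diagnosis])
    sims = cosine_similarity(vec.transform([diagnosis]),
                             vec.transform(texts))[0]
    # Step 1: pre-filter the label space to the top-k candidate codes.
    candidates = sorted(zip(codes, sims), key=lambda p: -p[1])[:top_k]
    # Step 2: keep only candidates whose score clears the threshold.
    return [code for code, score in candidates if score >= threshold]

print(assign_codes("patient presents with pneumonia and hypertension"))
```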

    A study of Machine Learning models for Clinical Coding of Medical Reports at CodiEsp 2020

    The task of identifying one or more diseases associated with a patient’s clinical condition is often very complex, even for doctors and specialists. This process is usually time-consuming and has to take into account different aspects of what has occurred, including the symptoms elicited and previous healthcare episodes. The medical diagnosis is often provided to patients as a written document without any link to a national or international standard. Even though the WHO (World Health Organization) released the ICD-10 international glossary of diseases, almost no doctor has enough time to manually associate a patient’s clinical history with international codes. The CodiEsp task at CLEF 2020 addressed this issue by proposing the development of an automatic system for this task. Our solution investigated different machine learning strategies in order to identify an approach to the challenge. The main outcome of the experiments is that a strategy based on BERT for pre-filtering and one based on BiLSTMCNN-SelfAttention for classification provide valuable results. We carried out several experiments on a subset of the training set to tune the final model submitted to the challenge. In particular, we analyzed the impact of the algorithm, the input encoding strategy, and the thresholds for multi-label classification. A further set of experiments was carried out in a post-hoc analysis. The experiments confirmed that the strategy submitted to the CodiEsp task is the best-performing one among those evaluated, and it allowed us to obtain a final mean average error of 0.202 on the test set. To support future developments of the proposed approach and the replicability of the experiments, we have made the source code publicly accessible.
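
    The threshold analysis mentioned above can be reproduced in miniature: sweep a global cutoff over validation scores and keep the value with the best micro-F1. The scores and gold labels below are random stand-ins, not CodiEsp data, and a per-label threshold is an obvious variant.

```python
import numpy as np
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
scores = rng.random((200, 30))      # model scores: documents x codes
gold = rng.random((200, 30)) > 0.9  # sparse binary gold label matrix

# Pick the global threshold with the best validation micro-F1.
best = max((f1_score(gold, scores >= t, average="micro"), t)
           for t in np.linspace(0.05, 0.95, 19))
print(f"best micro-F1 {best[0]:.3f} at threshold {best[1]:.2f}")
```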

    A comparison of services for intent and entity recognition for conversational recommender systems

    Conversational Recommender Systems (CoRSs) are becoming increasingly popular. However, designing and developing a CoRS is a challenging task, since it requires multi-disciplinary skills. Even though several third-party services are available for supporting the creation of a CoRS, a comparative study of these platforms for the specific recommendation task is not yet available. In this work, we focus on two crucial steps of the Conversational Recommendation (CoR) process, namely Intent and Entity Recognition. We compared four of the most popular services, both commercial and open source. Furthermore, we propose two custom-made solutions for Entity Recognition that aim to overcome the limitations of the other services. The results give a clear picture of the strengths and weaknesses of each solution.
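
    The comparison itself reduces to a small harness: each service maps an utterance to a predicted intent, and accuracy is measured on a gold set. The sketch below uses an invented keyword baseline in place of the actual commercial and open-source services, whose client APIs are not shown here.

```python
# Tiny gold set of (utterance, intent) pairs, invented for illustration.
GOLD = [
    ("recommend me a good thriller", "request_recommendation"),
    ("I liked Inception a lot", "give_preference"),
    ("show me something by Nolan", "request_recommendation"),
]

def keyword_service(utterance):
    # Hypothetical baseline, not one of the platforms from the paper.
    triggers = ("recommend", "show me")
    return ("request_recommendation"
            if any(t in utterance for t in triggers) else "give_preference")

def accuracy(service, gold):
    return sum(service(u) == intent for u, intent in gold) / len(gold)

print("keyword-baseline:", accuracy(keyword_service, GOLD))
```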

    A domain-independent framework for building conversational recommender systems

    Conversational Recommender Systems (CoRSs) implement a paradigm in which users can interact with the system to define their preferences and discover items that best fit their needs. A CoRS can be straightforwardly implemented as a chatbot. Chatbots are becoming more and more popular for applications such as customer care, health care, and medical diagnosis. In its most complex form, implementing a chatbot is a challenging task, since it requires knowledge of natural language processing, human-computer interaction, and more. In this paper, we propose a general framework for simplifying the creation of conversational recommender systems. The framework, based on a content-based recommendation algorithm, is domain-independent: it allows building a conversational recommender system with different interaction modes (natural language, buttons, hybrid) for any domain. The framework has been evaluated on two state-of-the-art datasets with the aim of identifying the components that most influence the final recommendation accuracy.
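
    A minimal sketch of the content-based core such a framework can rest on: items are bags of textual features, the user profile is the mean TF-IDF vector of the liked items, and candidates are ranked by cosine similarity. Domain independence follows from treating features as opaque text. The item catalogue is invented, and this is not the framework's actual algorithm.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy catalogue: item name -> space-separated content features.
ITEMS = {
    "Inception": "sci-fi thriller dreams heist nolan",
    "The Prestige": "drama mystery magicians nolan",
    "Notting Hill": "romantic comedy london bookshop",
}

def recommend(liked, k=2):
    names = list(ITEMS)
    matrix = TfidfVectorizer().fit_transform(ITEMS.values())
    # User profile: mean TF-IDF vector of the liked items.
    profile = np.asarray(
        matrix[[names.index(t) for t in liked]].mean(axis=0))
    scores = cosine_similarity(profile, matrix)[0]
    ranked = sorted(zip(names, scores), key=lambda p: -p[1])
    return [n for n, _ in ranked if n not in liked][:k]

print(recommend(["Inception"]))
```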

    Extracting relations from Italian Wikipedia using self-training

    In this paper, we describe a supervised approach for extracting relations from Wikipedia. In particular, we exploit a self-training strategy to enrich a small number of manually labeled triples with new self-labeled examples. We integrate the supervised stage into WikiOIE, an existing framework for the unsupervised extraction of relations from Wikipedia, and rely on its unsupervised pipeline for extracting the initial set of unlabeled triples. An evaluation involving different algorithms and parameters shows that self-training helps to improve performance. Finally, we provide a dataset of about three million triples extracted from the Italian version of Wikipedia and perform a preliminary evaluation on a sample dataset, obtaining promising results.
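
    The self-training loop itself can be sketched generically: train on the manually labeled triples, score the unlabeled pool, promote only high-confidence predictions into the training set, and repeat. Logistic regression over TF-IDF features stands in for the actual WikiOIE classifier, and the example triples are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def self_train(texts, labels, pool, rounds=3, confidence=0.9):
    # Iteratively move high-confidence predictions from the unlabeled
    # pool into the training set, then retrain.
    vec = TfidfVectorizer().fit(list(texts) + list(pool))
    X, y, pool = list(texts), list(labels), list(pool)
    clf = LogisticRegression()
    for _ in range(rounds):
        clf.fit(vec.transform(X), y)
        if not pool:
            break
        probs = clf.predict_proba(vec.transform(pool))
        keep = probs.max(axis=1) >= confidence
        preds = clf.classes_[probs.argmax(axis=1)]
        X += [t for t, k in zip(pool, keep) if k]
        y += [p for p, k in zip(preds, keep) if k]
        pool = [t for t, k in zip(pool, keep) if not k]
    return clf, vec

clf, vec = self_train(
    ["born in rome in 1950", "works for acme corp"],
    ["birthplace", "employer"],
    ["she was born in milan", "he works for globex"])
```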

    A virtual customer assistant for the wealth management domain in the UWMP project

    The Universal Wealth Management Platform (UWMP) project aims to create a new service model in the financial domain. An integral part of this service model is a new Virtual Customer Assistant that can assist customers through natural language dialogues. This paper reports on the activities performed to develop this assistant. It illustrates the general architecture of the system, describes the most important decisions made for its implementation, and presents the main financial operations with which it can assist customers. Finally, it outlines some avenues for future work.