
    Analysis of lexical semantic changes in corpora with the diachronic engine

    With the growing availability of digitized diachronic corpora, the need for tools that can take into account the diachronic component of corpora becomes ever more pressing. Recent work on diachronic embeddings shows that computational approaches to the diachronic analysis of language are promising, but they are not user-friendly for people without a technical background. This paper presents the Diachronic Engine, a system for the diachronic analysis of the lexical features of corpora. The Diachronic Engine computes word frequencies, concordances, and collocations while taking the temporal dimension into account. It can also compute temporal word embeddings and time series that can be exploited for lexical semantic change detection.
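    The abstract does not publish the engine's internals, so the following is only a minimal sketch, assuming word-embedding vectors trained separately per time slice, of how a time series of cosine distances between a word's consecutive temporal embeddings can be turned into candidate change points; the vectors below are random placeholders.

```python
import numpy as np

def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

rng = np.random.default_rng(0)
dim, n_slices = 50, 6
# placeholder embeddings of one word, one vector per time slice;
# a real system would train these on the corresponding corpus slices
vectors = [rng.normal(size=dim) for _ in range(n_slices)]

# time series of semantic shift between consecutive slices
series = [cosine_distance(vectors[t], vectors[t + 1])
          for t in range(n_slices - 1)]

# flag shifts larger than mean + 1 std as candidate change points
mu, sigma = np.mean(series), np.std(series)
change_points = [t for t, d in enumerate(series) if d > mu + sigma]
print(change_points)
```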

    A comparative study of approaches for the diachronic analysis of the Italian language

    In recent years, there has been a significant increase in interest in lexical semantic change detection. Many approaches, datasets, and evaluation strategies exist for detecting semantic drift. Most of these approaches rely on diachronic word embeddings. Some of them are created by post-processing static word embeddings, while others produce dynamic word embeddings whose vectors share the same geometric space across all time slices. The large majority of methods use English as the target language for diachronic analysis, while other languages remain under-explored. In this work, we compare state-of-the-art approaches in computational historical linguistics to evaluate the pros and cons of each model, and we present the results of an in-depth analysis conducted on an Italian diachronic corpus. Specifically, several approaches based on both static and dynamic embeddings are implemented and evaluated using the Kronos-It dataset. We train all word embeddings on the Italian Google n-gram corpus. The main result of the evaluation is that all approaches fail to significantly reduce the number of false-positive change points, which confirms that lexical semantic change detection is still a challenging task.
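    Static embeddings trained independently on different time slices live in different coordinate systems, so post-processing approaches typically align them before comparison. The paper's alignment code is not given here; below is a minimal sketch of the standard orthogonal Procrustes alignment on synthetic matrices, which is one common choice for this step.

```python
import numpy as np

def procrustes_align(X, Y):
    # orthogonal W minimizing ||X @ W - Y||_F (Schoenemann's solution)
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

rng = np.random.default_rng(1)
vocab, dim = 100, 50
X = rng.normal(size=(vocab, dim))                  # embeddings, slice t
R = np.linalg.qr(rng.normal(size=(dim, dim)))[0]   # hidden rotation
Y = X @ R + 0.01 * rng.normal(size=(vocab, dim))   # embeddings, slice t+1

W = procrustes_align(X, Y)
aligned = X @ W

# per-word cosine distance after alignment; large values = drift candidates
shift = 1 - np.sum(aligned * Y, axis=1) / (
    np.linalg.norm(aligned, axis=1) * np.linalg.norm(Y, axis=1))
print(shift.max())
```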

    A deep learning model for the analysis of medical reports in ICD-10 clinical coding task

    The practice of assigning a uniquely identifiable and easily traceable code to the pathologies in medical diagnoses adds value to the current way of archiving the health data collected to build each person's clinical history. Unfortunately, the enormous number of possible pathologies and medical conditions has led to extremely broad international codifications that are difficult to consult even for a human being. This difficulty makes annotating diagnoses with ICD-10 codes very cumbersome, and it is rarely done. To support this operation, we propose a classification model able to analyze medical diagnoses written in natural language and automatically assign one or more international reference codes. The model is based on a two-step classification process built on BERT and BiLSTM. It has been evaluated on a dataset released in Spanish for the eHealth challenge (CodiEsp) of the international conference CLEF 2020, but it could be extended to any language with Latin characters. Although the accuracy is still far from sufficient to do without a licensed physician's opinion, the results obtained show the feasibility of the task and are a starting point for future studies in this direction.
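    The abstract describes a two-step process: a first model narrows the huge ICD-10 code space to a few candidates, and a second model decides which candidates to assign. The sketch below captures only that pipeline shape; the BERT and BiLSTM components are replaced by a TF-IDF plus logistic-regression stand-in, and the diagnoses, codes, and thresholds are toy assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# toy diagnoses paired with toy ICD-10 labels
texts = ["acute bronchitis with cough", "type 2 diabetes mellitus",
         "chronic bronchitis", "diabetes with renal complication"]
codes = ["J20", "E11", "J42", "E11"]

vec = TfidfVectorizer().fit(texts)
clf = LogisticRegression(max_iter=1000).fit(vec.transform(texts), codes)

def predict_codes(report, k=2, threshold=0.3):
    proba = clf.predict_proba(vec.transform([report]))[0]
    ranked = sorted(zip(clf.classes_, proba), key=lambda p: -p[1])
    candidates = ranked[:k]                    # step 1: pre-filter top-k
    return [c for c, p in candidates if p >= threshold]  # step 2: assign

print(predict_codes("patient with diabetes mellitus type 2"))
```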

    Extracting relations from Italian Wikipedia using self-training

    In this paper, we describe a supervised approach for extracting relations from Wikipedia. In particular, we exploit a self-training strategy to enrich a small number of manually labeled triples with new self-labeled examples. We integrate the supervised stage into WikiOIE, an existing framework for the unsupervised extraction of relations from Wikipedia, and rely on WikiOIE's unsupervised pipeline for extracting the initial set of unlabeled triples. An evaluation involving different algorithms and parameters shows that self-training helps to improve performance. Finally, we provide a dataset of about three million triples extracted from the Italian version of Wikipedia and perform a preliminary evaluation on a sample dataset, obtaining promising results.
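    The self-training loop itself is generic: train on the labeled seed, score the unlabeled pool, and fold back only high-confidence predictions. Below is a minimal sketch of that loop with synthetic features and a logistic-regression classifier; WikiOIE's actual features, classifier, and confidence threshold may differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X_lab = rng.normal(size=(20, 5))               # seed triples (features)
y_lab = (X_lab[:, 0] > 0).astype(int)          # seed labels
X_pool = rng.normal(size=(200, 5))             # unlabeled triples

clf = LogisticRegression()
for _ in range(3):                             # a few self-training rounds
    clf.fit(X_lab, y_lab)
    proba = clf.predict_proba(X_pool)
    keep = proba.max(axis=1) > 0.9             # high-confidence only
    if not keep.any():
        break
    X_lab = np.vstack([X_lab, X_pool[keep]])
    y_lab = np.concatenate([y_lab, proba.argmax(axis=1)[keep]])
    X_pool = X_pool[~keep]                     # shrink the pool

print(len(y_lab), "labeled examples after self-training")
```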

    A domain-independent framework for building conversational recommender systems

    Conversational Recommender Systems (CoRSs) implement a paradigm in which users interact with the system to define their preferences and discover items that best fit their needs. A CoRS can be straightforwardly implemented as a chatbot. Chatbots are becoming more and more popular in applications such as customer care, health care, and medical diagnosis. In its most complex form, implementing a chatbot is a challenging task, since it requires knowledge of natural language processing, human-computer interaction, and more. In this paper, we propose a general framework that makes it easy to build conversational recommender systems. The framework, based on a content-based recommendation algorithm, is independent of the domain: it allows a conversational recommender system with different interaction modes (natural language, buttons, hybrid) to be built for any domain. The framework has been evaluated on two state-of-the-art datasets with the aim of identifying the components that most influence the final recommendation accuracy.
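    The abstract only states that the underlying recommender is content-based; as an illustration of that family, here is a minimal sketch in which a user profile is the average of the representations of liked items and candidates are ranked by cosine similarity. The item catalog, feature choice, and profile construction are assumptions, not the framework's actual design.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# toy item catalog with textual content features
items = {"m1": "space opera adventure", "m2": "romantic comedy",
         "m3": "sci-fi space exploration", "m4": "courtroom drama"}
liked = ["m1"]                                 # preferences from the dialogue

vec = TfidfVectorizer()
M = vec.fit_transform(items.values()).toarray()
ids = list(items)
profile = M[[ids.index(i) for i in liked]].mean(axis=0)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)

scores = {i: cosine(profile, M[k])
          for k, i in enumerate(ids) if i not in liked}
print(max(scores, key=scores.get))             # "m3": shares "space" with "m1"
```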

    A study of Machine Learning models for Clinical Coding of Medical Reports at CodiEsp 2020

    The task of identifying one or more diseases associated with a patient’s clinical condition is often very complex, even for doctors and specialists. The process is usually time-consuming and has to take into account different aspects of what has occurred, including the symptoms elicited and previous healthcare situations. The medical diagnosis is often handed to patients as a written document with no link to any national or international standard. Even though the WHO (World Health Organization) released the ICD-10 international glossary of diseases, almost no doctor has enough time to manually associate a patient’s clinical history with international codes. The CodiEsp task at CLEF 2020 addressed this issue by proposing the development of an automatic system for this task. Our solution investigated different machine learning strategies in order to identify an approach that can face this challenge. The main outcome of the experiments is that a strategy based on BERT for pre-filtering combined with one based on BiLSTM-CNN-SelfAttention for classification provides valuable results. We carried out several experiments on a subset of the training set to tune the final model submitted to the challenge. In particular, we analyzed the impact of the algorithm, the input encoding strategy, and the thresholds for multi-label classification. A further set of experiments was carried out in a post hoc analysis. The experiments confirmed that the strategy submitted to the CodiEsp task is the best performing one among those evaluated, and it allowed us to obtain a final mean average error of 0.202 on the test set. To support future developments of the proposed approach and the replicability of the experiments, we have made the source code publicly accessible.
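    One of the analyses the abstract mentions is the choice of thresholds for multi-label classification. A common way to do this, sketched below on synthetic validation scores, is to sweep a grid of thresholds per label and keep the one that maximizes F1; the paper's exact tuning procedure is not given here, so treat this as an assumption.

```python
import numpy as np

def f1(y_true, y_pred):
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

rng = np.random.default_rng(3)
n, n_labels = 500, 4
y = rng.random((n, n_labels)) < 0.2            # synthetic gold label matrix
scores = np.clip(y + rng.normal(0, 0.4, (n, n_labels)), 0, 1)

thresholds = []
for j in range(n_labels):                      # one threshold per label
    grid = np.linspace(0.1, 0.9, 17)
    best = max(grid, key=lambda t: f1(y[:, j], scores[:, j] >= t))
    thresholds.append(round(float(best), 2))
print(thresholds)
```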

    A comparison of services for intent and entity recognition for conversational recommender systems

    Conversational Recommender Systems (CoRSs) are becoming increasingly popular. However, designing and developing a CoRS is a challenging task, since it requires multi-disciplinary skills. Even though several third-party services are available for supporting the creation of a CoRS, a comparative study of these platforms for the specific recommendation task is not yet available. In this work, we focus on two crucial steps of the Conversational Recommendation (CoR) process, namely Intent and Entity Recognition. We compared four of the most popular services, both commercial and open source. Furthermore, we proposed two custom-made solutions for Entity Recognition whose aim is to overcome the limitations of the other services. The results are very interesting and give a clear picture of the strengths and weaknesses of each solution.
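    The abstract does not detail the custom entity recognizers; one plausible baseline in this setting, sketched below, matches n-grams of the user utterance against a known item catalog with fuzzy string similarity. The catalog, n-gram range, and threshold are illustrative assumptions rather than the paper's configuration.

```python
from difflib import SequenceMatcher

catalog = ["The Matrix", "Blade Runner", "Pulp Fiction"]   # toy item names

def recognize_entities(utterance, threshold=0.8):
    tokens = utterance.lower().split()
    found = set()
    for n in (1, 2, 3):                        # span lengths to try
        for i in range(len(tokens) - n + 1):
            span = " ".join(tokens[i:i + n])
            for title in catalog:
                score = SequenceMatcher(None, span, title.lower()).ratio()
                if score >= threshold:
                    found.add((title, round(score, 2)))
    return sorted(found, key=lambda x: -x[1])

print(recognize_entities("I loved blade runner, recommend something similar"))
```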

    Early findings from a large-scale user study of CHESTNUT: Validations and implications

    Towards a serendipitous recommender system with a user-centred understanding, we have built CHESTNUT, an Information Theory-based movie recommender system that introduced a more comprehensive understanding of the concept. Although offline evaluations have already demonstrated that CHESTNUT greatly improves serendipity performance, feedback on CHESTNUT from real-world users of online services was still unclear. In order to evaluate how serendipitous CHESTNUT's results are, we designed, organized, and conducted a large-scale user study involving 104 participants from 10 campuses in 3 countries. The preliminary feedback shows that, compared with mainstream collaborative filtering techniques, although CHESTNUT limited users' feelings of unexpectedness to some extent, it significantly improved their sense that recommendations were both beneficial and interesting, which substantially increased users' experience of serendipity. Based on these findings, we summarize three key takeaways which, from our perspective, would benefit further design and engineering of serendipitous recommender systems. All details of our large-scale user study can be found at https://github.com/unnc-idl-ucc/Early-Lessons-From-CHESTNU

    CHESTNUT: Improve serendipity in movie recommendation by an Information Theory-based collaborative filtering approach

    The term serendipity has been understood narrowly in recommender systems. By applying a user-centered approach, user-friendly serendipitous recommender systems can be developed on the basis of a good understanding of serendipity. In this paper, we introduce CHESTNUT, a memory-based collaborative filtering movie recommendation system built to improve serendipity performance. Relying on a proposed Information Theory-based algorithm and a previous study, we demonstrate a method for successfully injecting insight, unexpectedness, and usefulness, which are key metrics for a more comprehensive understanding of serendipity, into a practical serendipitous runtime system. Through lightweight experiments, we revealed a few runtime issues and further optimized them. We have evaluated CHESTNUT for both practicability and effectiveness, and the results show that it is fast and scalable and improves serendipity performance significantly compared with mainstream memory-based collaborative filtering. The source code of CHESTNUT is online at https://github.com/unnc-idl-ucc/CHESTNUT/
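    CHESTNUT's published algorithm is not reproduced in the abstract, so the following is only an illustrative sketch of the general idea of biasing memory-based collaborative filtering toward unexpected items: neighborhood predictions are re-weighted by item self-information (-log popularity), so rarer items get a boost. The weighting scheme and data are assumptions, not CHESTNUT's actual method.

```python
import numpy as np

rng = np.random.default_rng(4)
R = (rng.random((30, 12)) < 0.3).astype(float)   # toy user-item matrix
user = 0

# cosine similarity between the target user and all other users
norms = np.linalg.norm(R, axis=1) + 1e-12
sims = (R @ R[user]) / (norms * norms[user])
sims[user] = 0.0                                 # exclude the user itself

pred = sims @ R / (sims.sum() + 1e-12)           # neighborhood prediction
popularity = R.mean(axis=0) + 1e-12
surprise = -np.log(popularity)                   # item self-information
score = pred * surprise                          # serendipity-weighted score
score[R[user] > 0] = -np.inf                     # hide already-seen items
print("recommended item:", int(np.argmax(score)))
```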