
    Event-based Access to Historical Italian War Memoirs

    The progressive digitization of historical archives provides new, often domain-specific, textual resources that report on facts and events of the past; among these, memoirs are a very common type of primary source. In this paper, we present an approach for extracting information from Italian historical war memoirs and turning it into structured knowledge, based on the semantic notions of events, participants, and roles. We evaluate each key step of our approach quantitatively and provide a graph-based representation of the extracted knowledge, which makes it possible to move between a Close and a Distant Reading of the collection.
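    To make the event-participant-role model concrete, here is a minimal Python sketch of such a graph. The classes, role labels, and the example sentence are invented for illustration and do not reflect the authors' actual schema.

```python
from dataclasses import dataclass, field

# Minimal event graph: events and participants are nodes,
# roles label the edges between them.
@dataclass(frozen=True)
class Event:
    trigger: str        # e.g. the verb anchoring the event
    sentence_id: int

@dataclass(frozen=True)
class Participant:
    name: str

@dataclass
class EventGraph:
    edges: list = field(default_factory=list)  # (event, role, participant)

    def add(self, event, role, participant):
        self.edges.append((event, role, participant))

    def participants_of(self, event):
        return [(r, p) for (e, r, p) in self.edges if e == event]

# Toy example in the style of a war-memoir sentence (invented).
g = EventGraph()
ev = Event(trigger="bombardarono", sentence_id=12)
g.add(ev, "agent", Participant("artiglieria austriaca"))
g.add(ev, "location", Participant("Monte Grappa"))
print(g.participants_of(ev))
```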

    An automatically built named entity lexicon for Arabic

    We have successfully adapted and extended the automatic Multilingual, Interoperable Named Entity Lexicon approach to Arabic, using Arabic WordNet (AWN) and Arabic Wikipedia (AWK). First, we extract AWN's instantiable nouns and identify the corresponding categories and hyponym subcategories in AWK. Then, we exploit Wikipedia inter-lingual links to locate correspondences between articles in ten different languages in order to identify Named Entities (NEs). We apply keyword search on AWK abstracts to cover Arabic articles that have no correspondence in any of the other languages. In addition, we perform a post-processing step to fetch further NEs from AWK that are not reachable through AWN. Finally, we investigate diacritization, using matching against geonames databases, the MADA-TOKAN tools, and different heuristics for restoring the vowel marks of Arabic NEs. Using this methodology, we have extracted approximately 45,000 Arabic NEs and built, to the best of our knowledge, the largest, most mature and well-structured Arabic NE lexical resource to date. We have stored and organised this lexicon following the Lexical Markup Framework (LMF) ISO standard. We conduct a quantitative and qualitative evaluation of the lexicon against a manually annotated gold standard and achieve precision scores ranging from 95.83% (with 66.13% recall) to 99.31% (with 61.45% recall), depending on the threshold value.
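    The reported precision/recall trade-off comes from varying a confidence threshold over candidate entities. A minimal sketch of that evaluation step, with invented candidates and scores:

```python
# Sketch: scoring candidate NEs against a gold standard at different
# confidence thresholds (entries and scores below are hypothetical).
def precision_recall(candidates, gold, threshold):
    kept = {ne for ne, score in candidates.items() if score >= threshold}
    tp = len(kept & gold)
    precision = tp / len(kept) if kept else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall

candidates = {"دمشق": 0.97, "بغداد": 0.91, "كتاب": 0.40}  # NE -> confidence
gold = {"دمشق", "بغداد"}
for t in (0.5, 0.9, 0.95):
    p, r = precision_recall(candidates, gold, t)
    print(f"threshold={t}: precision={p:.2f}, recall={r:.2f}")
```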

    Ontology population for open-source intelligence: A GATE-based solution

    Open-Source INTelligence is intelligence based on publicly available sources such as news sites, blogs, and forums. The Web is the primary source of information, but once data are crawled, they need to be interpreted and structured. Ontologies may play a crucial role in this process, but because of the vast number of documents available, automatic mechanisms for populating them from the crawled text are needed. This paper presents an approach for the automatic population of predefined ontologies with data extracted from text and discusses the design and realization of a pipeline based on the General Architecture for Text Engineering (GATE) system, which is of interest to both researchers and practitioners in the field. Some experimental results, encouraging in terms of correctly extracted ontology instances, are also reported. Furthermore, the paper describes an alternative approach, with additional experiments, for one phase of our pipeline, which requires predefined dictionaries of relevant entities; this variant reduces the manual workload required in that phase while still obtaining promising results.
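    The dictionary-based phase mentioned above rests on gazetteer-style lookup. The sketch below is plain Python illustrating that idea, not the GATE API, and its entity lists are invented:

```python
import re

# Gazetteer-style lookup: scan text for known entries and emit
# (start, end, type, entry) annotations.
GAZETTEER = {
    "organization": {"United Nations", "Europol"},
    "location": {"Rome", "Brussels"},
}

def annotate(text):
    annotations = []
    for etype, entries in GAZETTEER.items():
        for entry in entries:
            for m in re.finditer(re.escape(entry), text):
                annotations.append((m.start(), m.end(), etype, entry))
    return sorted(annotations)

print(annotate("Europol met United Nations officials in Brussels."))
```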

    Survey on Evaluation Methods for Dialogue Systems

    In this paper we survey the methods and concepts developed for the evaluation of dialogue systems. Evaluation is a crucial part of the development process, and dialogue systems are often evaluated by means of human judgements and questionnaires; however, this tends to be very cost- and time-intensive. Thus, much work has been put into finding methods that reduce the amount of human labour involved. In this survey, we present the main concepts and methods, differentiating between the various classes of dialogue systems (task-oriented dialogue systems, conversational dialogue systems, and question-answering dialogue systems). For each class, we introduce the main technologies developed for those dialogue systems and then present the evaluation methods used for that class.
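    For task-oriented systems, one widely used automatic measure is task-success rate. A minimal sketch, with invented dialogues represented as requested vs. provided slot sets:

```python
# Sketch: task-success rate for task-oriented dialogue systems.
# Each dialogue is (requested_slots, provided_slots); success means
# every requested slot was provided.
def task_success_rate(dialogues):
    successes = sum(1 for goal, offered in dialogues if goal <= offered)
    return successes / len(dialogues)

dialogues = [
    ({"cuisine", "area"}, {"cuisine", "area", "phone"}),  # all slots met
    ({"price", "area"}, {"area"}),                        # 'price' missing
]
print(f"success rate: {task_success_rate(dialogues):.0%}")  # 50%
```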

    Advancing Italian Biomedical Information Extraction with Large Language Models: Methodological Insights and Multicenter Practical Application

    The introduction of computerized medical records in hospitals has reduced burdensome operations like manual writing and information fetching. However, the data contained in medical records are still largely underutilized, primarily because extracting them from unstructured textual medical records takes time and effort. Information Extraction, a subfield of Natural Language Processing, can help clinical practitioners overcome this limitation through automated text-mining pipelines. In this work, we created the first Italian neuropsychiatric Named Entity Recognition dataset, PsyNIT, and used it to develop a Large Language Model for this task. Moreover, we conducted several experiments with three external independent datasets to implement an effective multicenter model, with an overall F1-score of 84.77% (precision 83.16%, recall 86.44%). The lessons learned are: (i) the crucial role of a consistent annotation process and (ii) a fine-tuning strategy that combines classical methods with a "few-shot" approach. This allowed us to establish methodological guidelines that pave the way for future implementations in this field and allow Italian hospitals to tap into important research opportunities.
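    One way to read the "few-shot" ingredient is prompt construction with a handful of annotated examples prepended to the sentence to be tagged. The sketch below illustrates that idea only; the examples and tag set are invented and are not drawn from PsyNIT:

```python
# Hypothetical few-shot NER prompt: a handful of tagged examples
# followed by the sentence the model should tag.
FEW_SHOT = [
    ("Il paziente assume sertralina.", "[DRUG: sertralina]"),
    ("Riferisce insonnia persistente.", "[SYMPTOM: insonnia]"),
]

def build_prompt(sentence):
    lines = ["Tag the named entities in the last sentence."]
    for text, tags in FEW_SHOT:
        lines.append(f"Sentence: {text}\nEntities: {tags}")
    lines.append(f"Sentence: {sentence}\nEntities:")
    return "\n".join(lines)

print(build_prompt("In terapia con quetiapina per ansia."))
```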

    Sentiment analysis in context: Investigating the use of BERT and other techniques for ChatBot improvement

    In an increasingly digitized world, where large amounts of data are generated daily, the need to analyse them efficiently has become ever more pressing. Natural Language Processing (NLP) offers a solution by exploiting the power of artificial intelligence to process texts, understand their content, and perform specific tasks. The thesis is based on an internship at Pat Srl, a company devoted to creating solutions that support digital innovation, process automation, and service quality, with the ultimate goal of improving leadership and customer satisfaction. The primary objective of this thesis is to develop a sentiment analysis model in order to improve the customer experience for clients using the ChatBot system created by the company itself. This task has gained significant attention in recent years, as it can be applied to different fields, including social media monitoring, market research, brand monitoring, and customer experience and feedback analysis. Following a careful analysis of the available data, a comprehensive evaluation of various models was conducted. Notably, BERT, a large language model that has provided promising results in several NLP tasks, emerged among them. Different approaches utilizing BERT models were explored, such as the fine-tuning modality and the architectural structure. Moreover, some preprocessing steps were emphasized and studied, owing to the particular nature of the sentiment analysis task.
During the course of the internship, the dataset underwent revisions aimed at mitigating the problem of inaccurate predictions. Additionally, techniques for data balancing were tested and evaluated, enhancing the overall quality of the analysis. Another important aspect of this project involved the deployment of the model: in a business environment, it is essential to carefully consider and balance resources before transitioning to production. The model was distributed using specific tools, such as Docker and Kubernetes, which played a pivotal role in ensuring efficient and seamless deployment.
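    As a concrete starting point for the kind of sentiment scoring described, here is a hedged sketch using an off-the-shelf multilingual BERT sentiment model from Hugging Face `transformers`; the model shown is a public default, not the model developed in the thesis:

```python
# Sketch: scoring chatbot turns with a pretrained BERT-based sentiment
# pipeline (outputs 1-5 star labels with confidence scores).
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="nlptown/bert-base-multilingual-uncased-sentiment")

turns = [
    "Il servizio è stato rapidissimo, grazie!",
    "Non ho ricevuto alcuna risposta utile.",
]
for turn, result in zip(turns, classifier(turns)):
    print(f"{result['label']:>8}  {result['score']:.2f}  {turn}")
```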