25 research outputs found

    MQALD: Evaluating the impact of modifiers in question answering over knowledge graphs.

    Question Answering (QA) over Knowledge Graphs (KG) aims to develop a system that is capable of answering users’ questions using the information coming from one or multiple Knowledge Graphs, like DBpedia, Wikidata, and so on. Question Answering systems need to translate the user’s question, written in natural language, into a query formulated in a specific data query language that is compliant with the underlying KG. This translation process is already non-trivial when trying to answer simple questions that involve a single triple pattern. It becomes even more troublesome when trying to cope with questions that require modifiers in the final query, i.e., aggregate functions, query forms, and so on. Attention to this last aspect is growing, but it has never been thoroughly addressed in the existing literature. Building on the latest advances in this field, we take a further step in this direction. This work aims to provide a publicly available dataset designed for evaluating the performance of a QA system in translating articulated questions into a specific data query language. This dataset has also been used to evaluate three state-of-the-art QA systems

    WiC-ITA at EVALITA2023: Overview of the EVALITA2023 Word-in-Context for ITAlian Task

    WiC-ita is a shared task proposed at the EVALITA 2023 campaign. The task focuses on the meaning of words in specific contexts and has been modelled as both a binary classification and a ranking problem. Overall, 4 groups took part in both subtasks, with 9 different runs. In this report, we describe how the task was set up, we report the system results, and we discuss them

    LLaMAntino: LLaMA 2 Models for Effective Text Generation in Italian Language

    Large Language Models represent state-of-the-art linguistic models designed to equip computers with the ability to comprehend natural language. With its exceptional capacity to capture complex contextual relationships, the LLaMA (Large Language Model Meta AI) family represents a novel advancement in the field of natural language processing by releasing foundational models designed to improve the natural language understanding abilities of the transformer architecture thanks to their large number of trainable parameters (7, 13, and 70 billion parameters). In many natural language understanding tasks, these models obtain the same performance as private company models such as OpenAI Chat-GPT, with the advantage of making weights and code publicly available for research and commercial uses. In this work, we investigate the possibility of Language Adaptation for LLaMA models, explicitly focusing on addressing the challenge of Italian Language coverage. Adopting an open science approach, we explore various tuning approaches to ensure high-quality generated text in Italian, a language underrepresented in the original models' datasets, suitable for common tasks. We aim to release effective text generation models with strong linguistic properties for many tasks that seem challenging using multilingual or general-purpose LLMs. By leveraging an open science philosophy, this study contributes to Language Adaptation strategies for the Italian language by introducing the novel LLaMAntino family of Italian LLMs

    Is Explanation All You Need? An Expert Survey on LLM-generated Explanations for Abusive Language Detection

    Explainable abusive language detection has proven to help both users and content moderators, and recent research has focused on prompting LLMs to generate explanations for why a specific text is hateful. Yet, understanding the alignment of these generated explanations with human expectations and judgements is far from being solved. In this paper, we design a before-and-after study recruiting AI experts to evaluate the usefulness and trustworthiness of LLM-generated explanations for abusive language detection tasks, investigating multiple LLMs and learning strategies. Our experiments show that expectations in terms of usefulness and trustworthiness of LLM-generated explanations are not met, as their ratings decrease by 47.78% and 64.32%, respectively, after treatment. Further, our results suggest caution in using LLMs for explanation generation of abusive language detection due to (i) their cultural bias, and (ii) difficulty in reliably evaluating them with empirical metrics. In light of our results, we provide three recommendations to use LLMs responsibly for explainable abusive language detection

    Bi-directional LSTM-CNNs-CRF for Italian Sequence Labeling and Multi-Task Learning

    In this paper, we propose a Deep Learning architecture for several Italian Natural Language Processing tasks based on a state-of-the-art model that exploits both word- and character-level representations through the combination of bidirectional LSTM, CNN and CRF. This architecture provided state-of-the-art performance in several sequence labeling tasks for the English language. We exploit the same approach for the Italian language and extend it to perform multi-task learning involving PoS-tagging and sentiment analysis. Results show that the system is able to achieve state-of-the-art performance in all the tasks and in some cases outperforms the best systems previously developed for the Italian

    Semantically-Aware Retrieval of Oceanographic Phenomena Annotated on Satellite Images

    Scientists in the marine domain process satellite images in order to extract information that can be used for monitoring, understanding, and forecasting of marine phenomena, such as turbidity, algal blooms and oil spills. The growing need for effective retrieval of related information has motivated the adoption of semantically aware strategies on satellite images with different spatiotemporal and spectral characteristics. A big issue of these approaches is the lack of coincidence between the information that can be extracted from the visual data and the interpretation that the same data have for a user in a given situation. In this work, we bridge this semantic gap by connecting the quantitative elements of the Earth Observation satellite images with the qualitative information, modelling this knowledge in a marine phenomena ontology and developing a question answering mechanism based on natural language that enables the retrieval of the most appropriate data for each user’s needs. The main objective of the presented methodology is to realize the content-based search of Earth Observation images related to the marine application domain on an application-specific basis that can answer queries such as “Find oil spills that occurred this year in the Adriatic Sea”

    Ghigliottin-AI @ EVALITA2020: Evaluating Artificial Players for the Language Game “La Ghigliottina”

    Evaluating Artificial Players for the Language Game “La Ghigliottina” (Ghigliottin-AI) is one of the tasks organized in the context of the 2020 EVALITA edition, a periodic evaluation campaign of Natural Language Processing (NLP) and speech tools for the Italian language. Ghigliottin-AI participants are asked to build an artificial player able to solve “La Ghigliottina”, namely the final game of an Italian TV show called “L’Eredità”. The game involves a single player who is given a set of five words unrelated to each other, but related to a sixth word that represents the solution to the game. Fourteen teams registered for Ghigliottin-AI. Nevertheless, only two teams submitted a run. In order to evaluate the submitted systems, we rely on an API-based methodology, via a Remote Evaluation Server (RES). In this report we describe the Ghigliottin-AI task, the data, and the evaluation, and we discuss the results

    From exercise intolerance to functional improvement: The second wind phenomenon in the identification of McArdle disease

    McArdle disease is the most common of the glycogen storage diseases. Onset of symptoms is usually in childhood with muscle pain and restricted exercise capacity. Signs and symptoms are often ignored in children or put down to 'growing pains' and thus diagnosis is often delayed. Misdiagnosis is not uncommon because several other conditions such as muscular dystrophy and muscle channelopathies can manifest with similar symptoms. A simple exercise test performed in the clinic can however help to identify patients by revealing the second wind phenomenon which is pathognomonic of the condition. Here a patient is reported illustrating the value of using a simple 12 minute walk test. RSS is funded by Ciências sem Fronteiras/CAPES Foundation. The authors would like to thank the Association for Glycogen Storage Disease (UK), the EUROMAC Registry funded by the European Union, the Muscular Dystrophy Campaign, the NHS National Specialist Commissioning Group and the Myositis Support Group for funding

    Misdiagnosis is an important factor for diagnostic delay in McArdle disease

    Diagnosis of McArdle disease is frequently delayed by many years following the first presentation of symptoms to a health professional. The aim of this study was to investigate the importance of misdiagnosis in delaying diagnosis of McArdle disease. The frequency of misdiagnosis, duration of diagnostic delay, categories of misdiagnoses and inappropriate medical interventions were assessed in 50 genetically confirmed patients. The results demonstrated a high frequency of misdiagnosis (90%, n = 45/50) most commonly during childhood years (67%; n = 30/45) compared with teenage years and adulthood (teenage: n = 7/45; adult n = 5/45; not known n = 3/45). The correct diagnosis of McArdle disease was rarely made before adulthood (median age of diagnosis 33 years). Thirty-one patients (62%) reported having received more than one misdiagnosis; the most common were “growing pains” (40%, n = 20) and “laziness/being unfit” (46%, n = 23). A psychiatric/psychological misdiagnosis was significantly more common in females than males (females 6/20; males 1/30; p < 0.01). Of the 45 patients who were misdiagnosed, 21 (47%) received incorrect management. This study shows that most patients with McArdle disease received an incorrect explanation of their symptoms providing evidence that misdiagnosis plays an important part in delaying implementation of appropriate medical advice and management to this group of patients. The authors would like to thank Mr Andrew Wakelin for his great and inspiring work. The authors would also like to thank AGSD-UK, CAPES Foundation, Muscular Dystrophy Campaign and the Euromac Registry for their support

    EVALITA Evaluation of NLP and Speech Tools for Italian - December 17th, 2020

    Welcome to EVALITA 2020! EVALITA is the evaluation campaign of Natural Language Processing and Speech Tools for Italian. EVALITA is an initiative of the Italian Association for Computational Linguistics (AILC, http://www.ai-lc.it) and it is endorsed by the Italian Association for Artificial Intelligence (AIxIA, http://www.aixia.it) and the Italian Association for Speech Sciences (AISV, http://www.aisv.it)