43 research outputs found

    The AINA Project, Artificial Intelligence, and Language Technologies

    One of the most relevant areas of AI is natural language processing (NLP). Although most large language models are now multilingual, there is a substantial gap between their capabilities in English and in other languages. The AINA project therefore aims to develop the infrastructure needed to make the inclusion of Catalan in AI applications appealing and feasible. This article presents the objectives of the project and explains its main characteristics.

    The Harvesting Day: an initiative to enhance the visibility of language resources

    The Harvesting Day is an initiative to ensure the visibility, accessibility and description of language resources by means of a basic metadata schema. The initiative advocates a change of strategy: resource and technology providers become responsible for the visibility of their own resources, as well as for their documentation. Once language resource descriptions are appropriately created and stored, the corresponding metadata are automatically and periodically harvested and sent to the main virtual repositories and catalogues. This guarantees not only the visibility of language resources and technologies, but also the reliability of their data, which are thereby kept up to date. Funding: Ministerio de Ciencia e Innovación; Departament d’Innovació, Universitats i Empresa de la Generalitat de Catalunya.

    De Reglas Léxicas a Marcos de Subcategorización Complejos en la Jerarquía de Tipos

    In HPSG the grammar consists of a type hierarchy and a set of principles. In fact, principles of the grammar (including ID Schemata) are constraints over feature structures and can easily be expressed as such. With this, HPSG manages to model all linguistic knowledge in terms of typed feature structures without resorting to principles or rules. Inheritance and Lexical Rules (LRs) make it possible to eliminate redundancy. Broadly speaking, inheritance avoids 'vertical' redundancy and LRs avoid 'horizontal' redundancy. LRs, however, are not part of the formalism: they are not typed and fall outside the linguistic taxonomy. In this paper we investigate ways of expressing the generalization power of LRs without using LRs. We integrate LRs as feature structures using disjunction and simulating the closure operator by means of the inheritance mechanism of the hierarchy itself. We can express LR generalizations using new types with extended feature structures and modified inheritance mechanisms allowing for disjunctive inheritance. This makes it possible to homogenize linguistic knowledge representation.

    Transferencia de Tareas basada en Implicación Textual para la Clasificación de Textos en Catalán en Escenarios de Pocos Datos

    This study investigates the application of a state-of-the-art zero-shot and few-shot natural language processing (NLP) technique for text classification tasks in Catalan, a moderately under-resourced language. The approach reformulates the downstream task as textual entailment, which is then solved by an entailment model (a language model trained on a Natural Language Inference (NLI) corpus). However, unlike English, where entailment models can be trained on huge NLI datasets, Catalan has only a moderately sized NLI corpus, which poses a challenge. In this context, we compare training on monolingual resources with training on (larger) multilingual ones, and identify the strengths and weaknesses of the monolingual and multilingual versions of the individual components of entailment models: the pre-trained language model and the NLI training dataset. Furthermore, we propose and implement a simple task transfer strategy using open Wikipedia resources that yields significant performance improvements, providing a practical and effective alternative for languages with limited or no NLI datasets. This work was funded by the Generalitat de Catalunya (Projecte AINA), the Basque Government (excellence research group IT1570-22) and by the DeepKnowledge (PID2021-127777OB-C21) project funded by MCIN/AEI/10.13039/501100011033.
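
The reformulation of classification as textual entailment described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hypothesis template, the function names, and in particular `toy_entail_score` (a word-overlap stand-in for a trained NLI model) are assumptions made only so the example runs on its own.

```python
from typing import Callable, Dict, List


def classify_by_entailment(
    text: str,
    labels: List[str],
    entail_score: Callable[[str, str], float],
    template: str = "This text is about {}.",  # hypothetical template
) -> Dict[str, float]:
    """Turn each label into a hypothesis and score it against the text."""
    scores = {}
    for label in labels:
        hypothesis = template.format(label)
        # The text is the premise; the label sentence is the hypothesis.
        scores[label] = entail_score(text, hypothesis)
    return scores


def toy_entail_score(premise: str, hypothesis: str) -> float:
    """Stand-in for an entailment model (assumption, not a real NLI model):
    the fraction of hypothesis words that also occur in the premise."""
    premise_words = set(premise.lower().replace(".", "").split())
    hypothesis_words = hypothesis.lower().replace(".", "").split()
    hits = sum(1 for word in hypothesis_words if word in premise_words)
    return hits / len(hypothesis_words)


def predict_label(
    text: str,
    labels: List[str],
    entail_score: Callable[[str, str], float],
) -> str:
    """Return the label whose hypothesis receives the highest score."""
    scores = classify_by_entailment(text, labels, entail_score)
    return max(scores, key=scores.get)
```

In practice `entail_score` would be replaced by the entailment probability of a model fine-tuned on an NLI corpus; because no label-specific training data is needed at this step, the setup is zero-shot.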

    Vickybot, a Chatbot for Anxiety-Depressive Symptoms and Work-Related Burnout in Primary Care and Health Care Professionals: Development, Feasibility, and Potential Effectiveness Studies

    Background: Many people attending primary care (PC) have anxiety-depressive symptoms and work-related burnout compounded by a lack of resources to meet their needs. The COVID-19 pandemic has exacerbated this problem, and digital tools have been proposed as a solution. Objective: We aimed to present the development, feasibility, and potential effectiveness of Vickybot, a chatbot aimed at screening, monitoring, and reducing anxiety-depressive symptoms and work-related burnout, and detecting suicide risk in patients from PC and health care workers. Methods: Healthy controls (HCs) tested Vickybot for reliability. For the simulation study, HCs used Vickybot for 2 weeks to simulate different clinical situations. For the feasibility and effectiveness study, people consulting PC or health care workers with mental health problems used Vickybot for 1 month. Self-assessments for anxiety (Generalized Anxiety Disorder 7-item) and depression (Patient Health Questionnaire-9) symptoms and work-related burnout (based on the Maslach Burnout Inventory) were administered at baseline and every 2 weeks. Feasibility was determined from both subjective and objective user-engagement indicators (UEIs). Potential effectiveness was measured using paired 2-tailed t tests or Wilcoxon signed-rank test for changes in self-assessment scores. Results: Overall, 40 HCs tested Vickybot simultaneously, and the data were reliably transmitted and registered. For simulation, 17 HCs (n=13, 76% female; mean age 36.5, SD 9.7 years) received 98.8% of the expected modules. Suicidal alerts were received correctly. For the feasibility and potential effectiveness study, 34 patients (15 from PC and 19 health care workers; 76% [26/34] female; mean age 35.3, SD 10.1 years) completed the first self-assessments, with 100% (34/34) presenting anxiety symptoms, 94% (32/34) depressive symptoms, and 65% (22/34) work-related burnout. 
In addition, 27% (9/34) of patients completed the second self-assessment after 2 weeks of use. No significant differences were found between the first and second self-assessments for anxiety (t8=1.000; P=.34) or depressive (t8=0.40; P=.70) symptoms. However, work-related burnout scores were moderately reduced (z=−2.07, P=.04, r=0.32). There was a nonsignificant trend toward a greater reduction in anxiety-depressive symptoms and work-related burnout with greater use of the chatbot. Furthermore, 9% (3/34) of patients activated the suicide alert, and the research team promptly intervened with successful outcomes. Vickybot showed high subjective UEI (acceptability, usability, and satisfaction), but low objective UEI (completion, adherence, compliance, and engagement). Vickybot was moderately feasible. Conclusions: The chatbot was useful in screening for the presence and severity of anxiety and depressive symptoms, and for detecting suicidal risk. Potential effectiveness was shown to reduce work-related burnout but not anxiety or depressive symptoms. Subjective perceptions of use contrasted with low objective-use metrics. Our results are promising but suggest the need to adapt and enhance the smartphone-based solution to improve engagement. A consensus on how to report UEIs and validate digital solutions, particularly for chatbots, is required. We are grateful to all participants. GA is supported by a Rio Hortega 2021 grant (CM21/00017) from the Spanish Ministry of Health financed by the Instituto de Salud Carlos III (ISCIII) and cofinanced by Fondo Social Europe Plus. MS was supported by a grant from the Baszucki Brain Research Fund. AM is supported by the Agència de Gestió d’Ajudes Universitàries I de Investigació—PANDÈMIES 2020 grant (PI047003) from the Generalitat de Catalunya. 
IG thanks the support of the Spanish Ministry of Science and Innovation (PI19/00954) integrated into the Plan Nacional de I+D+I and cofinanced by the ISCIII-Subdirección General de Evaluación y el Fondos Europeos de la Unión Europea (FEDER, Fondo Social Europe, Next Generation European Union or Plan de Recuperación Transformación y Resiliencia_PRTR); the Instituto de Salud Carlos III; the CIBER of Mental Health (CIBERSAM); and the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement (2017 SGR 1365), CERCA Programme or Generalitat de Catalunya as well as the Fundació Clínic per la Recerca Biomèdica (Pons Bartran 2022-FRCB_PB1_2022). AHY’s independent research was funded by the National Institute for Health Research Biomedical Research Centre in South London and Maudsley National Health Service Foundation Trust and King’s College London. The views expressed are those of the authors and not necessarily those of the National Health Service, National Institute for Health and Care Research, or Department of Health. 
JR is supported by a Miguel Servet II contract (CPII19/00009), funded by ISCIII and cofunded by the European Social Fund “Investing in your future.” CT has been supported through a “Miguel Servet” postdoctoral contract (CPI14/00175) and a Miguel Servet II contract (CPII19/00018) and thanks the support of the Spanish Ministry of Innovation and Science (PI17/01066 and PI20/00344), funded by the Instituto de Salud Carlos III and cofinanced by the European Union (FEDER) “Una manera de hacer Europa.” AMA thanks the support of the Spanish Ministry of Science and Innovation (PI18/00789, PI21/00787) integrated into the Plan Nacional de I+D+I and cofinanced by ISCIII-Subdirección General de Evaluación and the Fondo Europeo de Desarrollo Regional (FEDER); the ISCIII; the CIBER of Mental Health (CIBERSAM); the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement (2017 SGR 1365); the CERCA Programme; and the Departament de Salut de la Generalitat de Catalunya for the Pla estratègic de recerca I innovació en salut (PERIS) grant SLT006/17/00177. AM thanks the support of the Spanish Ministry of Science and Innovation (PI19/00672) integrated into the Plan Nacional de I+D+I and cofinanced by the ISCIII-Subdirección General de Evaluación and the FEDER. GF is supported by a fellowship from “La Caixa” Foundation (ID 100010434)—fellowship code—LCF/BQ/DR21/11880019. 
SA has been supported by a Sara Borrell contract (CD20/00177), funded by ISCIII and cofunded by the European Social Fund “Investing in your future.” EV thanks the support of the Spanish Ministry of Science, Innovation and Universities (PI15/00283, PI18/00805, PI19/00394, PI21/00787, and CPII19/00009) integrated into the Plan Nacional de I+D+I and cofinanced by the ISCIII-Subdirección General de Evaluación and the FEDER; the ISCIII; the CIBER of Mental Health (CIBERSAM); the Secretaria d’Universitats i Recerca del Departament d’Economia i Coneixement (2017 SGR 1365), and the CERCA Programme or Generalitat de Catalunya. We would like to thank the Departament de Salut de la Generalitat de Catalunya for the PERIS grant SLT006/17/00357. DHM's research was supported by Juan Rodés JR18/00021 granted by the ISCIII. The PRESTO project has been funded by Fundació Clínic per a la Recerca Biomèdica through the Pons Bartran 2020 grant (PI046549). The development of a version of the digital solution adapted to health workers is funded by the Spanish Foundation for Psychiatry and Mental Health, Spanish Psychiatric Society, and Spanish Society of Biological Psychiatry (PI046813). 
The enhancement of the digital solution with Natural Language Processing techniques in a chatbot user-interface in collaboration with the text mining technologies in the health domain of the Barcelona Supercomputing Center is funded by the Agència de Gestió d’Ajudes Universitàries I de Investigació—PANDÈMIES 2020 grant (PI047003), from La Generalitat de Catalunya. Peer Reviewed. Article signed by 50 authors: Gerard Anmella; Miriam Sanabra; Mireia Primé-Tous; Xavier Segú; Myriam Cavero; Ivette Morilla; Iria Grande; Victoria Ruiz; Ariadna Mas; Inés Martín-Villalba; Alejandro Caballo; Julia-Parisad Esteva; Arturo Rodríguez-Rey; Flavia Piazza; Francisco José Valdesoiro; Claudia Rodriguez-Torrella; Marta Espinosa; Giulia Virgili; Carlota Sorroche; Alicia Ruiz; Aleix Solanes; Joaquim Radua; María Antonieta Also; Elisenda Sant; Sandra Murgui; Mireia Sans-Corrales; Allan H Young; Victor Vicens; Jordi Blanch; Elsa Caballeria; Hugo López-Pelayo; Clara López; Victoria Olivé; Laura Pujol; Sebastiana Quesada; Brisa Solé; Carla Torrent; Anabel Martínez-Aran; Joana Guarch; Ricard Navinés; Andrea Murru; Giovanna Fico; Michele de Prisco; Vicenzo Oliva; Silvia Amoretti; Casimiro Pio-Carrino; María Fernández-Canseco; Marta Villegas; Eduard Vieta; Diego Hidalgo-Mazzei. Postprint (published version)

    Efforts to foster biomedical text mining efforts beyond English: the Spanish national strategic plan for language technologies

    Considerable effort has been made to apply text mining technologies to biomedical literature and clinical records written in English, while attempts to process documents in other languages have attracted far less attention despite their practical relevance. Due to the considerable number of biomedical documents written in Spanish, there is a pressing need to be able to access biomedical and clinical text mining resources developed for this high-impact language. To address this issue, the Spanish Ministry of State for Telecommunications launched the Plan for Promotion of Language Technologies in the field of biomedicine with the aim of providing specialized technical support to research and development of software solutions adapted to this domain. This article briefly describes the main lines of action of this project in its initial stages, namely: (a) identifying relevant biomedical NLP resources and tools, (b) examining and enabling system interoperability, (c) outlining strategies and support for evaluation settings, (d) disseminating the project and its results, and (e) aligning and collaborating with other related national and international projects. Moreover, we have identified some of the critical biomedical text processing tasks that require additional research and availability of tools.

    Trends in Adherence to the Mediterranean Diet in Spanish Children and Adolescents across Two Decades.

    Unhealthy dietary habits determined during childhood may represent a risk factor to many of the chronic non-communicable diseases (NCDs) in adulthood. Mediterranean Diet (MD) adherence in children and adolescents (8–16 years) living in Spain was investigated using the KIDMED questionnaire in a comparative analysis of two cross-sectional nationwide representative studies: enKid (1998–2000, n = 1001) and PASOS (2019–2020, n = 3540). Taking into account the educational level of pupils, as well as the characteristics of the place of living, a significant association was found between a KIDMED score ≥ 8 (optimal MD adherence) and primary education as well as residency in an area of <50,000 inhabitants, while living in the southern regions was associated with non-optimal MD adherence (p < 0.001). Participants of the 2019–2020 study showed an increase in the consumption of dairy products (31.1% increase), pasta/rice (15.4% increase), olive oil (16.9% increase), and nuts (9.7% increase), as well as a decreased sweets and candies intake (12.6% reduction). In contrast, a significantly lower MD adherence was found when comparing the 2019–2020 (mean ± SE: 6.9 ± 0.04) and the 1998–2000 study (7.37 ± 0.08; p < 0.001), due to less consumption of fish (20.3% reduction), pulses (19.4% reduction), and fruits (14.9% reduction), and an increased intake of commercial goods/pastries or fast-food intake (both 19.4% increase). The lowest adherence was recorded for adolescents also in the most recent study, where 10.9% of them presented a KIDMED score ≤ 3. This study shows that eating habits are deteriorating among Spanish children and adolescents. Such findings point out the urgency of undertaking strong measures to promote the consumption of healthy, sustainable, and non-ultra-processed food, such as those available in an MD, not only at a scientific and academic level, but also at a governmental one. Partial funding for open access charge: Universidad de Málaga

    Integrative epigenomics in Sjögren's syndrome reveals novel pathways and a strong interaction between the HLA, autoantibodies and the interferon signature

    Primary Sjögren's syndrome (SS) is a systemic autoimmune disease characterized by lymphocytic infiltration and damage of exocrine salivary and lacrimal glands. The etiology of SS is complex, with environmental triggers and genetic factors involved. By conducting an integrated multi-omics study, we confirmed vast coordinated hypomethylation and overexpression effects in IFN-related genes, known as the IFN signature. Stratified and conditional analyses suggest a strong interaction between SS-associated HLA genetic variation and the presence of Anti-Ro/SSA autoantibodies in driving the IFN epigenetic signature and determining SS. We report a novel epigenetic signature characterized by increased DNA methylation levels in a large number of genes enriched in pathways such as collagen metabolism and extracellular matrix organization. We identified potential new genetic variants associated with SS that might mediate their risk by altering DNA methylation or gene expression patterns, as well as disease-interacting genetic variants that exhibit regulatory function only in the SS population. Our study sheds new light on the interaction between genetics, autoantibody profiles, DNA methylation and gene expression in SS, and contributes to elucidating the genetic architecture of gene regulation in an autoimmune population.

    Treatment with tocilizumab or corticosteroids for COVID-19 patients with hyperinflammatory state: a multicentre cohort study (SAM-COVID-19)

    Objectives: The objective of this study was to estimate the association between tocilizumab or corticosteroids and the risk of intubation or death in patients with coronavirus disease 19 (COVID-19) with a hyperinflammatory state according to clinical and laboratory parameters. Methods: A cohort study was performed in 60 Spanish hospitals including 778 patients with COVID-19 and clinical and laboratory data indicative of a hyperinflammatory state. Treatment was mainly with tocilizumab, an intermediate-high dose of corticosteroids (IHDC), a pulse dose of corticosteroids (PDC), combination therapy, or no treatment. Primary outcome was intubation or death; follow-up was 21 days. Propensity score-adjusted estimations using Cox regression (logistic regression if needed) were calculated. Propensity scores were used as confounders, matching variables and for the inverse probability of treatment weights (IPTWs). Results: In all, 88, 117, 78 and 151 patients treated with tocilizumab, IHDC, PDC, and combination therapy, respectively, were compared with 344 untreated patients. The primary endpoint occurred in 10 (11.4%), 27 (23.1%), 12 (15.4%), 40 (25.6%) and 69 (21.1%), respectively. The IPTW-based hazard ratios (odds ratio for combination therapy) for the primary endpoint were 0.32 (95%CI 0.22-0.47; p < 0.001) for tocilizumab, 0.82 (0.71-1.30; p 0.82) for IHDC, 0.61 (0.43-0.86; p 0.006) for PDC, and 1.17 (0.86-1.58; p 0.30) for combination therapy. Other applications of the propensity score provided similar results, but were not significant for PDC. Tocilizumab was also associated with lower hazard of death alone in IPTW analysis (0.07; 0.02-0.17; p < 0.001). Conclusions: Tocilizumab might be useful in COVID-19 patients with a hyperinflammatory state and should be prioritized for randomized trials in this situation.
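
The inverse probability of treatment weights mentioned in this abstract are derived from propensity scores. The abstract does not say which variant the authors used, so the sketch below assumes the common unstabilized form (1/p for treated patients, 1/(1−p) for untreated ones); the function name `iptw_weights` and the example values are hypothetical.

```python
from typing import List


def iptw_weights(treated: List[int], propensity: List[float]) -> List[float]:
    """Unstabilized inverse probability of treatment weights.

    treated[i] is 1 if patient i received the treatment, else 0.
    propensity[i] is the estimated probability that patient i is treated.
    """
    if len(treated) != len(propensity):
        raise ValueError("treated and propensity must have the same length")
    weights = []
    for t, p in zip(treated, propensity):
        if not 0.0 < p < 1.0:
            raise ValueError("propensity scores must lie strictly between 0 and 1")
        # Treated patients are weighted by 1/p, untreated ones by 1/(1 - p).
        weights.append(1.0 / p if t == 1 else 1.0 / (1.0 - p))
    return weights
```

A treated patient with a low propensity score receives a large weight, so the weighted sample approximates one in which treatment assignment is independent of the measured confounders.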