15 research outputs found

    Mining and Leveraging Background Knowledge for Improving Named Entity Linking

    Knowledge-rich Information Extraction (IE) methods aspire towards combining classical IE with background knowledge obtained from third-party resources. Linked Open Data repositories that encode billions of machine-readable facts from sources such as Wikipedia play a pivotal role in this development. The recent growth of Linked Data adoption for Information Extraction tasks has shed light on many data quality issues in these data sources, such as completeness, timeliness and semantic correctness, that seriously challenge their usefulness. Information Extraction methods are, therefore, faced with problems such as name variance and type confusability. If multiple Linked Data sources are used in parallel, additional concerns regarding link stability and entity mappings emerge. This paper develops methods for integrating Linked Data into Named Entity Linking methods and addresses challenges with regard to mining knowledge from Linked Data, mitigating data quality issues, and adapting algorithms to leverage this knowledge. Finally, we apply these methods to Recognyze, a graph-based Named Entity Linking (NEL) system, and provide a comprehensive evaluation which compares its performance to other well-known NEL systems, demonstrating the impact of the suggested methods on its own entity linking performance.

    StoryLens: A Multiple Views Corpus for Location and Event Detection

    The news media landscape tends to focus on long-running narratives. Correctly processing new information, therefore, requires considering multiple lenses when analyzing media content. Traditionally, it would have been considered sufficient to extract the topics or entities contained in a text in order to classify it, but today it is important to also look at more sophisticated annotations related to fine-grained geolocation, events, stories and the relations between them. In order to leverage such lenses, we propose a new corpus that offers a diverse set of annotations over texts collected from multiple media sources. We also showcase the framework used for creating the corpus, as well as how the information from the various lenses can be used to support different use cases in the EU project InVID for verifying the veracity of online video.

    On the Importance of Drill-Down Analysis for Assessing Gold Standards and Named Entity Linking Performance

    Rigorous evaluations and analyses of evaluation results are key to improving Named Entity Linking systems. Nevertheless, most current evaluation tools are focused on benchmarking and comparative evaluations. Therefore, they only provide aggregated statistics such as precision, recall and F1-measure to assess system performance, and no means for conducting detailed analyses down to the level of individual annotations. This paper addresses the need for transparent benchmarking and fine-grained error analysis by introducing Orbis, an extensible framework that supports drill-down analysis, multiple annotation tasks and resource versioning. Orbis complements approaches like those deployed through the GERBIL and TAC KBP tools and helps developers to better understand and address shortcomings in their Named Entity Linking tools. We present three use cases in order to demonstrate the usefulness of Orbis for both research and production systems: (i) improving Named Entity Linking tools; (ii) detecting gold standard errors; and (iii) performing Named Entity Linking evaluations with multiple versions of the included resources.

    NLQxform: A Language Model-based Question to SPARQL Transformer

    In recent years, scholarly data has grown dramatically in terms of both scale and complexity. It is becoming increasingly challenging to retrieve information from scholarly knowledge graphs that include large-scale heterogeneous relationships, such as authorship, affiliation, and citation, between various types of entities, e.g., scholars, papers, and organizations. As part of the Scholarly QALD Challenge, this paper presents a question-answering (QA) system called NLQxform, which provides an easy-to-use natural language interface to facilitate accessing scholarly knowledge graphs. NLQxform allows users to express their complex query intentions in natural language questions. A transformer-based language model, i.e., BART, is employed to translate questions into standard SPARQL queries, which can be evaluated to retrieve the required information. According to the public leaderboard of the Scholarly QALD Challenge at ISWC 2023 (Task 1: DBLP-QUAD - Knowledge Graph Question Answering over DBLP), NLQxform achieved an F1 score of 0.85 and ranked first on the QA task, demonstrating the competitiveness of the system.

    Name Variants for Improving Entity Discovery and Linking

    Identifying all names that refer to a particular set of named entities is a challenging task, as quite often we need to consider many features that include a lot of variation, like abbreviations, aliases, hypocorisms, multilingualism or partial matches. Each entity type can also have specific rules for name variances: person names can include titles, country and branch names are sometimes removed from organization names, while locations are often plagued by the issue of nested entities. The lack of a clear strategy for collecting, processing and computing name variants significantly lowers the recall of tasks such as Named Entity Linking and Knowledge Base Population, since name variances are frequently used in all kinds of textual content. This paper proposes several strategies to address these issues. Recall can be improved by combining knowledge repositories and by computing additional variances based on algorithmic approaches. Heuristics and machine learning methods then analyze the generated name variances and mark ambiguous names to increase precision. An extensive evaluation demonstrates the effects of integrating these methods into a new Named Entity Linking framework and confirms that systematically considering name variances yields significant performance improvements.

    Integrated Network Pharmacology Approach for Drug Combination Discovery: A Multi-Cancer Case Study

    Simple Summary: Current treatments for complex diseases, including cancer, are generally characterized by high toxicity due to their low selectivity for target cells. Moreover, patients often develop drug resistance, hence becoming less sensitive to the therapy. For this reason, novel, improved, and more specific pharmacological therapies are needed. The high cost and time required to develop new drugs draw attention to the development of computational methods for drug repositioning and combination therapy prediction. In this study, we developed an integrated network pharmacology framework that combines mechanistic and chemocentric approaches in order to predict potential drug combinations for cancer therapy. We applied our paradigm to five cancer types, which we used as case studies. Our strategy can be applied to the study of any complex disease by guiding the prioritization of drug combinations. Despite remarkable efforts of computational and predictive pharmacology to improve therapeutic strategies for complex diseases, only in a few cases have the predictions eventually been employed in the clinic. One of the reasons behind this drawback is that current predictive approaches are based only on the integration of the molecular perturbation of a certain disease with drug sensitivity signatures, neglecting intrinsic properties of the drugs. Here we integrate mechanistic and chemocentric approaches to drug repositioning by developing an innovative network pharmacology strategy. We developed a multilayer network-based computational framework integrating perturbational signatures of the disease as well as intrinsic characteristics of the drugs, such as their mechanism of action and chemical structure. We present five case studies carried out on public data from The Cancer Genome Atlas, including invasive breast cancer, colon adenocarcinoma, lung squamous cell carcinoma, hepatocellular carcinoma and prostate adenocarcinoma. Our results highlight paclitaxel as a suitable drug for combination therapy for many of the considered cancer types. In addition, several non-cancer-related genes representing unusual drug targets were identified as potential candidates for pharmacological treatment of cancer.

    An Intelligent Multicriteria Model for Diagnosing Dementia in People Infected with Human Immunodeficiency Virus

    Hybrid models to detect dementia based on Machine Learning can provide accurate diagnoses in individuals with neurological disorders and cognitive complications caused by Human Immunodeficiency Virus (HIV) infection. This study proposes a hybrid approach, using Machine Learning algorithms associated with the multicriteria method of Verbal Decision Analysis (VDA). Dementia, which affects many HIV-infected individuals, refers to neurodevelopmental and mental disorders. Some manuals standardize the information used in the correct detection of neurological disorders with cognitive complications. Among the most commonly used manuals are the DSM-5 (Diagnostic and Statistical Manual of Mental Disorders, 5th edition) of the American Psychiatric Association and the International Classification of Diseases, 10th edition (ICD-10), the latter published by the World Health Organization (WHO). The model is designed to explore the predictive power of specific data. Furthermore, a well-defined data set improves and optimizes the diagnostic models sought in the research.

    European language equality

    This deep dive on data, knowledge graphs (KGs) and language resources (LRs) is the last of the four technology deep dives, as data and related models are the basis for technologies and solutions in the area of Language Technology (LT) for European digital language equality (DLE). This chapter focuses on the data and LRs required to achieve full DLE in Europe by 2030. The main components identified (data, KGs, LRs) are explained and used to analyse the state of the art as well as to identify gaps. All of these components need to be tackled in the future, for the widest range of languages possible, from official EU languages to dialects to non-EU languages used in Europe. For all these languages, efficient data collection and sustainable data provision need to be facilitated with fair conditions and costs. Specific technologies, methodologies and tools have been identified to enable the implementation of the vision of DLE by 2030. In addition, data-related business models and data-governance models are discussed, as they are considered a prerequisite for a working data economy that stimulates a vibrant LT landscape that can bring about European DLE.

    Assessment of the quality of life predictors in children with newly diagnosed solid tumors

    In recent decades, the survival rate of children diagnosed with malignant diseases has increased, predominantly as a consequence of the standardization and intensification of treatment protocols. This has also increased the number and severity of adverse effects, in the form of various physical and mental symptoms that children face during treatment and later in life. The aim of this research was to assess the quality of life of children with newly diagnosed solid tumors and to determine predictors of quality of life at the beginning of treatment. Methods: The research included a group of children with newly diagnosed solid tumors who began their oncological treatment at the Institute for Oncology and Radiology of Serbia between December 2016 and January 2018. Results: A total of 51 children were enrolled, of whom 24 (47.1%) were boys and 27 (52.9%) girls. Most children had tumors of the central nervous system (CNS) (21/51; 41.2%), followed by bone tumors (19/51; 39.2%). Disseminated disease was present in 14/51 children (27%), most often in bone tumors (8/20; 40%). The average age of children with CNS tumors was 7.3 ± 3.9 years, and the most frequent tumor type was medulloblastoma (38%). The average age of children with bone tumors was 13.6 ± 5.7 years, with Ewing sarcoma the most frequently diagnosed bone tumor (12/20; 60%). The most frequent localization of bone tumors was the extremities (60%). No difference in quality of life related to gender was observed. In the nausea domain, a significant difference was found between the youngest and oldest children (p=0.06), while in relation to tumor type, a difference was found between children with CNS tumors and children with bone tumors (p=0.019). In the cognitive disorder domain, a significant difference was found between children aged 5-7 and 8-12 years (p=0.040), as well as between children aged 8-12 and 13-18 years (p=0.020). In the treatment anxiety domain, a significant difference was found between children with CNS tumors and children with bone tumors (p=0.042). In relation to tumor type, a significant difference was observed in the physical functioning domain between children with CNS tumors and children with bone tumors (p=0.021), and in the emotional functioning domain between children with bone tumors and children with soft tissue tumors (p=0.042)...