
    On the Evolution of Knowledge Graphs: A Survey and Perspective

    Knowledge graphs (KGs) are structured representations of diversified knowledge. They are widely used in various intelligent applications. In this article, we provide a comprehensive survey on the evolution of various types of knowledge graphs (i.e., static KGs, dynamic KGs, temporal KGs, and event KGs) and techniques for knowledge extraction and reasoning. Furthermore, we introduce the practical applications of different types of KGs, including a case study in financial analysis. Finally, we propose our perspective on the future directions of knowledge engineering, including the potential of combining the power of knowledge graphs and large language models (LLMs), and the evolution of knowledge extraction, reasoning, and representation.
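    As a rough, non-authoritative illustration of the distinctions this survey draws between KG types, the sketch below encodes one fact in each style; all entities, relations, and dates are invented for illustration.

```python
# A minimal sketch (not from the survey) contrasting the fact formats
# that distinguish static, temporal, and event KGs. Names are invented.

# Static KG: time-invariant (head, relation, tail) triples.
static_fact = ("Berlin", "capital_of", "Germany")

# Temporal KG: each fact carries a timestamp or validity interval.
temporal_fact = ("Barack_Obama", "president_of", "USA", ("2009", "2017"))

# Event KG: an event is a node with typed arguments and a time,
# linking several entities rather than a single head-tail pair.
event = {
    "type": "Acquisition",
    "acquirer": "CompanyA",   # hypothetical entities
    "acquired": "CompanyB",
    "time": "2020-06",
}
```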

    A Survey on Knowledge Graphs: Representation, Acquisition and Applications

    Human knowledge provides a formal understanding of the world. Knowledge graphs that represent structural relations between entities have become an increasingly popular research direction towards cognition and human-level intelligence. In this survey, we provide a comprehensive review of knowledge graphs covering overall research topics about 1) knowledge graph representation learning, 2) knowledge acquisition and completion, 3) temporal knowledge graphs, and 4) knowledge-aware applications, and summarize recent breakthroughs and perspective directions to facilitate future research. We propose a full-view categorization and new taxonomies on these topics. Knowledge graph embedding is organized along four aspects: representation space, scoring function, encoding models, and auxiliary information. For knowledge acquisition, especially knowledge graph completion, embedding methods, path inference, and logical rule reasoning are reviewed. We further explore several emerging topics, including meta relational learning, commonsense reasoning, and temporal knowledge graphs. To facilitate future research on knowledge graphs, we also provide a curated collection of datasets and open-source libraries on different tasks. In the end, we offer a thorough outlook on several promising research directions.
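    To make the "scoring function" aspect concrete, here is a minimal sketch of one classical embedding model covered by surveys of this kind: TransE (Bordes et al., 2013), which treats a relation as a translation in embedding space and scores a triple (h, r, t) by -||h + r - t||. The vectors below are random toy embeddings, not trained ones.

```python
import numpy as np

def transe_score(h: np.ndarray, r: np.ndarray, t: np.ndarray, p: int = 2) -> float:
    """TransE plausibility score: higher (less negative) means the
    triple (h, r, t) is more plausible under the translation assumption."""
    return -float(np.linalg.norm(h + r - t, ord=p))

# Toy 50-dimensional embeddings for one (head, relation, tail) triple.
rng = np.random.default_rng(0)
h, r, t = (rng.normal(size=50) for _ in range(3))
print(transe_score(h, r, t))
```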

    Comparability, evaluation and benchmarking of large pre-trained language models


    Robust and Explainable Identification of Logical Fallacies in Natural Language Arguments

    The spread of misinformation, propaganda, and flawed argumentation has been amplified in the Internet era. Given the volume of data and the subtlety of identifying violations of argumentation norms, supporting information analytics tasks, like content moderation, with trustworthy methods that can identify logical fallacies is essential. In this paper, we formalize prior theoretical work on logical fallacies into a comprehensive three-stage evaluation framework of detection, coarse-grained, and fine-grained classification. We adapt existing evaluation datasets for each stage of the evaluation. We employ three families of robust and explainable methods based on prototype reasoning, instance-based reasoning, and knowledge injection. The methods combine language models with background knowledge and explainable mechanisms. Moreover, we address data sparsity with strategies for data augmentation and curriculum learning. Our three-stage framework natively consolidates prior datasets and methods from existing tasks, like propaganda detection, serving as an overarching evaluation testbed. We extensively evaluate these methods on our datasets, focusing on their robustness and explainability. Our results provide insight into the strengths and weaknesses of the methods on different components and fallacy classes, indicating that fallacy identification is a challenging task that may require specialized forms of reasoning to capture various classes. We share our open-source code and data on GitHub to support further work on logical fallacy identification.
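    As a hedged sketch of the prototype-reasoning idea only (not the paper's models, which combine language models with background knowledge), the snippet below builds one prototype vector per fallacy class as the mean embedding of its training examples and classifies by nearest prototype under cosine similarity; TF-IDF stands in for a learned encoder, and the training sentences are invented.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Invented one-example-per-class training data for illustration.
train = {
    "ad_hominem": ["You can't trust his argument, he is a criminal."],
    "slippery_slope": ["If we allow this, soon everything will be banned."],
}
vec = TfidfVectorizer().fit([s for ex in train.values() for s in ex])

def embed(texts):
    return vec.transform(texts).toarray()

# One prototype per class: the mean embedding of its examples.
prototypes = {label: embed(ex).mean(axis=0) for label, ex in train.items()}

def classify(text: str) -> str:
    """Assign the class whose prototype has highest cosine similarity."""
    x = embed([text])[0]
    sims = {lab: x @ p / (np.linalg.norm(x) * np.linalg.norm(p) + 1e-9)
            for lab, p in prototypes.items()}
    return max(sims, key=sims.get)

# Likely "ad_hominem" given the lexical overlap in this toy setup.
print(classify("You can't trust her proposal, she is a liar."))
```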

    Multilingual taxonomic text classification in an incremental number of languages

    Taxonomic text classification is the branch of Natural Language Processing (NLP) that aims at classifying text into a hierarchically organized schema. Its applications span from pure research to industrial and commercial purposes. In this thesis, in particular, the focus is on the process of developing a multilingual model for hierarchically classifying text regardless of its language. A comparative analysis is performed to evaluate which embedding techniques and feature selection methods perform best. Moreover, since in real-life scenarios a multilingual model may be required to support new languages over time, we implement and benchmark a set of techniques, including two continual learning algorithms, to sequentially extend our network to an incremental number of languages. The experiments carried out show both the strengths and the weaknesses of the current model and set the basis for further research.
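    As an illustrative sketch of one standard continual-learning strategy, rehearsal with a memory buffer (the abstract does not name its two algorithms, so this choice is an assumption), the snippet below extends a single classifier to each new language while replaying stored examples from earlier ones to reduce forgetting. The encoder, taxonomy labels, and sentences are placeholders.

```python
import random
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vec = HashingVectorizer(n_features=2**12)     # language-agnostic stand-in encoder
clf = SGDClassifier(random_state=0)           # one shared classifier across languages
memory = []                                   # rehearsal buffer of past examples
classes = ["sports", "politics"]              # hypothetical taxonomy labels

def add_language(new_data, buffer_size=50):
    """Train on a new language's (text, label) pairs mixed with
    replayed examples from previously seen languages."""
    batch = new_data + random.sample(memory, min(len(memory), buffer_size))
    texts, labels = zip(*batch)
    clf.partial_fit(vec.transform(texts), labels, classes=classes)
    memory.extend(new_data)

add_language([("the match ended 2-1", "sports")])      # e.g. English first
add_language([("la partita è finita 2-1", "sports")])  # then Italian, with replay
```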

    A Comprehensive Survey on Word Representation Models: From Classical to State-Of-The-Art Word Representation Language Models

    Word representation has always been an important research area in the history of natural language processing (NLP). Understanding such complex text data is imperative, given that it is rich in information and can be used widely across various applications. In this survey, we explore different word representation models and their expressive power, from classical approaches to modern state-of-the-art word representation language models (LMs). We describe the variety of text representation methods and model designs that have blossomed in the context of NLP, including SOTA LMs. These models can transform large volumes of text into effective vector representations that capture the underlying semantic information. Further, such representations can be utilized by various machine learning (ML) algorithms for a variety of NLP-related tasks. In the end, this survey briefly discusses the commonly used ML- and deep learning (DL)-based classifiers, evaluation metrics, and the applications of these word embeddings in different NLP tasks.
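    To ground the classical end of the spectrum such surveys cover, the sketch below turns a tiny invented corpus into count-based and TF-IDF vector representations; the ambiguous word "bank" also hints at why contextual LMs, which assign context-dependent vectors, were a later step. The contextual side is omitted to keep the example dependency-free.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Invented two-document corpus; "bank" is ambiguous across the two uses,
# which static (one-vector-per-word) representations cannot distinguish.
corpus = ["the bank approved the loan", "the river bank flooded"]

bow = CountVectorizer().fit(corpus)
tfidf = TfidfVectorizer().fit(corpus)

print(bow.get_feature_names_out())        # shared vocabulary
print(bow.transform(corpus).toarray())    # raw term counts per document
print(tfidf.transform(corpus).toarray())  # counts reweighted by rarity
```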

    Citation Polarity Identification From Scientific Articles Using Deep Learning Methods

    The way in which research articles are cited reflects how previous work is utilized by other researchers or stakeholders and can indicate the impact of that work on subsequent experiments. Based on human intuition, citations can be perceived as positive, negative, or neutral. While current citation indexing systems provide information on the author and publication name of the cited article, as well as the citation count, they do not indicate the polarity of the citation. This study aims to identify the polarity of citations in scientific research articles using pre-trained language models (BERT, ELECTRA, RoBERTa, Bio-RoBERTa, SPECTER, ERNIE, Longformer, BigBird) and deep-learning methods. Most citations have a neutral polarity, resulting in imbalanced datasets for training deep-learning models. To address this issue, a class balancing technique is proposed and applied to all datasets to improve consistency and results. Pre-trained language models are used to generate optimal features, and ensemble techniques are utilized to combine all model predictions to produce the highest precision, recall, and F1-scores for all three labels.
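    As a minimal sketch of the two ideas named above, class balancing via random oversampling and ensembling via majority vote (the paper's exact balancing technique and combination scheme are not specified in the abstract, so both choices are assumptions), consider:

```python
import random
from collections import Counter

def oversample(examples):
    """examples: list of (text, label) pairs. Duplicate examples of
    minority polarity classes until each class matches the majority size."""
    by_label = {}
    for text, label in examples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced += group + random.choices(group, k=target - len(group))
    return balanced

def majority_vote(predictions):
    """predictions: one label per model for the same citation,
    e.g. from BERT, RoBERTa, and ELECTRA classifiers."""
    return Counter(predictions).most_common(1)[0][0]

print(majority_vote(["neutral", "positive", "neutral"]))  # -> "neutral"
```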