
    CESI: Canonicalizing Open Knowledge Bases using Embeddings and Side Information

    Open Information Extraction (OpenIE) methods extract (noun phrase, relation phrase, noun phrase) triples from text, resulting in the construction of large Open Knowledge Bases (Open KBs). The noun phrases (NPs) and relation phrases in such Open KBs are not canonicalized, leading to the storage of redundant and ambiguous facts. Recent research has posed canonicalization of Open KBs as clustering over manually defined feature spaces. Manual feature engineering is expensive and often sub-optimal. To overcome this challenge, we propose Canonicalization using Embeddings and Side Information (CESI), a novel approach that performs canonicalization over learned embeddings of Open KBs. CESI extends recent advances in KB embedding by incorporating relevant NP and relation phrase side information in a principled manner. Through extensive experiments on multiple real-world datasets, we demonstrate CESI's effectiveness. Comment: Accepted at WWW 201
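The clustering step behind embedding-based canonicalization can be illustrated with a minimal sketch, assuming the NP embeddings have already been learned (the toy vectors below are hypothetical, not CESI output). It groups NPs by average-linkage agglomerative clustering over cosine similarity; the 0.9 threshold and the choice of the most frequent mention as the canonical name are assumptions for illustration only.

```python
import math
from itertools import combinations

# Hypothetical toy NP embeddings; an embedding model would learn these
# from the Open KB (plus side information) rather than hand-coding them.
NP_EMBEDDINGS = {
    "Barack Obama": [0.9, 0.1, 0.0],
    "Obama": [0.85, 0.15, 0.05],
    "President Obama": [0.8, 0.2, 0.0],
    "New York City": [0.0, 0.1, 0.95],
    "NYC": [0.05, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def hac_canonicalize(embeddings, threshold):
    """Average-linkage agglomerative clustering of NPs by cosine similarity.

    Repeatedly merges the most similar pair of clusters until the best
    average similarity drops below `threshold`; each cluster is one canonical group.
    """
    clusters = [[np_] for np_ in embeddings]
    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for i, j in combinations(range(len(clusters)), 2):
            sims = [cosine(embeddings[a], embeddings[b])
                    for a in clusters[i] for b in clusters[j]]
            avg = sum(sims) / len(sims)
            if avg > best_sim:
                best_sim, best_pair = avg, (i, j)
        if best_sim < threshold:
            break
        i, j = best_pair  # i < j, so deleting index j keeps index i valid
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters


def representative(cluster, mention_counts):
    """Pick the most frequent mention as the canonical name (an assumption)."""
    return max(cluster, key=lambda np_: mention_counts.get(np_, 0))


clusters = hac_canonicalize(NP_EMBEDDINGS, threshold=0.9)
```

With these vectors the three Obama mentions form one group and the two New York mentions another; a lower threshold would merge more aggressively.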

    Canonicalizing Knowledge Base Literals

    Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability are limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines both reasoning and machine learning in order to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching.
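The two possible outcomes of literal canonicalization (reuse an existing entity, or mint a new typed one) can be sketched as follows, assuming a hypothetical toy KB and a hypothetical property-to-class table that stands in for reasoning over property ranges. String similarity from Python's `difflib` replaces the learned entity-matching and type-prediction models described in the abstract, so this is not the authors' framework.

```python
from difflib import SequenceMatcher

# Hypothetical toy KB: entity IRI -> (label, set of classes).
ENTITIES = {
    "dbr:Ireland": ("Ireland", {"dbo:Country", "dbo:Place"}),
    "dbr:Irish_language": ("Irish language", {"dbo:Language"}),
    "dbr:Dublin": ("Dublin", {"dbo:City", "dbo:Place"}),
}
# Hypothetical property -> expected class (a stand-in for reasoning over ranges).
PROPERTY_RANGE = {"dbo:birthPlace": "dbo:Place", "dbo:language": "dbo:Language"}


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def canonicalize_literal(prop, literal, threshold=0.8):
    """Return (entity IRI, type for a newly minted entity or None)."""
    expected = PROPERTY_RANGE.get(prop)
    # Only entities compatible with the expected class are candidates.
    candidates = [(iri, similarity(literal, label))
                  for iri, (label, types) in ENTITIES.items()
                  if expected is None or expected in types]
    if candidates:
        iri, score = max(candidates, key=lambda c: c[1])
        if score >= threshold:
            return iri, None  # replace the literal with an existing entity
    # No good match: mint a hypothetical new entity typed with the expected class.
    return "ex:" + literal.strip().replace(" ", "_"), expected
```

For example, the literal `"dublin"` under `dbo:birthPlace` maps to `dbr:Dublin`, while `"Cork"` has no sufficiently similar match and becomes a new entity typed `dbo:Place`.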

    Can we predict new facts with open knowledge graph embeddings? A benchmark for open link prediction

    Open Information Extraction systems extract (“subject text”, “relation text”, “object text”) triples from raw text. Some triples are textual versions of facts, i.e., non-canonicalized mentions of entities and relations. In this paper, we investigate whether it is possible to infer new facts directly from the open knowledge graph without any canonicalization or any supervision from curated knowledge. For this purpose, we propose the open link prediction task, i.e., predicting test facts by completing (“subject text”, “relation text”, ?) questions. An evaluation in such a setup raises the question of whether a correct prediction is actually a new fact that was induced by reasoning over the open knowledge graph or whether it can be trivially explained. For example, facts can appear in different paraphrased textual variants, which can lead to test leakage. To this end, we propose an evaluation protocol and a methodology for creating the open link prediction benchmark OLPBENCH. We performed experiments with a prototypical knowledge graph embedding model for open link prediction. While the task is very challenging, our results suggest that it is possible to predict genuinely new facts, which cannot be trivially explained.
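Answering a (“subject text”, “relation text”, ?) question can be sketched as ranking candidate object texts directly, with no mapping of mentions to canonical entities. This minimal sketch assumes a DistMult-style score over texts encoded by averaging hypothetical token embeddings; it is not the prototypical model evaluated in the paper, and the reciprocal rank is shown only as a common ranking metric.

```python
# Hypothetical 3-d token embeddings; a trained model would learn these.
TOKEN_VECS = {
    "barack": [0.8, 0.1, 0.0],
    "obama": [1.0, 0.0, 0.2],
    "was": [0.1, 0.1, 0.1],
    "born": [0.2, 1.0, 0.0],
    "in": [0.1, 0.1, 0.1],
    "hawaii": [0.9, 0.9, 0.0],
    "honolulu": [0.8, 1.0, 0.1],
    "chicago": [0.1, 0.2, 1.0],
}
DIM = 3


def encode(text):
    """Encode a mention as the mean of its known token vectors."""
    vecs = [TOKEN_VECS[t] for t in text.lower().split() if t in TOKEN_VECS]
    if not vecs:
        return [0.0] * DIM
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def distmult(s, r, o):
    """DistMult score: sum of element-wise products of s, r and o."""
    return sum(a * b * c for a, b, c in zip(s, r, o))


def answer(subject_text, relation_text, candidate_objects):
    """Rank candidate object texts for a (subject, relation, ?) question."""
    s, r = encode(subject_text), encode(relation_text)
    return sorted(candidate_objects, key=lambda o: distmult(s, r, encode(o)), reverse=True)


def reciprocal_rank(gold, ranked):
    """1 / (1-based rank of the gold answer)."""
    return 1.0 / (ranked.index(gold) + 1)


ranked = answer("Barack Obama", "was born in", ["Chicago", "Honolulu", "Hawaii"])
```

Because mentions are encoded from their tokens, paraphrases such as "Obama" and "Barack Obama" share parameters, which is also why paraphrased variants of a test fact in the training data can cause the leakage the abstract warns about.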

    Development of knowledge representation based on Markov logical networks in the business process management system

    The problem of constructing a knowledge representation in a process management system based on the analysis of business process behavior, represented in the form of event logs, is studied. Each event characterizes an action of the business process. The relevance of the problem is determined by the fact that, when managing complex knowledge-intensive business processes, performers can change the sequence of actions, taking into account additional knowledge about the subject area. As a result, a discrepancy arises between the process and its model, which creates difficulties for the further management of the business process. To eliminate this discrepancy, it is necessary to formalize this additional knowledge and use it in process management, which requires an appropriate knowledge representation. The proposed knowledge representation model takes into account both the static and the dynamic characteristics of the business process. The static characteristics are specified by facts and rules whose arguments are represented by the attributes of the log events. Facts and rules are formed on the basis of appropriate templates. Attributes specify the values of the properties of the objects with which the business process operates. The dynamic features of the business process are determined through the current probability distribution of rule execution, taking into account the attributes of the current event in the business process log. The proposed model is distinguished by the fact that it takes into account constraints on the permissible sequences of business process actions, as well as constraints based on a priori knowledge of the subject area. Such constraints make it possible to reduce the complexity of finding the probabilities of successful completion of a business process by shrinking the set of permissible traces when performers have changed the sequence of actions. In practical terms, the model supports decision-making in the management of knowledge-intensive business processes by predicting the probabilities of reaching the final state of the process, taking into account the attributes of log events.
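Markov logic networks define a log-linear distribution, P(x) = exp(sum_i w_i n_i(x)) / Z, where n_i(x) counts the true groundings of weighted formula i in world x. The sketch below applies this form to a small, hypothetical set of candidate traces: a hard constraint first discards impermissible traces (shrinking the set that Z sums over), then weighted rules that depend on a log-event attribute (`amount`, hypothetical) yield the probability of ending in the final action. It enumerates traces directly rather than performing real MLN grounding and inference, and it is not the paper's model.

```python
import math

# Hypothetical candidate traces of a business process (sequences of actions).
TRACES = [
    ("register", "check", "approve"),
    ("register", "check", "reject"),
    ("register", "approve"),
    ("register", "check", "escalate", "approve"),
]


def allowed(trace):
    """Hypothetical hard constraint: every 'approve' must come after some 'check'."""
    return all("check" in trace[:i] for i, a in enumerate(trace) if a == "approve")


# Hypothetical weighted rules: (weight, count of true groundings in a trace).
WEIGHTED_RULES = [
    (1.0, lambda tr, attrs: sum(1 for a, b in zip(tr, tr[1:]) if (a, b) == ("check", "approve"))),
    (-1.0, lambda tr, attrs: tr.count("escalate")),
    (2.0, lambda tr, attrs: int(attrs["amount"] > 1000 and "reject" in tr)),
]


def trace_distribution(traces, attrs):
    """P(trace) proportional to exp(sum_i w_i * n_i(trace)), over allowed traces only."""
    kept = [tr for tr in traces if allowed(tr)]
    scores = [math.exp(sum(w * n(tr, attrs) for w, n in WEIGHTED_RULES)) for tr in kept]
    z = sum(scores)
    return {tr: s / z for tr, s in zip(kept, scores)}


def p_success(traces, attrs, final_action="approve"):
    """Probability that the process ends with `final_action`."""
    dist = trace_distribution(traces, attrs)
    return sum(p for tr, p in dist.items() if tr[-1] == final_action)
```

With `{"amount": 500}` the success probability is about 0.755; raising the amount above 1000 activates the third rule and lowers it to about 0.295.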

    Tackling scalability issues in mining path patterns from knowledge graphs: a preliminary study

    Features mined from knowledge graphs are widely used within multiple knowledge discovery tasks such as classification or fact-checking. Here, we consider a given set of vertices, called seed vertices, and focus on mining their associated neighboring vertices, paths, and, more generally, path patterns that involve classes of ontologies linked with knowledge graphs. Due to the combinatorial nature and the increasing size of real-world knowledge graphs, the task of mining these patterns immediately entails scalability issues. In this paper, we address these issues by proposing a pattern mining approach that relies on a set of constraints (e.g., support or degree thresholds) and the monotonicity property. As our motivation comes from the mining of real-world knowledge graphs, we illustrate our approach with PGxLOD, a biomedical knowledge graph.
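Constraint-based pruning of path patterns can be sketched as a level-wise search, assuming patterns are plain sequences of predicates (the paper's patterns can also involve ontology classes) and a hypothetical toy graph rather than PGxLOD. A pattern's support is the number of seed vertices from which it can be followed; because support is anti-monotone, patterns below the support threshold are never extended, and a degree threshold skips hub vertices.

```python
from collections import defaultdict

# Hypothetical toy graph and seed vertices.
TRIPLES = [
    ("gene:A", "associatedWith", "drug:X"),
    ("gene:B", "associatedWith", "drug:X"),
    ("gene:C", "associatedWith", "drug:Y"),
    ("drug:X", "treats", "disease:D"),
    ("drug:Y", "treats", "disease:D"),
    ("gene:A", "locatedOn", "chr:1"),
]
SEEDS = ["gene:A", "gene:B", "gene:C"]


def mine_path_patterns(triples, seeds, min_support, max_len, max_degree):
    """Return {predicate-path pattern: support} for patterns meeting min_support.

    Every seed supporting pattern + (p,) also supports pattern, so support is
    anti-monotone and pruning infrequent patterns loses no frequent extension.
    """
    out_edges = defaultdict(list)
    for s, p, o in triples:
        out_edges[s].append((p, o))

    # pattern (tuple of predicates) -> {seed: set of vertices reached}
    frontier = {(): {seed: {seed} for seed in seeds}}
    frequent = {}
    for _ in range(max_len):
        extended = defaultdict(lambda: defaultdict(set))
        for pattern, reached in frontier.items():
            for seed, vertices in reached.items():
                for v in vertices:
                    if len(out_edges.get(v, ())) > max_degree:
                        continue  # degree constraint: do not expand through hubs
                    for p, o in out_edges.get(v, ()):
                        extended[pattern + (p,)][seed].add(o)
        # Support constraint: keep only patterns followed from enough seeds.
        frontier = {pat: r for pat, r in extended.items() if len(r) >= min_support}
        frequent.update({pat: len(r) for pat, r in frontier.items()})
        if not frontier:
            break
    return frequent
```

Here `("locatedOn",)` is reached from only one seed and is pruned at the first level, so none of its extensions are ever generated.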

    Method for Detecting Anomalous States of a Control Object in Information Systems Based on the Analysis of Temporal Data and Knowledge

    The problem of finding anomalous states of a control object in a management information system is considered under conditions of uncertainty caused by incomplete knowledge about this object. A method is proposed for classifying the current state of the control object in real time, which makes it possible to identify an anomalous current state. The method uses temporal data and knowledge. Data are represented by sequences of timestamped events. Knowledge is represented as weighted temporal rules and constraints. The method includes the following key phases: forming sequences of logical facts; selecting temporal rules and constraints; and classifying based on a comparison of rules and constraints. Logical facts are represented as predicates on event attributes and reflect the state of the control object. Logical rules define valid sequences of logical facts. Performing the classification by successively comparing the constraints and the rule weights makes it possible to identify the anomalous state more efficiently, since comparing the constraints reduces the subset of facts to be compared with the current state. The method creates conditions for improving management efficiency under incomplete information about the state of a complex object by using logical inference in knowledge bases for the anomalous states of such control objects.
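The three phases of the method can be sketched as follows, assuming hypothetical predicates, temporal rules, and constraints for a toy control object. Events are turned into timestamped facts, candidate states contradicted by a constraint are discarded first, and the remaining states are scored by the summed weights of the temporal rules that hold; the exact comparison procedure in the paper may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TemporalRule:
    first: str      # fact that must occur first
    then: str       # fact that must follow it
    within: float   # maximum delay between the two facts
    weight: float   # rule weight
    state: str      # state this rule is evidence for


# Hypothetical predicates on event attributes (phase 1: forming facts).
PREDICATES = {
    "temp_high": lambda a: a.get("temp", 0) > 80,
    "pressure_drop": lambda a: a.get("pressure", 100) < 50,
    "valve_closed": lambda a: a.get("valve") == "closed",
}
# Hypothetical weighted temporal rules and constraints (phase 2).
TEMPORAL_RULES = [
    TemporalRule("temp_high", "pressure_drop", 5, 0.8, "leak"),
    TemporalRule("temp_high", "valve_closed", 5, 0.9, "blockage"),
    TemporalRule("pressure_drop", "temp_high", 5, 0.4, "sensor_fault"),
]
CONSTRAINTS = {"blockage": {"pressure_drop"}}  # state -> facts that exclude it
EVENTS = [(0, {"temp": 85}), (3, {"pressure": 40}), (10, {"valve": "open"})]


def to_facts(events, predicates):
    """Turn (timestamp, attributes) events into time-ordered (timestamp, fact) pairs."""
    facts = []
    for t, attrs in sorted(events, key=lambda e: e[0]):
        for name, pred in predicates.items():
            if pred(attrs):
                facts.append((t, name))
    return facts


def rule_holds(rule, facts):
    """True if `rule.then` follows `rule.first` within `rule.within` time units."""
    return any(f1 == rule.first and f2 == rule.then and 0 < t2 - t1 <= rule.within
               for t1, f1 in facts for t2, f2 in facts)


def classify(facts, rules, constraints):
    """Phase 3: filter states by constraints, then pick the highest rule score."""
    present = {f for _, f in facts}
    states = {r.state for r in rules}
    states = {s for s in states if not (constraints.get(s, set()) & present)}
    scores = {s: 0.0 for s in states}
    for r in rules:
        if r.state in scores and rule_holds(r, facts):
            scores[r.state] += r.weight
    if not scores:
        return None
    return max(sorted(scores), key=lambda s: scores[s])  # ties: alphabetical
```

Checking constraints before rule weights means the `blockage` rule is never evaluated once a pressure drop is observed, which mirrors the abstract's point that constraints shrink what has to be compared.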