
    Inductive logic programming at 30: a new introduction

    Inductive logic programming (ILP) is a form of machine learning. The goal of ILP is to induce a hypothesis (a set of logical rules) that generalises training examples. As ILP turns 30, we provide a new introduction to the field. We introduce the necessary logical notation and the main learning settings; describe the building blocks of an ILP system; compare several systems on several dimensions; describe four systems (Aleph, TILDE, ASPAL, and Metagol); highlight key application areas; and, finally, summarise current limitations and directions for future research. Comment: Paper under review.
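
    The ILP setting described here can be made concrete with a toy example. The following Python sketch is purely illustrative (it is not the interface of Aleph, TILDE, ASPAL, or Metagol, and all names in it are made up): background knowledge is a set of ground facts, the examples are positive and negative ground atoms, and a candidate hypothesis is acceptable if it covers all positive and no negative examples.

        # Illustrative sketch of the ILP setting: background knowledge B,
        # positive examples E+, negative examples E-, and a candidate rule H.
        background = {
            ("parent", "ann", "mary"),
            ("parent", "ann", "eve"),
            ("female", "ann"),
            ("female", "mary"),
            ("female", "eve"),
        }
        pos = {("daughter", "mary", "ann"), ("daughter", "eve", "ann")}  # E+
        neg = {("daughter", "ann", "mary")}                              # E-

        def holds(pred, *args):
            """True if the ground atom is among the background facts."""
            return (pred, *args) in background

        def hypothesis(x, y):
            """Candidate rule: daughter(X, Y) :- female(X), parent(Y, X)."""
            return holds("female", x) and holds("parent", y, x)

        complete = all(hypothesis(x, y) for (_, x, y) in pos)        # covers E+
        consistent = not any(hypothesis(x, y) for (_, x, y) in neg)  # rejects E-
        print(complete and consistent)  # True: the rule generalises the examples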

    Towards Even More Irresistible Axiom Weakening

    Axiom weakening is a technique that allows for a fine-grained repair of inconsistent ontologies. Its main advantage is that it repairs ontologies by making axioms less restrictive rather than by deleting them, using refinement operators. In this paper, we build on previously introduced axiom weakening for ALC and make it much more irresistible by extending its definitions to deal with SROIQ, the expressive and decidable description logic underlying OWL 2 DL. We extend the definitions of the refinement operators to deal with SROIQ constructs, in particular role hierarchies, cardinality constraints, and nominals, and illustrate their application. Finally, we discuss the problem of termination of an iterated weakening procedure.
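
    As a rough illustration of the idea (a toy Python sketch with made-up names, far from the SROIQ machinery treated in the paper): instead of deleting an offending axiom C ⊑ D, a weakening replaces it with a less restrictive axiom, for instance by generalising the right-hand side along the concept hierarchy via an upward refinement step.

        # Toy fragment of axiom weakening: generalise the right-hand side of a
        # subsumption axiom along a (here, purely atomic) concept hierarchy.
        SUBSUMERS = {                      # immediate, more general concepts
            "Penguin": ["Bird"],
            "FlyingAnimal": ["Animal"],
            "Bird": ["Animal"],
            "Animal": ["Thing"],
        }

        def generalisations(concept):
            """One upward refinement step for an atomic concept."""
            return SUBSUMERS.get(concept, [])

        def weaken_subsumption(lhs, rhs):
            """Weaker variants of the axiom lhs ⊑ rhs, obtained by generalising rhs."""
            return [(lhs, g) for g in generalisations(rhs)]

        # Example: Penguin ⊑ FlyingAnimal clashes with Penguin ⊑ ¬FlyingAnimal;
        # rather than deleting it, propose the weaker axiom Penguin ⊑ Animal.
        print(weaken_subsumption("Penguin", "FlyingAnimal"))  # [('Penguin', 'Animal')]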

    Combining Representation Learning with Logic for Language Processing

    The current state-of-the-art in many natural language processing and automated knowledge base completion tasks is held by representation learning methods which learn distributed vector representations of symbols via gradient-based optimization. They require little or no hand-crafted features, thus avoiding the need for most preprocessing steps and task-specific assumptions. However, in many cases representation learning requires a large amount of annotated training data to generalize well to unseen data. Such labeled training data is provided by human annotators who often use formal logic as the language for specifying annotations. This thesis investigates different combinations of representation learning methods with logic for reducing the need for annotated training data, and for improving generalization. Comment: PhD Thesis, University College London, Submitted and accepted in 201
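
    One common way of combining the two, sketched below in plain numpy purely as an illustration (this is the general idea, not the specific models developed in the thesis, and all entity and relation names are invented): facts are scored from learned vector representations, and a logical rule such as professorAt(X, Y) => employeeAt(X, Y) is turned into a soft penalty on those scores, so the rule shapes the embeddings even for entity pairs with no annotated label.

        import numpy as np

        rng = np.random.default_rng(0)
        dim = 8

        # Distributed representations of relations and of entity pairs.
        relations = {"professorAt": rng.normal(size=dim),
                     "employeeAt": rng.normal(size=dim)}
        pairs = {("smith", "ucl"): rng.normal(size=dim)}

        def score(rel, pair):
            """Probability, from the embeddings, that the fact rel(pair) holds."""
            return 1.0 / (1.0 + np.exp(-(relations[rel] @ pairs[pair])))

        def implication_loss(body, head, pair):
            """Soft version of body => head: penalise score(body) > score(head)."""
            return max(0.0, score(body, pair) - score(head, pair))

        # Gradient-based training would drive this penalty towards zero for all
        # pairs, injecting the rule into the learned representations.
        print(implication_loss("professorAt", "employeeAt", ("smith", "ucl")))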

    Efficient Extraction and Query Benchmarking of Wikipedia Data

    Knowledge bases are playing an increasingly important role in integrating information between systems and over the Web. Today, most knowledge bases cover only specific domains, they are created by relatively small groups of knowledge engineers, and it is very cost-intensive to keep them up-to-date as domains change. In parallel, Wikipedia has grown into one of the central knowledge sources of mankind and is maintained by thousands of contributors. The DBpedia (http://dbpedia.org) project makes use of this large, collaboratively edited knowledge source by extracting structured content from it, interlinking it with other knowledge bases, and making the result publicly available. DBpedia had and has a great effect on the Web of Data and became a crystallization point for it. Furthermore, many companies and researchers use DBpedia and its public services to improve their applications and research approaches. However, the DBpedia release process is heavy-weight and the releases are sometimes based on data that is several months old. Hence, a strategy to keep DBpedia continuously in synchronization with Wikipedia is needed. In this thesis we propose the DBpedia Live framework, which reads a continuous stream of updated Wikipedia articles and processes it on-the-fly to obtain RDF data, updating the DBpedia knowledge base with the newly extracted facts. DBpedia Live also publishes the newly added and deleted facts in files, in order to enable synchronization between our DBpedia endpoint and other DBpedia mirrors. Moreover, the new DBpedia Live framework incorporates several significant features, e.g. abstract extraction, handling of ontology changes, and changeset publication. Knowledge bases, including DBpedia, are stored in triplestores in order to facilitate accessing and querying their data. Furthermore, triplestores constitute the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission-critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triplestore implementations. We introduce a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational databases and triplestores and thus settled on measuring performance against a relational database that had been converted to RDF, using SQL-like queries. In contrast, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering, and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful for comparing existing triplestores, and we provide results for the popular triplestore implementations Virtuoso, Sesame, Apache Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triplestores is far less homogeneous than suggested by previous benchmarks. Further, one of the crucial tasks when creating and maintaining knowledge bases is validating their facts and maintaining the quality of their inherent data.
    This task includes several subtasks, and in this thesis we address two of the major ones, namely fact validation and provenance, and data quality. The fact validation and provenance subtask aims at providing sources for facts in order to ensure the correctness and traceability of the provided knowledge. This subtask is often addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several search processes and must often read several documents. We present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for the confidence DeFacto has in the correctness of the input fact. The data quality maintenance subtask, on the other hand, aims at evaluating and continuously improving the quality of the data in knowledge bases. We present a methodology for assessing the quality of knowledge bases’ data, which comprises a manual and a semi-automatic process. The first phase includes the detection of common quality problems and their representation in a quality problem taxonomy. In the manual process, the second phase comprises the evaluation of a large number of individual resources against the quality problem taxonomy via crowdsourcing. This process is accompanied by a tool wherein a user assesses an individual resource and evaluates each fact for correctness. The semi-automatic process involves the generation and verification of schema axioms. We report the results obtained by applying this methodology to DBpedia.
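
    For orientation, the kind of workload the SPARQL benchmark targets can be illustrated with a small query against the public DBpedia endpoint, issued here from Python via the SPARQLWrapper library (the query below is only an example, not one of the mined benchmark queries).

        from SPARQLWrapper import SPARQLWrapper, JSON

        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery("""
            PREFIX dbo: <http://dbpedia.org/ontology/>
            SELECT ?city ?population WHERE {
                ?city a dbo:City ;
                      dbo:populationTotal ?population .
            }
            ORDER BY DESC(?population)
            LIMIT 5
        """)
        sparql.setReturnFormat(JSON)

        # Execute the query and print the five most populous cities in DBpedia.
        results = sparql.query().convert()
        for row in results["results"]["bindings"]:
            print(row["city"]["value"], row["population"]["value"])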

    Exploiting prior knowledge and latent variable representations for the statistical modeling and probabilistic querying of large knowledge graphs

    Large knowledge graphs increasingly add great value to various applications that require machines to recognize and understand queries and their semantics, as in search or question answering systems. These applications include Google search, Bing search, and IBM’s Watson, but also smart mobile assistants such as Apple’s Siri, Google Now, or Microsoft’s Cortana. Popular knowledge graphs like DBpedia, YAGO, or Freebase store a broad range of facts about the world, to a large extent derived from Wikipedia, currently the biggest web encyclopedia. In addition to these freely accessible open knowledge graphs, commercial ones have also evolved, including the well-known Google Knowledge Graph and Microsoft’s Satori. Since the incompleteness and uncertain veracity of knowledge graphs are known problems, the statistical modeling of knowledge graphs has gained increasing attention in recent years. Some of the leading approaches are based on latent variable models, which show both excellent predictive performance and scalability. Latent variable models learn embedding representations of domain entities and relations (representation learning). From these embeddings, priors for every possible fact in the knowledge graph are generated, which can be exploited for data cleansing, completion, or as prior knowledge to support triple extraction from unstructured textual data, as successfully demonstrated by Google’s Knowledge-Vault project. However, large knowledge graphs impose constraints on the complexity of the latent embeddings learned by these models. For graphs with millions of entities and thousands of relation-types, latent variable models must use low-dimensional embeddings for entities and relation-types to remain tractable. The work described in this thesis extends the application of latent variable models to large knowledge graphs in three important dimensions. First, it is shown how the integration of ontological constraints on the domain and range of relation-types enables latent variable models to exploit latent embeddings of reduced complexity for modeling large knowledge graphs. The integration of this prior knowledge into the models leads to a substantial increase in both predictive performance and scalability, with improvements of up to 77% in link-prediction tasks. Since manually designed domain and range constraints can be absent or fuzzy, we also propose and study an alternative approach based on a local closed-world assumption, which derives domain and range constraints from observed data without the need for prior knowledge extracted from the curated schema of the knowledge graph. We show that such an approach also leads to similarly significant improvements in modeling quality. Further, we demonstrate that these two types of domain and range constraints are of general value to latent variable models by integrating and evaluating them on current state-of-the-art latent variable models, represented by RESCAL, Translational Embedding, and the neural network approach used by the recently proposed Google Knowledge Vault system. In the second part of the thesis it is shown that the three approaches just mentioned all perform well but do not share many commonalities in the way they model knowledge graphs. These differences can be exploited in ensemble solutions, which improve the predictive performance even further. The third part of the thesis concerns the efficient querying of the statistically modeled knowledge graphs.
    This thesis interprets statistically modeled knowledge graphs as probabilistic databases, where the latent variable models define a probability distribution over triples. From this perspective, link-prediction is equivalent to querying ground triples, which is a standard functionality of latent variable models. For more complex querying that involves, e.g., joins and projections, the theory of probabilistic databases provides evaluation rules. In this thesis it is shown how the intrinsic features of latent variable models can be combined with the theory of probabilistic databases to realize efficient probabilistic querying of the modeled graphs.
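
    A minimal numpy sketch of this view, purely as an illustration (untrained random vectors, invented entity and relation names, and a translational score standing in for the models compared in the thesis): each entity and relation-type gets a low-dimensional embedding, a triple score is squashed into a probability, and those probabilities can then be combined with probabilistic-database evaluation rules, e.g. for a simple existential query under tuple independence.

        import numpy as np

        rng = np.random.default_rng(1)
        dim = 16
        entities = {e: rng.normal(size=dim)
                    for e in ["berlin", "paris", "germany", "france"]}
        relations = {"capitalOf": rng.normal(size=dim)}

        def prob(head, rel, tail):
            """Triple probability from a translational (TransE-style) score."""
            dist = np.linalg.norm(entities[head] + relations[rel] - entities[tail])
            return 1.0 / (1.0 + np.exp(dist - 4.0))  # squash the distance to (0, 1)

        # Link prediction = querying a ground triple.
        p_ground = prob("berlin", "capitalOf", "germany")

        # Probabilistic-database rule: probability that *some* listed city is the
        # capital of Germany, assuming independence of the individual triples.
        cities = ["berlin", "paris"]
        p_exists = 1.0 - np.prod([1.0 - prob(c, "capitalOf", "germany") for c in cities])
        print(p_ground, p_exists)  # with untrained vectors the numbers are arbitrary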

    Interactive Multiagent Adaptation of Individual Classification Models for Decision Support

    An essential prerequisite for informed decision-making of intelligent agents is direct access to empirical knowledge for situation assessment. This contribution introduces an agent-oriented knowledge management framework for learning agents facing impediments in the self-contained acquisition of classification models. The framework enables the emergence of dynamic knowledge networks among benevolent agents forming a community of practice in open multiagent systems. Agents in an advisee role are enabled to pinpoint learning impediments in terms of critical training cases and to engage in a goal-directed discourse with an advisor panel to overcome the identified issues. The advisors provide arguments supporting and hence explaining those critical cases. Using such input as additional background knowledge, advisees can adapt their models in iterative relearning organized as a search through model space. An extensive empirical evaluation in two real-world domains validates the presented approach.
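
    The interaction pattern can be outlined roughly as follows (a hypothetical Python sketch, not the paper's framework; the training routine, model factory, and advisor callables are placeholders): the advisee identifies training cases its current model misclassifies, asks an advisor panel for supporting argument cases, and relearns on the extended data.

        def critical_cases(model, training_set):
            """Cases the advisee's current model gets wrong."""
            return [(x, y) for (x, y) in training_set if model(x) != y]

        def consult_advisors(advisors, cases):
            """Collect labelled argument cases contributed by each advisor."""
            arguments = []
            for advisor in advisors:
                arguments.extend(advisor(cases))
            return arguments

        def adapt(train, model_factory, training_set, advisors, rounds=3):
            """Iterative relearning with advisor input as added background knowledge."""
            data = list(training_set)
            model = train(model_factory, data)
            for _ in range(rounds):
                hard = critical_cases(model, training_set)
                if not hard:
                    break
                data.extend(consult_advisors(advisors, hard))
                model = train(model_factory, data)
            return model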

    Collected Papers (on Physics, Artificial Intelligence, Health Issues, Decision Making, Economics, Statistics), Volume XI

    This eleventh volume of Collected Papers includes 90 papers comprising 988 pages on Physics, Artificial Intelligence, Health Issues, Decision Making, Economics, and Statistics, written between 2001 and 2022 by the author alone or in collaboration with the following 84 co-authors (alphabetically ordered) from 19 countries: Abhijit Saha, Abu Sufian, Jack Allen, Shahbaz Ali, Ali Safaa Sadiq, Aliya Fahmi, Atiqa Fakhar, Atiqa Firdous, Sukanto Bhattacharya, Robert N. Boyd, Victor Chang, Victor Christianto, V. Christy, Dao The Son, Debjit Dutta, Azeddine Elhassouny, Fazal Ghani, Fazli Amin, Anirudha Ghosha, Nasruddin Hassan, Hoang Viet Long, Jhulaneswar Baidya, Jin Kim, Jun Ye, Darjan Karabašević, Vasilios N. Katsikis, Ieva Meidutė-Kavaliauskienė, F. Kaymarm, Nour Eldeen M. Khalifa, Madad Khan, Qaisar Khan, M. Khoshnevisan, Kifayat Ullah, Volodymyr Krasnoholovets, Mukesh Kumar, Le Hoang Son, Luong Thi Hong Lan, Tahir Mahmood, Mahmoud Ismail, Mohamed Abdel-Basset, Siti Nurul Fitriah Mohamad, Mohamed Loey, Mai Mohamed, K. Mohana, Kalyan Mondal, Muhammad Gulfam, Muhammad Khalid Mahmood, Muhammad Jamil, Muhammad Yaqub Khan, Muhammad Riaz, Nguyen Dinh Hoa, Cu Nguyen Giap, Nguyen Tho Thong, Peide Liu, Pham Huy Thong, Gabrijela Popović, Surapati Pramanik, Dmitri Rabounski, Roslan Hasni, Rumi Roy, Tapan Kumar Roy, Said Broumi, Saleem Abdullah, Muzafer Saračević, Ganeshsree Selvachandran, Shariful Alam, Shyamal Dalapati, Housila P. Singh, R. Singh, Rajesh Singh, Predrag S. Stanimirović, Kasan Susilo, Dragiša Stanujkić, Alexandra Şandru, Ovidiu Ilie Şandru, Zenonas Turskis, Yunita Umniyati, Alptekin Ulutaș, Maikel Yelandi Leyva Vázquez, Binyamin Yusoff, Edmundas Kazimieras Zavadskas, Zhao Loon Wang.