
    Ontology Enrichment by Discovering Multi-Relational Association Rules from Ontological Knowledge Bases

    In the Semantic Web context, OWL ontologies represent the conceptualization of domains of interest, while the corresponding assertional knowledge is given by the heterogeneous Web resources referring to them. Being strongly decoupled, ontologies and assertions can be out of sync. An ontology can be incomplete, noisy, and sometimes inconsistent with regard to the actual usage of its conceptual vocabulary in the assertions. Data mining can support the discovery of hidden knowledge patterns in the data, to enrich the ontologies. We present a method for discovering multi-relational association rules, coded in SWRL, from ontological knowledge bases. Unlike state-of-the-art approaches, the method is able to take the intensional knowledge into account. Furthermore, since discovered rules are represented in SWRL, they can be straightforwardly integrated within the ontology, thus (i) enriching its expressive power and (ii) augmenting the assertional knowledge that can be derived. Discovered rules may also suggest new axioms to be added to the ontology. We performed experiments on publicly available ontologies, validating the performance of our approach.

    An Evolutionary Algorithm for Discovering Multi-Relational Association Rules in the Semantic Web

    In the Semantic Web context, OWL ontologies represent the conceptualization of domains of interest, while the corresponding assertional knowledge is given by RDF data referring to them. Because of its open, distributed, and collaborative nature, such knowledge can be incomplete, noisy, and sometimes inconsistent. By exploiting the evidence coming from the assertional data, we aim at discovering hidden knowledge patterns in the form of multi-relational association rules while taking advantage of the intensional knowledge available in ontological knowledge bases. An evolutionary search method applied to populated ontological knowledge bases is proposed for finding rules with a high inductive power. The proposed method, EDMAR, uses problem-aware genetic operators, echoing the refinement operators of ILP, and takes the intensional knowledge into account, which allows it to restrict and guide the search. Discovered rules are coded in SWRL, and as such they can be straightforwardly integrated within the ontology, thus enriching its expressive power and augmenting the assertional knowledge that can be derived. Additionally, discovered rules may suggest new axioms to be added to the ontology. We performed experiments on publicly available ontologies, validating the performance of our approach and comparing it with the main state-of-the-art systems.

    Constructing Metrics for Evaluating Multi-Relational Association Rules in the Semantic Web from Metrics for Scoring Association Rules

    We propose a method to construct asymmetric metrics for evaluating the quality of multi-relational association rules coded in the form of SWRL rules. These metrics are derived from metrics for scoring association rules. We use each constructed metric as a fitness function for evolutionary inductive programming employed to discover hidden knowledge patterns (represented in SWRL) from the assertional data of ontological knowledge bases. This new knowledge can be integrated easily within the ontology to enrich it. In addition, we search experimentally for the best metric for scoring candidate multi-relational association rules in the evolutionary approach. We performed experiments on three publicly available ontologies, validating the performance of our approach and comparing it with the main state-of-the-art systems.
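
    For orientation, the classical association-rule scores that such metrics start from can be sketched as below. This is a minimal illustration, not the asymmetric metrics constructed in the paper: the function name `rule_metrics`, the representation of a rule by the sets of variable bindings satisfying its body and its head, and the toy individuals are all assumptions made for this example.

```python
from typing import Dict, Set, Tuple

Binding = Tuple[str, str]  # one (x, y) variable binding, hypothetical representation


def rule_metrics(body: Set[Binding], head: Set[Binding]) -> Dict[str, float]:
    """Score a rule body(x, y) => head(x, y) from the bindings that satisfy each side."""
    both = body & head  # bindings for which body and head both hold
    support = len(both)
    # Confidence: fraction of body bindings for which the head also holds.
    confidence = support / len(body) if body else 0.0
    # Head coverage: fraction of head bindings that the rule accounts for.
    head_coverage = support / len(head) if head else 0.0
    return {
        "support": float(support),
        "confidence": confidence,
        "head_coverage": head_coverage,
    }


# Toy data: four bindings satisfy the body, three satisfy the head, two satisfy both.
body_bindings = {("ann", "bob"), ("cat", "dan"), ("eve", "fay"), ("gus", "hal")}
head_bindings = {("ann", "bob"), ("cat", "dan"), ("ivy", "jon")}
scores = rule_metrics(body_bindings, head_bindings)
```

    Any such score could serve as a fitness function in an evolutionary search, which is the role the abstract assigns to its constructed metrics.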

    Discovering Implicational Knowledge in Wikidata

    Knowledge graphs have recently become the state-of-the-art tool for representing the diverse and complex knowledge of the world. Examples include the proprietary knowledge graphs of companies such as Google, Facebook, IBM, or Microsoft, but also freely available ones such as YAGO, DBpedia, and Wikidata. A distinguishing feature of Wikidata is that the knowledge is collaboratively edited and curated. While this greatly enhances the scope of Wikidata, it also makes it impossible for a single individual to grasp complex connections between properties or understand the global impact of edits in the graph. We apply Formal Concept Analysis to efficiently identify comprehensible implications that are implicitly present in the data. Although the complex structure of data modelling in Wikidata is not amenable to a direct approach, we overcome this limitation by extracting contextual representations of parts of Wikidata in a systematic fashion. We demonstrate the practical feasibility of our approach through several experiments and show that the results may lead to the discovery of interesting implicational knowledge. Besides providing a method for obtaining large real-world data sets for FCA, we sketch potential applications in offering semantic assistance for editing and curating Wikidata.
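
    As a rough illustration of what an implication in a formal context means, the sketch below checks whether a premise set of attributes implies a conclusion set. It is an assumption-laden toy, not the extraction pipeline of the paper: the function name `implication_holds`, the dictionary encoding of the context, and the item and property names are invented for the example.

```python
from typing import Dict, Set


def implication_holds(
    context: Dict[str, Set[str]], premise: Set[str], conclusion: Set[str]
) -> bool:
    """Return True if every object having all premise attributes also has all conclusion attributes."""
    return all(
        conclusion <= attributes
        for attributes in context.values()
        if premise <= attributes
    )


# Toy formal context: objects (hypothetical items) mapped to the properties they use.
toy_context = {
    "item_a": {"date of birth", "place of birth"},
    "item_b": {"date of birth"},
    "item_c": {"date of birth", "place of birth", "date of death"},
}

forward = implication_holds(toy_context, {"place of birth"}, {"date of birth"})
backward = implication_holds(toy_context, {"date of birth"}, {"place of birth"})
```

    In this toy context the first implication holds and the second does not, because `item_b` has a date of birth but no place of birth.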

    Empowering Knowledge Bases: a Machine Learning Perspective

    The construction of Knowledge Bases quite often requires the intervention of knowledge engineering and domain experts, resulting in a time-consuming task. Alternative approaches have been developed for building knowledge bases from existing sources of information such as web pages and crowdsourcing; seminal examples are NELL, DBpedia, YAGO, and several others. With the goal of building very large sources of knowledge, as recently in the case of Knowledge Graphs, even more complex integration processes have been set up, involving multiple sources of information, human expert intervention, and crowdsourcing. Despite significant efforts to make Knowledge Graphs as comprehensive and reliable as possible, they tend to suffer from incompleteness and noise, due to the complex building process. Even in highly human-curated knowledge bases, cases of incompleteness can be found, for instance with disjointness axioms missing quite often. Machine learning methods have been proposed with the purpose of refining, enriching, completing, and possibly raising potential issues in existing knowledge bases while showing the ability to cope with noise. The talk will concentrate on classes of mostly symbol-based machine learning methods, specifically focusing on concept learning, rule learning, and disjointness axiom learning problems, showing how the developed methods can be exploited for enriching existing knowledge bases. The talk will highlight that a key element of the illustrated solutions is the integration of background knowledge, deductive reasoning, and the evidence coming from the mass of the data. The last part of the talk will be devoted to the presentation of an approach for injecting background knowledge into numeric-based embedding models to be used for predictive tasks on Knowledge Graphs.

    Requirements and Use Cases ; Report I on the sub-project Smart Content Enrichment

    In this technical report, we present the results of the first milestone phase of the Corporate Smart Content sub-project "Smart Content Enrichment". We present analyses of the state of the art in the fields concerning the three work packages defined in the sub-project: aspect-oriented ontology development, complex entity recognition, and semantic event pattern mining. We compare the research approaches related to our three research subjects and briefly outline our future work plan.

    Using Association Rules to Enrich Arabic Ontology

    In this article, we propose the use of a minimal generic base of association rules between terms to automatically enrich an existing domain ontology. Initially, non-redundant association rules between terms are extracted from an Arabic corpus. Then, candidate terms are selected by matching the concepts of the initial ontology against the premises of the association rules, using three distance measures that we define.

    Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

    While the biomedical informatics community widely acknowledges the utility of domain ontologies, there remain many barriers to their effective use. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontology from free-text documents. In this dissertation, I extended these methodologies into biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This previously unconducted review was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods from two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of the two types of OL approaches that include three OL methods. The significance of this work is as follows: 1) The results from the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method revealed an average of 21% and 11% new concept acceptance rates, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%. As a result of this study (published in the Journal of Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT; 2) The evaluation framework is flexible and general enough that it can analyze the performance of ontology enrichment methods for many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships would be missed as domain knowledge evolves.
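
    To give a feel for the symbolic approach, the sketch below applies one well-known Hearst-style lexico-syntactic pattern, "X such as A, B and C", to a sentence and returns candidate (hyponym, hypernym) pairs. It is a deliberately simplified illustration, not the implementation evaluated in the dissertation: the function name `hearst_such_as`, the restriction to single-word terms, and the example sentence are assumptions made here.

```python
import re
from typing import List, Tuple

# One hypernym word, the cue phrase "such as", then a list of single-word hyponyms
# separated by commas and an optional final "and"/"or".
SUCH_AS = re.compile(
    r"(?P<hyper>[A-Za-z]+)\s+such\s+as\s+"
    r"(?P<hypos>[A-Za-z]+"
    r"(?:\s*,\s*(?!(?:and|or)\b)[A-Za-z]+)*"
    r"(?:\s*,?\s+(?:and|or)\s+[A-Za-z]+)?)"
)

# Separators inside the hyponym list: ", ", ", and ", " and ", " or ".
SEPARATOR = re.compile(r"\s*,\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+")


def hearst_such_as(sentence: str) -> List[Tuple[str, str]]:
    """Return candidate (hyponym, hypernym) pairs found by the 'such as' pattern."""
    pairs = []
    for match in SUCH_AS.finditer(sentence):
        hypernym = match.group("hyper").lower()
        for hyponym in SEPARATOR.split(match.group("hypos")):
            if hyponym:
                pairs.append((hyponym.lower(), hypernym))
    return pairs


candidates = hearst_such_as(
    "Tumors such as glioma, meningioma and lymphoma were observed."
)
```

    Pairs produced this way are only candidates; in the study described above, suggested concepts were judged for acceptance before being incorporated into an ontology.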