
    A case study on TUdatalib

    Semantic Web and Linked Data technologies might solve issues originating from research data being published by independent providers. For maximum benefit from these technologies, metadata should be provided in as standardized a form as possible. The Data Catalog Vocabulary (DCAT) is a W3C recommendation of potential value for Linked Data exposure of research data metadata. The suitability of DCAT for institutional research data repositories was investigated using the TUdatalib repository as a case study. A model for TUdatalib metadata was developed based on the analysis of selected resources and guided by a draft of DCAT 3. The model captured the essential information about the repository structure and contents, indicating the suitability of the vocabulary, and should, conceptually, permit automated conversion of metadata from the repository system to DCAT 3. A loss of expressiveness results from the omission of dataset series. Conformance with the DCAT 3 class definitions led to a highly complex model, creating challenges for actual technical realization. A comparative study revealed that two other repositories use simpler models, but implementing the TUdatalib model or a similar one could improve alignment with the DCAT specifications. DCAT 3 was found to be a promising option for Linked Data exposure of institutional research data repository metadata, and the TUdatalib model might serve as a basis for developing a general DCAT 3 application profile for institutional and other research data repositories.
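
    The kind of metadata under discussion can be made concrete with a small, purely illustrative sketch: the Python snippet below uses rdflib to describe a repository as a dcat:Catalog containing a single dcat:Dataset. The namespace, identifiers, and titles are invented, and the snippet is not the TUdatalib model itself, only an example of DCAT-style repository metadata.

```python
# Hypothetical DCAT-style repository metadata (not the actual TUdatalib model).
# Requires the rdflib package.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import DCAT, DCTERMS, RDF, XSD

EX = Namespace("https://example.org/repo/")  # invented namespace

g = Graph()
g.bind("dcat", DCAT)
g.bind("dcterms", DCTERMS)

catalog = EX["catalog"]
dataset = EX["dataset-42"]

# The institutional repository described as a DCAT catalog
g.add((catalog, RDF.type, DCAT.Catalog))
g.add((catalog, DCTERMS.title, Literal("Example institutional repository", lang="en")))

# One deposited dataset, linked to the catalog
g.add((dataset, RDF.type, DCAT.Dataset))
g.add((dataset, DCTERMS.title, Literal("Example measurement data", lang="en")))
g.add((dataset, DCTERMS.issued, Literal("2023-01-01", datatype=XSD.date)))
g.add((catalog, DCAT.dataset, dataset))

print(g.serialize(format="turtle"))
```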

    Combining machine learning and semantic web: A systematic mapping study

    In line with the general trend in artificial intelligence research to create intelligent systems that combine learning and symbolic components, a new sub-area has emerged that focuses on combining Machine Learning components with techniques developed by the Semantic Web community, termed Semantic Web Machine Learning (SWeML). Due to its rapid growth and impact on several communities in the past two decades, there is a need to better understand the space of these SWeML Systems, their characteristics, and trends. Yet, surveys that adopt principled and unbiased approaches are missing. To fill this gap, we performed a systematic study and analyzed nearly 500 papers published in the past decade in this area, focusing on architectural and application-specific features. Our analysis identified a rapidly growing interest in SWeML Systems, with a high impact on several application domains and tasks. Catalysts for this rapid growth are the increased adoption of deep learning and knowledge graph technologies. By leveraging the in-depth understanding of this area acquired through this study, a further key contribution of this article is a classification system for SWeML Systems that we publish as an ontology.

    Link Prediction of Weighted Triples for Knowledge Graph Completion Within the Scholarly Domain

    Knowledge graphs (KGs) are widely used for modeling scholarly communication, performing scientometric analyses, and supporting a variety of intelligent services to explore the literature and predict research dynamics. However, they often suffer from incompleteness (e.g., missing affiliations, references, or research topics), which reduces the scope and quality of the resulting analyses. This issue is usually tackled by computing knowledge graph embeddings (KGEs) and applying link prediction techniques. However, only a few KGE models are capable of taking the weights of facts in the knowledge graph into account. Such weights can have different meanings, e.g., the degree of association or the degree of truth of a given triple. In this paper, we propose the Weighted Triple Loss, a new loss function for KGE models that takes full advantage of the additional numerical weights on facts and is even tolerant of incorrect weights. We also extend the Rule Loss, a loss function that is able to exploit a set of logical rules, to work with weighted triples. The evaluation of our solutions on several knowledge graphs indicates significant performance improvements with respect to the state of the art. Our main use case is the large-scale AIDA knowledge graph, which describes 21 million research articles. Our approach makes it possible to complete information about affiliation types, countries, and research topics, greatly improving the scope of the resulting scientometric analyses and providing better support to systems for monitoring and predicting research dynamics.
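
    The abstract does not reproduce the loss formulation, so the snippet below is only a generic, hypothetical illustration of the underlying idea: a numeric weight attached to a triple scales that triple's contribution to a margin-based ranking loss over TransE-style scores. It is not the authors' Weighted Triple Loss or Rule Loss.

```python
# Hypothetical illustration of weighting per-triple losses in a KGE model.
# This is NOT the paper's Weighted Triple Loss, only a generic sketch of the
# idea that a numeric confidence/weight scales each triple's contribution.
import numpy as np

def transe_score(h, r, t):
    """TransE-style plausibility score: higher (less negative) = more plausible."""
    return -np.linalg.norm(h + r - t)

def weighted_margin_loss(pos, neg, weight, margin=1.0):
    """Margin ranking loss for one (positive, negative) triple pair,
    scaled by the positive triple's weight in [0, 1]."""
    return weight * max(0.0, margin - transe_score(*pos) + transe_score(*neg))

rng = np.random.default_rng(0)
dim = 8
h, r, t = (rng.normal(size=dim) for _ in range(3))
h_corrupt = rng.normal(size=dim)  # corrupted head forming the negative triple

# A high-confidence triple contributes more to the loss than a low-confidence one.
print(weighted_margin_loss((h, r, t), (h_corrupt, r, t), weight=0.9))
print(weighted_margin_loss((h, r, t), (h_corrupt, r, t), weight=0.2))
```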

    Generating knowledge graphs by employing Natural Language Processing and Machine Learning techniques within the scholarly domain

    The continuous growth of scientific literature brings innovations and, at the same time, raises new challenges. One of them is that analysing the literature has become difficult due to the high volume of published papers, whose annotation and management require considerable manual effort. Novel technological infrastructures are needed to help researchers, research policy makers, and companies to browse, analyse, and forecast scientific research in a time-efficient way. Knowledge graphs, i.e., large networks of entities and relationships, have proved to be an effective solution in this space. Scientific knowledge graphs focus on the scholarly domain and typically contain metadata describing research publications, such as authors, venues, organizations, research topics, and citations. However, the current generation of knowledge graphs lacks an explicit representation of the knowledge presented in the research papers. Therefore, in this paper we present a new architecture that takes advantage of Natural Language Processing and Machine Learning methods for extracting entities and relationships from research publications and integrates them in a large-scale knowledge graph. Within this research work, we i) tackle the challenge of knowledge extraction by employing several state-of-the-art Natural Language Processing and Text Mining tools, ii) describe an approach for integrating the entities and relationships generated by these tools, iii) show the advantage of such a hybrid system over alternative approaches, and iv) as a chosen use case, generate a scientific knowledge graph including 109,105 triples extracted from 26,827 abstracts of papers within the Semantic Web domain. As our approach is general and can be applied to any domain, we expect that it can facilitate the management, analysis, dissemination, and processing of scientific knowledge.
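
    As a rough illustration of the integration step, the hypothetical sketch below turns (subject, relation, object) tuples, of the kind an extraction pipeline might produce from abstracts, into RDF triples with rdflib. The namespace, relation names, and tuples are invented and do not come from the paper.

```python
# Minimal, hypothetical sketch: turning (subject, relation, object) tuples
# produced by NLP/extraction tools into an RDF knowledge graph with rdflib.
# The namespace, relation names, and tuples below are invented examples.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDFS

SKG = Namespace("https://example.org/skg/")  # invented namespace

# Example output an entity/relation extraction step might produce
extracted = [
    ("semantic_web", "usesMethod", "ontology_alignment"),
    ("knowledge_graph", "supportsTask", "link_prediction"),
]

g = Graph()
g.bind("skg", SKG)
for subj, rel, obj in extracted:
    g.add((SKG[subj], SKG[rel], SKG[obj]))
    g.add((SKG[subj], RDFS.label, Literal(subj.replace("_", " "))))
    g.add((SKG[obj], RDFS.label, Literal(obj.replace("_", " "))))

print(g.serialize(format="turtle"))
```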

    Extracting and Cleaning RDF Data

    The RDF data model has become a prevalent format for representing heterogeneous data because of its versatility. The capability of extracting information from its native formats and representing it as triples offers a simple yet powerful way of modelling data obtained from multiple sources. In addition, the triple format and schema constraints of the RDF model make RDF data easy to process as labeled, directed graphs. This graph representation of RDF data supports higher-level analytics by enabling querying with different techniques and query languages, e.g., SPARQL. Analytics that require structured data are supported by transforming the graph data on the fly to populate the target schema needed for downstream analysis. These target schemas are defined by downstream applications according to their information needs. The flexibility of RDF data brings two main challenges. First, the extraction of RDF data is a complex task that may require domain expertise about the information to be extracted for different applications. Second, a significant aspect of analyzing RDF data is its quality, which depends on multiple factors, including the reliability of the data sources and the accuracy of the extraction systems. The quality of the analysis depends mainly on the quality of the underlying data; therefore, evaluating and improving the quality of RDF data has a direct effect on the correctness of downstream analytics. This work presents multiple approaches related to the extraction and quality evaluation of RDF data. To cope with the large amounts of data that need to be extracted, we present DSTLR, a scalable framework to extract RDF triples from semi-structured and unstructured data sources. For rare entities that fall on the long tail of information, there may not be enough signals to support high-confidence extraction; to address this problem, we present an approach to estimate property values for long-tail entities. We also present multiple algorithms and approaches that focus on the quality of RDF data, including discovering quality constraints from RDF data and utilizing machine learning techniques to repair errors in RDF data.
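
    The flavour of constraint-based quality checking mentioned above can be illustrated with a small, hypothetical example: the snippet below uses rdflib and a SPARQL query to flag resources that violate a simple completeness constraint (persons missing foaf:name). It is only an illustration of the general idea, not part of DSTLR or the repair algorithms described in the work.

```python
# Hypothetical sketch of a simple RDF quality check with rdflib: flag persons
# missing an expected property (foaf:name). Illustrates constraint-based
# checking in general, not the DSTLR framework or the repair approaches.
from rdflib import Graph

data = """
@prefix ex:   <https://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

ex:alice a foaf:Person ; foaf:name "Alice" .
ex:bob   a foaf:Person .
"""

g = Graph()
g.parse(data=data, format="turtle")

violations = g.query("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?person WHERE {
        ?person a foaf:Person .
        FILTER NOT EXISTS { ?person foaf:name ?name }
    }
""")

for (person,) in violations:
    print(f"Missing foaf:name: {person}")  # reports ex:bob
```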

    Semantic Web-based Management of Routing Configurations

    Today, network operators typically reason about network behaviour by observing the effects of a particular configuration in operation. This configuration process typically involves logging configuration changes and rolling back to a previous version when a problem arises. Advanced network operators (more each day) use policy-based routing languages to define the routing configuration, and tools based on systematic verification techniques to ensure that operational behaviour is consistent with the intended behaviour. These tools help operators reason about properties of routing protocols. However, these languages and tools work at a low level, i.e., they focus on the properties, parameters, and elements of routing protocols. In contrast, network operators receive high-level policies that must be refined into low-level parameters before they can be applied. These high-level policies should consider other properties (e.g. extensibility or reasoning capabilities), parameters (e.g. time period, localization or QoS parameters), and elements (e.g. AAA individuals or resources) when the network configuration is defined. We believe that there is a need for broader approaches to languages and tools for defining routing configurations, ones that are more powerful and better integrated with other network elements. This article provides the main ideas behind the specification of routing policies using formal languages that enable the description of semantics. These semantics ease the policy refinement process and allow an automated process for conflict detection on these policies.
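
    Purely as a toy illustration of expressing a high-level policy with semantic technology, the sketch below encodes an invented routing policy as RDF triples and queries it with SPARQL. The vocabulary is made up for the example and does not correspond to the formal languages discussed in the article.

```python
# Toy, entirely hypothetical sketch: a high-level routing policy encoded as RDF
# so that it can be queried (and, in principle, refined or checked for
# conflicts). The vocabulary is invented for illustration only.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

POL = Namespace("https://example.org/policy/")  # invented namespace

g = Graph()
g.bind("pol", POL)

rule = POL["rule1"]
g.add((rule, RDF.type, POL.RoutingPolicy))
g.add((rule, POL.appliesToPeer, POL["AS64500"]))
g.add((rule, POL.timePeriod, Literal("business-hours")))
g.add((rule, POL.action, POL.PreferRoute))

# High-level question: which policies apply to a given peer, and what do they do?
q = """
    PREFIX pol: <https://example.org/policy/>
    SELECT ?rule ?action WHERE {
        ?rule a pol:RoutingPolicy ;
              pol:appliesToPeer pol:AS64500 ;
              pol:action ?action .
    }
"""
for rule_uri, action in g.query(q):
    print(rule_uri, action)
```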

    Ontology-based Course Teacher Assignment within Universities

    Educational institutions suffer from an enormous amount of data that keeps growing continuously. These data are usually scattered and unorganised and come from different sources in different formats. Moreover, the modernization vision within these institutions aims to reduce manual work and replace it with automated interactions. To benefit fully from these data and use them within modern systems, they have to be readable and understandable by machines. Data and knowledge enriched with semantic descriptions make it easier to monitor and manage decision processes within universities and to address many educational challenges. In this study, an educational ontology is developed to model courses and academic profiles in universities semantically, and it is used to solve the challenge of assigning the most appropriate academic teacher to teach a specific course.
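
    One way such an assignment problem could be approached with an ontology is sketched below: teachers are linked to expertise topics, courses to required topics, and a SPARQL query proposes matches. The vocabulary, individuals, and matching rule are invented for illustration and are not the ontology developed in the study.

```python
# Hypothetical sketch of ontology-based course-teacher matching with rdflib.
# The vocabulary, individuals, and matching rule are invented; this is not the
# educational ontology developed in the study.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

UNI = Namespace("https://example.org/uni/")  # invented namespace

g = Graph()
g.bind("uni", UNI)

g.add((UNI.semantic_web, RDF.type, UNI.Topic))
g.add((UNI.dr_smith, RDF.type, UNI.Teacher))
g.add((UNI.dr_smith, UNI.hasExpertise, UNI.semantic_web))
g.add((UNI.kr_course, RDF.type, UNI.Course))
g.add((UNI.kr_course, UNI.requiresTopic, UNI.semantic_web))

# A teacher is a candidate for a course if their expertise covers a required topic.
matches = g.query("""
    PREFIX uni: <https://example.org/uni/>
    SELECT ?teacher ?course WHERE {
        ?course  uni:requiresTopic ?topic .
        ?teacher uni:hasExpertise  ?topic .
    }
""")
for teacher, course in matches:
    print(f"{teacher} can teach {course}")
```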

    Using Patterns for Keyword Search in RDF Graphs

    An increasing number of RDF datasets are available on the Web. Querying RDF data requires knowledge of a query language such as SPARQL; it also requires information describing the content of these datasets. The goal of our work is to facilitate the querying of RDF datasets, and we present an approach that enables users to search RDF data using keywords. We introduce the notion of pattern to integrate external knowledge into the search process, which increases the quality of the results.
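
    The general idea of mapping keywords to structured queries can be illustrated with the hypothetical sketch below, where a single fixed pattern matches a keyword against rdfs:label values and is expanded into a SPARQL query with rdflib. It does not reproduce the authors' pattern mechanism, only the overall flavour of keyword search over RDF.

```python
# Hypothetical keyword-to-SPARQL sketch using one fixed label-matching pattern.
# It illustrates the general idea of keyword search over RDF, not the authors'
# pattern-based approach.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://example.org/")  # invented namespace and data

g = Graph()
g.add((EX.paper1, RDF.type, EX.Publication))
g.add((EX.paper1, RDFS.label, Literal("Keyword search over RDF graphs")))

def keyword_query(keyword):
    """Expand one keyword into a SPARQL query via a fixed label-match pattern."""
    return f"""
        SELECT ?entity ?label WHERE {{
            ?entity <{RDFS.label}> ?label .
            FILTER(CONTAINS(LCASE(STR(?label)), "{keyword.lower()}"))
        }}
    """

for entity, label in g.query(keyword_query("keyword search")):
    print(entity, label)
```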