7 research outputs found

    ELLIS: Interactive exploration of Linked Data on the level of induced schema patterns

    We present ELLIS, a demo for browsing the Linked Data cloud on the level of induced schema patterns. To this end, we define schema-level patterns of RDF types and properties to identify how entities described by type sets are connected by property sets. We show that schema-level patterns can be aggregated and extracted from large Linked Data sets using efficient algorithms for mining frequent item sets. A subsequent visualisation of such patterns enables users to quickly understand which types of information are modelled on the Linked Data cloud and how this information is interconnected.
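    The schema-level patterns described in the abstract can be sketched in a few lines: from a set of RDF triples, collect each entity's type set and the property sets connecting entity pairs, then count how often each (type set, property set, type set) pattern occurs. This is a minimal illustration with invented toy data, not ELLIS's actual mining pipeline (which uses frequent item set algorithms at scale).

    ```python
    from collections import defaultdict

    # Toy triples (subject, predicate, object); "a" abbreviates rdf:type.
    triples = [
        ("alice", "a", "Person"),
        ("bob", "a", "Person"),
        ("paper1", "a", "Document"),
        ("alice", "authorOf", "paper1"),
        ("bob", "authorOf", "paper1"),
    ]

    types = defaultdict(set)   # entity -> set of RDF types
    links = defaultdict(set)   # (subject, object) -> set of connecting properties
    for s, p, o in triples:
        if p == "a":
            types[s].add(o)
        else:
            links[(s, o)].add(p)

    # A schema-level pattern: (subject type set, property set, object type set).
    patterns = defaultdict(int)
    for (s, o), props in links.items():
        slp = (frozenset(types[s]), frozenset(props), frozenset(types[o]))
        patterns[slp] += 1

    for slp, count in patterns.items():
        print(slp, count)
    ```

    On the toy data both authorship links collapse into a single pattern ({Person} --authorOf--> {Document}) with frequency 2, which is exactly the kind of aggregation the visualisation builds on.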

    Automatic schema extraction from RDF data

    The Resource Description Framework (RDF) is a model for the representation of semantic data. RDF allows the storage of information without a fixed schema. This provides more flexibility, but the lack of a fixed schema poses a significant entry barrier to the utilisation of the stored data. The SPARQL language is used for querying an RDF database. Several works exist in the domain of schema extraction from SPARQL endpoints. Most tend to provide a visual representation of the schema rather than an immediately usable output. Many of these solutions perform a very thorough and lengthy extraction unsuitable for a web application environment, and some are not even available online. This thesis introduces TypeSPARQ, an open-source web application for extracting schemata from SPARQL endpoints. TypeSPARQ creates a visualisation of the endpoint's schema and offers options for exporting it. TypeSPARQ integrates with LDKit, which provides type-safe access to SPARQL endpoints for TypeScript applications. These tools combined offer TypeScript developers a seamless process from endpoint exploration to integrating the endpoint within their projects.
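    The core of schema extraction as described here can be sketched independently of any endpoint: group instances by their declared class, then record which properties each class's instances use. This is a hedged in-memory illustration with invented data; TypeSPARQ itself issues SPARQL queries against a live endpoint rather than iterating over local triples.

    ```python
    from collections import defaultdict

    # Toy triples (subject, predicate, object).
    triples = [
        ("book1", "rdf:type", "Book"),
        ("book1", "title", "Dune"),
        ("book1", "writtenBy", "author1"),
        ("author1", "rdf:type", "Author"),
        ("author1", "name", "Frank Herbert"),
    ]

    classes = defaultdict(set)   # class -> its instances
    for s, p, o in triples:
        if p == "rdf:type":
            classes[o].add(s)

    # Schema: for each class, the properties its instances actually use.
    schema = {}
    for cls, instances in classes.items():
        schema[cls] = {p for s, p, o in triples if s in instances and p != "rdf:type"}

    print(schema)
    ```

    The resulting class-to-properties map is the kind of "immediately usable output" the abstract contrasts with purely visual schema representations.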

    Instance-Based Lossless Summarization of Knowledge Graph With Optimized Triples and Corrections (IBA-OTC)

    Knowledge graph (KG) summarization facilitates efficient information retrieval over complex structured data. It reduces computational time during data retrieval, saves storage space, eases in-memory visualization, and should preserve the graph's structure after summarization, while still keeping the summary graph's information complete. State-of-the-art approaches summarize a given KG by preserving its structure at the cost of information loss; conversely, approaches that do not preserve the underlying structure compromise the summarization ratio by focusing only on the compression of specific regions. These approaches therefore either fail to preserve the original facts or wrongly predict inferred information. To solve these problems, we present a novel framework for generating a lossless summary that preserves structure through super signatures and their corresponding corrections. The proposed approach summarizes only the naturally overlapping instances while maintaining their information and preserving the underlying Resource Description Framework (RDF) graph. The resulting summary is composed of triples with positive, negative, and star corrections, optimized by the smart calling of two novel functions named merge and disperse. To evaluate the effectiveness of our proposed approach, we perform experiments on nine publicly available real-world knowledge graphs and obtain a summarization ratio 10% to 30% better than state-of-the-art approaches while achieving completeness, correctness, and compactness. In this way, the retrieval of common events and groups by queries is accelerated in the resulting graph.
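    The correction idea behind lossless summarization can be illustrated on toy data: nodes grouped into a supernode share a "super signature" of targets, and each member's deviations from that signature are stored as positive corrections (extra edges) or negative corrections (missing edges), so the original graph is exactly recoverable. This is a deliberately simplified sketch with invented data, not the paper's IBA-OTC algorithm; choosing the signature by majority vote here stands in for its optimization via merge and disperse.

    ```python
    from collections import Counter

    # Toy supernode: three members and their outgoing target sets.
    members = ["a", "b", "c"]
    edges = {"a": {"x", "y"}, "b": {"x", "y"}, "c": {"x"}}

    # Simplification: take the majority target set as the super signature.
    counts = Counter(t for ts in edges.values() for t in ts)
    signature = {t for t, c in counts.items() if c > len(members) / 2}

    pos = [(n, t) for n in members for t in edges[n] - signature]  # edges beyond the signature
    neg = [(n, t) for n in members for t in signature - edges[n]]  # signature edges a member lacks

    # Lossless check: each member's edges are recoverable from signature + corrections.
    for n in members:
        rebuilt = (signature - {t for m, t in neg if m == n}) | {t for m, t in pos if m == n}
        assert rebuilt == edges[n]
    ```

    Here the summary stores one super-edge set plus a single negative correction ("c" lacks "y") instead of three separate edge lists, which is where the compression comes from.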

    RDF graph summarization: principles, techniques and applications (tutorial)

    The explosion in the amount of RDF data on the Web has led to the need to explore, query and understand such data sources. The task is challenging due to the complex and heterogeneous structure of RDF graphs which, unlike relational databases, do not come with a structure-dictating schema. Summarization has been applied to RDF data to facilitate these tasks. Its purpose is to extract concise and meaningful information from RDF knowledge bases, representing their content as faithfully as possible. There is no single concept of an RDF summary, and not a single but many approaches to build such summaries; the summarization goal and the main computational tools employed for summarizing graphs are the main factors behind this diversity. This tutorial presents a structured analysis and comparison of existing works in the area of RDF summarization; it is based upon a recent survey which we co-authored with colleagues [3]. We present the concepts at the core of each approach and outline their main technical aspects and implementation. We conclude by identifying the most pertinent summarization method for different usage scenarios, and by discussing areas where future effort is needed.

    ETSI SmartM2M Technical Report 103715; Study for oneM2M; Discovery and Query solutions analysis & selection

    The oneM2M system has implemented basic native discovery capabilities. In order to enhance the semantic capabilities of the oneM2M architecture by providing solid contributions to the oneM2M standards, four Technical Reports have been developed. Each of them is the outcome of a dedicated study phase: requirements, study, simulation and standardization. The present document covers the second phase and provides the basis for the other documents. It identifies, defines and analyses relevant approaches with respect to the use cases and requirements developed in ETSI TR 103 714. The most appropriate one will be selected.

    TermPicker: Recommendations of Vocabulary Terms for Reuse when Modeling Linked Open Data

    Reusing terms from Resource Description Framework (RDF) vocabularies when modeling data as Linked Open Data (LOD) is difficult and, without additional guidance, far from trivial. This work proposes and evaluates TermPicker: a novel approach alleviating this situation by recommending vocabulary terms based on information on how other data providers modeled their data as LOD. TermPicker gathers such information and represents it via so-called schema-level patterns (SLPs), which are used to calculate a ranked list of RDF vocabulary term recommendations. The ranking of the recommendations is based either on the machine learning approach "Learning To Rank" (L2R) or on the data mining approach "Association Rule" mining (AR). TermPicker is evaluated in a two-fold way. First, an automated cross-validation evaluates TermPicker's predictions based on the Mean Average Precision (MAP) as well as the Mean Reciprocal Rank at the first five positions (MRR@5). Second, a user study examines which of the recommendation methods (L2R vs. AR) better helps real users reuse RDF vocabulary terms in a practical setting. The participants, i.e., TermPicker's potential users, are asked to reuse vocabulary terms while modeling three data sets as LOD, receiving either L2R-based recommendations, AR-based recommendations, or no recommendations. The results of the cross-validation show that using SLPs, TermPicker achieves 35% higher MAP and MRR@5 values compared to using solely the features based on the typical reuse strategies. Both the L2R-based and the AR-based recommendation methods were able to calculate lists of recommendations with MAP = 0.75 and MRR@5 = 0.80. However, the results of the user study show that the majority of the participants favor the AR-based recommendations. The outcome of this work demonstrates that TermPicker alleviates the situation of searching for classes and properties used by other data providers on the LOD cloud for representing similar data.
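    The association-rule side of the approach can be sketched as follows: treating each schema-level pattern as a set of co-occurring vocabulary terms, recommend to the user the unused terms that most confidently co-occur with the terms already chosen. The data and the `recommend` helper below are invented for illustration; TermPicker's actual AR mining and L2R ranking are more elaborate.

    ```python
    # Toy SLPs: sets of vocabulary terms observed together in other datasets.
    slps = [
        {"foaf:Person", "foaf:name", "foaf:mbox"},
        {"foaf:Person", "foaf:name", "foaf:knows"},
        {"foaf:Person", "foaf:name"},
    ]

    def recommend(used, slps):
        """Rank unused terms by the confidence of the rule: used -> term."""
        support_used = sum(1 for s in slps if used <= s)
        scores = {}
        for s in slps:
            if used <= s:
                for term in s - used:
                    scores[term] = scores.get(term, 0) + 1
        return sorted(((c / support_used, t) for t, c in scores.items()),
                      reverse=True)

    print(recommend({"foaf:Person"}, slps))
    ```

    Given that a user has already picked foaf:Person, the rule Person -> name holds in all three patterns (confidence 1.0) and tops the ranking, mirroring how frequently co-used terms surface as recommendations.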