Search CORE

616 research outputs found

gMark: Schema-Driven Generation of Graphs and Queries

Author: Advokaat Nicky
Bagan Guillaume
Bonifati Angela
Ciucanu Radu
Fletcher George H. L.
Lemay Aurélien
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 24/11/2016
Field of study

Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads having predictable and controllable properties. In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints. We illustrate the flexibility and practical usability of gMark by showcasing the framework's capabilities in generating high quality graphs and workloads, and its ability to encode user-defined schemas across a variety of application domains.Comment: Accepted in November 2016. URL: http://ieeexplore.ieee.org/document/7762945/. in IEEE Transactions on Knowledge and Data Engineering 201

arXiv.org e-Print Archive

Crossref

Repository TU/e

Pure OAI Repository

HAL Clermont Université

INRIA a CCSD electronic archive server

HAL

Hal-Diderot

SWI-Prolog and the Web

Author: Gras
Huang
JAN WIELEMAKER
LOURENS VAN DER MEIJ
Mäkelä
Ramakrishnan
Wielemaker
Wielemaker
Wielemaker
ZHISHENG HUANG
Publication venue
Publication date: 06/11/2007
Field of study

Where Prolog is commonly seen as a component in a Web application that is either embedded or communicates using a proprietary protocol, we propose an architecture where Prolog communicates to other components in a Web application using the standard HTTP protocol. By avoiding embedding in external Web servers development and deployment become much easier. To support this architecture, in addition to the transfer protocol, we must also support parsing, representing and generating the key Web document types such as HTML, XML and RDF. This paper motivates the design decisions in the libraries and extensions to Prolog for handling Web documents and protocols. The design has been guided by the requirement to handle large documents efficiently. The described libraries support a wide range of Web applications ranging from HTML and XML documents to Semantic Web RDF processing. To appear in Theory and Practice of Logic Programming (TPLP)Comment: 31 pages, 24 figures and 2 tables. To appear in Theory and Practice of Logic Programming (TPLP

arXiv.org e-Print Archive

VU Research Portal

Crossref

International Migration, Integration and Social Cohesion online publications

UvA-DARE

NinSuna: a fully integrated platform for format-independent multimedia content adaptation and delivery using Semantic Web technologies

Author: De Neve Wesley
Mannens Erik
Paridaens Tom
Van de Walle Rik
Van Deursen Davy
Van Lancker Wim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

The current multimedia landscape is characterized by a significant heterogeneity in terms of coding and delivery formats, usage environments, and user preferences. The main contribution of this paper is a discussion of the design and functioning of a fully integrated platform for multimedia adaptation and delivery, called NinSuna. This platform is able to efficiently deal with the aforementioned heterogeneity in the present-day multimedia ecosystem, thanks to the use of format-agnostic adaptation engines (i.e., engines independent of the underlying coding format) and format-agnostic packaging engines (i.e., engines independent of the underlying delivery format). Moreover, NinSuna also provides a seamless integration between metadata standards and adaptation processes. Both our format-independent adaptation and packaging techniques rely on a model for multimedia bitstreams, describing the structural, semantic, and scalability properties of these multimedia streams. News sequences were used as a test case for our platform, enabling the user to select news fragments matching his/her specific interests and usage environment characteristics

Ghent University Academic Bibliography

Knowledge extraction from unstructured data and classification through distributed ontologies

Author: Rizzo Giuseppe
Publication venue
Publication date: 01/01/2012
Field of study

The World Wide Web has changed the way humans use and share any kind of information. The Web removed several access barriers to the information published and has became an enormous space where users can easily navigate through heterogeneous resources (such as linked documents) and can easily edit, modify, or produce them. Documents implicitly enclose information and relationships among them which become only accessible to human beings. Indeed, the Web of documents evolved towards a space of data silos, linked each other only through untyped references (such as hypertext references) where only humans were able to understand. A growing desire to programmatically access to pieces of data implicitly enclosed in documents has characterized the last efforts of the Web research community. Direct access means structured data, thus enabling computing machinery to easily exploit the linking of different data sources. It has became crucial for the Web community to provide a technology stack for easing data integration at large scale, first structuring the data using standard ontologies and afterwards linking them to external data. Ontologies became the best practices to define axioms and relationships among classes and the Resource Description Framework (RDF) became the basic data model chosen to represent the ontology instances (i.e. an instance is a value of an axiom, class or attribute). Data becomes the new oil, in particular, extracting information from semi-structured textual documents on the Web is key to realize the Linked Data vision. In the literature these problems have been addressed with several proposals and standards, that mainly focus on technologies to access the data and on formats to represent the semantics of the data and their relationships. With the increasing of the volume of interconnected and serialized RDF data, RDF repositories may suffer from data overloading and may become a single point of failure for the overall Linked Data vision. One of the goals of this dissertation is to propose a thorough approach to manage the large scale RDF repositories, and to distribute them in a redundant and reliable peer-to-peer RDF architecture. The architecture consists of a logic to distribute and mine the knowledge and of a set of physical peer nodes organized in a ring topology based on a Distributed Hash Table (DHT). Each node shares the same logic and provides an entry point that enables clients to query the knowledge base using atomic, disjunctive and conjunctive SPARQL queries. The consistency of the results is increased using data redundancy algorithm that replicates each RDF triple in multiple nodes so that, in the case of peer failure, other peers can retrieve the data needed to resolve the queries. Additionally, a distributed load balancing algorithm is used to maintain a uniform distribution of the data among the participating peers by dynamically changing the key space assigned to each node in the DHT. Recently, the process of data structuring has gained more and more attention when applied to the large volume of text information spread on the Web, such as legacy data, news papers, scientific papers or (micro-)blog posts. This process mainly consists in three steps: \emph{i)} the extraction from the text of atomic pieces of information, called named entities; \emph{ii)} the classification of these pieces of information through ontologies; \emph{iii)} the disambigation of them through Uniform Resource Identifiers (URIs) identifying real world objects. As a step towards interconnecting the web to real world objects via named entities, different techniques have been proposed. The second objective of this work is to propose a comparison of these approaches in order to highlight strengths and weaknesses in different scenarios such as scientific and news papers, or user generated contents. We created the Named Entity Recognition and Disambiguation (NERD) web framework, publicly accessible on the Web (through REST API and web User Interface), which unifies several named entity extraction technologies. Moreover, we proposed the NERD ontology, a reference ontology for comparing the results of these technologies. Recently, the NERD ontology has been included in the NIF (Natural language processing Interchange Format) specification, part of the Creating Knowledge out of Interlinked Data (LOD2) project. Summarizing, this dissertation defines a framework for the extraction of knowledge from unstructured data and its classification via distributed ontologies. A detailed study of the Semantic Web and knowledge extraction fields is proposed to define the issues taken under investigation in this work. Then, it proposes an architecture to tackle the single point of failure issue introduced by the RDF repositories spread within the Web. Although the use of ontologies enables a Web where data is structured and comprehensible by computing machinery, human users may take advantage of it especially for the annotation task. Hence, this work describes an annotation tool for web editing, audio and video annotation in a web front end User Interface powered on the top of a distributed ontology. Furthermore, this dissertation details a thorough comparison of the state of the art of named entity technologies. The NERD framework is presented as technology to encompass existing solutions in the named entity extraction field and the NERD ontology is presented as reference ontology in the field. Finally, this work highlights three use cases with the purpose to reduce the amount of data silos spread within the Web: a Linked Data approach to augment the automatic classification task in a Systematic Literature Review, an application to lift educational data stored in Sharable Content Object Reference Model (SCORM) data silos to the Web of data and a scientific conference venue enhancer plug on the top of several data live collectors. Significant research efforts have been devoted to combine the efficiency of a reliable data structure and the importance of data extraction techniques. This dissertation opens different research doors which mainly join two different research communities: the Semantic Web and the Natural Language Processing community. The Web provides a considerable amount of data where NLP techniques may shed the light within it. The use of the URI as a unique identifier may provide one milestone for the materialization of entities lifted from a raw text to real world object

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

Recommended from our members

OptiqueVQS: A visual query system over ontologies for industry

Author: Arenas
Bevan
Bobed
Brunetti
Calvanese
Catarci
Catarci
Civili
Cuenca Grau
Damljanovic
Dey
Epstein
Giese
Glimm
Hogan
Kaufmann
Khalili
Kharlamov
Kharlamov
Kogalovsky
López
Marchionini
Nikolaou
Poggi
Qiu
Romero
Schraefel
Shneiderman
Skjæveland
Soylu
Soylu
Soylu
Soylu
Soylu
Soylu
Soylu
Spanos
Sutcliffe
ter Hofstede
Tran
Uren
Vega-Gorgojo
Vega-Gorgojo
Publication venue: 'IOS Press'
Publication date: 01/01/2017
Field of study

An important application of semantic technologies in industry has been the formalisation of information models using OWL 2 ontologies and the use of RDF for storing and exchanging application data. Moreover, legacy data can be virtualised as RDF using ontologies following the ontology-based data access (OBDA) approach. In all these applications, it is important to provide domain experts with query formulation tools for expressing their information needs in terms of queries over ontologies. In this work, we present such a tool, OptiqueVQS, which is designed based on our experience with OBDA applications in Statoil and Siemens and on best HCI practices for interdisciplinary engineering environments. OptiqueVQS implements a number of unique techniques distinguishing it from analogous query formulation systems. In particular, it exploits ontology projection techniques to enable graph-based navigation over an ontology during query construction. Secondly, while OptiqueVQS is primarily ontology driven, it exploits sampled data to enhance selection of data values for some data attributes. Finally, OptiqueVQS is built on well-grounded requirements, design rationale, and quality attributes. We evaluated OptiqueVQS with both domain experts and casual users and qualitatively compared our system against prominent visual systems for ontology-driven query formulation and exploration of semantic data. OptiqueVQS is available online and can be downloaded together with an example OBDA scenario

City Research Online

Crossref

NORA - Norwegian Open Research Archives

Survey over Existing Query and Transformation Languages

Author: Bolzer Oliver
Bry François
Furche Tim
Horrocks Ian
Kraus Michael
Orsini Renzo
Schaffert Sebastian
Publication venue
Publication date: 01/01/2004
Field of study

A widely acknowledged obstacle for realizing the vision of the Semantic Web is the inability of many current Semantic Web approaches to cope with data available in such diverging representation formalisms as XML, RDF, or Topic Maps. A common query language is the first step to allow transparent access to data in any of these formats. To further the understanding of the requirements and approaches proposed for query languages in the conventional as well as the Semantic Web, this report surveys a large number of query languages for accessing XML, RDF, or Topic Maps. This is the first systematic survey to consider query languages from all these areas. From the detailed survey of these query languages, a common classification scheme is derived that is useful for understanding and differentiating languages within and among all three areas

CiteSeerX

Open Access LMU

Towards a big data reference architecture

Author: Maier M.
Publication venue
Publication date: 01/01/2013
Field of study

Repository TU/e

Pure OAI Repository

On construction, performance, and diversification for structured queries on the semantic desktop

Author: Minack Enrico
Publication venue: Gottfried Wilhelm Leibniz Universität Hannover
Publication date: 01/01/2011
Field of study

[no abstract

Institutionelles Repositorium der Leibniz Universität Hannover

Luzzu - A Framework for Linked Data Quality Assessment

Author: Auer Sören
Debattista Jeremy
Lange Christoph
Publication venue
Publication date: 07/01/2016
Field of study

With the increasing adoption and growth of the Linked Open Data cloud [9], with RDFa, Microformats and other ways of embedding data into ordinary Web pages, and with initiatives such as schema.org, the Web is currently being complemented with a Web of Data. Thus, the Web of Data shares many characteristics with the original Web of Documents, which also varies in quality. This heterogeneity makes it challenging to determine the quality of the data published on the Web and to subsequently make this information explicit to data consumers. The main contribution of this article is LUZZU, a quality assessment framework for Linked Open Data. Apart from providing quality metadata and quality problem reports that can be used for data cleaning, LUZZU is extensible: third party metrics can be easily plugged-in the framework. The framework does not rely on SPARQL endpoints, and is thus free of all the problems that come with them, such as query timeouts. Another advantage over SPARQL based qual- ity assessment frameworks is that metrics implemented in LUZZU can have more complex functionality than triple matching. Using the framework, we performed a quality assessment of a number of statistical linked datasets that are available on the LOD cloud. For this evaluation, 25 metrics from ten different dimensions were implemented

arXiv.org e-Print Archive

CiteSeerX

Crossref

Fraunhofer-ePrints