Search CORE

14 research outputs found

Federated Query Processing over Heterogeneous Data Sources in a Semantic Data Lake

Author: Endris Kemele M.
Publication venue: Universitäts- und Landesbibliothek Bonn

Data provides the basis for emerging scientific and interdisciplinary data-centric applications with the potential to improve citizens' quality of life. Big Data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Open data initiatives have encouraged the publication of Big Data by exploiting the decentralized nature of the Web, making available heterogeneous data generated and maintained by autonomous data providers. Consequently, the growing volume of data consumed by different applications raises the need for effective data integration approaches able to process large volumes of data represented in different formats, schemas, and models, which may also include sensitive data, e.g., financial transactions, medical procedures, or personal data. Data Lakes are composed of heterogeneous data sources kept in their original format, which reduces the overhead of materialized data integration. Query processing over Data Lakes requires a semantic description of the data collected from heterogeneous data sources. A Data Lake with such semantic annotations is referred to as a Semantic Data Lake. Transforming Big Data into actionable knowledge demands novel and scalable techniques that enable not only Big Data ingestion and curation in the Semantic Data Lake, but also efficient large-scale semantic data integration, exploration, and discovery. Federated query processing techniques use source descriptions to find relevant data sources and to find efficient execution plans that minimize the total execution time and maximize the completeness of answers. Existing federated query processing engines employ a coarse-grained description model that ignores the semantics encoded in data sources.
Such descriptions may lead to the erroneous selection of data sources for a query and to unnecessary retrieval of data, thus affecting the performance of the query processing engine. In this thesis, we address the problem of federated query processing against heterogeneous data sources in a Semantic Data Lake. First, we tackle the challenge of knowledge representation and propose a novel source description model, RDF Molecule Templates, which describes the knowledge available in a Semantic Data Lake. RDF Molecule Templates (RDF-MTs) describe data sources in terms of abstract descriptions of entities belonging to the same semantic concept. Then, we propose the MULDER approach for data source selection and query decomposition, and Ontario, a set of query planning and optimization techniques, which exploit the characteristics of heterogeneous data sources described using RDF-MTs and provide uniform access to heterogeneous data sources. We then address the challenge of enforcing the privacy and access control requirements imposed by data providers. We introduce BOUNCER, a privacy-aware federated query technique able to enforce privacy and access control regulations during query processing over data sources in a Semantic Data Lake. In particular, BOUNCER exploits RDF-MT-based source descriptions to express privacy and access control policies and to enforce them automatically during source selection, query decomposition, and planning. Furthermore, BOUNCER implements query decomposition and optimization techniques able to identify query plans over data sources that not only contain the entities relevant to answering a query, but are also governed by policies that allow access to these entities. Finally, we tackle the problem of interest-based update propagation and co-evolution of data sources.
We present a novel approach for interest-based RDF update propagation that consistently maintains a full or partial replication of large datasets and deals with co-evolution

bonndoc – The Publication Server of the University of Bonn


Federated Query Processing

Author: Endris Kemele M.
Graux Damien
Vidal Maria-Esther
Publication venue: Cham : Springer
Publication date: 01/01/2020

Big data plays an important role in promoting both manufacturing and scientific development through industrial digitization and emerging interdisciplinary research. Semantic web technologies have also experienced great progress, and scientific communities and practitioners have contributed to the problem of big data management with ontological models, controlled vocabularies, linked datasets, data models, query languages, as well as tools for transforming big data into knowledge from which decisions can be made. Despite the significant impact of big data and semantic web technologies, we are entering a new era in which domains like genomics are projected to grow very rapidly over the next decade. In this next era, integrating big data demands novel and scalable tools that enable not only big data ingestion and curation but also efficient large-scale exploration and discovery. Federated query processing techniques provide a solution for scaling up to large volumes of data distributed across multiple data sources. Federated query processing techniques rely on source descriptions to identify the data sources relevant to a query, as well as to find efficient execution plans that minimize the total execution time of a query and maximize the completeness of the answers. This chapter summarizes the main characteristics of a federated query engine, reviews the current state of the field, and outlines the problems that remain open and represent grand challenges for the area.

Repositorium für Naturwissenschaften und Technik

SPARQL Query Result Explanation for Linked Data

Author: Endris Kemele M.
Gandon Fabien
Hasan Rakebul
Publication venue: HAL CCSD
Publication date: 19/10/2014

In this paper, we present an approach to explain SPARQL query results for Linked Data using why-provenance. We present a non-annotation-based algorithm to generate why-provenance and show its feasibility for Linked Data. We present an explanation-aware federated query processor prototype and show how our explanations are presented. We also present a user study to evaluate the impact of our explanations. Our study shows that our query result explanations help end users understand the result derivations and make trust judgments on the results.

INRIA a CCSD electronic archive server

HAL Descartes

HAL-Rennes 1


Preface

Author: Chaves-Fraga David
Comerio Marco
Colpaert Pieter
Endris Kemele M.
Kaffee Lucie-Aimee
Sadeghi Mersedeh
Vidal Maria-Esther
Publication venue: Aachen, Germany : RWTH Aachen
Publication date: 01/01/2019

This volume presents the proceedings of the 1st International Workshop on Approaches for Making Data Interoperable (AMAR 2019) and the 1st International Workshop on Semantics for Transport (Sem4Tra), held in Karlsruhe, Germany, on September 9, 2019, co-located with SEMANTiCS 2019. Interoperability of data is an important factor in making transportation data accessible; therefore, these proceedings present the two topics alongside each other.

Repositorium für Naturwissenschaften und Technik

Lymphocyte predominant cells detect Moraxella catarrhalis-derived antigens in nodular lymphocyte-predominant Hodgkin lymphoma.

Author: Bohle R.M.
de Leval L.
Eichenauer D.A.
Engert A.
Fadle N.
Hansmann M.L.
Hartmann S.
Kemele M.
Kempf VAJ
Kim Y.J.
Küppers R.
Neumann F.
Nimmesgern A.
Pfreundschuh M.
Preuss K.D.
Regitz E.
Schneider N.
Sundström C.
Thurner L.
von Müller L.
Vornanen M.
Weniger M.A.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2020

Nodular lymphocyte-predominant Hodgkin lymphoma (NLPHL) is a rare lymphoma of B-cell origin with frequent expression of functional B-cell receptors (BCRs). Here we report that expression cloning followed by antigen screening identifies DNA-directed RNA polymerase beta' (RpoC) from Moraxella catarrhalis as a frequent antigen of the BCRs of IgD+ LP cells. Patients show a predominance of HLA-DRB1*04/07, and the IgVH genes encode extraordinarily long CDR3s. High-titer, light-chain-restricted anti-RpoC IgG1/κ-type serum antibodies are additionally found in these patients. RpoC and MID/hag, a superantigen co-expressed by Moraxella catarrhalis that is known to activate IgD+ B cells by binding to the Fc domain of IgD, have additive activation effects on the BCR, the NF-κB pathway, and the proliferation of IgD+ DEV cells expressing RpoC-specific BCRs. This suggests an additive antigenic and superantigenic stimulation of B cells with RpoC-specific IgD+ BCRs under conditions of a permissive MHC-II haplotype as a model of NLPHL lymphomagenesis, implying future treatment strategies.

Kölner UniversitätsPublikationsServer

Serveur académique lausannois

Publikationer från Uppsala Universitet

Digitala Vetenskapliga Arkivet - Academic Archive On-line

Trepo - Institutional Repository of Tampere University


Ranking knowledge graphs by capturing knowledge about languages and labels

Author: Ell Basil
Endris Kemele M.
Gómez-Pérez Asunción
Hassanzadeh Oktie
Katerenchuk Denys
Mahdisoltani Farzaneh
Publication venue: Association for Computing Machinery (ACM)
Publication date: 23/09/2019

Capturing knowledge about the multilinguality of a knowledge graph is of great importance for understanding its applicability across multiple languages. Several metrics have been proposed for describing multilinguality at the level of a whole knowledge graph. Although these metrics enable an understanding of the ecosystem of knowledge graphs in terms of the languages used, they are unable to capture a fine-grained description of the languages in which the different entities and properties of a knowledge graph are represented. This lack of representation prevents the comparison of existing knowledge graphs to decide which are the most appropriate for a multilingual application.

Southampton (e-Prints Soton)

Crossref

What does Dataset Reuse tell us about Quality?

Author: Demidova Elena
Endris Kemele M.
Giménez-García José-M.
Lange Christoph
Thakkar Harsh
Zimmermann Antoine
Publication venue: HAL CCSD
Publication date: 18/05/2016

Following the Linked Data principles means maximising the reusability of data over the Web. A strong reason for reusing a dataset is that it is considered useful for some application. Considering the broad definition of data quality as "fitness for use", the question arises whether the quality of linked datasets and their actual reuse correlate, or, in other words, whether certain quality characteristics can be optimised to increase the potential reuse of the datasets. Reuse of datasets becomes apparent when datasets are referred to from other datasets, papers, or discussions within the community. It can thus be measured, similarly to citations of papers. Many other aspects of Linked Data quality have also been defined in a measurable way, i.e., as quality metrics. In this paper we present metrics to quantify dataset reuse in a scientific community and investigate their correlation with the quality metrics discussed in the literature.

HAL-UJM

HAL-EMSE

Question Answering on Linked Data: Challenges and Future Directions

Author: Endris Kemele M.
Jaya Kumar Ashwini
Lange Christoph
Lukovnikov Denis
Shekarpour Saeedeh
Singh Kuldeep
Thakkar Harsh
Publication date: 01/01/2016

Question Answering (QA) systems are becoming the inspiring model for the future of search engines. While the datasets underlying QA systems have recently been promoted from unstructured datasets to structured datasets with semantically highly enriched metadata, QA systems still face serious challenges and therefore do not meet users' expectations. This paper provides an exhaustive insight into the challenges known so far in building QA systems, with a special focus on employing structured data (i.e., knowledge graphs). It thus helps researchers easily spot gaps to fill in their future research agendas.

Fraunhofer-ePrints

SDM-TIB/SDM-RDFizer: v4.7.2.6

Author: Daniel Doña
David Chaves
Dylan Van Assche
eiglesias34
Katrin Leinweber
Kemele M. Endris
Maria-Esther Vidal
Philipp D. Rohde
Samaneh Jozashoori
Vincent Emonet
Vladimir Alexiev
Publication venue: Zenodo
Publication date: 14/12/2022

An Efficient RML-Compliant Engine for Knowledge Graph Construction

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY