99 research outputs found

    Optimizing Federated Queries Based on the Physical Design of a Data Lake

    The optimization of query execution plans is known to be crucial for reducing query execution time. In particular, query optimization has been studied thoroughly for relational databases over the past decades. Recently, the Resource Description Framework (RDF) became popular for publishing data on the Web. As a consequence, federations composed of different data models, such as RDF and relational databases, have evolved. One type of such federation is the Semantic Data Lake, where every data source is kept in its original data model and semantically annotated with ontologies or controlled vocabularies. However, state-of-the-art query engines for federated query processing over Semantic Data Lakes often rely on optimization techniques tailored for RDF. In this paper, we present query optimization techniques guided by heuristics that take the physical design of a Data Lake into account. The heuristics are implemented on top of Ontario, a SPARQL query engine for Semantic Data Lakes. Using source-specific heuristics, the query engine is able to generate more efficient query execution plans by exploiting knowledge about indexes and normalization in relational databases. We show that heuristics which take the physical design of the Data Lake into account are able to speed up query processing.
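    To make the idea concrete, here is a minimal sketch of a source-specific join-ordering heuristic; the data structures, the `has_index` flag, and the cardinality estimates are hypothetical illustrations, not Ontario's actual API:

```python
# Hypothetical sketch of a source-aware join-ordering heuristic:
# prefer probing relational sources whose join attribute is indexed,
# so the RDBMS can exploit its physical design.

from dataclasses import dataclass

@dataclass
class SubQuery:
    source: str           # e.g. "rdf" or "relational"
    join_attr: str        # attribute the subquery joins on
    has_index: bool       # physical-design knowledge: is join_attr indexed?
    est_cardinality: int  # estimated result size

def order_joins(subqueries: list[SubQuery]) -> list[SubQuery]:
    """Order subqueries so indexed relational sources are probed first.

    Heuristic: (1) favor sources with an index on the join attribute,
    (2) break ties by smaller estimated cardinality.
    """
    return sorted(
        subqueries,
        key=lambda sq: (not (sq.source == "relational" and sq.has_index),
                        sq.est_cardinality),
    )

plan = order_joins([
    SubQuery("rdf", "patient_id", False, 50_000),
    SubQuery("relational", "patient_id", True, 200_000),
])
print([sq.source for sq in plan])  # relational source is probed first
```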

    Learning To Scale Up Search-Driven Data Integration

    A recent movement to tackle the long-standing data integration problem is a compositional and iterative approach, termed “pay-as-you-go” data integration. Under this model, the objective is to immediately support queries over “partly integrated” data, and to enable the user community to drive integration of the data that relate to their actual information needs. Over time, data will be gradually integrated. While the pay-as-you-go vision has been well-articulated for some time, only recently have we begun to understand how it can be manifested in a system implementation. One branch of this effort has focused on enabling queries through keyword search-driven data integration, in which users pose queries over partly integrated data encoded as a graph, receive ranked answers generated from data and metadata that are linked at query time, and provide feedback on those answers. From this user feedback, the system learns to repair bad schema matches or record links. Many real-world issues of uncertainty and diversity in search-driven integration remain open; such tasks require a combination of human guidance and machine learning, and the challenge is how to make maximal use of limited human input. This thesis develops three methods to scale up search-driven integration by learning from expert feedback: (1) active learning techniques to repair links from small amounts of user feedback; (2) collaborative learning techniques to combine users’ conflicting feedback; and (3) debugging techniques to identify where data experts could best improve integration quality. We implement these methods within the Q System, a prototype of search-driven integration, and validate their effectiveness over real-world datasets.
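    To give a flavor of the active-learning component, the following is a generic uncertainty-sampling loop for link repair, not the Q System's actual code; the features and the simulated expert are placeholders:

```python
# Generic uncertainty-sampling loop for repairing record links:
# repeatedly ask the expert about the candidate link whose predicted
# match probability is closest to 0.5 (the model's least certain case).

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_pool = rng.random((200, 3))                      # similarity features per candidate link
y_pool = (X_pool.mean(axis=1) > 0.5).astype(int)   # stand-in for expert judgments

# Seed the model with one labeled example per class.
asked = {int(np.argmax(y_pool == 0)), int(np.argmax(y_pool == 1))}
model = LogisticRegression()

for _ in range(10):                                # small labeling budget
    idx = sorted(asked)
    model.fit(X_pool[idx], y_pool[idx])
    probs = model.predict_proba(X_pool)[:, 1]
    order = np.argsort(np.abs(probs - 0.5))        # most uncertain first
    i = next(int(j) for j in order if int(j) not in asked)
    asked.add(i)                                   # "ask the expert" about this link
```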

    Design and Implementation of a Data Storage System for Privacy Policies Based on ETL Techniques

    The information integration problem arises from the wide dispersion of information across different storage systems. In this project, the problem has been solved for privacy policy datasets coming from different sources, using techniques for extracting, loading, and transforming the information into a centralized storage system.
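    A minimal sketch of such an extract-load-transform pipeline; the file names, field names, and the SQLite target are illustrative assumptions, not the system described:

```python
# Minimal ELT sketch: extract privacy-policy records from heterogeneous
# sources, then load and transform them into a unified, centralized schema.
# All file names, field names, and the SQLite target are illustrative.

import csv
import json
import sqlite3

def extract():
    """Pull records from two hypothetical source formats."""
    with open("policies_site_a.json", encoding="utf-8") as f:
        yield from ({"site": r["domain"], "text": r["policy"]} for r in json.load(f))
    with open("policies_site_b.csv", encoding="utf-8") as f:
        yield from ({"site": r["url"], "text": r["body"]} for r in csv.DictReader(f))

def load_and_transform(records, db="policies.db"):
    con = sqlite3.connect(db)
    con.execute("CREATE TABLE IF NOT EXISTS policy (site TEXT, text TEXT)")
    # Transform: lowercase the site key and normalize whitespace in the text.
    con.executemany(
        "INSERT INTO policy VALUES (?, ?)",
        ((r["site"].lower(), " ".join(r["text"].split())) for r in records),
    )
    con.commit()
    con.close()

load_and_transform(extract())
```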

    Linked Data and Ontologies in a Web-Based Graphical Tool

    This line of research is being carried out collaboratively by teacher-researchers from the Universidad Nacional del Comahue and the Universidad Nacional del Sur, within the framework of research projects funded by both universities. The overall goal of the research is to enable interaction between Linked Data sources available on the Web and crowd, a client-server tool for graphical conceptual modeling with reasoning support. In this way, it should be possible to navigate any ontology associated with the data and observe its relationships graphically, the latter in order to make interpretation easier for the average user.
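    For flavor, the sketch below pulls subclass relations from a public SPARQL endpoint, the kind of ontology structure such a tool could render graphically; the endpoint, query, and the SPARQLWrapper library are illustrative choices, not crowd's implementation:

```python
# Illustrative: fetch subclass relations from a public SPARQL endpoint,
# the kind of ontology structure a graphical tool could draw as edges.
# Endpoint and query are examples only, not crowd's actual code.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://dbpedia.org/sparql")
sparql.setQuery("""
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?sub ?sup WHERE {
        ?sub rdfs:subClassOf ?sup .
        FILTER(STRSTARTS(STR(?sub), "http://dbpedia.org/ontology/"))
    } LIMIT 20
""")
sparql.setReturnFormat(JSON)
for row in sparql.query().convert()["results"]["bindings"]:
    print(row["sub"]["value"], "->", row["sup"]["value"])  # edges to draw
```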

    m-tables: Representing Missing Data

    Representation systems have been widely used to capture different forms of incomplete data in various settings. However, existing representation systems are not expressive enough to handle the more complex scenarios of missing data that can occur in practice: these range from missing attribute values, to missing a known number of tuples, to missing an unknown number of tuples. In this work, we propose a new representation system called m-tables that can represent many different types of missing data. We show that m-tables form a closed, complete, and strong representation system under both set and bag semantics, and are strictly more expressive than conditional tables under both the closed and open world assumptions. We further study the complexity of computing certain and possible answers in m-tables. Finally, we discuss how to "interpret" m-tables through a novel labeling scheme that marks types of generalized tuples as certain or possible.
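    The underlying intuition behind certain and possible answers can be shown with a toy example over tuples with missing attribute values; this is not the m-tables formalism itself, only the semantics it generalizes:

```python
# Toy illustration of certain vs. possible answers for a boolean query
# "some row satisfies pred" over tuples with missing attribute values.
# This is NOT the m-tables formalism, only the intuition it builds on:
# certain  = true in every completion of the missing values,
# possible = true in at least one completion.

MISSING = None
rows = [("alice", 30), ("bob", MISSING)]  # (name, age); age may be missing

def certain_exists(pred, rows):
    """Holds in every completion: only known values can guarantee it."""
    return any(v is not MISSING and pred(v) for _, v in rows)

def possible_exists(pred, rows):
    """Holds in some completion: a missing value could be chosen to
    satisfy pred (assuming pred is satisfiable over the domain)."""
    return any(v is MISSING or pred(v) for _, v in rows)

over_40 = lambda age: age > 40
print(certain_exists(over_40, rows))   # False: no known age exceeds 40
print(possible_exists(over_40, rows))  # True: bob's missing age could be 50
```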

    Emergent semantics in distributed knowledge management

    Organizations and enterprises have developed complex data and information exchange systems that are now vital for their daily operations. Currently available systems, however, face a major challenge. On today's global information infrastructure, data semantics is more and more context- and time-dependent, and cannot be fixed once and for all at design time. Identifying emerging relationships among previously unrelated information items (e.g., during data interchange) may dramatically increase their business value. This chapter introduces and discusses the notion of Emergent Semantics (ES), where both the representation of semantics and the discovery of the proper interpretation of symbols are seen as the result of a self-organizing process performed by distributed agents, exchanging symbols and adaptively developing the proper interpretation via multi-party cooperation and conflict resolution. Emergent data semantics is dynamically dependent on the collective behaviour of large communities of agents, which may have different and even conflicting interests and agendas. This is a research paradigm that interprets semantics from a pragmatic perspective. The chapter introduces the notion and provides a discussion of the principles, research areas, and current state of the art.

    A Review of Accessing Big Data with Significant Ontologies

    Ontology Based Data Access (OBDA) is a recently proposed approach that provides a conceptual view over relational data sources. It addresses the problem of direct access to big data by providing end-users with an ontology that mediates between users and sources, where the ontology is connected to the data via mappings. We introduce the languages used to represent the ontologies and the mapping assertion techniques from which query answering over the sources is derived. Query answering is divided into two steps: (i) ontology rewriting, in which the query is rewritten with respect to the ontology into a new query; and (ii) mapping rewriting, in which the query obtained in the previous step is reformulated over the data sources using the mapping assertions. In this survey, we study earlier work by other researchers in the fields of ontologies, mappings, and query answering over data sources.
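    The two rewriting steps can be sketched on a toy example; the subclass axiom, mapping assertions, and SQL below are illustrative placeholders, and real OBDA rewritings handle full conjunctive queries rather than single classes:

```python
# Toy illustration of the two OBDA query-answering steps.
# Step 1 (ontology rewriting): expand the query with the subclasses
# the ontology implies. Step 2 (mapping rewriting): replace each class
# with the SQL its mapping assertion defines over the sources.
# The axiom, mappings, and SQL are illustrative placeholders.

subclass_of = {"Professor": "Employee"}  # ontology axiom: Professor is a subclass of Employee

mappings = {  # mapping assertions: class -> SQL over the sources
    "Employee":  "SELECT id FROM staff",
    "Professor": "SELECT id FROM faculty WHERE rank = 'professor'",
}

def ontology_rewrite(cls):
    """All classes whose instances answer a query over `cls`."""
    return [cls] + [sub for sub, sup in subclass_of.items() if sup == cls]

def mapping_rewrite(classes):
    """Union of the SQL fragments the mappings assign to each class."""
    return "\nUNION\n".join(mappings[c] for c in classes)

# Query: "all instances of Employee" becomes a UNION over both sources.
print(mapping_rewrite(ontology_rewrite("Employee")))
```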