From natural language questions to SPARQL queries: a pattern-based approach
Linked Data knowledge bases are valuable sources of knowledge: they give insights, reveal facts about various relationships, and provide a large amount of metadata in well-structured form. Although the format of semantic information, namely RDF(S), is kept simple by representing each fact as a triple of subject, property and object, the knowledge is accessible only through SPARQL queries over the data. Question Answering (QA) systems therefore provide a user-friendly way to access any type of knowledge base and, especially for Linked Data sources, to gain insight into the semantic information. Since RDF(S) knowledge bases are usually structured in the same way and inherently provide semantic metadata about the contained information, we propose a novel approach that is independent of the underlying knowledge base. The main contribution of our approach is thus the simple replaceability of the underlying knowledge base. The algorithm is based on general question and query patterns and accesses the knowledge base only for the actual query generation and execution. This paper presents the proposed approach and an evaluation against state-of-the-art Linked Data approaches on common challenges of QA systems.
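The pattern-based idea in this abstract can be sketched in a few lines: a question pattern is matched against the input and its slots fill a SPARQL query template. The pattern, the regex, and the `dbr:`/`dbo:` prefixes below are illustrative assumptions, not the paper's actual pattern catalogue.

```python
import re
from typing import Optional

# Hypothetical pattern list: (question regex, SPARQL template).
# Double braces render as literal braces after str.format().
PATTERNS = [
    # "Who is the <property> of <entity>?" -> single-triple lookup
    (re.compile(r"who is the (?P<prop>\w+) of (?P<ent>[\w ]+)\??", re.I),
     "SELECT ?x WHERE {{ dbr:{ent} dbo:{prop} ?x }}"),
]

def question_to_sparql(question: str) -> Optional[str]:
    """Return a SPARQL query for the first matching pattern, else None."""
    for pattern, template in PATTERNS:
        m = pattern.match(question)
        if m:
            return template.format(prop=m.group("prop").lower(),
                                   ent=m.group("ent").strip().replace(" ", "_"))
    return None
```

Only the final `template.format` step would need to touch the knowledge base's vocabulary, which mirrors the abstract's claim that the knowledge base is accessed only for query generation and execution.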
Vamsa: Automated Provenance Tracking in Data Science Scripts
There has recently been a lot of research in the areas of fairness,
bias and explainability of machine learning (ML) models, due to the
self-evident or regulatory requirements of various ML applications. We make the following
observation: All of these approaches require a robust understanding of the
relationship between ML models and the data used to train them. In this work,
we introduce the ML provenance tracking problem: the fundamental idea is to
automatically track which columns in a dataset have been used to derive the
features/labels of an ML model. We discuss the challenges in capturing such
information in the context of Python, the most common language used by data
scientists. We then present Vamsa, a modular system that extracts provenance
from Python scripts without requiring any changes to the users' code. Using 26K
real data science scripts, we verify the effectiveness of Vamsa in terms of
coverage and performance. We also evaluate Vamsa's accuracy on a smaller
subset of manually labeled data. Our analysis shows that Vamsa's precision and
recall range from 90.4% to 99.1% and its latency is in the order of
milliseconds for average size scripts. Drawing from our experience in deploying
ML models in production, we also present an example in which Vamsa helps
automatically identify models that are affected by data corruption issues.
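The core task described above, tracking which dataset columns feed a model's features and labels, can be illustrated with a toy static analysis over a script's AST. This is not Vamsa's actual algorithm; the example script and the column-extraction rule are assumptions for illustration only.

```python
import ast

# Hypothetical data science script to analyze (Python 3.9+ AST layout).
SCRIPT = """
import pandas as pd
df = pd.read_csv("train.csv")
X = df[["age", "income"]]
y = df["defaulted"]
"""

def tracked_columns(source: str) -> set:
    """Collect column names used in df[...] subscript expressions."""
    cols = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Subscript):
            key = node.slice
            # df["col"] -> a single string constant
            if isinstance(key, ast.Constant) and isinstance(key.value, str):
                cols.add(key.value)
            # df[["col1", "col2"]] -> a list of string constants
            elif isinstance(key, ast.List):
                cols.update(e.value for e in key.elts
                            if isinstance(e, ast.Constant))
    return cols
```

A real system must additionally resolve which variables actually hold dataframes and which subscripts flow into the model's fit call, which is where the hard problems the abstract mentions arise.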
Explaining Queries over Web Tables to Non-Experts
Designing a reliable natural language (NL) interface for querying tables has
been a longtime goal of researchers in both the data management and natural
language processing (NLP) communities. Such an interface receives as input an
NL question, translates it into a formal query, executes the query and returns
the results. Errors in the translation process are not uncommon, and users
typically struggle to understand whether their query has been mapped correctly.
We address this problem by explaining the obtained formal queries to non-expert
users. Two methods for query explanations are presented: the first translates
queries into NL, while the second method provides a graphic representation of
the query cell-based provenance (in its execution on a given table). Our
solution augments a state-of-the-art NL interface over web tables, enhancing it
in both its training and deployment phase. Experiments, including a user study
conducted on Amazon Mechanical Turk, show that our solution improves both the
correctness and reliability of an NL interface.
Comment: Short paper version to appear in ICDE 201
Implementing Provenance Queries in Big Data Analytics Environments
The goal of this thesis is to adapt the techniques of why-, where- and how-provenance queries to environments that use not only simple queries such as selection, projection and join, but also OLAP operations and further machine learning algorithms. The exclusively extensional provenance answers are given by provenance polynomials as well as (minimal) witness bases. Extending the CHASE algorithm for databases with a BACKCHASE phase for evaluating provenance answers thus makes it possible to determine the CHASE inverse type (exact/relaxed/result-equivalent) of a given query.
Maximizing User Domain Expertise to Clarify Oblique Specifications of Relational Queries
While there is abundant access to data management technology today, working with data is still challenging for the average user. One common means of manipulating data is SQL on relational databases, but this requires knowledge of SQL as well as of the database's schema and contents. Consequently, previous work has proposed oblique query specification (OQS) methods such as natural language or programming-by-example to allow users to imprecisely specify their query intent. These methods, however, suffer from either low precision or low expressivity and, in addition, produce a list of candidate SQL queries, making it difficult for users to select their final target query.
My thesis is that OQS systems should maximize user domain expertise to triangulate the user's desired query. First, I demonstrate how to leverage previously issued SQL queries to improve the accuracy of natural language interfaces. Second, I propose a system allowing users to specify a query with both natural language and programming-by-example. Finally, I develop a system where users provide feedback on system-suggested tuples to select a SQL query from a set of candidate queries generated by an OQS system.
PhD thesis, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies
https://deepblue.lib.umich.edu/bitstream/2027.42/155114/1/cjbaik_1.pd
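The tuple-feedback step in the final contribution can be sketched as candidate pruning: each user label on a suggested tuple eliminates the candidate queries that disagree with it. Modeling candidate SQL queries as Python predicates over rows is an assumption made here purely for illustration.

```python
# Toy table and hypothetical candidate queries produced by an OQS system,
# each modeled as a predicate deciding whether a row belongs in the result.
rows = [{"name": "ann", "age": 34}, {"name": "bob", "age": 19},
        {"name": "cid", "age": 42}]

candidates = {
    "age > 30":   lambda r: r["age"] > 30,
    "age > 18":   lambda r: r["age"] > 18,
    "name < 'c'": lambda r: r["name"] < "c",
}

def prune(cands, feedback):
    """Keep candidates that agree with every (row, wanted) user label."""
    return {name: q for name, q in cands.items()
            if all(q(row) == wanted for row, wanted in feedback)}
```

A single label can be highly discriminating: telling the system that bob (age 19) should NOT appear rules out both "age > 18" and "name < 'c'", leaving "age > 30" as the sole surviving candidate.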