9 research outputs found

    Analytical Queries on Vanilla RDF Graphs with a Guided Query Builder Approach

    Get PDF
    International audienceAs more and more data are available as RDF graphs, the availability of tools for data analytics beyond semantic search becomes a key issue of the Semantic Web. Previous work require the modelling of data cubes on top of RDF graphs. We propose an approach that directly answers analytical queries on unmodified (vanilla) RDF graphs by exploiting the computation features of SPARQL 1.1. We rely on the NF design pattern to design a query builder that completely hides SPARQL behind a verbalization in natural language; and that gives intermediate results and suggestions at each step. Our evaluations show that our approach covers a large range of use cases, scales well on large datasets, and is easier to use than writing SPARQL queries

    Question Answering on RDF Data Cubes

    No full text
    The Semantic Web, a Web of Data, is an extension of the World Wide Web (WWW), a Web of Documents. A large amount of such data is freely available as Linked Open Data (LOD) for many areas of knowledge, forming the LOD Cloud. While this data conforms to the Resource Description Framework (RDF) and can thus be processed by machines, users need to master a formal query language and learn a specific vocabulary. Semantic Question Answering (SQA) systems remove those access barriers by letting the user ask natural language questions that the systems translate into formal queries. Thus, the research area of SQA plays an important role for the acceptance and benefit of the Semantic Web. The original contributions of this thesis to SQA are: First, we survey the current state of the art of SQA. We complement existing surveys by systematically identifying SQA publications in the chosen timeframe. 72 publications describing 62 different systems are systematically and manually selected using predefined inclusion and exclusion criteria out of 1960 candidates from the end of 2010 to July 2015. The survey identifies common challenges, structured solutions, and recommendations on research opportunities for future systems. From that point on, we focus on multidimensional numerical data, which is immensely valuable as it influences decisions in health care, policy and finance, among others. With the growth of the open data movement, more and more of it is becoming freely available. A large amount of such data is included in the LOD cloud using the RDF Data Cube (RDC) vocabulary. However, consuming multidimensional numerical data requires experts and specialized tools. Traditional SQA systems cannot process RDCs because their meta-structure is opaque to applications that expect facts to be encoded in single triples, This motivates our second contribution, the design and implementation of the first SQA algorithm on RDF Data Cubes. We kick-start this new research subfield by creating a user question corpus and a benchmark over multiple data sets. The evaluation of our system on the benchmark, which is included in the public Question Answering over Linked Data (QALD) challenge of 2016, shows the feasibility of the approach, but also highlights challenges, which we discuss in detail as a starting point for future work in the field. The benchmark is based on our final contribution, the addition of 955 financial government spending data sets to the LOD cloud by transforming data sets of the OpenSpending project to RDF Data Cubes. Open spending data has the power to reduce corruption by increasing accountability and strengthens democracy because voters can make better informed decisions. An informed and trusting public also strengthens the government itself because it is more likely to commit to large projects. OpenSpending.org is an open platform that provides public finance data from governments around the world. The transformation result, called LinkedSpending, consists of more than five million planned and carried out financial transactions in 955 data sets from all over the world as Linked Open Data and is freely available and openly licensed.:1 Introduction 1.1 Motivation 1.2 Research Questions and Contributions 1.3 Thesis Structure 2 Preliminaries 2.1 Semantic Web 2.1.1 URIs and URLs 2.1.2 Linked Data 2.1.3 Resource Description Framework 2.1.4 Ontologies 2.2 Question Answering 2.2.1 History 2.2.2 Definitions 2.2.3 Evaluation 2.2.4 SPARQL 2.2.5 Controlled Vocabulary 2.2.6 Faceted Search 2.2.7 Keyword Search 2.3 Data Cubes 3 Related Work 3.1 Semantic Question Answering 3.1.1 Surveys 3.1.2 Evaluation Campaigns 3.1.3 System Frameworks 3.2 Question Answering on RDF Data Cubes 3.3 RDF Data Cube Data Sets 4 Systematic Survey of Semantic Question Answering 4.1 Methodology 4.1.1 Inclusion Criteria 4.1.2 Exclusion Criteria 4.1.3 Result 4.2 Systems 4.2.1 Implementation 4.2.2 Examples 4.2.3 Answer Presentation 4.3 Challenges 4.3.1 Lexical Gap 4.3.2 Ambiguity 4.3.3 Multilingualism 4.3.4 Complex Queries 4.3.5 Distributed Knowledge 4.3.6 Procedural, Temporal and Spatial Questions 4.3.7 Templates 5 Question Answering on RDF Data Cubes 5.1 Question Corpus 5.2 Corpus Analysis 5.3 Data Cube Operations 5.4 Algorithm 5.4.1 Preprocessing 5.4.2 Matching 5.4.3 Combining Matches to Constraints 5.4.4 Execution 6 LinkedSpending 6.1 Choice of Source Data 6.1.1 Government Spending 6.1.2 OpenSpending 6.2 OpenSpending Source Data 6.3 Conversion of OpenSpending to RDF 6.4 Publishing 6.5 Overview over the Data Sets 6.6 Data Set Quality Analysis 6.6.1 Intrinsic Dimensions 6.6.2 Representational Dimensions 6.7 Evaluation 6.7.1 Experimental Setup and Benchmark 6.7.2 Discussion 7 Conclusion 7.1 Research Question Summary 7.2 SQA Survey 7.2.1 Lexical Gap 7.2.2 Ambiguity 7.2.3 Multilingualism 7.2.4 Complex Operators 7.2.5 Distributed Knowledge 7.2.6 Procedural, Temporal and Spatial Data 7.2.7 Templates 7.2.8 Future Research 7.3 CubeQA 7.4 LinkedSpending 7.4.1 Shortcomings 7.4.2 Future Work Bibliography Appendix A The CubeQA Question Corpus Appendix B The QALD-6 Task 3 Benchmark Questions B.1 Training Data B.2 Testing Dat

    A Question Answering System For Interacting with SDMX Databases

    Get PDF
    International audienceAmong existing sources of Open Data, statistical databases published by national and international organizations such as the International Monetary Fund, the United Nations, OECD etc. stand out for their high quality and valuable insights. However, technical means to interact easily with such sources are currently lacking. This article presents an effort to build an interactive Question Answering system for accessing statistical databases structured according to the SDMX (Statistical Data and Metadata Exchange) standard promoted by the abovementioned institutions. We describe the system architectures, its main technical choices, and present a preliminary evaluation. The system is available online

    A SPARQL 1.1 Query Builder for the Data Analytics of Vanilla RDF Graphs

    Get PDF
    As more and more data are available as RDF graphs, the availability of tools for data analytics beyond semantic search becomes a key issue of the Semantic Web. Previous work has focused on adapting OLAP-like approaches and question answering by modelling RDF data cubes on top of RDF graphs. We propose a more direct – and more expressive – approach by guiding users in the incremental building of SPARQL 1.1 queries that combine several computation features (aggregations, expressions, bindings and filters), and by evaluating those queries on unmodified (vanilla) RDF graphs. We rely on the N<A>F design pattern to hide SPARQL behind a natural language interface, and to provide results and suggestions at every step. We have implemented our approach on top of Sparklis, and we report on three experiments to assess its expressivity, usability, and scalability

    Finding and sharing GIS methods based on the questions they answer

    Get PDF
    Geographic information has become central for data scientists of many disciplines to put their analyses into a spatio-temporal perspective. However, just as the volume and variety of data sources on the Web grow, it becomes increasingly harder for analysts to be familiar with all the available geospatial tools, including toolboxes in Geographic Information Systems (GIS), R packages, and Python modules. Even though the semantics of the questions answered by these tools can be broadly shared, tools and data sources are still divided by syntax and platform-specific technicalities. It would, therefore, be hugely beneficial for information science if analysts could simply ask questions in generic and familiar terms to obtain the tools and data necessary to answer them. In this article, we systematically investigate the analytic questions that lie behind a range of common GIS tools, and we propose a semantic framework to match analytic questions and tools that are capable of answering them. To support the matching process, we define a tractable subset of SPARQL, the query language of the Semantic Web, and we propose and test an algorithm for computing query containment. We illustrate the identification of tools to answer user questions on a set of common user requests

    Survey on Challenges of Question Answering in the Semantic Web

    Get PDF
    Höffner K, Walter S, Marx E, Usbeck R, Lehmann J, Ngomo A-CN. Survey on Challenges of Question Answering in the Semantic Web. Semantic Web Journal. 2017;8(6):895-920

    CubeQA-question answering on RDF data cubes

    No full text
    Statistical data in the form of RDF Data Cubes is becoming increasingly valuable as it influences decisions in areas such as health care, policy and finance. While a growing amount is becoming freely available through the open data movement, this data is opaque to laypersons. Semantic Question Answering (SQA) technologies provide intuitive access via free-form natural language queries but general SQA systems cannot process RDF Data Cubes. On the intersection between RDF Data Cubes and SQA, we create a new subfield of SQA, called RDCQA. We create an RDQCA benchmark as task 3 of the QALD-6 evaluation challenge, to stimulate further research and enable quantitative comparison between RDCQA systems. We design and evaluate the domain independent CubeQA algorithm, which is the first RDCQA system and achieves a global F1 score of 0.43 on the QALD6T3-test benchmark, showing that RDCQA is feasible

    Making linked-data accessible: A review

    Get PDF
    Linked-Data (LD) is a paradigm that utilises the RDF triplestore to describe numerous pieces of knowledge linked together. When an entity is retrieved in LD, its associated data becomes instantly obtainable. SPARQL is the query language that allows users to access LD. On the other hand, SPARQL has a complicated syntax that necessitates previous knowledge. Thus, in order to encourage the end-users to use LD, it is crucial to allow them to obtain the data efficiently, in addition to improving their overall experience. Instead of manually constructing SPARQL queries, this paper investigates and reviews existing methods in which LD can be accessed using various tools and techniques, including query builders, visualisation approaches, and several LD applications. We then identify gaps within the literature and highlight future research directions
    corecore