540 research outputs found

    Semantic Interpretation of User Queries for Question Answering on Interlinked Data

    Get PDF
    The Web of Data contains a wealth of knowledge belonging to a large number of domains. Retrieving data from such precious interlinked knowledge bases is a challenge. By taking the structure of data into account, the upcoming generation of search engines is expected to approach question answering systems, which directly answer user questions. But developing a question answering system over these interlinked data sources is still challenging because of two inherent characteristics: First, different datasets employ heterogeneous schemas, and each one may contain only part of the answer to a certain question. Second, constructing a federated formal query across different datasets requires exploiting links between these datasets on both the schema and instance levels. In this respect, several challenges arise, such as resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system Sina, which transforms user-supplied queries (i.e., either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows: 1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation approach). We employed a Hidden Markov Model, whose parameters were bootstrapped with different distribution functions. 2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach essentially relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph, which ultimately renders a corresponding SPARQL query. 3. Regarding the problem of vocabulary mismatch, our contribution is divided into two parts: First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data. We evaluate the effectiveness of each feature individually as well as in combination, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion, which employs a Hidden Markov Model to obtain the optimal tuples of derived words. 4. We provide two benchmarks for two different tasks to the community of question answering systems. The first one is used for the task of question answering on interlinked datasets (i.e., federated queries over Linked Data). The second one is used for the vocabulary mismatch task. We evaluate the accuracy of our approach using measures such as mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of our accuracy evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in its sequential as well as parallel implementations and draw conclusions on the scalability of our approach on Linked Data.
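
    The disambiguation step described in contribution 1 can be illustrated with a minimal Viterbi decoding sketch in Python: states are candidate resources, observations are query keywords, and the most probable state sequence is taken as the disambiguation. The candidate resources and probability tables below are illustrative placeholders, not Sina's actual bootstrapped parameters.

    # Minimal sketch of HMM-based resource disambiguation (Viterbi decoding).
    # Candidate resources and all probabilities are hypothetical placeholders.
    def viterbi(observations, states, start_p, trans_p, emit_p):
        """Return the most probable state sequence for the observations."""
        best = [{s: (start_p[s] * emit_p[s][observations[0]], [s]) for s in states}]
        for obs in observations[1:]:
            layer = {}
            for s in states:
                prob, path = max(
                    (best[-1][prev][0] * trans_p[prev][s] * emit_p[s][obs],
                     best[-1][prev][1] + [s])
                    for prev in states)
                layer[s] = (prob, path)
            best.append(layer)
        return max(best[-1].values(), key=lambda x: x[0])[1]

    keywords = ["side", "effects", "drug"]              # user query tokens
    resources = ["sider:sideEffect", "drugbank:Drug"]   # candidate resources (hypothetical)
    start = {"sider:sideEffect": 0.5, "drugbank:Drug": 0.5}
    trans = {"sider:sideEffect": {"sider:sideEffect": 0.4, "drugbank:Drug": 0.6},
             "drugbank:Drug":    {"sider:sideEffect": 0.6, "drugbank:Drug": 0.4}}
    emit = {"sider:sideEffect": {"side": 0.5, "effects": 0.4, "drug": 0.1},
            "drugbank:Drug":    {"side": 0.1, "effects": 0.1, "drug": 0.8}}

    print(viterbi(keywords, resources, start, trans, emit))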

    Leveraging Bibliographic RDF Data for Keyword Prediction with Association Rule Mining (ARM)

    Get PDF
    The Semantic Web (Web 3.0) has been proposed as an efficient way to access the increasingly large amounts of data on the internet. The Linked Open Data Cloud project is at present the major effort to implement the concepts of the Semantic Web, addressing the problems of heterogeneity and large data volumes. RKBExplorer is one of many repositories implementing Open Data and contains considerable bibliographic information. This paper discusses bibliographic data, an important part of cloud data. Effective searching of bibliographic datasets can be a challenge, as many of the papers residing in these databases do not have sufficient or comprehensive keyword information. In these cases, a search engine based on RKBExplorer is only able to retrieve papers based on author names and paper titles, without keywords. In this paper we attempt to address this problem by using the data mining algorithm Association Rule Mining (ARM) to derive keywords from features retrieved from Resource Description Framework (RDF) data within a bibliographic citation. We demonstrate the applicability of this method for predicting missing keywords for bibliographic entries in several typical databases.
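
    As a rough illustration of the idea (not the paper's actual pipeline), the sketch below mines association rules from toy bibliographic transactions with the mlxtend library; the feature and keyword names are invented, and rules whose consequents are keywords serve as keyword predictors.

    # Sketch: predicting keywords from bibliographic features with association
    # rule mining. Transactions mix RDF-derived features (author, venue) with
    # known keywords; rules that conclude a keyword suggest candidates.
    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    transactions = [  # hypothetical bibliographic entries
        ["author:Smith", "venue:ISWC", "kw:linked-data", "kw:sparql"],
        ["author:Smith", "venue:ISWC", "kw:linked-data"],
        ["author:Lee",   "venue:WWW",  "kw:search"],
        ["author:Smith", "venue:WWW",  "kw:linked-data", "kw:search"],
    ]

    te = TransactionEncoder()
    df = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)
    frequent = apriori(df, min_support=0.5, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.7)

    # Keep rules that predict a keyword from non-keyword features only.
    predictive = rules[
        rules["consequents"].apply(lambda c: all(i.startswith("kw:") for i in c)) &
        rules["antecedents"].apply(lambda a: not any(i.startswith("kw:") for i in a))
    ]
    print(predictive[["antecedents", "consequents", "confidence"]])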

    Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams

    Full text link
    Emerging applications in the Internet of Things (IoT) and Cyber-Physical Systems (CPS) present novel challenges to Big Data platforms for performing online analytics. Ubiquitous sensors from IoT deployments are able to generate data streams at high velocity that include information from a variety of domains and accumulate to large volumes on disk. Complex Event Processing (CEP) is recognized as an important real-time computing paradigm for analyzing continuous data streams. However, existing work on CEP is largely limited to relational query processing, exposing two distinct gaps in query specification and execution: (1) infusing the relational query model with higher-level knowledge semantics, and (2) seamless query evaluation across temporal spaces that span past, present and future events. Closing these gaps enables accessible analytics over data streams whose properties come from different disciplines, and helps span the velocity (real-time) and volume (persistent) dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP) framework that provides domain-aware knowledge query constructs along with temporal operators that allow end-to-end queries to span real-time and persistent streams. We translate this query model to efficient query execution over online and offline data streams, proposing several optimizations to mitigate the overheads introduced by evaluating semantic predicates and by accessing high-volume historic data streams. The proposed X-CEP query model and execution approaches are implemented in our prototype semantic CEP engine, SCEPter. We validate our query model using domain-aware CEP queries from a real-world Smart Power Grid application, and experimentally analyze the benefits of our optimizations for executing these queries, using event streams from a campus-microgrid IoT deployment. Comment: 34 pages, 16 figures, accepted in Future Generation Computer Systems, October 27, 201
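
    A stripped-down illustration of the kind of knowledge-infused pattern the article targets (not SCEPter's actual query language): a sliding time window over a sensor stream, with a tiny hand-written type hierarchy standing in for ontology-based semantic predicates. Sensor types, thresholds, and readings are illustrative placeholders.

    # Sketch: a toy knowledge-infused CEP check. An event matches if its sensor
    # belongs to a semantic class (HVACSensor) and readings exceed a threshold
    # at least 3 times within a 60-second window.
    from collections import deque

    SUBCLASS_OF = {"ChillerSensor": "HVACSensor", "FanSensor": "HVACSensor"}

    def is_a(sensor_type, target):
        # walk the toy subclass hierarchy (stands in for ontology reasoning)
        while sensor_type is not None:
            if sensor_type == target:
                return True
            sensor_type = SUBCLASS_OF.get(sensor_type)
        return False

    def detect(stream, window_s=60, threshold=75.0, min_hits=3):
        window = deque()  # (timestamp, value) pairs of matching events
        for ts, sensor_type, value in stream:
            if not (is_a(sensor_type, "HVACSensor") and value > threshold):
                continue
            window.append((ts, value))
            while window and ts - window[0][0] > window_s:
                window.popleft()
            if len(window) >= min_hits:
                yield ts, list(window)

    events = [(0, "ChillerSensor", 80.0), (20, "FanSensor", 90.0),
              (30, "DoorSensor", 99.0), (45, "ChillerSensor", 77.0)]
    for ts, matched in detect(events):
        print("complex event at t=%s over %d readings" % (ts, len(matched)))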

    Sematch: semantic entity search from knowledge graph

    Get PDF
    As an increasing amount of knowledge graph data is published as Linked Open Data, semantic entity search is required to develop new applications. However, the use of structured query languages such as SPARQL is challenging for non-expert users, who need to master the query language as well as acquire knowledge of the underlying ontology of Linked Data knowledge bases. In this article, we propose the Sematch framework for entity search in knowledge graphs, which combines natural language query processing, entity linking, entity type linking and semantic-similarity-based query expansion. The system has been validated on a dataset, and a prototype has been developed that translates natural language queries into SPARQL.
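
    A minimal sketch of the overall idea (not the Sematch API itself): once a query such as "films directed by Tim Burton" has been linked to an entity, an entity type, and a relation, a type-constrained SPARQL query can be issued against a public endpoint like DBpedia. The linked URIs below are assumed outputs of that linking step.

    # Sketch: entity-type-constrained search against DBpedia. The entity, type
    # and property are illustrative; they are not produced by Sematch here.
    from SPARQLWrapper import SPARQLWrapper, JSON

    entity = "http://dbpedia.org/resource/Tim_Burton"     # linked entity
    etype = "http://dbpedia.org/ontology/Film"            # linked entity type
    prop = "http://dbpedia.org/ontology/director"         # relation from the query

    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"""
        SELECT DISTINCT ?film WHERE {{
            ?film a <{etype}> ;
                  <{prop}> <{entity}> .
        }} LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["film"]["value"])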

    ARTE: Automated Generation of Realistic Test Inputs for Web APIs

    Get PDF
    Automated test case generation for web APIs is a thriving research topic, where test cases are frequently derived from the API specification. However, this process is only partially automated, since testers are usually obliged to manually set meaningful valid test inputs for each input parameter. In this article, we present ARTE, an approach for the automated extraction of realistic test data for web APIs from knowledge bases like DBpedia. Specifically, ARTE leverages the specification of the API parameters to automatically search for realistic test inputs using natural language processing, search-based, and knowledge extraction techniques. ARTE has been integrated into RESTest, an open-source testing framework for RESTful APIs, fully automating the test case generation process. Evaluation results on 140 operations from 48 real-world web APIs show that ARTE can efficiently generate realistic test inputs for 64.9% of the target parameters, outperforming the state-of-the-art approach SAIGEN (31.8%). More importantly, ARTE supported the generation of over twice as many valid API calls (57.3%) as random generation (20%) and SAIGEN (26%), leading to a higher failure detection capability and uncovering several real-world bugs. These results show the potential of ARTE for enhancing existing web API testing tools, achieving an unprecedented level of automation. Funding: Junta de Andalucía APOLO (US-1264651); Junta de Andalucía EKIPMENT-PLUS (P18-FR-2895); Ministerio de Ciencia, Innovación y Universidades RTI2018-101204-B-C21 (HORATIO); Ministerio de Ciencia, Innovación y Universidades RED2018-102472-
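
    The core idea, fetching realistic parameter values from DBpedia, can be sketched as follows; this is a simplification, not ARTE's actual implementation, and the mapping of a "city" parameter to the dbo:City class is an assumed output of the NLP/search step described in the abstract.

    # Sketch: generating realistic test inputs for an API parameter by querying
    # DBpedia for labels of a matching ontology class.
    from SPARQLWrapper import SPARQLWrapper, JSON

    def realistic_values(dbpedia_class, limit=20):
        sparql = SPARQLWrapper("https://dbpedia.org/sparql")
        sparql.setQuery(f"""
            PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
            SELECT DISTINCT ?label WHERE {{
                ?s a <{dbpedia_class}> ;
                   rdfs:label ?label .
                FILTER (lang(?label) = "en")
            }} LIMIT {limit}
        """)
        sparql.setReturnFormat(JSON)
        results = sparql.query().convert()
        return [b["label"]["value"] for b in results["results"]["bindings"]]

    # e.g. realistic test inputs for a "city" parameter of a weather API
    print(realistic_values("http://dbpedia.org/ontology/City"))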

    Using Linguistic Analysis to Translate Arabic Natural Language Queries to SPARQL

    Full text link
    The logic-based, machine-understandable framework of the Semantic Web often challenges naive users when they try to query ontology-based knowledge bases. Existing research efforts have approached this problem by introducing Natural Language (NL) interfaces to ontologies. These NL interfaces have the ability to construct SPARQL queries based on NL user queries. However, most efforts were restricted to queries expressed in English, and they often benefited from the advancement of English NLP tools. Little research has been done to support querying Arabic content on the Semantic Web using NL queries. This paper presents a domain-independent approach to translate Arabic NL queries to SPARQL by leveraging linguistic analysis. With special consideration of Noun Phrases (NPs), our approach uses a language parser to extract NPs and relations from Arabic parse trees and match them to the underlying ontology. It then utilizes knowledge in the ontology to group NPs into triple-based representations. A SPARQL query is finally generated by extracting targets and modifiers and interpreting them into SPARQL. The interpretation of advanced semantic features, including negation and conjunctive and disjunctive modifiers, is also supported. The approach was evaluated using two datasets consisting of OWL test data and queries, and the obtained results have confirmed its feasibility for translating Arabic NL queries to SPARQL. Comment: Journal Paper
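
    To make the triple-based intermediate representation concrete, here is a small hedged sketch of turning extracted (subject, property, object) triples, plus an optional negated triple, into a SPARQL query. The variable naming, ontology URIs, and example query are invented for illustration and do not reproduce the paper's actual grammar.

    # Sketch: assembling a SPARQL query from triple-based representations of
    # noun phrases. The target becomes the SELECT variable; negated modifiers
    # are rendered with FILTER NOT EXISTS.
    def to_sparql(target_var, triples, negated=()):
        def term(t):
            return t if t.startswith("?") else f"<{t}>"
        body = " .\n        ".join(
            f"{term(s)} {term(p)} {term(o)}" for s, p, o in triples)
        neg = "".join(
            f"\n        FILTER NOT EXISTS {{ {term(s)} {term(p)} {term(o)} }}"
            for s, p, o in negated)
        return f"SELECT DISTINCT {target_var} WHERE {{\n        {body}{neg}\n}}"

    # hypothetical output of the NP extraction step for
    # "students enrolled in a course taught by Ahmad who did not fail"
    query = to_sparql(
        "?student",
        [("?student", "http://example.org/onto#enrolledIn", "?course"),
         ("?course", "http://example.org/onto#taughtBy", "http://example.org/onto#Ahmad")],
        negated=[("?student", "http://example.org/onto#hasGrade", "http://example.org/onto#Fail")])
    print(query)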

    OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning

    Full text link
    This research presents a comprehensive methodology for utilizing an ontology-driven structured prompts system in interplay with ChatGPT, a widely used large language model (LLM). The study develops formal models, both information and functional, and establishes the methodological foundations for integrating ontology-driven prompts with ChatGPT's meta-learning capabilities. The resulting productive triad comprises the methodological foundations, advanced information technology, and the OntoChatGPT system, which collectively enhance the effectiveness and performance of chatbot systems. The implementation of this technology is demonstrated using the Ukrainian language within the domain of rehabilitation. By applying the proposed methodology, the OntoChatGPT system effectively extracts entities from contexts, classifies them, and generates relevant responses. The study highlights the versatility of the methodology, emphasizing its applicability not only to ChatGPT but also to other chatbot systems based on LLMs, such as Google's Bard utilizing the PaLM 2 LLM. The underlying principles of meta-learning, structured prompts, and ontology-driven information retrieval form the core of the proposed methodology, enabling their adaptation and utilization in various LLM-based systems. This versatile approach opens up new possibilities for NLP and dialogue systems, empowering developers to enhance the performance and functionality of chatbot systems across different domains and languages. Comment: 14 pages, 1 figure. Published. International Journal of Computing, 22(2), 170-183. https://doi.org/10.47839/ijc.22.2.308
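
    As a rough sketch of what an ontology-driven structured prompt can look like (not the OntoChatGPT system's actual prompt template), the snippet below assembles an extraction prompt from a small hand-written ontology fragment; the rehabilitation-domain class names, properties, and instructions are invented for illustration.

    # Sketch: building a structured prompt from an ontology fragment so that an
    # LLM extracts entities from a context and classifies them against the
    # ontology's classes. The classes and properties are placeholders.
    ontology = {
        "Exercise":  ["has_duration", "targets_body_part"],
        "Injury":    ["affects_body_part", "has_severity"],
        "Therapist": ["specializes_in"],
    }

    def build_prompt(context: str) -> str:
        classes = "\n".join(
            f"- {cls} (properties: {', '.join(props)})"
            for cls, props in ontology.items())
        return (
            "You are an information extraction assistant.\n"
            "Ontology classes:\n" + classes + "\n\n"
            "Task: list every entity in the context, assign it to exactly one "
            "class above, and fill any properties you can.\n"
            'Return JSON: [{"entity": ..., "class": ..., "properties": {...}}]\n\n'
            "Context:\n" + context)

    print(build_prompt("The therapist recommended 20 minutes of knee extensions."))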

    Linked Data Entity Summarization

    Get PDF
    On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements and it becomes difficult to comprehend the data, unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion
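
    A toy sketch of one common summarization heuristic (not the thesis's actual approaches): rank an entity's statements by how informative their predicates are across the whole graph and keep the top k. The in-memory graph, URIs, and ranking rule below are illustrative assumptions.

    # Sketch: frequency-based entity summarization with rdflib. Predicates that
    # are rarer across the graph are treated as more informative, and the top-k
    # statements about the entity are kept.
    from collections import Counter
    from rdflib import Graph, URIRef, Literal

    EX = "http://example.org/"
    g = Graph()
    berlin = URIRef(EX + "Berlin")
    g.add((berlin, URIRef(EX + "label"), Literal("Berlin")))
    g.add((berlin, URIRef(EX + "country"), URIRef(EX + "Germany")))
    g.add((berlin, URIRef(EX + "population"), Literal(3645000)))
    g.add((URIRef(EX + "Hamburg"), URIRef(EX + "label"), Literal("Hamburg")))
    g.add((URIRef(EX + "Hamburg"), URIRef(EX + "country"), URIRef(EX + "Germany")))

    pred_freq = Counter(p for _, p, _ in g)

    def summarize(entity, k=2):
        facts = [(p, o) for _, p, o in g.triples((entity, None, None))]
        # rarer predicates first (lower global frequency = more informative)
        return sorted(facts, key=lambda po: pred_freq[po[0]])[:k]

    for p, o in summarize(berlin):
        print(p, o)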

    RDF Data Management for Vessel and Patent Information

    Get PDF
    The Resource Description Framework (RDF) is widely used to represent information on the Web. Efforts have been made to map RDF data to a relational representation, and this method has been adopted by several systems. RDF data is queried using SPARQL, the standard W3C-recommended query language for querying graph data represented as RDF triples. A set of tools and technologies is implemented and tested using SPARQL and Apache Jena Fuseki as the RDF triplestore. The increasing size of RDF data requires efficient storage and querying; a framework covering design, analysis, query optimization, storage and processing is required for efficient retrieval of RDF data. Together, SPARQL and RDF make it easier to merge results from multiple data sources. This thesis provides an overview of a method to manage data based on the existing data of two organizations, Port-MIS and KIPRIS. The RDF model is designed to enable web-based representation and information exchange, and to suggest promising directions for future research. Contents: Introduction (Background of Research; Research Objectives; Organization of Thesis); Literature Review (Semantic Web; Linked Open Data; RDF and RDF Schema; SPARQL; Apache Jena Fuseki as RDF Triplestore); RDF Schema Design (RDFS for Vessels: Vessel Information Structure, Vessel Information Structure Based on Port-MIS, RDF Sample Syntax of Port-MIS; RDFS for Patent Data: Patent Data Structure, Patent Data Structure Based on KIPRIS, RDF Sample Syntax of KIPRIS); Implementation and Testing (System Architecture; Data Processing Structure; SPARQL Queries); Conclusion and Further Work; References; Acknowledgement. (Master's thesis)
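
    For context, the sketch below shows how such a triplestore is typically queried from code; the Fuseki dataset name, prefix, vessel class, and properties are hypothetical stand-ins for a Port-MIS-style schema, not the thesis's actual RDFS design.

    # Sketch: querying a local Apache Jena Fuseki endpoint with SPARQLWrapper.
    # "vessels" is a hypothetical Fuseki dataset name; the ex: terms are
    # illustrative placeholders.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://localhost:3030/vessels/sparql")
    sparql.setQuery("""
        PREFIX ex: <http://example.org/vessel#>
        SELECT ?name ?grossTonnage WHERE {
            ?ship a ex:Vessel ;
                  ex:shipName ?name ;
                  ex:grossTonnage ?grossTonnage .
        } LIMIT 10
    """)
    sparql.setReturnFormat(JSON)
    for row in sparql.query().convert()["results"]["bindings"]:
        print(row["name"]["value"], row["grossTonnage"]["value"])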