Semantic Interpretation of User Queries for Question Answering on Interlinked Data
The Web of Data contains a wealth of knowledge spanning a large number of domains, but retrieving data from such precious interlinked knowledge bases remains difficult. By taking the structure of data into account, the next generation of search engines is expected to approach question answering systems, which directly answer user questions. Developing question answering over these interlinked data sources is still challenging, however, because of two inherent characteristics: first, different datasets employ heterogeneous schemas, and each may contain only part of the answer to a given question; second, constructing a federated formal query across different datasets requires exploiting links between them on both the schema and instance levels. This raises several challenges, such as resource disambiguation, vocabulary mismatch, inference, and link traversal. In this dissertation, we address these challenges in order to build a question answering system for Linked Data. We present our question answering system Sina, which transforms user-supplied queries (i.e., either natural language queries or keyword queries) into conjunctive SPARQL queries over a set of interlinked data sources. The contributions of this work are as follows: 1. A novel approach for determining the most suitable resources for a user-supplied query from different datasets (disambiguation approach). We employ a Hidden Markov Model whose parameters are bootstrapped with different distribution functions. 2. A novel method for constructing federated formal queries using the disambiguated resources and leveraging the linking structure of the underlying datasets. This approach relies on a combination of domain and range inference as well as a link traversal method for constructing a connected graph, which ultimately yields the corresponding SPARQL query. 3. Regarding the problem of vocabulary mismatch, our contribution is divided into two parts. First, we introduce a number of new query expansion features based on semantic and linguistic inferencing over Linked Data, and evaluate the effectiveness of each feature individually as well as in combination, employing Support Vector Machines and Decision Trees. Second, we propose a novel method for automatic query expansion, which employs a Hidden Markov Model to obtain the optimal tuples of derived words. 4. We provide two benchmarks for two different tasks to the question answering community: the first for question answering on interlinked datasets (i.e., federated queries over Linked Data), and the second for the vocabulary mismatch task. We evaluate the accuracy of our approach using measures such as mean reciprocal rank, precision, recall, and F-measure on three interlinked life-science datasets as well as DBpedia. The results of our accuracy evaluation demonstrate the effectiveness of our approach. Moreover, we study the runtime of our approach in both its sequential and parallel implementations and draw conclusions on its scalability over Linked Data.
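The disambiguation step in contribution 1 can be pictured as standard Viterbi decoding. The sketch below is a toy illustration under invented resources and probabilities, not Sina's actual model: hidden states are candidate knowledge-base resources, observations are the query keywords.

```python
# Toy illustration (not Sina's model): resource disambiguation as Viterbi
# decoding over an HMM whose hidden states are candidate KB resources and
# whose observations are query keywords. All resources and probabilities
# below are invented.

def viterbi(keywords, candidates, start_p, trans_p, emit_p):
    """Return the most probable resource sequence for the keywords."""
    # V[i][s] = (best probability of reaching s after keyword i, predecessor)
    V = [{s: (start_p[s] * emit_p[s].get(keywords[0], 1e-9), None)
          for s in candidates}]
    for i in range(1, len(keywords)):
        row = {}
        for s in candidates:
            prev = max(candidates,
                       key=lambda p: V[i - 1][p][0] * trans_p[p].get(s, 1e-9))
            prob = (V[i - 1][prev][0] * trans_p[prev].get(s, 1e-9)
                    * emit_p[s].get(keywords[i], 1e-9))
            row[s] = (prob, prev)
        V.append(row)
    # Backtrack from the best final state
    last = max(candidates, key=lambda s: V[-1][s][0])
    path = [last]
    for i in range(len(keywords) - 1, 0, -1):
        last = V[i][last][1]
        path.append(last)
    return list(reversed(path))

# Two candidate resources from two hypothetical interlinked datasets.
CANDS = ["sider:sideEffect", "drugbank:Drug"]
START = {"sider:sideEffect": 0.5, "drugbank:Drug": 0.5}
TRANS = {"sider:sideEffect": {"sider:sideEffect": 0.2, "drugbank:Drug": 0.8},
         "drugbank:Drug":    {"sider:sideEffect": 0.8, "drugbank:Drug": 0.2}}
EMIT  = {"sider:sideEffect": {"side": 0.9, "drug": 0.1},
         "drugbank:Drug":    {"side": 0.1, "drug": 0.9}}
```

Decoding the keywords ["side", "drug"] under this toy model maps the first keyword to the SIDER-style property and the second to the DrugBank-style class; in Sina the parameters are instead bootstrapped from the datasets with different distribution functions.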
LEVERAGING BIBLIOGRAPHIC RDF DATA FOR KEYWORD PREDICTION WITH ASSOCIATION RULE MINING (ARM)
The Semantic Web (Web 3.0) has been proposed as an efficient way to access the increasingly large amounts of data on the internet. The Linked Open Data Cloud project is at present the major effort to implement the concepts of the Semantic Web, addressing the problems of heterogeneity and large data volumes. RKBExplorer is one of many repositories implementing Open Data and contains considerable bibliographic information. This paper discusses bibliographic data, an important part of cloud data. Effective searching of bibliographic datasets can be a challenge, as many of the papers residing in these databases do not have sufficient or comprehensive keyword information. In these cases, a search engine based on RKBExplorer is only able to retrieve papers using author names and paper titles, without keywords. In this paper we attempt to address this problem by using the data mining algorithm Association Rule Mining (ARM) to derive keywords from features retrieved from the Resource Description Framework (RDF) data of a bibliographic citation. We demonstrate the applicability of this method for predicting missing keywords for bibliographic entries in several typical databases.
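As a rough illustration of the idea (not the paper's implementation), single-antecedent association rules of the form feature → keyword can be mined from bibliographic records that have keywords and used to suggest keywords for records that lack them. All records and thresholds below are invented toy data:

```python
# Invented illustration of ARM-based keyword prediction (not the paper's
# implementation): mine rules feature -> keyword from keyworded records,
# then suggest keywords for a record without them.
from collections import Counter

RECORDS = [  # (RDF-derived features, known keywords) -- toy data
    ({"author:Smith", "venue:ISWC"}, {"linked data"}),
    ({"author:Smith", "venue:ISWC"}, {"linked data", "rdf"}),
    ({"author:Lee", "venue:VLDB"}, {"databases"}),
]

def mine_rules(records, min_support=2, min_conf=0.6):
    """Return {(feature, keyword): confidence} for rules above thresholds."""
    feat_count, pair_count = Counter(), Counter()
    for feats, keywords in records:
        for f in feats:
            feat_count[f] += 1
            for k in keywords:
                pair_count[(f, k)] += 1
    return {(f, k): n / feat_count[f]
            for (f, k), n in pair_count.items()
            if n >= min_support and n / feat_count[f] >= min_conf}

def predict_keywords(features, rules):
    """Suggest keywords for a record from its features."""
    return {k for (f, k) in rules if f in features}
```

With these toy records, a keyword-less paper by the same author is assigned "linked data", while the low-support rule for "rdf" is filtered out.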
Knowledge-infused and Consistent Complex Event Processing over Real-time and Persistent Streams
Emerging applications in Internet of Things (IoT) and Cyber-Physical Systems
(CPS) present novel challenges to Big Data platforms for performing online
analytics. Ubiquitous sensors in IoT deployments generate data streams at
high velocity that include information from a variety of domains and
accumulate to large volumes on disk. Complex Event Processing (CEP) is
recognized as an important real-time computing paradigm for analyzing
continuous data streams. However, existing work on CEP is largely limited to
relational query processing, exposing two distinctive gaps for query
specification and execution: (1) infusing the relational query model with
higher level knowledge semantics, and (2) seamless query evaluation across
temporal spaces that span past, present and future events. Bridging these
gaps enables accessible analytics over data streams having properties from
different disciplines, and helps span the velocity (real-time) and volume
(persistent)
dimensions. In this article, we introduce a Knowledge-infused CEP (X-CEP)
framework that provides domain-aware knowledge query constructs along with
temporal operators that allow end-to-end queries to span across real-time and
persistent streams. We translate this query model to efficient query execution
over online and offline data streams, proposing several optimizations to
mitigate the overheads introduced by evaluating semantic predicates and by
accessing high-volume historic data streams. The proposed X-CEP query model and
execution approaches are implemented in our prototype semantic CEP engine,
SCEPter. We validate our query model using domain-aware CEP queries from a
real-world Smart Power Grid application, and experimentally analyze the
benefits of our optimizations for executing these queries, using event streams
from a campus-microgrid IoT deployment.
Comment: 34 pages, 16 figures; accepted in Future Generation Computer Systems, October 27, 201
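The two gaps the article names can be made concrete with a toy sketch, not SCEPter itself: a "semantic" predicate that checks an event's type against a tiny hand-written class hierarchy (knowledge infusion), and a sequence pattern evaluated over a time-ordered stream within a window. All type names, events and the window size are invented:

```python
# Toy sketch, not SCEPter: a semantic predicate over a hand-written class
# hierarchy plus a windowed sequence pattern over a time-ordered stream.

SUBCLASS = {"ACLoadEvent": "LoadEvent", "DCLoadEvent": "LoadEvent"}

def is_a(event_type, ancestor):
    """Semantic predicate: does event_type equal or specialize ancestor?"""
    while event_type is not None:
        if event_type == ancestor:
            return True
        event_type = SUBCLASS.get(event_type)
    return False

def match_seq(stream, first_type, then_type, window):
    """Yield (e1, e2) where an event of (sub)type first_type is followed
    by one of then_type within `window` time units. Events are
    (timestamp, type) pairs in timestamp order."""
    pending = []  # first_type events still inside the window
    for t, etype in stream:
        pending = [(pt, pe) for pt, pe in pending if t - pt <= window]
        if is_a(etype, then_type):
            for pt, pe in pending:
                yield ((pt, pe), (t, etype))
        if is_a(etype, first_type):
            pending.append((t, etype))
```

A query like SEQ(LoadEvent, VoltageDrop) within 5 time units over the stream [(0, "ACLoadEvent"), (1, "VoltageDrop"), (10, "VoltageDrop")] matches only the first pair: the semantic predicate recognizes ACLoadEvent as a LoadEvent, and the window excludes the late event.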
Sematch: semantic entity search from knowledge graph
As an increasing amount of knowledge graph data is published as Linked Open Data, semantic entity search is required to develop new applications. However, structured query languages such as SPARQL are challenging for non-expert users, who must master the query language as well as acquire knowledge of the underlying ontology of Linked Data knowledge bases. In this article, we propose the Sematch framework for entity search in knowledge graphs, which combines natural language query processing, entity linking, entity type linking and semantic similarity based query expansion. The system has been validated on a dataset, and a prototype has been developed that translates natural language queries into SPARQL.
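A minimal, hypothetical sketch of such a pipeline (not Sematch's actual code): link an entity and an entity type found in the question, then fill a SPARQL template. The lexicons and the DBpedia-style prefixes are toy data:

```python
# Hypothetical sketch, not Sematch's code: entity/type linking by lexicon
# lookup plus SPARQL template filling. Lexicons and prefixes are toy data.

TYPE_LEXICON = {"movies": "dbo:Film", "films": "dbo:Film"}
ENTITY_LEXICON = {"tim burton": "dbr:Tim_Burton"}

def to_sparql(question):
    """Translate a question into a type-constrained SPARQL query, or None."""
    q = question.lower()
    etype = next((uri for word, uri in TYPE_LEXICON.items() if word in q), None)
    entity = next((uri for name, uri in ENTITY_LEXICON.items() if name in q), None)
    if etype is None or entity is None:
        return None
    return "SELECT ?x WHERE { ?x a %s . ?x ?p %s }" % (etype, entity)
```

A real system would replace the hand-written lexicons with entity and type linking, and would grow TYPE_LEXICON via semantic similarity, which is where the article's query expansion comes in.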
ARTE: Automated Generation of Realistic Test Inputs for Web APIs
Automated test case generation for web APIs is a thriving research topic, where test cases are frequently derived from the API specification. However, this process is only partially automated, since testers are usually obliged to manually set meaningful valid test inputs for each input parameter. In this article, we present ARTE, an approach for the automated extraction of realistic test data for web APIs from knowledge bases like DBpedia. Specifically, ARTE leverages the specification of the API parameters to automatically search for realistic test inputs using natural language processing, search-based, and knowledge extraction techniques. ARTE has been integrated into RESTest, an open-source testing framework for RESTful APIs, fully automating the test case generation process. Evaluation results on 140 operations from 48 real-world web APIs show that ARTE can efficiently generate realistic test inputs for 64.9% of the target parameters, outperforming the state-of-the-art approach SAIGEN (31.8%). More importantly, ARTE supported the generation of over twice as many valid API calls (57.3%) as random generation (20%) and SAIGEN (26%), leading to a higher failure detection capability and uncovering several real-world bugs. These results show the potential of ARTE for enhancing existing web API testing tools, achieving an unprecedented level of automation.
Funding: Junta de Andalucía APOLO (US-1264651); Junta de Andalucía EKIPMENT-PLUS (P18-FR-2895); Ministerio de Ciencia, Innovación y Universidades RTI2018-101204-B-C21 (HORATIO); Ministerio de Ciencia, Innovación y Universidades RED2018-102472-
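The first step of such an approach, deriving search terms from an API parameter's specification, can be sketched as follows. This is an invented illustration of the general idea, not ARTE's implementation; the regexes and examples are assumptions:

```python
# Invented sketch of the term-extraction step only: derive search terms
# from an API parameter's name and description, which would then be looked
# up in a knowledge base such as DBpedia.
import re

def search_terms(param_name, description=""):
    """Split camelCase/snake_case names and collect capitalized hints."""
    words = re.sub(r"([a-z])([A-Z])", r"\1 \2", param_name)
    words = words.replace("_", " ").lower().split()
    # Capitalized words in the description often name the relevant concept
    hints = [w.lower() for w in re.findall(r"[A-Z][a-z]+", description)]
    return list(dict.fromkeys(words + hints))  # dedupe, keep order
```

For a parameter named countryCode described as "ISO code of a Country", this yields the terms "country" and "code", which a knowledge-base lookup could then resolve to realistic values.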
Using Linguistic Analysis to Translate Arabic Natural Language Queries to SPARQL
The logic-based machine-understandable framework of the Semantic Web often
challenges naive users when they try to query ontology-based knowledge bases.
Existing research efforts have approached this problem by introducing Natural
Language (NL) interfaces to ontologies. These NL interfaces have the ability to
construct SPARQL queries based on NL user queries. Most efforts, however, were
restricted to queries expressed in English, often benefiting from the maturity
of English NLP tools, and little research has been done to support querying
Arabic content on the Semantic Web with NL queries.
This paper presents a domain-independent approach to translate Arabic NL
queries to SPARQL by leveraging linguistic analysis. With special
consideration of Noun Phrases (NPs), our approach uses a language parser to
extract NPs and the relations from Arabic parse trees and match them to the
underlying ontology. It then utilizes knowledge in the ontology to group NPs
into triple-based representations. A SPARQL query is finally generated by
extracting targets and modifiers, and interpreting them into SPARQL. The
interpretation of advanced semantic features including negation, conjunctive
and disjunctive modifiers is also supported. The approach was evaluated by
using two datasets consisting of OWL test data and queries, and the obtained
results have confirmed its feasibility to translate Arabic NL queries to
SPARQL.
Comment: Journal Paper
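Once NPs and relations have been grouped into triple-based representations, SPARQL generation itself is largely string assembly. A toy sketch of that final step (invented, not the paper's system), including a negated modifier rendered as FILTER NOT EXISTS, one of the advanced features the paper supports:

```python
# Invented sketch (not the paper's system): assemble a SPARQL query from
# triples produced by linguistic analysis; a negated triple is rendered
# with FILTER NOT EXISTS. Variable and property names are toy data.

def build_sparql(target, triples):
    """triples: (subject, predicate, object, negated) tuples."""
    parts = []
    for s, p, o, negated in triples:
        pattern = "%s %s %s ." % (s, p, o)
        parts.append("FILTER NOT EXISTS { %s }" % pattern if negated
                     else pattern)
    return "SELECT %s WHERE { %s }" % (target, " ".join(parts))
```

For example, "cities that are not capitals" could arrive as one positive and one negated triple, producing a query whose WHERE clause combines a basic pattern with a FILTER NOT EXISTS block.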
OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning
This research presents a comprehensive methodology for utilizing an
ontology-driven structured prompts system in interplay with ChatGPT, a widely
used large language model (LLM). The study develops formal models, both
information and functional, and establishes the methodological foundations for
integrating ontology-driven prompts with ChatGPT's meta-learning capabilities.
The resulting productive triad comprises the methodological foundations,
advanced information technology, and the OntoChatGPT system, which collectively
enhance the effectiveness and performance of chatbot systems. The
implementation of this technology is demonstrated using the Ukrainian language
within the domain of rehabilitation. By applying the proposed methodology, the
OntoChatGPT system effectively extracts entities from contexts, classifies
them, and generates relevant responses. The study highlights the versatility of
the methodology, emphasizing its applicability not only to ChatGPT but also to
other chatbot systems based on LLMs, such as Google's Bard utilizing the PaLM 2
LLM. The underlying principles of meta-learning, structured prompts, and
ontology-driven information retrieval form the core of the proposed
methodology, enabling their adaptation and utilization in various LLM-based
systems. This versatile approach opens up new possibilities for NLP and
dialogue systems, empowering developers to enhance the performance and
functionality of chatbot systems across different domains and languages.
Comment: 14 pages, 1 figure. Published in International Journal of Computing, 22(2), 170-183. https://doi.org/10.47839/ijc.22.2.308
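The general idea of an ontology-driven structured prompt can be sketched as follows. The ontology fragment and the prompt wording are invented for illustration; this is not the OntoChatGPT system itself:

```python
# Invented sketch of an ontology-driven structured prompt (not the
# OntoChatGPT system): render a class, its properties and known instances
# from a small ontology fragment into an entity-extraction prompt.

ONTOLOGY = {  # toy fragment for the rehabilitation domain
    "RehabilitationMethod": {
        "properties": ["targetCondition", "duration"],
        "instances": ["Hydrotherapy", "Kinesiotherapy"],
    }
}

def build_prompt(cls, context):
    frag = ONTOLOGY[cls]
    return ("Extract entities of class %s from the text.\n"
            "Known instances: %s.\n"
            "For each entity report: %s.\n"
            "Text: %s" % (cls, ", ".join(frag["instances"]),
                          ", ".join(frag["properties"]), context))
```

Because the prompt is generated from the ontology rather than written by hand, the same machinery works for any LLM-backed chatbot and any domain for which an ontology fragment exists.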
Linked Data Entity Summarization
On the Web, the amount of structured and Linked Data about entities is constantly growing. Descriptions of single entities often include thousands of statements, and it becomes difficult to comprehend the data unless a selection of the most relevant facts is provided. This doctoral thesis addresses the problem of Linked Data entity summarization. The contributions involve two entity summarization approaches, a common API for entity summarization, and an approach for entity data fusion.
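One common baseline for this task, shown here as a generic sketch rather than the thesis's own algorithms, is to rank an entity's statements by how informative their predicates are, approximated by predicate rarity across the dataset, and keep the top k. The triples below are toy data:

```python
# Generic frequency-based entity summarizer (a baseline sketch, not the
# thesis's approaches): keep the k statements whose predicates are rarest,
# i.e. carry the most self-information.
import math
from collections import Counter

TRIPLES = [  # toy dataset
    ("A", "type", "City"), ("B", "type", "City"), ("C", "type", "City"),
    ("A", "mayor", "M"), ("A", "label", "a"),
]

def summarize(entity, triples, k=2):
    pred_freq = Counter(p for s, p, o in triples)
    total = len(triples)
    facts = [(s, p, o) for s, p, o in triples if s == entity]
    # Rarer predicate => higher self-information => more interesting fact
    facts.sort(key=lambda t: math.log(total / pred_freq[t[1]]), reverse=True)
    return facts[:k]
```

On the toy data, the ubiquitous "type" statement is ranked below the entity-specific "mayor" and "label" facts, which is the intuition behind informativeness-based summaries.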
RDF Data Management for Vessel and Patent Information
The Resource Description Framework (RDF) is widely used to represent information on the Web. Efforts have been made to map RDF data to a relational representation, and this method has been adopted by several systems. RDF is queried using SPARQL, a W3C-recommended standard query language for querying graphs whose data is represented as RDF triples. A set of tools and technologies is implemented and tested using SPARQL, with Apache Jena Fuseki as the RDF triplestore. The increasing size of RDF data calls for efficient storage and querying; a framework covering design, analysis, query optimization, storage and processing is required for efficient retrieval of RDF data. Together, SPARQL and RDF make it easier to merge results from multiple data sources. This thesis provides an overview of a method to manage data based on the existing data of two organizations, Port-MIS and KIPRIS. The RDF model is designed to enable web-based representation and information exchange, and to suggest a promising direction for future research.
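The core operation behind the SPARQL queries the thesis runs against Fuseki, basic-graph-pattern matching, can be illustrated with a tiny in-memory stand-in. The vessel triples below are invented in the spirit of Port-MIS data, not actual Port-MIS records:

```python
# Invented stand-in for the SPARQL-over-Fuseki setup: a few vessel triples
# and a single triple-pattern match, the core of SPARQL basic graph patterns.

TRIPLES = [
    (":vessel1", ":callSign", "HLKM"),
    (":vessel1", ":portOfCall", ":Busan"),
    (":vessel2", ":portOfCall", ":Incheon"),
]

def match(pattern):
    """Match one (s, p, o) pattern against TRIPLES; None is a variable."""
    s, p, o = pattern
    return [t for t in TRIPLES
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]
```

Here match((None, ":portOfCall", ":Busan")) plays the role of the SPARQL pattern `?v :portOfCall :Busan`; a triplestore such as Fuseki evaluates conjunctions of such patterns with joins over shared variables.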
Chapter 1 Introduction
1.1 Background of Research
1.2 Research Objectives
1.3 Organization of Thesis
Chapter 2 Literature Review
2.1 Semantic Web
2.2 Linked Open Data
2.3 RDF and RDF Schema
2.4 SPARQL
2.5 Apache Jena Fuseki as RDF Triplestore
Chapter 3 RDF Schema Design
3.1 RDFS for Vessels
3.1.1 Vessel Information Structure
3.1.2 Vessel Information Structure Based on Port-MIS
3.1.3 RDF Sample Syntax of Port-MIS
3.2 RDFS for Patent Data
3.2.1 Patent Data Structure
3.2.2 Patent Data Structure Based on KIPRIS
3.2.3 RDF Sample Syntax of KIPRIS
Chapter 4 Implementation and Testing
4.1 System Architecture
4.2 Data Processing Structure
4.3 SPARQL Queries
Chapter 5 Conclusion and Further Work
References
Acknowledgement