Opportunistic linked data querying through approximate membership metadata
Between URI dereferencing and the SPARQL protocol lies a largely unexplored axis of possible interfaces to Linked Data, each with its own combination of trade-offs. One of these interfaces is Triple Pattern Fragments, which allows clients to execute SPARQL queries against low-cost servers, at the cost of higher bandwidth. Increasing a client's efficiency means lowering the number of requests, which can be achieved, among other means, through additional metadata in responses. We noted that typical SPARQL query evaluations against Triple Pattern Fragments involve a significant portion of membership subqueries, which check the presence of a specific triple rather than a variable pattern. This paper studies the impact of providing approximate membership functions, i.e., Bloom filters and Golomb-coded sets, as extra metadata. In addition to reducing HTTP requests, such functions allow clients to achieve full result recall earlier when temporarily tolerating lower precision. Half of the tested queries from a WatDiv benchmark test set could be executed with up to a third fewer HTTP requests, at only marginally higher server cost. Query times, however, did not improve, likely due to slower metadata generation and transfer. This indicates that approximate membership functions can partly improve the client-side query process with minimal impact on the server and its interface.
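As a rough illustration of the metadata involved, the membership check of a Bloom filter can be sketched in a few lines of Python; the filter parameters and triple strings here are illustrative, not those used in the paper:

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: answers "possibly present" or "definitely absent"."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = [False] * size_bits

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False positives are possible; false negatives are not.
        return all(self.bits[pos] for pos in self._positions(item))

# A server could attach such a filter to a fragment response; the client then
# skips membership subqueries whose triple the filter rules out.
bf = BloomFilter()
bf.add("<s> <p> <o>")
print(bf.might_contain("<s> <p> <o>"))   # True (no false negatives)
print(bf.might_contain("<x> <y> <z>"))   # very likely False (small false-positive chance)
```

Because false negatives are impossible, a negative answer lets the client drop the request entirely, which is where the saved HTTP round trips come from.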
OntoChatGPT Information System: Ontology-Driven Structured Prompts for ChatGPT Meta-Learning
This research presents a comprehensive methodology for utilizing an
ontology-driven structured prompts system in interplay with ChatGPT, a widely
used large language model (LLM). The study develops formal models, both
information and functional, and establishes the methodological foundations for
integrating ontology-driven prompts with ChatGPT's meta-learning capabilities.
The resulting productive triad comprises the methodological foundations,
advanced information technology, and the OntoChatGPT system, which collectively
enhance the effectiveness and performance of chatbot systems. The
implementation of this technology is demonstrated using the Ukrainian language
within the domain of rehabilitation. By applying the proposed methodology, the
OntoChatGPT system effectively extracts entities from contexts, classifies
them, and generates relevant responses. The study highlights the versatility of
the methodology, emphasizing its applicability not only to ChatGPT but also to
other chatbot systems based on LLMs, such as Google's Bard utilizing the PaLM 2
LLM. The underlying principles of meta-learning, structured prompts, and
ontology-driven information retrieval form the core of the proposed
methodology, enabling their adaptation and utilization in various LLM-based
systems. This versatile approach opens up new possibilities for NLP and
dialogue systems, empowering developers to enhance the performance and
functionality of chatbot systems across different domains and languages.
Comment: 14 pages, 1 figure. Published in the International Journal of Computing, 22(2), 170-183. https://doi.org/10.47839/ijc.22.2.308
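To make the idea of an ontology-driven structured prompt concrete, a minimal sketch follows; the domain classes, relations, and template wording are hypothetical stand-ins, not the actual OntoChatGPT artifacts:

```python
# Illustrative ontology fragment; class and relation names are invented.
ontology = {
    "domain": "rehabilitation",
    "classes": ["Patient", "Exercise", "BodyRegion"],
    "relations": [("Exercise", "targets", "BodyRegion")],
}

def build_structured_prompt(ontology, user_utterance):
    """Embed ontology context into a structured prompt template for an LLM."""
    classes = ", ".join(ontology["classes"])
    relations = "; ".join(f"{s} {p} {o}" for s, p, o in ontology["relations"])
    return (
        f"Domain: {ontology['domain']}\n"
        f"Entity classes: {classes}\n"
        f"Relations: {relations}\n"
        "Task: extract entities from the utterance, classify each into one of "
        "the classes above, then answer.\n"
        f"Utterance: {user_utterance}"
    )

prompt = build_structured_prompt(ontology, "Which exercises help a sprained ankle?")
print(prompt)
```

The point of the structure is that entity extraction and classification are constrained by the ontology's vocabulary rather than left entirely to the model.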
A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model's Accuracy for Question Answering on Enterprise SQL Databases
Enterprise applications of Large Language Models (LLMs) hold promise for
question answering on enterprise SQL databases. However, the extent to which
LLMs can accurately respond to enterprise questions in such databases remains
unclear, given the absence of suitable Text-to-SQL benchmarks tailored to
enterprise settings. Additionally, the potential of Knowledge Graphs (KGs) to
enhance LLM-based question answering by providing business context is not well
understood. This study aims to evaluate the accuracy of LLM-powered question
answering systems in the context of enterprise questions and SQL databases,
while also exploring the role of knowledge graphs in improving accuracy. To
achieve this, we introduce a benchmark comprising an enterprise SQL schema in
the insurance domain, a range of enterprise queries encompassing reporting to
metrics, and a contextual layer incorporating an ontology and mappings that
define a knowledge graph. Our primary finding reveals that question answering
using GPT-4, with zero-shot prompts directly on SQL databases, achieves an
accuracy of 16%. Notably, this accuracy increases to 54% when questions are
posed over a Knowledge Graph representation of the enterprise SQL database.
Therefore, investing in a Knowledge Graph representation provides higher accuracy for LLM-powered question answering systems.
Comment: 34 page
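One plausible intuition behind this gap is that raw SQL schemas use opaque names, while an ontology with mappings exposes business terms an LLM can match against the question. The sketch below illustrates that idea only; every schema, column, and property name here is invented and does not come from the benchmark:

```python
# Hypothetical insurance schema with opaque column names.
sql_schema = {"tbl_pol": ["pol_id", "prem_amt", "eff_dt"]}

# Hypothetical ontology mapping from (table, column) to business-level properties.
ontology_mapping = {
    ("tbl_pol", "pol_id"): "Policy.hasPolicyNumber",
    ("tbl_pol", "prem_amt"): "Policy.hasPremiumAmount",
    ("tbl_pol", "eff_dt"): "Policy.hasEffectiveDate",
}

def business_terms(question_tokens, mapping):
    """Naive lexical match of question words against ontology property names."""
    hits = []
    for (table, column), prop in mapping.items():
        if any(tok.lower() in prop.lower() for tok in question_tokens):
            hits.append((prop, table, column))
    return hits

# "premium" matches the ontology property but not the cryptic column name.
print(business_terms(["premium"], ontology_mapping))
# [('Policy.hasPremiumAmount', 'tbl_pol', 'prem_amt')]
```

A question mentioning "premium" has no lexical foothold in `prem_amt`, but it does in `Policy.hasPremiumAmount`, which is one way the contextual layer can help.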
Chain-of-Knowledge: Grounding Large Language Models via Dynamic Knowledge Adapting over Heterogeneous Sources
We present chain-of-knowledge (CoK), a novel framework that augments large
language models (LLMs) by dynamically incorporating grounding information from
heterogeneous sources. It results in more factual rationales and reduced
hallucination in generation. Specifically, CoK consists of three stages:
reasoning preparation, dynamic knowledge adapting, and answer consolidation.
Given a knowledge-intensive question, CoK first prepares several preliminary
rationales and answers while identifying the relevant knowledge domains. If
there is no majority consensus among the answers from samples, CoK corrects the
rationales step by step by adapting knowledge from the identified domains.
These corrected rationales can plausibly serve as a better foundation for the
final answer consolidation. Unlike prior studies that primarily use
unstructured data, CoK also leverages structured knowledge sources such as
Wikidata and tables that provide more reliable factual information. To access
both unstructured and structured knowledge sources in the dynamic knowledge
adapting stage, we propose an adaptive query generator that allows the
generation of queries for various types of query languages, including SPARQL,
SQL, and natural sentences. Moreover, to minimize error propagation between
rationales, CoK corrects the rationales progressively using preceding corrected
rationales to generate and correct subsequent rationales. Extensive experiments
show that CoK consistently improves the performance of LLMs on
knowledge-intensive tasks across different domains.
Comment: Accepted by ICLR 202
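The three-stage control flow described above can be sketched as follows, with the knowledge-adapting step stubbed out; a real implementation would issue generated SPARQL/SQL queries against sources such as Wikidata rather than apply a passed-in function:

```python
from collections import Counter

def majority_answer(answers):
    """Return the strict-majority answer, or None if no majority exists."""
    winner, count = Counter(answers).most_common(1)[0]
    return winner if count > len(answers) / 2 else None

def chain_of_knowledge(sampled_answers, correct_rationale):
    # Stage 1: reasoning preparation already produced the sampled answers.
    consensus = majority_answer(sampled_answers)
    if consensus is not None:
        return consensus  # Samples agree; skip knowledge adapting.
    # Stage 2: dynamic knowledge adapting (stubbed): correct each rationale,
    # progressively reusing earlier corrections in the real framework.
    corrected = [correct_rationale(a) for a in sampled_answers]
    # Stage 3: answer consolidation over the corrected rationales.
    return majority_answer(corrected) or corrected[0]

print(chain_of_knowledge(["Paris", "Paris", "Lyon"], lambda a: a))      # Paris
print(chain_of_knowledge(["Paris", "Lyon", "Nice"], lambda a: "Paris")) # Paris
```

The second call shows the interesting branch: no majority among the samples, so the (stubbed) correction stage runs before consolidation.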
Digital content management of Heet Sib Sorng custom for semantic search
This research presents the results of transforming knowledge into digital content management through semantic technology development. The scope of this research covers the integration of an existing ontology with two datasets whose resources have different contents, including datasets from the central library in both physical and electronic form, and the development of Semantic Web approaches to the ontology. A research and development methodology is used, and the data were acquired through a literature review. The study shows that the evaluated ontology and applications perform at a high level, and that there is a trend towards integrated systems oriented to digital content management in the Thai custom semantic search system, which ultimately leads to the convergence of linked data and knowledge-based systems.
On Reasoning with RDF Statements about Statements using Singleton Property Triples
The Singleton Property (SP) approach has been proposed for representing and
querying metadata about RDF triples such as provenance, time, location, and
evidence. In this approach, one singleton property is created to uniquely
represent a relationship in a particular context, and in general, generates a
large property hierarchy in the schema. It has become the subject of important
questions from Semantic Web practitioners. Can an existing reasoner recognize
the singleton property triples? And how? If the singleton property triples
describe a data triple, then how can a reasoner infer this data triple from the
singleton property triples? Or would the large property hierarchy affect the
reasoners in some way? We address these questions in this paper and present our
study about the reasoning aspects of the singleton properties. We propose a
simple mechanism to enable existing reasoners to recognize the singleton
property triples, as well as to infer the data triples described by the
singleton property triples. We evaluate the effect of the singleton property
triples in the reasoning processes by comparing the performance on RDF datasets
with and without singleton properties. Our evaluation uses as benchmarks the
LUBM datasets and the LUBM-SP datasets derived from LUBM, with temporal
information added through singleton properties.
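A minimal sketch of the singleton-property pattern and the inference rule in question, using plain Python sets of triples rather than an RDF reasoner; the URIs are abbreviated and illustrative:

```python
SP = "rdf:singletonPropertyOf"

# One singleton property (ex:sp1) uniquely represents ex:hasSpouse in a
# particular context, so metadata can be attached to that one statement.
graph = {
    ("ex:sp1", SP, "ex:hasSpouse"),      # sp1 stands for hasSpouse here
    ("ex:bob", "ex:sp1", "ex:alice"),    # data triple stated via sp1
    ("ex:sp1", "ex:validFrom", "1990"),  # metadata about that statement
}

def infer_data_triples(graph):
    """Apply the rule (s, sp, o) + (sp, singletonPropertyOf, p) => (s, p, o)."""
    singleton_of = {s: o for s, p, o in graph if p == SP}
    inferred = {(s, singleton_of[p], o) for s, p, o in graph if p in singleton_of}
    return graph | inferred

closed = infer_data_triples(graph)
print(("ex:bob", "ex:hasSpouse", "ex:alice") in closed)  # True
```

This is the inference the paper asks existing reasoners to perform: recovering the described data triple from the singleton-property triples.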
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Traditional Chinese Medicine and biodiversity.
Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages with 3 tables
SE-KGE: A Location-Aware Knowledge Graph Embedding Model for Geographic Question Answering and Spatial Semantic Lifting
Learning knowledge graph (KG) embeddings is an emerging technique for a
variety of downstream tasks such as summarization, link prediction, information
retrieval, and question answering. However, most existing KG embedding models
neglect space and, therefore, do not perform well when applied to (geo)spatial
data and tasks. For those models that consider space, most of them primarily
rely on some notions of distance. These models suffer from higher computational
complexity during training while still losing information beyond the relative
distance between entities. In this work, we propose a location-aware KG
embedding model called SE-KGE. It directly encodes spatial information such as
point coordinates or bounding boxes of geographic entities into the KG
embedding space. The resulting model is capable of handling different types of
spatial reasoning. We also construct a geographic knowledge graph as well as a
set of geographic query-answer pairs called DBGeo to evaluate the performance
of SE-KGE in comparison to multiple baselines. Evaluation results show that
SE-KGE outperforms these baselines on the DBGeo dataset for the geographic logic
query answering task. This demonstrates the effectiveness of our
spatially-explicit model and the importance of considering the scale of
different geographic entities. Finally, we introduce a novel downstream task
called spatial semantic lifting which links an arbitrary location in the study
area to entities in the KG via some relations. Evaluation on DBGeo shows that
our model outperforms the baseline by a substantial margin.
Comment: Accepted to Transactions in GI
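Encoding point coordinates into an embedding vector can be sketched with multi-scale sinusoidal features, in the spirit of a location encoder; the wavelength schedule and dimensions below are illustrative assumptions, not SE-KGE's exact parameterization:

```python
import math

def encode_location(x, y, num_scales=4, min_lambda=1.0, max_lambda=360.0):
    """Map a 2D point to a 4*num_scales-dim vector of sin/cos features."""
    features = []
    for s in range(num_scales):
        # Geometric progression of wavelengths from min_lambda to max_lambda,
        # so both fine-grained and coarse spatial scales are captured.
        lam = min_lambda * (max_lambda / min_lambda) ** (s / max(num_scales - 1, 1))
        for coord in (x, y):
            features.append(math.sin(2 * math.pi * coord / lam))
            features.append(math.cos(2 * math.pi * coord / lam))
    return features

vec = encode_location(-73.98, 40.73)  # e.g. a point given as (lon, lat)
print(len(vec))  # 16
```

Unlike purely distance-based models, such a deterministic encoding assigns a vector to any location in the study area, which is what makes a task like spatial semantic lifting possible for arbitrary points.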