CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models
As an indispensable ingredient of intelligence, commonsense reasoning is
crucial for large language models (LLMs) in real-world scenarios. In this
paper, we propose CORECODE, a dataset that contains abundant commonsense
knowledge manually annotated on dyadic dialogues, to evaluate the commonsense
reasoning and commonsense conflict detection capabilities of Chinese LLMs. We
categorize commonsense knowledge in everyday conversations into three
dimensions: entity, event, and social interaction. For easy and consistent
annotation, we standardize the form of commonsense knowledge annotation in
open-domain dialogues as "domain: slot = value". A total of 9 domains and 37
slots are defined to capture diverse commonsense knowledge. With these
pre-defined domains and slots, we collect 76,787 commonsense knowledge
annotations from 19,700 dialogues through crowdsourcing. To evaluate and
enhance the commonsense reasoning capability for LLMs on the curated dataset,
we establish a series of dialogue-level reasoning and detection tasks,
including commonsense knowledge filling, commonsense knowledge generation,
commonsense conflict phrase detection, domain identification, slot
identification, and event causal inference. A wide variety of existing
open-source Chinese LLMs are evaluated with these tasks on our dataset.
Experimental results demonstrate that these models are not competent to predict
CORECODE's plentiful reasoning content, and even ChatGPT could only achieve
0.275 and 0.084 accuracy on the domain identification and slot identification
tasks under the zero-shot setting. We release the data and code of CORECODE at
https://github.com/danshi777/CORECODE to promote commonsense reasoning
evaluation and study of LLMs in the context of daily conversations. Comment: AAAI 202
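The "domain: slot = value" annotation form described in this abstract can be illustrated with a small parser. This is only a sketch under stated assumptions: the `Annotation` type, the `parse_annotation` helper, and the sample domain and slot names are hypothetical and are not taken from the CORECODE release.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    """One commonsense annotation in the form 'domain: slot = value'."""
    domain: str
    slot: str
    value: str


def parse_annotation(text: str) -> Annotation:
    """Split an annotation string on the first ':' and the first '=' after it.

    The exact serialization used by the dataset is an assumption here.
    """
    domain, colon, rest = text.partition(":")
    if not colon:
        raise ValueError(f"missing ':' in annotation: {text!r}")
    slot, equals, value = rest.partition("=")
    if not equals:
        raise ValueError(f"missing '=' in annotation: {text!r}")
    return Annotation(domain.strip(), slot.strip(), value.strip())


# Hypothetical example; 'event' and 'cause' are illustrative names only.
example = parse_annotation("event: cause = it was raining")
```

Splitting on the first delimiter only means that a value which itself contains "=" or ":" is kept intact.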
Reasoning about Social Semantic Web Applications using String Similarity and Frame Logic
Social semantic Web, or Web 3.0, applications have gained major attention from academia and industry in recent times. Such applications try to take advantage of user-supplied metadata, using ideas from the semantic Web initiative, in order to provide better services. An open problem is the formalization of such metadata, due to its complex and often inconsistent nature. A possible solution to these inconsistencies is the use of string similarity metrics, which are explained and analyzed. A study of performance and applicability in a frame logic environment is conducted on the case of agent reasoning about multiple domains in TaOPis, a social semantic Web application for self-organizing communities. Results show that the NYSIIS metric yields surprisingly good results on Croatian words and phrases.
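As a generic illustration of what a string similarity metric computes, here is a minimal sketch of Levenshtein edit distance with a normalized similarity score. It is an assumption that this particular metric is relevant: the abstract names only NYSIIS, which is a phonetic algorithm and works differently. The function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalize the distance to [0, 1], where 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest
```

A score of this kind could be used to treat two inconsistently spelled tags as the same term once it exceeds a chosen threshold.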
Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs
The potential of large language models (LLMs) to reason like humans has been
a highly contested topic in Machine Learning communities. However, the
reasoning abilities of humans are multifaceted and can be seen in various
forms, including analogical, spatial and moral reasoning, among others. This
fact raises the question whether LLMs can perform equally well across all these
different domains. This research work aims to investigate the performance of
LLMs on different reasoning tasks by conducting experiments that directly use
or draw inspirations from existing datasets on analogical and spatial
reasoning. Additionally, to evaluate the ability of LLMs to reason like humans,
their performance is evaluated on more open-ended, natural language questions.
My findings indicate that LLMs excel at analogical and moral reasoning, yet
struggle to perform as proficiently on spatial reasoning tasks. I believe these
experiments are crucial for informing the future development of LLMs,
particularly in contexts that require diverse reasoning proficiencies. By
shedding light on the reasoning abilities of LLMs, this study aims to push
forward our understanding of how they can better emulate the cognitive
abilities of humans.
Multi-agent Confidential Abductive Reasoning
In the context of multi-agent hypothetical reasoning, agents typically have partial knowledge about their environments, and the union of such knowledge is still incomplete to represent the whole world. Thus, given a global query, they collaborate with each other to make correct inferences and hypotheses, whilst maintaining global constraints. Most collaborative reasoning systems operate on the assumption that agents can share or communicate any information they have. However, in application domains like multi-agent systems for healthcare or distributed software agents for security policies in coalition networks, confidentiality of knowledge is an additional primary concern. These agents are required to collaboratively compute consistent answers for a query whilst preserving their own private information. This paper addresses this issue, showing how this dichotomy between "open communication" in collaborative reasoning and protection of confidentiality can be accommodated. We present a general-purpose distributed abductive logic programming system for multi-agent hypothetical reasoning with confidentiality. Specifically, the system computes consistent conditional answers for a query over a set of distributed normal logic programs with possibly unbound domains and arithmetic constraints, preserving the private information within the logic programs. A case study on security policy analysis in distributed coalition networks is described, as an example of the many applications of this system.
Embedding expert systems in semi-formal domains : examining the boundaries of the knowledge base
This thesis examines the use of expert systems in semi-formal domains. The research identifies the main problems with semi-formal domains and proposes and evaluates a number of different solutions to them. The thesis considers the traditional approach to developing expert systems, which sees domains as being formal, and notes that it continuously faces problems that result from informal features of the problem domain. To circumvent these difficulties, experience or other subjective qualities are often used, but they are not supported by the traditional approach to design. The thesis examines the formal approach and compares it with a semi-formal approach to designing expert systems which is heavily influenced by the socio-technical view of information systems. From this basis it examines a number of problems that limit the construction and use of knowledge bases in semi-formal domains. These limitations arise from the nature of the problem being tackled, in particular problems of natural language communication and tacit knowledge, and also from the character of computer technology and the role it plays. The thesis explores the possible mismatch between a human user and the machine and models the various types of confusion that arise. The thesis describes a number of practical solutions to overcome the problems identified. These solutions are implemented in an expert system shell (PESYS), developed as part of the research. The resulting solutions, based on non-linear documents and other software tools that open up the reasoning of the system, support users of expert systems in examining the boundaries of the knowledge base to help them avoid and overcome any confusion that has arisen. In this way users are encouraged to use their own skills and experiences in conjunction with an expert system to successfully exploit this technology in semi-formal domains.
Decidability of ALCP(D) for concrete domains with the EHD-property
Reasoning for description logics with concrete domains and w.r.t. general TBoxes easily becomes undecidable. For particular, restricted concrete domains, decidability can be regained. We introduce a novel way to integrate a concrete domain D into the well-known description logic ALC; we call the resulting logic ALCP(D). We then identify sufficient conditions on D that guarantee decidability of the satisfiability problem, even in the presence of general TBoxes. In particular, we show decidability of ALCP(D) for several domains over the integers, for which decidability was open. More generally, this result holds for all negation-closed concrete domains with the EHD-property, which stands for "the existence of a homomorphism is definable". Such a technique has recently been used to show decidability of CTL with local constraints over the integers.
KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models
Large language models (LLMs) demonstrate remarkable performance on
knowledge-intensive tasks, suggesting that real-world knowledge is encoded in
their model parameters. However, besides explorations on a few probing tasks in
limited knowledge domains, it is not well understood how to evaluate LLMs'
knowledge systematically and how well their knowledge abilities generalize,
across a spectrum of knowledge domains and progressively complex task formats.
To this end, we propose KGQuiz, a knowledge-intensive benchmark to
comprehensively investigate the knowledge generalization abilities of LLMs.
KGQuiz is a scalable framework constructed from triplet-based knowledge, which
covers three knowledge domains and consists of five tasks with increasing
complexity: true-or-false, multiple-choice QA, blank filling, factual editing,
and open-ended knowledge generation. To gain a better understanding of LLMs'
knowledge abilities and their generalization, we evaluate 10 open-source and
black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive
tasks and knowledge domains. Extensive experiments demonstrate that LLMs
achieve impressive performance in straightforward knowledge QA tasks, while
settings and contexts requiring more complex reasoning or employing
domain-specific facts still present significant challenges. We envision KGQuiz
as a testbed to analyze such nuanced variations in performance across domains
and task formats, and ultimately to understand, evaluate, and improve LLMs'
knowledge abilities across a wide spectrum of knowledge domains and tasks.
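To make the idea of building tasks from triplet-based knowledge concrete, here is a minimal sketch that turns one (head, relation, tail) triple into a blank-filling item and a true-or-false item. The `Triple` type, the helper names, and the question templates are hypothetical; the abstract does not describe how KGQuiz actually constructs its items.

```python
from typing import NamedTuple, Optional, Tuple


class Triple(NamedTuple):
    """A knowledge triple: head entity, relation phrase, tail entity."""
    head: str
    relation: str
    tail: str


def blank_filling(t: Triple) -> Tuple[str, str]:
    """Mask the tail entity; return (prompt, expected answer)."""
    return f"{t.head} {t.relation} ____.", t.tail


def true_or_false(t: Triple, distractor: Optional[str] = None) -> Tuple[str, bool]:
    """State the triple, optionally with the tail swapped for a distractor.

    The label is True only when the stated tail equals the real tail.
    """
    stated_tail = t.tail if distractor is None else distractor
    prompt = f"True or false: {t.head} {t.relation} {stated_tail}."
    return prompt, stated_tail == t.tail


fact = Triple("Paris", "is the capital of", "France")
cloze_prompt, cloze_answer = blank_filling(fact)
false_prompt, false_label = true_or_false(fact, distractor="Italy")
```

The same triple feeds both formats, which is one way a benchmark can hold the underlying fact fixed while varying the difficulty of the task.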