
    CORECODE: A Common Sense Annotated Dialogue Dataset with Benchmark Tasks for Chinese Large Language Models

    As an indispensable ingredient of intelligence, commonsense reasoning is crucial for large language models (LLMs) in real-world scenarios. In this paper, we propose CORECODE, a dataset that contains abundant commonsense knowledge manually annotated on dyadic dialogues, to evaluate the commonsense reasoning and commonsense conflict detection capabilities of Chinese LLMs. We categorize commonsense knowledge in everyday conversations into three dimensions: entity, event, and social interaction. For easy and consistent annotation, we standardize the form of commonsense knowledge annotation in open-domain dialogues as "domain: slot = value". A total of 9 domains and 37 slots are defined to capture diverse commonsense knowledge. With these pre-defined domains and slots, we collect 76,787 commonsense knowledge annotations from 19,700 dialogues through crowdsourcing. To evaluate and enhance the commonsense reasoning capability of LLMs on the curated dataset, we establish a series of dialogue-level reasoning and detection tasks, including commonsense knowledge filling, commonsense knowledge generation, commonsense conflict phrase detection, domain identification, slot identification, and event causal inference. A wide variety of existing open-source Chinese LLMs are evaluated on these tasks with our dataset. Experimental results demonstrate that these models struggle to predict CORECODE's plentiful reasoning content; even ChatGPT achieves only 0.275 and 0.084 accuracy on the domain identification and slot identification tasks under the zero-shot setting. We release the data and code of CORECODE at https://github.com/danshi777/CORECODE to promote commonsense reasoning evaluation and the study of LLMs in the context of daily conversations. Comment: AAAI 202
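
    To make the "domain: slot = value" scheme concrete, here is a minimal sketch of how such an annotation record might be represented in code. The field names and example values are hypothetical illustrations, not drawn from the released dataset.

```python
# A minimal sketch of a CORECODE-style "domain: slot = value" annotation.
# Field names and example values are hypothetical, not taken from the
# released dataset.
from dataclasses import dataclass

@dataclass
class CommonsenseAnnotation:
    domain: str        # one of the 9 pre-defined domains
    slot: str          # one of the 37 pre-defined slots
    value: str         # the annotated commonsense knowledge
    utterance_id: int  # which dialogue turn the annotation attaches to

# Example: one annotation attached to a turn of a dyadic dialogue.
ann = CommonsenseAnnotation(
    domain="event",
    slot="cause",
    value="heavy rain delayed the bus",
    utterance_id=3,
)
print(f"{ann.domain}: {ann.slot} = {ann.value}")
```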

    Reasoning about Social Semantic Web Applications using String Similarity and Frame Logic

    Social semantic Web, or Web 3.0, applications have gained major attention from academia and industry in recent times. Such applications try to take advantage of user-supplied metadata, using ideas from the semantic Web initiative, in order to provide better services. An open problem is the formalization of such metadata, due to its complex and often inconsistent nature. A possible solution to these inconsistencies is string similarity metrics, which are explained and analyzed. A study of their performance and applicability in a frame logic environment is conducted on the case of agent reasoning about multiple domains in TaOPis, a social semantic Web application for self-organizing communities. Results show that the NYSIIS metric yields surprisingly good results on Croatian words and phrases.
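
    For readers unfamiliar with NYSIIS: it is a phonetic algorithm that maps similar-sounding strings to the same code, so differently spelled variants can be matched. A minimal sketch using the third-party jellyfish library (an illustration of the metric only, not the tooling or implementation used in the paper):

```python
# Minimal sketch: matching string variants by their NYSIIS phonetic codes.
# Uses the third-party `jellyfish` library (pip install jellyfish).
import jellyfish

def same_nysiis_code(a: str, b: str) -> bool:
    """Treat two strings as equivalent if their NYSIIS codes match."""
    return jellyfish.nysiis(a) == jellyfish.nysiis(b)

pairs = [("Smith", "Smyth"), ("Catherine", "Kathryn"), ("node", "edge")]
for a, b in pairs:
    print(a, b, "->", jellyfish.nysiis(a), jellyfish.nysiis(b),
          "match" if same_nysiis_code(a, b) else "no match")
```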

    Are LLMs the Master of All Trades? : Exploring Domain-Agnostic Reasoning Skills of LLMs

    The potential of large language models (LLMs) to reason like humans has been a highly contested topic in the Machine Learning community. However, human reasoning is multifaceted and takes various forms, including analogical, spatial, and moral reasoning, among others. This raises the question of whether LLMs can perform equally well across all of these domains. This work investigates the performance of LLMs on different reasoning tasks by conducting experiments that directly use, or draw inspiration from, existing datasets on analogical and spatial reasoning. Additionally, to evaluate the ability of LLMs to reason like humans, their performance is evaluated on more open-ended, natural language questions. My findings indicate that LLMs excel at analogical and moral reasoning, yet struggle to perform as proficiently on spatial reasoning tasks. I believe these experiments are crucial for informing the future development of LLMs, particularly in contexts that require diverse reasoning proficiencies. By shedding light on the reasoning abilities of LLMs, this study aims to advance our understanding of how they can better emulate the cognitive abilities of humans.
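
    As a sketch of the kind of evaluation loop such experiments rely on, the snippet below poses four-term analogy questions (a : b :: c : ?) and scores exact-match accuracy. The items are hypothetical stand-ins in the style of analogy datasets, and `query_llm` is a placeholder for whatever model API an experiment actually calls.

```python
# Minimal analogy-evaluation harness. `query_llm` and the items below are
# hypothetical illustrations, not the paper's actual setup.
from typing import Callable

ANALOGY_ITEMS = [
    {"prompt": "hand is to glove as foot is to", "answer": "sock"},
    {"prompt": "hot is to cold as day is to", "answer": "night"},
]

def evaluate_analogies(query_llm: Callable[[str], str]) -> float:
    """Return exact-match accuracy of the model on the analogy items."""
    correct = 0
    for item in ANALOGY_ITEMS:
        prediction = query_llm(item["prompt"]).strip().lower()
        correct += prediction == item["answer"]
    return correct / len(ANALOGY_ITEMS)

# Usage with a trivial placeholder model:
print(evaluate_analogies(lambda prompt: "night"))  # -> 0.5
```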

    Multi-agent Confidential Abductive Reasoning

    In the context of multi-agent hypothetical reasoning, agents typically have partial knowledge about their environments, and the union of such knowledge is still incomplete as a representation of the whole world. Thus, given a global query, they collaborate with each other to make correct inferences and hypotheses whilst maintaining global constraints. Most collaborative reasoning systems operate on the assumption that agents can share or communicate any information they have. However, in application domains like multi-agent systems for healthcare or distributed software agents for security policies in coalition networks, confidentiality of knowledge is an additional primary concern. These agents are required to collaboratively compute consistent answers to a query whilst preserving their own private information. This paper addresses this issue, showing how the dichotomy between "open communication" in collaborative reasoning and protection of confidentiality can be accommodated. We present a general-purpose distributed abductive logic programming system for multi-agent hypothetical reasoning with confidentiality. Specifically, the system computes consistent conditional answers for a query over a set of distributed normal logic programs with possibly unbound domains and arithmetic constraints, preserving the private information within the logic programs. A case study on security policy analysis in distributed coalition networks is described as an example of the many applications of this system.
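
    For background, abduction infers hypotheses that, together with a program's rules, would explain an observation. Below is a minimal single-agent, propositional sketch of that idea; the paper's system is distributed, confidentiality-preserving, and far richer, and none of that machinery is shown here.

```python
# Minimal propositional abduction: find which abducible hypotheses, added
# to the rules, derive the observation. Illustrates the general idea only.
from itertools import combinations

RULES = {            # head: set of body atoms (all must hold)
    "alarm": {"intrusion"},
    "alert_logged": {"power_fault"},
}
ABDUCIBLES = ["intrusion", "power_fault"]

def derives(facts: set, goal: str) -> bool:
    """Forward-chain the rules from `facts`; test whether goal holds."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in RULES.items():
            if head not in known and body <= known:
                known.add(head)
                changed = True
    return goal in known

def abduce(goal: str):
    """Return minimal sets of abducibles that explain the goal."""
    for r in range(len(ABDUCIBLES) + 1):
        found = [set(c) for c in combinations(ABDUCIBLES, r)
                 if derives(set(c), goal)]
        if found:
            return found
    return []

print(abduce("alarm"))  # -> [{'intrusion'}]
```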

    Embedding expert systems in semi-formal domains : examining the boundaries of the knowledge base

    This thesis examines the use of expert systems in semi-formal domains. The research identifies the main problems with semi-formal domains and proposes and evaluates a number of different solutions to them. The thesis considers the traditional approach to developing expert systems, which sees domains as formal, and notes that it continually faces problems that result from informal features of the problem domain. To circumvent these difficulties, experience or other subjective qualities are often used, but they are not supported by the traditional approach to design. The thesis examines the formal approach and compares it with a semi-formal approach to designing expert systems which is heavily influenced by the socio-technical view of information systems. From this basis it examines a number of problems that limit the construction and use of knowledge bases in semi-formal domains. These limitations arise from the nature of the problem being tackled, in particular problems of natural language communication and tacit knowledge, and also from the character of computer technology and the role it plays. The thesis explores the possible mismatch between a human user and the machine and models the various types of confusion that arise. The thesis describes a number of practical solutions to overcome the problems identified. These solutions are implemented in an expert system shell (PESYS), developed as part of the research. The resulting solutions, based on non-linear documents and other software tools that open up the reasoning of the system, support users of expert systems in examining the boundaries of the knowledge base, helping them avoid and overcome any confusion that has arisen. In this way users are encouraged to use their own skills and experience in conjunction with an expert system to successfully exploit this technology in semi-formal domains.

    Decidability of ALCP(D) for concrete domains with the EHD-property

    Reasoning in Description Logics with concrete domains and with respect to general TBoxes easily becomes undecidable. For particular, restricted concrete domains, decidability can be regained. We introduce a novel way to integrate a concrete domain D into the well-known description logic ALC, and call the resulting logic ALCP(D). We then identify sufficient conditions on D that guarantee decidability of the satisfiability problem, even in the presence of general TBoxes. In particular, we show decidability of ALCP(D) for several domains over the integers for which decidability was previously open. More generally, this result holds for all negation-closed concrete domains with the EHD-property, which stands for "the existence of a homomorphism is definable". This technique has recently been used to show decidability of CTL with local constraints over the integers.
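
    To give a flavour of what such a combination expresses, here is a schematic TBox axiom mixing ALC constructors with an integer constraint from the concrete domain. The syntax is illustrative only; the precise definition of ALCP(D) is given in the paper, and the role and feature names below are invented for the example.

```latex
% Illustrative only: an ALC-style axiom with a concrete-domain constraint
% over the integers (a counter that strictly increases along a role).
\[
  \mathsf{Counter} \;\sqsubseteq\;
    \exists \mathsf{next}.\mathsf{Counter}
    \;\sqcap\; \bigl[\, \mathsf{val}(\mathsf{next}) > \mathsf{val} \,\bigr]
\]
```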

    KGQuiz: Evaluating the Generalization of Encoded Knowledge in Large Language Models

    Large language models (LLMs) demonstrate remarkable performance on knowledge-intensive tasks, suggesting that real-world knowledge is encoded in their model parameters. However, beyond explorations of a few probing tasks in limited knowledge domains, it is not well understood how to evaluate LLMs' knowledge systematically, or how well their knowledge abilities generalize across a spectrum of knowledge domains and progressively more complex task formats. To this end, we propose KGQuiz, a knowledge-intensive benchmark to comprehensively investigate the knowledge generalization abilities of LLMs. KGQuiz is a scalable framework constructed from triplet-based knowledge, which covers three knowledge domains and consists of five tasks of increasing complexity: true-or-false, multiple-choice QA, blank filling, factual editing, and open-ended knowledge generation. To gain a better understanding of LLMs' knowledge abilities and their generalization, we evaluate 10 open-source and black-box LLMs on the KGQuiz benchmark across the five knowledge-intensive tasks and knowledge domains. Extensive experiments demonstrate that LLMs achieve impressive performance on straightforward knowledge QA tasks, while settings and contexts requiring more complex reasoning or domain-specific facts still present significant challenges. We envision KGQuiz as a testbed to analyze such nuanced variations in performance across domains and task formats, and ultimately to understand, evaluate, and improve LLMs' knowledge abilities across a wide spectrum of knowledge domains and tasks.
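
    To illustrate the general idea of triplet-based task construction, the sketch below turns a (head, relation, tail) knowledge triple into quiz items of increasing difficulty. The templates, fields, and distractors are hypothetical and do not reproduce the benchmark's actual construction pipeline.

```python
# Minimal sketch in the spirit of triplet-based task construction: derive
# quiz items from one (head, relation, tail) triple. All templates and
# example data are hypothetical.
import random

TRIPLE = ("Marie Curie", "field of work", "physics")
DISTRACTORS = ["botany", "law", "architecture"]

def true_or_false(triple, corrupt=False):
    head, rel, tail = triple
    if corrupt:  # swap in a wrong tail to create a false statement
        tail = random.choice(DISTRACTORS)
    return {"statement": f"{head}'s {rel} is {tail}.", "label": not corrupt}

def multiple_choice(triple):
    head, rel, tail = triple
    options = DISTRACTORS + [tail]
    random.shuffle(options)
    return {"question": f"What is {head}'s {rel}?",
            "options": options, "answer": tail}

def blank_filling(triple):
    head, rel, tail = triple
    return {"prompt": f"{head}'s {rel} is ____.", "answer": tail}

print(true_or_false(TRIPLE), multiple_choice(TRIPLE),
      blank_filling(TRIPLE), sep="\n")
```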