30 research outputs found
Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach
Large Language Models (LLMs) have not only exhibited exceptional performance
across various tasks, but also demonstrated sparks of intelligence. Recent
studies have focused on assessing their capabilities on human exams and
revealed their impressive competence in different domains. However, cognitive
research on the overall knowledge structure of LLMs is still lacking. In this
paper, based on the educational diagnostic assessment method, we conduct an
evaluation using MoocRadar, a meticulously annotated human test dataset based
on Bloom's Taxonomy. We aim to reveal the knowledge structures of LLMs and gain
insights into their cognitive capabilities. This research emphasizes the
significance of investigating LLMs' knowledge and understanding their disparate
cognitive patterns. By shedding light on models' knowledge, researchers can
advance the development and utilization of LLMs in a more informed and
effective manner. Comment: Findings of EMNLP 2023 (Short Paper)
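The diagnostic idea above can be sketched concretely: if each test item carries a Bloom's Taxonomy level, a model's knowledge structure can be summarized as its accuracy per cognitive level. This is a minimal illustration of the assessment approach, not the paper's actual evaluation code; the item data are invented.

```python
from collections import defaultdict

def bloom_profile(items):
    """items: list of (bloom_level, correct) pairs for one model.
    Returns per-level accuracy, i.e., a coarse cognitive profile."""
    totals, hits = defaultdict(int), defaultdict(int)
    for level, correct in items:
        totals[level] += 1
        hits[level] += int(correct)
    return {level: hits[level] / totals[level] for level in totals}

# Toy graded responses for one model on a Bloom-annotated test
results = [("remember", True), ("remember", True),
           ("apply", True), ("apply", False),
           ("analyze", False)]
print(bloom_profile(results))  # {'remember': 1.0, 'apply': 0.5, 'analyze': 0.0}
```

Comparing such profiles across models is what reveals disparate cognitive patterns: two models with equal overall accuracy can differ sharply at higher Bloom levels.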
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models
To mitigate the potential misuse of large language models (LLMs), recent
research has developed watermarking algorithms, which restrict the generation
process to leave an invisible trace for watermark detection. Due to the
two-stage nature of the task, most studies evaluate generation and detection
separately, which makes unbiased, thorough, and applicable evaluation
challenging. In this paper, we introduce WaterBench, the first
comprehensive benchmark for LLM watermarks, in which we design three crucial
factors: (1) For \textbf{benchmarking procedure}, to ensure an apples-to-apples
comparison, we first adjust each watermarking method's hyper-parameter to reach
the same watermarking strength, then jointly evaluate their generation and
detection performance. (2) For \textbf{task selection}, we diversify the input
and output length to form a five-category taxonomy covering a range of tasks. (3) For
\textbf{evaluation metric}, we adopt the GPT4-Judge for automatically
evaluating the decline of instruction-following abilities after watermarking.
We evaluate open-source watermarks on LLMs under different watermarking
strengths and observe that current methods commonly struggle to maintain
generation quality. The code and data are available at
\url{https://github.com/THU-KEG/WaterBench}. Comment: 22 pages, 7 figures
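The benchmarking procedure described in point (1) can be sketched as follows: before comparing two watermarks, tune each one's hyper-parameter until both reach the same watermarking strength, then evaluate at that shared operating point. This is a hedged toy sketch, not the WaterBench implementation; the strength curve here is an invented monotone stand-in for a real detector's true-positive rate.

```python
def calibrate(strength_fn, target, lo=0.0, hi=10.0, iters=40):
    """Bisect on a watermark hyper-parameter h so that
    strength_fn(h) ~= target. Assumes strength rises monotonically in h."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if strength_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy monotone strength curve standing in for a real watermark detector
toy_strength = lambda h: h / (1 + h)

# Calibrate to a common strength of 0.75, then both generation quality and
# detection performance would be measured at h_star (an apples-to-apples point)
h_star = calibrate(toy_strength, target=0.75)
print(round(toy_strength(h_star), 3))  # 0.75
```

The point of the calibration step is fairness: a watermark tuned to be strong is easy to detect but degrades text more, so comparing methods at mismatched strengths conflates the two effects.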
KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding
Deep text understanding, which requires the connections between a given
document and prior knowledge beyond its text, has been highlighted by many
benchmarks in recent years. However, these benchmarks have encountered two
major limitations. On the one hand, most of them require human annotation of
knowledge, which leads to limited knowledge coverage. On the other hand, they
usually use choices or spans in the texts as the answers, which results in
narrow answer space. To overcome these limitations, we build a new challenging
benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has
two advantages, i.e., broad knowledge coverage and flexible answer format.
Specifically, we utilize massive knowledge bases to guide annotators or large
language models (LLMs) to construct knowledgeable questions. Moreover, we use
labels in knowledge bases rather than spans or choices as the final answers. We
test state-of-the-art models on KoRC and the experimental results show that the
strongest baseline achieves only 68.3% and 30.0% F1 on the
in-distribution and out-of-distribution test sets, respectively. These results
indicate that deep text understanding is still an unsolved challenge. The
benchmark dataset, leaderboard, and baseline methods are released in
https://github.com/THU-KEG/KoRC
ConstGCN: Constrained Transmission-based Graph Convolutional Networks for Document-level Relation Extraction
Document-level relation extraction with graph neural networks faces a
fundamental graph construction gap between training and inference: the gold
graph structure is only available during training, which forces most methods
to adopt heuristic or syntactic rules to construct a prior graph as a pseudo
proxy. In this paper, we propose ConstGCN, a novel graph
convolutional network which performs knowledge-based information propagation
between entities along with all specific relation spaces without any prior
graph construction. Specifically, it updates the entity representation by
aggregating information from all other entities along with each relation space,
thus modeling the relation-aware spatial information. To control the
information flow passing through the indeterminate relation spaces, we propose
to constrain the propagation using transmitting scores learned from the Noise
Contrastive Estimation between fact triples. Experimental results show that our
method outperforms the previous state-of-the-art (SOTA) approaches on the DocRE
dataset.
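The propagation described above can be sketched in miniature: each entity representation is updated by aggregating every other entity within each relation space, with the flow gated by a transmitting score in [0, 1]. This is an illustrative simplification under invented data, not the authors' model; in the paper the scores are learned via Noise Contrastive Estimation, while here they are fixed constants.

```python
def propagate(H, T):
    """One constrained propagation step.
    H: dict entity -> feature vector (list of floats).
    T: dict (relation, src, dst) -> transmitting score in [0, 1]."""
    entities = list(H)
    relations = {r for (r, _, _) in T}
    new_H = {}
    for i in entities:
        agg = [0.0] * len(H[i])
        for r in relations:                      # one pass per relation space
            for j in entities:
                if j == i:
                    continue
                w = T.get((r, j, i), 0.0)        # gate on flow j -> i under r
                agg = [a + w * x for a, x in zip(agg, H[j])]
        # residual update keeps the entity's own features
        new_H[i] = [h + a for h, a in zip(H[i], agg)]
    return new_H

H = {"e1": [1.0, 0.0], "e2": [0.0, 1.0]}
T = {("born_in", "e2", "e1"): 0.5}               # hypothetical relation space
print(propagate(H, T))                           # e1 absorbs half of e2's features
```

Because the gates default to 0.0, no prior graph is needed: an entity pair simply transmits nothing under a relation space until the learned score says otherwise.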
Interactive Contrastive Learning for Self-supervised Entity Alignment
Self-supervised entity alignment (EA) aims to link equivalent entities across
different knowledge graphs (KGs) without seed alignments. The current SOTA
self-supervised EA method draws inspiration from contrastive learning,
originally designed in computer vision based on instance discrimination and
contrastive loss, and suffers from two shortcomings. Firstly, it puts
unidirectional emphasis on pushing sampled negative entities far away rather
than pulling positively aligned pairs close, as is done in the well-established
supervised EA. Secondly, KGs contain rich side information (e.g., entity
descriptions), and how to effectively leverage that information has not been
adequately investigated in self-supervised EA. In this paper, we propose an
interactive contrastive learning model for self-supervised EA. The model
encodes not only structures and semantics of entities (including entity name,
entity description, and entity neighborhood), but also conducts cross-KG
contrastive learning by building pseudo-aligned entity pairs. Experimental
results show that our approach outperforms previous best self-supervised
results by a large margin (over 9% average improvement) and performs on par
with previous SOTA supervised counterparts, demonstrating the effectiveness of
the interactive contrastive learning for self-supervised EA. Comment: Accepted by CIKM 2022
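The bidirectional objective described above can be sketched with an InfoNCE-style loss: unlike a one-sided loss that only pushes sampled negatives away, it also pulls each pseudo-aligned pair together, applied in both cross-KG directions. This is a generic contrastive-learning sketch on toy vectors, not the paper's model; the embeddings and negatives are invented, and similarity is a plain dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(sim(a,p)/tau) / (exp(sim(a,p)/tau) + sum_n exp(sim(a,n)/tau)) )"""
    logits = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    m = max(logits)                              # log-sum-exp stabilization
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

def bidirectional_loss(x, y, negs_x, negs_y, tau=0.1):
    # x -> y direction plus y -> x direction, as in cross-KG contrast
    return info_nce(x, y, negs_y, tau) + info_nce(y, x, negs_x, tau)

x, y = [1.0, 0.0], [0.9, 0.1]                    # a pseudo-aligned entity pair
negs = [[0.0, 1.0], [-1.0, 0.0]]                 # sampled negative entities
print(round(bidirectional_loss(x, y, negs, negs), 4))
```

Minimizing this loss simultaneously pulls the aligned pair close and pushes negatives away, which is exactly the two-sided behavior the abstract contrasts with the prior one-sided objective.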
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction
The robustness to distribution changes ensures that NLP models can be
successfully applied in the realistic world, especially for information
extraction tasks. However, most prior evaluation benchmarks have been devoted
to validating pairwise matching correctness, ignoring the crucial measurement
of robustness. In this paper, we present the first benchmark that simulates the
evaluation of open information extraction models in the real world, where the
syntactic and expressive distributions under the same knowledge meaning may
drift in various ways. We design and annotate a large-scale testbed in which each
example is a knowledge-invariant clique that consists of sentences with
structured knowledge of the same meaning but with different syntactic and
expressive forms. We further elaborate a robustness metric: a model is judged
robust only if its performance is consistently accurate across all sentences in
a clique. We perform experiments on typical models published in the last decade
as well as a popular large language model; the results show that existing
successful models exhibit a frustrating degradation, with a maximum drop of
23.43 F1 points. Our resources and code are available at
https://github.com/qijimrc/ROBUST. Comment: Accepted by EMNLP 2023 Main Conference
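The clique-based judgment described above can be made concrete: a model is credited on a knowledge-invariant clique only if it is correct on every paraphrase in the clique, so robust accuracy lower-bounds ordinary per-sentence accuracy. This is an illustrative sketch of that metric idea on invented correctness data, not the benchmark's scoring code.

```python
def robust_accuracy(cliques):
    """cliques: list of cliques, each a list of per-sentence correctness
    booleans for syntactic/expressive variants of the same knowledge."""
    return sum(all(c) for c in cliques) / len(cliques)

def plain_accuracy(cliques):
    """Ordinary accuracy over all sentences, ignoring clique structure."""
    flat = [ok for clique in cliques for ok in clique]
    return sum(flat) / len(flat)

# Three cliques of variants; the model fails one variant in the second clique
cliques = [[True, True], [True, False, True], [True]]
print(round(plain_accuracy(cliques), 3))   # 0.833
print(round(robust_accuracy(cliques), 3))  # 0.667
```

The gap between the two numbers is the degradation the abstract reports: a model can look accurate sentence-by-sentence yet fail to preserve its predictions under meaning-invariant rephrasing.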
VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering
We present Visual Knowledge oriented Programming platform (VisKoP), a
knowledge base question answering (KBQA) system that integrates humans into the
loop to edit and debug the knowledge base (KB) queries. VisKoP not only
provides a neural program induction module, which converts natural language
questions into knowledge oriented program language (KoPL), but also maps KoPL
programs into graphical elements. KoPL programs can be edited with simple
graphical operators, such as dragging to add knowledge operators and slot
filling to designate operator arguments. Moreover, VisKoP provides
auto-completion for its knowledge base schema and users can easily debug the
KoPL program by checking its intermediate results. To facilitate the practical
KBQA on a million-entity-level KB, we design a highly efficient KoPL execution
engine for the back-end. Experimental results show that VisKoP is highly
efficient and that user interaction can fix a large portion of wrong KoPL
programs to obtain the correct answer. The VisKoP online demo
https://demoviskop.xlore.cn (stable release of this paper) and
https://viskop.xlore.cn (beta release with new features), the highly efficient
KoPL engine https://pypi.org/project/kopl-engine, and a screencast video
https://youtu.be/zAbJtxFPTXo are now publicly available.
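To make the program-induction idea tangible, here is a hypothetical miniature of what executing a KoPL-style program looks like: a linear sequence of knowledge operators, each consuming the previous operator's output. The operator set, knowledge base, and semantics here are invented for illustration; the real engine is the kopl-engine package linked above.

```python
# Tiny invented knowledge base: entity -> attributes/relations
KB = {
    "France": {"type": "country", "capital": "Paris"},
    "Japan": {"type": "country", "capital": "Tokyo"},
    "Paris": {"type": "city"},
}

# Two toy knowledge operators; real KoPL has a much richer operator inventory
OPS = {
    "Find": lambda _, name: [name] if name in KB else [],
    "Relate": lambda ents, rel: [KB[e][rel] for e in ents if rel in KB.get(e, {})],
}

def run(program):
    """program: list of (operator, argument) pairs, executed left to right,
    threading each operator's output into the next as its input."""
    state = None
    for op, arg in program:
        state = OPS[op](state, arg)
    return state

# "What is the capital of France?" as a two-operator program
print(run([("Find", "France"), ("Relate", "capital")]))  # ['Paris']
```

Because each operator's output is an explicit intermediate value, a user can inspect it step by step, which is what makes the graphical debugging described above possible.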
MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs
Student modeling, the task of inferring a student's learning characteristics
through their interactions with coursework, is a fundamental issue in
intelligent education. Although the recent attempts from knowledge tracing and
cognitive diagnosis propose several promising directions for improving the
usability and effectiveness of current models, the existing public datasets are
still insufficient to meet the needs of these potential solutions because they
lack complete exercise contexts, fine-grained concepts, and cognitive
labels. In this paper, we present MoocRadar, a fine-grained, multi-aspect
knowledge repository consisting of 2,513 exercise questions, 5,600 knowledge
concepts, and over 12 million behavioral records. Specifically, we propose a
framework to guarantee a high-quality and comprehensive annotation of
fine-grained concepts and cognitive labels. The statistical and experimental
results indicate that our dataset provides a basis for future improvements of
existing methods. Moreover, to support convenient use by researchers, we
release a set of tools for data querying, model adaptation, and extension of
our repository, which are now available at
https://github.com/THU-KEG/MOOC-Radar. Comment: Accepted by SIGIR 2023
CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
In this paper, we present CharacterGLM, a series of models built upon
ChatGLM, with model sizes ranging from 6B to 66B parameters. Our CharacterGLM
is designed for generating Character-based Dialogues (CharacterDial), which
aims to equip a conversational AI system with character customization for
satisfying people's inherent social desires and emotional needs. On top of
CharacterGLM, we can customize various AI characters or social agents by
configuring their attributes (identities, interests, viewpoints, experiences,
achievements, social relationships, etc.) and behaviors (linguistic features,
emotional expressions, interaction patterns, etc.). Our model outperforms most
mainstream closed-source large language models, including the GPT series,
especially in terms of consistency, human-likeness, and engagement, according
to manual evaluations. We will release the 6B version of CharacterGLM and a
subset of training data to facilitate further research on character-based
dialogue generation. Comment: Work in progress
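The attribute-and-behavior configuration described above can be illustrated with a purely hypothetical sketch (not CharacterGLM's actual interface): a character is specified as structured attributes and behaviors, then rendered into a system prompt for a conversational model. All field names and the prompt template here are invented.

```python
def character_prompt(name, attributes, behaviors):
    """Render a structured character spec into a flat system prompt.
    attributes: identity, interests, viewpoints, etc.
    behaviors: linguistic features, emotional expressions, etc."""
    attr_text = "; ".join(f"{k}: {v}" for k, v in attributes.items())
    behav_text = "; ".join(f"{k}: {v}" for k, v in behaviors.items())
    return (f"You are {name}. Attributes -- {attr_text}. "
            f"Behaviors -- {behav_text}. Stay in character.")

prompt = character_prompt(
    "Li Hua",
    {"identity": "university student", "interest": "astronomy"},
    {"linguistic features": "casual, short sentences"},
)
print(prompt)
```

Separating attributes (who the character is) from behaviors (how the character talks) mirrors the customization axes the abstract lists, and keeps each axis independently editable.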