30 research outputs found
Exploring the Cognitive Knowledge Structure of Large Language Models: An Educational Diagnostic Assessment Approach
Large Language Models (LLMs) have not only exhibited exceptional performance
across various tasks, but also demonstrated sparks of intelligence. Recent
studies have focused on assessing their capabilities on human exams and
revealed their impressive competence in different domains. However, cognitive
research on the overall knowledge structure of LLMs is still lacking. In this
paper, based on the educational diagnostic assessment method, we conduct an
evaluation using MoocRadar, a meticulously annotated human test dataset based
on Bloom's Taxonomy. We aim to reveal the knowledge structures of LLMs and gain
insights into their cognitive capabilities. This research emphasizes the
significance of investigating LLMs' knowledge and understanding their disparate
cognitive patterns. By shedding light on models' knowledge, researchers can
advance the development and utilization of LLMs in a more informed and
effective manner. Comment: Findings of EMNLP 2023 (Short Paper)
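The diagnostic idea above can be sketched concretely: if each test item carries a Bloom's Taxonomy level, a model's knowledge structure can be summarized as its accuracy per cognitive level. This is a minimal illustration of the assessment approach, not the paper's actual evaluation code; the item data are invented.

```python
from collections import defaultdict

def bloom_profile(items):
    """items: list of (bloom_level, correct) pairs for one model.
    Returns per-level accuracy, i.e., a coarse cognitive profile."""
    totals, hits = defaultdict(int), defaultdict(int)
    for level, correct in items:
        totals[level] += 1
        hits[level] += int(correct)
    return {level: hits[level] / totals[level] for level in totals}

# Toy graded responses for one model on a Bloom-annotated test
results = [("remember", True), ("remember", True),
           ("apply", True), ("apply", False),
           ("analyze", False)]
print(bloom_profile(results))  # {'remember': 1.0, 'apply': 0.5, 'analyze': 0.0}
```

Comparing such profiles across models is what reveals disparate cognitive patterns: two models with equal overall accuracy can differ sharply at higher Bloom levels.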
WaterBench: Towards Holistic Evaluation of Watermarks for Large Language Models
To mitigate the potential misuse of large language models (LLMs), recent
research has developed watermarking algorithms, which restrict the generation
process to leave an invisible trace for watermark detection. Due to the
two-stage nature of the task, most studies evaluate generation and detection
separately, which makes unbiased, thorough, and applicable evaluation
challenging. In this paper, we introduce WaterBench, the first
comprehensive benchmark for LLM watermarks, in which we design three crucial
factors: (1) For \textbf{benchmarking procedure}, to ensure an apples-to-apples
comparison, we first adjust each watermarking method's hyper-parameter to reach
the same watermarking strength, then jointly evaluate their generation and
detection performance. (2) For \textbf{task selection}, we diversify the input
and output length to form a five-category taxonomy covering a range of tasks. (3) For
\textbf{evaluation metric}, we adopt the GPT4-Judge for automatically
evaluating the decline of instruction-following abilities after watermarking.
We evaluate open-source watermarks on LLMs under different watermarking
strengths and observe that current methods commonly struggle to maintain
generation quality. The code and data are available at
\url{https://github.com/THU-KEG/WaterBench}. Comment: 22 pages, 7 figures
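The benchmarking procedure described in point (1) can be sketched as follows: before comparing two watermarks, tune each one's hyper-parameter until both reach the same watermarking strength, then evaluate at that shared operating point. This is a hedged toy sketch, not the WaterBench implementation; the strength curve here is an invented monotone stand-in for a real detector's true-positive rate.

```python
def calibrate(strength_fn, target, lo=0.0, hi=10.0, iters=40):
    """Bisect on a watermark hyper-parameter h so that
    strength_fn(h) ~= target. Assumes strength rises monotonically in h."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if strength_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Toy monotone strength curve standing in for a real watermark detector
toy_strength = lambda h: h / (1 + h)

# Calibrate to a common strength of 0.75, then both generation quality and
# detection performance would be measured at h_star (an apples-to-apples point)
h_star = calibrate(toy_strength, target=0.75)
print(round(toy_strength(h_star), 3))  # 0.75
```

The point of the calibration step is fairness: a watermark tuned to be strong is easy to detect but degrades text more, so comparing methods at mismatched strengths conflates the two effects.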
KoRC: Knowledge oriented Reading Comprehension Benchmark for Deep Text Understanding
Deep text understanding, which requires the connections between a given
document and prior knowledge beyond its text, has been highlighted by many
benchmarks in recent years. However, these benchmarks have encountered two
major limitations. On the one hand, most of them require human annotation of
knowledge, which leads to limited knowledge coverage. On the other hand, they
usually use choices or spans in the texts as the answers, which results in
narrow answer space. To overcome these limitations, we build a new challenging
benchmark named KoRC in this paper. Compared with previous benchmarks, KoRC has
two advantages, i.e., broad knowledge coverage and flexible answer format.
Specifically, we utilize massive knowledge bases to guide annotators or large
language models (LLMs) to construct knowledgeable questions. Moreover, we use
labels in knowledge bases rather than spans or choices as the final answers. We
test state-of-the-art models on KoRC and the experimental results show that the
strongest baseline achieves only 68.3% and 30.0% F1 on the
in-distribution and out-of-distribution test sets, respectively. These results
indicate that deep text understanding is still an unsolved challenge. The
benchmark dataset, leaderboard, and baseline methods are released in
https://github.com/THU-KEG/KoRC
ConstGCN: Constrained Transmission-based Graph Convolutional Networks for Document-level Relation Extraction
Document-level relation extraction with graph neural networks faces a
fundamental graph construction gap between training and inference: the gold
graph structure is only available during training, which forces most methods
to adopt heuristic or syntactic rules to construct a prior graph as a pseudo
proxy. In this paper, we propose ConstGCN, a novel graph
convolutional network which performs knowledge-based information propagation
between entities along with all specific relation spaces without any prior
graph construction. Specifically, it updates the entity representation by
aggregating information from all other entities along with each relation space,
thus modeling the relation-aware spatial information. To control the
information flow passing through the indeterminate relation spaces, we propose
to constrain the propagation using transmitting scores learned from the Noise
Contrastive Estimation between fact triples. Experimental results show that our
method outperforms the previous state-of-the-art (SOTA) approaches on the DocRE
dataset.
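The propagation described above can be sketched in miniature: each entity representation is updated by aggregating every other entity within each relation space, with the flow gated by a transmitting score in [0, 1]. This is an illustrative simplification under invented data, not the authors' model; in the paper the scores are learned via Noise Contrastive Estimation, while here they are fixed constants.

```python
def propagate(H, T):
    """One constrained propagation step.
    H: dict entity -> feature vector (list of floats).
    T: dict (relation, src, dst) -> transmitting score in [0, 1]."""
    entities = list(H)
    relations = {r for (r, _, _) in T}
    new_H = {}
    for i in entities:
        agg = [0.0] * len(H[i])
        for r in relations:                      # one pass per relation space
            for j in entities:
                if j == i:
                    continue
                w = T.get((r, j, i), 0.0)        # gate on flow j -> i under r
                agg = [a + w * x for a, x in zip(agg, H[j])]
        # residual update keeps the entity's own features
        new_H[i] = [h + a for h, a in zip(H[i], agg)]
    return new_H

H = {"e1": [1.0, 0.0], "e2": [0.0, 1.0]}
T = {("born_in", "e2", "e1"): 0.5}               # hypothetical relation space
print(propagate(H, T))                           # e1 absorbs half of e2's features
```

Because the gates default to 0.0, no prior graph is needed: an entity pair simply transmits nothing under a relation space until the learned score says otherwise.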
Interactive Contrastive Learning for Self-supervised Entity Alignment
Self-supervised entity alignment (EA) aims to link equivalent entities across
different knowledge graphs (KGs) without seed alignments. The current SOTA
self-supervised EA method draws inspiration from contrastive learning,
originally designed in computer vision based on instance discrimination and
contrastive loss, and suffers from two shortcomings. Firstly, it puts
unidirectional emphasis on pushing sampled negative entities far away rather
than pulling positively aligned pairs close, as is done in the well-established
supervised EA. Secondly, KGs contain rich side information (e.g., entity
descriptions), and how to effectively leverage that information has not been
adequately investigated in self-supervised EA. In this paper, we propose an
interactive contrastive learning model for self-supervised EA. The model
encodes not only structures and semantics of entities (including entity name,
entity description, and entity neighborhood), but also conducts cross-KG
contrastive learning by building pseudo-aligned entity pairs. Experimental
results show that our approach outperforms previous best self-supervised
results by a large margin (over 9% average improvement) and performs on par
with previous SOTA supervised counterparts, demonstrating the effectiveness of
the interactive contrastive learning for self-supervised EA. Comment: Accepted by CIKM 2022
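The bidirectional objective described above can be sketched with an InfoNCE-style loss: unlike a one-sided loss that only pushes sampled negatives away, it also pulls each pseudo-aligned pair together, applied in both cross-KG directions. This is a generic contrastive-learning sketch on toy vectors, not the paper's model; the embeddings and negatives are invented, and similarity is a plain dot product.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def info_nce(anchor, positive, negatives, tau=0.1):
    """-log( exp(sim(a,p)/tau) / (exp(sim(a,p)/tau) + sum_n exp(sim(a,n)/tau)) )"""
    logits = [dot(anchor, positive) / tau] + [dot(anchor, n) / tau for n in negatives]
    m = max(logits)                              # log-sum-exp stabilization
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

def bidirectional_loss(x, y, negs_x, negs_y, tau=0.1):
    # x -> y direction plus y -> x direction, as in cross-KG contrast
    return info_nce(x, y, negs_y, tau) + info_nce(y, x, negs_x, tau)

x, y = [1.0, 0.0], [0.9, 0.1]                    # a pseudo-aligned entity pair
negs = [[0.0, 1.0], [-1.0, 0.0]]                 # sampled negative entities
print(round(bidirectional_loss(x, y, negs, negs), 4))
```

Minimizing this loss simultaneously pulls the aligned pair close and pushes negatives away, which is exactly the two-sided behavior the abstract contrasts with the prior one-sided objective.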
Preserving Knowledge Invariance: Rethinking Robustness Evaluation of Open Information Extraction
The robustness to distribution changes ensures that NLP models can be
successfully applied in the realistic world, especially for information
extraction tasks. However, most prior evaluation benchmarks have been devoted
to validating pairwise matching correctness, ignoring the crucial measurement
of robustness. In this paper, we present the first benchmark that simulates the
evaluation of open information extraction models in the real world, where the
syntactic and expressive distributions under the same knowledge meaning may
drift in various ways. We design and annotate a large-scale testbed in which each
example is a knowledge-invariant clique that consists of sentences with
structured knowledge of the same meaning but with different syntactic and
expressive forms. We further elaborate a robustness metric: a model is judged
robust only if its performance is consistently accurate across all sentences in
a clique. We perform experiments on typical models published in the last decade
as well as a popular large language model; the results show that existing
successful models exhibit a frustrating degradation, with a maximum drop of
23.43 F1 points. Our resources and code are available at
https://github.com/qijimrc/ROBUST. Comment: Accepted by EMNLP 2023 Main Conference
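The clique-based judgment described above can be made concrete: a model is credited on a knowledge-invariant clique only if it is correct on every paraphrase in the clique, so robust accuracy lower-bounds ordinary per-sentence accuracy. This is an illustrative sketch of that metric idea on invented correctness data, not the benchmark's scoring code.

```python
def robust_accuracy(cliques):
    """cliques: list of cliques, each a list of per-sentence correctness
    booleans for syntactic/expressive variants of the same knowledge."""
    return sum(all(c) for c in cliques) / len(cliques)

def plain_accuracy(cliques):
    """Ordinary accuracy over all sentences, ignoring clique structure."""
    flat = [ok for clique in cliques for ok in clique]
    return sum(flat) / len(flat)

# Three cliques of variants; the model fails one variant in the second clique
cliques = [[True, True], [True, False, True], [True]]
print(round(plain_accuracy(cliques), 3))   # 0.833
print(round(robust_accuracy(cliques), 3))  # 0.667
```

The gap between the two numbers is the degradation the abstract reports: a model can look accurate sentence-by-sentence yet fail to preserve its predictions under meaning-invariant rephrasing.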
VisKoP: Visual Knowledge oriented Programming for Interactive Knowledge Base Question Answering
We present Visual Knowledge oriented Programming platform (VisKoP), a
knowledge base question answering (KBQA) system that integrates humans into the
loop to edit and debug the knowledge base (KB) queries. VisKoP not only
provides a neural program induction module, which converts natural language
questions into knowledge oriented program language (KoPL), but also maps KoPL
programs into graphical elements. KoPL programs can be edited with simple
graphical operators, such as dragging to add knowledge operators and slot
filling to designate operator arguments. Moreover, VisKoP provides
auto-completion for its knowledge base schema and users can easily debug the
KoPL program by checking its intermediate results. To facilitate the practical
KBQA on a million-entity-level KB, we design a highly efficient KoPL execution
engine for the back-end. Experimental results show that VisKoP is highly
efficient and that user interaction can fix a large portion of wrong KoPL
programs to obtain the correct answer. The VisKoP online demo
https://demoviskop.xlore.cn (stable release of this paper) and
https://viskop.xlore.cn (beta release with new features), the highly efficient
KoPL engine https://pypi.org/project/kopl-engine, and a screencast video
https://youtu.be/zAbJtxFPTXo are now publicly available.
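To make the program-induction idea tangible, here is a hypothetical miniature of what executing a KoPL-style program looks like: a linear sequence of knowledge operators, each consuming the previous operator's output. The operator set, knowledge base, and semantics here are invented for illustration; the real engine is the kopl-engine package linked above.

```python
# Tiny invented knowledge base: entity -> attributes/relations
KB = {
    "France": {"type": "country", "capital": "Paris"},
    "Japan": {"type": "country", "capital": "Tokyo"},
    "Paris": {"type": "city"},
}

# Two toy knowledge operators; real KoPL has a much richer operator inventory
OPS = {
    "Find": lambda _, name: [name] if name in KB else [],
    "Relate": lambda ents, rel: [KB[e][rel] for e in ents if rel in KB.get(e, {})],
}

def run(program):
    """program: list of (operator, argument) pairs, executed left to right,
    threading each operator's output into the next as its input."""
    state = None
    for op, arg in program:
        state = OPS[op](state, arg)
    return state

# "What is the capital of France?" as a two-operator program
print(run([("Find", "France"), ("Relate", "capital")]))  # ['Paris']
```

Because each operator's output is an explicit intermediate value, a user can inspect it step by step, which is what makes the graphical debugging described above possible.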
MoocRadar: A Fine-grained and Multi-aspect Knowledge Repository for Improving Cognitive Student Modeling in MOOCs
Student modeling, the task of inferring a student's learning characteristics
through their interactions with coursework, is a fundamental issue in
intelligent education. Although the recent attempts from knowledge tracing and
cognitive diagnosis propose several promising directions for improving the
usability and effectiveness of current models, the existing public datasets are
still insufficient to meet the needs of these potential solutions because they
lack complete exercise contexts, fine-grained concepts, and cognitive
labels. In this paper, we present MoocRadar, a fine-grained, multi-aspect
knowledge repository consisting of 2,513 exercise questions, 5,600 knowledge
concepts, and over 12 million behavioral records. Specifically, we propose a
framework to guarantee a high-quality and comprehensive annotation of
fine-grained concepts and cognitive labels. The statistical and experimental
results indicate that our dataset provides a basis for future improvements of
existing methods. Moreover, to support convenient use by researchers, we
release a set of tools for data querying, model adaptation, and extension of
our repository, which are now available at
https://github.com/THU-KEG/MOOC-Radar. Comment: Accepted by SIGIR 2023
CharacterGLM: Customizing Chinese Conversational AI Characters with Large Language Models
In this paper, we present CharacterGLM, a series of models built upon
ChatGLM, with model sizes ranging from 6B to 66B parameters. Our CharacterGLM
is designed for generating Character-based Dialogues (CharacterDial), which
aims to equip a conversational AI system with character customization for
satisfying people's inherent social desires and emotional needs. On top of
CharacterGLM, we can customize various AI characters or social agents by
configuring their attributes (identities, interests, viewpoints, experiences,
achievements, social relationships, etc.) and behaviors (linguistic features,
emotional expressions, interaction patterns, etc.). Our model outperforms most
mainstream closed-source large language models, including the GPT series,
especially in terms of consistency, human-likeness, and engagement, according
to manual evaluations. We will release the 6B version of CharacterGLM and a
subset of training data to facilitate further research on character-based
dialogue generation. Comment: Work in progress
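The attribute-and-behavior configuration described above can be illustrated with a purely hypothetical sketch (not CharacterGLM's actual interface): a character is specified as structured attributes and behaviors, then rendered into a system prompt for a conversational model. All field names and the prompt template here are invented.

```python
def character_prompt(name, attributes, behaviors):
    """Render a structured character spec into a flat system prompt.
    attributes: identity, interests, viewpoints, etc.
    behaviors: linguistic features, emotional expressions, etc."""
    attr_text = "; ".join(f"{k}: {v}" for k, v in attributes.items())
    behav_text = "; ".join(f"{k}: {v}" for k, v in behaviors.items())
    return (f"You are {name}. Attributes -- {attr_text}. "
            f"Behaviors -- {behav_text}. Stay in character.")

prompt = character_prompt(
    "Li Hua",
    {"identity": "university student", "interest": "astronomy"},
    {"linguistic features": "casual, short sentences"},
)
print(prompt)
```

Separating attributes (who the character is) from behaviors (how the character talks) mirrors the customization axes the abstract lists, and keeps each axis independently editable.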