Understanding events in texts is a core objective of natural language
understanding, which requires detecting event occurrences, extracting event
arguments, and analyzing inter-event relationships. However, due to the
annotation challenges brought by task complexity, a large-scale dataset
covering the full process of event understanding has long been absent. In this
paper, we introduce MAVEN-Arg, which augments the MAVEN datasets with event
argument annotations, making it the first all-in-one dataset supporting event
detection, event argument extraction (EAE), and event relation extraction. As
an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive
schema covering 162 event types and 612 argument roles, all with expert-written
definitions and examples; (2) a large data scale, containing 98,591 events and
290,613 arguments obtained with laborious human annotation; (3) exhaustive
annotation supporting all task variants of EAE, covering both entity and
non-entity event arguments at the document level. Experiments indicate that
MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary
large language models (LLMs). Furthermore, to demonstrate the benefits of an
all-in-one dataset, we preliminarily explore a potential application, future
event prediction, with LLMs. MAVEN-Arg and our code can be obtained from
https://github.com/THU-KEG/MAVEN-Argument.
Comment: Work in progress