
Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction

Authors: Bian Jiang, Cao Wei, Xu Wei, Zheng Shun
Publication date: 01/01/2019

Most existing event extraction (EE) methods merely extract event arguments within the sentence scope. However, such sentence-level EE methods struggle to handle soaring amounts of documents from emerging applications, such as finance, legislation, health, etc., where event arguments always scatter across different sentences, and even multiple such event mentions frequently co-exist in the same document. To address these challenges, we propose a novel end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic graph to fulfill the document-level EE (DEE) effectively. Moreover, we reformalize a DEE task with the no-trigger-words design to ease the document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we build a large-scale real-world dataset consisting of Chinese financial announcements with the challenges mentioned above. Extensive experiments with comprehensive analyses illustrate the superiority of Doc2EDAG over state-of-the-art methods. Data and codes can be found at https://github.com/dolphin-zs/Doc2EDAG.

Comment: Accepted by EMNLP 201

arXiv.org e-Print Archive

Crossref

MultiLegalPile: A 689GB Multilingual Legal Corpus

Authors: Chalkidis Ilias, Ho Daniel E, Matoshi Veton, Niklaus Joël, Stürmer Matthias
Publication venue: Cornell University
Publication date: 03/06/2023

Large, high-quality datasets are crucial for training Large Language Models (LLMs). However, so far, there are few datasets available for specialized critical domains such as law and the available ones are often only for the English language. We curate and release MULTILEGALPILE, a 689GB corpus in 24 languages from 17 jurisdictions. The MULTILEGALPILE corpus, which includes diverse legal data sources with varying licenses, allows for pretraining NLP models under fair use, with more permissive licenses for the Eurlex Resources and Legal mC4 subsets. We pretrain two RoBERTa models and one Longformer multilingually, and 24 monolingual models on each of the language-specific subsets and evaluate them on LEXTREME. Additionally, we evaluate the English and multilingual models on LexGLUE. Our multilingual models set a new SotA on LEXTREME and our English models on LexGLUE. We release the dataset, the trained models, and all of the code under the most open possible licenses.

Berner Fachhochschule: ARBOR

A Multilingual Approach to Identify and Classify Exceptional Measures against COVID-19

Authors: Caselli Tommaso, De Saint-Phalle Eugenie, de Vries Wietse, Egger Clara, Tziafas Georgios
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2021

The COVID-19 pandemic has witnessed the implementations of exceptional measures by governments across the world to counteract its impact. This work presents the initial results of an on-going project, EXCEPTIUS, aiming to automatically identify, classify and compare exceptional measures against COVID-19 across 32 countries in Europe. To this goal, we created a corpus of legal documents with sentence-level annotations of eight different classes of exceptional measures that are implemented across these countries. We evaluated multiple multi-label classifiers on a manually annotated corpus at sentence level. The XLM-RoBERTa model achieves highest performance on this multilingual multi-label classification task, with a macro-average F1 score of 59.8%.

Proceedings - University of Groningen

University of Groningen

ARTS repository - University of Groningen

EUR Research Repository

Dissertations of the University of Groningen
