101,387 research outputs found
Dynamic Global Memory for Document-level Argument Extraction
Extracting informative arguments of events from news articles is a
challenging problem in information extraction, which requires a global
contextual understanding of each document. While recent work on document-level
extraction has gone beyond single-sentence and increased the cross-sentence
inference capability of end-to-end models, they are still restricted by certain
input sequence length constraints and usually ignore the global context between
events. To tackle this issue, we introduce a new global neural generation-based
framework for document-level event argument extraction by constructing a
document memory store to record the contextual event information and leveraging
it to implicitly and explicitly help with decoding of arguments for later
events. Empirical results show that our framework outperforms prior methods
substantially and it is more robust to adversarially annotated examples with
our constrained decoding design. (Our code and resources are available at
https://github.com/xinyadu/memory_docie for research purpose.)Comment: ACL 2022 main conference (12 pages
Analysing Errors of Open Information Extraction Systems
We report results on benchmarking Open Information Extraction (OIE) systems
using RelVis, a toolkit for benchmarking Open Information Extraction systems.
Our comprehensive benchmark contains three data sets from the news domain and
one data set from Wikipedia with overall 4522 labeled sentences and 11243
binary or n-ary OIE relations. In our analysis on these data sets we compared
the performance of four popular OIE systems, ClausIE, OpenIE 4.2, Stanford
OpenIE and PredPatt. In addition, we evaluated the impact of five common error
classes on a subset of 749 n-ary tuples. From our deep analysis we unreveal
important research directions for a next generation of OIE systems.Comment: Accepted at Building Linguistically Generalizable NLP Systems at
EMNLP 201
ArgumenText: Argument Classification and Clustering in a Generalized Search Scenario
The ArgumenText project creates argument mining technology for big and heterogeneous data and aims to evaluate its use in real-world applications. The technology mines and clusters arguments from a variety of textual sources for a large range of topics and in multiple languages. Its main strength is its generalization to very different textual sources including web crawls, news data, or customer reviews. We validated the technology with a focus on supporting decisions in innovation management as well as customer feedback analysis. Along with its public argument search engine and API, ArgumenText has released multiple datasets for argument classification and clustering. This contribution outlines the major technology-related challenges and proposed solutions for the tasks of argument extraction from heterogeneous sources and argument clustering. It also lays out exemplary industry applications and remaining challenges
Towards Building a Knowledge Base of Monetary Transactions from a News Collection
We address the problem of extracting structured representations of economic
events from a large corpus of news articles, using a combination of natural
language processing and machine learning techniques. The developed techniques
allow for semi-automatic population of a financial knowledge base, which, in
turn, may be used to support a range of data mining and exploration tasks. The
key challenge we face in this domain is that the same event is often reported
multiple times, with varying correctness of details. We address this challenge
by first collecting all information pertinent to a given event from the entire
corpus, then considering all possible representations of the event, and
finally, using a supervised learning method, to rank these representations by
the associated confidence scores. A main innovative element of our approach is
that it jointly extracts and stores all attributes of the event as a single
representation (quintuple). Using a purpose-built test set we demonstrate that
our supervised learning approach can achieve 25% improvement in F1-score over
baseline methods that consider the earliest, the latest or the most frequent
reporting of the event.Comment: Proceedings of the 17th ACM/IEEE-CS Joint Conference on Digital
Libraries (JCDL '17), 201
- …