299,156 research outputs found
Automatic Taxonomy Generation - A Use-Case in the Legal Domain
A key challenge in the legal domain is the adaptation and representation of
the legal knowledge expressed through texts, in order for legal practitioners
and researchers to access this information easier and faster to help with
compliance related issues. One way to approach this goal is in the form of a
taxonomy of legal concepts. While this task usually requires a manual
construction of terms and their relations by domain experts, this paper
describes a methodology to automatically generate a taxonomy of legal noun
concepts. We apply and compare two approaches on a corpus consisting of
statutory instruments for UK, Wales, Scotland and Northern Ireland laws.Comment: 9 page
Automatic extraction of knowledge from web documents
A large amount of digital information available is written as text documents in the form of web pages, reports, papers, emails, etc. Extracting the knowledge of interest from such documents from multiple sources in a timely fashion is therefore crucial. This paper provides an update on the Artequakt system which uses natural language tools to automatically extract knowledge about artists from multiple documents based on a predefined ontology. The ontology represents the type and form of knowledge to extract. This knowledge is then used to generate tailored biographies. The information extraction process of Artequakt is detailed and evaluated in this paper
A Web-Based Tool for Analysing Normative Documents in English
Our goal is to use formal methods to analyse normative documents written in
English, such as privacy policies and service-level agreements. This requires
the combination of a number of different elements, including information
extraction from natural language, formal languages for model representation,
and an interface for property specification and verification. We have worked on
a collection of components for this task: a natural language extraction tool, a
suitable formalism for representing such documents, an interface for building
models in this formalism, and methods for answering queries asked of a given
model. In this work, each of these concerns is brought together in a web-based
tool, providing a single interface for analysing normative texts in English.
Through the use of a running example, we describe each component and
demonstrate the workflow established by our tool
Doc2EDAG: An End-to-End Document-level Framework for Chinese Financial Event Extraction
Most existing event extraction (EE) methods merely extract event arguments
within the sentence scope. However, such sentence-level EE methods struggle to
handle soaring amounts of documents from emerging applications, such as
finance, legislation, health, etc., where event arguments always scatter across
different sentences, and even multiple such event mentions frequently co-exist
in the same document. To address these challenges, we propose a novel
end-to-end model, Doc2EDAG, which can generate an entity-based directed acyclic
graph to fulfill the document-level EE (DEE) effectively. Moreover, we
reformalize a DEE task with the no-trigger-words design to ease the
document-level event labeling. To demonstrate the effectiveness of Doc2EDAG, we
build a large-scale real-world dataset consisting of Chinese financial
announcements with the challenges mentioned above. Extensive experiments with
comprehensive analyses illustrate the superiority of Doc2EDAG over
state-of-the-art methods. Data and codes can be found at
https://github.com/dolphin-zs/Doc2EDAG.Comment: Accepted by EMNLP 201
- …