Representing Standard Text Formulations as Directed Graphs

Authors: Heid, Ulrich (Prof. Dr.); Josi, Frieda; Wartena, Christian (Prof. Dr.)
Publication venue: Hannover : Hochschule Hannover
Publication date: 01/01/2021

In order to ensure validity in legal texts such as contracts and case law, lawyers rely on standardized formulations that are written carefully but also represent a kind of code with a meaning and function known to all legal experts. Using directed (acyclic) graphs to represent standardized text fragments, we are able to capture variations concerning time specifications, slight rephrasings, names, places, and also OCR errors. We show how such text fragments can be found by sentence clustering, pattern detection, and clustering of patterns. To test the proposed methods, we use two corpora of German contracts and court decisions, specially compiled for this purpose. Although the corpora are German, the entire process for representing standardized text fragments is language-agnostic. We analyze and compare both corpora, give a quantitative and qualitative analysis of the text fragments found, and present a number of examples from both corpora.
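
The abstract does not spell out how the graph is built. As a minimal sketch, assuming the variants of one formulation have already been grouped (for example by the sentence clustering mentioned above), each variant can be aligned to a base sentence with Python's `difflib.SequenceMatcher`: shared spans reuse the base nodes, and divergent spans (dates, names, rephrasings) become parallel branches, so the graph stays acyclic. The class name `FragmentGraph`, the alignment strategy and the example sentences are hypothetical and are not taken from the paper.

```python
from difflib import SequenceMatcher
from itertools import count

START, END = "<s>", "</s>"


class FragmentGraph:
    """Hypothetical sketch: a directed acyclic word graph for the variants of one formulation."""

    def __init__(self, base_tokens):
        self._ids = count()
        self.label = {}  # node id -> token
        self.succ = {}   # node id -> set of successor node ids
        self.start = self._node(START)
        self.end = self._node(END)
        self.base = list(base_tokens)
        self.base_nodes = [self._node(tok) for tok in self.base]
        chain = [self.start, *self.base_nodes, self.end]
        for a, b in zip(chain, chain[1:]):
            self.succ[a].add(b)

    def _node(self, token):
        node = next(self._ids)
        self.label[node] = token
        self.succ[node] = set()
        return node

    def add_variant(self, tokens):
        """Align a variant to the base sentence and add its divergent spans as branches."""
        tokens = list(tokens)
        prev = self.start
        matcher = SequenceMatcher(None, self.base, tokens, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Shared span: walk along the existing base nodes.
                for node in self.base_nodes[i1:i2]:
                    self.succ[prev].add(node)
                    prev = node
            else:
                # replace/insert: fresh branch nodes; delete: no new nodes, so the
                # next shared node (or END) is linked directly from prev.
                for tok in tokens[j1:j2]:
                    node = self._node(tok)
                    self.succ[prev].add(node)
                    prev = node
        self.succ[prev].add(self.end)

    def paths(self, node=None, prefix=()):
        """Yield every sentence the graph represents (finite, since the graph is acyclic)."""
        node = self.start if node is None else node
        if node == self.end:
            yield " ".join(prefix)
            return
        for nxt in sorted(self.succ[node]):
            step = prefix if nxt == self.end else prefix + (self.label[nxt],)
            yield from self.paths(nxt, step)

    def accepts(self, tokens):
        """Check whether a tokenized sentence is one of the paths through the graph."""
        frontier = {self.start}
        for tok in tokens:
            frontier = {n for f in frontier for n in self.succ[f] if self.label[n] == tok}
            if not frontier:
                return False
        return any(self.end in self.succ[f] for f in frontier)


g = FragmentGraph("This agreement shall enter into force on 1 January 2020".split())
g.add_variant("This agreement shall enter into force on 15 March 2021".split())
g.add_variant("This contract shall enter into force on 1 January 2020".split())
for sentence in g.paths():
    print(sentence)
```

Because branches combine freely, this graph also accepts the unseen combination "This contract shall enter into force on 15 March 2021", which illustrates how a graph can generalize over the observed variants rather than merely listing them.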

Server für wissenschaftliche Schriften der Hochschule Hannover

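Generalization of Formulaic Text Components in Legal Corpora: Application and Development Potential (original title: Generalisierung von formelhaften Textbestandteilen in juristischen Korpora: Einsatz- und Entwicklungspotential)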

Authors: Heid, Ulrich; Josi, Frieda; Wartena, Christian (Prof. Dr.)
Publication venue: Hannover : Hochschule Hannover
Publication date: 01/01/2022

Generalized legal documents, in which the positions of a contract's individual, contract-specific variants within the text are known, can be used, first, to support the approval process for new contracts automatically and, second, as a contract generator that provides new legal documents in preselected form. In this paper, we use known legal texts to show how formulaic text passages can be identified and how frequent individual variants can be classified so that they can be used as template passages. We present areas of application and outline the existing potential for legal tech applications.
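
The abstract describes generalized documents in which the positions of the variable parts are known. As a minimal sketch, assuming such a generalized passage is stored as a text template with named slots, the same template can be turned into a regular expression that extracts slot values from a new contract (for example, to check them against approved values during approval) and can be filled to generate a new passage. The template text, the slot names, the `approve` helper and the allowed values are hypothetical illustrations, not material from the paper.

```python
import re

# Hypothetical generalized passage: fixed wording plus named slots for the variable parts.
TEMPLATE = "The contract begins on {start_date}; either party may terminate it with a notice period of {notice}."


def template_to_regex(template):
    """Escape the literal text and turn each {slot} into a named, non-greedy group."""
    parts = re.split(r"\{(\w+)\}", template)  # even indices: literal text, odd indices: slot names
    pattern = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>.+?)" for i, part in enumerate(parts)
    )
    return re.compile(pattern)


def extract_slots(template, text):
    """Return the slot values if the text follows the standard wording, else None."""
    match = template_to_regex(template).fullmatch(text)
    return match.groupdict() if match else None


def generate(template, **values):
    """Fill the slots to produce a new passage (contract-generator use case)."""
    return template.format(**values)


def approve(template, text, allowed):
    """Accept only the standard wording with an approved value in every checked slot."""
    slots = extract_slots(template, text)
    if slots is None:
        return False  # wording deviates from the standard formulation
    return all(slots[name] in values for name, values in allowed.items())


allowed = {"notice": {"three months", "six months"}}
draft = generate(TEMPLATE, start_date="1 January 2022", notice="three months")
print(extract_slots(TEMPLATE, draft))
print(approve(TEMPLATE, draft, allowed))
```

Keeping a single template for both directions means that extraction and generation cannot drift apart; slot names must be valid Python identifiers for the named groups to compile.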

Server für wissenschaftliche Schriften der Hochschule Hannover

Detecting Paraphrases of Standard Clause Titles in Insurance Contracts

Authors: Heid, Ulrich; Josi, Frieda; Wartena, Christian (Prof. Dr.)
Publication venue: Hannover : Hochschule Hannover
Publication date: 01/01/2019

For the analysis of contract texts, validated model texts such as model clauses can be used to identify the clauses used in a contract. This paper investigates how the similarity between titles of model clauses and headings extracted from contracts can be computed, and which similarity measure is most suitable for this task. To compute the similarity between title pairs, we tested several variants of string-based and token-based similarity. We also compare two additional semantic similarity measures based on word embeddings, one using pre-trained embeddings and one using embeddings trained on contract texts. Identifying the model clause title can serve as a starting point for mapping clauses found in contracts to verified clauses.
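
As a minimal sketch of the kinds of measures being compared, the code below implements one character-level measure (normalized Levenshtein similarity), one token-based measure (Jaccard overlap of word types) and one character n-gram measure (trigram Dice coefficient), and picks the most similar model clause title for an extracted heading. The specific measures, helper names and example titles are assumptions for illustration; the paper's exact variants and its embedding-based measures are not reproduced here.

```python
import re


def levenshtein(a, b):
    """Edit distance with unit costs (dynamic programming over two rows)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def levenshtein_sim(a, b):
    """Character-level similarity in [0, 1]: 1 - distance / length of the longer string."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def tokens(s):
    return set(re.findall(r"\w+", s.lower()))


def jaccard(a, b):
    """Token-based similarity: shared word types divided by all word types."""
    ta, tb = tokens(a), tokens(b)
    return 1.0 if not (ta or tb) else len(ta & tb) / len(ta | tb)


def trigram_dice(a, b):
    """Character-trigram Dice coefficient; padding lets word boundaries count as context."""
    def grams(s):
        s = f"  {s.lower()} "
        return {s[i:i + 3] for i in range(len(s) - 2)}
    ga, gb = grams(a), grams(b)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def best_match(heading, model_titles, sim):
    """Return the model clause title most similar to a heading extracted from a contract."""
    return max(model_titles, key=lambda title: sim(heading, title))


model_titles = ["Beginning of insurance cover", "Termination of the contract", "Payment of the premium"]
heading = "§ 3 Start of the insurance cover"
for sim in (levenshtein_sim, jaccard, trigram_dice):
    print(sim.__name__, best_match(heading, model_titles, sim))
```

Token overlap rewards shared content words regardless of their order, while purely character-based measures such as Levenshtein are typically more sensitive to extra material like the section number in the heading.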

Crossref

Server für wissenschaftliche Schriften der Hochschule Hannover

Preparing Legal Documents for NLP Analysis: Improving the Classification of Text Elements by Using Page Features

Authors: Heid, Ulrich; Josi, Frieda; Wartena, Christian (Prof. Dr.)
Publication venue: Hannover : Hochschule Hannover
Publication date: 01/01/2022

Legal documents often have a complex layout with many different headings, headers and footers, side notes, etc. For further processing, it is important to extract these individual components correctly from a legally binding document, for example a signed PDF. A common approach is to classify each (text) region of a page using its geometric and textual features. This approach works well when the training and test data have a similar structure and when the documents of a collection to be analyzed have a rather uniform layout. We show that the use of global page properties can improve the accuracy of text element classification: we first classify each page into one of three layout types. We then train a separate classifier for each of the three page types and thereby improve the accuracy on a manually annotated collection of 70 legal documents consisting of 20,938 text elements. When we split by page type, we achieve an improvement from 0.95 to 0.98 for single-column pages with left marginalia and from 0.95 to 0.96 for double-column pages. We developed our own feature-based method for page layout detection, which we benchmark against a standard implementation of a CNN image classifier. The approach presented here is based on a corpus of freely available German contracts and general terms and conditions. Both the corpus and all manual annotations are made freely available. The method is language-agnostic.
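
The two-stage idea can be sketched as follows, assuming each text element is described by a few geometric features: a page-level layout detector assigns one of three page types, and a separate element classifier is trained per page type, with a pooled classifier as a fallback for page types not seen in training. The threshold rules in `page_type`, the third page type ("single_column"), the nearest-centroid classifier, the feature set and the labels are hypothetical stand-ins; they are neither the authors' feature-based layout detector nor their element classifier.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import dist


@dataclass
class TextElement:
    x0: float          # left edge as a fraction of the page width
    x1: float          # right edge as a fraction of the page width
    font_size: float   # in points
    label: str = ""    # hypothetical labels, e.g. "heading", "body", "marginal_note"

    def features(self):
        # Font size is scaled down so that it does not dominate the distances (arbitrary choice).
        return (self.x0, self.x1 - self.x0, self.font_size / 10)


def page_type(elements, margin=0.2):
    """Hypothetical rule-based layout detector for three page types."""
    if any(e.x1 <= margin for e in elements):
        return "single_column_left_marginalia"
    spans_middle = any(e.x0 < 0.45 and e.x1 > 0.55 for e in elements)
    has_left = any(e.x1 <= 0.5 for e in elements)
    has_right = any(e.x0 >= 0.5 for e in elements)
    if has_left and has_right and not spans_middle:
        return "double_column"
    return "single_column"


class NearestCentroid:
    """Minimal element classifier: predict the label whose mean feature vector is closest."""

    def fit(self, elements):
        sums = defaultdict(lambda: [0.0, 0.0, 0.0])
        counts = defaultdict(int)
        for e in elements:
            for i, value in enumerate(e.features()):
                sums[e.label][i] += value
            counts[e.label] += 1
        self.centroids = {lab: tuple(v / counts[lab] for v in s) for lab, s in sums.items()}
        return self

    def predict(self, element):
        return min(self.centroids, key=lambda lab: dist(self.centroids[lab], element.features()))


def train_per_page_type(pages):
    """Train one classifier per detected page type, plus a pooled fallback model."""
    grouped = defaultdict(list)
    for page in pages:
        grouped[page_type(page)].extend(page)
    models = {ptype: NearestCentroid().fit(elems) for ptype, elems in grouped.items()}
    models["pooled"] = NearestCentroid().fit([e for page in pages for e in page])
    return models


def classify_page(page, models):
    """Stage 1: detect the page type. Stage 2: label elements with that type's classifier."""
    model = models.get(page_type(page), models["pooled"])
    return [model.predict(e) for e in page]


training_pages = [
    [TextElement(0.05, 0.18, 8, "marginal_note"), TextElement(0.25, 0.95, 10, "body"),
     TextElement(0.25, 0.70, 14, "heading")],
    [TextElement(0.05, 0.48, 10, "body"), TextElement(0.52, 0.95, 10, "body"),
     TextElement(0.05, 0.30, 13, "heading")],
]
models = train_per_page_type(training_pages)
new_page = [TextElement(0.04, 0.17, 8), TextElement(0.25, 0.90, 10), TextElement(0.25, 0.60, 14)]
print(page_type(new_page), classify_page(new_page, models))
```

Because each per-type model only sees elements from one layout, geometric features such as a left margin position carry a consistent meaning within it; in practice the nearest-centroid stand-in would be replaced by a properly trained classifier.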

Server für wissenschaftliche Schriften der Hochschule Hannover

Organic Chemical Compounds From Morpho-Semantics to SMILES Strings and Classes (Web Version)

Authors: Dr. Uwe Reyle; Gerhard Kremer; Stefanie Anstein; Ulrich Heid
Roles and titles listed in the record but not attributable to specific names: Betreuer (supervisor), Prüfer (examiner), Prof., HD Dr.

CiteSeerX