Anonymity at Risk? Assessing Re-Identification Capabilities of Large Language Models
Anonymity of both natural and legal persons in court rulings is a critical
aspect of privacy protection in the European Union and Switzerland. With the
advent of LLMs, concerns about large-scale re-identification of anonymized
persons are growing. In accordance with the Federal Supreme Court of
Switzerland, we explore the potential of LLMs to re-identify individuals in
court rulings by constructing a proof-of-concept using actual legal data from
the Swiss Federal Supreme Court. Following the initial experiment, we
constructed an anonymized Wikipedia dataset as a more rigorous testing ground
to further investigate the findings. Introducing and applying the new task of
re-identifying people in texts, we also propose new metrics to measure
performance. We systematically analyze the factors that influence
successful re-identifications, identifying model size, input length, and
instruction tuning among the most critical determinants. Despite high
re-identification rates on Wikipedia, even the best LLMs struggled with court
decisions. The complexity is attributed to the lack of test datasets, the
necessity for substantial training resources, and data sparsity in the
information used for re-identification. In conclusion, this study demonstrates
that re-identification using LLMs may not be feasible for now, but as the
proof-of-concept on Wikipedia showed, it might become possible in the future.
We hope that our system can help enhance confidence in the security of
anonymized decisions, thus encouraging courts to publish more of them.
MultiLegalSBD: A Multilingual Legal Sentence Boundary Detection Dataset
Sentence Boundary Detection (SBD) is one of the foundational building blocks
of Natural Language Processing (NLP), with incorrectly split sentences heavily
influencing the output quality of downstream tasks. It is a challenging task
for algorithms, especially in the legal domain, considering the complex and
varied sentence structures used. In this work, we curated a diverse
multilingual legal dataset consisting of over 130'000 annotated sentences in 6
languages. Our experimental results indicate that the performance of existing
SBD models is subpar on multilingual legal data. We trained and tested
monolingual and multilingual models based on CRF, BiLSTM-CRF, and transformers,
demonstrating state-of-the-art performance. We also show that our multilingual
models outperform all baselines in the zero-shot setting on a Portuguese test
set. To encourage further research and development by the community, we have
made our dataset, models, and code publicly available.
Comment: Accepted at ICAIL 202
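The difficulty that simple rule-based splitters face on legal text can be illustrated with a minimal sketch (the regex and the example sentence are ours, purely illustrative, not from the dataset):

```python
import re

# Naive splitter: break after ., !, or ? when followed by whitespace
# and an uppercase letter. Abbreviated legal citations defeat this heuristic,
# which is why trained SBD models are needed for legal text.
NAIVE_SBD = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")

text = "The motion was dismissed under Fed. R. Civ. P. 12. The court awarded costs."
segments = NAIVE_SBD.split(text)
print(len(segments))  # 5 segments instead of the correct 2
```

Each period in the citation "Fed. R. Civ. P. 12." that precedes a capitalized token triggers a spurious split, fragmenting one sentence into four pieces.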
Resolving Legalese: A Multilingual Exploration of Negation Scope Resolution in Legal Documents
Resolving the scope of a negation within a sentence is a challenging NLP
task. The complexity of legal texts and the lack of annotated in-domain
negation corpora pose challenges for state-of-the-art (SotA) models when
performing negation scope resolution on multilingual legal data. Our
experiments demonstrate that models pre-trained without legal data underperform
on this task: language models fine-tuned exclusively on out-of-domain corpora
such as literary texts and medical data yield inferior results compared to
prior cross-domain experiments. We release a new set of annotated court decisions in
German, French, and Italian and use it to improve negation scope resolution in
both zero-shot and multilingual settings. We achieve token-level F1-scores of
up to 86.7% in our zero-shot cross-lingual experiments, where the models are
trained on two languages of our legal datasets and evaluated on the third. Our
multilingual experiments, where the models were trained on all available
negation data and evaluated on our legal datasets, resulted in F1-scores of up
to 91.1%.
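The token-level F1 metric reported here can be sketched as follows (the binary in-scope labels and the toy example are our own illustration, not the paper's evaluation code):

```python
def token_f1(gold, pred):
    """Token-level F1 over binary labels (1 = token inside the negation scope)."""
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# "The contract is [not valid]": the last two tokens are in scope.
gold = [0, 0, 0, 1, 1]
pred = [0, 0, 1, 1, 1]  # the model over-predicts one extra token
print(round(token_f1(gold, pred), 2))  # → 0.8
```

Scoring per token, rather than requiring the whole scope span to match exactly, gives partial credit for near-miss predictions like the one above.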
Survey of Artificial Intelligence for Card Games and Its Application to the Swiss Game Jass
In recent decades, we have witnessed the successful application of
Artificial Intelligence to game playing. In this work we address the
challenging field of games with hidden information and card games in
particular. Jass is a very popular card game in Switzerland and is closely
connected with Swiss culture. To the best of our knowledge, Artificial
Intelligence agents do not yet outperform top human players in Jass. Our
contribution to the community is two-fold. First, we provide
an overview of the current state-of-the-art of Artificial Intelligence methods
for card games in general. Second, we discuss their application to the use-case
of the Swiss card game Jass. This paper aims to be an entry point for both
seasoned researchers and new practitioners who want to join in the Jass
challenge.
ClassActionPrediction: A Challenging Benchmark for Legal Judgment Prediction of Class Action Cases in the US
The research field of Legal Natural Language Processing (NLP) has been very
active recently, with Legal Judgment Prediction (LJP) becoming one of the most
extensively studied tasks. To date, most publicly released LJP datasets
originate from countries with civil law. In this work, we release, for the
first time, a challenging LJP dataset focused on class action cases in the US.
It is the first dataset in the common law system that focuses on the harder and
more realistic task of using the complaints as input instead of the often-used
facts summary written by the court. Additionally, we study the difficulty of
the task by collecting expert human predictions, showing that even human
experts can only reach 53% accuracy on this dataset. Our Longformer model,
reaching 63% accuracy despite only considering the first 2,048 tokens, clearly
outperforms the human baseline. Furthermore, we perform a detailed error analysis and find
that the Longformer model is significantly better calibrated than the human
experts. Finally, we publicly release the dataset and the code used for the
experiments.
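"Calibration" here means how well a predictor's confidence matches its actual accuracy. A standard way to quantify it is Expected Calibration Error; the sketch below is a generic illustration of that metric, not the paper's actual analysis code:

```python
def ece(confidences, correct, n_bins=10):
    """Expected Calibration Error: weighted average of |accuracy - confidence|
    over equal-width confidence bins."""
    n = len(confidences)
    total = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        idx = [i for i, c in enumerate(confidences)
               if lo < c <= hi or (b == 0 and c == 0)]
        if not idx:
            continue
        acc = sum(correct[i] for i in idx) / len(idx)
        conf = sum(confidences[i] for i in idx) / len(idx)
        total += len(idx) / n * abs(acc - conf)
    return total

# Perfectly calibrated toy case: 75% confidence, 3 of 4 predictions correct.
print(ece([0.75, 0.75, 0.75, 0.75], [1, 1, 1, 0]))  # → 0.0
```

A well-calibrated model that says "90% confident" should be right about 90% of the time; lower ECE means the stated confidences can be trusted more.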
SCALE: Scaling up the Complexity for Advanced Language Model Evaluation
Recent strides in Large Language Models (LLMs) have saturated many NLP
benchmarks (even professional domain-specific ones), emphasizing the need for
novel, more challenging ones to properly assess LLM capabilities. In this
paper, we introduce a novel NLP benchmark that poses challenges to current LLMs
across four key dimensions: processing long documents (up to 50K tokens),
utilizing domain-specific knowledge (embodied in legal texts), multilingual
understanding (covering five languages), and multitasking (comprising legal
document-to-document Information Retrieval, Court View Generation, Leading
Decision Summarization, Citation Extraction, and eight challenging Text
Classification tasks). Our benchmark comprises diverse legal NLP datasets from
the Swiss legal system, allowing for a comprehensive study of the underlying
Non-English, inherently multilingual, federal legal system. Despite recent
advances, efficiently processing long documents for intense review/analysis
tasks remains an open challenge for language models. Also, comprehensive,
domain-specific benchmarks requiring high expertise to develop are rare, as are
multilingual benchmarks. This scarcity underscores our contribution's value,
considering most public models are trained predominantly on English corpora,
while other languages remain understudied, particularly for practical
domain-specific NLP tasks. Our benchmark allows for testing and advancing
state-of-the-art LLMs. As part of our study, we evaluate several pre-trained
multilingual language models on our benchmark to establish strong baselines as
a point of reference. Despite the large size of our datasets (tens to hundreds
of thousands of examples), existing publicly available models struggle with
most tasks, even after in-domain pretraining. We publish all resources
(benchmark suite, pre-trained models, code) under a fully permissive open CC
BY-SA license.
Maintenance of leaf N controls the photosynthetic CO₂ response of grassland species exposed to 9 years of free-air CO₂ enrichment
Determining the underlying physiological patterns governing plant productivity and diversity in grasslands is critical to evaluating species responses to future environmental conditions of elevated CO₂ and nitrogen (N) deposition. In a 9-year experiment, N was added to monocultures of seven C₃ grassland species exposed to elevated atmospheric CO₂ (560 μmol CO₂ mol⁻¹) to evaluate how N addition affects CO₂ responsiveness in species of contrasting functional groups. Functional groups differed in their responses to elevated CO₂ and N treatments. Forb species exhibited strong down-regulation of leaf N mass concentrations (−26%) and photosynthetic capacity (−28%) in response to elevated CO₂, especially at high N supply, whereas C₃ grasses did not. Hence, achieved photosynthetic performance was markedly enhanced for C₃ grasses (+68%) in elevated CO₂, but not significantly for forbs. Differences in access to soil resources between forbs and grasses may distinguish their responses to elevated CO₂ and N addition. Forbs had lesser root biomass, a lower distribution of biomass to roots, and lower specific root length than grasses. Maintenance of leaf N, possibly through increased root foraging in this nutrient-poor grassland, was necessary to sustain stimulation of photosynthesis under long-term elevated CO₂. Dilution of leaf N and associated photosynthetic down-regulation in forbs under elevated [CO₂], relative to the C₃ grasses, illustrates the potential for shifts in species composition and diversity in grassland ecosystems that have significant forb and grass components.
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
The advent of large language models (LLMs) and their adoption by the legal
community has given rise to the question: what types of legal reasoning can
LLMs perform? To enable greater study of this question, we present LegalBench:
a collaboratively constructed legal reasoning benchmark consisting of 162 tasks
covering six different types of legal reasoning. LegalBench was built through
an interdisciplinary process, in which we collected tasks designed and
hand-crafted by legal professionals. Because these subject matter experts took
a leading role in construction, tasks either measure legal reasoning
capabilities that are practically useful, or measure reasoning skills that
lawyers find interesting. To enable cross-disciplinary conversations about LLMs
in the law, we additionally show how popular legal frameworks for describing
legal reasoning -- which distinguish between its many forms -- correspond to
LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary.
This paper describes LegalBench, presents an empirical evaluation of 20
open-source and commercial LLMs, and illustrates the types of research
explorations LegalBench enables.
Comment: 143 pages, 79 tables, 4 figures
Re-Identification in Court Rulings with Simap Data
The digital transformation is gradually reaching more and more areas of the judiciary. Many courts already publish their rulings online in anonymized form. At the same time, technical tools that can also be used to re-identify these rulings are becoming ever more powerful and sophisticated. In the present study, in the field of public procurement, a comparatively simple string matching with Simap project numbers achieved a re-identification of parties to proceedings of up to 81.2 percent.
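The core idea, as described, is a comparatively simple string match on project numbers that appear both in anonymized rulings and in public tender records. A hypothetical sketch follows; the six-digit ID format, the field names, and the toy data are our assumptions, not the study's actual pattern or corpus:

```python
import re

# Assumed format of a Simap project number for illustration only.
PROJECT_ID = re.compile(r"\b\d{6}\b")

def match_rulings_to_tenders(rulings, tenders):
    """Link each ruling to the bidders of tenders sharing a project number."""
    # Index tenders by every project number they mention.
    index = {}
    for tender in tenders:
        for pid in PROJECT_ID.findall(tender["text"]):
            index.setdefault(pid, []).append(tender["bidder"])
    # A ruling citing an indexed project number is matched to those bidders.
    matches = {}
    for ruling in rulings:
        for pid in PROJECT_ID.findall(ruling["text"]):
            if pid in index:
                matches.setdefault(ruling["id"], set()).update(index[pid])
    return matches

rulings = [{"id": "ruling_1", "text": "The award in tender 123456 was contested."}]
tenders = [{"bidder": "Acme AG", "text": "Project 123456 awarded after evaluation."}]
print(match_rulings_to_tenders(rulings, tenders))  # → {'ruling_1': {'Acme AG'}}
```

Because the identifier survives anonymization verbatim, no model is needed at all: a single exact-match join can re-link a redacted ruling to the named parties in the public record.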