    LEXTREME: A Multi-Lingual and Multi-Task Benchmark for the Legal Domain

    Lately, propelled by the phenomenal advances around the transformer architecture, the legal NLP field has enjoyed spectacular growth. To measure progress, well-curated and challenging benchmarks are crucial. However, most benchmarks are English-only, and in legal NLP specifically there is no multilingual benchmark available yet. Additionally, many benchmarks are saturated, with the best models clearly outperforming the best humans and achieving near-perfect scores. We survey the legal NLP literature and select 11 datasets covering 24 languages, creating LEXTREME. To provide a fair comparison, we propose two aggregate scores, one based on the datasets and one on the languages. The best baseline (XLM-R large) achieves both a dataset aggregate score and a language aggregate score of 61.3. This indicates that LEXTREME is still very challenging and leaves ample room for improvement. To make it easy for researchers and practitioners to use, we release LEXTREME on Hugging Face together with all the code required to evaluate models and a public Weights and Biases project with all the runs.
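
    The abstract's two aggregate scores suggest a simple way to summarize per-task results. The sketch below averages a model's scores first within each dataset and then within each language; the exact aggregation used by LEXTREME may differ, and the task names and score values here are invented placeholders, not results from the paper.

        # Illustrative sketch only: the aggregation scheme, task names and scores
        # are assumptions, not the official LEXTREME evaluation code.
        from collections import defaultdict
        from statistics import mean

        # (dataset, language) -> macro-F1, placeholder numbers
        scores = {
            ("judgment_prediction", "de"): 68.2,
            ("judgment_prediction", "fr"): 65.9,
            ("topic_classification", "de"): 57.4,
            ("topic_classification", "it"): 54.1,
        }

        by_dataset, by_language = defaultdict(list), defaultdict(list)
        for (dataset, language), f1 in scores.items():
            by_dataset[dataset].append(f1)
            by_language[language].append(f1)

        # Dataset aggregate: mean of per-dataset means; language aggregate: mean of per-language means.
        dataset_aggregate = mean(mean(v) for v in by_dataset.values())
        language_aggregate = mean(mean(v) for v in by_language.values())
        print(f"dataset aggregate: {dataset_aggregate:.1f}, language aggregate: {language_aggregate:.1f}")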

    MultiLegalPile: A 689GB Multilingual Legal Corpus

    Large, high-quality datasets are crucial for training Large Language Models (LLMs). However, so far, few datasets are available for specialized, critical domains such as law, and the available ones often cover only English. We curate and release MULTILEGALPILE, a 689GB corpus in 24 languages from 17 jurisdictions. The MULTILEGALPILE corpus, which includes diverse legal data sources with varying licenses, allows for pretraining NLP models under fair use, with more permissive licenses for the Eurlex Resources and Legal mC4 subsets. We pretrain two RoBERTa models and one Longformer multilingually, and 24 monolingual models, one on each of the language-specific subsets, and evaluate them on LEXTREME. Additionally, we evaluate the English and multilingual models on LexGLUE. Our multilingual models set a new SotA on LEXTREME and our English models on LexGLUE. We release the dataset, the trained models, and all of the code under the most open possible licenses.
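
    Since the corpus weighs in at 689GB, streaming it rather than downloading it outright is the practical way to inspect it. The following is a minimal sketch assuming the corpus is published on the Hugging Face Hub; the repository ID, config name, and text field are assumptions to be checked against the released dataset card.

        # Hypothetical sketch: the Hub ID, the "<language>_<source>" config name and the
        # "text" field are assumptions; consult the released dataset card for the real ones.
        from datasets import load_dataset

        # Stream instead of downloading: at 689GB the full corpus will not fit on most disks.
        corpus = load_dataset(
            "joelito/Multi_Legal_Pile",  # assumed repository ID
            "de_caselaw",                # assumed config: German case law subset
            split="train",
            streaming=True,
        )

        for i, example in enumerate(corpus):
            print(example["text"][:200])  # assumed text field name
            if i == 2:
                break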

    Introduction: Language in Contact: Yesterday – Today – Tomorrow

    The symposium Language in Contact: Yesterday – Today – Tomorrow took place June 21–23, 2017, and was organized by the Graduate School Language & Literature Munich - Class of Language. Scholars using interdisciplinary approaches were invited to Munich and conveyed both traditional and innovative insights into the vast field of language contact. This included both diachronic (Yesterday) and synchronic (Today) contributions as well as papers discussing the future of contact linguistics (Tomorrow). At the symposium, language contact was defined in a broad sense as the language that emerges when speakers of different languages influence one another's speech; this brought together multiple areas of linguistic study ranging from language change and language policy to language acquisition and language processing. Key to the conference was connecting what we can learn from past instances of language contact to our understanding of language phenomena in present and future research.

    SCALE: Scaling up the Complexity for Advanced Language Model Evaluation

    Recent strides in Large Language Models (LLMs) have saturated many NLP benchmarks (even professional domain-specific ones), emphasizing the need for novel, more challenging ones to properly assess LLM capabilities. In this paper, we introduce a novel NLP benchmark that poses challenges to current LLMs across four key dimensions: processing long documents (up to 50K tokens), utilizing domain-specific knowledge (embodied in legal texts), multilingual understanding (covering five languages), and multitasking (comprising legal document-to-document Information Retrieval, Court View Generation, Leading Decision Summarization, Citation Extraction, and eight challenging Text Classification tasks). Our benchmark comprises diverse legal NLP datasets from the Swiss legal system, allowing for a comprehensive study of the underlying non-English, inherently multilingual, federal legal system. Despite recent advances, efficiently processing long documents for intense review/analysis tasks remains an open challenge for language models. Also, comprehensive, domain-specific benchmarks requiring high expertise to develop are rare, as are multilingual benchmarks. This scarcity underscores our contribution's value, considering most public models are trained predominantly on English corpora, while other languages remain understudied, particularly for practical domain-specific NLP tasks. Our benchmark allows for testing and advancing state-of-the-art LLMs. As part of our study, we evaluate several pre-trained multilingual language models on our benchmark to establish strong baselines as a point of reference. Despite the large size of our datasets (tens to hundreds of thousands of examples), existing publicly available models struggle with most tasks, even after in-domain pretraining. We publish all resources (benchmark suite, pre-trained models, code) under a fully permissive open CC BY-SA license.
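
    One common workaround for the long-document challenge highlighted above is to split each text into overlapping windows that fit a model's context before classification or retrieval. The sketch below shows this with the Hugging Face tokenizer API; the model name, window length, and stride are arbitrary illustrative choices, not the setup used in the paper.

        # Illustrative sliding-window chunking for long legal documents.
        # Model name, window length (512) and stride (128) are arbitrary assumptions.
        from transformers import AutoTokenizer

        tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")

        # Stand-in for a court ruling that is far longer than the model's context window.
        long_document = "Das Bundesgericht zieht in Erwägung, dass ... " * 2000

        # return_overflowing_tokens splits the input into 512-token windows,
        # with 128 overlapping tokens so context is not lost at window boundaries.
        encoded = tokenizer(
            long_document,
            max_length=512,
            stride=128,
            truncation=True,
            return_overflowing_tokens=True,
        )

        print(f"number of 512-token windows: {len(encoded['input_ids'])}")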

    Language change for the worse

    Many theories hold that language change, at least on a local level, is driven by a need for improvement. The present volume explores to what extent this assumption holds true, and whether there is a particular type of language change that we dub language change for the worse, i.e., change with a worsening effect that cannot be explained away as a side-effect of improvement in some other area of the linguistic system. The chapters of the volume, written by leading junior and senior scholars, combine expertise in diachronic and historical linguistics, typology, and formal modelling. They focus on different aspects of grammar (phonology, morphosyntax, semantics) in a variety of language families (Germanic, Romance, Austronesian, Bantu, Jê-Kaingang, Wu Chinese, Greek, Albanian, Altaic, Indo-Aryan, and languages of the Caucasus). The volume contributes to ongoing theoretical debates and discussions between linguists with different theoretical orientations.
