16 research outputs found

    LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

    The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning—which distinguish between its many forms—correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.

    Predicting consensus in legal document interpretation

    No full text

    Numerals under negation: Empirical findings

    No full text
    Despite a vast literature on the semantics and pragmatics of cardinal numerals, it has gone largely unnoticed that they exhibit a variety of polarity sensitivity, in that they require contextual support to occur felicitously in the scope of sentential negation. We present the results of a corpus analysis and two experiments that demonstrate that negated cardinals are acceptable when the negated value has been asserted or otherwise explicitly mentioned in the preceding discourse context, but unacceptable when such a value is neither mentioned nor inferable from that context. In this, bare cardinals exhibit both similarities to and differences from other types of numerical expressions. We propose an account of our findings based on the notion of convexity of linguistic meanings (Gärdenfors 2004) and discuss the implications for the semantics of numerical expressions more generally.

    Pragmatic Enrichment with Semantically Exhaustive Alternatives

    No full text