
    Learning Word Embeddings: Unsupervised Methods for Fixed-size Representations of Variable-length Speech Segments

    Fixed-length embeddings of words are very useful for a variety of tasks in speech and language processing. Here we systematically explore two methods of computing fixed-length embeddings for variable-length sequences. We evaluate their susceptibility to phonetic and speaker-specific variability on English, a high-resource language, and Xitsonga, a low-resource language, using two evaluation metrics: ABX word discrimination and ROC-AUC on same-different phoneme n-grams. We show that a simple downsampling method supplemented with length information can outperform the variable-length input feature representation on both evaluations. Recurrent autoencoders, trained without supervision, can yield even better results at the expense of increased computational complexity.
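
    As a rough sketch of the downsampling baseline described above, the snippet below maps a variable-length matrix of acoustic frames (e.g. MFCCs) to a fixed-size vector by keeping a fixed number of evenly spaced frames and appending the segment length as an extra feature. The function name, frame dimensionality, and number of kept frames are illustrative choices, not the paper's exact configuration.

```python
import numpy as np

def downsample_embedding(frames: np.ndarray, n_keep: int = 10) -> np.ndarray:
    """Map a variable-length sequence of acoustic frames (T x D) to a
    fixed-length vector: sample n_keep frames at evenly spaced positions,
    flatten them, and append the original length as an extra feature."""
    T, _ = frames.shape
    idx = np.linspace(0, T - 1, num=n_keep).round().astype(int)  # evenly spaced frame indices
    embedding = frames[idx].reshape(-1)                          # (n_keep * D,)
    return np.concatenate([embedding, [float(T)]])               # append length information

# Example: a 57-frame segment of 13-dimensional MFCC-like features
# becomes a 10 * 13 + 1 = 131-dimensional embedding.
segment = np.random.randn(57, 13)
print(downsample_embedding(segment).shape)  # (131,)
```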

    LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models

    The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning, which distinguish between its many forms, correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
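
    As a minimal sketch of how a single LegalBench-style classification task might be scored with a zero-shot prompt, the snippet below loads one task and computes accuracy against gold answers. The Hugging Face dataset id, task name, column names, and label set are assumptions about the public LegalBench release and should be checked against the benchmark's repository; the `predict` callable stands in for whatever LLM is being evaluated.

```python
from datasets import load_dataset

# Dataset id, task name, and column names are assumptions -- verify them
# against the LegalBench repository before relying on this sketch.
task = load_dataset("nguha/legalbench", "abercrombie", split="test")

def build_prompt(example: dict) -> str:
    # Minimal zero-shot prompt: instruction plus the input text.
    return (
        "Classify the trademark below as one of: generic, descriptive, "
        "suggestive, arbitrary, fanciful.\n\n"
        f"Mark: {example['text']}\nAnswer:"
    )

def accuracy(predict, dataset) -> float:
    """predict: a callable mapping a prompt string to a label string (an LLM wrapper)."""
    correct = sum(
        predict(build_prompt(ex)).strip().lower() == ex["answer"].strip().lower()
        for ex in dataset
    )
    return correct / len(dataset)
```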

    Computational Statutory Reasoning

    Statutory reasoning is the task of determining how laws apply to a legal case. This is a basic skill for lawyers and, in its computational form, a fundamental task for legal Artificial Intelligence (AI) systems. In this thesis, I take a first step towards solving computational statutory reasoning. First, I define computational statutory reasoning in the context of legal practice and of AI more broadly (Chapter 1). I detail why statutory reasoning is important for legal AI, and how solving it will require addressing multiple problems in general AI research. In Chapter 2, I review research in reasoning, knowledge representation and legal AI that is relevant to the work in this thesis. My first contribution is the StAtutory Reasoning Assessment dataset (SARA), a benchmark dataset for computational statutory reasoning (Chapter 3). With the ability to measure performance on statutory reasoning, I show how a symbolic system can solve the SARA dataset, while state-of-the-art Machine Reading (MR) struggles. In Chapter 4, I connect statutory reasoning to established natural language processing tasks, in an attempt to diagnose MR errors. This yields more annotations on SARA, and a performance boost compared to my initial MR baselines. In Chapter 5, I return to the symbolic approach of Chapter 3. Revising the ontology used in Chapter 3, I introduce models for Information Extraction (IE) from SARA cases. The attained performance opens up new perspectives on how to solve statutory reasoning. I close by summarizing the contributions of this thesis, and expanding on possible future research (Chapter 6).
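
    To make the idea of a symbolic approach concrete, here is a toy Python analogue of encoding a fictional statutory provision as an executable predicate over case facts; it is not the thesis's actual system, ontology, or rule language, and the provision and facts below are invented purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Case:
    # Invented, simplified case facts for illustration only.
    is_married: bool
    spouse_died_during_year: bool
    filed_jointly: bool

def toy_joint_return_provision(case: Case) -> bool:
    """Toy statutory predicate (NOT real tax law): a joint return is valid
    if the taxpayers are married, or one spouse died during the year,
    and a joint return was in fact filed."""
    return (case.is_married or case.spouse_died_during_year) and case.filed_jointly

print(toy_joint_return_provision(Case(True, False, True)))   # True
print(toy_joint_return_provision(Case(False, False, True)))  # False
```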

    Asking the Right Questions in Low Resource Template Extraction

    Information Extraction (IE) researchers are mapping tasks to Question Answering (QA) in order to leverage existing large QA resources and thereby improve data efficiency. Especially in template extraction (TE), mapping an ontology to a set of questions can be more time-efficient than collecting labeled examples. We ask whether end users of TE systems can design these questions, and whether it is beneficial to involve an NLP practitioner in the process. We compare questions to other ways of phrasing natural language prompts for TE. We propose a novel model to perform TE with prompts, and find that it benefits from questions over other styles of prompts, and that questions do not require an NLP background to author.
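
    The sketch below illustrates the general slot-as-question idea with an off-the-shelf extractive QA model from the transformers library; the slot names, questions, model checkpoint, and confidence threshold are illustrative assumptions, not the paper's proposed TE model or ontology.

```python
from transformers import pipeline

# Hypothetical slot-to-question mapping for a toy "acquisition" template.
slot_questions = {
    "acquirer": "Which company made the acquisition?",
    "acquired": "Which company was acquired?",
    "price": "How much was paid for the acquisition?",
}

# Off-the-shelf extractive QA model; any SQuAD-style checkpoint would do.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

def extract_template(document: str, threshold: float = 0.3) -> dict:
    """Fill each slot by asking its question against the document;
    leave the slot empty when the model's confidence is low."""
    filled = {}
    for slot, question in slot_questions.items():
        result = qa(question=question, context=document)
        filled[slot] = result["answer"] if result["score"] >= threshold else None
    return filled

print(extract_template("Acme Corp acquired Widget Inc. for $2 billion in March."))
```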

    Can GPT-3 Perform Statutory Reasoning?

    Statutory reasoning is the task of reasoning with facts and statutes, which are rules written in natural language by a legislature. It is a basic legal skill. In this paper we explore the capabilities of the most capable GPT-3 model, text-davinci-003, on an established statutory-reasoning dataset called SARA. We consider a variety of approaches, including dynamic few-shot prompting, chain-of-thought prompting, and zero-shot prompting. While we achieve results with GPT-3 that are better than the previous best published results, we also identify several types of clear errors it makes. We investigate why these errors happen. We discover that GPT-3 has imperfect prior knowledge of the actual U.S. statutes on which SARA is based. More importantly, we create simple synthetic statutes, which GPT-3 is guaranteed not to have seen during training. We find GPT-3 performs poorly at answering straightforward questions about these simple synthetic statutes.
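
    As a rough illustration of the synthetic-statute probe, the sketch below builds a chain-of-thought style prompt from an invented statute and case and sends it through the legacy (pre-1.0) openai Completions interface that matched the text-davinci-003 era; the statute, case, and prompt wording are made up, text-davinci-003 has since been deprecated, and an API key is assumed to be set in the environment.

```python
import openai  # legacy (pre-1.0) interface; assumes OPENAI_API_KEY is set

# An invented synthetic statute and case, in the spirit of the paper's probe;
# the wording is made up for illustration and is not taken from SARA.
statute = (
    "Section 101. A person is a qualifying resident if the person was "
    "present in the state for at least 200 days during the year."
)
case = "Alice was present in the state for 150 days in 2017."
question = "Is Alice a qualifying resident under Section 101? Answer yes or no."

# Chain-of-thought style prompt: ask the model to reason before answering.
prompt = f"{statute}\n\n{case}\n\n{question}\nLet's think step by step."

response = openai.Completion.create(
    model="text-davinci-003",  # deprecated; substitute a current model
    prompt=prompt,
    temperature=0,
    max_tokens=256,
)
print(response["choices"][0]["text"])
```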