Modeling Behavior in Truth Value Judgment Task Experiments
Truth Value Judgment Task (TVJT) experiments are a common means of investigating pragmatic competence, particularly with regard to scalar inference. We present a novel quantitative linking function from pragmatic competence to participant behavior on TVJTs, based on a Bayesian probabilistic model of linguistic production. Our model captures a range of observed phenomena on TVJTs, including intermediate responses on a non-binary scale, population- and individual-level variation, participant endorsement of false utterances, and variation in response due to so-called scalar diversity.
Modeling cross-linguistic production of referring expressions
We present a novel probabilistic model of referring expression production, synthesizing recent analyses proposed within the Rational Speech Act (RSA) framework (Frank and Goodman, 2012). Our model makes incremental utterance choice predictions (Cohn-Gordon et al., 2018a; Cohn-Gordon et al., 2018b) and assumes a non-deterministic semantics for adjectives in referring expressions (Degen et al., 2020). The model captures previously attested production patterns in reference game experiments, including English speakers’ tendency to produce redundant color adjectives more frequently than redundant size adjectives, as well as Spanish speakers’ tendency to employ redundant color adjectives less frequently than English speakers. We report the predictions made by the model under various parameter regimes, motivating future empirical work.
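The core RSA production rule the abstract builds on is well established (Frank and Goodman, 2012): a pragmatic speaker chooses utterances in proportion to how well a literal listener would recover the intended referent. The following toy reference game is an illustrative sketch only; the objects, utterances, and Boolean semantics are invented for the example and are not the model described in the abstract.

```python
# Minimal RSA sketch for a toy reference game.
# Assumptions: three hypothetical objects, four one-word utterances,
# a Boolean semantics, and a standard softmax speaker with rationality
# parameter alpha. None of this is the paper's actual model.

objects = ["blue_square", "blue_circle", "green_circle"]
utterances = {
    "blue":   {"blue_square", "blue_circle"},
    "green":  {"green_circle"},
    "square": {"blue_square"},
    "circle": {"blue_circle", "green_circle"},
}

def literal_listener(u):
    """L0: uniform belief over the objects the utterance is true of."""
    consistent = utterances[u]
    return {o: (1.0 / len(consistent) if o in consistent else 0.0)
            for o in objects}

def pragmatic_speaker(target, alpha=1.0):
    """S1: P(u | target) proportional to exp(alpha * log L0(target | u))."""
    scores = {}
    for u in utterances:
        p = literal_listener(u)[target]
        # p ** alpha == exp(alpha * log p) for p > 0
        scores[u] = p ** alpha if p > 0 else 0.0
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}

probs = pragmatic_speaker("blue_square")
# "square" uniquely identifies the target here, so it outscores "blue"
```

In this toy setup the speaker prefers "square" (probability 2/3) over "blue" (1/3) because "square" is more informative about the target; models in this family add incrementality, continuous adjective semantics, and cost terms on top of this basic production rule.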
LegalBench: A Collaboratively Built Benchmark for Measuring Legal Reasoning in Large Language Models
The advent of large language models (LLMs) and their adoption by the legal community has given rise to the question: what types of legal reasoning can LLMs perform? To enable greater study of this question, we present LegalBench: a collaboratively constructed legal reasoning benchmark consisting of 162 tasks covering six different types of legal reasoning. LegalBench was built through an interdisciplinary process, in which we collected tasks designed and hand-crafted by legal professionals. Because these subject matter experts took a leading role in construction, tasks either measure legal reasoning capabilities that are practically useful, or measure reasoning skills that lawyers find interesting. To enable cross-disciplinary conversations about LLMs in the law, we additionally show how popular legal frameworks for describing legal reasoning, which distinguish between its many forms, correspond to LegalBench tasks, thus giving lawyers and LLM developers a common vocabulary. This paper describes LegalBench, presents an empirical evaluation of 20 open-source and commercial LLMs, and illustrates the types of research explorations LegalBench enables.
Numerals under negation: Empirical findings
Despite a vast literature on the semantics and pragmatics of cardinal numerals, it has gone largely unnoticed that they exhibit a variety of polarity sensitivity, in that they require contextual support to occur felicitously in the scope of sentential negation. We present the results of a corpus analysis and two experiments that demonstrate that negated cardinals are acceptable when the negated value has been asserted or otherwise explicitly mentioned in the preceding discourse context, but unacceptable when such a value is neither mentioned nor inferable from that context. In this, bare cardinals exhibit both similarities to and differences from other types of numerical expressions. We propose an account of our findings based on the notion of convexity of linguistic meanings (Gärdenfors 2004) and discuss the implications for the semantics of numerical expressions more generally.
The cross-linguistic order of adjectives and nouns may be the result of iterated pragmatic pressures on referential communication
The world's languages differ in how they order adjectives and nouns relative to each other. We ask whether cross-linguistic variation and systematicity in adjective-noun order can be explained by the iterated pressure for pragmatic referential communication. To this end, we apply the Rational Speech Act framework with an iterated learning mechanism to study how cooperative pressures may shape typological regularities in referential communication. First, we show that the less informative adjectives are relative to nouns, the more likely they are to occur post-nominally. This is the case when informativeness is manipulated via the composition of the lexical space (i.e., changing the relative number of adjectives vs. nouns that are available for reference), and via the inherent referential utility of adjectives vs. nouns. Second, we show that under the assumption that nouns are on average more informative than adjectives, the model predicts a cross-linguistic distribution of ordering preferences that qualitatively resembles the empirical one, with these biases becoming further entrenched with iterated language use. Taken together, these results suggest a possible pathway by which syntactic preferences become calcified over time as the result of pragmatic communicative pressures on language.
Informativity and accessibility in incremental production of the dative alternation
Variation in the use of syntactic alternations has long been an explanatory target of language production theories. In this work, we test the predictions of several semantic, pragmatic, and psycholinguistic theories of language use on the English dative alternation. We first experimentally test the role of incremental constituent informativity in the dative alternation and find that, contrary to information-structural and RSA models of production, informativity has little effect on production preferences. We then focus more rigorously on accessibility effects, demonstrating that a lossy-context automatic policy can recover a key pattern of accessibility. Ultimately, we conclude that audience design pressures likely do not influence incremental production, but may simply affect planning at a broader scope.