159 research outputs found
Tell Me Everything You Know: A Conversation Update System for the Rational Speech Acts Framework
The Rational Speech Acts (RSA) framework has been applied to an increasing number of linguistic phenomena. Despite its promise as a model of conversational reasoning, it has rarely been used to model more than a single conversation turn. I propose a system for conversation update in the RSA framework that allows iterative simulations of production and comprehension. I explore three key issues: how to simulate the Common Ground; how to update the Common Ground and participant belief states; and how to select observations.
Shifting the Perspectival Landscape: Methods for Encoding, Identifying, and Selecting Perspectives
This dissertation explores the semantics and pragmatics of perspectival expressions. Perspective, or point-of-view, encompasses an individual's thoughts, perceptions, and location. Many expressions in natural language have components of their meanings that shift depending on whose perspective they are evaluated against. In this dissertation, I explore two sets of questions relating to perspective sensitivity. The first set of questions relates to how perspective is encoded in the semantics of perspectival expressions. The second set of questions relates to how conversation participants treat perspectival expressions: the speaker's selection of a perspective and the listener's identification of the speaker's perspective.
In Part I, I explore the landscape of perspectival expressions by exploring different semantic mechanisms for encoding the perspective holder. In Chapter 2, I introduce key properties of perspectival expressions through a discussion of one canonical perspectival expression: the motion verb come. In Chapter 3, I discuss the various ways of encoding the perspective holder in the semantics of perspectival expressions. I contrast the predictions of these approaches and lay out a set of diagnostics to guide the analysis of perspectival expressions.
I present two case studies using this set of diagnostics. In Chapter 3, I probe the semantics of the well-studied perspectival expression come in American English, and argue in favor of a perspective-anaphoric analysis. In Chapter 4, I focus on an expression that has not previously been recognized as perspectival, the temporal adverbial tomorrow. Through a series of experimental studies, I make the case that tomorrow is perspective-sensitive for some American English speakers, and narrow the hypothesis space for a perspectival account of tomorrow. I sketch a perspective-anaphoric semantics for tomorrow, while leaving open the possibility of a logophoric analysis. I conclude Part I with a discussion of how perspectival expressions fit into the broader landscape of context sensitivity.
In Part II, I turn to a fresh set of questions about perspective: how do conversation participants select and identify perspectives? In Chapter 6, I discuss previous models of perspective production and comprehension, and factors that affect these processes, such as a bias towards the perspective of the speaker. I argue that although the selection and identification of perspective holders may be guided by simple heuristics some of the time, certain cases require a more involved reasoning system. In Chapters 7 and 8, I develop models of perspectival reasoning in comprehension and production rooted in a leading framework for pragmatic reasoning: the Rational Speech Acts framework.
In Chapter 7, I propose and implement a computational model of perspective identification. I posit that listeners reason jointly about the speaker's intended message and their adopted perspective using a mental model of the speaker's production process. I present two comprehension studies that support a key assumption of the proposed Perspectival Rational Speech Acts model: that listeners reason simultaneously over multiple perspectives to better understand the speaker's intended meaning.
In Chapter 8, I propose a model of perspective selection that mirrors the Perspectival Rational Speech Acts comprehension model. I posit that speakers reason about the listener's comprehension process in order to pick a perspective and an utterance that will maximize their chance of being understood. However, the results of the production study do not match the model's predictions. I conclude with a discussion of the challenges that the attested asymmetry between speakers and listeners poses for the Rational Speech Acts framework.
The main contributions of this dissertation are as follows: (1) a comparison of four approaches to encoding the semantics of perspective, leading to a diagnostic toolkit for perspectival expressions; (2) an experimental case study that employs the diagnostics to identify a novel perspectival expression; (3) an implemented computational model of perspective identification, supported by experimental evidence; and (4) an implemented computational model of perspective selection, which reveals further challenges in perspective production.
Negation in Colonial Valley Zapotec
This paper presents an overview of negation in Colonial Valley Zapotec (CVZ) based on a corpus of texts written in Valley Zapotec between 1565 and 1808. There are four negative markers in CVZ, two bound (ya=, qui=) and two free (aca, yaca). Standard negation employs a negative word and an optional clitic, =ti. Understanding the syntax of a historical form of Valley Zapotec allows us to make some observations about related forms in modern Valley Zapotec languages, in particular San Lucas Quiaviní Zapotec (SLQZ). For example, the morpheme =ti, which is required in clausal negation in SLQZ, is not obligatory in any negative constructions in CVZ until around 1800. In Vellon 1808, the youngest text in the corpus, we observe =ti required in one type of clausal negation. This allows us to observe details of the development of the modern Valley Zapotec negation system, including the fact that the remaining changes leading to obligatory =ti in clausal negation in SLQZ must have occurred within the last 200 years.
Do All Minority Languages Look the Same to GPT-3? Linguistic (Mis)information in a Large Language Model
Guess Who's Coming (and Who's Going): Bringing Perspective to the Rational Speech Acts Framework
We present a Rational Speech Acts approach to modeling how conversation participants reason about perspectival expressions. The interpretation of perspectival expressions, such as the motion verbs 'come' and 'go', depends on the point-of-view from which they are evaluated. In order to interpret a perspectival expression, the listener must jointly reason about the speaker's intended message and their choice of perspective. We propose a Bayesian approach to this inference problem and describe an extension of the Rational Speech Acts model that incorporates perspective. We lay out three sets of predictions that this model makes relating to the lexical semantics of go, the cost of non-speaker perspectives, and marginal inference over worlds.
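The joint inference described here can be illustrated with a small Rational Speech Acts sketch. This is a toy model under invented assumptions (two motion events, two perspectives, uniform priors, and a rationality parameter alpha; none of these choices or numbers come from the paper): the pragmatic listener reasons jointly over worlds and perspectives, then marginalizes out the perspective.

```python
# Toy domain (invented for illustration; not from the paper): two motion
# events and two perspectives the speaker might adopt.
WORLDS = ["toward_speaker", "toward_listener"]
PERSPECTIVES = ["speaker", "listener"]
UTTERANCES = ["come", "go"]

def meaning(u, w, p):
    """'come' is true iff motion is toward the perspective holder; 'go' otherwise."""
    toward_holder = (w == f"toward_{p}")
    return toward_holder if u == "come" else not toward_holder

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()} if z else d

def literal_listener(u, p):
    # L0(w | u, p): truth-conditional interpretation under a fixed perspective.
    return normalize({w: float(meaning(u, w, p)) for w in WORLDS})

def speaker(w, p, alpha=4.0):
    # S1(u | w, p) proportional to exp(alpha * log L0(w | u, p)):
    # a soft-max rational speaker (l0 ** alpha, with 0 ** alpha = 0).
    scores = {u: literal_listener(u, p).get(w, 0.0) ** alpha
              for u in UTTERANCES}
    return normalize(scores)

def pragmatic_listener(u):
    # L1(w, p | u) proportional to P(w) P(p) S1(u | w, p): joint inference
    # over world and perspective (uniform priors), then marginalization
    # over perspectives to recover P(w | u).
    joint = normalize({(w, p): speaker(w, p).get(u, 0.0)
                       for w in WORLDS for p in PERSPECTIVES})
    marginal = {}
    for (w, _p), pr in joint.items():
        marginal[w] = marginal.get(w, 0.0) + pr
    return marginal
```

With uniform priors, 'come' comes out perfectly ambiguous between the two motion events; a prior bias toward the speaker's perspective would shift this marginal, which is the kind of prediction the model is designed to probe.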
NetKAT: Semantic Foundations for Networks
Recent years have seen growing interest in high-level languages for programming networks. But the design of these languages has been largely ad hoc, driven more by the needs of applications and the capabilities of network hardware than by foundational principles. The lack of a semantic foundation has left language designers with little guidance in determining how to incorporate new features, and programmers without a means to reason precisely about their code. This paper presents NetKAT, a new network programming language that is based on a solid mathematical foundation and comes equipped with a sound and complete equational theory. We describe the design of NetKAT, including primitives for filtering, modifying, and transmitting packets; operators for combining programs in parallel and in sequence; and a Kleene star operator for iteration. We show that NetKAT is an instance of a canonical and well studied mathematical structure called a Kleene algebra with tests (KAT) and prove that its equational theory is sound and complete with respect to its denotational semantics. Finally, we present practical applications of the equational theory including syntactic techniques for checking reachability properties, proving the correctness of compilation and optimization algorithms, and establishing a non-interference property that ensures isolation between programs. Supported in part by the NSF under grant CNS-1111698, the ONR under award N00014-12-1-0757, a Sloan Research Fellowship, and a Google Research Award.
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as a building block for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages, like OCaml and Racket.
This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. Our approach, called MultiPL-T, translates training data from high-resource languages into training data for low-resource languages. We apply our approach to generate tens of thousands of new, validated training items for Racket, OCaml, and Lua from Python. Moreover, we use an open dataset (The Stack) and model (StarCoderBase), which allow us to decontaminate benchmarks and train models on this data without violating the model license.
With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase that achieve state-of-the-art performance for Racket, OCaml, and Lua on benchmark problems. For Lua, our fine-tuned model achieves the same performance on the MultiPL-E benchmarks as StarCoderBase achieves on Python, a very high-resource language. For Racket and OCaml, we double their performance on MultiPL-E, bringing their performance close to higher-resource languages such as Ruby and C#.
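The "validated training items" step described above can be sketched as a test-based filter: a translated function is kept only if its translated unit tests pass. The sketch below is hypothetical (the names `passes_tests` and `filter_items` are ours, and it executes Python for simplicity; the real pipeline would invoke the target language's toolchain, e.g. racket, ocaml, or lua).

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(candidate_code, test_code, timeout=10):
    """Run a candidate translation together with its translated unit tests
    in a fresh subprocess; keep the item only if the process exits cleanly."""
    program = candidate_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False  # non-terminating translations are discarded too
    finally:
        os.unlink(path)
    return ok

def filter_items(items):
    """items: iterable of (candidate_code, test_code) pairs, e.g. model
    translations of a high-resource-language function plus its tests."""
    return [(c, t) for c, t in items if passes_tests(c, t)]
```

Filtering on executable tests is what makes the data semi-synthetic rather than purely synthetic: the model proposes translations, but only mechanically validated ones enter the fine-tuning set.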
A Scalable and Extensible Approach to Benchmarking NL2Code for 18 Programming Languages
Large language models have demonstrated the ability to condition on and generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We facilitate the exploration of this topic by proposing MultiPL-E, the first multi-language parallel benchmark for natural-language-to-code generation.
MultiPL-E extends the HumanEval benchmark (Chen et al., 2021) to support 18 more programming languages, encompassing a range of programming paradigms and popularity. We evaluate two state-of-the-art code generation models on MultiPL-E: Codex and InCoder. We find that on several languages, Codex matches and even exceeds its performance on Python. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible. We describe a general approach for easily adding support for new benchmarks and languages to MultiPL-E.
Longitudinal Assessment of Growth in Hypoplastic Left Heart Syndrome: Results From the Single Ventricle Reconstruction Trial
Background: We sought to characterize growth between birth and age 3 years in infants with hypoplastic left heart syndrome who underwent the Norwood procedure. Methods and Results: We performed a secondary analysis using the Single Ventricle Reconstruction Trial database after excluding patients 2 SD below normal). Failure to find consistent risk factors supports the strategy of tailoring nutritional therapies to patient- and stage-specific targets. Clinical Trial Registration URL: http://clinicaltrials.gov/. Unique identifier: NCT00115934