159 research outputs found
Tell Me Everything You Know: A Conversation Update System for the Rational Speech Acts Framework
The Rational Speech Acts (RSA) framework has been applied to an increasing number of linguistic phenomena. Despite its promise as a model of conversational reasoning, it has rarely been used to model more than a single conversation turn. I propose a system for conversation update in the RSA framework that allows iterative simulations of production and comprehension. I explore three key issues: how to simulate the Common Ground; how to update the Common Ground and participant belief states; and how to select observations.
Shifting the Perspectival Landscape: Methods for Encoding, Identifying, and Selecting Perspectives
This dissertation explores the semantics and pragmatics of perspectival expressions. Perspective, or point-of-view, encompasses an individual's thoughts, perceptions, and location. Many expressions in natural language have components of their meanings that shift depending on whose perspective they are evaluated against. In this dissertation, I explore two sets of questions relating to perspective sensitivity. The first set of questions relates to how perspective is encoded in the semantics of perspectival expressions. The second set of questions relates to how conversation participants treat perspectival expressions: the speaker's selection of a perspective and the listener's identification of the speaker's perspective.
In Part I, I explore the landscape of perspectival expressions by exploring different semantic mechanisms for encoding the perspective holder. In Chapter 2, I introduce key properties of perspectival expressions through a discussion of one canonical perspectival expression: the motion verb come. In Chapter 3, I discuss the various ways of encoding the perspective holder in the semantics of perspectival expressions. I contrast the predictions of these approaches and lay out a set of diagnostics to guide the analysis of perspectival expressions.
I present two case studies using this set of diagnostics. In Chapter 3, I probe the semantics of the well-studied perspectival expression come in American English, and argue in favor of a perspective-anaphoric analysis. In Chapter 4, I focus on an expression that has not previously been recognized as perspectival, the temporal adverbial tomorrow. Through a series of experimental studies, I make the case that tomorrow is perspective-sensitive for some American English speakers, and narrow the hypothesis space for a perspectival account of tomorrow. I sketch a perspective-anaphoric semantics for tomorrow, while leaving open the possibility of a logophoric analysis. I conclude Part I with a discussion of how perspectival expressions fit into the broader landscape of context sensitivity.
In Part II, I turn to a fresh set of questions about perspective: how do conversation participants select and identify perspectives? In Chapter 6, I discuss previous models of perspective production and comprehension, and factors that affect these processes, such as a bias towards the perspective of the speaker. I argue that although the selection and identification of perspective holders may be guided by simple heuristics some of the time, certain cases require a more involved reasoning system. In Chapters 7 and 8, I develop models of perspectival reasoning in comprehension and production rooted in a leading framework for pragmatic reasoning: the Rational Speech Acts framework.
In Chapter 7, I propose and implement a computational model of perspective identification. I posit that listeners reason jointly about the speaker's intended message and their adopted perspective using a mental model of the speaker's production process. I present two comprehension studies that support a key assumption of the proposed Perspectival Rational Speech Acts model: that listeners reason simultaneously over multiple perspectives to better understand the speaker's intended meaning.
In Chapter 8, I propose a model of perspective selection that mirrors the Perspectival Rational Speech Acts comprehension model. I posit that speakers reason about the listener's comprehension process in order to pick a perspective and an utterance that will maximize their chance of being understood. However, the results of the production study do not match the model's predictions. I conclude with a discussion of the challenges that the attested asymmetry between speakers and listeners poses for the Rational Speech Acts framework.
The main contributions of this dissertation are as follows: (1) a comparison of four approaches to encoding the semantics of perspective, leading to a diagnostic toolkit for perspectival expressions; (2) an experimental case study that employs the diagnostics to identify a novel perspectival expression; (3) an implemented computational model of perspective identification, supported by experimental evidence; and (4) an implemented computational model of perspective selection, which reveals further challenges in perspective production.
Negation in Colonial Valley Zapotec
This paper presents an overview of negation in Colonial Valley Zapotec (CVZ) based on a corpus of texts written in Valley Zapotec between 1565 and 1808. There are four negative markers in CVZ, two bound (ya=, qui=) and two free (aca, yaca). Standard negation employs a negative word and an optional clitic, =ti. Understanding the syntax of a historical form of Valley Zapotec allows us to make some observations about related forms in modern Valley Zapotec languages, in particular San Lucas Quiaviní Zapotec (SLQZ). For example, the morpheme =ti, which is required in clausal negation in SLQZ, is not obligatory in any negative constructions in CVZ until around 1800. In Vellon 1808, the youngest text in the corpus, we observe =ti required in one type of clausal negation. This allows us to observe details of the development of the modern Valley Zapotec negation system, including the fact that the remaining changes leading to obligatory =ti in clausal negation in SLQZ must have occurred within the last 200 years.
Do All Minority Languages Look the Same to GPT-3? Linguistic (Mis)information in a Large Language Model
Guess Who's Coming (and Who's Going): Bringing Perspective to the Rational Speech Acts Framework
We present a Rational Speech Acts approach to modeling how conversation participants reason about perspectival expressions. The interpretation of perspectival expressions, such as the motion verbs 'come' and 'go', depends on the point-of-view from which they are evaluated. In order to interpret a perspectival expression, the listener must jointly reason about the speaker's intended message and their choice of perspective. We propose a Bayesian approach to this inference problem and describe an extension of the Rational Speech Acts model that incorporates perspective. We lay out three sets of predictions that this model makes relating to the lexical semantics of go, the cost of non-speaker perspectives, and marginal inference over worlds.
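The joint inference described here can be illustrated with a small Rational Speech Acts sketch. This is a toy model under invented assumptions (two motion events, two perspectives, uniform priors, and a rationality parameter alpha; none of these choices or numbers come from the paper): the pragmatic listener reasons jointly over worlds and perspectives, then marginalizes out the perspective.

```python
# Toy domain (invented for illustration; not from the paper): two motion
# events and two perspectives the speaker might adopt.
WORLDS = ["toward_speaker", "toward_listener"]
PERSPECTIVES = ["speaker", "listener"]
UTTERANCES = ["come", "go"]

def meaning(u, w, p):
    """'come' is true iff motion is toward the perspective holder; 'go' otherwise."""
    toward_holder = (w == f"toward_{p}")
    return toward_holder if u == "come" else not toward_holder

def normalize(d):
    z = sum(d.values())
    return {k: v / z for k, v in d.items()} if z else d

def literal_listener(u, p):
    # L0(w | u, p): truth-conditional interpretation under a fixed perspective.
    return normalize({w: float(meaning(u, w, p)) for w in WORLDS})

def speaker(w, p, alpha=4.0):
    # S1(u | w, p) proportional to exp(alpha * log L0(w | u, p)):
    # a soft-max rational speaker (l0 ** alpha, with 0 ** alpha = 0).
    scores = {u: literal_listener(u, p).get(w, 0.0) ** alpha
              for u in UTTERANCES}
    return normalize(scores)

def pragmatic_listener(u):
    # L1(w, p | u) proportional to P(w) P(p) S1(u | w, p): joint inference
    # over world and perspective (uniform priors), then marginalization
    # over perspectives to recover P(w | u).
    joint = normalize({(w, p): speaker(w, p).get(u, 0.0)
                       for w in WORLDS for p in PERSPECTIVES})
    marginal = {}
    for (w, _p), pr in joint.items():
        marginal[w] = marginal.get(w, 0.0) + pr
    return marginal
```

With uniform priors, 'come' comes out perfectly ambiguous between the two motion events; a prior bias toward the speaker's perspective would shift this marginal, which is the kind of prediction the model is designed to probe.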
NetKAT: Semantic Foundations for Networks
Recent years have seen growing interest in high-level languages for programming networks. But the design of these languages has been largely ad hoc, driven more by the needs of applications and the capabilities of network hardware than by foundational principles. The lack of a semantic foundation has left language designers with little guidance in determining how to incorporate new features, and programmers without a means to reason precisely about their code. This paper presents NetKAT, a new network programming language that is based on a solid mathematical foundation and comes equipped with a sound and complete equational theory. We describe the design of NetKAT, including primitives for filtering, modifying, and transmitting packets; operators for combining programs in parallel and in sequence; and a Kleene star operator for iteration. We show that NetKAT is an instance of a canonical and well studied mathematical structure called a Kleene algebra with tests (KAT) and prove that its equational theory is sound and complete with respect to its denotational semantics. Finally, we present practical applications of the equational theory including syntactic techniques for checking reachability properties, proving the correctness of compilation and optimization algorithms, and establishing a non-interference property that ensures isolation between programs. Supported in part by the NSF under grant CNS-1111698, the ONR under award N00014-12-1-0757, a Sloan Research Fellowship, and a Google Research Award.
Knowledge Transfer from High-Resource to Low-Resource Programming Languages for Code LLMs
Over the past few years, Large Language Models of Code (Code LLMs) have started to have a significant impact on programming practice. Code LLMs are also emerging as a building block for research in programming languages and software engineering. However, the quality of code produced by a Code LLM varies significantly by programming language. Code LLMs produce impressive results on programming languages that are well represented in their training data (e.g., Java, Python, or JavaScript), but struggle with low-resource languages, like OCaml and Racket.
This paper presents an effective approach for boosting the performance of Code LLMs on low-resource languages using semi-synthetic data. Our approach generates high-quality datasets for low-resource languages, which can then be used to fine-tune any pretrained Code LLM. Our approach, called MultiPL-T, translates training data from high-resource languages into training data for low-resource languages. We apply our approach to generate tens of thousands of new, validated training items for Racket, OCaml, and Lua from Python. Moreover, we use an open dataset (The Stack) and model (StarCoderBase), which allow us to decontaminate benchmarks and train models on this data without violating the model license.
With MultiPL-T generated data, we present fine-tuned versions of StarCoderBase that achieve state-of-the-art performance for Racket, OCaml, and Lua on benchmark problems. For Lua, our fine-tuned model achieves the same performance on the MultiPL-E benchmarks as StarCoderBase achieves on Python, a very high-resource language. For Racket and OCaml, we double their performance on MultiPL-E, bringing their performance close to higher-resource languages such as Ruby and C#.
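The "validated training items" step described above can be sketched as a test-based filter: a translated function is kept only if its translated unit tests pass. The sketch below is hypothetical (the names `passes_tests` and `filter_items` are ours, and it executes Python for simplicity; the real pipeline would invoke the target language's toolchain, e.g. racket, ocaml, or lua).

```python
import os
import subprocess
import sys
import tempfile

def passes_tests(candidate_code, test_code, timeout=10):
    """Run a candidate translation together with its translated unit tests
    in a fresh subprocess; keep the item only if the process exits cleanly."""
    program = candidate_code + "\n\n" + test_code
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(program)
        path = f.name
    try:
        result = subprocess.run([sys.executable, path],
                                capture_output=True, timeout=timeout)
        ok = result.returncode == 0
    except subprocess.TimeoutExpired:
        ok = False  # non-terminating translations are discarded too
    finally:
        os.unlink(path)
    return ok

def filter_items(items):
    """items: iterable of (candidate_code, test_code) pairs, e.g. model
    translations of a high-resource-language function plus its tests."""
    return [(c, t) for c, t in items if passes_tests(c, t)]
```

Filtering on executable tests is what makes the data semi-synthetic rather than purely synthetic: the model proposes translations, but only mechanically validated ones enter the fine-tuning set.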
A Scalable and Extensible Approach to Benchmarking NL2Code for 18 Programming Languages
Large language models have demonstrated the ability to condition on and generate both natural language and programming language text. Such models open up the possibility of multi-language code generation: could code generation models generalize knowledge from one language to another? Although contemporary code generation models can generate semantically correct Python code, little is known about their abilities with other languages. We facilitate the exploration of this topic by proposing MultiPL-E, the first multi-language parallel benchmark for natural-language-to-code generation.
MultiPL-E extends the HumanEval benchmark (Chen et al., 2021) to support 18 more programming languages, encompassing a range of programming paradigms and popularity. We evaluate two state-of-the-art code generation models on MultiPL-E: Codex and InCoder. We find that on several languages, Codex matches and even exceeds its performance on Python. The range of programming languages represented in MultiPL-E allows us to explore the impact of language frequency and language features on model performance. Finally, the MultiPL-E approach of compiling code generation benchmarks to new programming languages is both scalable and extensible. We describe a general approach for easily adding support for new benchmarks and languages to MultiPL-E.
Longitudinal Assessment of Growth in Hypoplastic Left Heart Syndrome: Results From the Single Ventricle Reconstruction Trial
Background: We sought to characterize growth between birth and age 3 years in infants with hypoplastic left heart syndrome who underwent the Norwood procedure. Methods and Results: We performed a secondary analysis using the Single Ventricle Reconstruction Trial database after excluding patients 2 SD below normal). Failure to find consistent risk factors supports the strategy of tailoring nutritional therapies to patient- and stage-specific targets. Clinical Trial Registration URL: http://clinicaltrials.gov/. Unique identifier: NCT00115934