The role of word frequency and morpho-orthography in agreement processing
Agreement attraction in comprehension (when an ungrammatical verb is read quickly if preceded by a feature-matching local noun) is well described by a cue-based retrieval framework. This suggests a role for lexical retrieval in attraction. To examine this, we manipulated two probabilistic factors known to affect lexical retrieval: local noun word frequency and morpho-orthography (agreement morphology realised with or without –s endings) in a self-paced reading study. Noun number and word frequency affected noun and verb region reading times, with higher-frequency words not eliciting attraction. Morpho-orthography impacted verb processing but not attraction: atypical plurals led to slower verb reading times regardless of verb number. Exploratory individual difference analyses further underscore the importance of lexical retrieval dynamics in sentence processing. This provides evidence that agreement operates via a cue-based retrieval mechanism over lexical representations that vary in their strength and association to number features.
Polyglot Semantic Parsing in APIs
Traditional approaches to semantic parsing (SP) work by training individual
models for each available parallel dataset of text-meaning pairs. In this
paper, we explore the idea of polyglot semantic translation, or learning
semantic parsing models that are trained on multiple datasets and natural
languages. In particular, we focus on translating text to code signature
representations using the software component datasets of Richardson and Kuhn
(2017a,b). The advantage of such models is that they can be used for parsing a
wide variety of input natural languages and output programming languages, or
mixed input languages, using a single unified model. To facilitate modeling of
this type, we develop a novel graph-based decoding framework that achieves
state-of-the-art performance on the above datasets, and apply this method to
two other benchmark SP tasks. Comment: accepted for NAACL-2018 (camera ready version)
Functional Baby Talk: Analysis of Code Fragments from Novice Haskell Programmers
What kinds of mistakes are made by novice Haskell developers as they learn about functional programming? Is it possible to analyze these errors in order to improve the pedagogy of Haskell? In 2016, we delivered a massive open online course which featured an interactive code evaluation environment. We captured and analyzed 161K interactions from learners. We report typical novice developer behavior; for instance, the mean time spent on an interactive tutorial is around eight minutes. Although our environment was restricted, we gain some understanding of Haskell novice errors. Parenthesis mismatches, lexical scoping errors and do block misunderstandings are common. Finally, we make recommendations about how such beginner code evaluation environments might be enhanced.
Improving Hypernymy Extraction with Distributional Semantic Classes
In this paper, we show how distributionally-induced semantic classes can be
helpful for extracting hypernyms. We present methods for inducing sense-aware
semantic classes using distributional semantics and using these induced
semantic classes for filtering noisy hypernymy relations. Denoising of
hypernyms is performed by labeling each semantic class with its hypernyms. On
the one hand, this allows us to filter out wrong extractions using the global
structure of distributionally similar senses. On the other hand, we infer
missing hypernyms via label propagation to cluster terms. We conduct a
large-scale crowdsourcing study showing that processing of automatically
extracted hypernyms using our approach improves the quality of the hypernymy
extraction in terms of both precision and recall. Furthermore, we show the
utility of our method in the domain taxonomy induction task, achieving
state-of-the-art results on a SemEval'16 task on taxonomy induction. Comment: In Proceedings of the 11th Conference on Language Resources and
Evaluation (LREC 2018). Miyazaki, Japan
The C++0x "Concepts" Effort
C++0x is the working title for the revision of the ISO standard of the C++
programming language that was originally planned for release in 2009 but that
was delayed to 2011. The largest language extension in C++0x was "concepts",
that is, a collection of features for constraining template parameters. In
September of 2008, the C++ standards committee voted the concepts extension
into C++0x, but then in July of 2009, the committee voted the concepts
extension back out of C++0x.
This article is my account of the technical challenges and debates within the
"concepts" effort in the years 2003 to 2009. To provide some background, the
article also describes the design space for constrained parametric
polymorphism, or what is colloquially known as constrained generics. While this
article is meant to be generally accessible, the writing is aimed toward
readers with background in functional programming and programming language
theory. This article grew out of a lecture at the Spring School on Generic and
Indexed Programming at the University of Oxford, March 2010.
Constraint Generation for the Jeeves Privacy Language
Our goal is to present a completed semantic formalization of the Jeeves privacy language evaluation engine, based on the original Jeeves constraint semantics defined by Yang et al. at POPL12, but sufficiently strong to support a first complete implementation thereof. Specifically, we present and implement a syntactically and semantically completed concrete syntax for Jeeves that meets the example criteria given in the paper. We also present and implement the associated translation to J, but here formulated by a completed and decompositional operational semantic formulation. Finally, we present an enhanced and decompositional, non-substitutional operational semantic formulation and implementation of the J evaluation engine (the dynamic semantics) with privacy constraints. In particular, we show how implementing the constraints can be defined as a monad, and evaluation can be defined as a monadic operation on the constraint environment. The implementations are all completed in Haskell, utilizing its almost one-to-one capability to transparently reflect the underlying semantic reasoning when formalized this way. In practice, we have applied the "literate" program facility of Haskell to this report, a feature that enables the source LaTeX to also serve as the source code for the implementation (skipping the report parts as comment regions). The implementation is published as a GitHub project.
Finding The Lazy Programmer's Bugs
Traditionally, developers and testers created huge numbers of explicit tests, enumerating interesting cases, perhaps
biased by what they believe to be the current boundary conditions of the function being tested. Or at
least, they were supposed to.
A major step forward was the development of property testing. Property testing requires the user to write a few
functional properties that are used to generate tests, and requires an external library or tool to create test data
for the tests. As such, many thousands of tests can be created for a single property. For the purely functional
programming language Haskell there are several such libraries; for example QuickCheck [CH00], SmallCheck
and Lazy SmallCheck [RNL08].
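
To make the idea of a property concrete, here is a minimal sketch of property testing. It is written in Python with only the standard library, purely for illustration; the abstract itself concerns Haskell libraries such as QuickCheck, and this is not their API. The names `check_property`, `gen_int_list`, `prop_reverse_twice` and `prop_sorted_is_identity` are hypothetical.

```python
import random


def check_property(prop, gen, trials=1000, seed=0):
    """Run `prop` on `trials` generated inputs.

    Returns the first counterexample found, or None if every trial passed.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def gen_int_list(rng, max_len=10):
    """Hypothetical generator: a random list of small integers."""
    return [rng.randint(-100, 100) for _ in range(rng.randint(0, max_len))]


def prop_reverse_twice(xs):
    """Property that should always hold: reversing twice is the identity."""
    return list(reversed(list(reversed(xs)))) == xs


def prop_sorted_is_identity(xs):
    """Deliberately false property: claims every list is already sorted."""
    return sorted(xs) == xs


if __name__ == "__main__":
    print(check_property(prop_reverse_twice, gen_int_list))
    print(check_property(prop_sorted_is_identity, gen_int_list))
```

One property thus stands in for many individual test cases, but the property itself still has to be written by hand.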
Unfortunately, property testing still requires the user to write explicit tests. Fortunately, we note there are
already many implicit tests present in programs. Developers may throw assertion errors, or the compiler may
silently insert runtime exceptions for incomplete pattern matches.
We attempt to automate the testing process using these implicit tests. Our contributions are in four main
areas: (1) We have developed algorithms to automatically infer appropriate constructors and functions needed
to generate test data without requiring additional programmer work or annotations. (2) To combine the
constructors and functions into test expressions we take advantage of Haskell's lazy evaluation semantics by
applying the techniques of needed narrowing and lazy instantiation to guide generation. (3) We keep the type
of test data at its most general, in order to prevent committing too early to monomorphic types that cause
needless wasted tests. (4) We have developed novel ways of creating Haskell case expressions to inspect elements
inside returned data structures, in order to discover exceptions that may be hidden by laziness, and to make
our test data generation algorithm more expressive.
In order to validate our claims, we have implemented these techniques in Irulan, a fully automatic tool for
generating systematic black-box unit tests for Haskell library code. We have designed Irulan to generate high
coverage test suites and detect common programming errors in the process.
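
The notion of implicit tests described in this abstract can be illustrated with a rough Python analogue. This is only a sketch under stated assumptions: it exhaustively tries a small, hand-supplied set of candidate arguments, whereas the abstract describes inferring constructors automatically and guiding generation with needed narrowing and lazy instantiation in Haskell. The functions `head`, `safe_div` and `find_crashes` are hypothetical and are not part of Irulan.

```python
import itertools


def head(xs):
    # Implicit test: indexing an empty list raises IndexError, much like an
    # incomplete pattern match raising a runtime exception.
    return xs[0]


def safe_div(a, b):
    # Implicit test: an assertion written by the developer.
    assert b != 0, "divisor must be non-zero"
    return a // b


def find_crashes(func, candidates, arity):
    """Call `func` on every combination of `arity` candidate arguments.

    Returns a list of (args, exception type name) for calls that raised.
    """
    crashes = []
    for args in itertools.product(candidates, repeat=arity):
        try:
            func(*args)
        except Exception as exc:
            crashes.append((args, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    print(find_crashes(head, [[], [1], [1, 2]], 1))
    print(find_crashes(safe_div, [0, 1, 2], 2))
```

No properties are written here: the only oracle is whether a call raises, which is the sense in which the tests are implicit.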
Inflectional morphology and compounding in English : a single route, associative memory based account
Native English speakers include irregular plurals in English compounds (e. g., mice
chaser) more frequently than regular plurals (e. g., *rats chaser) (Gordon, 1985).
This dissociation in inflectional morphology has been argued to stem from an
internal and innate morphological constraint as it is thought that the input to which
English speaking children are exposed is insufficient to signal that regular plurals are
prohibited in compounds but irregulars might be allowed (Marcus, Brinkmann,
Clahsen, Weise & Pinker, 1995). In addition, this dissociation in English compounds
has been invoked to support the idea that regular and irregular morphology are
mediated by separate cognitive systems (Pinker, 1999). It is argued in this thesis,
however, that the constraint on English compounds can be derived from the general
frequencies and patterns in which the two types of plural (regular and irregular) and
the possessive morpheme occur in the input. In English both plurality (on regular
nouns) and possession are denoted by a [-s] morpheme. It is argued that the
constraint on the use of plurals in English compounds occurs because of competition
between these two identical morphemes. Regular plurals are excluded before a
second noun because the pattern -noun-[-s] morpheme- noun- is reserved for
marking possession in English. Irregular plurals do not end in the [-s] morpheme and
as such do not compete with the possessive marker and consequently may be
optionally included in compounds. Interestingly, plurals are allowed in compounds
in other languages where this competitive relationship does not exist (e. g. Dutch
(Schreuder, Neijt, van der Weide & Baayen, 1998) and French (Murphy, 2000)). As
well as not being in competition with the possessive structure, irregular plurals also
occur relatively infrequently in the input compared to regular plurals. This
imbalance between the frequency of regular and irregular plurals in compounds also
affects the way the two types of plural are treated in compounds. Thus there is no
need for an innate mechanism to explain the treatment of plurals in English
compounds. There is enough evidence available in the input to constrain the
formation of compound words in English.