Grammaticalization and grammar
This paper is concerned with developing Joan Bybee's proposals regarding the nature of grammatical meaning and synthesizing them with Paul Hopper's concept of grammar as emergent. The basic question is this: How much of grammar may be modeled in terms of grammaticalization? In contradistinction to Heine, Claudi & Hünnemeyer (1991), who propose a fairly broad and unconstrained framework for grammaticalization, we try to present a fairly specific and constrained theory of grammaticalization in order to get a more precise idea of the potential and the problems of this approach. Thus, while Heine et al. (1991:25) expand – without discussion – the traditional notion of grammaticalization to the clause level, and even include non-segmental structure (such as word order), we will here adhere to a strictly 'element-bound' view of grammaticalization: where no grammaticalized element exists, there is no grammaticalization. Despite this fairly restricted concept of grammaticalization, we will attempt to corroborate the claim that essential aspects of grammar may be understood and modeled in terms of grammaticalization. The approach is essentially theoretical (practical applications will, hopefully, follow soon), and many issues are only mentioned rather than discussed in detail. The paper presupposes familiarity with the basic facts of grammaticalization and does not present any new facts.
From Language to Programs: Bridging Reinforcement Learning and Maximum Marginal Likelihood
Our goal is to learn a semantic parser that maps natural language utterances
into executable programs when only indirect supervision is available: examples
are labeled with the correct execution result, but not the program itself.
Consequently, we must search the space of programs for those that output the
correct result, while not being misled by spurious programs: incorrect programs
that coincidentally output the correct result. We connect two common learning
paradigms, reinforcement learning (RL) and maximum marginal likelihood (MML),
and then present a new learning algorithm that combines the strengths of both.
The new algorithm guards against spurious programs by combining the systematic
search traditionally employed in MML with the randomized exploration of RL, and
by updating parameters such that probability is spread more evenly across
consistent programs. We apply our learning algorithm to a new neural semantic
parser and show significant gains over existing state-of-the-art results on a
recent context-dependent semantic parsing task.
Comment: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (ACL 2017).
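To make the idea concrete, here is a minimal sketch (assuming numpy; the function name `gradient_weights` and the interpolation parameter `beta` are illustrative, not necessarily the paper's exact formulation) of one way a gradient update can spread probability more evenly across consistent programs while ignoring spurious candidates entirely:

```python
import numpy as np

def gradient_weights(logp, rewards, beta=0.5):
    """Weight each candidate program's contribution to the gradient.

    logp:    model log-probabilities of candidate programs found by search
    rewards: 1.0 for consistent programs (correct output), 0.0 otherwise
    beta:    beta=1 recovers an MML-style posterior over consistent
             programs; beta=0 weights all consistent programs uniformly,
             preventing one (possibly spurious) program from dominating.
    """
    p = np.exp(np.asarray(logp, dtype=float))
    r = np.asarray(rewards, dtype=float)
    merit = np.where(r > 0, (p * r) ** beta, 0.0)
    total = merit.sum()
    if total == 0.0:
        return np.zeros_like(merit)  # no consistent program found: skip update
    return merit / total

# Each candidate's gradient of log p(program) would be scaled by its weight.
weights = gradient_weights(logp=[-1.2, -0.7, -3.0], rewards=[1.0, 0.0, 1.0])
```

Intermediate values of beta trade off between trusting the model's current ranking of consistent programs and hedging across all of them.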
Frequency Value Grammar and Information Theory
I previously laid the groundwork for Frequency Value Grammar (FVG) in papers submitted to the proceedings of the 4th International Conference on Cognitive Science (2003), Sydney, Australia, and the Corpus Linguistics Conference (2003), Lancaster, UK. FVG is a formal syntax based in large part on Information Theory principles. FVG relies on dynamic physical principles external to the corpus which shape and mould the corpus, whereas generative grammar and other formal syntactic theories are based exclusively on patterns (fractals) found occurring within the well-formed portion of the corpus. However, FVG should not be confused with Probability Syntax (PS) as described by Manning (2003). PS is a corpus-based approach that yields the probability distribution of possible syntax constructions over a fixed corpus. PS makes no distinction between well- and ill-formed sentence constructions and assumes everything found in the corpus is well formed. In contrast, FVG’s primary objective is to distinguish between well- and ill-formed sentence constructions and, in so doing, relies on corpus-based parameters which determine sentence competency. In PS, a syntax of high probability will not necessarily yield a well-formed sentence. However, in FVG, a syntax or sentence construction of high ‘frequency value’ will yield a well-formed sentence at least 95% of the time, satisfying most empirical standards. Moreover, in FVG, a sentence construction of high ‘frequency value’ could very well be represented by an underlying syntactic construction of low probability as determined by PS. The characteristic ‘frequency values’ calculated in FVG are not measures of probability but are fundamentally determined values derived from exogenous principles which impact and determine the corpus-based parameters serving as an index of sentence competency. The theoretical framework of FVG has broad applications beyond formal syntax and NLP. In this paper, I will demonstrate how FVG can be used as a model for improving the upper-bound calculation of the entropy of written English. Generally speaking, when a function word precedes an open-class word, the backward n-gram analysis will be homomorphic with the information source and will result in frequency values more representative of co-occurrences in the information source.
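As a rough illustration of the backward n-gram idea (a toy sketch, not the FVG calculation itself; the corpus and function name are invented for the example), the conditional entropy of a bigram model can be estimated in both directions and compared:

```python
import math
from collections import Counter

def conditional_entropy(tokens, backward=False):
    """Estimate H(token | context) in bits under a bigram model.

    With backward=True the context is the *following* token, mirroring
    the backward n-gram analysis described above.
    """
    pairs = list(zip(tokens, tokens[1:]))           # (context, predicted)
    if backward:
        pairs = [(nxt, cur) for cur, nxt in pairs]  # condition on the next token
    joint = Counter(pairs)
    ctx = Counter(c for c, _ in pairs)
    n = len(pairs)
    return -sum((k / n) * math.log2(k / ctx[c]) for (c, _), k in joint.items())

corpus = "the cat sat on the mat and the dog sat on the log".split()
print(conditional_entropy(corpus), conditional_entropy(corpus, backward=True))
```

On text where function words systematically precede open-class words, conditioning on the following word can concentrate the distribution and lower the estimated entropy bound.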
Nonuniform Markov models
A statistical language model assigns probability to strings of arbitrary
length. Unfortunately, it is not possible to gather reliable statistics on
strings of arbitrary length from a finite corpus. Therefore, a statistical
language model must decide that each symbol in a string depends on at most a
small, finite number of other symbols in the string. In this report we propose
a new way to model conditional independence in Markov models. The central
feature of our nonuniform Markov model is that it makes predictions of varying
lengths using contexts of varying lengths. Experiments on the Wall Street
Journal reveal that the nonuniform model performs slightly better than the
classic interpolated Markov model. This result is somewhat remarkable because
both models contain identical numbers of parameters whose values are estimated
in a similar manner. The only difference between the two models is how they
combine the statistics of longer and shorter strings.
Keywords: nonuniform Markov model, interpolated Markov model, conditional independence, statistical language model, discrete time series.
Comment: 17 pages.
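For reference, here is a minimal sketch of the classic interpolated Markov model that serves as the baseline (the class name and the fixed interpolation weights are illustrative; the report estimates parameter values from data). The nonuniform model would differ only in how the statistics of longer and shorter strings are combined:

```python
from collections import defaultdict

class InterpolatedMarkov:
    """Blend counts for every context length up to `order` with weights `lams`."""

    def __init__(self, order=2, lams=(0.2, 0.3, 0.5)):
        assert len(lams) == order + 1
        self.order, self.lams = order, lams
        self.counts = [defaultdict(int) for _ in range(order + 1)]
        self.ctx_totals = [defaultdict(int) for _ in range(order + 1)]

    def train(self, seq):
        for i, sym in enumerate(seq):
            for k in range(self.order + 1):
                if i >= k:
                    ctx = tuple(seq[i - k:i])  # the k symbols before position i
                    self.counts[k][(ctx, sym)] += 1
                    self.ctx_totals[k][ctx] += 1

    def prob(self, ctx, sym):
        p = 0.0
        for k, lam in enumerate(self.lams):
            c = tuple(ctx[len(ctx) - k:]) if k else ()
            total = self.ctx_totals[k].get(c, 0)
            if total:
                p += lam * self.counts[k].get((c, sym), 0) / total
        return p

m = InterpolatedMarkov()
m.train("abracadabra")
print(m.prob("ra", "c"))  # blends unigram, bigram, and trigram statistics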
Architecture of a Web-based Predictive Editor for Controlled Natural Language Processing
In this paper, we describe the architecture of a web-based predictive text
editor being developed for the controlled natural language PENG^{ASP}. This
controlled language can be used to write non-monotonic specifications that have
the same expressive power as Answer Set Programs. In order to support the
writing process of these specifications, the predictive text editor
communicates asynchronously with the controlled natural language processor that
generates lookahead categories and additional auxiliary information for the
author of a specification text. The text editor can display multiple sets of lookahead categories simultaneously for different possible sentence completions and anaphoric expressions, and it supports the addition of new content words to the lexicon.
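A minimal sketch of the asynchronous editor-processor exchange (Python standard library only; the toy lookahead rules, the wire format, and the port are invented for illustration and are not the actual PENG^{ASP} processor):

```python
import asyncio
import json

def lookahead(tokens):
    """Hypothetical stand-in for the language processor's lookahead computation:
    map the tokens typed so far to the categories that may follow."""
    if not tokens:
        return ["determiner", "proper noun"]
    if tokens[-1] in ("a", "the", "every"):
        return ["adjective", "common noun"]
    return ["verb", "relative pronoun", "full stop"]

async def handle(reader, writer):
    # One request per connection: a JSON list of tokens in, a JSON list of
    # lookahead categories out, so the editor can refresh its completion
    # menus without blocking the author's typing.
    data = await reader.readline()
    tokens = json.loads(data)
    writer.write((json.dumps(lookahead(tokens)) + "\n").encode())
    await writer.drain()
    writer.close()

async def main():
    server = await asyncio.start_server(handle, "127.0.0.1", 8765)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```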
Human assessments of document similarity
Two studies are reported that examined the reliability of human assessments of document similarity and the association between human ratings and the results of n-gram automatic text analysis (ATA). Human interassessor reliability (IAR) was moderate to poor. However, correlations between average human ratings and n-gram solutions were strong. The average correlation between ATA and individual human solutions was greater than IAR. N-gram length influenced the strength of association, but optimum string length depended on the nature of the text (technical vs. nontechnical). We conclude that the methodology applied in previous studies may have led to overoptimistic views of human reliability, but that an optimal n-gram solution can provide a good approximation of the average human assessment of document similarity, a result with important implications for the future development of document visualization systems.
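One common n-gram ATA measure of this kind (not necessarily the one used in the studies; the function names are illustrative) is cosine similarity over character n-gram profiles, with the string length n as the tunable parameter the studies examined:

```python
from collections import Counter
from math import sqrt

def ngram_profile(text, n=3):
    """Character n-gram counts; n is the string length to tune per text type."""
    text = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    """Cosine similarity between two n-gram count profiles."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

sim = cosine(ngram_profile("neural parsing of text"),
             ngram_profile("parsing text with neural nets"))
```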
Inferring the Origin Locations of Tweets with Quantitative Confidence
Social Internet content plays an increasingly critical role in many domains,
including public health, disaster management, and politics. However, its
utility is limited by missing geographic information; for example, fewer than
1.6% of Twitter messages (tweets) contain a geotag. We propose a scalable,
content-based approach to estimate the location of tweets using a novel yet
simple variant of Gaussian mixture models. Further, because real-world
applications depend on quantified uncertainty for such estimates, we propose
novel metrics of accuracy, precision, and calibration, and we evaluate our
approach accordingly. Experiments on 13 million global, comprehensively
multi-lingual tweets show that our approach yields reliable, well-calibrated
results competitive with previous computationally intensive methods. We also
show that a relatively small amount of training data (roughly 30,000 tweets) is required for good estimates, and that models are quite time-invariant
(effective on tweets many weeks newer than the training set). Finally, we show
that toponyms and languages with small geographic footprint provide the most
useful location signals.
Comment: 14 pages, 6 figures. Version 2: mathematics moved to an appendix, 2 new references, various other presentation improvements. Version 3: various presentation improvements; accepted at ACM CSCW 2014.
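A minimal sketch of the content-based idea (assuming scikit-learn; the component counts, thresholds, and function names are illustrative rather than the paper's settings): fit a small Gaussian mixture over training coordinates per token, then score candidate locations by the summed log-density of a tweet's tokens:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_token_models(train_tweets, n_components=3, min_examples=20):
    """Fit one small GMM over (lon, lat) per token seen in geotagged tweets."""
    coords = {}
    for text, lon, lat in train_tweets:
        for tok in set(text.lower().split()):
            coords.setdefault(tok, []).append((lon, lat))
    models = {}
    for tok, pts in coords.items():
        if len(pts) >= min_examples:  # skip rare tokens with unstable estimates
            k = min(n_components, len(pts))
            models[tok] = GaussianMixture(n_components=k).fit(np.array(pts))
    return models

def estimate_location(models, text, grid):
    """Return the (lon, lat) grid point maximizing the summed token log-density."""
    toks = [t for t in set(text.lower().split()) if t in models]
    if not toks:
        return None  # no usable location signal in this tweet
    total = np.zeros(len(grid))
    for t in toks:
        total += models[t].score_samples(grid)  # log-density at each grid point
    return tuple(grid[np.argmax(total)])
```

Toponyms and small-footprint languages would show up here as tokens whose mixtures have tightly concentrated components, matching the abstract's finding about which signals are most useful.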
Applied Type System: An Approach to Practical Programming with Theorem-Proving
The framework Pure Type System (PTS) offers a simple and general approach to
designing and formalizing type systems. However, in the presence of dependent
types, there often exist certain acute problems that make it difficult for PTS
to directly accommodate many common realistic programming features such as
general recursion, recursive types, effects (e.g., exceptions, references,
input/output), etc. In this paper, Applied Type System (ATS) is presented as a
framework for designing and formalizing type systems in support of practical
programming with advanced types (including dependent types). In particular, it
is demonstrated that ATS can readily accommodate a paradigm referred to as
programming with theorem-proving (PwTP) in which programs and proofs are
constructed in a syntactically intertwined manner, yielding a practical
approach to internalizing constraint-solving needed during type-checking. The
salient feature of ATS lies in the complete separation between statics, where
types are formed and reasoned about, and dynamics, where programs are
constructed and evaluated. With this separation, it is no longer possible for a
program to occur in a type, as is otherwise allowed in PTS. The paper contains not only a formal development of ATS but also some examples taken from ATS (ats-lang.org), a programming language with a type system rooted in the ATS framework, in support of employing ATS as a framework to formulate advanced type systems for practical programming.
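ATS itself is not shown here, but a small Lean 4 analogue (a sketch conveying the flavor of the statics/dynamics separation, not ATS syntax or the paper's formal system) illustrates reasoning about a program in a layer separate from the program itself:

```lean
-- A fact about a program (List.append, the "dynamics") is stated and
-- proved in the reasoning layer (the "statics"). In ATS, such length
-- information could instead be carried by the type of append itself.
theorem length_append {α : Type} (xs ys : List α) :
    (xs ++ ys).length = xs.length + ys.length := by
  induction xs with
  | nil =>
    -- [] ++ ys computes to ys; 0 + n = n needs Nat.zero_add.
    rw [List.nil_append, List.length_nil, Nat.zero_add]
  | cons x xs ih =>
    -- Peel one cons from both sides, apply the induction hypothesis,
    -- and discharge the remaining arithmetic with omega.
    rw [List.cons_append, List.length_cons, List.length_cons, ih]
    omega
```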