Formulaic language
The notion of formulaicity has received increasing attention in disciplines and areas as diverse as linguistics, literary studies, art theory and art history. In recent years, linguistic studies of formulaicity have been flourishing, and the notion has been approached from various methodological and theoretical perspectives and with various purposes in mind. The linguistic approach to formulaicity is still in a state of rapid development, and the objective of the current volume is to present current explorations in the field. The papers collected in the volume make numerous suggestions for further development of the field and are arranged into three complementary parts. The first part, with three chapters, presents new theoretical and methodological insights as well as their practical application in the development of custom-designed software tools for the identification and exploration of formulaic language in texts. Two papers in the second part explore formulaic language in the context of language learning. Finally, the third part, with three chapters, showcases descriptive research on formulaic language conducted primarily from the perspectives of corpus linguistics and translation studies. The volume will be of interest to anyone involved in the study of formulaic language from either a theoretical or a practical perspective.
Semi-supervised lexical acquisition for wide-coverage parsing
State-of-the-art parsers suffer from incomplete lexicons, as evidenced by the fact
that they all contain built-in methods for dealing with out-of-lexicon items at parse
time. Since new labelled data is expensive to produce and no amount of it will conquer
the long tail, we attempt to address this problem by leveraging the enormous amount of
raw text available for free, and expanding the lexicon offline, with a semi-supervised
word learner. We accomplish this with a method similar to self-training, where a fully
trained parser is used to generate new parses with which the next generation of parser
is trained.
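The self-training loop described above can be sketched as follows. This is a minimal illustrative reduction, not the thesis's implementation: the toy unigram "parser" and the names train_parser and self_train are placeholders standing in for a full statistical parser.

```python
from collections import Counter

def train_parser(data):
    # Toy stand-in for parser training: a unigram word-to-tag model.
    model = Counter()
    for word, tag in data:
        model[(word, tag)] += 1
    def parse(word):
        # Pick the most frequent tag seen for this word; back off to "N".
        tags = [t for (w, t), _ in model.most_common() if w == word]
        return (word, tags[0] if tags else "N")
    return parse

def self_train(labelled, raw_words, generations=2):
    # Self-training: each generation is trained on the labelled data
    # plus the previous generation's own output on raw text.
    parser = train_parser(labelled)
    data = list(labelled)
    for _ in range(generations):
        data = data + [parser(w) for w in raw_words]
        parser = train_parser(data)
    return parser
```

The key design point, as in the abstract, is that new supervision comes for free from raw text: the fully trained parser labels unlabelled input, and the next generation is trained on the union.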
This thesis introduces Chart Inference (CI), a two-phase word-learning method
with Combinatory Categorial Grammar (CCG), operating on the level of the partial
parse as produced by a trained parser. CI uses the parsing model and lexicon to identify
the CCG category type for one unknown word in a context of known words by inferring
the type of the sentence using a model of end punctuation, then traversing the chart
from the top down, filling in each empty cell as a function of its mother and its sister.
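The top-down filling step can be pictured as inverting CCG function application: given a mother cell's category and the known sister's category, the unknown cell's category is the functor that would combine with the sister to yield the mother. The helper below is a hypothetical illustration of that one inference step, not the thesis's actual CI implementation, and the categories are plain strings rather than a full CCG type system.

```python
def infer_unknown(mother, sister, sister_side):
    """Invert CCG application: unknown + sister must yield mother.

    A sister on the right makes the unknown a forward functor
    mother/sister; a sister on the left makes it a backward
    functor mother\\sister.
    """
    if sister_side == "right":
        return f"({mother}/{sister})"
    return f"({mother}\\{sister})"

# e.g. in "John sneezed.", with sentence type S inferred from the
# end punctuation and "John" known to be NP, the unknown verb gets:
infer_unknown("S", "NP", "left")  # → (S\NP)
```

Applied recursively from the inferred sentence type down through the chart, each empty cell is resolved from its mother and sister in this way until the unknown word's category is reached.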
We first specify the CI algorithm, and then compare it to two baseline word-learning
systems over a battery of learning tasks. CI is shown to outperform the
baselines in every task, and to function in a number of applications, including grammar
acquisition and domain adaptation. This method performs consistently better than
self-training, and improves upon the standard POS-backoff strategy employed by the
baseline StatCCG parser by adding new entries to the lexicon.
The first learning task establishes lexical convergence over a toy corpus, showing
that CI’s ability to accurately model a target lexicon is more robust to initial conditions
than either of the baseline methods. We then introduce a novel natural language corpus
based on children’s educational materials, which is fully annotated with CCG derivations.
We use this corpus as a testbed to establish that CI is capable in principle of
recovering the whole range of category types necessary for a wide-coverage lexicon.
We then increase the complexity of the learning task using the CCGbank corpus,
a CCG translation of the Penn Treebank, and show that CI improves as its initial
seed corpus grows. The next experiment uses CCGbank as the seed and attempts to recover
missing question-type categories in the TREC question answering corpus. The final
task extends the coverage of the CCGbank-trained parser by running CI over the raw
text of the Gigaword corpus. Where appropriate, a fine-grained error analysis is also
undertaken to supplement the quantitative evaluation of parser performance with
deeper linguistic analysis of the lexicon and parsing model.
The Meaning of Syntactic Dependencies
This paper discusses the semantic content of syntactic dependencies. We assume that syntactic dependencies play a central role in the process of semantic interpretation, and define them as selective functions on word denotations. Among their properties, special attention is paid to their ability to make interpretation co-compositional and incremental. To describe the semantic properties of dependencies, the paper focuses on two particular linguistic tasks: word sense disambiguation and attachment resolution. The second task is performed using a strategy based on automatic acquisition from corpora.