Efficient deep processing of Japanese
We present a broad-coverage Japanese grammar written in the HPSG formalism with MRS semantics. The grammar is created for use in real-world applications, so robustness and performance issues play an important role. It is connected to a POS tagging and word segmentation tool. This grammar is being developed in a multilingual context, requiring MRS structures that are easily comparable across languages.
Modeling Global Syntactic Variation in English Using Dialect Classification
This paper evaluates global-scale dialect identification for 14 national
varieties of English as a means for studying syntactic variation. The paper
makes three main contributions: (i) introducing data-driven language mapping as
a method for selecting the inventory of national varieties to include in the
task; (ii) producing a large and dynamic set of syntactic features using
grammar induction rather than focusing on a few hand-selected features such as
function words; and (iii) comparing models across both web corpora and social
media corpora in order to measure the robustness of syntactic variation across
registers.
Automatic acquisition of Spanish LFG resources from the Cast3LB treebank
In this paper, we describe the automatic annotation of the Cast3LB Treebank with LFG f-structures for the subsequent extraction of Spanish probabilistic grammar and lexical resources. We adapt the approach and methodology developed for English by Cahill et al. (2004), O’Donovan et al. (2004) and related work to Spanish and the Cast3LB treebank encoding. We report on the quality and coverage of the automatic f-structure annotation. Following the pipeline and integrated models of Cahill et al. (2004), we extract wide-coverage probabilistic LFG approximations and parse unseen Spanish text into f-structures. We also extend Bikel’s (2002) Multilingual Parse Engine to include a Spanish language module. Using the retrained Bikel parser in the pipeline model gives the best results against a manually constructed gold standard (73.20% preds-only f-score). We also extract Spanish lexical resources: 4090 semantic form types with 98 frame types. Subcategorised prepositions and particles are included in the frames.
Translating and Evolving: Towards a Model of Language Change in DisCoCat
The categorical compositional distributional (DisCoCat) model of meaning
developed by Coecke et al. (2010) has been successful in modeling various
aspects of meaning. However, it fails to model the fact that language can
change. We give an approach to DisCoCat that allows us to represent language
models and translations between them, enabling us to describe translations from
one language to another, or changes within the same language. We unify the
product space representation given in (Coecke et al., 2010) and the functorial
description in (Kartsaklis et al., 2013), in a way that allows us to view a
language as a catalogue of meanings. We formalize the notion of a lexicon in
DisCoCat, and define a dictionary of meanings between two lexicons. All this is
done within the framework of monoidal categories. We give examples of how to
apply our methods, and give a concrete suggestion for compositional translation
in corpora.
Comment: In Proceedings CAPNS 2018, arXiv:1811.0270