Computational Analyses of Arabic Morphology
This paper demonstrates how a (multi-tape) two-level formalism can be used to
write two-level grammars for Arabic non-linear morphology using a high level,
but computationally tractable, notation. Three illustrative grammars are
provided based on CV-, moraic- and affixational analyses. These are
complemented by a proposal for handling the hitherto computationally untreated
problem of the broken plural. It will be shown that the best grammars for
describing Arabic non-linear morphology are moraic in the case of templatic
stems, and affixational in the case of a-templatic stems. The paper will
demonstrate how the broken plural can be derived under two-level theory via the
'implicit' derivation of the singular.
Comment: to appear in Narayanan A., Ditters E. (eds), The Linguistic Computation
of Arabic. uuencoded, compressed .ps file, 27 pages
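To make the multi-tape idea concrete, here is a toy Python sketch. It is not the paper's two-level grammar. A root tape, a vocalism tape and a CV template tape are combined into a surface stem. The function name `interdigitate` is hypothetical, and so is the convention that the last element of a short melody spreads rightwards. The root k-t-b 'write' is the familiar textbook example.

```python
def interdigitate(template: str, root: str, vocalism: str) -> str:
    """Toy multi-tape interdigitation (hypothetical helper, not from the paper).

    Root consonants are associated left to right with the C slots of the
    template and vowels with the V slots; when a melody runs out, its last
    element spreads to the remaining slots of that type (assumed convention).
    """
    tapes = {"C": root, "V": vocalism}
    used = {"C": 0, "V": 0}
    surface = []
    for slot in template:
        melody = tapes[slot]
        if not melody:
            raise ValueError(f"no melody for {slot} slot")
        # Left-to-right association, spreading the last element if needed.
        surface.append(melody[min(used[slot], len(melody) - 1)])
        used[slot] += 1
    # Every melody element must be linked to at least one slot.
    for kind, melody in tapes.items():
        if used[kind] < len(melody):
            raise ValueError(f"{kind} melody {melody!r} has unassociated elements")
    return "".join(surface)


print(interdigitate("CVCVC", "ktb", "a"))    # katab
print(interdigitate("CVVCVC", "ktb", "a"))   # kaatab
print(interdigitate("CVCVC", "ktb", "ui"))   # kutib
```

Naive spreading is only a starting point. For a CVCCVC template it yields *katbab, whereas the attested form II stem is kattab. Cases like this are why fuller analyses need more than simple left-to-right association.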
Two-level grammars: Some interesting properties of van Wijngaarden grammars.
The van Wijngaarden grammars are two-level grammars that present many interesting properties. In the present article I elaborate on six of these properties, to wit, (i) their being constituted by two grammars, (ii) their ability to generate (possibly infinitely many) strict languages and their own metalanguage, (iii) their context-sensitivity, (iv) their high descriptive power, (v) their productivity, or the ability to generate an infinite number of production rules, and (vi) their equivalence with the unrestricted, or Type-0, Chomsky grammars.
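Properties (i), (iii) and (v) can be seen in the standard textbook W-grammar for {a^n b^n c^n}. The Python sketch below is an illustration and is not taken from the article. A metagrammar makes the metanotion N range over the unary numerals i, ii, iii, and so on. A hypergrammar then uses N in its rules, and each hyperrule is turned into strict rules by consistent substitution: every occurrence of N in a rule receives the same value. Because N is unbounded, the hyperrules stand for infinitely many strict rules. The cap `max_n` is an assumption added only to keep the rule set finite. All function names are hypothetical.

```python
# Metagrammar: N :: i ; N i.  The metanotion N ranges over unary numerals.
def metanotion_values(max_n):
    return ["i" * k for k in range(1, max_n + 1)]


# Hypergrammar: "{N}" marks the metanotion; quoted symbols are terminals.
HYPERRULES = [("s", ("a {N}", "b {N}", "c {N}"))]                  # s: a N, b N, c N.
for x in "abc":
    HYPERRULES.append((f"{x} {{N}}i", (f"{x} {{N}}", f"'{x}'")))  # x N i: x N, 'x'.
    HYPERRULES.append((f"{x} i", (f"'{x}'",)))                     # x i: 'x'.


def strict_rules(max_n):
    """Consistent substitution: each value of N replaces every {N} in a rule."""
    produced = set()
    for value in metanotion_values(max_n):
        for lhs, rhs in HYPERRULES:
            produced.add((lhs.format(N=value),
                          tuple(sym.format(N=value) for sym in rhs)))
    rules = {}
    for lhs, rhs in produced:
        rules.setdefault(lhs, []).append(rhs)
    return rules


def expand(symbol, rules):
    """Derive the terminal string of a non-start symbol (exactly one rule each here)."""
    if symbol.startswith("'"):
        return symbol[1:-1]
    (rhs,) = rules[symbol]
    return "".join(expand(s, rules) for s in rhs)


def language(max_n):
    """All sentences derivable from the start symbol s with N capped at max_n."""
    rules = strict_rules(max_n)
    words = ("".join(expand(s, rules) for s in rhs) for rhs in rules["s"])
    return sorted(words, key=len)


print(language(3))  # ['abc', 'aabbcc', 'aaabbbccc']
```

Suppose each occurrence of N in the rule for s were chosen independently instead of consistently. The grammar would then also derive strings such as aabc. The consistency requirement is therefore where the extra, context-sensitive power comes from.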
Toric grammars: a new statistical approach to natural language modeling
We propose a new statistical model for computational linguistics. Rather than
trying to estimate directly the probability distribution of a random sentence
of the language, we define a Markov chain on finite sets of sentences with many
finite recurrent communicating classes and define our language model as the
invariant probability measures of the chain on each recurrent communicating
class. This Markov chain, which we call a communication model, randomly
recombines at each step the set of sentences forming its current state, using
some grammar rules. When the grammar rules are fixed and known in advance instead of
being estimated on the fly, we can prove supplementary mathematical properties.
In particular, we can prove in this case that all states are recurrent states,
so that the chain defines a partition of its state space into finite recurrent
communicating classes. We show that our approach is a decisive departure from
Markov models at the sentence level and discuss its relationship with
context-free grammars. Although the toric grammars we use are closely related to
context-free grammars, the way we generate the language from the grammar is
qualitatively different. Our communication model has two purposes. On the one
hand, it is used to indirectly define the probability distribution of a random
sentence of the language. On the other hand, it can serve as a (crude) model of
language transmission from one speaker to another through the
communication of a (large) set of sentences.
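The abstract defines the language model through the invariant probability measures of a finite Markov chain on its recurrent communicating classes. The generic computation can be sketched in Python as follows, assuming an explicit transition matrix. In the paper, the states are finite sets of sentences and the transitions come from grammar-based recombination; neither is modelled here. The sketch relies on a standard fact: in a finite chain, a communicating class is recurrent exactly when it is closed. On each such class it solves pi Q = pi with sum(pi) = 1, using exact fractions. The toy matrix deliberately includes a transient state (0) to show the partition step, although the abstract notes that with fixed grammar rules all states are recurrent. The function names are hypothetical.

```python
from fractions import Fraction


def reachable(P, i):
    """States reachable from i (including i) via positive-probability steps."""
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v, p in enumerate(P[u]):
            if p > 0 and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen


def recurrent_classes(P):
    """Communicating classes that are closed, i.e. recurrent in a finite chain."""
    reach = [reachable(P, i) for i in range(len(P))]
    classes, assigned = [], set()
    for i in range(len(P)):
        if i in assigned:
            continue
        cls = {j for j in reach[i] if i in reach[j]}  # states communicating with i
        assigned |= cls
        if reach[i] == cls:  # nothing outside the class is reachable
            classes.append(sorted(cls))
    return classes


def invariant_measure(P, cls):
    """Solve pi Q = pi, sum(pi) = 1 for the chain restricted to a closed class."""
    k = len(cls)
    # Row j encodes sum_i pi_i Q[i][j] - pi_j = 0.
    A = [[Fraction(P[cls[i]][cls[j]]) - (1 if i == j else 0) for i in range(k)]
         for j in range(k)]
    A[-1] = [Fraction(1)] * k  # replace one redundant equation by normalisation
    b = [Fraction(0)] * (k - 1) + [Fraction(1)]
    M = [row + [rhs] for row, rhs in zip(A, b)]
    # Gauss-Jordan elimination with exact rational arithmetic.
    for col in range(k):
        pivot = next(r for r in range(col, k) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(k):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return {cls[i]: M[i][k] / M[i][i] for i in range(k)}


half = Fraction(1, 2)
P = [
    [0, half, 0, half],  # state 0 is transient
    [0, 0, 1, 0],
    [0, half, half, 0],
    [0, 0, 0, 1],        # state 3 is absorbing
]
for cls in recurrent_classes(P):
    print(cls, invariant_measure(P, cls))
# [1, 2] {1: Fraction(1, 3), 2: Fraction(2, 3)}
# [3] {3: Fraction(1, 1)}
```

Exact fractions keep this small example checkable by hand. For realistic state spaces, floating-point or sparse linear algebra would be the natural substitute.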