2,160 research outputs found
A declarative characterization of different types of multicomponent tree adjoining grammars
Multicomponent Tree Adjoining Grammars (MCTAGs) are a formalism that has been shown to be useful for many natural language applications. The definition of non-local MCTAG, however, is problematic since it refers to the process of the derivation itself: a simultaneity constraint must be respected concerning the way the members of the elementary tree sets are added. Looking only at the result of a derivation (i.e., the derived tree and the derivation tree), this simultaneity is no longer visible and therefore cannot be checked. That is, this way of characterizing MCTAG does not allow one to abstract away from the concrete order of derivation. In this paper, we propose an alternative definition of MCTAG that characterizes the trees in the tree language of an MCTAG via the properties of the derivation trees (in the underlying TAG) that the MCTAG licenses. We provide similar characterizations for various types of MCTAG. These characterizations give a better understanding of the formalisms, allow a more systematic comparison of different types of MCTAG, and, furthermore, can be exploited for parsing.
On Hilberg's Law and Its Links with Guiraud's Law
Hilberg (1990) supposed that finite-order excess entropy of a random human
text is proportional to the square root of the text length. Assuming that
Hilberg's hypothesis is true, we derive Guiraud's law, which states that the
number of word types in a text is greater than proportional to the square root
of the text length. Our derivation is based on a mathematical conjecture in
coding theory and on several experiments suggesting that words can be defined
approximately as the nonterminals of the shortest context-free grammar for the
text. Such an operational definition of words can be applied even to texts
deprived of spaces, which do not allow for Mandelbrot's "intermittent
silence" explanation of Zipf's and Guiraud's laws. In contrast to
Mandelbrot's, our model assumes some probabilistic long-memory effects in human
narration and might be capable of explaining Menzerath's law.
Comment: To appear in Journal of Quantitative Linguistics
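The quantity that Guiraud's law relates, the number of word types against the square root of the text length, can be illustrated with a minimal sketch. It assumes whitespace-separated tokens, which is not the grammar-based definition of words used in the abstract. The function names `guiraud_index` and `type_growth` are hypothetical.

```python
import math


def guiraud_index(tokens):
    """Return V / sqrt(N): distinct tokens (types) over the square root of the token count."""
    n = len(tokens)
    if n == 0:
        raise ValueError("cannot compute the index of an empty text")
    return len(set(tokens)) / math.sqrt(n)


def type_growth(tokens, step):
    """Return (N, V) pairs: the number of types V seen after each multiple of `step` tokens."""
    seen = set()
    points = []
    for i, tok in enumerate(tokens, start=1):
        seen.add(tok)
        if i % step == 0:
            points.append((i, len(seen)))
    return points


# Assumption: naive whitespace tokenization, for illustration only.
sample = "a b a c".split()
index = guiraud_index(sample)
growth = type_growth(sample, 2)
```

If the number of types grows roughly like the square root of the text length, the ratio returned by `guiraud_index` stays roughly constant as longer prefixes of a text are processed.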
Aperiodicity, Star-freeness, and First-order Definability of Structured Context-Free Languages
A classic result in formal language theory is the equivalence among
noncounting, or aperiodic, regular languages, and languages defined through
star-free regular expressions, or first-order logic. Together with first-order
completeness of linear temporal logic these results constitute a theoretical
foundation for model-checking algorithms. Extending these results to structured
subclasses of context-free languages, such as tree languages, did not work as
smoothly: for instance, W. Thomas showed that there are star-free tree languages
that are counting. We show, instead, that investigating the same properties
within the family of operator precedence languages leads to equivalences that
perfectly match those on regular languages. The study of this old family of
context-free languages has recently been resumed, not only to enhance parsing
(the original motivation of its inventor, R. Floyd) but also to exploit its
algebraic and logical properties. We have been able to reproduce the classic
results of regular languages for this much larger class by going back to string
languages rather than tree languages. Since operator precedence languages
strictly include other classes of structured languages such as visibly pushdown
languages, the same results given in this paper hold as a trivial corollary for
that family too.
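For the classic regular case mentioned at the start of the abstract, the noncounting (aperiodic) property can be checked mechanically. The sketch below is an illustration only and does not cover the operator precedence construction of the paper. It assumes the language is given as a complete, minimal DFA, so that its transition monoid coincides with the syntactic monoid, and it tests whether every monoid element m satisfies m^(k+1) = m^k for some k. The names `compose`, `transition_monoid`, and `is_aperiodic` are hypothetical.

```python
def compose(f, g):
    """Return the map 'apply f, then g' on states 0..n-1 (maps are tuples)."""
    return tuple(g[f[q]] for q in range(len(f)))


def transition_monoid(n_states, transitions):
    """Generate all state maps induced by words, starting from the identity (empty word)."""
    identity = tuple(range(n_states))
    monoid = {identity}
    frontier = [identity]
    while frontier:
        f = frontier.pop()
        for g in transitions.values():
            h = compose(f, g)  # extend the word by one letter
            if h not in monoid:
                monoid.add(h)
                frontier.append(h)
    return monoid


def is_aperiodic(n_states, transitions):
    """True iff the powers of every element stabilise, i.e. end in a cycle of length 1."""
    for m in transition_monoid(n_states, transitions):
        powers = []
        p = m
        while p not in powers:
            powers.append(p)
            p = compose(p, m)
        # p is the first repeated power; the cycle has length 1 only if it equals the last new one.
        if p != powers[-1]:
            return False
    return True


# (ab)* as a minimal complete DFA: state 0 = start/accepting, 1 = after 'a', 2 = sink.
ab_star = {"a": (1, 2, 2), "b": (2, 0, 2)}
# (aa)* over the alphabet {a}: the letter 'a' swaps the two states.
aa_star = {"a": (1, 0)}

ab_star_is_aperiodic = is_aperiodic(3, ab_star)
aa_star_is_aperiodic = is_aperiodic(2, aa_star)
```

The language (ab)* is a standard star-free example, while (aa)* counts modulo 2. The assumption of a minimal DFA matters: a non-minimal automaton can have a transition monoid that contains a nontrivial group even when the language itself is aperiodic.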