On Hilberg's Law and Its Links with Guiraud's Law

Author: Łukasz Dębowski
Publication venue: Informa UK Limited
Publication date: 07/07/2005

Hilberg (1990) supposed that the finite-order excess entropy of a random human text is proportional to the square root of the text length. Assuming that Hilberg's hypothesis is true, we derive Guiraud's law, which states that the number of word types in a text is greater than proportional to the square root of the text length. Our derivation is based on a mathematical conjecture in coding theory and on several experiments suggesting that words can be defined approximately as the nonterminals of the shortest context-free grammar for the text. Such an operational definition of words can be applied even to texts deprived of spaces, which do not allow for Mandelbrot's "intermittent silence" explanation of Zipf's and Guiraud's laws. In contrast to Mandelbrot's, our model assumes some probabilistic long-memory effects in human narration and might be capable of explaining Menzerath's law.
Comment: To appear in Journal of Quantitative Linguistics
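As a rough illustration of the quantity Guiraud's law is about, the sketch below counts word types V against text length N and reports the ratio V / sqrt(N). It is not the paper's grammar-based definition of words: splitting on whitespace, the function name `guiraud_ratio`, and the sample sentence are all assumptions made for this example.

```python
import math


def guiraud_ratio(tokens):
    """Return (N, V, V / sqrt(N)) for a list of word tokens.

    N is the text length in tokens and V is the number of distinct
    word types. The name and the return shape are illustrative only.
    """
    n = len(tokens)
    if n == 0:
        raise ValueError("need at least one token")
    v = len(set(tokens))
    return n, v, v / math.sqrt(n)


# Hypothetical sample text; whitespace tokenization is an assumption.
sample = "the cat sat on the mat and the dog sat on the log"
n, v, ratio = guiraud_ratio(sample.split())
print(n, v, ratio)
```

Tracking how this ratio changes as N grows over prefixes of a longer text would be one way to inspect the square-root relationship empirically.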

arXiv.org e-Print Archive

Crossref

E-Generalization Using Grammars

Author: Burghardt Jochen
Publication date: 21/03/2017

We extend the notion of anti-unification to cover equational theories and present a method based on regular tree grammars to compute a finite representation of E-generalization sets. We present a framework to combine Inductive Logic Programming and E-generalization that includes an extension of Plotkin's lgg theorem to the equational case. We demonstrate the potential power of E-generalization by three example applications: computation of suggestions for auxiliary lemmas in equational inductive proofs, computation of construction laws for given term sequences, and learning of screen editor command sequences.
Comment: 49 pages, 16 figures; the author address given in the header is now outdated; full version of an article in the "Artificial Intelligence Journal"; appeared as a technical report in 2003. An open-source C implementation and some examples are found in the ancillary files
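For readers unfamiliar with anti-unification, here is a minimal sketch of the plain syntactic case (Plotkin's least general generalization of two first-order terms), which is the notion the abstract extends. It does not handle equational theories or tree grammars. The term encoding (strings for constants, tuples for compound terms) and the generated variable names `X0`, `X1`, ... are assumptions of this example.

```python
def lgg(s, t, table=None):
    """Syntactic least general generalization of two first-order terms.

    A term is a string (constant) or a tuple (functor, arg1, ..., argN).
    Differing subterm pairs are replaced by variables 'X0', 'X1', ...;
    the same pair is always mapped to the same variable.
    Assumes no constant is itself named like a generated variable.
    """
    if table is None:
        table = {}
    if s == t:
        return s
    same_shape = (
        isinstance(s, tuple)
        and isinstance(t, tuple)
        and s[0] == t[0]
        and len(s) == len(t)
    )
    if same_shape:
        # Same functor and arity: generalize argument by argument.
        args = tuple(lgg(a, b, table) for a, b in zip(s[1:], t[1:]))
        return (s[0],) + args
    # Otherwise introduce (or reuse) a variable for this pair of subterms.
    if (s, t) not in table:
        table[(s, t)] = f"X{len(table)}"
    return table[(s, t)]


# f(a, g(a)) and f(b, g(b)) generalize to f(X0, g(X0)).
print(lgg(("f", "a", ("g", "a")), ("f", "b", ("g", "b"))))
```

Reusing one variable per pair of differing subterms is what makes the result least general: both occurrences of the a/b mismatch share `X0` rather than receiving two unrelated variables.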

arXiv.org e-Print Archive

CiteSeerX

On Language Processors and Software Maintenance

Author: Lohmann Wolfgang (gnd: 138536171)
Publication venue: Universität Rostock
Publication date: 01/01/2009

This work investigates declarative transformation tools in the context of software maintenance. Besides maintenance of the language specification, evolution of a software language requires the adaptation of the software written in that language as well as the adaptation of the software that transforms software written in the evolving language. This co-evolution is studied to derive automatic adaptations of artefacts from adaptations of the language specification. Furthermore, AOP for Prolog is introduced to improve the maintainability of language specifications and derived tools.

Rostocker Dokumentenserver

Parsing Inside-Out

Author: Goodman Joshua
Publication date: 01/01/1998

The inside-outside probabilities are typically used for reestimating Probabilistic Context Free Grammars (PCFGs), just as the forward-backward probabilities are typically used for reestimating HMMs. I show several novel uses, including improving parser accuracy by matching parsing algorithms to evaluation criteria; speeding up DOP parsing by 500 times; and 30 times faster PCFG thresholding at a given accuracy level. I also give an elegant, state-of-the-art grammar formalism, which can be used to compute inside-outside probabilities; and a parser description formalism, which makes it easy to derive inside-outside formulas and many others.
Comment: Ph.D. Thesis, 257 pages, 40 PostScript figures
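The inside half of the inside-outside computation can be sketched as a standard CKY-style dynamic program for a PCFG in Chomsky normal form. This is a textbook illustration, not the thesis's formalism or any of its speed-ups. The dictionary layout for rules, the function name `inside_probabilities`, and the toy grammar are assumptions of this example.

```python
from collections import defaultdict


def inside_probabilities(words, lexical, binary):
    """Compute inside probabilities for a PCFG in Chomsky normal form.

    lexical: {(A, word): prob} for rules A -> word
    binary:  {(A, B, C): prob} for rules A -> B C
    Returns a table with table[(i, j)][A] = P(A derives words[i:j]).
    """
    n = len(words)
    table = defaultdict(lambda: defaultdict(float))
    # Spans of length 1 come from lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                table[(i, i + 1)][a] += p
    # Longer spans sum over split points and binary rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    left = table[(i, k)].get(b, 0.0)
                    right = table[(k, j)].get(c, 0.0)
                    if left and right:
                        table[(i, j)][a] += p * left * right
    return table


# Hypothetical toy grammar: S -> S S (0.4), S -> 'a' (0.6).
toy_lexical = {("S", "a"): 0.6}
toy_binary = {("S", "S", "S"): 0.4}
chart = inside_probabilities(["a", "a", "a"], toy_lexical, toy_binary)
print(chart[(0, 3)]["S"])
```

The value for the full span sums over both bracketings of the three-word string, which is what distinguishes the inside probability from the probability of a single best parse.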

arXiv.org e-Print Archive

CiteSeerX