On Hilberg's Law and Its Links with Guiraud's Law

Author: Altmann G.
Belevitch V.
Bell T. C.
Billingsley P.
Bod R.
De Marcken C. G.
Dębowski Ł.
Guiraud H.
Hoffmann L.
Jelinek F.
Kallenberg O.
Kornai A.
Lehman E.
Li M.
Li W.
Mandelbrot B.
Manning C. D.
Megyesi B.
Menzerath P.
Montemurro M. A.
Nevill-Manning C.
Pareto V.
Petrova N. V.
Shalizi C. R.
Shannon C.
Upper D. R.
Wolff J. G.
Zipf G. K.
Łukasz Dębowski
Publication venue: 'Informa UK Limited'
Publication date: 07/07/2005

Hilberg (1990) supposed that the finite-order excess entropy of a random human text is proportional to the square root of the text length. Assuming that Hilberg's hypothesis is true, we derive Guiraud's law, which states that the number of word types in a text is greater than proportional to the square root of the text length. Our derivation is based on a mathematical conjecture in coding theory and on several experiments suggesting that words can be defined approximately as the nonterminals of the shortest context-free grammar for the text. Such an operational definition of words can be applied even to texts deprived of spaces, which do not allow for Mandelbrot's "intermittent silence" explanation of Zipf's and Guiraud's laws. In contrast to Mandelbrot's, our model assumes some probabilistic long-memory effects in human narration and might be capable of explaining Menzerath's law. Comment: To appear in Journal of Quantitative Linguistics.
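
For orientation, the two scaling statements in this abstract can be written side by side. The symbols below are assumptions introduced here, not notation taken from the paper: n is the text length, E(n) the finite-order excess entropy, V(n) the number of word types, and c_1, c_2 unspecified positive constants.

```latex
% Hilberg's hypothesis, as stated in the abstract:
% excess entropy grows like the square root of the text length.
E(n) \approx c_1 \sqrt{n}

% Guiraud's law, as stated in the abstract:
% the number of word types is at least of square-root order.
V(n) \gtrsim c_2 \sqrt{n}
```

The abstract claims only that the second relation follows from the first under a coding-theory conjecture; the constants and the exact form of the inequality are left open in this sketch.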

arXiv.org e-Print Archive

Crossref

Electronic Dictionaries and Transducers for Automatic Processing of the Albanian Language

Author: Lagji Klara
Pernaska Remzi
Piton Odile
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2007

We intend to develop electronic dictionaries and finite-state transducers (FSTs) for the automatic processing of the Albanian language. We describe some peculiarities of this language and explain how FSTs and, more generally, NooJ's graphs make it possible to handle them. We point out agglutinated words, mixed words, and 'XY' words that are not (or cannot be) listed in dictionaries, and we use FSTs to treat them dynamically. We also consider the problem of unknown words in a recently reformed language and the evolution of features in the dictionaries.

HAL-Paris1

What can mathematical, computational and robotic models tell us about the origins of syntax?

Author: Baronchelli A.
Briscoe E.
Christiansen M. H.
Griffiths T.
Jager G.
Jäger H.
Kirby S.
Komarova N.
Richerson P. J.
Steels L.
Triesch J.
Publication venue: 'MIT Press - Journals'
Publication date: 11/09/2009

City Research Online

Acquiring and Maintaining Knowledge by Natural Multimodal Dialog

Author: Holzapfel Hartwig
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2009

KITopen

Finding structure in language

Author: Finch Steven
Publication venue: The University of Edinburgh
Publication date: 01/01/1995

Since the Chomskian revolution, it has become apparent that natural language is richly structured, being naturally represented hierarchically, and requiring complex context sensitive rules to define regularities over these representations. It is widely assumed that the richness of the posited structure has strong nativist implications for mechanisms which might learn natural language, since it seemed unlikely that such structures could be derived directly from the observation of linguistic data (Chomsky 1965). This thesis investigates the hypothesis that simple statistics of a large, noisy, unlabelled corpus of natural language can be exploited to discover some of the structure which exists in natural language automatically. The strategy is to initially assume no knowledge of the structures present in natural language, save that they might be found by analysing statistical regularities which pertain between a word and the words which typically surround it in the corpus. To achieve this, various statistical methods are applied to define similarity between statistical distributions, and to infer a structure for a domain given knowledge of the similarities which pertain within it. Using these tools, it is shown that it is possible to form a hierarchical classification of many domains, including words in natural language. When this is done, it is shown that all the major syntactic categories can be obtained, and the classification is both relatively complete, and very much in accord with a standard linguistic conception of how words are classified in natural language. Once this has been done, the categorisation derived is used as the basis of a similar classification of short sequences of words. If these are analysed in a similar way, then several syntactic categories can be derived. These include simple noun phrases, various tensed forms of verbs, and simple prepositional phrases.
Once this has been done, the same technique can be applied one level higher, and at this level simple sentences and verb phrases, as well as more complicated noun phrases and prepositional phrases, are shown to be derivable.
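
The core idea in this abstract, comparing words by the distribution of the words around them, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy corpus, the one-word context window, the function names, and the use of cosine similarity are not taken from the thesis, which says only that "various statistical methods" are applied.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(tokens):
    """Map each word to counts of its immediate left and right neighbours."""
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if i > 0:
            vectors[word][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vectors[word][("R", tokens[i + 1])] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[key] for key, count in a.items())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical toy corpus; a real experiment would use a large unlabelled corpus.
tokens = "the cat sleeps . the dog sleeps . a cat eats . a dog eats .".split()
vectors = context_vectors(tokens)

# Words with the same neighbours ("cat", "dog") should score higher
# than words that occupy different positions ("cat", "the").
sim_noun_noun = cosine(vectors["cat"], vectors["dog"])
sim_noun_det = cosine(vectors["cat"], vectors["the"])
```

Pairwise similarities of this kind are the usual input to an agglomerative clustering step, which would then produce a hierarchical classification of the sort the abstract describes.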

Edinburgh Research Archive

16th International NooJ 2022 Conference: Book of Abstracts

Author: Reyes Silvia Susana
Rodrigo Andrea
Silberztein Max
Tramallino Carolina
Publication venue: 'Universidad Nacional de Rosario'
Publication date: 28/11/2022

Book of abstracts presented at the "16th International NooJ 2022 Conference", held in hybrid format at the ECU (Espacio Cultural Universitario, UNR) in Rosario, Santa Fe, Argentina, on 14 and 15 June 2022. Affiliation: Reyes, Silvia Susana. Universidad Nacional de Rosario. Facultad de Humanidades y Artes; Argentina

Repositorio Hipermedial de la Universidad Nacional de Rosario

Review of Susan Brown (1999) Sentential negation in Russian. (Stanford: Center for the Study of Language and Information.)

Author: Rowlett PA
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/01/2001

University of Salford Institutional Repository

Crossref

Entry Generation for New Words by Analogy for Morphological Lexicons

Author: Linden Krister
Publication date: 01/01/2009

Peer reviewed

Helsingin yliopiston digitaalinen arkisto

Natural language processing

Author: Adams
Amsler
Bangalore
Barker
Benoît
Bian
Bondale
Carrick
Ceric
Chandrasekar
Chang
Charniak
Chen
Chowdhury
Chowdhury
Costantino
Cowie
Craven
Craven
Craven
Dogru
Evans
Feldman
Fernandez
Gaizauskas
Glasgow
Haas
Hayes
Hayes
Hedlund
Herath
Ide
Isahara
Jelinek
Jeong
Jurafsky
Kazakov
Kehler
Khoo
Kim
King
Lange
Lee
Lehmam
Lehtokangas
Lewis
Liddy
Liddy
Lovis
Ma
Magnini
Mani
Manning
Marquez
Martinez
Martinez
McMurchie
Meyer
Mihalcea
Mock
Moens
Morin
Narita
Nerbonne
Oard
Ogura
Oudet
Owei
Paris
Pasero
Pedersen
Perez-Carballo
Petreley
Pirkola
Poesio
Rosenfield
Roux
Say
Scarlett
Schenker
Silber
Smeaton
Smeaton
Smith
Sokol
Song
Sparck Jones
Staab
Stock
Tolle
Trybula
Tsuda
Vickery
Waldrop
Warner
Weigard
Wilks
Wong
Yang
Yang
Zadrozny
Zweigenbaum
Publication venue: 'Wiley'
Publication date: 01/01/2003

Beginning with the basic issues of NLP, this chapter aims to chart the major research activities in this area since the last ARIST chapter in 1996 (Haas, 1996), including: (i) natural language text processing systems (text summarization, information extraction, information retrieval, etc.), including domain-specific applications; (ii) natural language interfaces; (iii) NLP in the context of the WWW and digital libraries; and (iv) evaluation of NLP systems.

Crossref

University of Strathclyde Institutional Repository

OPUS - University of Technology Sydney