An Abstract Machine for Unification Grammars
This work describes the design and implementation of an abstract machine,
Amalia, for the linguistic formalism ALE, which is based on typed feature
structures. This formalism is one of the most widely accepted in computational
linguistics and has been used for designing grammars in various linguistic
theories, most notably HPSG. Amalia is composed of data structures and a set of
instructions, augmented by a compiler from the grammatical formalism to the
abstract instructions, and a (portable) interpreter of the abstract
instructions. The effect of each instruction is defined using a low-level
language that can be executed on ordinary hardware.
The advantages of the abstract machine approach are twofold. From a
theoretical point of view, the abstract machine gives a well-defined
operational semantics to the grammatical formalism. This ensures that grammars
specified using our system are endowed with a well-defined meaning. This makes
it possible, for example, to formally verify the correctness of a compiler for HPSG, given
an independent definition. From a practical point of view, Amalia is the first
system that employs a direct compilation scheme for unification grammars that
are based on typed feature structures. The use of Amalia results in much
improved performance over existing systems.
In order to test the machine on a realistic application, we have developed a
small-scale, HPSG-based grammar for a fragment of the Hebrew language, using
Amalia as the development platform. This is the first application of HPSG to a
Semitic language.
Comment: Doctoral Thesis, 96 pages, many postscript figures, uses pstricks, pst-node, psfig, fullname and a macros file.
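The core operation on which such typed-feature-structure formalisms rest is unification. As a rough illustration only (not Amalia's actual implementation, which involves a type hierarchy and compiled abstract-machine instructions), a minimal untyped feature-structure unifier can be sketched in Python:

```python
def unify(fs1, fs2):
    """Unify two feature structures represented as nested dicts.
    Atomic values unify only if equal; dicts unify feature by feature.
    Returns the unified structure, or None on a clash."""
    if fs1 == fs2:
        return fs1
    if not (isinstance(fs1, dict) and isinstance(fs2, dict)):
        return None  # distinct atomic values clash
    result = dict(fs1)
    for feat, val in fs2.items():
        if feat in result:
            sub = unify(result[feat], val)
            if sub is None:
                return None  # clash propagates upward
            result[feat] = sub
        else:
            result[feat] = val
    return result

np_sign = {"cat": "np", "agr": {"num": "sg"}}
subj = {"cat": "np", "agr": {"per": "3"}}
print(unify(np_sign, subj))
# {'cat': 'np', 'agr': {'num': 'sg', 'per': '3'}}
```

The example names (`np_sign`, `subj`) and the dict encoding are hypothetical conveniences; a real system such as ALE additionally checks type compatibility at every node.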
Languages cool as they expand: Allometric scaling and the decreasing need for new words
We analyze the occurrence frequencies of over 15 million words recorded in millions of books published during the past two centuries in seven different languages. For all languages and chronological subsets of the data we confirm that two scaling regimes characterize the word frequency distributions, with only the more common words obeying the classic Zipf law. Using corpora of unprecedented size, we test the allometric scaling relation between the corpus size and the vocabulary size of growing languages to demonstrate a decreasing marginal need for new words, a feature that is likely related to the underlying correlations between words. We calculate the annual growth fluctuations of word use, which show a decreasing trend as the corpus size increases, indicating a slowdown in linguistic evolution following language expansion. This "cooling pattern" forms the basis of a third statistical regularity, which, unlike the Zipf and Heaps laws, is dynamical in nature.
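The two regularities the abstract builds on, Zipf's rank-frequency law and the Heaps (allometric) relation between corpus size and vocabulary size, can be measured on any tokenized corpus. A minimal sketch in Python on a toy text (the function names are illustrative, not the authors' code):

```python
from collections import Counter

def heaps_curve(tokens):
    """Vocabulary size V(n) as the corpus grows token by token;
    Heaps' law predicts V(n) ~ n**beta with beta < 1."""
    seen = set()
    curve = []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        curve.append((n, len(seen)))
    return curve

def zipf_ranks(tokens):
    """(rank, frequency) pairs in descending frequency order;
    Zipf's law predicts frequency ~ 1/rank for the most common words."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(freqs, start=1))

text = "the cat sat on the mat and the dog sat on the log".split()
print(heaps_curve(text)[-1])  # (13, 8): 13 tokens, 8 distinct words
print(zipf_ranks(text)[0])    # (1, 4): the top-ranked word occurs 4 times
```

On a real corpus one would fit the exponent of `heaps_curve` on a log-log scale; the decreasing marginal need for new words reported above corresponds to the curve flattening as the corpus grows.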
In the Beginning Was the Verb: The Emergence and Evolution of Language Problem in the Light of the Big Bang Epistemological Paradigm.
The enigma of the Emergence of Natural Languages, coupled or not with the closely related problem of their Evolution, is perceived today as one of the most important scientific problems.
The purpose of the present study is to outline a solution to this problem which is epistemologically consonant with the Big Bang solution of the problem of the Emergence of the Universe. Such an outline, however, becomes articulable, understandable, and workable only in a drastically extended epistemic and scientific oecumene, where known and habitual approaches to the problem, both theoretical and experimental, become distant, isolated, even if to some degree still hospitable, conceptual and methodological islands.
The guiding light of our inquiry will be Eugene Paul Wigner's metaphor of ``the unreasonable effectiveness of mathematics in natural sciences'', i.e., ``the miracle of the appropriateness of the language of mathematics for the formulation of the laws of physics'', which has been steadily evolving before our eyes since at least the 17th century. Kurt Goedel's incompleteness and undecidability theory will be our guardian discerner against logical fallacies of otherwise apparently plausible explanations.
John Bell's ``unspeakableness'' and the commonplace counterintuitive character of quantum phenomena will be our encouragers. And the radical novelty of the Big Bang epistemological paradigm, introduced here and adapted to our purposes, will be an appropriate, even if probably shocking, response to our equally shocking discovery, in the oldest among well-preserved linguistic fossils, of perfect mathematical structures outdoing the best artifactual Assemblers.
Linguistic constraints on statistical word segmentation: The role of consonants in Arabic and English
Statistical learning is often taken to lie at the heart of many cognitive tasks, including the acquisition of language. One particular task in which probabilistic models have achieved considerable success is the segmentation of speech into words. However, these models have mostly been tested against English data, and as a result little is known about how a statistical learning mechanism copes with input regularities that arise from the structural properties of different languages. This study focuses on statistical word segmentation in Arabic, a Semitic language in which words are built around consonantal roots. We hypothesize that segmentation in such languages is facilitated by tracking consonant distributions independently from intervening vowels. Previous studies have shown that human learners can track consonant probabilities across intervening vowels in artificial languages, but it is unknown to what extent this ability would be beneficial in the segmentation of natural language. We assessed the performance of a Bayesian segmentation model on English and Arabic, comparing consonant-only representations with full representations. In addition, we examined to what extent structurally different proto-lexicons reflect adult language. The results suggest that for a child learning a Semitic language, separating consonants from vowels is beneficial for segmentation. These findings indicate that probabilistic models require appropriate linguistic representations in order to effectively meet the challenges of language acquisition.
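The consonant-only representation hypothesized above can be illustrated by projecting an unsegmented utterance onto its consonantal tier. A rough sketch in Python (the vowel inventory and example string are illustrative; real Arabic orthography with diacritics needs more care, and this is not the Bayesian segmentation model itself):

```python
VOWELS = set("aeiou")  # toy vowel inventory; an assumption for illustration

def consonant_tier(utterance):
    """Project a transliterated, unsegmented utterance onto its
    consonantal tier, the representation hypothesized to ease
    segmentation of root-based languages such as Arabic."""
    return "".join(ch for ch in utterance if ch not in VOWELS)

# In the consonant tier, the recurring root k-t-b ('write') surfaces
# as a repeated unit even though the vowel patterns differ.
print(consonant_tier("kitabmaktab"))  # ktbmktb
```

A segmentation model operating on this tier would track transitional probabilities over consonants only, letting the recurring root support a word boundary hypothesis that the full representation obscures.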
The "handedness" of language: Directional symmetry breaking of sign usage in words
Language, which allows complex ideas to be communicated through symbolic
sequences, is a characteristic feature of our species and manifested in a
multitude of forms. Using large written corpora for many different languages
and scripts, we show that the occurrence probability distributions of signs at
the left and right ends of words have a distinct heterogeneous nature.
Characterizing this asymmetry using quantitative inequality measures, viz.
information entropy and the Gini index, we show that the beginning of a word is
less restrictive in sign usage than the end. This property is not simply
attributable to the use of common affixes as it is seen even when only word
roots are considered. We use the existence of this asymmetry to infer the
direction of writing in undeciphered inscriptions that agrees with the
archaeological evidence. Unlike traditional investigations of phonotactic
constraints which focus on language-specific patterns, our study reveals a
property valid across languages and writing systems. As both language and
writing are unique aspects of our species, this universal signature may reflect
an innate feature of the human cognitive phenomenon.
Comment: 10 pages, 4 figures + Supplementary Information (15 pages, 8 figures), final corrected version.
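The two inequality measures named above, information entropy and the Gini index, can be computed over the distributions of word-initial and word-final signs. A minimal sketch on a toy word list (the word list is illustrative, not the paper's corpora):

```python
import math
from collections import Counter

def entropy(probs):
    """Shannon entropy in bits; higher means a less restricted,
    more heterogeneous sign inventory."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index of a distribution: 0 for a perfectly even
    distribution, approaching 1 as usage concentrates on few signs."""
    xs = sorted(probs)
    n = len(xs)
    cum = sum((i + 1) * x for i, x in enumerate(xs))
    return (2 * cum) / (n * sum(xs)) - (n + 1) / n

def sign_distribution(words, position):
    """Probability distribution of the sign at a given position
    (0 = word-initial, -1 = word-final)."""
    counts = Counter(w[position] for w in words if w)
    total = sum(counts.values())
    return [c / total for c in counts.values()]

words = ["speak", "spin", "stack", "track", "trick", "brick"]
first = sign_distribution(words, 0)   # initial letters: s, s, s, t, t, b
last = sign_distribution(words, -1)   # final letters: k, n, k, k, k, k
print(entropy(first) > entropy(last))  # True: beginnings less restrictive
```

On this toy list the word-initial distribution has higher entropy and a lower Gini index than the word-final one, the same directional asymmetry the study reports across scripts.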
Amalia -- A Unified Platform for Parsing and Generation
Contemporary linguistic theories (in particular, HPSG) are declarative in
nature: they specify constraints on permissible structures, not how such
structures are to be computed. Grammars designed under such theories are,
therefore, suitable for both parsing and generation. However, practical
implementations of such theories do not usually support bidirectional processing
of grammars. We present a grammar development system that includes a compiler
of grammars (for parsing and generation) to abstract machine instructions, and
an interpreter for the abstract machine language. The generation compiler
inverts input grammars (designed for parsing) to a form more suitable for
generation. The compiled grammars are then executed by the interpreter using
one control strategy, regardless of whether the grammar is the original or the
inverted version. We thus obtain a unified, efficient platform for developing
reversible grammars.
Comment: 8 pages, postscript.