Principles and Parameters: a coding theory perspective
We propose an approach to Longobardi's parametric comparison method (PCM) via
the theory of error-correcting codes. One associates to a collection of
languages to be analyzed with the PCM a binary (or ternary) code with one code
word for each language in the family and each word consisting of the binary
values of the syntactic parameters of the language, with the ternary case
allowing for an additional parameter state that takes into account phenomena of
entailment of parameters. The code parameters of the resulting code can be
compared with some classical bounds in coding theory: the asymptotic bound, the
Gilbert-Varshamov bound, etc. The position of the code parameters with respect
to some of these bounds provides quantitative information on the variability of
syntactic parameters within and across historical-linguistic families. While
computations carried out for languages belonging to the same family yield codes
below the GV curve, comparisons across different historical families can give
examples of isolated codes lying above the asymptotic bound. Comment: 11 pages, LaTe
Multi-Head Finite Automata: Characterizations, Concepts and Open Problems
Multi-head finite automata were introduced in (Rabin, 1964) and (Rosenberg,
1966). Since that time, a vast literature on computational and descriptional
complexity issues on multi-head finite automata documenting the importance of
these devices has been developed. Although multi-head finite automata are a
simple concept, their computational behavior can already be very complex and
lead to undecidable or even non-semi-decidable problems on these devices such
as, for example, emptiness, finiteness, universality, equivalence, etc. These
strong negative results trigger the study of subclasses and alternative
characterizations of multi-head finite automata for a better understanding of
the nature of non-recursive trade-offs and, thus, the borderline between
decidable and undecidable problems. In the present paper, we tour a fragment of
this literature
Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization
In Automatic Text Summarization, preprocessing is an important phase to
reduce the space of textual representation. Classically, stemming and
lemmatization have been widely used for normalizing words. However, even using
normalization on large texts, the curse of dimensionality can disturb the
performance of summarizers. This paper describes a new method for normalization
of words to further reduce the space of representation. We propose to reduce
each word to its initial letters, as a form of Ultra-stemming. The results show
that Ultra-stemming not only preserves the content of the summaries produced with
this representation, but can often dramatically improve the performance of the
systems. Summaries on trilingual corpora were evaluated automatically with
Fresa. Results confirm an increase in performance, regardless of the summarizer
system used. Comment: 22 pages, 12 figures, 9 table
The "handedness" of language: Directional symmetry breaking of sign usage in words
Language, which allows complex ideas to be communicated through symbolic
sequences, is a characteristic feature of our species and manifested in a
multitude of forms. Using large written corpora for many different languages
and scripts, we show that the occurrence probability distributions of signs at
the left and right ends of words have a distinct heterogeneous nature.
Characterizing this asymmetry using quantitative inequality measures, viz.
information entropy and the Gini index, we show that the beginning of a word is
less restrictive in sign usage than the end. This property is not simply
attributable to the use of common affixes as it is seen even when only word
roots are considered. We use the existence of this asymmetry to infer the
direction of writing in undeciphered inscriptions, obtaining a result that agrees with the
archaeological evidence. Unlike traditional investigations of phonotactic
constraints which focus on language-specific patterns, our study reveals a
property valid across languages and writing systems. As both language and
writing are unique aspects of our species, this universal signature may reflect
an innate feature of the human cognitive phenomenon. Comment: 10 pages, 4 figures + Supplementary Information (15 pages, 8
figures), final corrected versio
Language contact and language decay. Socio-political and linguistic perspectives
The present linguistic situation in Malta is a reflection of historical and political
permutations of the past. The simultaneous presence of two languages in Malta –
generally described as a bilingual situation, but which in fact includes a number of
features which can be defined more appropriately through diglossia – gives rise to a
context wherein language contact is extremely frequent: this occurs through both
inter- and intrasentential code-switching as well as through the constant integration of
foreign terms, mainly from Italian and English, into Maltese. Language policies in Malta are frequently caught in the midst of these dynamic diachronic and synchronic linguistic processes and often operate on two fronts: on the one hand, internal changes inherent to the Maltese language must be taken into
consideration; on the other hand, language use, characterized by the presence of both
English and Maltese, must also be accounted for. peer-reviewe
Cooperating Distributed Grammar Systems of Finite Index Working in Hybrid Modes
We study cooperating distributed grammar systems working in hybrid modes in
connection with the finite index restriction in two different ways: firstly, we
investigate cooperating distributed grammar systems working in hybrid modes
which characterize programmed grammars with the finite index restriction;
looking at the number of components of such systems, we obtain surprisingly
rich lattice structures for the inclusion relations between the corresponding
language families. Secondly, we impose the finite index restriction on
cooperating distributed grammar systems working in hybrid modes themselves,
which leads us to new characterizations of programmed grammars of finite index. Comment: In Proceedings AFL 2014, arXiv:1405.527
Challenging the state educational system in Western Siberia: taiga school by the Tiuitiakha River
Published version
WARTEG FOOD SELLERS' LANGUAGE ATTITUDES TOWARD TEGAL DIALECT OF JAVANESE LANGUAGE IN SEMARANG
This paper presents a sociolinguistic study of a multilingual society which aims to describe
the language attitudes and language choice of the food sellers of Tegal food stalls (warteg) in
Semarang toward the Tegal dialect of the Javanese language (TL). The language choice research was
also done to support the respondents' answers to the language attitude questions. The data were
collected during June and July 2016 from warteg food sellers in Semarang as the respondents.
In the questionnaires, the respondents rated their agreement or disagreement with 10 statements on
a five-point Likert-type scale. The respondents were also asked about the language they use
to talk to others in their daily activities. Using the mean score, the Likert-type formula and the
independent t-test, the results indicated that the 111 respondents in total still have positive
attitudes toward TL even though they live outside of the Tegal area. They prefer to use TL rather than
other languages to talk to other Tegalese