    Concepts of structural underspecification in Bantu and Romance

    Principles and Parameters: a coding theory perspective

    We propose an approach to Longobardi's parametric comparison method (PCM) via the theory of error-correcting codes. To a collection of languages to be analyzed with the PCM one associates a binary (or ternary) code with one code word for each language in the family, each word consisting of the binary values of the syntactic parameters of the language; the ternary case allows an additional parameter state that accounts for entailment between parameters. The code parameters of the resulting code can be compared with classical bounds in coding theory, such as the asymptotic bound and the Gilbert-Varshamov (GV) bound. The position of the code parameters with respect to some of these bounds provides quantitative information on the variability of syntactic parameters within and across historical-linguistic families. While computations carried out for languages belonging to the same family yield codes below the GV curve, comparisons across different historical families can give examples of isolated codes lying above the asymptotic bound. Comment: 11 pages, LaTeX
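As a minimal sketch of the construction described above (not the authors' code), the snippet below treats each language's parameter vector as a code word, computes the rate R = log_q(#words)/n and the relative minimum distance δ = d/n, and compares R with the GV curve R = 1 − H_q(δ). The four parameter vectors are hypothetical toy values, and the asymptotic bound is not computed here because no closed form for it is known.

```python
import math
from itertools import combinations

# HYPOTHETICAL parameter vectors for four unnamed languages over six binary
# syntactic parameters; these are toy values, not data from the paper.
WORDS = ["110010", "101011", "011101", "000110"]


def hamming(u, v):
    """Number of positions where two code words differ."""
    return sum(a != b for a, b in zip(u, v))


def code_parameters(words, q=2):
    """Return (R, delta): rate log_q(#words)/n and relative minimum distance d/n."""
    n = len(words[0])
    k = math.log(len(words), q)
    d = min(hamming(u, v) for u, v in combinations(words, 2))
    return k / n, d / n


def q_ary_entropy(x, q=2):
    """q-ary entropy H_q(x) for 0 <= x <= 1."""
    if x == 0:
        return 0.0
    if x == 1:
        return math.log(q - 1, q)
    return (x * math.log(q - 1, q)
            - x * math.log(x, q)
            - (1 - x) * math.log(1 - x, q))


def gv_rate(delta, q=2):
    """Gilbert-Varshamov curve R = 1 - H_q(delta), defined for 0 <= delta <= 1 - 1/q."""
    if delta > 1 - 1 / q:
        return 0.0
    return 1 - q_ary_entropy(delta, q)


R, delta = code_parameters(WORDS)
position = "above" if R > gv_rate(delta) else "on or below"
print(f"R = {R:.3f}, delta = {delta:.3f}, GV(delta) = {gv_rate(delta):.3f}: {position} the GV curve")
```

For the ternary variant, the same functions apply to words over the symbols 0, 1, 2 with `q=3`.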

    Multi-Head Finite Automata: Characterizations, Concepts and Open Problems

    Multi-head finite automata were introduced in (Rabin, 1964) and (Rosenberg, 1966). Since that time, a vast literature has developed on the computational and descriptional complexity of multi-head finite automata, documenting the importance of these devices. Although multi-head finite automata are a simple concept, their computational behavior can already be very complex, leading to undecidable or even non-semi-decidable problems such as emptiness, finiteness, universality, and equivalence. These strong negative results have triggered the study of subclasses and alternative characterizations of multi-head finite automata, aimed at a better understanding of the nature of non-recursive trade-offs and, thus, of the borderline between decidable and undecidable problems. In the present paper, we tour a fragment of this literature.
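To make the concept concrete, here is a minimal sketch (not taken from the paper) of a deterministic one-way two-head finite automaton for {a^n b^n : n >= 0}, a non-regular language that no one-head finite automaton accepts. The state names, the `$` end-marker, and the convention that an undefined move rejects are assumptions of this sketch.

```python
END = "$"  # assumed right end-marker appended to the input


def delta(state, s1, s2):
    """Finite control of a two-head one-way DFA for {a^n b^n : n >= 0}.

    s1 and s2 are the symbols under heads 1 and 2. Returns
    (next_state, move_head1, move_head2), or None (undefined move = reject).
    """
    if state == "skip":
        # Head 2 runs ahead over the block of a's; head 1 waits at the start.
        if s2 == "a":
            return ("skip", 0, 1)
        if s2 in ("b", END):
            return ("match", 0, 0)
    elif state == "match":
        # Each a under head 1 must be paired with a b under head 2.
        if s1 == "a" and s2 == "b":
            return ("match", 1, 1)
        # Head 1 has used up the a's exactly when head 2 reaches the end.
        if s1 in ("b", END) and s2 == END:
            return ("accept", 0, 0)
    return None


def accepts(word):
    """Run the automaton on word over {a, b}; heads only ever move right."""
    if set(word) - {"a", "b"}:
        return False
    tape = word + END
    state, h1, h2 = "skip", 0, 0
    while state != "accept":
        step = delta(state, tape[h1], tape[h2])
        if step is None:
            return False
        state, m1, m2 = step
        h1 += m1
        h2 += m2
    return True


for w in ["", "ab", "aabb", "aab", "abb", "abab"]:
    print(repr(w), accepts(w))
```

The finite control never counts: the second head's position stores the length of the a-block, which is why two heads exceed the power of a single-head automaton here.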

    Beyond Stemming and Lemmatization: Ultra-stemming to Improve Automatic Text Summarization

    In Automatic Text Summarization, preprocessing is an important phase for reducing the space of textual representation. Classically, stemming and lemmatization have been widely used for normalizing words. However, even with normalization, the curse of dimensionality can degrade the performance of summarizers on large texts. This paper describes a new method for normalizing words that further reduces the space of representation: we propose to reduce each word to its initial letters, as a form of Ultra-stemming. The results show that Ultra-stemming not only preserves the content of the summaries produced with this representation, but often dramatically improves the performance of the systems. Summaries on trilingual corpora were evaluated automatically with Fresa. The results confirm an increase in performance regardless of the summarizer system used. Comment: 22 pages, 12 figures, 9 tables
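A minimal sketch of the normalization idea, assuming that "initial letters" means truncating each token to its first `n` letters; the parameter `n`, the tokenizer, and the sample sentence are assumptions for illustration, not the paper's exact preprocessing. It shows how the vocabulary (the dimension of a bag-of-words space) shrinks.

```python
import re
from collections import Counter


def ultra_stem(token, n=1):
    """Keep only the first n letters of a token (n is an assumed parameter)."""
    return token[:n]


def bag_of_words(text, n=None):
    """Lowercased, letter-only tokens; if n is given, ultra-stem each token."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if n is not None:
        tokens = [ultra_stem(t, n) for t in tokens]
    return Counter(tokens)


text = "Summaries summarize summarized texts; texts are summarized."
full = bag_of_words(text)
fix1 = bag_of_words(text, n=1)
print(len(full), "word types ->", len(fix1), "ultra-stemmed types")
print(fix1)
```

A summarizer would then score sentences over the reduced vocabulary instead of the full one; that step is not shown.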

    The "handedness" of language: Directional symmetry breaking of sign usage in words

    Language, which allows complex ideas to be communicated through symbolic sequences, is a characteristic feature of our species and is manifested in a multitude of forms. Using large written corpora for many different languages and scripts, we show that the occurrence probability distributions of signs at the left and right ends of words are distinctly heterogeneous. Characterizing this asymmetry using quantitative inequality measures, viz. information entropy and the Gini index, we show that the beginning of a word is less restrictive in sign usage than the end. This property is not simply attributable to the use of common affixes, as it is seen even when only word roots are considered. We use this asymmetry to infer the direction of writing of undeciphered inscriptions, in agreement with the archaeological evidence. Unlike traditional investigations of phonotactic constraints, which focus on language-specific patterns, our study reveals a property valid across languages and writing systems. As both language and writing are unique aspects of our species, this universal signature may reflect an innate feature of human cognition. Comment: 10 pages, 4 figures + Supplementary Information (15 pages, 8 figures), final corrected version
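The measurement described above can be sketched as follows: count the signs at each end of the words, then compare Shannon entropy (higher means less restrictive) and the Gini index (higher means more unequal). The word list is a hypothetical toy sample, both distributions are taken over the shared inventory of observed end signs (an assumption), and `suggests_left_to_right` is a hypothetical helper illustrating the direction-of-writing inference, not the authors' procedure.

```python
import math
from collections import Counter

# Hypothetical toy word list; the study itself uses large corpora.
WORDS = ["cat", "dog", "bat", "hat", "mat", "pig", "pen", "ten"]


def entropy(counts):
    """Shannon entropy in bits of a frequency vector (zero counts ignored)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c > 0)


def gini(counts):
    """Gini index: mean absolute difference divided by twice the mean."""
    n = len(counts)
    mean = sum(counts) / n
    diff = sum(abs(a - b) for a in counts for b in counts)
    return diff / (2 * n * n * mean)


def end_distributions(words):
    """Counts of first and last signs over a shared sign inventory."""
    first = Counter(w[0] for w in words if w)
    last = Counter(w[-1] for w in words if w)
    signs = sorted(set(first) | set(last))
    return [first[s] for s in signs], [last[s] for s in signs]


def suggests_left_to_right(words):
    """Heuristic: the less restrictive (higher-entropy) end is taken as the word start."""
    first, last = end_distributions(words)
    return entropy(first) > entropy(last)


first, last = end_distributions(WORDS)
print(f"start: H = {entropy(first):.2f} bits, Gini = {gini(first):.2f}")
print(f"end:   H = {entropy(last):.2f} bits, Gini = {gini(last):.2f}")
print("left-to-right?", suggests_left_to_right(WORDS))
```

Reversing every word swaps the two distributions, so the heuristic flips its answer; this is the sense in which the asymmetry can indicate reading direction.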

    Language contact and language decay. Socio-political and linguistic perspectives

    The present linguistic situation in Malta reflects the historical and political changes of the past. The simultaneous presence of two languages in Malta (generally described as a bilingual situation, but one that includes a number of features more appropriately defined in terms of diglossia) gives rise to a context in which language contact is extremely frequent: it occurs through both inter- and intrasentential code-switching and through the constant integration of foreign terms, mainly from Italian and English, into Maltese. Language policies in Malta are frequently caught in the midst of these dynamic diachronic and synchronic linguistic processes and often operate on two fronts: on the one hand, changes internal to the Maltese language must be taken into consideration; on the other hand, language use, characterized by the presence of both English and Maltese, must also be accounted for. Peer-reviewed.

    Cooperating Distributed Grammar Systems of Finite Index Working in Hybrid Modes

    We study cooperating distributed grammar systems working in hybrid modes in connection with the finite index restriction in two different ways: firstly, we investigate cooperating distributed grammar systems working in hybrid modes which characterize programmed grammars with the finite index restriction; looking at the number of components of such systems, we obtain surprisingly rich lattice structures for the inclusion relations between the corresponding language families. Secondly, we impose the finite index restriction on cooperating distributed grammar systems working in hybrid modes themselves, which leads us to new characterizations of programmed grammars of finite index. Comment: In Proceedings AFL 2014, arXiv:1405.527

    WARTEG FOOD SELLERS’ LANGUAGE ATTITUDES TOWARD TEGAL DIALECT OF JAVANESE LANGUAGE IN SEMARANG

    This paper presents sociolinguistic research on a multilingual society, aiming to describe the language attitudes and language choice of the food sellers of Tegal food stalls (warteg) in Semarang toward the Tegal dialect of the Javanese language (TL). The language choice research was also carried out to support the respondents’ answers to the language attitude questions. The data were collected during June and July 2016 from warteg food sellers in Semarang as respondents. The questionnaires asked the respondents to rate their agreement or disagreement with 10 statements on a five-point Likert-type scale. The respondents were also asked which language they use to talk to others in their daily activities. Using mean scores, the Likert-type formula, and an independent t-test, the results indicated that the 111 respondents as a whole still have positive attitudes toward TL even though they live outside the Tegal area. They prefer to use TL rather than other languages when talking to other Tegalese.
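The scoring steps named above can be sketched as follows. The abstract does not say which groups were compared with the t-test, so the two groups and all scores below are hypothetical, and reading a mean above the scale midpoint of 3 as "positive" is an assumed convention. The snippet computes each respondent's mean over the 10 items and Student's pooled-variance t statistic.

```python
import math
from statistics import mean, variance

LIKERT_MIDPOINT = 3.0  # assumed neutral point of a 1-5 scale


def respondent_score(answers):
    """Mean score of one respondent over the 10 Likert items (each 1-5)."""
    if len(answers) != 10 or not all(1 <= a <= 5 for a in answers):
        raise ValueError("expected 10 answers on a 1-5 scale")
    return mean(answers)


def independent_t(group_a, group_b):
    """Student's two-sample t statistic (pooled variance) and degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    pooled = ((na - 1) * variance(group_a)
              + (nb - 1) * variance(group_b)) / (na + nb - 2)
    t = (mean(group_a) - mean(group_b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


# HYPOTHETICAL per-respondent mean scores for two unnamed subgroups.
sheet = [4, 5, 4, 4, 3, 4, 5, 4, 4, 4]
group_a = [4.0, 4.2, 3.8]
group_b = [3.0, 3.2, 2.8]

print("one respondent:", respondent_score(sheet))
print("group A positive?", mean(group_a) > LIKERT_MIDPOINT)
t, df = independent_t(group_a, group_b)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

A p-value requires the t distribution; a library routine such as `scipy.stats.ttest_ind` (with `equal_var=True`) returns both the statistic and the p-value.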