Focus in Gur and Kwa
The project investigates focus phenomena in the two genetically related West African Gur and Kwa language groups of the Niger-Congo phylum. Most of their members are tone languages; they are similar with respect to word order typology (all are SVO languages) but of divergent morphological type (agglutinating Gur versus isolating Kwa)
Applying Tools and Techniques of Natural Language Processing to the Creation of Resources for Less Commonly Taught Languages
This paper proposes that research results from the area of natural language processing could effectively be applied to creating software to facilitate the development of language learning materials for any natural language. We will suggest that a knowledge-elicitation system called Boas, which was originally created to support a machine-translation application, could be modified to support language-learning ends. Boas leads a speaker of any natural language, who is not necessarily trained in linguistics, through a series of pedagogically supported questionnaires, the responses to which constitute a "profile" of the language. This profile includes morphological, lexical and syntactic information. Once this structured profile is created, it can feed into virtually any type of system, including one to support language learning. Creating language-learning software using a system like this would be efficient in two ways: first, it would exploit extant cutting-edge research and technologies in natural language processing, and second, it would permit a single tool to be used for all languages, including less commonly taught ones, for which limited funding for resource development is a bottleneck
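The "profile" described above is essentially a structured record of elicited morphological, lexical and syntactic information that downstream tools consume. As a rough sketch of such a record (the field names, values and the Ewe example are illustrative assumptions, not the actual Boas schema), it might look like this:

```python
# Hypothetical sketch of a language "profile" in the spirit of Boas.
# Field names and example values are assumptions, not the real Boas format.
from dataclasses import dataclass, field


@dataclass
class LanguageProfile:
    name: str
    # syntactic parameters gathered from questionnaire answers
    word_order: str = "SVO"                # e.g. "SVO", "SOV"
    morphological_type: str = "fusional"   # e.g. "isolating", "agglutinating"
    # morphological information: paradigm tables keyed by lemma
    paradigms: dict[str, dict[str, str]] = field(default_factory=dict)
    # lexical information: simple bilingual glosses
    lexicon: dict[str, str] = field(default_factory=dict)


# Any downstream tool (machine translation, language-learning software, ...)
# would consume the same structured profile regardless of the language.
profile = LanguageProfile(name="Ewe", word_order="SVO", morphological_type="isolating")
profile.lexicon["agbale"] = "book"
```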
Comments on Allan Bomhard, “The Origins of Proto-Indo-European: The Caucasian substrate hypothesis”
The main claims of Bomhard's paper are that PIE originated in Central Asia, which accounts for its Eurasiatic properties such as resemblant pronouns (Uralic, IE, Kartvelian, Turkic, Mongolic, Tungusic) and originally agglutinating morphology; then it moved by migration to the western steppe, where profound influence of a North Caucasian language or languages (chiefly West Caucasian) reshaped its sound system, aspects of its morphology, and its lexicon. The work is carefully done, with a large and systematic lexical survey, consideration of archaeological evidence, attention to evidence of contacts and migration, and an extensive bibliography. PIE does indeed seem to have a curious typological mix of southwestern and north-central Eurasian traits. I have questions, however, about aspects of the linguistic geography, the Caucasian contacts, and the number and type of lexical resemblances
On the Typology of Inflection Class Systems
Inflectional classes are a property of the ideal inflecting-fusional language type. Thus strongly inflecting languages have the most complex vertical and horizontal stratification of hierarchical tree structures. Weakly inflecting languages which approach the ideal isolating type, and languages which approach the agglutinating type, have much shallower structures. Such properties follow from principles of Natural Morphology and from the distinction between the descending hierarchy of macroclasses, classes, subclasses, subsubclasses, etc., and homogeneous microclasses. The main languages of illustration are Latin, Lithuanian, Russian, German, French, Finnish, Hungarian and Turkish
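As a rough illustration of the kind of hierarchy the abstract describes (macroclasses branching into classes, subclasses and terminal microclasses, with depth and breadth measuring vertical and horizontal stratification), a minimal sketch follows; the Latin declension labels are assumptions for exposition, not the paper's analysis:

```python
# Minimal sketch of an inflection-class hierarchy (macroclass > class > subclass > microclass).
# The Latin labels below are illustrative assumptions, not the paper's classification.
from dataclasses import dataclass, field


@dataclass
class InflectionClass:
    label: str
    children: list["InflectionClass"] = field(default_factory=list)

    def depth(self) -> int:
        """Vertical stratification: longest path down to a terminal microclass."""
        return 1 + max((c.depth() for c in self.children), default=0)

    def width(self) -> int:
        """Horizontal stratification: number of terminal microclasses."""
        return sum(c.width() for c in self.children) if self.children else 1


# A strongly inflecting language such as Latin yields a deep, wide tree;
# an isolating language would reduce to a near-trivial one.
latin_nouns = InflectionClass("noun", [
    InflectionClass("1st declension", [InflectionClass("in -a")]),
    InflectionClass("2nd declension", [InflectionClass("in -us"), InflectionClass("in -um")]),
    InflectionClass("3rd declension", [InflectionClass("consonant stems"), InflectionClass("i-stems")]),
])
print(latin_nouns.depth(), latin_nouns.width())
```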
Subject focus in West African languages: International Conference on Information Structure, 6-8 June 2006, University of Potsdam
Polysynthetic Tendencies in Modern Greek
The aim of this paper is to provide a more accurate typological classification of Modern Greek. The verb in MG shows many polysynthetic traits, such as noun and adverb incorporation into the verbal complex, a large inventory of bound morphemes, pronominal marking of objects, many potential slots before the verbal head, nonconfigurational syntax, etc. On the basis of these traits, MG has similarities with polysynthetic languages such as Abkhaz, Cayuga, Chukchi, Mohawk and Nahuatl, among others. I will show that the abundance of similar patterns between MG and polysynthesis points to the evolution of a new system away from the traditional dependent-marking strategy and simple synthesis towards head-marking and polysynthesis. Finally, I will point to the risk of undertaking a direct comparison of different language systems by discussing the pronominal head-marking strategies in MG and the North American languages
The morphology-phonology interface: Isolating to polysynthetic languages
Given the substantial variation in the nature of the grammatical word (GW) across languages, this paper addresses the question of whether the Phonological Word (PW) exhibits the same degree of variation or rather abstracts away from it due to the typically flatter nature of the phonological hierarchy. Various types of languages are examined, focusing on isolating and polysynthetic languages—opposite ends of a word structure continuum. It is demonstrated that, indeed, the PW exhibits substantially less variation across languages than might be expected on the basis of the differences in GW structure. Furthermore, it is shown that an additional constituent (i.e., the Clitic Group, renamed Composite Group) is required between the PW and the Phonological Phrase to fully account for the interface between morpho-syntactic and phonological structures
Kannada Named Entity Recognition and Classification (NERC) Based on Multinomial Naïve Bayes (MNB) Classifier
Named Entity Recognition and Classification (NERC) is the process of identifying proper nouns in text and classifying those nouns into predefined categories such as person name, location, organization, date and time. NERC in Kannada is an essential and challenging task. The aim of this work is to develop a novel model for NERC based on a Multinomial Naïve Bayes (MNB) classifier. The methodology adopted in this paper is based on feature extraction from the training corpus, using term frequency and inverse document frequency and fitting them to a tf-idf vectorizer. The paper discusses the various issues in developing the proposed model, along with the details of implementation and performance evaluation. The experiments are conducted on a training corpus of 95,170 tokens and a test corpus of 5,000 tokens. It is observed that the model achieves Precision, Recall and F1-measure of 83%, 79% and 81% respectively.
Comment: 14 pages, 3 figures, International Journal on Natural Language Computing (IJNLC) Vol. 4, No. 4, August 201
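The pipeline summarised in the abstract (tf-idf features fed to a Multinomial Naïve Bayes classifier) maps closely onto standard scikit-learn components. Below is a minimal sketch under that assumption; the toy Kannada tokens, labels and character n-gram settings are illustrative and are not the paper's actual features or corpus:

```python
# Minimal sketch of a tf-idf + Multinomial Naive Bayes token classifier,
# in the spirit of the NERC pipeline described above. Data and feature
# settings here are illustrative assumptions, not the paper's.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.metrics import classification_report

# Toy training data: one token per example, labelled with an entity class.
train_tokens = ["ಬೆಂಗಳೂರು", "ರಾಮ", "ಇನ್ಫೋಸಿಸ್", "ಸೋಮವಾರ", "ಪುಸ್ತಕ"]   # Bengaluru, Rama, Infosys, Monday, book
train_labels = ["LOCATION", "PERSON", "ORGANIZATION", "DATE", "OTHER"]

# Character n-grams are a common choice for morphologically rich scripts;
# the features actually used in the paper may differ.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultinomialNB(),
)
model.fit(train_tokens, train_labels)

test_tokens = ["ಮೈಸೂರು", "ಸೀತಾ"]   # Mysuru, Sita
test_labels = ["LOCATION", "PERSON"]
pred = model.predict(test_tokens)
print(classification_report(test_labels, pred, zero_division=0))
```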
