53 research outputs found

    Features and Differences of the Parallel Corpus of English and Uzbek Languages

    A parallel corpus (PC) consists of original texts together with their translations into one or more languages. The choice of topic and genre for a PC depends on the compiler's purpose. When choosing texts for the Uzbek-English PC, it is advisable to collect direct translations from Uzbek into English and from English into Uzbek. Because some units may lose their value in indirect translation, leaving the PC unable to fully perform its function, only texts translated directly from the original are included in the PC

    A MT System from Turkmen to Turkish employing finite state and statistical methods

    In this work, we present an MT system from Turkmen to Turkish. Our system exploits the similarity of the languages by using a modified version of the direct translation method. However, the complex inflectional and derivational morphology of the Turkic languages necessitates special treatment for the word-by-word translation model. We also employ morphology-aware multi-word processing and statistical disambiguation processes in our system. We believe that this approach is valid for most of the Turkic languages and that the architecture, implemented using FSTs, can be easily extended to those languages

    Investigating the Effect of Emoji in Opinion Classification of Uzbek Movie Review Comments

    Opinion mining on social media posts has become more and more popular. Users often express their opinion on a topic not only with words but also with image symbols such as emoticons and emoji. In this paper, we investigate the effect of emoji-based features in opinion classification of Uzbek texts, and more specifically movie review comments from YouTube. Several classification algorithms are tested, and feature ranking is performed to evaluate the discriminative ability of the emoji-based features. Comment: 10 pages, 1 figure, 3 tables

    Morphologic, Syntactic, and Phonologic Distance Between Japanese and Altaic, Dravidian, Austronesian, and Korean Languages

    The present study measures the resemblances of Japanese with Altaic languages (Turkic; Tungusic; Mongolic; Nivkh); the Dravidian language Tamil; Austronesian languages (Western Malayo-Polynesian; Malayo-Sumbawan; Central Luzon; Central Malayo-Polynesian), and Korean, in an effort to pin down the genealogy of Japanese. Morphologic, syntactic, and phonologic distances are calculated using data from corpora. The chi-square homogeneity test and Euclidean distances are used for statistical analysis. Morphologically, the findings show that Japanese and Korean are close for the most part in their preferred causative/inchoative verb alternation patterns and in the morphemes that convey the alternation. Syntactically, the Altaic languages and Tamil convey case via suffixes, whereas case in the Austronesian languages is marked by prefixes. Japanese and Korean are similar in rendering case with particles. Phonologically, Tamil and the Austronesian languages resemble each other in vowel height harmony. Korean, the Altaic languages, and the Austronesian languages show similarities in vowel backness harmony. Japanese, the Altaic languages, and the Austronesian language Madurese display vowel-consonant harmony. Pulling these strands together, the study concludes that Japanese is most closely related to Korean

    Evidentiality in Uzbek and Kazakh

    The purpose of this work is to describe and account for the broad range of phenomena referred to as “evidentiality” in two Turkic languages: Uzbek and Kazakh. Much previous work on the Turkic languages treats evidentiality as a distinct verbal category. However, morphemes that express evidential meaning also often express other meanings such as dubitativity and admirativity, or may even express rhetorical questions. This work follows Friedman (1978; 1981; 1988) and others in considering these meanings to be the result of an evidential-like strategy: the expression of non-confirmativity. In Uzbek and Kazakh, as well as in many other Eurasian languages, the past tense is the locus of evidential meaning. There are three items in the Uzbek and Kazakh past tense paradigm, and these differ in terms of markedness for confirmativity: one is marked as confirmative, one as non-confirmative, and one is unmarked for confirmativity. The unmarked item, often referred to as the perfect, exists in a copular form. As a copular form, it expresses marked non-confirmativity. When this copular form (in Uzbek: ekan, in Kazakh: eken) is employed to express non-confirmativity, this non-confirmativity is manifested either as non-firsthand information source or as admirativity. By employing the non-confirmative analysis, we are able to account for the broad range of phenomena considered “evidential” without resorting to postulating an evidential category. Rather, in Uzbek and Kazakh, evidential meaning is merely one effect of the expression of non-confirmativity, which is a subtype of the categories of status or modality.

    Proceedings of the 1st Conference on Central Asian Languages and Linguistics (ConCALL)

    The Conference on Central Asian Languages and Linguistics (ConCALL) was founded in 2014 at Indiana University by Dr. Öner Özçelik, the residing director of the Center for Languages of the Central Asian Region (CeLCAR). As the nation’s sole U.S. Department of Education funded Language Resource Center focusing on the languages of the Central Asian Region, CeLCAR’s main mission is to strengthen and improve the nation’s capacity for teaching and learning Central Asian languages through teacher training, research, materials development projects, and dissemination. As part of this mission, CeLCAR has an ultimate goal to unify and fortify the Central Asian language learning community by facilitating networking between linguists and language educators, encouraging research projects that will inform language instruction, and providing opportunities for professionals in the field to both showcase their work and receive feedback from their peers. Thus ConCALL was established to be the first international academic conference to bring together linguists and language educators in the languages of the Central Asian region, including both the Altaic and Eastern Indo-European languages spoken in the region, to focus on research into how these specific languages are represented formally and how they are acquired by second/foreign language learners, and also to present research-driven teaching methods.
Languages served by ConCALL include, but are not limited to: Azerbaijani, Dari, Karakalpak, Kazakh, Kyrgyz, Lokaabharan, Mari, Mongolian, Pamiri, Pashto, Persian, Russian, Shughnani, Tajiki, Tibetan, Tofalar, Tungusic, Turkish, Tuvan, Uyghur, Uzbek, Wakhi and more!
The Conference on Central Asian Languages and Linguistics, held at Indiana University on 16-17 May 2014, was made possible through the generosity of our sponsors: Center for Languages of the Central Asian Region (CeLCAR), Ostrom Grant Programs, IU's College of Arts and Humanities Center (CAHI), Inner Asian and Uralic National Resource Center (IAUNRC), IU's School of Global and International Studies (SGIS), IU's College of Arts and Sciences, Sinor Research Institute for Inner Asian Studies (SRIFIAS), IU's Department of Central Eurasian Studies (CEUS), and IU's Department of Linguistics