    Part of Speech Tagging of Marathi Text Using Trigram Method

    In this paper we present a part-of-speech (POS) tagger for Marathi, a morphologically rich language spoken by the native people of Maharashtra. The tagger takes a statistical approach based on the trigram method. The main idea of the trigram method is to find the most likely POS tag for a token given the previous two tags, computing probabilities to determine the best tag sequence. We describe the development of the tagger and also present the evaluation carried out
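
    The trigram idea described in this abstract can be illustrated with a minimal sketch. It assumes a tiny, hypothetical set of tag sequences (not the paper's corpus) and covers only the tag-transition estimate P(tag | previous two tags). The abstract does not specify the rest of the tagger, so the function names, the `<s>` padding symbol, and the unsmoothed maximum-likelihood estimate below are all assumptions.

```python
from collections import Counter

# Toy tag sequences; hypothetical data for illustration only.
tagged_sentences = [
    ["NN", "VB", "NN"],
    ["NN", "VB", "JJ"],
    ["JJ", "NN", "VB"],
]

trigram_counts = Counter()
context_counts = Counter()
for tags in tagged_sentences:
    # Pad with two start symbols so the first tag also has a two-tag context.
    padded = ["<s>", "<s>"] + tags
    for i in range(2, len(padded)):
        context = (padded[i - 2], padded[i - 1])
        trigram_counts[context + (padded[i],)] += 1
        context_counts[context] += 1


def trigram_prob(prev2, prev1, tag):
    """Unsmoothed maximum-likelihood estimate of P(tag | prev2, prev1)."""
    seen = context_counts[(prev2, prev1)]
    return trigram_counts[(prev2, prev1, tag)] / seen if seen else 0.0


def best_next_tag(prev2, prev1, candidates):
    """Pick the candidate tag with the highest trigram probability."""
    return max(candidates, key=lambda tag: trigram_prob(prev2, prev1, tag))
```

    A complete tagger would typically combine such transition estimates with word-given-tag probabilities, smoothing for unseen contexts, and a search over whole tag sequences; none of those details are given in the abstract.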

    Implementation of Rule Based Algorithm for Sandhi-Vicheda Of Compound Hindi Words

    Sandhi means joining two or more words to coin a new word. Sandhi literally means 'putting together' or combining (of sounds). It denotes all combinatory sound changes effected (spontaneously) for ease of pronunciation. Sandhi-vicheda describes [5] the process by which one letter (whether single or conjoined) is broken to form two words. Part of the broken letter remains as the last letter of the first word, and part of it forms the first letter of the next word. Sandhi-vicheda is an easy and interesting technique that can add a new dimension to the traditional approach to Hindi teaching. In this paper, using a rule-based algorithm, we report an accuracy of 60-80%, depending on the number of rules implemented

    Ergative case, aspect and person splits: Two case studies

    Ergativity splits between perfect and imperfective/progressive predicates are observed in languages with a specialized ergative case (Punjabi) and without it (Kurdish). Perfect predicates correspond to a VP projection; external arguments are introduced by means of an oblique case, namely an elementary part–whole predicate saying that the event is ‘included by’, ‘located at’ the argument. A more complex organization is found with imperfective/progressive predicates, where a head Asp projects a functional layer and introduces the external argument. Our proposal further yields the 1/2P vs. 3P Person split as a result of the intrinsic ability of 1/2P to serve as ‘location-of-event’

    Beyond Arabic: Software for Perso-Arabic Script Manipulation

    This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script. The operations include various levels of script normalization, including visual invariance-preserving operations that subsume and go beyond the standard Unicode normalization forms, as well as transformations that modify the visual appearance of characters in accordance with the regional orthographies for eleven contemporary languages from diverse language families. The library also provides simple FST-based romanization and transliteration. We additionally attempt to formalize the typology of Perso-Arabic characters by providing one-to-many mappings from Unicode code points to the languages that use them. While our work focuses on the Arabic script diaspora rather than Arabic itself, this approach could be adopted for any language that uses the Arabic script, thus providing a unified framework for treating a script family used by close to a billion people. Comment: Preprint to appear in the Proceedings of the 7th Arabic Natural Language Processing Workshop (WANLP 2022) at EMNLP, Abu Dhabi, United Arab Emirates, December 7-11, 2022. 7 page
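
    To illustrate why script-specific operations are needed on top of the standard Unicode normalization forms, here is a minimal sketch. It does not use the library described in this abstract or any FST. It swaps in a plain Python translation table, and both the `PERSIAN_MAP` mapping and the `normalize_persian` function are hypothetical names chosen for illustration.

```python
import unicodedata

# Hypothetical Persian-style character map (an assumption, not the paper's data).
PERSIAN_MAP = str.maketrans({
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
})


def normalize_persian(text: str) -> str:
    """Apply standard NFC first, then the script-specific character map."""
    return unicodedata.normalize("NFC", text).translate(PERSIAN_MAP)


# NFC composes ALEF + MADDA ABOVE into the single code point U+0622 ...
composed = unicodedata.normalize("NFC", "\u0627\u0653")
# ... but leaves ARABIC LETTER YEH unchanged, so the regional variant
# has to be handled by an extra mapping step.
unchanged = unicodedata.normalize("NFC", "\u064A")
converted = normalize_persian("\u064A\u0643")
```

    A table lookup like this only covers single code points; the FST components mentioned in the abstract can presumably express context-dependent rewrites that a flat mapping cannot.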

    Language acquisition

    This project investigates acquisition of a new language by example. Syntax induction has been studied widely and the more complex syntax associated with Natural Language is difficult to induce without restrictions. Chomsky conjectured that natural languages are restricted by a Universal Grammar. English could be used as a Universal Grammar and Punjabi derived from it in a similar way as the acquisition of a first language. However, if English has already been acquired then Punjabi would be induced from English as a second language. [Continues.

    Community languages in higher education : towards realising the potential

    This study, Community Languages in Higher Education: Towards Realising the Potential, forms part of the Routes into Languages initiative funded by the Higher Education Funding Council in England (HEFCE) and the Department for Children, Schools and Families (DCSF). It sets out to map provision for community languages, defined as 'all languages in use in a society, other than the dominant, official or national language'. In England, where the dominant language is English, some 300 community languages are in use, the most widespread being Urdu, Cantonese, Punjabi, Bengali, Arabic, Turkish, Russian, Spanish, Portuguese, Gujerati, Hindi and Polish. The research was jointly conducted by the Scottish Centre for Information on Language Teaching and Research (Scottish CILT) at the University of Stirling, and the SOAS-UCL Centre for Excellence for Teaching and Learning 'Languages of the Wider World' (LWW CETL), between February 2007 and January 2008. The overall aim of this study was to map provision for community languages in higher education in England and to consider how it can be developed to meet emerging demand for more extensive provision