1,286 research outputs found

Part of Speech Tagging of Marathi Text Using Trigram Method

Author: Joshi Nisheeth; Mathur Iti; Singh Jyoti
Publication date: 01/04/2013

In this paper we present a Marathi part-of-speech (POS) tagger. Marathi is a morphologically rich language spoken by the native people of Maharashtra. The tagger is developed with a statistical approach based on the trigram method. The main idea of the trigram method is to find the most likely POS tag for a token given the previous two tags, calculating probabilities to determine the best tag sequence. In this paper we show the development of the tagger and the evaluation done
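The trigram idea in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how probabilities are smoothed or how the best sequence is searched, so the add-one smoothing, the greedy left-to-right decoding, the `<s>` padding symbol, and the function names `train_trigram` and `tag_greedy` are all assumptions made for illustration.

```python
from collections import Counter

def train_trigram(tagged_sentences):
    """Count tag trigrams, their two-tag contexts, and (word, tag) pairs.

    tagged_sentences: list of sentences, each a list of (word, tag) tuples.
    """
    tri, ctx, emit, tag_count = Counter(), Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        # Pad with two start symbols so the first word also has a context.
        tags = ["<s>", "<s>"] + [t for _, t in sent]
        for i in range(2, len(tags)):
            tri[(tags[i - 2], tags[i - 1], tags[i])] += 1
            ctx[(tags[i - 2], tags[i - 1])] += 1
        for word, tag in sent:
            emit[(word, tag)] += 1
            tag_count[tag] += 1
    return tri, ctx, emit, tag_count

def tag_greedy(words, model):
    """Pick, word by word, the tag maximizing P(tag | prev two tags) * P(word | tag).

    Both probabilities use add-one smoothing (an assumption of this sketch).
    Greedy decoding is a simplification; a full tagger would typically
    search over whole tag sequences instead.
    """
    tri, ctx, emit, tag_count = model
    n_tags, n_pairs = len(tag_count), len(emit)
    prev2, prev1, result = "<s>", "<s>", []
    for word in words:
        def score(tag):
            p_trans = (tri[(prev2, prev1, tag)] + 1) / (ctx[(prev2, prev1)] + n_tags)
            p_emit = (emit[(word, tag)] + 1) / (tag_count[tag] + n_pairs)
            return p_trans * p_emit
        best = max(sorted(tag_count), key=score)
        result.append(best)
        prev2, prev1 = prev1, best
    return result
```

Training takes sentences as lists of `(word, tag)` pairs; `tag_greedy` then returns one tag per input word.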

arXiv.org e-Print Archive

CogPrints Cognitive Sciences Eprint Archive

Implementation of Rule Based Algorithm for Sandhi-Vicheda Of Compound Hindi Words

Author: Goyal Vishal; Gupta Priyanka
Publication venue: International Journal of Computer Science Issues, IJCSI
Publication date: 01/08/2009

Sandhi means joining two or more words to coin a new word. Sandhi literally means 'putting together' or combining (of sounds). It denotes all combinatory sound changes effected (spontaneously) for ease of pronunciation. Sandhi-vicheda describes [5] the process by which one letter (whether single or conjoined) is broken to form two words. Part of the broken letter remains as the last letter of the first word, and part of the letter forms the first letter of the next word. Sandhi-vicheda is an easy and interesting technique that can add a new dimension to the traditional approach to Hindi teaching. In this paper, using the rule-based algorithm, we report an accuracy of 60-80% depending upon the number of rules implemented

arXiv.org e-Print Archive

Directory of Open Access Journals

Part of Speech Tagging of Marathi Text Using Trigram Method

Publication venue: Academy and Industry Research Collaboration Center (AIRCC)

Crossref

Ergative case, aspect and person splits: Two case studies

Author: Franco Ludovico; Manzini M. Rita; Savoia Leonardo M.
Publication venue: Akademiai Kiado Zrt.
Publication date: 01/01/2015

Ergativity splits between perfect and imperfective/progressive predicates are observed in languages with a specialized ergative case (Punjabi) and without it (Kurdish). Perfect predicates correspond to a VP projection; external arguments are introduced by means of an oblique case, namely an elementary part–whole predicate saying that the event is ‘included by’, ‘located at’ the argument. A more complex organization is found with imperfective/progressive predicates, where a head Asp projects a functional layer and introduces the external argument. Our proposal further yields the 1/2P vs. 3P Person split as a result of the intrinsic ability of 1/2P to serve as ‘location-of-event’

Crossref

Florence Research

Repository of the Academy's Library

Beyond Arabic: Software for Perso-Arabic Script Manipulation

Author: Doctor Raiomond; Gutkin Alexander; Johny Cibu; Roark Brian; Sproat Richard
Publication date: 26/01/2023

This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script. The operations include various levels of script normalization, including visual invariance-preserving operations that subsume and go beyond the standard Unicode normalization forms, as well as transformations that modify the visual appearance of characters in accordance with the regional orthographies for eleven contemporary languages from diverse language families. The library also provides simple FST-based romanization and transliteration. We additionally attempt to formalize the typology of Perso-Arabic characters by providing one-to-many mappings from Unicode code points to the languages that use them. While our work focuses on the Arabic script diaspora rather than Arabic itself, this approach could be adopted for any language that uses the Arabic script, thus providing a unified framework for treating a script family used by close to a billion people. Comment: Preprint to appear in the Proceedings of the 7th Arabic Natural Language Processing Workshop (WANLP 2022) at EMNLP, Abu Dhabi, United Arab Emirates, December 7-11, 2022. 7 page

arXiv.org e-Print Archive

Automatic Detection of Gender and Number Agreement Errors in Spanish Texts Written by Japanese Learners

Author: Ibanez Maria del Pilar Valverde; Otani Akira
Publication venue: Faculty of Computer Science, Universitas Indonesia
Publication date: 01/01/2012

Waseda University Repository

OBLIQUE CASE AND INDO-EUROPEAN ERGATIVITY SPLITS

Author: Manzini Maria Rita; Savoia Leonardo Maria
Publication date: 01/01/2015

Florence Research

Language acquisition

Author: Anita Claire
Publication date: 01/01/2002

This project investigates acquisition of a new language by example. Syntax induction has been studied widely and the more complex syntax associated with Natural Language is difficult to induce without restrictions. Chomsky conjectured that natural languages are restricted by a Universal Grammar. English could be used as a Universal Grammar and Punjabi derived from it in a similar way as the acquisition of a first language. However, if English has already been acquired then Punjabi would be induced from English as a second language. [Continues.]

Loughborough University Institutional Repository

Community languages in higher education : towards realising the potential

Author: McPake Joanna; Sachdev Itesh
Funder: Routes into Languages (HEFCE and DCSF)
Publication venue: Routes into Languages, University of Southampton
Publication date: 01/01/2008

This study, Community Languages in Higher Education: Towards Realising the Potential, forms part of the Routes into Languages initiative funded by the Higher Education Funding Council in England (HEFCE) and the Department for Children, Schools and Families (DCSF). It sets out to map provision for community languages, defined as 'all languages in use in a society, other than the dominant, official or national language'. In England, where the dominant language is English, some 300 community languages are in use, the most widespread being Urdu, Cantonese, Punjabi, Bengali, Arabic, Turkish, Russian, Spanish, Portuguese, Gujerati, Hindi and Polish. The research was jointly conducted by the Scottish Centre for Information on Language Teaching and Research (Scottish CILT) at the University of Stirling, and the SOAS-UCL Centre for Excellence for Teaching and Learning 'Languages of the Wider World' (LWW CETL), between February 2007 and January 2008. The overall aim of this study was to map provision for community languages in higher education in England and to consider how it can be developed to meet emerging demand for more extensive provision

University of Strathclyde Institutional Repository