Search CORE

213 research outputs found

Improving the quality of Gujarati-Hindi Machine Translation through part-of-speech tagging and stemmer-assisted transliteration

Author: Ameta Juhi
Joshi Nisheeth
Mathur Iti
Publication venue
Publication date: 01/06/2013
Field of study

Machine Translation for Indian languages is an emerging research area. Transliteration is one such module that we design while designing a translation system. Transliteration means mapping of source language text into the target language. Simple mapping decreases the efficiency of overall translation system. We propose the use of stemming and part-of-speech tagging for transliteration. The effectiveness of translation can be improved if we use part-of-speech tagging and stemming assisted transliteration.We have shown that much of the content in Gujarati gets transliterated while being processed for translation to Hindi language

CogPrints Cognitive Sciences Eprint Archive

A Lightweight Stemmer for Gujarati

Author: Ameta Juhi
Joshi Nisheeth
Mathur Iti
Publication venue
Publication date: 01/12/2011
Field of study

Gujarati is a resource poor language with almost no language processing tools being available. In this paper we have shown an implementation of a rule based stemmer of Gujarati. We have shown the creation of rules for stemming and the richness in morphology that Gujarati possesses. We have also evaluated our results by verifying it with a human expert

arXiv.org e-Print Archive

CogPrints Cognitive Sciences Eprint Archive

DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication venue
Publication date: 01/02/2010
Field of study

For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English!Bengali and English!Hindi) are based on query translations obtained from native speakers and the Google translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English!Bengali (English!Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments

Irish Universities

DCU Online Research Access Service