
10,463 research outputs found

DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication venue
Publication date: 01/02/2010
Field of study

For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English→Bengali and English→Hindi) are based on query translations obtained from native speakers and the Google Translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English→Bengali (English→Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments.

Irish Universities

DCU Online Research Access Service

Translation and Bilingualism in Monica Ali’s and Jhumpa Lahiri’s Marginalized Identities

Author: Rizzo Alessandra
Publication venue: 'Walter de Gruyter GmbH'
Publication date: 01/12/2012
Field of study

This investigation seeks to demonstrate how Ali and Lahiri represent two different migrant experiences, Muslim and Indian, each of which functions within a multicultural Anglo-American context. Each text is transformed into the lieu where identities become both identities-in-translation and translated identities, and each text itself may be looked at as the site of preservation of native identities but also of the assimilation (or adaptation) of identity. Second-generation immigrant women writers become the interpreters of the old and new cultures, the translators of their own local cultures in a space of transition.

Biblioteka Nauki - repozytorium artykułów

Repozytorium Uniwersytetu Łódzkiego (University of Lodz Repository)

Goethe’s “Welt” poet in Bengal: The Influence of World Literature on Jibanananda Das and other Bengali Poets of the 1930s-40s

Author: Firoze Basu
Publication venue: Perception Publishing
Publication date: 30/08/2021
Field of study

This study aims to establish a link between the concept of “Weltliteratur” or World Literature, in terms of the free movement of literary themes and ideas between nations in original form or translation, and the Bengali poets of the thirties and forties who actively translated French and German poets. It identifies Johann Wolfgang von Goethe's (1749-1832) concept of World Literature as a vehicle for the Kallol Jug poets. Goethe introduced the concept of “Weltliteratur” in a few of his essays in the first half of the nineteenth century to describe the international circulation and reception of literary works in Europe, including works of non-Western origin. My emphasis will be on Jibanananda Das (1899-1954), arguably the most celebrated poet in Bengali literature, who was well versed in the contemporary Western canons of poetry. Jibanananda’s defamiliarization of the rural Bengal landscape and his use of exotic foreign images owe a debt to contemporary European poets. Interestingly, Jibanananda had reviewed an English translation of German author Thomas Mann’s novel “Dr Faustus” for a Bengali magazine, “Chaturanga”. In the Bengali review he states that, despite prevalent misconceptions (some critics considering the novel to be superior to the original Faust epic by Goethe), Goethe’s Faust was the first text to capture the hope, despair and crisis in the modern world and articulate it in such a manner that “true” literature of the age was created in its new light. In Jibanananda’s estimation, Thomas Mann deserves credit for treating the Faust legend in a unique and creative way.

The Creative Launcher

Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR

Author: Jones Gareth J.F.
Leveling Johannes
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/09/2010
Field of study

The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: 1. how to create a simple, language-independent corpus-based stemmer, 2. how to identify sub-words and which types of sub-words are suitable as indexing units, and 3. how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful in case of few language-specific resources. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best, for Bengali and Marathi, overlapping 3-grams obtain the best result, and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length.
The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.
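
As an illustration of two of the sub-word indexing units named in this abstract (k-prefixes and overlapping character n-grams), the following is a minimal sketch and not the authors' implementation. The function names, the whitespace tokenizer, and the handling of short words are assumptions made for the example. It also slices by Unicode code point, which may not match how the paper segments Bengali, Hindi, or Marathi text.

```python
def k_prefix(word: str, k: int) -> str:
    """Truncate a word to its first k characters; shorter words are kept whole."""
    return word[:k]


def overlapping_ngrams(word: str, n: int = 3) -> list[str]:
    """Return all overlapping character n-grams of a word.

    Assumption: words no longer than n are kept as a single index term.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def index_terms(text: str, unit: str = "prefix", size: int = 4) -> list[str]:
    """Map text to index terms using the chosen (hypothetical) unit type."""
    tokens = text.lower().split()  # assumed tokenizer: lowercase + whitespace split
    if unit == "prefix":
        return [k_prefix(t, size) for t in tokens]
    if unit == "ngram":
        return [g for t in tokens for g in overlapping_ngrams(t, size)]
    return tokens  # fall back to plain word indexing
```

With n-grams, a single word form produces several shorter index terms, which matches the abstract's remark that sub-word identification increases the number of index terms while decreasing their average length.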

Irish Universities

DCU Online Research Access Service

Community languages in higher education : towards realising the potential

Author: McPake Joanna
Routes into Languages (HEFCE and DCSF) (Funder)
Sachdev Itesh
Publication venue: Routes into Languages, University of Southampton
Publication date: 01/01/2008
Field of study

This study, Community Languages in Higher Education: Towards Realising the Potential, forms part of the Routes into Languages initiative funded by the Higher Education Funding Council in England (HEFCE) and the Department for Children, Schools and Families (DCSF). It sets out to map provision for community languages, defined as 'all languages in use in a society, other than the dominant, official or national language'. In England, where the dominant language is English, some 300 community languages are in use, the most widespread being Urdu, Cantonese, Punjabi, Bengali, Arabic, Turkish, Russian, Spanish, Portuguese, Gujerati, Hindi and Polish. The research was jointly conducted by the Scottish Centre for Information on Language Teaching and Research (Scottish CILT) at the University of Stirling, and the SOAS-UCL Centre for Excellence for Teaching and Learning 'Languages of the Wider World' (LWW CETL), between February 2007 and January 2008. The overall aim of this study was to map provision for community languages in higher education in England and to consider how it can be developed to meet emerging demand for more extensive provision.

University of Strathclyde Institutional Repository