Sub-word indexing and blind relevance feedback for English, Bengali, Hindi, and Marathi IR

Author: Jones Gareth J.F.
Leveling Johannes
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/09/2010

The Forum for Information Retrieval Evaluation (FIRE) provides document collections, topics, and relevance assessments for information retrieval (IR) experiments on Indian languages. Several research questions are explored in this paper: (1) how to create a simple, language-independent corpus-based stemmer, (2) how to identify sub-words and which types of sub-words are suitable as indexing units, and (3) how to apply blind relevance feedback on sub-words and how feedback term selection is affected by the type of the indexing unit. More than 140 IR experiments are conducted using the BM25 retrieval model on the topic titles and descriptions (TD) for the FIRE 2008 English, Bengali, Hindi, and Marathi document collections. The major findings are: The corpus-based stemming approach is effective as a knowledge-light term conflation step and useful in case of few language-specific resources. For English, the corpus-based stemmer performs nearly as well as the Porter stemmer and significantly better than the baseline of indexing words when combined with query expansion. In combination with blind relevance feedback, it also performs significantly better than the baseline for Bengali and Marathi IR. Sub-words such as consonant-vowel sequences and word prefixes can yield similar or better performance in comparison to word indexing. There is no best performing method for all languages. For English, indexing using the Porter stemmer performs best; for Bengali and Marathi, overlapping 3-grams obtain the best result; and for Hindi, 4-prefixes yield the highest MAP. However, in combination with blind relevance feedback using 10 documents and 20 terms, 6-prefixes for English and 4-prefixes for Bengali, Hindi, and Marathi IR yield the highest MAP. Sub-word identification is a general case of decompounding. It results in one or more index terms for a single word form and increases the number of index terms but decreases their average length.
The corresponding retrieval experiments show that relevance feedback on sub-words benefits from selecting a larger number of index terms in comparison with retrieval on word forms. Similarly, selecting the number of relevance feedback terms depending on the ratio of word vocabulary size to sub-word vocabulary size almost always slightly increases information retrieval effectiveness compared to using a fixed number of terms for different languages.
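
As a rough illustration of two of the sub-word indexing units named in this abstract (overlapping character n-grams and fixed-length word prefixes), the following Python sketch may help. It is not the authors' implementation: the function names, whitespace tokenization, lowercasing, and the handling of words shorter than the unit length are all assumptions made for the example. It also slices by Unicode code point, which may differ from how the paper segments Bengali, Hindi, or Marathi text.

```python
def overlapping_ngrams(word: str, n: int = 3) -> list[str]:
    """Return all overlapping character n-grams of a word.

    Assumption: words no longer than n are kept whole so that
    every word still yields at least one index term.
    """
    if len(word) <= n:
        return [word]
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def k_prefix(word: str, k: int = 4) -> str:
    """Return the first k characters of a word (the whole word if shorter)."""
    return word[:k]


def index_terms(text: str, unit: str = "3gram") -> list[str]:
    """Map a text to index terms for a chosen (hypothetical) unit name.

    Supported units in this sketch: "word", "3gram", "4prefix".
    Tokenization is a plain whitespace split, which is an assumption.
    """
    tokens = text.lower().split()
    if unit == "word":
        return tokens
    if unit == "3gram":
        return [g for t in tokens for g in overlapping_ngrams(t, 3)]
    if unit == "4prefix":
        return [k_prefix(t, 4) for t in tokens]
    raise ValueError(f"unknown indexing unit: {unit}")
```

In this sketch a single word form such as "retrieval" produces seven 3-grams but only one 4-prefix, which matches the abstract's observation that sub-word identification can yield one or more index terms per word and shortens their average length.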

Irish Universities

DCU Online Research Access Service

The battle for 'the Mack'

Author: Wieber Sabine
Publication venue: Association of Historians of Nineteenth-Century Art
Publication date: 01/01/2014

No abstract available

Enlighten

DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation

Author: Ganguly Debasis
Jones Gareth J.F.
Leveling Johannes
Publication date: 01/02/2010

For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conflation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conflation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English→Bengali and English→Hindi) are based on query translations obtained from native speakers and the Google translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English→Bengali (English→Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments.

Irish Universities

DCU Online Research Access Service

Readers and Reading in the First World War

Author: Edmund G. C. King
Francesca Benatti
Shafquat Towheed
The Stanford Natural Language Processing Group
University of Sheffield
Publication venue: Modern Humanities Research Association
Publication date: 01/01/2015

This essay consists of three individually authored and interlinked sections. In ‘A Digital Humanities Approach’, Francesca Benatti looks at datasets and databases (including the UK Reading Experience Database) and shows how a systematic, macro-analytical use of digital humanities tools and resources might yield answers to some key questions about reading in the First World War. In ‘Reading behind the Wire in the First World War’, Edmund G. C. King scrutinizes the reading practices and preferences of Allied prisoners of war in Mainz, showing that reading circumscribed by the contingencies of a prison camp created a unique literary community, whose legacy can be traced through their literary output after the war. In ‘Book-hunger in Salonika’, Shafquat Towheed examines the record of a single reader in a specific and fairly static frontline, and argues that in the case of the Salonika campaign, reading communities emerged in close proximity to existing centres of print culture. The focus of this essay moves from the general to the particular, from the scoping of large datasets to the analyses of identified readers within a specific geographical and temporal space. The authors engage with the wider issues and problems of recovering, interpreting, visualizing, narrating, and representing readers in the First World War.

Crossref

Open Research Online (The Open University)

The experience of student use of eBooks on mobile devices

Author: Devenney Amy
Sarjantson Maggie
Stone Graham
Thompson Sarah
Publication venue: University of Huddersfield Press
Publication date: 01/07/2015

University of Huddersfield Repository

Effigy Vessel Documentation, Caddo Collections at the Texas Archeological Research Laboratory at The University of Texas at Austin

Author: Perttula Timothy K.
Selden Robert Z., Jr.
Publication venue: SFA ScholarWorks
Publication date: 01/01/2015

Ceramic vessels from ancestral Caddo sites in East Texas are diverse in form, size, manufacture, and decoration, both spatially and temporally. Variation in these attributes, including vessel form as well as any attachments, also “is connected with particular local and regional traditions” (Brown 1996:335). To both appreciate and understand the meaning of vessel form diversity in Caddo vessel assemblages in East Texas, or any other part of the much larger southern Caddo area, the consistent identification of different vessel forms and vessel shapes is crucial. The formal identification of the diverse vessel forms and vessel shapes, in conjunction with other vessel attributes, most notably decorative motifs and elements, present in Caddo vessel assemblages should contribute to delimiting the existence and spatial distribution of communities of Caddo potters that were sharing or not sharing ceramic practices and traditions in both short-term and long-term spatial scales, and illuminating small or expansive networks of social groups tied together through regional interaction. In this study, the focus is on ceramic effigy vessels from Caddo sites in East Texas that are in the collections at the Texas Archeological Research Laboratory at The University of Texas at Austin (TARL). Ceramic effigy vessels are a very rare vessel form found on Caddo sites, as they comprise about 1 percent of the more than 3100 Caddo vessels currently in the TARL collections. Three different effigy bowl shapes have been identified in East Texas Caddo vessel assemblages. The differences primarily revolve around the character of the effigy head (both bird and abstract forms) as well as the nature of any other appendages, such as tab tails and tail riders. The effigy bowls themselves are simple in form, with rounded body wall contours.

SFA ScholarWorks

Retrieving with good sense

Author: Sanderson M.
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2000

Although always present in text, word sense ambiguity only recently came to be regarded as a potentially solvable problem for information retrieval. The growth of interest in word senses resulted from new directions taken in disambiguation research. This paper first outlines this research and surveys the resulting efforts in information retrieval. Although the majority of attempts to improve retrieval effectiveness were unsuccessful, much was learnt from the research, most notably a notion of the circumstances under which disambiguation may prove of use to retrieval.

CiteSeerX

White Rose Research Online

Index to Library Trends Volume 33

Author: Burger Robert H.
Publication venue: Graduate School of Library and Information Science. University of Illinois at Urbana-Champaign
Publication date: 01/01/1985

published or submitted for publication

Illinois Digital Environment for Access to Learning and Scholarship Repository