Search CORE

10 research outputs found

Disambiguation strategies for cross-language information retrieval

Author: D. Harman
G. Salton
S.E. Robertson
Publication venue: Springer Verlag
Publication date: 01/01/1999
Field of study

This paper gives an overview of tools and methods for Cross-Language Information Retrieval (CLIR) that are developed within the Twenty-One project. The tools and methods are evaluated with the TREC CLIR task document collection using Dutch queries on the English document base. The main issue addressed here is an evaluation of two approaches to disambiguation. The underlying question is whether a lot of effort should be put in finding the correct translation for each query term before searching, or whether searching with more than one possible translation leads to better results? The experimental study suggests that the quality of search methods is more important than the quality of disambiguation methods. Good retrieval methods are able to disambiguate translated queries implicitly during searching

CiteSeerX

Crossref

Radboud Repository

University of Twente Research Information

Transitive probabilistic CLIR models.

Author: Jong F.M.G. de
Kraaij W.
Publication venue: Centre de hautes etudes internationales (CID)
Publication date: 01/01/2004
Field of study

Transitive translation could be a useful technique to enlarge the number of supported language pairs for a cross-language information retrieval (CLIR) system in a cost-effective manner. The paper describes several setups for transitive translation based on probabilistic translation models. The transitive CLIR models were evaluated on the CLEF test collection and yielded a retrieval effectiveness\ud up to 83% of monolingual performance, which is significantly better than a baseline using the synonym operator

CiteSeerX

University of Twente Research Information

Twenty-One: a baseline for multilingual multimedia retrieval

Author: de Jong Franciska
Publication venue: 'University Library/University of Twente'
Publication date: 27/01/1998
Field of study

University of Twente Research Information

Twenty-One at TREC-7: ad-hoc and cross-language track

Author: Hiemstra Djoerd
Kraaij Wessel
Publication venue: National Institute of Standards and Technology (NIST)
Publication date: 01/01/1998
Field of study

This paper describes the official runs of the Twenty-One group for TREC-7. The Twenty-One group participated in the ad-hoc and the cross-language track and made the following accomplishments: We developed a new weighting algorithm, which outperforms the popular Cornell version of BM25 on the ad-hoc collection. For the CLIR task we developed a fuzzy matching algorithm to recover from missing translations and spelling variants of proper names. Also for CLIR we investigated translation strategies that make extensive use of information from our dictionaries by identifying preferred translations, main translations and synonym translations, by defining weights of possible translations and by experimenting with probabilistic boolean matching strategies

CiteSeerX

Radboud Repository

University of Twente Research Information

Language technology in multimedia information retrieval:Proceedings of the fourteenth International Twente Workshop on Language Technology

Author
Publication venue: 'University Library/University of Twente'
Publication date: 01/12/1998
Field of study

University of Twente Research Information

Proceedings. Workshop Ontologie-basiertes Wissensmanagement, WOW2003, Luzern, 2.-4. April, 2003 [online]

Author: [Hrsg.] Hans-Peter
Schnurr Hans-Peter
Sure York
Publication venue
Publication date: 02/08/2007
Field of study

KITopen

Robust methods for Chinese spoken document retrieval.

Author
Publication venue
Publication date: 01/01/2003
Field of study

Hui Pui Yu.Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.Includes bibliographical references (leaves 158-169).Abstracts in English and Chinese.Abstract --- p.2Acknowledgements --- p.6Chapter 1 --- Introduction --- p.23Chapter 1.1 --- Spoken Document Retrieval --- p.24Chapter 1.2 --- The Chinese Language and Chinese Spoken Documents --- p.28Chapter 1.3 --- Motivation --- p.33Chapter 1.3.1 --- Assisting the User in Query Formation --- p.34Chapter 1.4 --- Goals --- p.34Chapter 1.5 --- Thesis Organization --- p.35Chapter 2 --- Multimedia Repository --- p.37Chapter 2.1 --- The Cantonese Corpus --- p.37Chapter 2.1.1 --- The RealMedia´ёØCollection --- p.39Chapter 2.1.2 --- The MPEG-1 Collection --- p.40Chapter 2.2 --- The Multimedia Markup Language --- p.42Chapter 2.3 --- Chapter Summary --- p.44Chapter 3 --- Monolingual Retrieval Task --- p.45Chapter 3.1 --- Properties of Cantonese Video Archive --- p.45Chapter 3.2 --- Automatic Speech Transcription --- p.46Chapter 3.2.1 --- Transcription of Cantonese Spoken Documents --- p.47Chapter 3.2.2 --- Indexing Units --- p.48Chapter 3.3 --- Known-Item Retrieval Task --- p.49Chapter 3.3.1 --- Evaluation ´ؤ Average Inverse Rank --- p.50Chapter 3.4 --- Retrieval Model --- p.51Chapter 3.5 --- Experimental Results --- p.52Chapter 3.6 --- Chapter Summary --- p.53Chapter 4 --- The Use of Audio and Video Information for Monolingual Spoken Document Retrieval --- p.55Chapter 4.1 --- Video-based Segmentation --- p.56Chapter 4.1.1 --- Metric Computation --- p.57Chapter 4.1.2 --- Shot Boundary Detection --- p.58Chapter 4.1.3 --- Shot Transition Detection --- p.67Chapter 4.2 --- Audio-based Segmentation --- p.69Chapter 4.2.1 --- Gaussian Mixture Models --- p.69Chapter 4.2.2 --- Transition Detection --- p.70Chapter 4.3 --- Performance Evaluation --- p.72Chapter 4.3.1 --- Automatic Story Segmentation --- p.72Chapter 4.3.2 --- Video-based Segmentation Algorithm --- p.73Chapter 4.3.3 --- Audio-based Segmentation Algorithm --- p.74Chapter 4.4 --- Fusion of Video- and Audio-based Segmentation --- p.75Chapter 4.5 --- Retrieval Performance --- p.76Chapter 4.6 --- Chapter Summary --- p.78Chapter 5 --- Document Expansion for Monolingual Spoken Document Retrieval --- p.79Chapter 5.1 --- Document Expansion using Selected Field Speech Segments --- p.81Chapter 5.1.1 --- Annotations from MmML --- p.81Chapter 5.1.2 --- Selection of Cantonese Field Speech --- p.83Chapter 5.1.3 --- Re-weighting Different Retrieval Units --- p.84Chapter 5.1.4 --- Retrieval Performance with Document Expansion using Selected Field Speech --- p.84Chapter 5.2 --- Document Expansion using N-best Recognition Hypotheses --- p.87Chapter 5.2.1 --- Re-weighting Different Retrieval Units --- p.90Chapter 5.2.2 --- Retrieval Performance with Document Expansion using TV-best Recognition Hypotheses --- p.90Chapter 5.3 --- Document Expansion using Selected Field Speech and N-best Recognition Hypotheses --- p.92Chapter 5.3.1 --- Re-weighting Different Retrieval Units --- p.92Chapter 5.3.2 --- Retrieval Performance with Different Indexed Units --- p.93Chapter 5.4 --- Chapter Summary --- p.94Chapter 6 --- Query Expansion for Cross-language Spoken Document Retrieval --- p.97Chapter 6.1 --- The TDT-2 Corpus --- p.99Chapter 6.1.1 --- English Textual Queries --- p.100Chapter 6.1.2 --- Mandarin Spoken Documents --- p.101Chapter 6.2 --- Query Processing --- p.101Chapter 6.2.1 --- Query Weighting --- p.101Chapter 6.2.2 --- Bigram Formation --- p.102Chapter 6.3 --- Cross-language Retrieval Task --- p.103Chapter 6.3.1 --- Indexing Units --- p.104Chapter 6.3.2 --- Retrieval Model --- p.104Chapter 6.3.3 --- Performance Measure --- p.105Chapter 6.4 --- Relevance Feedback --- p.106Chapter 6.4.1 --- Pseudo-Relevance Feedback --- p.107Chapter 6.5 --- Retrieval Performance --- p.107Chapter 6.6 --- Chapter Summary --- p.109Chapter 7 --- Conclusions and Future Work --- p.111Chapter 7.1 --- Future Work --- p.114Chapter A --- XML Schema for Multimedia Markup Language --- p.117Chapter B --- Example of Multimedia Markup Language --- p.128Chapter C --- Significance Tests --- p.135Chapter C.1 --- Selection of Cantonese Field Speech Segments --- p.135Chapter C.2 --- Fusion of Video- and Audio-based Segmentation --- p.137Chapter C.3 --- Document Expansion with Reporter Speech --- p.137Chapter C.4 --- Document Expansion with N-best Recognition Hypotheses --- p.140Chapter C.5 --- Document Expansion with Reporter Speech and N-best Recognition Hypotheses --- p.140Chapter C.6 --- Query Expansion with Pseudo Relevance Feedback --- p.142Chapter D --- Topic Descriptions of TDT-2 Corpus --- p.145Chapter E --- Speech Recognition Output from Dragon in CLSDR Task --- p.148Chapter F --- Parameters Estimation --- p.152Chapter F.1 --- "Estimating the Number of Relevant Documents, Nr" --- p.152Chapter F.2 --- "Estimating the Number of Terms Added from Relevant Docu- ments, Nrt , to Original Query" --- p.153Chapter F.3 --- "Estimating the Number of Non-relevant Documents, Nn , from the Bottom-scoring Retrieval List" --- p.153Chapter F.4 --- "Estimating the Number of Terms, Selected from Non-relevant Documents (Nnt), to be Removed from Original Query" --- p.154Chapter G --- Abbreviations --- p.155Bibliography --- p.15

CUHK Digital Repository

Language Technology in Multimedia Information Retrieval. Twente Workshop on Language Technology 14

Author: Netter Klaus
Publication venue: 'University Library/University of Twente'
Publication date: 01/01/1998
Field of study

University of Twente Research Information

Cross-language retrieval with the Twenty-One system

Author: Djoerd Hiemstra
Wessel Kraaij
Publication venue
Publication date: 01/01/1998
Field of study

TheEUprojectTwenty-One will support cross language queries in a multilingual document base. A prototype version of the Twenty-One system has been subjected to the Cross Language track tests in order to set baseline performances. The runs were based on query translation using dictionaries and corpus based disambiguation methods

CiteSeerX

Radboud Repository

University of Twente Research Information

Cross-Language Retrieval with the Twenty-One System

Author: Harman D.K.
Hiemstra Djoerd
Kraaij Wessel
Voorhees E.M
Publication venue: 'National Institute of Standards and Technology (NIST)'
Publication date: 01/01/1998
Field of study

The EU project Twenty-One will support cross language queries in a multilingual document base. A prototype version of the Twenty-One system has been subjected to the Cross Language track tests in order to set baseline performances. The runs were based on query translation using dictionaries and corpus based disambiguation methods