Search CORE

4 research outputs found

Text retrieval from early printed books

Author: Marinai Simone
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011

Retrieval from Document Image Collections

Author: A. Balasubramanian
Million Meshesha
C. V. Jawahar
Publication venue: Springer
Publication date: 01/01/2006

This paper presents a system for retrieval of relevant documents from large document image collections. We achieve effective search and retrieval from a large collection of printed document images by matching image features at word-level. For representations of the words, profile-based and shape-based features are employed. A novel DTW-based partial matching scheme is employed to take care of morphologically variant words. This is useful for grouping together similar words during the indexing process. The system supports cross-lingual search using OM-Trans transliteration and a dictionary-based approach. System-level issues for retrieval (e.g. scalability, effective delivery etc.) are addressed in this paper.
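
As a rough illustration of the word-level matching idea mentioned in the abstract above, the following is a minimal sketch of dynamic time warping (DTW) over one-dimensional feature sequences, such as a per-column profile of a word image. It is not the authors' implementation: the function names, the absolute-difference local cost, and the prefix-only form of "partial" matching are all assumptions made for this example.

```python
from typing import List, Sequence


def _dtw_table(a: Sequence[float], b: Sequence[float]) -> List[List[float]]:
    """Fill the cumulative-cost table for classic DTW (hypothetical helper)."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: absolute difference of the two feature values (assumed).
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Full DTW distance: both sequences must be aligned end to end."""
    return _dtw_table(a, b)[len(a)][len(b)]


def dtw_prefix_distance(query: Sequence[float], word: Sequence[float]) -> float:
    """Assumed partial-match variant: the whole query is aligned to the
    best-matching prefix of the word, so a trailing suffix costs nothing."""
    table = _dtw_table(query, word)
    return min(table[len(query)][1:])


if __name__ == "__main__":
    stem = [1.0, 2.0, 3.0]
    inflected = [1.0, 2.0, 3.0, 9.0, 9.0]  # same stem plus an extra suffix
    print(dtw_distance(stem, inflected))
    print(dtw_prefix_distance(stem, inflected))
```

In this sketch the prefix variant lets a word and a suffixed form of it score as close matches, which is one simple way to group morphological variants at indexing time; the scheme described in the paper may differ.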

CiteSeerX

Issues in Cross-Language Retrieval from Document Image Collections

Author: Douglas Oard

Over the past decade, broad-coverage cross-language text retrieval has progressed from isolated experiments on small collections to establish credible performance in large-scale evaluations. Extending this capability to document image collections presents some additional challenges that have not yet been well explored. This paper presents a general framework for cross-language retrieval, specializes that framework to retrieval from document image collections, and identifies opportunities for closer integration of the key enabling technologies and resources.

1 Introduction

Information retrieval systems seek to help users obtain information objects from large collections [2]. Early systems typically relied on manually assigned indexing terms, and such "controlled vocabulary" techniques were widely used in libraries to support the retrieval of printed documents. As storage costs declined and processing power improved, "free text" searching became cost effective and was widely deployed. ...

CiteSeerX

Issues in Cross-Language Retrieval from

Over the past decade, broad-coverage cross-language text retrieval has progressed from isolated experiments on small collections to establish credible performance in large-scale evaluations. Extending this capability to document image collections presents some additional challenges that have not yet been well explored. This paper presents a general framework for cross-language retrieval, specializes that framework to retrieval from document image collections, and identifies opportunities for closer integration of the key enabling technologies and resources.

CiteSeerX