4 research outputs found

    Text retrieval from early printed books

    Retrieval from Document Image Collections

    This paper presents a system for retrieval of relevant documents from large document image collections. We achieve effective search and retrieval from a large collection of printed document images by matching image features at word level. For representations of the words, profile-based and shape-based features are employed. A novel DTW-based partial matching scheme is employed to take care of morphologically variant words. This is useful for grouping together similar words during the indexing process. The system supports cross-lingual search using OM-Trans transliteration and a dictionary-based approach. System-level issues for retrieval (e.g., scalability, effective delivery, etc.) are addressed in this paper.
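To make the idea of DTW-based matching concrete, here is a minimal sketch of classic dynamic time warping over 1-D word profiles, plus one simple way to allow a partial match. It is not the paper's scheme: the abstract gives no details, so the prefix-matching rule, the function names (`dtw_table`, `dtw_distance`, `dtw_prefix_distance`), and the use of plain float sequences as "profile features" are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def dtw_table(a: Sequence[float], b: Sequence[float]) -> List[List[float]]:
    """Cumulative DTW cost table; table[i][j] aligns a[:i] with b[:j]."""
    n, m = len(a), len(b)
    table = [[math.inf] * (m + 1) for _ in range(n + 1)]
    table[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # per-column feature distance
            table[i][j] = local + min(
                table[i - 1][j],      # stretch b
                table[i][j - 1],      # stretch a
                table[i - 1][j - 1],  # advance both
            )
    return table


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Full DTW: both sequences must be aligned end to end."""
    return dtw_table(a, b)[len(a)][len(b)]


def dtw_prefix_distance(query: Sequence[float], candidate: Sequence[float]) -> float:
    """Hypothetical partial match: align the whole query with the best
    prefix of the candidate, so a trailing suffix is not penalised."""
    table = dtw_table(query, candidate)
    return min(table[len(query)][1:])


stem = [1.0, 2.0, 3.0]
inflected = [1.0, 2.0, 3.0, 9.0, 9.0]  # same stem, extra suffix columns
print(dtw_distance(stem, inflected))         # penalised by the suffix
print(dtw_prefix_distance(stem, inflected))  # suffix ignored
```

Under this assumed rule, a word and its suffixed variant score as close matches, which is the kind of grouping the abstract describes for indexing.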

    Issues in Cross-Language Retrieval from Document Image Collections

    Over the past decade, broad-coverage cross-language text retrieval has progressed from isolated experiments on small collections to establish credible performance in large-scale evaluations. Extending this capability to document image collections presents some additional challenges that have not yet been well explored. This paper presents a general framework for cross-language retrieval, specializes that framework to retrieval from document image collections, and identifies opportunities for closer integration of the key enabling technologies and resources. 1 Introduction. Information retrieval systems seek to help users obtain information objects from large collections [2]. Early systems typically relied on manually assigned indexing terms, and such "controlled vocabulary" techniques were widely used in libraries to support the retrieval of printed documents. As storage costs declined and processing power improved, "free text" searching became cost effective and was widely deployed. ..

    Issues in Cross-Language Retrieval from

    Over the past decade, broad-coverage cross-language text retrieval has progressed from isolated experiments on small collections to establish credible performance in large-scale evaluations. Extending this capability to document image collections presents some additional challenges that have not yet been well explored. This paper presents a general framework for cross-language retrieval, specializes that framework to retrieval from document image collections, and identifies opportunities for closer integration of the key enabling technologies and resources.