    A Novel Approach for Compound Document Matching

    Abstract. Future digital libraries will contain not only pure text documents but also, increasingly, massive numbers of compound documents comprising many multimedia objects, e.g. texts, images, audio, and video. Existing document collections, e.g. all electronic health records of one clinic, can already form a digital library with millions of multimedia objects and a total storage volume of several terabytes. It is therefore important to provide effective and efficient retrieval for such collections. This paper proposes a novel approach for compound document matching that uses a filter-and-refinement algorithm for similarity-based retrieval. At the same time, the approach increases effectiveness by establishing only semantically meaningful matches, and it provides greater expressiveness in queries by restricting the number of allowed matches to a single query object.
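
The abstract names a filter-and-refinement scheme but gives no details, so the following is only a generic sketch of that idea, not the paper's algorithm. It assumes each compound document is a list of feature vectors (one per multimedia object) and uses a hypothetical document distance: each query object is matched to its nearest object in the document, and the Euclidean distances are summed. The filter step uses vector norms, which give a cheap lower bound on that distance by the reverse triangle inequality, so no true result is discarded before the exact refinement step. All function names and the sample data are illustrative assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def norm(a):
    """Euclidean norm of a feature vector."""
    return math.sqrt(sum(x * x for x in a))


def exact_distance(query, doc):
    """Assumed document distance: match each query object to its
    nearest object in doc and sum the distances (the costly step)."""
    return sum(min(euclidean(q, d) for d in doc) for q in query)


def lower_bound(query_norms, doc_norms):
    """Cheap filter distance. Since | ||q|| - ||d|| | <= ||q - d||,
    this sum never exceeds exact_distance for the same pair."""
    return sum(min(abs(qn - dn) for dn in doc_norms) for qn in query_norms)


def range_query(query, collection, eps):
    """Return (doc_id, distance) pairs with exact distance <= eps."""
    query_norms = [norm(q) for q in query]
    results = []
    for doc_id, doc in collection.items():
        # In a real system the document norms would be precomputed.
        doc_norms = [norm(d) for d in doc]
        # Filter: prune documents whose lower bound already exceeds eps.
        if lower_bound(query_norms, doc_norms) > eps:
            continue
        # Refinement: compute the exact distance for surviving candidates.
        dist = exact_distance(query, doc)
        if dist <= eps:
            results.append((doc_id, dist))
    return sorted(results, key=lambda r: r[1])


# Hypothetical toy collection: document id -> list of object vectors.
collection = {
    "a": [(0, 0), (1, 1)],
    "b": [(10, 10)],
    "c": [(0, 1), (5, 5)],
}
query = [(0, 0), (1, 1)]
print(range_query(query, collection, 1.5))  # [('a', 0.0)]
```

In this toy run, document "b" is pruned by the filter alone, while "c" passes the filter (lower bound about 1.41) but is rejected in refinement (exact distance 2.0). This is the usual division of labor: the filter may admit false candidates but must never drop a true result.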