
    Hierarchical Indexing and Document Matching in BoW

    BoW is an on-line bibliographical repository based on a hierarchical concept index to which entries are linked. Searching in the repository should therefore return matching topics from the hierarchy, rather than just a list of entries. Likewise, when new entries are inserted, a search for relevant topics to which they should be linked is required. We develop a vector-based algorithm that creates keyword vectors for the set of competing topics at each node in the hierarchy, and show how its performance improves when domain-specific features are added (such as special handling of topic titles and author names). The results of a 7-fold cross validation on a corpus of some 3,500 entries with a 5-level index are hit ratios in the range of 89-95%, and most of the misclassifications are indeed ambiguous to begin with.

    1 Introduction

    An obvious and natural approach to organize a large corpus of data is a hierarchical index, akin to a book's table of contents. The type of corpus we de..
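
The matching idea in the abstract (keyword vectors for the competing topics at each node, compared against a query or a new entry) can be illustrated with a minimal sketch. The abstract does not specify the actual algorithm, so everything here is an assumption made for illustration: the `Topic` class, the whitespace tokenizer, the cosine similarity measure, the `title_weight` boost standing in for "special handling of topic titles", and the toy index itself.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on whitespace, keep alphabetic words.
    return [w for w in text.lower().split() if w.isalpha()]


@dataclass
class Topic:
    # Hypothetical node of the hierarchical index.
    title: str
    entries: list = field(default_factory=list)   # texts of entries linked here
    children: list = field(default_factory=list)  # sub-topics

    def all_entries(self):
        # Entries linked to this topic or to any topic below it.
        out = list(self.entries)
        for child in self.children:
            out.extend(child.all_entries())
        return out

    def keyword_vector(self, title_weight=3.0):
        # Term counts over the whole subtree, with title words boosted
        # (an assumed stand-in for a domain-specific feature).
        vec = Counter()
        for entry in self.all_entries():
            vec.update(tokenize(entry))
        for word in tokenize(self.title):
            vec[word] += title_weight
        return vec


def cosine(a, b):
    # Cosine similarity between two sparse term-weight vectors.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def match(root, text):
    # Descend from the root, picking the best-scoring competing topic at each
    # node; stop at a leaf or when no child shares any keyword with the text.
    query = Counter(tokenize(text))
    path, node = [], root
    while node.children:
        scored = [(cosine(query, c.keyword_vector()), c) for c in node.children]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score == 0.0:
            break
        path.append(best.title)
        node = best
    return path


# Toy index (invented data, not from the BoW corpus).
root = Topic("root", children=[
    Topic("operating systems",
          entries=["scheduling parallel jobs", "memory management for kernels"],
          children=[
              Topic("scheduling", entries=["gang scheduling of parallel jobs"]),
              Topic("memory", entries=["virtual memory paging"]),
          ]),
    Topic("information retrieval",
          entries=["vector space model for document ranking"]),
])

print(match(root, "parallel job scheduling"))
```

The same `match` call would serve both uses named in the abstract: a search returns the topic path for a query, and an insertion uses the path to pick the topic a new entry should be linked to. A real system would likely add stop-word removal and term weighting, which this sketch omits.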