
    Personalized classification for keyword-based category profiles

    Personalized classification lets users define their own categories and automates the assignment of documents to those categories. In this paper, we examine the use of keywords to define personalized categories and propose using a Support Vector Machine (SVM) to perform personalized classification. We investigate two scenarios. The first assumes that the personalized categories are defined in a flat category space. The second assumes that each personalized category is defined within a pre-defined general category that provides a more specific context for the personalized category. The training documents for personalized categories are obtained from a training document pool using a search engine and a set of keywords. In our experiments, the second scenario delivered better classification results. We also conclude that the number of keywords used can be very small and that increasing it does not always lead to better classification performance.
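
The training-set construction step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `select_training_docs`, the term-overlap scoring that stands in for the search engine, the `top_k` cutoff, and the choice to treat non-retrieved documents as negatives are all assumptions made for illustration. The selected documents would then be passed to an SVM trainer, which is omitted here.

```python
def select_training_docs(pool, keywords, top_k=2):
    """Pick training documents for one personalized category.

    pool     : dict mapping doc_id -> document text (hypothetical pool)
    keywords : the user's keywords defining the category
    top_k    : how many top-ranked documents to keep as positives (assumed)
    """
    keyword_set = {k.lower() for k in keywords}
    scored = []
    for doc_id, text in pool.items():
        # Crude stand-in for a search engine: count keyword occurrences.
        score = sum(1 for tok in text.lower().split() if tok in keyword_set)
        scored.append((score, doc_id))
    # Highest score first; ties broken by doc_id for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    positives = [doc_id for score, doc_id in scored[:top_k] if score > 0]
    # Assumption: everything not retrieved serves as a negative example.
    negatives = [doc_id for doc_id in pool if doc_id not in positives]
    return positives, negatives


pool = {
    "d1": "svm kernel margin",
    "d2": "football match report",
    "d3": "svm training",
}
positives, negatives = select_training_docs(pool, ["svm", "kernel"])
```

With only two keywords the sketch already separates the on-topic documents from the off-topic one, which is in the spirit of the abstract's observation that a very small keyword set can be enough.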

    YFilter at TREC-9

    This year we built a filtering system, YFILTER, which we used for experiments on profile updating and threshold setting. Our focus is on using incremental Rocchio for introducing new query terms and for term weighting. Although 1, 0.5, 0.25 is a widely used Rocchio ratio for query expansion based on relevance feedback, we found that the optimal setting for information filtering is corpus and profile dependent. In addition to a new Rocchio ratio, we tested a modified idf measure for term weighting (ydf) that is biased towards words with mid-range term frequency.
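
For readers unfamiliar with the Rocchio ratio mentioned in this abstract, the following is a minimal sketch of the standard Rocchio update with the 1, 0.5, 0.25 weights as defaults. It is the textbook formula, not YFILTER's code: the function name `rocchio_update`, the dict-based term vectors, and the clipping of negative weights are assumptions. The abstract's new ratio and the ydf measure are not specified there, so they are not shown.

```python
from collections import defaultdict


def rocchio_update(profile, relevant, non_relevant,
                   alpha=1.0, beta=0.5, gamma=0.25):
    """Return an updated profile vector (dict: term -> weight).

    profile      : current profile vector
    relevant     : list of term vectors for documents judged relevant
    non_relevant : list of term vectors for documents judged non-relevant
    alpha, beta, gamma : the Rocchio ratio (defaults 1, 0.5, 0.25)
    """
    updated = defaultdict(float)
    for term, weight in profile.items():
        updated[term] += alpha * weight
    # Add the relevant centroid and subtract the non-relevant centroid.
    for docs, coeff in ((relevant, beta), (non_relevant, -gamma)):
        if not docs:
            continue
        for doc in docs:
            for term, weight in doc.items():
                updated[term] += coeff * weight / len(docs)
    # Assumption: drop terms whose weight is not positive (a common convention).
    return {term: w for term, w in updated.items() if w > 0}


profile = rocchio_update(
    {"svm": 1.0},
    relevant=[{"svm": 1.0, "kernel": 2.0}],
    non_relevant=[{"spam": 4.0}],
)
# profile == {"svm": 1.5, "kernel": 1.0}
```

Calling the function again on its own output as each new batch of judgments arrives gives one simple incremental variant. Terms such as `kernel` enter the profile through the relevant centroid, which is how feedback introduces new query terms.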
