Concept based retrieval and information filtering

Abstract

Information filtering has become an important component of modern information systems owing to the significant increase in its applications. The major goals of a successful information filtering system include closing the representation gap between documents and user profiles, and classifying/categorizing documents efficiently as they arrive in the system. In this thesis, we develop a conceptual query model that is close to the user's need and investigate an information filtering method for text classification/categorization. The central idea of our conceptual model is captured in a rule-based query model. The proposed approach preprocesses the rule base to generate Minimal Term Sets (MTSs) that speed up the retrieval process. Furthermore, we extend our model in two directions for the case where document terms are non-binary. First, we incorporate the p-Norm model into the process of evaluating MTSs. Second, we adopt the Generalized Vector Space Model (GVSM), in which the term-term association is well established. For text classification/categorization, we investigate a steepest descent induction algorithm combined with a two-level preference relation on user ranking. The performance of the proposed algorithm is evaluated experimentally using the Reuters-21578 data collection. Finally, we demonstrate the effectiveness of the proposed method by comparing its experimental results with those of other inductive methods.
