
    Experimental Assessment of a Threshold Selection Algorithm for Tuning Classifiers in the Field of Hierarchical Text Categorization

    Text categorization is the task of assigning predefined categories to text documents. It can provide conceptual views of document collections and has many important real-world applications. Nowadays, most research on text categorization has focused on mapping text documents to a set of categories among which structural relationships hold. Without loss of generality, let us assume that a classifier entrusted with recognizing documents of a given category outputs a degree of membership, usually a value in [0,1]. The behavior of any such classifier typically depends on an acceptance threshold, which turns the degree of membership into a dichotomous decision. In principle, finding the best acceptance thresholds for a set of classifiers related by taxonomic relationships is a hard problem. Hence, any proposal aimed at finding suboptimal solutions to this problem may be of great importance, especially in the field of hierarchical text categorization. In this paper, we make an experimental assessment of a greedy threshold selection algorithm aimed at finding a suboptimal combination of thresholds in a hierarchical text categorization setting. Thanks to its quadratic complexity, the algorithm can find good suboptimal solutions even for large taxonomies. Experiments performed on Reuters data collections show that the proposed approach finds suboptimal solutions at a low computational cost.
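    The abstract does not spell out the search procedure or the evaluation measure, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It shows (a) how per-category acceptance thresholds turn degrees of membership into yes/no decisions in a taxonomy, assuming a document reaches a category only if every ancestor's classifier also accepts it, and (b) a hypothetical coordinate-wise greedy search that visits categories top-down and, for each one, keeps the candidate threshold that most improves micro-averaged F1 while all other thresholds stay fixed. The names (`accepted`, `micro_f1`, `greedy_thresholds`), the candidate grid, the initial value of 0.5, and the toy data are all invented for illustration. This sketch rescans every document on each evaluation, so its cost says nothing about the quadratic bound claimed for the paper's algorithm.

```python
from typing import Dict, List, Optional, Sequence, Set

# Hypothetical taxonomy encoding: category -> parent category (None = top level).
Parents = Dict[str, Optional[str]]


def accepted(cat: str, scores: Dict[str, float],
             thr: Dict[str, float], parents: Parents) -> bool:
    """Assumed hierarchical rule: a document is assigned to `cat` only if the
    classifiers for `cat` and for every ancestor all accept it, i.e. each
    degree of membership reaches that category's acceptance threshold."""
    node: Optional[str] = cat
    while node is not None:
        if scores[node] < thr[node]:
            return False
        node = parents[node]
    return True


def micro_f1(thr: Dict[str, float], scores: List[Dict[str, float]],
             labels: List[Set[str]], parents: Parents) -> float:
    """Micro-averaged F1 over all (document, category) decisions."""
    tp = fp = fn = 0
    for doc_scores, gold in zip(scores, labels):
        for cat in parents:
            pred = accepted(cat, doc_scores, thr, parents)
            truth = cat in gold
            tp += int(pred and truth)
            fp += int(pred and not truth)
            fn += int(truth and not pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def depth(cat: str, parents: Parents) -> int:
    """Number of edges between `cat` and the top level of the taxonomy."""
    d = 0
    while parents[cat] is not None:
        cat = parents[cat]
        d += 1
    return d


def greedy_thresholds(scores: List[Dict[str, float]], labels: List[Set[str]],
                      parents: Parents, candidates: Sequence[float],
                      init: float = 0.5, sweeps: int = 2) -> Dict[str, float]:
    """Illustrative coordinate-wise greedy search (not the paper's method):
    tune one category's threshold at a time, top-down, others held fixed."""
    thr = {cat: init for cat in parents}
    order = sorted(parents, key=lambda c: depth(c, parents))
    for _ in range(sweeps):
        for cat in order:
            best_t, best_f = thr[cat], micro_f1(thr, scores, labels, parents)
            for t in candidates:
                thr[cat] = t
                f = micro_f1(thr, scores, labels, parents)
                if f > best_f:  # strict: ties keep the current best threshold
                    best_t, best_f = t, f
            thr[cat] = best_t
    return thr


# Toy data (invented for illustration): degrees of membership and gold labels.
PARENTS: Parents = {"econ": None, "econ.trade": "econ", "sport": None}
SCORES = [
    {"econ": 0.9, "econ.trade": 0.8, "sport": 0.1},
    {"econ": 0.7, "econ.trade": 0.3, "sport": 0.2},
    {"econ": 0.2, "econ.trade": 0.6, "sport": 0.85},
    {"econ": 0.4, "econ.trade": 0.1, "sport": 0.45},
]
LABELS = [{"econ", "econ.trade"}, {"econ"}, {"sport"}, {"sport"}]

thresholds = greedy_thresholds(SCORES, LABELS, PARENTS, candidates=[0.25, 0.5, 0.75])
print(thresholds)  # {'econ': 0.5, 'econ.trade': 0.5, 'sport': 0.25}
print(micro_f1(thresholds, SCORES, LABELS, PARENTS))  # 1.0
```

    Because each threshold is tuned while the others stay fixed, this kind of search never evaluates joint changes to several thresholds. That is what keeps it cheap, and also why it can stop at a suboptimal combination.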