
Fast & Confident Probabilistic Categorization

By Cyril Goutte

Abstract

We describe NRC's submission to the Anomaly Detection/Text Mining competition organised at the Text Mining Workshop 2007. This submission relies on a straightforward implementation of the probabilistic categoriser described in (Gaussier et al., ECIR'02). The categoriser is adapted to handle multiple labelling, and a piecewise-linear confidence estimation layer is added to estimate the labelling confidence. This technique achieves a score of 1.689 on the test data.

Topics: Statistical Models, Computational Linguistics, Machine Learning
Year: 2007
OAI identifier: oai:cogprints.org:5626
Full text:
  • http://cogprints.org/5626/1/go... (external link)
  • http://cogprints.org/5626/ (external link)

    Citations

    1. (1998). A Comparison of Event Models for Naive Bayes Text Classification.
    2. (2002). A hierarchical model for clustering and categorising documents.
    3. (1997). A Probabilistic Approach to Confidence Estimation and Evaluation.
    4. (2005). A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.
    5. (2006). Confidence Estimation for NLP Applications.
    6. (1977). Maximum likelihood from incomplete data via the EM algorithm.
    7. (2004). Method for multi-class, multi-label categorization using probabilistic hierarchical modeling.
    8. (1999). Multi-Label Text Classification with a Mixture Model Trained by EM.
    9. (1999). Probabilistic latent semantic analysis.
    10. (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features.
