Fast & Confident Probabilistic Categorization

By Cyril Goutte


We describe NRC's submission to the Anomaly Detection/Text Mining competition organised at the Text Mining Workshop 2007. This submission relies on a straightforward implementation of the probabilistic categoriser described in (Gaussier et al., ECIR'02). This categoriser is adapted to handle multi-label documents, and a piecewise-linear confidence estimation layer is added to estimate the confidence of the labelling. This technique achieves a score of 1.689 on the test data.
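
The pipeline described above (a probabilistic categoriser, a multi-label decision, then a piecewise-linear confidence layer) can be illustrated with a minimal Python sketch. It does not reproduce the Gaussier et al. (ECIR'02) model. A multinomial Naive Bayes classifier is substituted as a simpler probabilistic categoriser. The multi-label rule (keep the best category plus any category whose posterior reaches a threshold), the threshold value, the knot positions of the confidence mapping, the function names and the toy documents are all hypothetical assumptions chosen for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in categoriser: multinomial Naive Bayes with
# Laplace smoothing (NOT the Gaussier et al. hierarchical model).
def train(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: list of sets of category names.
    Assumption: a multi-label document counts once for each of its labels."""
    vocab = {w for d in docs for w in d}
    cats = sorted({c for ls in labels for c in ls})
    doc_counts = Counter()
    word_counts = {c: Counter() for c in cats}
    for doc, ls in zip(docs, labels):
        for c in ls:
            doc_counts[c] += 1
            word_counts[c].update(doc)
    n = sum(doc_counts.values())
    log_prior = {c: math.log(doc_counts[c] / n) for c in cats}
    log_lik = {}
    for c in cats:
        total = sum(word_counts[c].values())
        log_lik[c] = {w: math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))
                      for w in vocab}
    return cats, log_prior, log_lik

def posterior(model, doc):
    """P(c | doc) normalised over categories; out-of-vocabulary words are ignored."""
    cats, log_prior, log_lik = model
    scores = {c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in log_lik[c])
              for c in cats}
    top = max(scores.values())  # subtract the max log-score for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {c: math.exp(s - top) / z for c, s in scores.items()}

def assign_labels(post, threshold=0.3):
    """Hypothetical multi-label rule: best category plus any with posterior >= threshold."""
    best = max(post, key=post.get)
    return {c for c, p in post.items() if p >= threshold} | {best}

def piecewise_linear(x, knots):
    """Map a score to a confidence by linear interpolation between (x, y) knots
    sorted by strictly increasing x; values outside the knot range are clamped."""
    if x <= knots[0][0]:
        return knots[0][1]
    if x >= knots[-1][0]:
        return knots[-1][1]
    for (x0, y0), (x1, y1) in zip(knots, knots[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)

# Toy data and knot values are invented for illustration only.
docs = [["engine", "fuel", "leak"], ["engine", "vibration"],
        ["landing", "gear", "fault"], ["gear", "hydraulic", "leak"]]
labels = [{"engine"}, {"engine"}, {"gear"}, {"gear", "hydraulic"}]
KNOTS = [(0.0, 0.1), (0.5, 0.5), (1.0, 0.9)]

model = train(docs, labels)
post = posterior(model, ["gear", "hydraulic", "leak"])
for c in sorted(assign_labels(post)):
    # category, posterior, mapped confidence
    print(c, round(post[c], 3), round(piecewise_linear(post[c], KNOTS), 3))
```

In a real system the knots of such a mapping would presumably be fitted on held-out data rather than fixed by hand; the values used here are placeholders, and a piecewise-linear form is simply one cheap, monotone way to recalibrate raw posterior scores.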

Topics: Statistical Models, Computational Linguistics, Machine Learning
Year: 2007

Suggested articles


  1. (1998). A Comparison of Event Models for Naive Bayes Text Classification.
  2. (2002). A hierarchical model for clustering and categorising documents.
  3. (1997). A Probabilistic Approach to Confidence Estimation and Evaluation.
  4. (2005). A Probabilistic Interpretation of Precision, Recall and F-Score, with Implication for Evaluation.
  5. (2006). Confidence Estimation for NLP Applications.
  6. (1977). Maximum likelihood from incomplete data via the EM algorithm.
  7. (2004). Method for multi-class, multi-label categorization using probabilistic hierarchical modeling.
  8. (1999). Multi-Label Text Classification with a Mixture Model Trained by EM.
  9. (1999). Probabilistic latent semantic analysis.
  10. (1998). Text Categorization with Support Vector Machines: Learning with Many Relevant Features.
