
Using Scatterplots to Understand and Improve Probabilistic Models for Text Categorization and Retrieval

Author: Giorgio Maria Di Nunzio
Publication venue: 'Elsevier BV'
Publication date: 01/01/2009

The two-dimensional representation of documents, which places each document on a two-dimensional Cartesian plane, has proved to be a valid visualization tool for Automated Text Categorization (ATC). It helps to understand the relationships between categories of textual documents, and it helps users to visually audit the classifier and identify suspicious training data. In this paper, we analyze a specific use of this visualization approach in the case of the Naïve Bayes (NB) model for text classification and the Binary Independence Model (BIM) for text retrieval. For text categorization, the classification decision equation has to be rewritten so that each coordinate of a document is the sum of two addends: a variable component, $\mathrm{P}(d \mid c_i)$, and a constant component, $\mathrm{P}(c_i)$, the prior of the category. When documents are plotted on the Cartesian plane according to this formulation, they appear shifted by a constant amount along the x-axis and the y-axis. This shifting effect is more or less evident depending on which NB model, Bernoulli or multinomial, is chosen. For text retrieval, the same reformulation can be applied to the BIM. The visualization helps to understand the decisions taken to rank the documents, in particular in the case of relevance feedback.
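A minimal sketch of the categorization half of this idea follows. The abstract gives no code, so the language, the toy training set and all names (`coordinates`, `bernoulli_parts`, `multinomial_loglik`, `log_prior`) are hypothetical. It assumes, as is usual for NB, that the two addends are taken in log space. Under that assumption a document d maps to x = log P(d|c_i) + log P(c_i) and y = log P(d|~c_i) + log P(~c_i), and it is assigned to c_i when x > y. For the Bernoulli model the sketch also separates the document-independent term, the sum over the vocabulary of log(1 - p_t), from the document-dependent one. That split is one plausible reading of why the shift is more evident for Bernoulli than for multinomial. It is an assumption, not something the abstract states.

```python
import math

# Hypothetical toy training set: (tokens, True if the document belongs to c_i,
# False if it belongs to the complement category ~c_i).
TRAIN = [
    (["ball", "goal", "team"], True),
    (["goal", "match", "team"], True),
    (["stock", "market", "price"], False),
    (["price", "market", "goal"], False),
    (["bank", "stock", "rate"], False),
]
VOCAB = sorted({t for doc, _ in TRAIN for t in doc})


def log_prior(label):
    """Constant addend log P(c): share of training documents with this label."""
    n = sum(1 for _, y in TRAIN if y == label)
    return math.log(n / len(TRAIN))


def multinomial_loglik(doc, label):
    """Variable addend log P(d|c) for the multinomial model, Laplace-smoothed.
    The multinomial coefficient is omitted: it is identical for both categories."""
    counts = {t: 0 for t in VOCAB}
    for d, y in TRAIN:
        if y == label:
            for t in d:
                counts[t] += 1
    total = sum(counts.values())
    return sum(math.log((counts[t] + 1) / (total + len(VOCAB)))
               for t in doc if t in counts)


def bernoulli_parts(doc, label):
    """Split log P(d|c) for the Bernoulli model into a part that depends on d
    (sum over terms present in d of log(p_t / (1 - p_t))) and a part that does
    not (sum over the whole vocabulary of log(1 - p_t))."""
    docs = [set(d) for d, y in TRAIN if y == label]
    present = set(doc)
    variable, constant = 0.0, 0.0
    for t in VOCAB:
        p_t = (sum(t in d for d in docs) + 1) / (len(docs) + 2)  # smoothed P(t|c)
        constant += math.log(1 - p_t)
        if t in present:
            variable += math.log(p_t / (1 - p_t))
    return variable, constant


def coordinates(doc, model):
    """Return (x, y) = (log P(d|c_i) + log P(c_i), log P(d|~c_i) + log P(~c_i))."""
    if model == "multinomial":
        x = multinomial_loglik(doc, True) + log_prior(True)
        y = multinomial_loglik(doc, False) + log_prior(False)
    elif model == "bernoulli":
        x = sum(bernoulli_parts(doc, True)) + log_prior(True)
        y = sum(bernoulli_parts(doc, False)) + log_prior(False)
    else:
        raise ValueError(f"unknown model: {model}")
    return x, y


if __name__ == "__main__":
    for doc in (["goal", "team"], ["stock", "price"]):
        for model in ("multinomial", "bernoulli"):
            x, y = coordinates(doc, model)
            # Decision: assign d to c_i when the point lies below the bisector y = x.
            print(model, doc, round(x, 3), round(y, 3), "c_i" if x > y else "not c_i")
```

In this sketch, log P(c) and, for Bernoulli only, the vocabulary-wide sum of log(1 - p_t) are the same for every document. Together they move the whole cloud of points by a fixed amount along each axis without changing its shape, whereas the multinomial version has only the prior as such a term.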

Elsevier - Publisher Connector

Archivio istituzionale della ricerca - Università di Padova

Using scatterplots to understand and improve probabilistic models for text categorization and retrieval

Author: Damerau, Di Nunzio, Ferreira de Oliveira, Fuhr, Giorgio Maria Di Nunzio, Hearst, Keim, Kohonen, Middleton, Mladenič, Rifkin, Robertson, Sebastiani, Van Rijsbergen, Zang
Publication venue: 'Elsevier BV'

Crossref