A Comparison of Text Categorization Methods Applied to N-Gram Frequency Statistics

Abstract

This paper gives an analysis of multi-class e-mail categorization performance, comparing a character n-gram document representation against a word-frequency-based representation. Furthermore, the impact of using available e-mail-specific meta-information on classification performance is explored and the findings are presented.
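To make the two document representations concrete, the following is a minimal sketch in Python. It is an illustration only, not the paper's implementation: the function names, the whitespace normalization, the tokenization pattern, and the default n-gram length of 3 are all assumptions chosen for this example.

```python
from collections import Counter
import re


def char_ngram_counts(text, n=3):
    """Count overlapping character n-grams (hypothetical helper).

    The text is lowercased and runs of whitespace are collapsed to a
    single space; both normalization steps are assumptions.
    """
    normalized = " ".join(text.lower().split())
    return Counter(
        normalized[i:i + n] for i in range(len(normalized) - n + 1)
    )


def word_counts(text):
    """Count lowercase word tokens (hypothetical helper).

    Tokens are runs of letters, digits, or apostrophes; this
    tokenization rule is an assumption.
    """
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


# Both representations of the same (made-up) e-mail subject line.
sample = "Meeting moved to Monday"
ngram_features = char_ngram_counts(sample, n=3)
word_features = word_counts(sample)
```

Either frequency table can then be turned into a feature vector for a classifier. The character n-gram variant needs no tokenizer and tolerates misspellings, at the cost of a larger feature space.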

Last time updated on 28/10/2017

This paper was published in CiteSeerX.
