2 research outputs found

A Generative Model for Statistical Determination of Information Content from Conversation Threads

Authors: Malik Magdon-Ismail, William A. Wallace, Yingjie Zhou
Publication date: 31/03/2010

Abstract. We present a generative model for determining the information content of a message without analyzing the message content. Such a tool is useful for automated analysis of the vast contents of online communication, which are extensively contaminated by uninformative content, spam, and broadcast. Content analysis is not feasible in such a setting. We propose a purely statistical methodology to determine the information value of a message, which we denote the Information Content Factor (ICF). Underlying our methodology is the definition of information in a message as the message’s ability to generate conversation. The generative nature of our model allows us to estimate the ICF of a message without prior information on the participants. We test our approach by applying it to separating spam/broadcast messages from non-spam/non-broadcast messages. Our algorithms achieve 94% accuracy when tested against a human classifier that analyzed content.

CiteSeerX

A Generative Model for Statistical Determination of Information Content from Conversation Threads

Authors: Malik Magdon-Ismail, Mark Goldberg, Yingjie Zhou
Publication date: 31/03/2010

Abstract. We present a generative model for determining the information content of a message without analyzing the message content. Such a tool is useful for automated analysis of the vast contents of online communication, which are extensively contaminated by uninformative content, spam, and broadcast. Content analysis is not feasible in such a setting. We propose a purely statistical methodology to determine the information value of a message, which we denote the Information Content Factor (ICF). Underlying our methodology is the definition of information in a message as the message’s ability to generate conversation. The generative nature of our model allows us to estimate the ICF of a message without prior information on the participants. We test our approach by applying it to separating spam/broadcast messages from non-spam/non-broadcast messages. Our algorithms achieve 94% accuracy when tested against a human classifier that analyzed content.

CiteSeerX