
2.1 Automatic Text Processing Tasks

By Mikaela Keller and Samy Bengio (Martigny, Valais, Switzerland)

Abstract

In this report we address the problem of formally representing textual data. First, the problem is placed in the context of automatic text processing. Then the weaknesses of the basic document representation, i.e. the bag-of-words representation, are explained, and some state-of-the-art methods that claim to overcome these weaknesses are reviewed. We also propose a novel graphical model, the Theme Topic Mixture Model, which likewise claims to overcome them and in addition provides a probabilistic framework in which documents are considered.

Report: IDIAP–RR 03-7

Year: 2003
OAI identifier: oai:CiteSeerX.psu:10.1.1.370.9981
Provided by: CiteSeerX
Full text may be available at the following location(s):
  • http://citeseerx.ist.psu.edu/v...
  • http://www.idiap.ch/ftp/report...