Automatic keyword extraction for the meeting corpus using supervised approach and bigram expansion

By Fei Liu, Feifan Liu and Yang Liu

Abstract

In this paper, we tackle the problem of automatic keyword extraction in the meeting domain, a genre significantly different from written text. For the supervised framework, we propose a rich set of features beyond the typical TFIDF measures, such as sentence salience weight, lexical features, summary sentences, and speaker information. We also evaluate different candidate sampling approaches for better model training and testing. In addition, we introduce a bigram expansion module that aims to extract “entity bigrams” using Web resources. Using the ICSI meeting corpus, we demonstrate the effectiveness of the features and show that the supervised method and the bigram expansion module outperform the unsupervised TFIDF selection with POS (part-of-speech) filtering. Finally, we show that the approaches introduced in this paper also perform well on speech recognition output.
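
For readers unfamiliar with the unsupervised baseline the abstract mentions, the following is a minimal sketch of TFIDF-based keyword selection. It is not the authors' implementation: the function names, the toy transcripts, and the small stopword list (used here as a crude stand-in for the POS filtering described in the abstract) are all assumptions made for illustration.

```python
import math
from collections import Counter

# Assumption: a tiny stopword list stands in for real POS filtering,
# which would need a part-of-speech tagger.
STOPWORDS = {"the", "a", "an", "we", "to", "of", "and", "is", "in", "on", "for", "it", "that"}


def tokenize(text):
    """Lowercase, split on whitespace, keep alphabetic non-stopword tokens."""
    return [w for w in text.lower().split() if w.isalpha() and w not in STOPWORDS]


def tfidf_keywords(docs, doc_index, top_n=3):
    """Rank the words of docs[doc_index] by term frequency * log(N / document frequency)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents each word appears in.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    # Term frequency within the target document.
    tf = Counter(tokenized[doc_index])
    scores = {w: count * math.log(n_docs / df[w]) for w, count in tf.items()}

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Hypothetical toy "meeting transcripts", not taken from the ICSI corpus.
meetings = [
    "we need to discuss the recognizer and the recognizer training data",
    "the budget meeting is on friday and the budget is tight",
    "training data for the parser is ready",
]

print(tfidf_keywords(meetings, 0, top_n=2))
```

Words that occur often in one transcript but rarely in the others score highest. The supervised method described in the abstract would use such a score as only one feature among several.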

Topics: meeting transcripts, TFIDF
Year: 2008
OAI identifier: oai:CiteSeerX.psu:10.1.1.417.6330
Provided by: CiteSeerX