Genre identification and goal-focused summarization
In this paper, we present a novel technique, genre-oriented goal-focused summarization, which first identifies a document’s genre and then uses that genre to produce summaries tailored to a user’s information-seeking needs, such as a plot or opinion summary of a movie review. We create a test corpus to determine genre classification accuracy for 16 genres, and examine the performance of three machine learning algorithms (Random Forests, SVM light, and Naïve Bayes) on various amounts of training data. Results show that Random Forests outperforms SVM light and Naïve Bayes. The genre tag is used to inform a downstream summarization engine. We define types of summaries for 7 genres, create a ground truth corpus, and analyze the results of genre-oriented goal-focused summarization, showing that this type of user-based summarization requires different algorithms than the leading-sentence baseline, which is known to perform well for news articles.
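The two-stage idea in the abstract (a genre tag selecting a goal-specific summarizer, with the leading-sentence baseline as the fallback) can be illustrated with a minimal sketch. The abstract does not describe the paper's summarization algorithms, so everything here is an assumption for illustration only: the function names, the `movie_review` and `opinion` labels, the cue-word list, and the naive sentence splitter are hypothetical, and the genre tag is assumed to come from an upstream classifier that is not shown.

```python
import re
from typing import Callable, Dict, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def lead_summary(text: str, n: int = 2) -> str:
    """Leading-sentence baseline: keep the first n sentences."""
    return " ".join(split_sentences(text)[:n])


# Hypothetical cue words; not taken from the paper.
OPINION_CUES = {"great", "terrible", "loved", "hated", "boring", "brilliant"}


def opinion_summary(text: str, n: int = 2) -> str:
    """Illustrative opinion summarizer: keep the n sentences with the most
    cue words, emitted in their original document order."""
    sentences = split_sentences(text)

    def score(sentence: str) -> int:
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        return len(words & OPINION_CUES)

    # Rank by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)),
                    key=lambda i: (-score(sentences[i]), i))
    keep = sorted(ranked[:n])
    return " ".join(sentences[i] for i in keep)


# (genre, goal) -> summarizer. Unknown pairs fall back to the lead baseline.
SUMMARIZERS: Dict[Tuple[str, str], Callable[[str, int], str]] = {
    ("movie_review", "opinion"): opinion_summary,
}


def summarize(text: str, genre: str, goal: str, n: int = 2) -> str:
    """Dispatch on the genre tag (assumed to be produced upstream)."""
    summarizer = SUMMARIZERS.get((genre, goal), lead_summary)
    return summarizer(text, n)


review = ("The film opens in Paris. A detective chases a thief. "
          "I loved the brilliant ending. The middle was boring.")
print(summarize(review, "movie_review", "opinion"))
print(summarize(review, "news", "overview"))
```

On this toy review the baseline returns the two opening plot sentences, while the opinion summarizer returns the two evaluative sentences. This is the kind of mismatch between leading sentences and a user's goal that the abstract points to.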