Live blogs are an increasingly popular format for covering breaking news and
live events in online journalism. Online news websites around the world
use this medium to give their readers minute-by-minute updates on an event.
Good summaries enhance the value of live blogs for readers but are often
not available. In this paper, we study a way of collecting corpora for
automatic live blog summarization. In an empirical evaluation using well-known
state-of-the-art summarization systems, we show that the live blog corpus poses
new challenges in the field of summarization. We make our tools for
reconstructing the corpus publicly available to encourage the research
community and to make our results replicable.

Comment: To appear in the Proceedings of LREC 201