6 research outputs found
Growing Story Forest Online from Massive Breaking News
We describe our experience of implementing a news content organization system
at Tencent that discovers events from vast streams of breaking news and evolves
news story structures in an online fashion. Our real-world system has distinct
requirements in contrast to previous studies on topic detection and tracking
(TDT) and event timeline or graph generation, in that we 1) need to accurately
and quickly extract distinguishable events from massive streams of long text
documents that cover diverse topics and contain highly redundant information,
and 2) must develop the structures of event stories in an online manner,
without repeatedly restructuring previously formed stories, in order to
guarantee a consistent user viewing experience. In solving these challenges, we
propose Story Forest, a set of online schemes that automatically clusters
streaming documents into events, while connecting related events in growing
trees to tell evolving stories. We conducted extensive evaluation based on 60
GB of real-world Chinese news data, although our ideas are not
language-dependent and can easily be extended to other languages, through
detailed pilot user experience studies. The results demonstrate the superior
capability of Story Forest to accurately identify events and organize news text
into a logical structure that is appealing to human readers, compared to
multiple existing algorithm frameworks.Comment: Accepted by CIKM 2017, 9 page