70,249 research outputs found

    Efficient algorithms for decision tree cross-validation

    Full text link
    Cross-validation is a useful and generally applicable technique often employed in machine learning, including decision tree induction. An important disadvantage of straightforward implementation of the technique is its computational overhead. In this paper we show that, for decision trees, the computational overhead of cross-validation can be reduced significantly by integrating the cross-validation with the normal decision tree induction process. We discuss how existing decision tree algorithms can be adapted to this aim, and provide an analysis of the speedups these adaptations may yield. The analysis is supported by experimental results.Comment: 9 pages, 6 figures. http://www.cs.kuleuven.ac.be/cgi-bin-dtai/publ_info.pl?id=3478

    Growing Story Forest Online from Massive Breaking News

    Full text link
    We describe our experience of implementing a news content organization system at Tencent that discovers events from vast streams of breaking news and evolves news story structures in an online fashion. Our real-world system has distinct requirements in contrast to previous studies on topic detection and tracking (TDT) and event timeline or graph generation, in that we 1) need to accurately and quickly extract distinguishable events from massive streams of long text documents that cover diverse topics and contain highly redundant information, and 2) must develop the structures of event stories in an online manner, without repeatedly restructuring previously formed stories, in order to guarantee a consistent user viewing experience. In solving these challenges, we propose Story Forest, a set of online schemes that automatically clusters streaming documents into events, while connecting related events in growing trees to tell evolving stories. We conducted extensive evaluation based on 60 GB of real-world Chinese news data, although our ideas are not language-dependent and can easily be extended to other languages, through detailed pilot user experience studies. The results demonstrate the superior capability of Story Forest to accurately identify events and organize news text into a logical structure that is appealing to human readers, compared to multiple existing algorithm frameworks.Comment: Accepted by CIKM 2017, 9 page
    • …
    corecore