340 research outputs found

    Combining Visual Layout and Lexical Cohesion Features for Text Segmentation

    We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive discourse structure at the topic level. We demonstrate a system based on this principle and use both an intrinsic evaluation and the task of genre classification to assess its performance.

    #mytweet via Instagram: Exploring User Behaviour across Multiple Social Networks

    We study how users of multiple online social networks (OSNs) employ and share information by studying a common user pool that uses six OSNs: Flickr, Google+, Instagram, Tumblr, Twitter, and YouTube. We analyze the temporal and topical signature of users' sharing behaviour, showing how they exhibit distinct behavioural patterns on different networks. We also examine cross-sharing (i.e., the act of users broadcasting their activity to multiple OSNs near-simultaneously), a previously unstudied behaviour, and demonstrate how certain OSNs play the roles of originating source and destination sinks. Comment: IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2015. This is the pre-peer-reviewed version and the final version is available at http://wing.comp.nus.edu.sg/publications/2015/lim-et-al-15.pd

    A PDTB-Styled End-to-End Discourse Parser

    We have developed a full discourse parser in the Penn Discourse Treebank (PDTB) style. Our trained parser first identifies all discourse and non-discourse relations, locates and labels their arguments, and then classifies their relation types. When appropriate, the attribution spans to these relations are also determined. We present a comprehensive evaluation from both component-wise and error-cascading perspectives. Comment: 15 pages, 5 figures, 7 table

    Resources for Evaluation of Summarization Techniques

    We report on two corpora to be used in the evaluation of component systems for the tasks of (1) linear segmentation of text and (2) summary-directed sentence extraction. We present characteristics of the corpora, methods used in the collection of user judgments, and an overview of the application of the corpora to evaluating the component system. Finally, we discuss the problems and issues with the construction of the test set, which apply broadly to the construction of evaluation resources for language technologies. Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st