340 research outputs found
Combining Visual Layout and Lexical Cohesion Features for Text Segmentation
We propose integrating features from lexical cohesion with elements from layout recognition to build a composite framework. We use supervised machine learning on this composite feature set to derive discourse structure on the topic level. We demonstrate a system based on this principle and use both an intrinsic evaluation as well as the task of genre classification to assess its performance
#mytweet via Instagram: Exploring User Behaviour across Multiple Social Networks
We study how users of multiple online social networks (OSNs) employ and share
information by studying a common user pool that use six OSNs - Flickr, Google+,
Instagram, Tumblr, Twitter, and YouTube. We analyze the temporal and topical
signature of users' sharing behaviour, showing how they exhibit distinct
behaviorial patterns on different networks. We also examine cross-sharing
(i.e., the act of user broadcasting their activity to multiple OSNs
near-simultaneously), a previously-unstudied behaviour and demonstrate how
certain OSNs play the roles of originating source and destination sinks.Comment: IEEE/ACM International Conference on Advances in Social Networks
Analysis and Mining, 2015. This is the pre-peer reviewed version and the
final version is available at
http://wing.comp.nus.edu.sg/publications/2015/lim-et-al-15.pd
A PDTB-Styled End-to-End Discourse Parser
We have developed a full discourse parser in the Penn Discourse Treebank
(PDTB) style. Our trained parser first identifies all discourse and
non-discourse relations, locates and labels their arguments, and then
classifies their relation types. When appropriate, the attribution spans to
these relations are also determined. We present a comprehensive evaluation from
both component-wise and error-cascading perspectives.Comment: 15 pages, 5 figures, 7 table
Resources for Evaluation of Summarization Techniques
We report on two corpora to be used in the evaluation of component systems
for the tasks of (1) linear segmentation of text and (2) summary-directed
sentence extraction. We present characteristics of the corpora, methods used in
the collection of user judgments, and an overview of the application of the
corpora to evaluating the component system. Finally, we discuss the problems
and issues with construction of the test set which apply broadly to the
construction of evaluation resources for language technologies.Comment: LaTeX source, 5 pages, US Letter, uses lrec98.st
- …
