New Alignment Methods for Discriminative Book Summarization
We consider the unsupervised alignment of the full text of a book with a
human-written summary. This presents challenges not seen in other text
alignment problems, including a disparity in length and, consequent to this, a
violation of the expectation that individual words and phrases should align,
since large passages and chapters can be distilled into a single summary
phrase. We present two new methods, based on hidden Markov models, specifically
targeted to this problem, and demonstrate gains on an extractive book
summarization task. While there is still much room for improvement,
unsupervised alignment holds intrinsic value in offering insight into what
features of a book are deemed worthy of summarization.
Comment: This paper reflects work in progress.
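The abstract does not spell out the two HMM variants. As a rough illustration of why an HMM suits this many-to-one setting, the Python sketch below treats each summary sentence as an observation emitted by a hidden book passage and decodes a monotone alignment with the Viterbi algorithm. This is a generic example, not the paper's method. The unigram emission model with add-alpha smoothing, the geometric jump penalty, the pre-segmented passages, and all function names (`tokenize`, `emission_logprob`, `transition_score`, `viterbi_align`) are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercased word tokens; a deliberately simple stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def emission_logprob(sent_tokens, passage_counts, passage_len, vocab_size, alpha=0.1):
    """Log-probability of a summary sentence under an add-alpha smoothed
    unigram model of one book passage (an assumed emission model)."""
    denom = passage_len + alpha * vocab_size
    return sum(math.log((passage_counts[w] + alpha) / denom) for w in sent_tokens)


def transition_score(prev, nxt, decay=0.5):
    """Unnormalized monotone transition score: stay or jump forward, with a
    log-penalty linear in jump length; backward moves are forbidden."""
    if nxt < prev:
        return -math.inf
    return (nxt - prev) * math.log(decay)


def viterbi_align(passages, summary_sentences, decay=0.5, alpha=0.1):
    """Map each summary sentence to a book-passage index, keeping order monotone."""
    if not passages or not summary_sentences:
        return []
    p_tokens = [tokenize(p) for p in passages]
    p_counts = [Counter(t) for t in p_tokens]
    s_tokens = [tokenize(s) for s in summary_sentences]
    vocab_size = len({w for toks in p_tokens + s_tokens for w in toks})
    n, m = len(passages), len(summary_sentences)

    def emit(i, j):
        return emission_logprob(s_tokens[i], p_counts[j], len(p_tokens[j]), vocab_size, alpha)

    # score[i][j]: best log-score for sentences 0..i with sentence i on passage j
    score = [[-math.inf] * n for _ in range(m)]
    back = [[0] * n for _ in range(m)]
    for j in range(n):
        score[0][j] = emit(0, j)  # uniform start distribution; its constant is dropped
    for i in range(1, m):
        for j in range(n):
            # only predecessors k <= j are reachable under monotone transitions
            best_k = max(range(j + 1),
                         key=lambda k: score[i - 1][k] + transition_score(k, j, decay))
            score[i][j] = score[i - 1][best_k] + transition_score(best_k, j, decay) + emit(i, j)
            back[i][j] = best_k

    # trace back from the highest-scoring final state
    j = max(range(n), key=lambda k: score[m - 1][k])
    path = [j]
    for i in range(m - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


book = [
    "The storm wrecked the ship and the sailor swam to the island.",
    "On the island he built a hut and hunted goats for food.",
    "Years later a passing ship rescued the sailor and took him home.",
]
summary = [
    "A storm wrecked the ship.",
    "Years later he was rescued and taken home.",
]
print(viterbi_align(book, summary))  # → [0, 2]
```

In this toy run the middle passage receives no summary sentence, a small instance of the length disparity described above: a summary can skip or compress whole stretches of the book. The monotone transitions encode the assumption that summaries roughly follow the book's order; a real system would need to relax this if summaries reorder events.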
Social Meme-ing: Measuring Linguistic Variation in Memes
Much work in the space of NLP has used computational methods to explore
sociolinguistic variation in text. In this paper, we argue that memes, as
multimodal forms of language composed of visual templates and text, also
exhibit meaningful social variation. We construct a computational pipeline to
cluster individual instances of memes into templates and semantic variables,
taking advantage of their multimodal structure in doing so. We apply this
method to a large collection of meme images from Reddit and make available the
resulting SemanticMemes dataset of 3.8M images clustered by their
semantic function. We use these clusters to analyze linguistic variation in
memes, discovering not only that socially meaningful variation in meme usage
exists between subreddits, but also that patterns of meme innovation and
acculturation within these communities align with previous findings on written
language.