12,251 research outputs found
Summarizing Dialogic Arguments from Social Media
Online argumentative dialog is a rich source of information on popular
beliefs and opinions that could be useful to companies as well as governmental
or public policy agencies. Compact, easy to read, summaries of these dialogues
would thus be highly valuable. A priori, it is not even clear what form such a
summary should take. Previous work on summarization has primarily focused on
summarizing written texts, where the notion of an abstract of the text is well
defined. We collect gold standard training data consisting of five human
summaries for each of 161 dialogues on the topics of Gay Marriage, Gun Control
and Abortion. We present several different computational models aimed at
identifying segments of the dialogues whose content should be used for the
summary, using linguistic features and Word2vec features with both SVMs and
Bidirectional LSTMs. We show that we can identify the most important arguments
by using the dialog context with a best F-measure of 0.74 for gun control, 0.71
for gay marriage, and 0.67 for abortion.Comment: Proceedings of the 21th Workshop on the Semantics and Pragmatics of
Dialogue (SemDial 2017
Text Summarization Techniques: A Brief Survey
In recent years, there has been a explosion in the amount of text data from a
variety of sources. This volume of text is an invaluable source of information
and knowledge which needs to be effectively summarized to be useful. In this
review, the main approaches to automatic text summarization are described. We
review the different processes for summarization and describe the effectiveness
and shortcomings of the different methods.Comment: Some of references format have update
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Most modern Issue Tracking Systems (ITSs) for open source software (OSS)
projects allow users to add comments to issues. Over time, these comments
accumulate into discussion threads embedded with rich information about the
software project, which can potentially satisfy the diverse needs of OSS
stakeholders. However, discovering and retrieving relevant information from the
discussion threads is a challenging task, especially when the discussions are
lengthy and the number of issues in ITSs are vast. In this paper, we address
this challenge by identifying the information types presented in OSS issue
discussions. Through qualitative content analysis of 15 complex issue threads
across three projects hosted on GitHub, we uncovered 16 information types and
created a labeled corpus containing 4656 sentences. Our investigation of
supervised, automated classification techniques indicated that, when prior
knowledge about the issue is available, Random Forest can effectively detect
most sentence types using conversational features such as the sentence length
and its position. When classifying sentences from new issues, Logistic
Regression can yield satisfactory performance using textual features for
certain information types, while falling short on others. Our work represents a
nontrivial first step towards tools and techniques for identifying and
obtaining the rich information recorded in the ITSs to support various software
engineering activities and to satisfy the diverse needs of OSS stakeholders.Comment: 41st ACM/IEEE International Conference on Software Engineering
(ICSE2019
- …