Search CORE

3,059 research outputs found

Analysis and Detection of Information Types of Open Source Software Issue Discussions

Author: Arya Deeksha
Cheng Jinghui
Guo Jin L. C.
Wang Wenting
Publication venue
Publication date: 01/01/2019
Field of study

Most modern Issue Tracking Systems (ITSs) for open source software (OSS) projects allow users to add comments to issues. Over time, these comments accumulate into discussion threads embedded with rich information about the software project, which can potentially satisfy the diverse needs of OSS stakeholders. However, discovering and retrieving relevant information from the discussion threads is a challenging task, especially when the discussions are lengthy and the number of issues in ITSs are vast. In this paper, we address this challenge by identifying the information types presented in OSS issue discussions. Through qualitative content analysis of 15 complex issue threads across three projects hosted on GitHub, we uncovered 16 information types and created a labeled corpus containing 4656 sentences. Our investigation of supervised, automated classification techniques indicated that, when prior knowledge about the issue is available, Random Forest can effectively detect most sentence types using conversational features such as the sentence length and its position. When classifying sentences from new issues, Logistic Regression can yield satisfactory performance using textual features for certain information types, while falling short on others. Our work represents a nontrivial first step towards tools and techniques for identifying and obtaining the rich information recorded in the ITSs to support various software engineering activities and to satisfy the diverse needs of OSS stakeholders.Comment: 41st ACM/IEEE International Conference on Software Engineering (ICSE2019

arXiv.org e-Print Archive

Crossref

PolyPublie

Joint Modeling of Content and Discourse Relations in Dialogues

Author: Kim Joseph
Qin Kechen
Wang Lu
Publication venue
Publication date: 01/01/2017
Field of study

We present a joint modeling approach to identify salient discussion points in spoken meetings as well as to label the discourse relations between speaker turns. A variation of our model is also discussed when discourse relations are treated as latent variables. Experimental results on two popular meeting corpora show that our joint model can outperform state-of-the-art approaches for both phrase-based content selection and discourse relation prediction tasks. We also evaluate our model on predicting the consistency among team members' understanding of their group decisions. Classifiers trained with features constructed from our model achieve significant better predictive performance than the state-of-the-art.Comment: Accepted by ACL 2017. 11 page

arXiv.org e-Print Archive

Crossref

Aspect term extraction for sentiment analysis in large movie reviews using Gini Index feature selection method and SVM classifier

Author: Asha S.Manek.
Chandra Mohan M.
Deepa Shenoy P.
Venugopal K.R.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 04/02/2016
Field of study

With the rapid development of the World Wide Web, electronic word-of-mouth interaction has made consumers active participants. Nowadays, a large number of reviews posted by the consumers on the Web provide valuable information to other consumers. Such information is highly essential for decision making and hence popular among the internet users. This information is very valuable not only for prospective consumers to make decisions but also for businesses in predicting the success and sustainability. In this paper, a Gini Index based feature selection method with Support Vector Machine (SVM) classifier is proposed for sentiment classification for large movie review data set. The results show that our Gini Index method has better classification performance in terms of reduced error rate and accuracy

ePrints@Bangalore University

Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback

Author: Aharoni Roee
Bachem Olivier
Cideron Geoffrey
Dadashi Robert
Elidan Gal
Ferret Johan
Geist Matthieu
Girgin Sertan
Hassidim Avinatan
Hussenot Léonard
Keller Orgad
Momchev Nikola
Pietquin Olivier
Ramos Sabela
Roit Paul
Shani Lior
Stanczyk Piotr
Szpektor Idan
Vieillard Nino
Publication venue
Publication date: 31/05/2023
Field of study

Despite the seeming success of contemporary grounded text generation systems, they often tend to generate factually inconsistent text with respect to their input. This phenomenon is emphasized in tasks like summarization, in which the generated summaries should be corroborated by their source article. In this work, we leverage recent progress on textual entailment models to directly address this problem for abstractive summarization systems. We use reinforcement learning with reference-free, textual entailment rewards to optimize for factual consistency and explore the ensuing trade-offs, as improved consistency may come at the cost of less informative or more extractive summaries. Our results, according to both automatic metrics and human evaluation, show that our method considerably improves the faithfulness, salience, and conciseness of the generated summaries.Comment: ACL 202

arXiv.org e-Print Archive