3,261 research outputs found
Data-Driven Application Maintenance: Views from the Trenches
In this paper we present our experience during design, development, and pilot
deployments of a data-driven machine learning based application maintenance
solution. We implemented a proof of concept to address a spectrum of
interrelated problems encountered in application maintenance projects including
duplicate incident ticket identification, assignee recommendation, theme
mining, and mapping of incidents to business processes. In the context of IT
services, these problems are frequently encountered, yet there is a gap in
bringing automation and optimization. Despite long-standing research around
mining and analysis of software repositories, such research outputs are not
adopted well in practice due to the constraints these solutions impose on the
users. We discuss need for designing pragmatic solutions with low barriers to
adoption and addressing right level of complexity of problems with respect to
underlying business constraints and nature of data.Comment: Earlier version of paper appearing in proceedings of the 4th
International Workshop on Software Engineering Research and Industrial
Practice (SER&IP), IEEE Press, pp. 48-54, 201
Analysis and Detection of Information Types of Open Source Software Issue Discussions
Most modern Issue Tracking Systems (ITSs) for open source software (OSS)
projects allow users to add comments to issues. Over time, these comments
accumulate into discussion threads embedded with rich information about the
software project, which can potentially satisfy the diverse needs of OSS
stakeholders. However, discovering and retrieving relevant information from the
discussion threads is a challenging task, especially when the discussions are
lengthy and the number of issues in ITSs are vast. In this paper, we address
this challenge by identifying the information types presented in OSS issue
discussions. Through qualitative content analysis of 15 complex issue threads
across three projects hosted on GitHub, we uncovered 16 information types and
created a labeled corpus containing 4656 sentences. Our investigation of
supervised, automated classification techniques indicated that, when prior
knowledge about the issue is available, Random Forest can effectively detect
most sentence types using conversational features such as the sentence length
and its position. When classifying sentences from new issues, Logistic
Regression can yield satisfactory performance using textual features for
certain information types, while falling short on others. Our work represents a
nontrivial first step towards tools and techniques for identifying and
obtaining the rich information recorded in the ITSs to support various software
engineering activities and to satisfy the diverse needs of OSS stakeholders.Comment: 41st ACM/IEEE International Conference on Software Engineering
(ICSE2019
- …