294 research outputs found
Improving Developer Profiling and Ranking to Enhance Bug Report Assignment
Bug assignment plays a critical role in the bug fixing process. However, bug assignment can be a burden for projects receiving a large number of bug reports. If a bug is assigned to a developer who lacks sufficient expertise to appropriately address it, the software project can be adversely impacted in terms of quality, developer hours, and aggregate cost. An automated strategy that provides a list of developers ranked by suitability based on their development history and the development history of the project can help teams more quickly and more accurately identify the appropriate developer for a bug report, potentially resulting in an increase in productivity. To automate the process of assigning bug reports to the appropriate developer, several studies have employed an approach that combines natural language processing and information retrieval techniques to extract two categories of features: one targeting developers who have fixed similar bugs before and one targeting developers who have worked on source files similar to the description of the bug. As developers document their changes through their commit messages it represents another rich resource for profiling their expertise, as the language used in commit messages typically more closely matches the language used in bug reports. In this study, we have replicated the approach presented in [32] that applies a learning-to-rank technique to rank appropriate developers for each bug report. Additionally, we have extended the study by proposing an additional set of features to better profile a developer through their commit logs and through the API project descriptions referenced in their code changes. Furthermore, we explore the appropriateness of a joint recommendation approach employing a learning-to-rank technique and an ordinal regression technique. To evaluate our model, we have considered more than 10,000 bug reports with their appropriate assignees. The experimental results demonstrate the efficiency of our model in comparison with the state-of-the-art methods in recommending developers for open bug reports
Evaluating an assistant for creating bug report assignment recommenders
Software development projects receive many change requests each day and each report must be examined to decide how the request will be handled by the project. One decision that is frequently made is to which software developer to assign the change request. Efforts have been made toward semi automating this decision,with most approaches using machine learning algorithms. However, using machine learning to create an assignment recommender is a complex process that must be tailored to each individual software development project. The Creation Assistant for Easy Assignment (CASEA) tool leverages a project member’s knowledge for creating an assignment recommender. This paper presents the results of a user study using CASEA. The user study shows that users with limited project knowledge can quickly create accurate bug report assignment recommenders.Ye
DeCaf: Diagnosing and Triaging Performance Issues in Large-Scale Cloud Services
Large scale cloud services use Key Performance Indicators (KPIs) for tracking
and monitoring performance. They usually have Service Level Objectives (SLOs)
baked into the customer agreements which are tied to these KPIs. Dependency
failures, code bugs, infrastructure failures, and other problems can cause
performance regressions. It is critical to minimize the time and manual effort
in diagnosing and triaging such issues to reduce customer impact. Large volume
of logs and mixed type of attributes (categorical, continuous) in the logs
makes diagnosis of regressions non-trivial.
In this paper, we present the design, implementation and experience from
building and deploying DeCaf, a system for automated diagnosis and triaging of
KPI issues using service logs. It uses machine learning along with pattern
mining to help service owners automatically root cause and triage performance
issues. We present the learnings and results from case studies on two large
scale cloud services in Microsoft where DeCaf successfully diagnosed 10 known
and 31 unknown issues. DeCaf also automatically triages the identified issues
by leveraging historical data. Our key insights are that for any such diagnosis
tool to be effective in practice, it should a) scale to large volumes of
service logs and attributes, b) support different types of KPIs and ranking
functions, c) be integrated into the DevOps processes.Comment: To be published in the proceedings of ICSE-SEIP '20, Seoul, Republic
of Kore
Supporting Development Decisions with Software Analytics
Software practitioners make technical and business decisions based on the understanding they have of their software systems. This understanding is grounded in their own experiences, but can be augmented by studying various kinds of development artifacts, including source code, bug reports, version control meta-data, test cases, usage logs, etc. Unfortunately, the information contained in these artifacts is typically not organized in the way that is immediately useful to developers’ everyday decision making needs. To handle the large volumes of data, many practitioners and researchers have turned to analytics — that is, the use of analysis, data, and systematic reasoning for making decisions.
The thesis of this dissertation is that by employing software analytics to various development tasks and activities, we can provide software practitioners better insights into their processes, systems, products, and users, to help them make more informed data-driven decisions. While quantitative analytics can help project managers understand the big picture of their systems, plan for its future, and monitor trends, qualitative analytics can enable developers to perform their daily tasks and activities more quickly by helping them better manage high volumes of information.
To support this thesis, we provide three different examples of employing software analytics. First, we show how analysis of real-world usage data can be used to assess user dynamic behaviour and adoption trends of a software system by revealing valuable information on how software systems are used in practice.
Second, we have created a lifecycle model that synthesizes knowledge from software development artifacts, such as reported issues, source code, discussions, community contributions, etc. Lifecycle models capture the dynamic nature of how various development artifacts change over time in an annotated graphical form that can be easily understood and communicated. We demonstrate how lifecycle models can be generated and present industrial case studies where we apply these models to assess the code review process of three different projects.
Third, we present a developer-centric approach to issue tracking that aims to reduce information overload and improve developers’ situational awareness. Our approach is motivated by a grounded theory study of developer interviews, which suggests that customized views of a project’s repositories that are tailored to developer-specific tasks can help developers better track their progress and understand the surrounding technical context of their working environments. We have created a model of the kinds of information elements that developers feel are essential in completing their daily tasks, and from this model we have developed a prototype tool organized around developer-specific customized dashboards.
The results of these three studies show that software analytics can inform evidence-based decisions related to user adoption of a software project, code review processes, and improved developers’ awareness on their daily tasks and activities
Large Language Models for Software Engineering: A Systematic Literature Review
Large Language Models (LLMs) have significantly impacted numerous domains,
notably including Software Engineering (SE). Nevertheless, a well-rounded
understanding of the application, effects, and possible limitations of LLMs
within SE is still in its early stages. To bridge this gap, our systematic
literature review takes a deep dive into the intersection of LLMs and SE, with
a particular focus on understanding how LLMs can be exploited in SE to optimize
processes and outcomes. Through a comprehensive review approach, we collect and
analyze a total of 229 research papers from 2017 to 2023 to answer four key
research questions (RQs). In RQ1, we categorize and provide a comparative
analysis of different LLMs that have been employed in SE tasks, laying out
their distinctive features and uses. For RQ2, we detail the methods involved in
data collection, preprocessing, and application in this realm, shedding light
on the critical role of robust, well-curated datasets for successful LLM
implementation. RQ3 allows us to examine the specific SE tasks where LLMs have
shown remarkable success, illuminating their practical contributions to the
field. Finally, RQ4 investigates the strategies employed to optimize and
evaluate the performance of LLMs in SE, as well as the common techniques
related to prompt optimization. Armed with insights drawn from addressing the
aforementioned RQs, we sketch a picture of the current state-of-the-art,
pinpointing trends, identifying gaps in existing research, and flagging
promising areas for future study
Finding Cross-rule Optimization Bugs in Datalog Engines
Datalog is a popular and widely-used declarative logic programming language.
Datalog engines apply many cross-rule optimizations; bugs in them can cause
incorrect results. To detect such optimization bugs, we propose an automated
testing approach called Incremental Rule Evaluation (IRE), which
synergistically tackles the test oracle and test case generation problem. The
core idea behind the test oracle is to compare the results of an optimized
program and a program without cross-rule optimization; any difference indicates
a bug in the Datalog engine. Our core insight is that, for an optimized,
incrementally-generated Datalog program, we can evaluate all rules individually
by constructing a reference program to disable the optimizations that are
performed among multiple rules. Incrementally generating test cases not only
allows us to apply the test oracle for every new rule generated-we also can
ensure that every newly added rule generates a non-empty result with a given
probability and eschew recomputing already-known facts. We implemented IRE as a
tool named Deopt, and evaluated Deopt on four mature Datalog engines, namely
Souffl\'e, CozoDB, Z, and DDlog, and discovered a total of 30 bugs. Of
these, 13 were logic bugs, while the remaining were crash and error bugs. Deopt
can detect all bugs found by queryFuzz, a state-of-the-art approach. Out of the
bugs identified by Deopt, queryFuzz might be unable to detect 5. Our
incremental test case generation approach is efficient; for example, for test
cases containing 60 rules, our incremental approach can produce 1.17
(for DDlog) to 31.02 (for Souffl\'e) as many valid test cases with
non-empty results as the naive random method. We believe that the simplicity
and the generality of the approach will lead to its wide adoption in practice.Comment: The ACM SIGPLAN Conference on Object Oriented Programming, Systems,
Languages, and Applications (2024), Pasadena, California, United State
Recommended from our members
Leveraging the Power of Crowds: Automated Test Report Processing for The Maintenance of Mobile Applications
Crowdsourcing is an emerging distributed problem-solving model combining human and machine computation. It collects intelligence and knowledge from a large and diverse workforce to complete complex tasks. In the software engineering domain, crowdsourced techniques have been adopted to facilitate various tasks, such as design, testing, debugging, development, and so on. Specifically, in crowdsourced testing, crowdsourced workers are given testing tasks to perform and submit their feedback in the form of test reports. One of the key advantages of crowdsourced testing is that it is capable of providing engineers software engineers with domain knowledge and feedback from a large number of real users. Based on diverse software and hardware settings of these users, engineers can bugs that are not caught by traditional quality assurance techniques. Such benefits are particularly ideal for mobile application testing, which needs rapid development-and-deployment iterations and support diverse execution environments. However, crowdsourced testing naturally generates an overwhelming number of crowdsourced test reports, and inspecting such a large number of reports becomes a time-consuming yet inevitable task. This dissertation presents a series of techniques, tools and experiments to assist in crowdsourced report processing. These techniques are designed for improving this task in multiple aspects: 1. prioritizing crowdsourced report to assist engineers in finding as many unique bugs as possible, and as quickly as possible; 2. grouping crowdsourced report to assist engineers in identifying the representative ones in a short time; 3. summarizing the duplicate reports to provide engineers with a concise and accurate understanding of a group of reports; In the first step, I present a text-analysis-based technique to prioritize test reports for manual inspection. This technique leverages two key strategies: (1) a diversity strategy to help developers inspect a wide variety of test reports and to avoid duplicates and wasted effort on falsely classified faulty behavior, and (2) a risk-assessment strategy to help developers identify test reports that may be more likely to be fault-revealing based on past observations.Together, these two strategies form our technique to prioritize test reports in crowdsourced testing. Moreover, in the mobile testing domain, test reports often consist of more screenshots and shorter descriptive text, and thus text-analysis-based techniques may be ineffective or inapplicable. The shortage and ambiguity of natural-language text information and the well-defined screenshots of activity views within mobile applications motivate me to propose a novel technique based on using image understanding for multi-objective test-report prioritization. This technique employs the Spatial Pyramid Matching (SPM) technique to measure the similarity of the screenshots, and apply the natural-language processing technique to measure the distance between the text of test reports. Next, I design and implement CTRAS: a novel approach to leveraging duplicates to enrich the content of bug descriptions and improve the efficiency of inspecting these reports. CTRAS is capable of automatically aggregating duplicates based on both textual information and screenshots, and further summarizes the duplicate test reports into a comprehensive and comprehensible report.I validate all of these techniques on industrial data by collaborating with several companies. The results show my techniques can improve both the efficiency and effectiveness of crowdsourced test report processing. Also, I suggest settings for different usage scenarios and discuss future research directions
A market for trading software issues
The security of software is becoming increasingly important. Open source software forms much of our digital infrastructure. It, however, contains vulnerabilities which have been exploited, attracted public attention, and caused large financial damages. This article proposes a solution to shortcomings in the current economic situation of open source software development. The main idea is to introduce price signals into the peer production of software. This is achieved through a trading market for futures contracts on the status of software issues. Users, who value secure software, gain the possibility to predict outcomes and incentivize work, strengthening collaboration and information sharing in open source software development. The design of such a trading market is discussed and a prototype introduced. The feasibility of the trading market design is corroborated in a proof-of-concept implementation and simulation. Preliminary results show that the implementation works and can be used for future experiments. Several directions for future research result from this article, which contributes to peer production, software development practices, and incentives design
- …