549 research outputs found

    Supporting Source Code Search with Context-Aware and Semantics-Driven Query Reformulation

    Get PDF
    Software bugs and failures cost trillions of dollars every year, and could even lead to deadly accidents (e.g., Therac-25 accident). During maintenance, software developers fix numerous bugs and implement hundreds of new features by making necessary changes to the existing software code. Once an issue report (e.g., bug report, change request) is assigned to a developer, she chooses a few important keywords from the report as a search query, and then attempts to find out the exact locations in the software code that need to be either repaired or enhanced. As a part of this maintenance, developers also often select ad hoc queries on the fly, and attempt to locate the reusable code from the Internet that could assist them either in bug fixing or in feature implementation. Unfortunately, even the experienced developers often fail to construct the right search queries. Even if the developers come up with a few ad hoc queries, most of them require frequent modifications which cost significant development time and efforts. Thus, construction of an appropriate query for localizing the software bugs, programming concepts or even the reusable code is a major challenge. In this thesis, we overcome this query construction challenge with six studies, and develop a novel, effective code search solution (BugDoctor) that assists the developers in localizing the software code of interest (e.g., bugs, concepts and reusable code) during software maintenance. In particular, we reformulate a given search query (1) by designing novel keyword selection algorithms (e.g., CodeRank) that outperform the traditional alternatives (e.g., TF-IDF), (2) by leveraging the bug report quality paradigm and source document structures which were previously overlooked and (3) by exploiting the crowd knowledge and word semantics derived from Stack Overflow Q&A site, which were previously untapped. Our experiment using 5000+ search queries (bug reports, change requests, and ad hoc queries) suggests that our proposed approach can improve the given queries significantly through automated query reformulations. Comparison with 10+ existing studies on bug localization, concept location and Internet-scale code search suggests that our approach can outperform the state-of-the-art approaches with a significant margin

    Utilizing public repositories to improve the decision process for security defect resolution and information reuse in the development environment

    Get PDF
    Security risks are contained in solutions in software systems that could have been avoided if the design choices were analyzed by using public information security data sources. Public security sources have been shown to contain more relevant and recent information on current technologies than any textbook or research article, and these sources are often used by developers for solving software related problems. However, solutions copied from public discussion forums such as StackOverflow may contain security implications when copied directly into the developers environment. Several different methods to identify security bugs are being implemented, and recent efforts are looking into identifying security bugs from communication artifacts during software development lifecycle as well as using public security information sources to support secure design and development. The primary goal of this thesis is to investigate how to utilize public information sources to reduce security defects in software artifacts through improving the decision process for defect resolution and information reuse in the development environment. We build a data collection tool for collecting data from public information security sources and public discussion forums, construct machine learning models for classifying discussion forum posts and bug reports as security or not-security related, as well as word embedding models for finding matches between public security sources and public discussion forum posts or bug reports. The results of this thesis demonstrate that using public information security sources can provide additional validation layers for defect classification models, as well as provide additional security context for public discussion forum posts. The contributions of this thesis are to provide understanding of how public information security sources can better provide context for bug reports and discussion forums. Additionally, we provide data collection APIs for collecting datasets from these sources, and classification and word embedding models for recommending related security sources for bug reports and public discussion forum posts.Masteroppgave i Programutvikling samarbeid med HVLPROG399MAMN-PRO

    Improving Developer Profiling and Ranking to Enhance Bug Report Assignment

    Get PDF
    Bug assignment plays a critical role in the bug fixing process. However, bug assignment can be a burden for projects receiving a large number of bug reports. If a bug is assigned to a developer who lacks sufficient expertise to appropriately address it, the software project can be adversely impacted in terms of quality, developer hours, and aggregate cost. An automated strategy that provides a list of developers ranked by suitability based on their development history and the development history of the project can help teams more quickly and more accurately identify the appropriate developer for a bug report, potentially resulting in an increase in productivity. To automate the process of assigning bug reports to the appropriate developer, several studies have employed an approach that combines natural language processing and information retrieval techniques to extract two categories of features: one targeting developers who have fixed similar bugs before and one targeting developers who have worked on source files similar to the description of the bug. As developers document their changes through their commit messages it represents another rich resource for profiling their expertise, as the language used in commit messages typically more closely matches the language used in bug reports. In this study, we have replicated the approach presented in [32] that applies a learning-to-rank technique to rank appropriate developers for each bug report. Additionally, we have extended the study by proposing an additional set of features to better profile a developer through their commit logs and through the API project descriptions referenced in their code changes. Furthermore, we explore the appropriateness of a joint recommendation approach employing a learning-to-rank technique and an ordinal regression technique. To evaluate our model, we have considered more than 10,000 bug reports with their appropriate assignees. The experimental results demonstrate the efficiency of our model in comparison with the state-of-the-art methods in recommending developers for open bug reports
    • …
    corecore