344 research outputs found

    Automatic Recall of Lessons Learned for Software Project Managers

    Lessons learned (LL) records constitute a software organization’s memory of successes and failures. LL are recorded within the organization repository for future reference to optimize planning, gain experience, and elevate market competitiveness. However, manually searching this repository is a daunting task, so it is often overlooked. This can lead to the repetition of previous mistakes and missing potential opportunities, which, in turn, can negatively affect the organization’s profitability and competitiveness. In this thesis, we present a novel solution that provides an automatic process to recall relevant LL and to push them to project managers. This substantially reduces the amount of time and effort required to manually search the unstructured LL repositories, and therefore, it encourages the utilization of LL. In this study, we exploit existing project artifacts to build the LL search queries on-the-fly, in order to bypass the tedious manual search process. While most of the current LL recall studies rely on case-based reasoning, they have some limitations including the need to reformat the LL repository, which is impractical, and the need for tight user involvement. This makes us the first to employ information retrieval (IR) to address the LL recall. An empirical study has been conducted to build the automatic LL recall solution and evaluate its effectiveness. In our study, we employ three of the most popular IR models to construct a solution that considers multiple classifier configurations. In addition, we have extended this study by examining the impact of the hybridization of LL classifiers on the classifiers’ performance. Furthermore, a real-world dataset of 212 LL records from 30 different software projects has been used for validation. Top-k and MAP, well-known accuracy metrics, have been used as well. 
The study results confirm the effectiveness of the automatic LL recall solution, with an accuracy of about 70%, which increased to 74% with hybridization. This eliminates the effort needed to manually search the LL repository, which encourages project managers to reuse the available LL knowledge, which in turn helps avoid old pitfalls and unleash hidden business opportunities.
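
The abstract above evaluates retrieval with Top-k and MAP. As a rough illustration of what these metrics measure, the following minimal sketch uses the common textbook definitions; the thesis's exact variants are not given in the abstract, and the function names and sample data here are hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def top_k_hit(ranked: Sequence[str], relevant: Set[str], k: int) -> bool:
    """True if at least one relevant record appears in the first k results."""
    return any(doc in relevant for doc in ranked[:k])


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant record occurs."""
    hits = 0
    precision_sum = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(
    queries: Iterable[Tuple[Sequence[str], Set[str]]]
) -> float:
    """Average of the per-query average precision over all queries."""
    scores = [average_precision(ranked, relevant) for ranked, relevant in queries]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical example: one query whose relevant records are LL-1 and LL-3.
ranking = ["LL-1", "LL-2", "LL-3", "LL-4"]
relevant_records = {"LL-1", "LL-3"}
print(top_k_hit(ranking, relevant_records, k=1))
print(average_precision(ranking, relevant_records))
```

Top-k rewards a ranking as soon as any relevant record appears near the top, whereas MAP also accounts for where every relevant record lands in the list.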

    Intelligent Software Tooling For Improving Software Development

    Software has eaten the world: many of the necessities and quality-of-life services people use require software. Therefore, tools that improve the software development experience, such as tools for generating code and test cases, detecting bugs, and answering questions, can have a significant impact on the world. The success of Deep Learning (DL) over the past decade has brought huge advancements in automation across many domains, including software development processes. One of the main reasons behind this success is the availability of large datasets to train on, such as open-source code available through GitHub or image datasets of mobile Graphical User Interfaces (GUIs) like RICO and ReDRAW. Therefore, the central research question my dissertation explores is: In what ways can the software development process be improved through leveraging DL techniques on the vast amounts of unstructured software engineering artifacts? We coin the term Intelligent Software Tools for approaches that leverage DL to automate or augment various software development tasks. To guide our research on these intelligent software tools, we performed a systematic literature review to understand the current landscape of research on applying DL techniques to software tasks and any gaps that exist. From this literature review, we found code generation to be one of the most studied tasks, while other tasks and artifacts, such as impact analysis or tasks involving images and videos, are understudied. Therefore, we set out to explore the application of DL to these understudied tasks and artifacts, as well as the limitations of DL models on the well-studied task of code completion, a subfield of code generation. Specifically, we developed a tool for automatically detecting duplicate mobile bug reports from user-submitted videos. We used the popular Convolutional Neural Network (CNN) to learn important features from a large collection of mobile screenshots. 
Using this model, we could then compute similarity between a newly submitted bug report and existing ones to produce a ranked list of duplicate candidates that can be reviewed by a developer. Next, we explored impact analysis, a critical software maintenance task that identifies potential adverse effects of a given code change on the larger software system. To this end, we created Athena, a novel approach to impact analysis that integrates knowledge of a software system through its call graph along with high-level representations of the code inside the system to improve impact analysis performance. Lastly, we explored the task of code completion, which has seen heavy interest from industry and academia. Specifically, we explored various methods that modify the positional encoding scheme of the Transformer architecture to allow these models to incorporate longer sequences of tokens when predicting completions than those seen during training, as this can significantly improve training times.

    User-centered Program Analysis Tools

    The research and industrial communities have made great strides in developing advanced software defect detection tools based on program analysis. Most of the work in this area has focused on developing novel program analysis algorithms to find bugs more efficiently or accurately, or to find more sophisticated kinds of bugs. However, the focus on algorithms often leads to tools that are complex and difficult to actually use to debug programs. We believe that we can design better, more useful program analysis tools by taking a user-centered approach. In this dissertation, we present three possible elements of such an approach. First, we improve the user interface by designing Path Projection, a toolkit for visualizing program paths, such as call stacks, that are commonly used to explain errors. We evaluated Path Projection in a user study and found that programmers were able to verify error reports more quickly with similar accuracy, and strongly preferred Path Projection to a standard code viewer. Second, we make it easier for programmers to combine different algorithms to customize the precision or efficiency of a tool for their target programs. We designed Mix, a framework that allows programmers to apply either type checking, which is fast but imprecise, or symbolic execution, which is precise but slow, to different parts of their programs. Mix keeps its design simple by making no modifications to the constituent analyses. Instead, programmers use Mix annotations to mark blocks of code that should be type checked or symbolically executed, and Mix automatically combines the results. We evaluated the effectiveness of Mix by implementing a prototype called Mixy for C and using it to check for null pointer errors in vsftpd. Finally, we integrate program analysis more directly into the debugging process. We designed Expositor, an interactive dynamic program analysis and debugging environment built on top of scripting and time-travel debugging. 
In Expositor, programmers write program analyses as scripts that analyze entire program executions, using list-like operations such as map and filter to manipulate execution traces. For efficiency, Expositor uses lazy data structures throughout its implementation to compute results on demand, enabling a more interactive user experience. We developed a prototype of Expositor using GDB and UndoDB, and used it to debug a stack overflow and to unravel a subtle data race in Firefox.
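
The idea of analyzing an execution trace with lazy, list-like operations can be sketched with ordinary Python generators. This is not Expositor's API: the `Event` type and `synthetic_trace` generator below are hypothetical stand-ins for a recorded execution, used only to show how `filter` and `map` can be composed so that work happens on demand.

```python
from itertools import islice
from typing import Iterator, NamedTuple


class Event(NamedTuple):
    """Hypothetical trace event: a timestamp and the stack depth at that time."""
    time: int
    stack_depth: int


def synthetic_trace(length: int = 1_000_000) -> Iterator[Event]:
    """Stand-in for a recorded execution; yields events one at a time."""
    depth = 0
    for t in range(length):
        # Every third step returns from a call, the other steps make a call.
        depth = depth + 1 if t % 3 else max(depth - 1, 0)
        yield Event(t, depth)


# Compose the analysis lazily: nothing is computed until results are requested.
deep_events = filter(lambda e: e.stack_depth > 5, synthetic_trace())
deep_times = map(lambda e: e.time, deep_events)

# Only as much of the trace as needed for three results is ever generated.
first_three = list(islice(deep_times, 3))
print(first_three)
```

Because `filter` and `map` return iterators, asking for the first few matches touches only a short prefix of the million-event trace, which is the property that makes interactive exploration practical.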

    Creating Systems and Applying Large-Scale Methods to Improve Student Remediation in Online Tutoring Systems in Real-time and at Scale

    A common problem shared amongst online tutoring systems is the time-consuming nature of content creation. It has been estimated that an hour of online instruction can take 100-300 hours to create. Several systems have created tools to expedite content creation, such as the Cognitive Tutors Authoring Tool (CTAT) and the ASSISTments builder. Although these tools make content creation more efficient, they all still depend on the efforts of a content creator and/or historical data. These tools do not take full advantage of the power of the crowd. These issues and challenges faced by online tutoring systems provide an ideal environment to implement a solution using crowdsourcing. I created the PeerASSIST system to provide a solution to the challenges faced with tutoring content creation. PeerASSIST crowdsources the work students have done on problems inside the ASSISTments online tutoring system and redistributes that work as a form of tutoring to their peers who are in need of assistance. Multi-objective multi-armed bandit algorithms are used to distribute student work; these algorithms balance exploring which work is good against exploiting the best currently known work. These policies are customized to run in a real-world environment with multiple asynchronous reward functions and an infinite number of actions. Inspired by major companies such as Google, Facebook, and Bing, PeerASSIST is also designed as a platform for simultaneous online experimentation in real-time and at scale. Currently over 600 teachers (grades K-12) are requiring students to show their work. Over 300,000 instances of student work have been collected from over 18,000 students across 28,000 problems. From the student work collected, 2,000 instances have been redistributed to over 550 students who needed help over the past few months. I conducted a randomized controlled experiment to evaluate the effectiveness of PeerASSIST on student performance. 
Other contributions include representing learning maps as Bayesian networks to model student performance, creating a machine-learning algorithm to derive students' incorrect processes from their incorrect answers and the inputs of the problem, and applying Bayesian hypothesis testing to A/B experiments. We showed that learning maps can be simplified without practical loss of accuracy and that time series data is necessary to simplify learning maps if the static data is highly correlated. I also created several interventions to evaluate the effectiveness of the buggy messages generated from the machine-learned incorrect processes. The null results of these experiments demonstrate the difficulty of creating successful tutoring and suggest that other methods of tutoring content creation (i.e., PeerASSIST) should be explored.
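
The explore/exploit trade-off mentioned in the abstract above can be illustrated with a much simpler policy than the multi-objective bandits the thesis describes. The sketch below is a plain single-objective epsilon-greedy bandit, swapped in purely for illustration; the class, its parameters, and the arm names are hypothetical and are not PeerASSIST's implementation. Arms can be added at any time, loosely mirroring new student work arriving.

```python
import random
from typing import Dict, Hashable, Optional


class EpsilonGreedy:
    """Simplified bandit: explore with probability epsilon, otherwise exploit."""

    def __init__(self, epsilon: float = 0.1, seed: Optional[int] = 0) -> None:
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[Hashable, int] = {}
        self.means: Dict[Hashable, float] = {}

    def add_arm(self, arm: Hashable) -> None:
        """Register a new arm (for example, a newly submitted piece of work)."""
        self.counts.setdefault(arm, 0)
        self.means.setdefault(arm, 0.0)

    def select(self) -> Hashable:
        """Pick a random arm when exploring, else the arm with the best mean."""
        arms = list(self.counts)
        if not arms:
            raise ValueError("no arms registered")
        if self.rng.random() < self.epsilon:
            return self.rng.choice(arms)
        return max(arms, key=lambda a: self.means[a])

    def update(self, arm: Hashable, reward: float) -> None:
        """Fold one observed reward into the arm's running mean."""
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


# Hypothetical usage: reward 1.0 if the next student then answered correctly.
policy = EpsilonGreedy(epsilon=0.1)
policy.add_arm("work-A")
policy.add_arm("work-B")
policy.update("work-A", 0.0)
policy.update("work-B", 1.0)
chosen = policy.select()
```

Updates are incremental, so a reward can be recorded whenever it arrives rather than immediately after the selection it belongs to.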

    Supporting feature-level software maintenance

    Software maintenance is the process of modifying a software system to fix defects, improve performance, add new functionality, or adapt the system to a new environment. A maintenance task is often initiated by a bug report or a request for new functionality. Bug reports typically describe problems with incorrect behaviors or functionalities. These behaviors or functionalities are known as features. Even in very well-designed systems, the source code that implements features is often not completely modularized. The delocalized nature of features makes maintaining them challenging. Since maintenance tasks are expressed in terms of features, the goal of this dissertation is to support software maintenance at the feature level. We focus on two tasks in particular: feature location and impact analysis via feature coupling. Feature location is the process of identifying the source code that implements a feature, and it is an essential first step to any maintenance task. There are many existing techniques for feature location that incorporate various types of analyses such as static, dynamic, and textual. In this dissertation, we recognize the advantages of leveraging several types of analyses and introduce a new approach to feature location based on combining dynamic analysis, textual analysis, and web mining algorithms applied to software. The use of web mining for feature location is a novel contribution, and we show that our new techniques based on web mining are significantly more effective than the current state of the art. After using feature location to identify a feature's source code, maintenance can be completed on that feature. Impact analysis should then be performed to revalidate the system and determine which other features may have been affected by the modifications. We define three feature coupling metrics that capture the relationship between features based on structural information, textual information, and their combination. 
Our novel feature coupling metrics can be used for impact analysis to quantify the strength of coupling between pairs of features. We performed three empirical studies on open-source software systems to assess the feature coupling metrics and established three major results. First, there is a moderate to strong statistically significant correlation between feature coupling and faults. Second, feature coupling can be used to correctly determine about half of the other features that would be affected by a change to a given feature. Finally, we found that the metrics align with developers' opinions about pairs of features that are actually coupled.
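
The abstract does not define its coupling metrics, so the following is only one plausible way to quantify textual coupling between two features: cosine similarity between the bags of identifier words in the code that implements each feature. The function names and sample snippets are hypothetical, and this should not be read as the dissertation's metric.

```python
import math
import re
from collections import Counter
from typing import Iterable


def word_counts(snippets: Iterable[str]) -> Counter:
    """Count lowercase alphabetic tokens across a feature's source snippets."""
    counts: Counter = Counter()
    for snippet in snippets:
        counts.update(token.lower() for token in re.findall(r"[A-Za-z]+", snippet))
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two term-frequency vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def textual_coupling(feature_a: Iterable[str], feature_b: Iterable[str]) -> float:
    """Assumed metric: similarity of the vocabulary used by two features."""
    return cosine(word_counts(feature_a), word_counts(feature_b))


# Hypothetical features, each given as the source of its implementing methods.
save_feature = ["def save_file(path): write(path, buffer)"]
export_feature = ["def export_file(path): write(path, convert(buffer))"]
print(textual_coupling(save_feature, export_feature))
```

A score near 1 means the two features share most of their vocabulary, which could flag one as a candidate for re-inspection when the other changes.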