Search CORE

5 research outputs found

Tracelab: Reproducing Empirical Software Engineering Research

Author: Moritz Evan Alexander
Publication venue: W&M ScholarWorks
Publication date: 01/01/2013
Field of study

Configuring and Assembling Information Retrieval based Solutions for Software Engineering Tasks.

Author: Dit Bogdan
Publication venue: W&M ScholarWorks
Publication date: 01/01/2015
Field of study

Information Retrieval (IR) approaches are used to leverage textual or unstructured data generated during the software development process to support various software engineering (SE) tasks (e.g., concept location, traceability link recovery, change impact analysis, etc.). Two of the most important steps for applying IR techniques to support SE tasks are preprocessing the corpus and configuring the IR technique, and these steps can significantly influence the outcome and the amount of effort developers have to spend for these maintenance tasks. We present the use of Genetic Algorithms (GAs) to automatically configure and assemble an IR process to support SE tasks. The approach named IR-GA determines the (near) optimal solution to be used for each step of the IR process without requiring any training. We applied IR-GA on three different SE tasks and the results of the study indicate that IR-GA outperforms approaches previously used in the literature, and that it does not significantly differ from an ideal upper bound that could be achieved by a supervised approach and a combinatorial approach

Boise State University - ScholarWorks

College of William & Mary: W&M Publish

Automatically Documenting Software Artifacts

Author: Li Boyang
Publication venue: W&M ScholarWorks
Publication date: 13/09/2017
Field of study

Software artifacts, such as database schema and unit test cases, constantly change during evolution and maintenance of software systems. Co-evolution of code and DB schemas in Database-Centric Applications (DCAs) often leads to two types of challenging scenarios for developers, where (i) changes to the DB schema need to be incorporated in the source code, and (ii) maintenance of a DCAs code requires understanding of how the features are implemented by relying on DB operations and corresponding schema constraints. On the other hand, the number of unit test cases often grows as new functionality is introduced into the system, and maintaining these unit tests is important to reduce the introduction of regression bugs due to outdated unit tests. Therefore, one critical artifact that developers need to be able to maintain during evolution and maintenance of software systems is up-to-date and complete documentation. In order to understand developer practices regarding documenting and maintaining these software artifacts, we designed two empirical studies both composed of (i) an online survey of contributors of open source projects and (ii) a mining-based analysis of method comments in these projects. We observed that documenting methods with database accesses and unit test cases is not a common practice. Further, motivated by the findings of the studies, we proposed three novel approaches: (i) DBScribe is an approach for automatically documenting database usages and schema constraints, (ii) UnitTestScribe is an approach for automatically documenting test cases, and (iii) TeStereo tags stereotypes for unit tests and generates html reports to improve the comprehension and browsing of unit tests in a large test suite. We evaluated our tools in the case studies with industrial developers and graduate students. In general, developers indicated that descriptions generated by the tools are complete, concise, and easy to read. The reports are useful for source code comprehension tasks as well as other tasks, such as code smell detection and source code navigation

College of William & Mary: W&M Publish

Improving Software Dependability through Documentation Analysis

Author: Wong Edmund
Publication venue: 'University of Waterloo'
Publication date: 21/01/2019
Field of study

Software documentation contains critical information that describes a system’s functionality and requirements. Documentation exists in several forms, including code comments, test plans, manual pages, and user manuals. The lack of documentation in existing software systems is an issue that impacts software maintainability and programmer productivity. Since some code bases contain a large amount of documentation, we want to leverage these existing documentation to improve software dependability. Specifically, we utilize documentation to help detect software bugs and repair corrupted files, which can reduce the number of software error and failure to improve a system’s reliability (e.g., continuity of correct service). We also generate documentation (e.g., code comment) automatically to help developers understand the source code, which helps improve a system’s maintainability (e.g., ability to undergo repairs and modifications). In this thesis, we analyze software documentation and propose two branches of work, which focuses on three types of documentation including manual pages, code comments, and user manuals. The first branch of work focuses on documentation analysis because documentation contains valuable information that describes the behavior of the program. We automatically extract constraints from documentation and apply them on a dynamic analysis symbolic execution tool to find bugs in the target software, and we extract constraints manually from documentation and apply them on a structured-file parsing application to repair corrupted PDF files. The second branch of work focuses on automatic code comment generation to improve software documentation. For documentation analysis, we propose and implement DASE and DocRepair. DASE leverages automatically extracted constraints from documentation to improve a dynamic analysis symbolic execution tool. DASE guides symbolic execution to focus the testing on execution paths that execute a program’s core functionalities using constraints learned from the documentation. We evaluated DASE on 88 programs from five mature real-world software suites to detect software bugs. DASE detects 12 previously unknown bugs that symbolic execution would fail to detect when given no input constraints, 6 of which have been confirmed by the developers. In DocRepair we perform an empirical study to study and repair corrupted PDF files. We create the first dataset of 319 corrupted PDF files and conduct an empirical study on 119 real-world corrupted PDF files to study the common types of file corruption. Based on the result of the empirical study we propose a technique called DocRepair. DocRepair’s repair algorithm includes seven repair operators that utilizes manually extracted constraints from documentation to repair corrupted files. We evaluate DocRepair against three common PDF repair tools. Amongst the 1,827 collected corrupted files from over two corpora of PDF files, DocRepair can successfully repair 354 files compared to Mutool, PDFtk, and GhostScript which repair 508, 41 and 84 respectively. We also propose a technique to combine multiple repair tools called DocRepair+, which can successfully repair 751 files. In the case where there is a lack of documentation, DASE and DocRepair+ would not work. Therefore, we propose automated documentation generation to address the issue. We propose and implement CloCom+ to generate code comments by mining both existing software repositories in GitHub and a Question and Answer site, Stack Overflow. CloCom+ generated 442 unique comments for 16 Java projects. Although CloCom+ improves on previous work, SumSlice, on automatic comment generation, the quality (evaluated on completeness, conciseness, expressiveness, and usefulness) and yield (number of generated comments) are still rather low which makes the technique not ready for real-world usage. In the future, it may be possible to combine the two proposed branches of work (documentation analysis and documentation generation) to further improve software dependability. For example, we can extract constraints from the automatically generated documentation (e.g., code comments)

University of Waterloo's Institutional Repository

Deep Learning In Software Engineering

Author: Watson Cody
Publication venue: W&M ScholarWorks
Publication date: 01/01/2020
Field of study

Software evolves and therefore requires an evolving field of Software Engineering. The evolution of software can be seen on an individual project level through the software life cycle, as well as on a collective level, as we study the trends and uses of software in the real world. As the needs and requirements of users change, so must software evolve to reflect those changes. This cycle is never ending and has led to continuous and rapid development of software projects. More importantly, it has put a great responsibility on software engineers, causing them to adopt practices and tools that allow them to increase their efficiency. However, these tools suffer the same fate as software designed for the general population; they need to change in order to reflect the user’s needs. Fortunately, the demand for this evolving software has given software engineers a plethora of data and artifacts to analyze. The challenge arises when attempting to identify and apply patterns learned from the vast amount of data. In this dissertation, we explore and develop techniques to take advantage of the vast amount of software data and to aid developers in software development tasks. Specifically, we exploit the tool of deep learning to automatically learn patterns discovered within previous software data and automatically apply those patterns to present day software development. We first set out to investigate the current impact of deep learning in software engineering by performing a systematic literature review of top tier conferences and journals. This review provides guidelines and common pitfalls for researchers to consider when implementing DL (Deep Learning) approaches in SE (Software Engineering). In addition, the review provides a research road map for areas within SE where DL could be applicable. Our next piece of work developed an approach that simultaneously learned different representations of source code for the task of clone detection. We found that the use of multiple representations, such as Identifiers, ASTs, CFGs and bytecode, can lead to the identification of similar code fragments. Through the use of deep learning strategies, we automatically learned these different representations without the requirement of hand-crafted features. Lastly, we designed a novel approach for automating the generation of assert statements through seq2seq learning, with the goal of increasing the efficiency of software testing. Given the test method and the context of the associated focal method, we automatically generated semantically and syntactically correct assert statements for a given, unseen test method. We exemplify that the techniques presented in this dissertation provide a meaningful advancement to the field of software engineering and the automation of software development tasks. We provide analytical evaluations and empirical evidence that substantiate the impact of our findings and usefulness of our approaches toward the software engineering community

College of William & Mary: W&M Publish