Controlled Experiment for Assessing the Contribution of Ontology Based Software Redocumentation Approach to Support Program Understanding
Redocumentation is an approach that recovers knowledge from raw software artifacts and presents it in alternative forms. Many legacy systems were developed with event-driven programming and require redocumentation. However, existing repository and query techniques emphasize only lexical and syntactic queries, which limits their ability to expose the semantic relationships needed for program understanding. We use an ontology-based approach that combines ontology reasoning and querying to generate software documentation from a knowledge repository. We present a controlled experiment for the empirical evaluation of the proposed approach, implemented in a tool called Ontology Based Software Redocumentation (OBSR). In this experiment, two existing tools, Universal Report (UR) and Microsoft Visual Studio for the Visual Basic (VB) programming environment, were selected for comparison with OBSR. The goal is to provide experimental evidence of the viability of our approach for program understanding using HTML-based semantic software documentation. The experiment shows that software maintainers achieve a significant improvement in program understanding and can accomplish maintenance tasks more easily. We describe the experiment in detail, discuss its results, and reflect on the lessons learned.
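The core idea of combining a knowledge repository with reasoning can be illustrated with a minimal sketch. All facts, names, and relations below are hypothetical, not the OBSR tool's actual schema: program elements are stored as subject-predicate-object triples, and a reasoning step derives transitive relations that a purely lexical or syntactic search over source text could not answer.

```python
from itertools import product

# Hypothetical knowledge repository extracted from a VB program.
triples = {
    ("Form_Load", "calls", "LoadConfig"),
    ("LoadConfig", "calls", "ReadFile"),
    ("ReadFile", "definedIn", "FileUtils.bas"),
}

def infer_transitive(triples, predicate):
    """Compute the transitive closure of one predicate (a simple reasoning step)."""
    closure = {(s, o) for s, p, o in triples if p == predicate}
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(tuple(closure), tuple(closure)):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def query(triples, subject=None, predicate=None, obj=None):
    """Pattern-match stored triples; None acts as a wildcard."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

calls = infer_transitive(triples, "calls")
# The edge (Form_Load, ReadFile) is stated nowhere in the repository --
# it follows only from reasoning over the stored relations.
print(("Form_Load", "ReadFile") in calls)  # True
```

A real system would use a standard ontology language and query engine (e.g., OWL with SPARQL); the loop above only shows why reasoning answers questions that keyword search cannot.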
Assessing Comment Quality in Object-Oriented Languages
Previous studies have shown that high-quality code comments support developers in software maintenance and program comprehension tasks. However, the semi-structured nature of comments, several conventions to write comments, and the lack of quality assessment tools for all aspects of comments make comment evaluation and maintenance a non-trivial problem. To understand the specification of high-quality comments to build effective assessment tools, our thesis emphasizes acquiring a multi-perspective view of the comments, which can be approached by analyzing (1) the academic support for comment quality assessment, (2) developer commenting practices across languages, and (3) developer concerns about comments.
Our findings regarding the academic support for assessing comment quality showed that researchers have primarily focused on Java over the last decade, even though the trend of using polyglot environments in software projects is increasing. Similarly, the trend of analyzing specific types of code comments (method comments or inline comments) is increasing, but studies rarely analyze class comments. We found 21 quality attributes that researchers consider when assessing comment quality, and manual assessment is still the most commonly used technique for assessing them. Our analysis of developer commenting practices showed that developers embed a mixed level of detail in class comments, ranging from high-level class overviews to low-level implementation details, across programming languages. They follow style guidelines regarding what information to write in class comments but violate structure and syntax guidelines. They primarily face problems locating relevant guidelines for writing consistent and informative comments, verifying the adherence of their comments to the guidelines, and evaluating the overall state of comment quality.
To help researchers and developers build comment quality assessment tools, we contribute: (i) a systematic literature review (SLR) of ten years (2010–2020) of research on assessing comment quality, (ii) a taxonomy of quality attributes used to assess comment quality, (iii) an empirically validated taxonomy of class comment information types from three programming languages, (iv) a multi-programming-language approach to automatically identify the comment information types, (v) an empirically validated taxonomy of comment convention-related questions and recommendations from various Q&A forums, and (vi) a tool to gather discussions from multiple developer sources, such as Stack Overflow and mailing lists.
Our contributions provide various kinds of empirical evidence of developers' interest in reducing effort in the software documentation process, of the limited support developers get in automatically assessing comment quality, and of the challenges they face in writing high-quality comments. This work lays the foundation for future effective comment quality assessment tools and techniques.
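The task behind contribution (iv), identifying comment information types automatically, can be sketched as follows. The thesis derives its taxonomy and classifiers empirically; the categories and keyword rules below are invented for illustration only.

```python
import re

# Hypothetical, simplified rules mapping comment sentences to information
# types -- a real classifier would be trained on labeled data rather than
# hand-written patterns.
RULES = [
    ("usage",       re.compile(r"\b(use|usage|example|call)\b", re.I)),
    ("deprecation", re.compile(r"\b(deprecated|obsolete)\b", re.I)),
    ("summary",     re.compile(r"\b(represents|implements|provides)\b", re.I)),
]

def classify_sentence(sentence):
    """Return the first information type whose pattern matches, else 'other'."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return "other"

comment = [
    "Represents a bounded queue of tasks.",
    "Use submit() to enqueue work.",
    "This class is deprecated; prefer TaskPool.",
]
print([classify_sentence(s) for s in comment])
# ['summary', 'usage', 'deprecation']
```

Even this toy version shows why the problem is non-trivial: comment sentences are free-form natural language, so a sentence can match several types or none, which is exactly what motivates an empirically validated taxonomy.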
Security-Pattern Recognition and Validation
The increasing and diverse number of technologies connected to the Internet, from distributed enterprise systems to small electronic devices such as smartphones, brings IT security to the foreground. We interact with these technologies daily and place much trust in a well-established software development process. However, security vulnerabilities appear in software on all kinds of PC(-like) platforms, and more and more published vulnerabilities compromise systems and their users. Software must therefore be modified in response to changing requirements, bugs, and security flaws, and software engineers increasingly face security issues during software design; maintenance programmers in particular must deal with such issues after the software has been released. In software development, design patterns have been proposed as well-known solutions to recurring design problems. Analogously, security patterns are best practices aimed at ensuring security. This thesis develops a deeper understanding of the nature of security patterns, focusing on their validation and detection in support of review and maintenance activities. The landscape of security patterns is diverse, so published security patterns are collected and organized to identify the software-related ones. The descriptions of the selected software-security patterns are assessed and compared against the common design patterns described by Gamma et al. to identify differences and issues that may influence the detection of security patterns. Based on these insights and a manual detection approach, we present an automatic detection method for security patterns. The approach is implemented in a tool and evaluated in a case study with 25 real-world Android applications from Google Play.
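The underlying mechanics of pattern detection, matching a pattern's structural signature against a program's abstract syntax tree, can be sketched briefly. The thesis targets security patterns in Java/Android code; this illustration uses Python's `ast` module and the classic Singleton shape (instance slot plus accessor) purely as a stand-in example.

```python
import ast

# Hypothetical source under analysis.
SOURCE = '''
class SessionManager:
    _instance = None

    @classmethod
    def get_instance(cls):
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance
'''

def looks_like_singleton(class_node):
    """Match a crude structural signature: an instance-holding class
    attribute plus an accessor method."""
    has_instance_slot = any(
        isinstance(stmt, ast.Assign)
        and any(isinstance(t, ast.Name) and "instance" in t.id.lower()
                for t in stmt.targets)
        for stmt in class_node.body
    )
    has_accessor = any(
        isinstance(stmt, ast.FunctionDef) and "instance" in stmt.name.lower()
        for stmt in class_node.body
    )
    return has_instance_slot and has_accessor

tree = ast.parse(SOURCE)
matches = [n.name for n in ast.walk(tree)
           if isinstance(n, ast.ClassDef) and looks_like_singleton(n)]
print(matches)  # ['SessionManager']
```

Real security patterns have weaker structural signatures than design patterns, which is one of the issues the thesis identifies as complicating automatic detection.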
Automated API documentation with tutorials generated from Stack Overflow
One of the most common forms of software reuse is through APIs. However, one of the main challenges to effective API usage is accessible, easy-to-understand documentation. Several studies have proposed alternatives to make API documentation more understandable, or even more detailed. However, these studies have not taken into account the complexity of understanding the examples, which would make such documentation adaptable to developers with different levels of experience. In this work we developed and evaluated four methodologies for generating API tutorials from Stack Overflow content and organizing them by complexity of understanding. The methodologies were evaluated through tutorials generated for the Swing API. A survey was conducted to evaluate eight characteristics of the generated tutorials. The overall outcome was positive across several characteristics, showing the feasibility of automatically generated tutorials. In addition, the presentation of tutorial elements in order of complexity, the separation of the tutorial into basic and advanced parts, the tutorial-like nature of the selected posts, and the presence of didactic source code yielded significantly different results depending on the chosen generation methodology. A second study compared the official Android API documentation with the tutorial generated by the best methodology from the previous study, in a controlled experiment with students having their first contact with Android development. In the experiment the students completed two basic programming tasks, one using the official Android documentation and one using the generated tutorial. The results showed that in most cases students performed better when they used the proposed tutorial. The main reasons for the poor performance on tasks using the official API documentation were its lack of usage examples and its difficulty of use.
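One ingredient of the approach, ordering answers by an understanding-complexity score and splitting the tutorial into basic and advanced parts, can be sketched as follows. The scoring formula and post data here are invented for illustration; the dissertation defines its own complexity criteria.

```python
def complexity(post):
    # Hypothetical metric: more lines of code and more distinct API types
    # suggest an example that is harder to follow.
    return post["code_lines"] + 2 * len(post["api_types"])

# Toy stand-ins for Stack Overflow posts about the Swing API.
posts = [
    {"title": "Custom JTable renderer", "code_lines": 40,
     "api_types": {"JTable", "TableCellRenderer", "JLabel"}},
    {"title": "Show a JFrame", "code_lines": 8, "api_types": {"JFrame"}},
    {"title": "Add a button click handler", "code_lines": 15,
     "api_types": {"JButton", "ActionListener"}},
]

ordered = sorted(posts, key=complexity)
midpoint = len(ordered) // 2 + len(ordered) % 2
basic, advanced = ordered[:midpoint], ordered[midpoint:]

print([p["title"] for p in basic])     # ['Show a JFrame', 'Add a button click handler']
print([p["title"] for p in advanced])  # ['Custom JTable renderer']
```

Presenting the simplest examples first is what lets one generated tutorial serve readers at different experience levels, which is the gap the work identifies in prior documentation approaches.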
Conceptual roles of data in program: analyses and applications
Program comprehension is a prerequisite for many software evolution and maintenance tasks. Current research falls short in addressing how to build tools that can use domain-specific knowledge to extract valuable information for program comprehension. Such capabilities are critical for working with large and complex programs, where comprehension is often impossible without the help of domain-specific knowledge. Our research advances the state of the art in program analysis techniques based on domain-specific knowledge. Program artifacts, including variables and methods, are carriers of the domain concepts that provide the key to understanding programs. Our program analysis is directed by domain knowledge stored as domain-specific rules; it is iterative and interactive, based on flexible inference rules and interchangeable, extensible information storage. We designed and developed a comprehensive software environment, SeeCORE, based on our knowledge-centric analysis methodology. The SeeCORE tool provides multiple views and abstractions to assist in understanding complex programs. Case studies demonstrate the effectiveness of our method, and we demonstrate its flexibility by analyzing two legacy programs in distinct domains.
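The flavor of knowledge-centric analysis can be sketched in a few lines. The rules, identifiers, and banking domain below are hypothetical (SeeCORE's actual rule language and analysis are far richer): domain-specific rules map identifier fragments to domain concepts, and inference propagates those concepts along a crude stand-in for data flow.

```python
# Hypothetical domain rules: identifier substring -> domain concept.
DOMAIN_RULES = {
    "acct": "Account",
    "bal": "Balance",
    "txn": "Transaction",
}

# (target, source) assignment pairs, a minimal stand-in for data flow.
ASSIGNMENTS = [
    ("acct_bal", None),
    ("tmp1", "acct_bal"),
    ("report_line", "tmp1"),
]

def concepts_for(name):
    """Apply naming rules to a single identifier."""
    return {c for key, c in DOMAIN_RULES.items() if key in name}

def propagate(assignments):
    """Propagate inferred concepts from sources to assignment targets."""
    known = {}
    for target, source in assignments:
        inferred = concepts_for(target)
        if source:
            inferred |= known.get(source, set())
        known[target] = inferred
    return known

result = propagate(ASSIGNMENTS)
# report_line carries no domain cue itself, yet inherits Account and
# Balance through the tmp1 -> acct_bal chain.
print(sorted(result["report_line"]))  # ['Account', 'Balance']
```

This is the sense in which such analysis is iterative: each inferred concept can enable further rule applications that pure lexical matching would miss.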
Machine Learning for Software Dependability
Dependability is an important quality of modern software but is challenging to achieve. Many software dependability techniques have been proposed to help developers improve software reliability and dependability, such as defect prediction [83, 96, 249], bug detection [6, 17, 146], program repair [51, 127, 150, 209, 261, 263], test case prioritization [152, 250], or software architecture recovery [13, 42, 67, 111, 164, 240].
In this thesis, we consider how machine learning (ML) and deep learning (DL) can be used to enhance software dependability through three examples in three different domains: automatic program repair, bug detection in electronic document readers, and software architecture recovery.
In the first work, we propose a new G&V technique, CoCoNuT, which uses ensemble learning on the combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages. To better represent the context of a bug, we introduce a new context-aware NMT architecture that represents the buggy source code and its surrounding context separately. CoCoNuT uses CNNs instead of recurrent neural networks (RNNs) since CNN layers can be stacked to extract hierarchical features and better model source code at different granularity levels (e.g., statements and functions). In addition, CoCoNuT takes advantage of the randomness in hyperparameter tuning to build multiple models that fix different bugs and combines these models using ensemble learning to fix more bugs. CoCoNuT fixes 493 bugs, including 307 bugs that are fixed by none of the 27 techniques with which we compare.
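The ensemble step alone, pooling candidate patches from several independently tuned models so a bug counts as fixed if any member fixes it, can be sketched without the model internals. All names and data below are hypothetical, not CoCoNuT's actual interfaces.

```python
def ensemble_fixes(model_outputs, validate):
    """model_outputs: list of {bug_id: [candidate patches]}, one dict per model.
    Returns the first validated patch found for each bug across the ensemble."""
    fixed = {}
    for output in model_outputs:
        for bug, candidates in output.items():
            if bug in fixed:
                continue  # an earlier model already fixed this bug
            for patch in candidates:
                if validate(bug, patch):
                    fixed[bug] = patch
                    break
    return fixed

# Toy validator: a patch "fixes" a bug if it matches a known correct fix.
# In a real G&V pipeline, validation means recompiling and rerunning tests.
CORRECT = {"bug1": "if (x != null)", "bug2": "i <= n"}
validate = lambda bug, patch: CORRECT.get(bug) == patch

models = [
    {"bug1": ["if (x == null)", "if (x != null)"]},  # model A fixes bug1
    {"bug1": ["x.length()"], "bug2": ["i <= n"]},    # model B fixes bug2
]
print(ensemble_fixes(models, validate))
# {'bug1': 'if (x != null)', 'bug2': 'i <= n'}
```

The point of the ensemble is visible even here: neither toy model fixes both bugs on its own, but their union does, which is why diversity from hyperparameter randomness translates into more bugs fixed.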
In the second work, we present a study on the correctness of PDF documents and readers, and propose an approach to automatically detect inconsistencies and localize their source. We evaluate our automatic approach on a large corpus of over 230K documents using 11 popular readers, and our experiments have detected 30 unique bugs in these readers and files.
In the third work, we compare software architecture recovery techniques to understand their effectiveness and applicability. Specifically, we study the impact of leveraging accurate symbol dependencies on the accuracy of architecture recovery techniques. In addition, we evaluate other factors of the input dependencies, such as the level of granularity and the construction of the dynamic-bindings graph. The results of our evaluation of nine architecture recovery techniques and their variants suggest that (1) using accurate symbol dependencies has a major influence on recovery quality, and (2) more accurate recovery techniques are needed. Our results show that some of the studied architecture recovery techniques scale to very large systems, whereas others do not.
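A step shared by many recovery techniques, grouping source files whose symbol-level dependency sets are similar, can be sketched with Jaccard similarity. The threshold, greedy clustering, and dependency data below are illustrative only, not any of the nine evaluated techniques.

```python
def jaccard(a, b):
    """Similarity of two dependency sets: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster(deps, threshold=0.5):
    """Greedy single-pass clustering of files by dependency similarity
    to the first file in each existing cluster."""
    clusters = []
    for f, symbols in deps.items():
        for c in clusters:
            if jaccard(symbols, deps[c[0]]) >= threshold:
                c.append(f)
                break
        else:
            clusters.append([f])
    return clusters

# Hypothetical symbol-level dependencies extracted from a build.
deps = {
    "ui/window.c":  {"draw_rect", "font_load", "event_poll"},
    "ui/menu.c":    {"draw_rect", "font_load", "event_poll", "list_new"},
    "net/socket.c": {"tcp_open", "tcp_send"},
}
print(cluster(deps))
# [['ui/window.c', 'ui/menu.c'], ['net/socket.c']]
```

The sketch also hints at why dependency accuracy matters so much: spurious or missing symbol edges directly shift the similarity scores that drive the clustering.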
Analytic Provenance for Software Reverse Engineers
Reverse engineering is a time-consuming process essential to software-security tasks such as malware analysis and vulnerability discovery. During the process, an engineer will follow multiple leads to determine how the software functions. The combination of time and possible explanations makes it difficult for engineers to maintain the context of their findings within the overall task. Analytic provenance tools have demonstrated value in similarly complex fields that require open-ended exploration and hypothesis vetting, but they have not been explored in the reverse engineering domain. This dissertation presents SensorRE, the first analytic provenance tool designed to support software reverse engineers. A semi-structured interview with experts led to the design and implementation of the system. We describe the visual interfaces and their integration within an existing software analysis tool. SensorRE automatically captures users' sense-making actions and provides a graph and storyboard view to support further analysis. User study results with both experts and graduate students demonstrate that SensorRE is easy to use and that it improved the participants' exploration process.
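The general shape of provenance capture can be sketched briefly. The action kinds and structure below are hypothetical illustrations of the concept, not SensorRE's actual data model: each analysis action is appended to an ordered log, and derivation edges link later findings to the actions they came from.

```python
import time

class ProvenanceLog:
    """Minimal provenance capture: an ordered action log plus a
    derivation graph over action ids."""

    def __init__(self):
        self.actions = []  # ordered action records
        self.edges = []    # (earlier_action_id, later_action_id)

    def record(self, kind, detail, derived_from=None):
        action_id = len(self.actions)
        self.actions.append({"id": action_id, "kind": kind,
                             "detail": detail, "ts": time.time()})
        if derived_from is not None:
            self.edges.append((derived_from, action_id))
        return action_id

log = ProvenanceLog()
a = log.record("rename", "sub_401000 -> parse_header")
b = log.record("comment", "length field is attacker-controlled", derived_from=a)
log.record("bookmark", "possible overflow site", derived_from=b)

# A graph view would follow the derivation edges; a storyboard view
# would replay the actions in recorded order.
print([act["kind"] for act in log.actions])  # ['rename', 'comment', 'bookmark']
print(log.edges)                             # [(0, 1), (1, 2)]
```

Capturing actions automatically, rather than asking the engineer to take notes, is what lets such a tool reconstruct the chain of reasoning behind a finding after hours of exploration.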