
    Controlled Experiment for Assessing the Contribution of Ontology Based Software Redocumentation Approach to Support Program Understanding

    Redocumentation recovers knowledge from raw software artifacts and presents it in alternative forms. Many legacy systems were built on event-driven programming and require redocumentation. However, existing repository and query techniques emphasize only lexical and syntactic queries, which limits their ability to expose the semantic relationships needed for program understanding. We use an ontology-based approach that combines ontology reasoning with querying techniques to generate software documentation from a knowledge repository. We present a controlled experiment for the empirical evaluation of the proposed ontology-based approach, which is implemented in a tool called Ontology Based Software Redocumentation (OBSR). In the experiment, two existing tools, Universal Report (UR) and Microsoft Visual Studio for the Visual Basic (VB) programming environment, were selected for comparison with the OBSR tool. The goal is to provide experimental evidence of the viability of our approach for program understanding using HTML-based semantic software documentation. The experiment shows that software maintainers using OBSR achieve a significant improvement in program understanding and can accomplish maintenance tasks more easily. We describe the experiment in detail, discuss its results, and reflect on the lessons learned.
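
    As a rough illustration of the querying side of an ontology-based redocumentation pipeline (not the OBSR implementation itself), the sketch below assumes the knowledge repository has been exported as an RDF/OWL file with a hypothetical ontology vocabulary, and uses rdflib to list event handlers as an HTML fragment.

    # Minimal sketch of ontology-backed documentation generation. The ontology
    # vocabulary (ex:Form, ex:hasEventHandler, ex:handlesEvent) is hypothetical.
    from rdflib import Graph

    def generate_event_docs(ontology_path: str) -> str:
        g = Graph()
        g.parse(ontology_path)  # knowledge repository exported as RDF/OWL

        query = """
            PREFIX ex: <http://example.org/legacy-vb#>
            SELECT ?form ?handler ?event WHERE {
                ?form a ex:Form ;
                      ex:hasEventHandler ?handler .
                ?handler ex:handlesEvent ?event .
            }
        """
        rows = ["<h2>Event handlers</h2>", "<ul>"]
        for form, handler, event in g.query(query):
            rows.append(f"<li>{form}: {handler} handles {event}</li>")
        rows.append("</ul>")
        return "\n".join(rows)

    if __name__ == "__main__":
        print(generate_event_docs("repository.owl"))  # hypothetical export path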

    Assessing Comment Quality in Object-Oriented Languages

    Previous studies have shown that high-quality code comments support developers in software maintenance and program comprehension tasks. However, the semi-structured nature of comments, the multiple conventions for writing them, and the lack of tools that assess all aspects of comment quality make comment evaluation and maintenance a non-trivial problem. To understand what characterizes high-quality comments and to build effective assessment tools, our thesis acquires a multi-perspective view of comments by analyzing (1) the academic support for comment quality assessment, (2) developer commenting practices across languages, and (3) developer concerns about comments. Our findings regarding the academic support for assessing comment quality show that researchers have primarily focused on Java over the last decade, even though the trend of using polyglot environments in software projects is increasing. Similarly, the trend of analyzing specific types of code comments (method comments or inline comments) is increasing, but studies rarely analyze class comments. We found 21 quality attributes that researchers consider when assessing comment quality, and manual assessment is still the most commonly used technique for assessing various quality attributes. Our analysis of developer commenting practices shows that developers embed a mixed level of detail in class comments, ranging from high-level class overviews to low-level implementation details, across programming languages. They follow style guidelines regarding what information to write in class comments but violate structure and syntax guidelines. They primarily face problems locating relevant guidelines for writing consistent and informative comments, verifying that their comments adhere to the guidelines, and evaluating the overall state of comment quality. To help researchers and developers build comment quality assessment tools, we contribute: (i) a systematic literature review (SLR) of ten years (2010–2020) of research on assessing comment quality, (ii) a taxonomy of quality attributes used to assess comment quality, (iii) an empirically validated taxonomy of class comment information types from three programming languages, (iv) a multi-programming-language approach to automatically identify the comment information types, (v) an empirically validated taxonomy of comment convention-related questions and recommendations from various Q&A forums, and (vi) a tool to gather discussions from multiple developer sources, such as Stack Overflow and mailing lists. Our contributions provide empirical evidence of developers' interest in reducing effort in the software documentation process, of the limited support developers have for automatically assessing comment quality, and of the challenges they face in writing high-quality comments. This work lays the foundation for future effective comment quality assessment tools and techniques.
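
    Identifying comment information types automatically, as in contribution (iv), can be pictured roughly as a multi-label text classification task. The sketch below is an illustrative stand-in only; the labels, training sentences, and model choice are assumptions, not the thesis's actual approach or taxonomy.

    # Sketch: multi-label classification of comment sentences into information
    # types. Labels and training sentences here are illustrative placeholders.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import MultiLabelBinarizer

    sentences = [
        "Represents a connection pool for database clients.",
        "Call close() when the pool is no longer needed.",
        "Not thread-safe; synchronize access externally.",
    ]
    labels = [["summary"], ["usage"], ["warning"]]  # hypothetical information types

    mlb = MultiLabelBinarizer()
    y = mlb.fit_transform(labels)

    clf = make_pipeline(
        TfidfVectorizer(ngram_range=(1, 2)),
        OneVsRestClassifier(LogisticRegression(max_iter=1000)),
    )
    clf.fit(sentences, y)

    # Classify a new comment sentence; with real training data this would
    # return one or more information types per sentence.
    pred = clf.predict(["Close the reader after use."])
    print(mlb.inverse_transform(pred))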

    Security-Pattern Recognition and Validation

    The increasing number and diversity of technologies connected to the Internet, from distributed enterprise systems to small electronic devices such as smartphones, bring IT security to the foreground. We interact with these technologies daily and place a great deal of trust in a well-established software development process. However, security vulnerabilities appear in software on all kinds of PC-like platforms, and more and more vulnerabilities that compromise systems and their users are published. Software must also be modified in response to changing requirements, bugs, and security flaws, so software engineers increasingly face security issues during software design; maintenance programmers in particular must deal with such issues after the software has been released. In the domain of software development, design patterns have been proposed as the best-known solutions for recurring problems in software design. Analogously, security patterns are best practices aimed at ensuring security. This thesis develops a deeper understanding of the nature of security patterns. It focuses on their validation and detection in support of review and maintenance activities. Because the landscape of security patterns is diverse, published security patterns are collected and organized to identify software-related security patterns. The descriptions of the selected software security patterns are assessed and compared against the common design patterns described by Gamma et al. to identify differences and issues that may influence the detection of security patterns. Based on these insights and a manual detection approach, we illustrate an automatic detection method for security patterns. The approach is implemented in a tool and evaluated in a case study with 25 real-world Android applications from Google Play.
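
    To give a concrete, if crude, flavor of structural pattern detection (a lexical heuristic sketch, not the detection method developed in the thesis), the snippet below scans decompiled Java sources for classes that combine a private constructor with a static accessor, one structural signal sometimes associated with a single-access-point style of pattern.

    # Heuristic sketch: flag Java classes whose structure suggests a single
    # access point (private constructor + static accessor). Purely lexical;
    # real detection would work on an AST or bytecode representation.
    import re
    from pathlib import Path

    PRIVATE_CTOR = re.compile(r"private\s+(\w+)\s*\(")
    STATIC_ACCESSOR = re.compile(r"public\s+static\s+\w+(<[^>]*>)?\s+get\w*\s*\(")

    def find_candidates(src_root: str):
        candidates = []
        for java_file in Path(src_root).rglob("*.java"):
            text = java_file.read_text(errors="ignore")
            if PRIVATE_CTOR.search(text) and STATIC_ACCESSOR.search(text):
                candidates.append(java_file)
        return candidates

    if __name__ == "__main__":
        for path in find_candidates("decompiled_app/src"):  # hypothetical path
            print("possible single-access-point class:", path)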

    Automated API Documentation with Tutorials Generated from Stack Overflow

    One of the most common forms of reuse is API usage. However, one of the main obstacles to effective usage is the lack of accessible, easy-to-understand documentation. Several papers have proposed alternatives to make API documentation more understandable or more detailed. However, these studies have not taken the complexity of understanding the examples into account in order to make the documentation adaptable to developers with different levels of experience. In this work we developed and evaluated four methodologies for generating API tutorials from Stack Overflow content and organizing them according to complexity of understanding. The methodologies were evaluated through tutorials generated for the Swing API. A survey was conducted to evaluate eight characteristics of the generated tutorials. The overall outcome was positive on several characteristics, showing the feasibility of using automatically generated tutorials. In addition, the use of criteria for presenting tutorial elements in order of complexity, the separation of the tutorial into basic and advanced parts, the tutorial-like nature of the selected posts, and the existence of didactic source code produced significantly different results depending on the chosen generation methodology. A second study compared the official Android API documentation with the tutorial generated by the best methodology from the previous study. A controlled experiment was conducted with students having their first contact with Android development, in which they completed two basic programming tasks, one using the official Android documentation and one using the generated tutorial. The results showed that in most cases students performed better on the tasks when they used the tutorial proposed in this work. The main reasons for the students' poor performance with the official API documentation were its lack of usage examples and its difficulty of use.
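
    To make the organization-by-complexity idea concrete, the sketch below scores already-fetched Stack Overflow answers with a naive complexity proxy and splits them into basic and advanced sections; the post fields and the scoring rule are illustrative assumptions, not one of the four methodologies evaluated in this work.

    # Sketch: order Stack Overflow answers by a naive complexity proxy (lines of
    # code plus distinct API calls) and split them into basic/advanced sections.
    # Posts are assumed to have been fetched already; their fields are hypothetical.
    import re

    def complexity(post: dict) -> int:
        code = post.get("code", "")
        loc = len([line for line in code.splitlines() if line.strip()])
        api_calls = len(set(re.findall(r"\.\w+\(", code)))
        return loc + api_calls

    def build_tutorial(posts: list[dict]) -> dict:
        ranked = sorted(posts, key=complexity)
        cut = len(ranked) // 2
        return {"basic": ranked[:cut], "advanced": ranked[cut:]}

    posts = [
        {"title": "Create a JFrame window",
         "code": "JFrame f = new JFrame();\nf.setVisible(true);"},
        {"title": "Custom Swing painting",
         "code": "class Panel extends JPanel {\n  protected void paintComponent(Graphics g) {\n    super.paintComponent(g);\n    g.drawLine(0, 0, 100, 100);\n  }\n}"},
    ]
    tutorial = build_tutorial(posts)
    print([p["title"] for p in tutorial["basic"]],
          [p["title"] for p in tutorial["advanced"]])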

    Conceptual roles of data in program: analyses and applications

    Program comprehension is the prerequisite for many software evolution and maintenance tasks. Current research falls short in addressing how to build tools that use domain-specific knowledge to extract valuable information and facilitate program comprehension. Such capabilities are critical for working with large and complex programs, where comprehension is often not possible without the help of domain-specific knowledge. Our research advances the state of the art in program analysis techniques based on domain-specific knowledge. Program artifacts, including variables and methods, are carriers of domain concepts that provide the key to understanding programs. Our program analysis is directed by domain knowledge stored as domain-specific rules. The analysis is iterative and interactive, based on flexible inference rules and interchangeable, extensible information storage. We designed and developed SeeCORE, a comprehensive software environment based on our knowledge-centric analysis methodology. The SeeCORE tool provides multiple views and abstractions to assist in understanding complex programs. Case studies demonstrate the effectiveness of our method, and we demonstrate the flexibility of our approach by analyzing two legacy programs from distinct domains.
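
    As a toy illustration of rule-directed identifier analysis (not the SeeCORE inference engine, whose rules, storage, and target languages differ), the sketch below extracts identifiers from a source fragment with Python's standard ast module and matches them against hypothetical domain-specific naming rules.

    # Sketch: map identifiers to domain concepts using simple, domain-specific
    # naming rules. The rules and concepts are hypothetical illustrations.
    import ast
    import re

    DOMAIN_RULES = [
        (re.compile(r"acct|account", re.I), "Account"),
        (re.compile(r"bal(ance)?", re.I), "Balance"),
        (re.compile(r"txn|transaction", re.I), "Transaction"),
    ]

    def conceptual_roles(source: str) -> dict:
        tree = ast.parse(source)
        names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
        roles = {}
        for name in names:
            for pattern, concept in DOMAIN_RULES:
                if pattern.search(name):
                    roles.setdefault(concept, []).append(name)
        return roles

    code = "acct_bal = 100\nnew_txn = post_transaction(acct_bal)"
    print(conceptual_roles(code))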

    Machine Learning for Software Dependability

    Dependability is an important quality of modern software but is challenging to achieve. Many software dependability techniques have been proposed to help developers improve software reliability and dependability, such as defect prediction [83, 96, 249], bug detection [6, 17, 146], program repair [51, 127, 150, 209, 261, 263], test case prioritization [152, 250], and software architecture recovery [13, 42, 67, 111, 164, 240]. In this thesis, we consider how machine learning (ML) and deep learning (DL) can be used to enhance software dependability through three examples in three different domains: automatic program repair, bug detection in electronic document readers, and software architecture recovery. In the first work, we propose a new generate-and-validate (G&V) technique, CoCoNuT, which uses ensemble learning on the combination of convolutional neural networks (CNNs) and a new context-aware neural machine translation (NMT) architecture to automatically fix bugs in multiple programming languages. To better represent the context of a bug, we introduce a new context-aware NMT architecture that represents the buggy source code and its surrounding context separately. CoCoNuT uses CNNs instead of recurrent neural networks (RNNs) since CNN layers can be stacked to extract hierarchical features and better model source code at different granularity levels (e.g., statements and functions). In addition, CoCoNuT takes advantage of the randomness in hyperparameter tuning to build multiple models that fix different bugs and combines these models using ensemble learning to fix more bugs. CoCoNuT fixes 493 bugs, including 307 bugs that are fixed by none of the 27 techniques with which we compare. In the second work, we present a study on the correctness of PDF documents and readers and propose an approach to automatically detect and localize the sources of inconsistencies between them. We evaluate our automatic approach on a large corpus of over 230K documents using 11 popular readers, and our experiments detected 30 unique bugs in these readers and files. In the third work, we compare software architecture recovery techniques to understand their effectiveness and applicability. Specifically, we study the impact of leveraging accurate symbol dependencies on the accuracy of architecture recovery techniques. In addition, we evaluate other factors of the input dependencies, such as the level of granularity and the construction of the dynamic-bindings graph. The results of our evaluation of nine architecture recovery techniques and their variants suggest that (1) using accurate symbol dependencies has a major influence on recovery quality, and (2) more accurate recovery techniques are needed. Our results show that some of the studied architecture recovery techniques scale to very large systems, whereas others do not.
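
    The generate-and-validate loop behind such repair techniques can be summarized roughly as follows; this is a schematic sketch in which models, apply_patch, and run_tests are placeholder hooks, not the CoCoNuT implementation.

    # Schematic generate-and-validate (G&V) repair loop: an ensemble of learned
    # models proposes candidate patches and the test suite filters them.
    from typing import Callable, Iterable

    def gv_repair(
        buggy_line: str,
        context: str,
        models: Iterable[Callable[[str, str], list[str]]],
        apply_patch: Callable[[str], None],
        run_tests: Callable[[], bool],
    ) -> list[str]:
        plausible = []
        for model in models:                              # ensemble: each model may fix different bugs
            for candidate in model(buggy_line, context):  # buggy line and context passed separately
                apply_patch(candidate)
                if run_tests():                           # validate: keep patches that pass the tests
                    plausible.append(candidate)
        return plausible

    if __name__ == "__main__":
        dummy_model = lambda line, ctx: [line.replace("<", "<=")]  # toy stand-in for a trained model
        fixed = gv_repair("if (i < n)", "...", [dummy_model],
                          apply_patch=lambda p: None, run_tests=lambda: True)
        print(fixed)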

    Analytic Provenance for Software Reverse Engineers

    Reverse engineering is a time-consuming process essential to software-security tasks such as malware analysis and vulnerability discovery. During the process, an engineer follows multiple leads to determine how the software functions. The long timescales and the many possible explanations make it difficult for engineers to maintain the context of their findings within the overall task. Analytic provenance tools have demonstrated value in similarly complex fields that require open-ended exploration and hypothesis vetting. However, they have not been explored in the reverse engineering domain. This dissertation presents SensorRE, the first analytic provenance tool designed to support software reverse engineers. A semi-structured interview with experts led to the design and implementation of the system. We describe the visual interfaces and their integration within an existing software analysis tool. SensorRE automatically captures users' sensemaking actions and provides a graph and storyboard view to support further analysis. User study results with both experts and graduate students demonstrate that SensorRE is easy to use and that it improved the participants' exploration process.
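
    A minimal sketch of how sensemaking actions might be captured as a provenance graph (an illustrative stand-in, not SensorRE's capture mechanism, which hooks into an existing analysis tool) is shown below, using networkx to chain each recorded action to the one before it.

    # Sketch: record reverse-engineering actions as nodes in a provenance graph
    # so the analysis trail can later be replayed or visualized.
    import time
    import networkx as nx

    class ProvenanceLog:
        def __init__(self):
            self.graph = nx.DiGraph()
            self._last = None

        def record(self, action: str, target: str):
            node = len(self.graph)
            self.graph.add_node(node, action=action, target=target, at=time.time())
            if self._last is not None:
                self.graph.add_edge(self._last, node)  # chronological trail
            self._last = node
            return node

    log = ProvenanceLog()
    log.record("rename_function", "sub_401000 -> parse_header")   # hypothetical actions
    log.record("add_comment", "0x4012a0: checks license key")
    print(list(log.graph.nodes(data=True)))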