86 research outputs found

    Effectively incorporating expert knowledge in automated software remodularisation

    Get PDF
    Remodularising the components of a software system is challenging: sound design principles (e.g., coupling and cohesion) need to be balanced against developer intuition of which entities conceptually belong together. Despite this, automated approaches to remodularisation tend to ignore domain knowledge, leading to results that can be nonsensical to developers. Nevertheless, suppling such knowledge is a potentially burdensome task to perform manually. A lot information may need to be specified, particularly for large systems. Addressing these concerns, we propose the SUMO (SUpervised reMOdularisation) approach. SUMO is a technique that aims to leverage a small subset of domain knowledge about a system to produce a remodularisation that will be acceptable to a developer. With SUMO, developers refine a modularisation by iteratively supplying corrections. These corrections constrain the type of remodularisation eventually required, enabling SUMO to dramatically reduce the solution space. This in turn reduces the amount of feedback the developer needs to supply. We perform a comprehensive systematic evaluation using 100 real world subject systems. Our results show that SUMO guarantees convergence on a target remodularisation with a tractable amount of user interaction

    Using Hierarchical Latent Dirichlet Allocation to Construct Feature Tree for Program Comprehension

    Get PDF

    Fine-Grained Static Detection of Obfuscation Transforms Using Ensemble-Learning and Semantic Reasoning

    Get PDF
    International audienceThe ability to efficiently detect the software protections used is at a prime to facilitate the selection and application of adequate deob-fuscation techniques. We present a novel approach that combines semantic reasoning techniques with ensemble learning classification for the purpose of providing a static detection framework for obfuscation transformations. By contrast to existing work, we provide a methodology that can detect multiple layers of obfuscation, without depending on knowledge of the underlying functionality of the training-set used. We also extend our work to detect constructions of obfuscation transformations, thus providing a fine-grained methodology. To that end, we provide several studies for the best practices of the use of machine learning techniques for a scalable and efficient model. According to our experimental results and evaluations on obfuscators such as Tigress and OLLVM, our models have up to 91% accuracy on state-of-the-art obfuscation transformations. Our overall accuracies for their constructions are up to 100%

    Clustering Classes in Packages for Program Comprehension

    Get PDF

    Detection and analysis of near-miss clone genealogies

    Get PDF
    It is believed that identical or similar code fragments in source code, also known as code clones, have an impact on software maintenance. A clone genealogy shows how a group of clone fragments evolve with the evolution of the associated software system, and thus may provide important insights on the maintenance implications of those clone fragments. Considering the importance of studying the evolution of code clones, many studies have been conducted on this topic. However, after a decade of active research, there has been a marked lack of progress in understanding the evolution of near-miss software clones, especially where statements have been added, deleted, or modified in the copied fragments. Given that there are a significant amount of near-miss clones in the software systems, we believe that without studying the evolution of near-miss clones, one cannot have a complete picture of the clone evolution. In this thesis, we have advanced the state-of-the-art in the evolution of clone research in the context of both exact and near-miss software clones. First, we performed a large-scale empirical study to extend the existing knowledge about the evolution of exact and renamed clones where identifiers have been modified in the copied fragments. Second, we have developed a framework, gCad that can automatically extract both exact and near-miss clone genealogies across multiple versions of a program and identify their change patterns reasonably fast while maintaining high precision and recall. Third, in order to gain a broader perspective of clone evolution, we extended gCad to calculate various evolutionary metrics, and performed an in-depth empirical study on the evolution of both exact and near-miss clones in six open source software systems of two different programming languages with respect to five research questions. We discovered several interesting evolutionary phenomena of near-miss clones which either contradict with previous findings or are new. Finally, we further improved gCad, and investigated a wide range of attributes and metrics derived from both the clones themselves and their evolution histories to identify certain attributes, which developers often use to remove clones in the real world. We believe that our new insights in the evolution of near-miss clones, and about how developers approach and remove duplication, will play an important role in understanding the maintenance implications of clones and will help design better clone management systems

    An Infrastructure to Support Interoperability in Reverse Engineering

    Get PDF
    An infrastructure that supports interoperability among reverse engineering tools and other software tools is described. The three major components of the infrastructure are: (1) a hierarchy of schemas for low- and middle-level program representation graphs, (2) g4re, a tool chain for reverse engineering C++ programs, and (3) a repository of reverse engineering artifacts, including the previous two components, a test suite, and tools, GXL instances, and XSLT transformations for graphs at each level of the hierarchy. The results of two case studies that investigated the space and time costs incurred by the infrastructure are provided. The results of two empirical evaluations that were performed using the api module of g4re, and were focused on computation of object-oriented metrics and three-dimensional visualization of class template diagrams, respectively, are also provided

    ModelVars2SPL : an automated approach to reengineer model variants into software product lines

    Get PDF
    Orientadora : Profª. Drª. Silvia R. VergilioCoorientador : Prof Dr. Roberto E. Lopez-HerrejonTese (doutorado) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defesa: Curitiba, 11/04/2017Inclui referências : f. 74-82Área de concentração : Ciência da computaçãoResumo: Linhas de Produto de Software (LPSs) são famílias de sistemas de software relacionados que são desenvolvidos para um segmento de mercado ou domínio. Comumente, LPSs surgem de um conjunto de variantes existentes, quando a manutenção e/ou evolução individuais tornam-se complexas. Contudo, as abordagens encontradas na literatura para extração de LPS a partir de variantes existentes não dão suporte a modelos de projeto, são parcialmente automatizadas, ou não refletem restrições de domínio em termos de combinação de características. Para lidar com estas limitações, o objetivo deste trabalho é apresentar uma abordagem automatizada para fazer a reengenharia de variantes de modelos em uma LPS, chamada ModelVars2SPL (Variantes de Modelos para Linha de Produto de Software, do Inglês Model Variants to Software Product Line). A entrada para a abordagem é um conjunto de diagramas de classe Linguagem de Modelagem Unificada (UML) e uma lista de características que estes implementam. Todo o processo de reengenharia é coberto, e a saída inclui (i) um Modelo de Características, que representa a combinação de características das variantes de entrada, e (ii) uma Arquitetura de Linha de Produto, que representa uma arquitetura global com características anotadas. O processo de reengenharia da ModelVars2SPL é composto por quatro passos, sendo dois deles apoiados em técnicas baseadas em busca, e os dois outros baseados em algoritmos determinísticos. Não existe a necessidade de especialistas humanos para obter soluções. Para avaliar a abordagem proposta, foi conduzido um experimento para aferir a qualidade das soluções obtidas. A qualidade dos Modelos de Características e das Arquiteturas de Linha de Produto foi medida considerando-se o quão bem as variantes de entrada foram representadas. Além disso, a qualidade das saídas em cada passo da abordagem foi avaliada levando-se em consideração os objetivos do processo de reengenharia. Para a experimentação utilizaram-se dez estudos de caso representando dois cenários diferentes. Os resultados da avaliação mostram que a abordagem consegue obter soluções com alto grau de corretude em termos de representação das variantes de entrada, e que as saídas dos passos estão de acordo com as fases do processo de reengenharia. Com base em um exemplo de uso de uma solução mostra-se como os artefatos de LPS obtidos facilitam a atividade de manutenção. Palavras-chave: Reúso, Reengenharia, Linha de Produto de Software, Extração de LPS, Engenharia de Software Baseada em Busca.Abstract: Software Product Lines (SPLs) are families of related software systems developed for specific market segments or domains. SPLs commonly emerge from sets of existing variants when their individual maintenance and/or evolution become complex. However, current approaches for SPL extraction from existing variants do not support design models, are partially automated, or do not reflect domain constraints in terms of feature combinations. To tackle these limitations, the goal of this work is to present an automated approach to reengineer model variants into an SPL, called ModelVars2SPL (Model Variants to Software Product Line). The input of the approach is a set of Unified Modeling Language (UML) class diagrams and the list of features they implement. All the reengineering process is covered, and the output includes (i) a Feature Model, which represents the combinations of features of the input variants, and (ii) a Product Line Architecture, which represents a global architecture with feature-related annotations. The reengineering process of ModelVars2SPL is composed of four steps, two of them rely on searchbased techniques and the others are based on deterministic algorithms. There is no need for human experts for obtaining solutions. We conducted an experiment to evaluate the quality of the solutions obtained with the proposed approach. The quality of the FMs and PLAs was measured by considering how well these artifacts represent the input variants. Furthermore, we evaluate the quality of the outputs in each step of the approach taking into account the goals of the reengineering process. For the experimentation we used ten case studies representing two di_erent scenarios. The results of the evaluation show that the approach can obtain solutions with high degree of correctness in terms of representing the input variants, and that the outputs of the steps are in accordance to the phases of the reengineering process. Based on an example of use we show how the obtained FM and PLA make easier the maintenance activity. Keywords: Reuse, Reengineering, Software Product Line, SPL extraction, Search-Based Software Engineering

    Mining Architectural Information: A Systematic Mapping Study

    Full text link
    Context: Mining Software Repositories (MSR) has become an essential activity in software development. Mining architectural information to support architecting activities, such as architecture understanding and recovery, has received a significant attention in recent years. However, there is an absence of a comprehensive understanding of the state of research on mining architectural information. Objective: This work aims to identify, analyze, and synthesize the literature on mining architectural information in software repositories in terms of architectural information and sources mined, architecting activities supported, approaches and tools used, and challenges faced. Method: A Systematic Mapping Study (SMS) has been conducted on the literature published between January 2006 and November 2021. Results: Of the 79 primary studies finally selected, 8 categories of architectural information have been mined, among which architectural description is the most mined architectural information; 12 architecting activities can be supported by the mined architectural information, among which architecture understanding is the most supported activity; 81 approaches and 52 tools were proposed and employed in mining architectural information; and 4 types of challenges in mining architectural information were identified. Conclusions: This SMS provides researchers with promising future directions and help practitioners be aware of what approaches and tools can be used to mine what architectural information from what sources to support various architecting activities.Comment: 68 pages, 5 images, 15 tables, Manuscript submitted to a Journal (2022

    An enhanced performance model for metamorphic computer virus classification and detectioN

    Get PDF
    Metamorphic computer virus employs various code mutation techniques to change its code to become new generations. These generations have similar behavior and functionality and yet, they could not be detected by most commercial antivirus because their solutions depend on a signature database and make use of string signature-based detection methods. However, the antivirus detection engine can be avoided by metamorphism techniques. The purpose of this study is to develop a performance model based on computer virus classification and detection. The model would also be able to examine portable executable files that would classify and detect metamorphic computer viruses. A Hidden Markov Model implemented on portable executable files was employed to classify and detect the metamorphic viruses. This proposed model that produce common virus statistical patterns was evaluated by comparing the results with previous related works and famous commercial antiviruses. This was done by investigating the metamorphic computer viruses and their features, and the existing classifications and detection methods. Specifically, this model was applied on binary format of portable executable files and it was able to classify if the files belonged to a virus family. Besides that, the performance of the model, practically implemented and tested, was also evaluated based on detection rate and overall accuracy. The findings indicated that the proposed model is able to classify and detect the metamorphic virus variants in portable executable file format with a high average of 99.7% detection rate. The implementation of the model is proven useful and applicable for antivirus programs
    corecore