22,571 research outputs found

    A patterns based reverse engineering approach for java source code

    Get PDF
    The ever increasing number of platforms and languages available to software developers means that the software industry is reaching high levels of complexity. Model Driven Architecture (MDA) presents a solution to the problem of improving software development processes in this changing and complex environment. MDA driven development is based on models definition and transformation. Design patterns provide a means to reuse proven solutions during development. Identifying design patterns in the models of a MDA approach helps their understanding, but also the identification of good practices during analysis. However, when analyzing or maintaining code that has not been developed according to MDA principles, or that has been changed independently from the models, the need arises to reverse engineer the models from the code prior to patterns' identification. The approach presented herein consists in transforming source code into models, and infer design patterns from these models. Erich Gamma's cataloged patterns provide us a starting point for the pattern inference process. MapIt, the tool which implements these functionalities is described.This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT Fundacao para a Ciencia e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-015095

    Concrete Syntax with Black Box Parsers

    Get PDF
    Context: Meta programming consists for a large part of matching, analyzing, and transforming syntax trees. Many meta programming systems process abstract syntax trees, but this requires intimate knowledge of the structure of the data type describing the abstract syntax. As a result, meta programming is error-prone, and meta programs are not resilient to evolution of the structure of such ASTs, requiring invasive, fault-prone change to these programs. Inquiry: Concrete syntax patterns alleviate this problem by allowing the meta programmer to match and create syntax trees using the actual syntax of the object language. Systems supporting concrete syntax patterns, however, require a concrete grammar of the object language in their own formalism. Creating such grammars is a costly and error-prone process, especially for realistic languages such as Java and C++. Approach: In this paper we present Concretely, a technique to extend meta programming systems with pluggable concrete syntax patterns, based on external, black box parsers. We illustrate Concretely in the context of Rascal, an open-source meta programming system and language workbench, and show how to reuse existing parsers for Java, JavaScript, and C++. Furthermore, we propose Tympanic, a DSL to declaratively map external AST structures to Rascal's internal data structures. Tympanic allows implementors of Concretely to solve the impedance mismatch between object-oriented class hierarchies in Java and Rascal's algebraic data types. Both the algebraic data type and AST marshalling code is automatically generated. Knowledge: The conceptual architecture of Concretely and Tympanic supports the reuse of pre-existing, external parsers, and their AST representation in meta programming systems that feature concrete syntax patterns for matching and constructing syntax trees. As such this opens up concrete syntax pattern matching for a host of realistic languages for which writing a grammar from scratch is time consuming and error-prone, but for which industry-strength parsers exist in the wild. Grounding: We evaluate Concretely in terms of source lines of code (SLOC), relative to the size of the AST data type and marshalling code. We show that for real programming languages such as C++ and Java, adding support for concrete syntax patterns takes an effort only in the order of dozens of SLOC. Similarly, we evaluate Tympanic in terms of SLOC, showing an order of magnitude of reduction in SLOC compared to manual implementation of the AST data types and marshalling code. Importance: Meta programming has applications in reverse engineering, reengineering, source code analysis, static analysis, software renovation, domain-specific language engineering, and many others. Processing of syntax trees is central to all of these tasks. Concrete syntax patterns improve the practice of constructing meta programs. The combination of Concretely and Tympanic has the potential to make concrete syntax patterns available with very little effort, thereby improving and promoting the application of meta programming in the general software engineering context

    An extensible benchmark and tooling for comparing reverse engineering approaches

    Get PDF
    Various tools exist to reverse engineer software source code and generate design information, such as UML projections. Each has specific strengths and weaknesses, however no standardised benchmark exists that can be used to evaluate and compare their performance and effectiveness in a systematic manner. To facilitate such comparison in this paper we introduce the Reverse Engineering to Design Benchmark (RED-BM), which consists of a comprehensive set of Java-based targets for reverse engineering and a formal set of performance measures with which tools and approaches can be analysed and ranked. When used to evaluate 12 industry standard tools performance figures range from 8.82\% to 100\% demonstrating the ability of the benchmark to differentiate between tools. To aid the comparison, analysis and further use of reverse engineering XMI output we have developed a parser which can interpret the XMI output format of the most commonly used reverse engineering applications, and is used in a number of tools

    Structured Review of the Evidence for Effects of Code Duplication on Software Quality

    Get PDF
    This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough place to include them in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence)

    Structured Review of Code Clone Literature

    Get PDF
    This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts which helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other

    Recursion Aware Modeling and Discovery For Hierarchical Software Event Log Analysis (Extended)

    Get PDF
    This extended paper presents 1) a novel hierarchy and recursion extension to the process tree model; and 2) the first, recursion aware process model discovery technique that leverages hierarchical information in event logs, typically available for software systems. This technique allows us to analyze the operational processes of software systems under real-life conditions at multiple levels of granularity. The work can be positioned in-between reverse engineering and process mining. An implementation of the proposed approach is available as a ProM plugin. Experimental results based on real-life (software) event logs demonstrate the feasibility and usefulness of the approach and show the huge potential to speed up discovery by exploiting the available hierarchy.Comment: Extended version (14 pages total) of the paper Recursion Aware Modeling and Discovery For Hierarchical Software Event Log Analysis. This Technical Report version includes the guarantee proofs for the proposed discovery algorithm

    Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild

    Get PDF
    In this paper, we seek to better understand Android obfuscation and depict a holistic view of the usage of obfuscation through a large-scale investigation in the wild. In particular, we focus on four popular obfuscation approaches: identifier renaming, string encryption, Java reflection, and packing. To obtain the meaningful statistical results, we designed efficient and lightweight detection models for each obfuscation technique and applied them to our massive APK datasets (collected from Google Play, multiple third-party markets, and malware databases). We have learned several interesting facts from the result. For example, malware authors use string encryption more frequently, and more apps on third-party markets than Google Play are packed. We are also interested in the explanation of each finding. Therefore we carry out in-depth code analysis on some Android apps after sampling. We believe our study will help developers select the most suitable obfuscation approach, and in the meantime help researchers improve code analysis systems in the right direction

    An Extended Stable Marriage Problem Algorithm for Clone Detection

    Full text link
    Code cloning negatively affects industrial software and threatens intellectual property. This paper presents a novel approach to detecting cloned software by using a bijective matching technique. The proposed approach focuses on increasing the range of similarity measures and thus enhancing the precision of the detection. This is achieved by extending a well-known stable-marriage problem (SMP) and demonstrating how matches between code fragments of different files can be expressed. A prototype of the proposed approach is provided using a proper scenario, which shows a noticeable improvement in several features of clone detection such as scalability and accuracy.Comment: 20 pages, 10 figures, 6 table

    BCFA: Bespoke Control Flow Analysis for CFA at Scale

    Full text link
    Many data-driven software engineering tasks such as discovering programming patterns, mining API specifications, etc., perform source code analysis over control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be expensive and performance of the analysis heavily depends on the underlying CFG traversal strategy. State-of-the-art analysis frameworks use a fixed traversal strategy. We argue that a single traversal strategy does not fit all kinds of analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a control flow analysis (CFA) and a large number of CFGs, BCFA selects the most efficient traversal strategy for each CFG. BCFA extracts a set of properties of the CFA by analyzing the code of the CFA and combines it with properties of the CFG, such as branching factor and cyclicity, for selecting the optimal traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a set of representative static analyses that mainly involve traversing CFGs and two large datasets containing 287 thousand and 162 million CFGs. Our results show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.Comment: 12 page
    • โ€ฆ
    corecore