22,571 research outputs found
A patterns based reverse engineering approach for java source code
The ever increasing number of platforms and languages available to software developers means that the software industry is reaching high levels of complexity. Model Driven Architecture (MDA) presents a solution to the problem of improving software development processes in this changing and complex environment. MDA driven development is based on models definition and transformation. Design patterns provide a means to reuse proven solutions during development. Identifying design patterns in the models of a MDA approach helps their understanding, but also the identification of good practices during analysis. However, when analyzing or maintaining code that has not been developed according to MDA principles, or that has been changed independently from the models, the need arises to reverse engineer the models from the code prior to patterns' identification. The approach presented herein consists in transforming source code into models, and infer design patterns from these models. Erich Gamma's cataloged patterns provide us a starting point for the pattern inference process. MapIt, the tool which implements these functionalities is described.This work is funded by ERDF - European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT Fundacao para a Ciencia e a Tecnologia (Portuguese Foundation for Science and Technology) within project FCOMP-01-0124-FEDER-015095
Concrete Syntax with Black Box Parsers
Context: Meta programming consists for a large part of matching, analyzing,
and transforming syntax trees. Many meta programming systems process abstract
syntax trees, but this requires intimate knowledge of the structure of the data
type describing the abstract syntax. As a result, meta programming is
error-prone, and meta programs are not resilient to evolution of the structure
of such ASTs, requiring invasive, fault-prone change to these programs.
Inquiry: Concrete syntax patterns alleviate this problem by allowing the meta
programmer to match and create syntax trees using the actual syntax of the
object language. Systems supporting concrete syntax patterns, however, require
a concrete grammar of the object language in their own formalism. Creating such
grammars is a costly and error-prone process, especially for realistic
languages such as Java and C++. Approach: In this paper we present Concretely,
a technique to extend meta programming systems with pluggable concrete syntax
patterns, based on external, black box parsers. We illustrate Concretely in the
context of Rascal, an open-source meta programming system and language
workbench, and show how to reuse existing parsers for Java, JavaScript, and
C++. Furthermore, we propose Tympanic, a DSL to declaratively map external AST
structures to Rascal's internal data structures. Tympanic allows implementors
of Concretely to solve the impedance mismatch between object-oriented class
hierarchies in Java and Rascal's algebraic data types. Both the algebraic data
type and AST marshalling code is automatically generated. Knowledge: The
conceptual architecture of Concretely and Tympanic supports the reuse of
pre-existing, external parsers, and their AST representation in meta
programming systems that feature concrete syntax patterns for matching and
constructing syntax trees. As such this opens up concrete syntax pattern
matching for a host of realistic languages for which writing a grammar from
scratch is time consuming and error-prone, but for which industry-strength
parsers exist in the wild. Grounding: We evaluate Concretely in terms of source
lines of code (SLOC), relative to the size of the AST data type and marshalling
code. We show that for real programming languages such as C++ and Java, adding
support for concrete syntax patterns takes an effort only in the order of
dozens of SLOC. Similarly, we evaluate Tympanic in terms of SLOC, showing an
order of magnitude of reduction in SLOC compared to manual implementation of
the AST data types and marshalling code. Importance: Meta programming has
applications in reverse engineering, reengineering, source code analysis,
static analysis, software renovation, domain-specific language engineering, and
many others. Processing of syntax trees is central to all of these tasks.
Concrete syntax patterns improve the practice of constructing meta programs.
The combination of Concretely and Tympanic has the potential to make concrete
syntax patterns available with very little effort, thereby improving and
promoting the application of meta programming in the general software
engineering context
An extensible benchmark and tooling for comparing reverse engineering approaches
Various tools exist to reverse engineer software source code and generate design information, such as UML projections. Each has specific strengths and weaknesses, however no standardised benchmark exists that can be used to evaluate and compare their performance and effectiveness in a systematic manner. To facilitate such comparison in this paper we introduce the Reverse Engineering to Design Benchmark (RED-BM), which consists of a comprehensive set of Java-based targets for reverse engineering and a formal set of performance measures with which tools and approaches can be analysed and ranked. When used to evaluate 12 industry standard tools performance figures range from 8.82\% to 100\% demonstrating the ability of the benchmark to differentiate between tools. To aid the comparison, analysis and further use of reverse engineering XMI output we have developed a parser which can interpret the XMI output format of the most commonly used reverse engineering applications, and is used in a number of tools
Structured Review of the Evidence for Effects of Code Duplication on Software Quality
This report presents the detailed steps and results of a structured review of code clone literature. The aim of the review is to investigate the evidence for the claim that code duplication has a negative effect on code changeability. This report contains only the details of the review for which there is not enough place to include them in the companion paper published at a conference (Hordijk, Ponisio et al. 2009 - Harmfulness of Code Duplication - A Structured Review of the Evidence)
Structured Review of Code Clone Literature
This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts which helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other
Recursion Aware Modeling and Discovery For Hierarchical Software Event Log Analysis (Extended)
This extended paper presents 1) a novel hierarchy and recursion extension to
the process tree model; and 2) the first, recursion aware process model
discovery technique that leverages hierarchical information in event logs,
typically available for software systems. This technique allows us to analyze
the operational processes of software systems under real-life conditions at
multiple levels of granularity. The work can be positioned in-between reverse
engineering and process mining. An implementation of the proposed approach is
available as a ProM plugin. Experimental results based on real-life (software)
event logs demonstrate the feasibility and usefulness of the approach and show
the huge potential to speed up discovery by exploiting the available hierarchy.Comment: Extended version (14 pages total) of the paper Recursion Aware
Modeling and Discovery For Hierarchical Software Event Log Analysis. This
Technical Report version includes the guarantee proofs for the proposed
discovery algorithm
Understanding Android Obfuscation Techniques: A Large-Scale Investigation in the Wild
In this paper, we seek to better understand Android obfuscation and depict a
holistic view of the usage of obfuscation through a large-scale investigation
in the wild. In particular, we focus on four popular obfuscation approaches:
identifier renaming, string encryption, Java reflection, and packing. To obtain
the meaningful statistical results, we designed efficient and lightweight
detection models for each obfuscation technique and applied them to our massive
APK datasets (collected from Google Play, multiple third-party markets, and
malware databases). We have learned several interesting facts from the result.
For example, malware authors use string encryption more frequently, and more
apps on third-party markets than Google Play are packed. We are also interested
in the explanation of each finding. Therefore we carry out in-depth code
analysis on some Android apps after sampling. We believe our study will help
developers select the most suitable obfuscation approach, and in the meantime
help researchers improve code analysis systems in the right direction
An Extended Stable Marriage Problem Algorithm for Clone Detection
Code cloning negatively affects industrial software and threatens
intellectual property. This paper presents a novel approach to detecting cloned
software by using a bijective matching technique. The proposed approach focuses
on increasing the range of similarity measures and thus enhancing the precision
of the detection. This is achieved by extending a well-known stable-marriage
problem (SMP) and demonstrating how matches between code fragments of different
files can be expressed. A prototype of the proposed approach is provided using
a proper scenario, which shows a noticeable improvement in several features of
clone detection such as scalability and accuracy.Comment: 20 pages, 10 figures, 6 table
BCFA: Bespoke Control Flow Analysis for CFA at Scale
Many data-driven software engineering tasks such as discovering programming
patterns, mining API specifications, etc., perform source code analysis over
control flow graphs (CFGs) at scale. Analyzing millions of CFGs can be
expensive and performance of the analysis heavily depends on the underlying CFG
traversal strategy. State-of-the-art analysis frameworks use a fixed traversal
strategy. We argue that a single traversal strategy does not fit all kinds of
analyses and CFGs and propose bespoke control flow analysis (BCFA). Given a
control flow analysis (CFA) and a large number of CFGs, BCFA selects the most
efficient traversal strategy for each CFG. BCFA extracts a set of properties of
the CFA by analyzing the code of the CFA and combines it with properties of the
CFG, such as branching factor and cyclicity, for selecting the optimal
traversal strategy. We have implemented BCFA in Boa, and evaluated BCFA using a
set of representative static analyses that mainly involve traversing CFGs and
two large datasets containing 287 thousand and 162 million CFGs. Our results
show that BCFA can speedup the large scale analyses by 1%-28%. Further, BCFA
has low overheads; less than 0.2%, and low misprediction rate; less than 0.01%.Comment: 12 page
- โฆ