40,890 research outputs found

    An extensible benchmark and tooling for comparing reverse engineering approaches

    Get PDF
    Various tools exist to reverse engineer software source code and generate design information, such as UML projections. Each has specific strengths and weaknesses, however no standardised benchmark exists that can be used to evaluate and compare their performance and effectiveness in a systematic manner. To facilitate such comparison in this paper we introduce the Reverse Engineering to Design Benchmark (RED-BM), which consists of a comprehensive set of Java-based targets for reverse engineering and a formal set of performance measures with which tools and approaches can be analysed and ranked. When used to evaluate 12 industry standard tools performance figures range from 8.82\% to 100\% demonstrating the ability of the benchmark to differentiate between tools. To aid the comparison, analysis and further use of reverse engineering XMI output we have developed a parser which can interpret the XMI output format of the most commonly used reverse engineering applications, and is used in a number of tools

    SourcererCC: Scaling Code Clone Detection to Big Code

    Full text link
    Despite a decade of active research, there is a marked lack in clone detectors that scale to very large repositories of source code, in particular for detecting near-miss clones where significant editing activities may take place in the cloned code. We present SourcererCC, a token-based clone detector that targets three clone types, and exploits an index to achieve scalability to large inter-project repositories using a standard workstation. SourcererCC uses an optimized inverted-index to quickly query the potential clones of a given code block. Filtering heuristics based on token ordering are used to significantly reduce the size of the index, the number of code-block comparisons needed to detect the clones, as well as the number of required token-comparisons needed to judge a potential clone. We evaluate the scalability, execution time, recall and precision of SourcererCC, and compare it to four publicly available and state-of-the-art tools. To measure recall, we use two recent benchmarks, (1) a large benchmark of real clones, BigCloneBench, and (2) a Mutation/Injection-based framework of thousands of fine-grained artificial clones. We find SourcererCC has both high recall and precision, and is able to scale to a large inter-project repository (250MLOC) using a standard workstation.Comment: Accepted for publication at ICSE'16 (preprint, unrevised

    BEFRIEND - a benchmark for evaluating reverse engineering tools

    Get PDF
    Reverse engineering tools analyze the source code of a software system and produce various results, which usually point back to the original source code. Such tools are e.g. design pattern miners, duplicated code detectors and coding rule violation checkers. Most of the time these tools present their results in different formats, which makes them very difficult to compare. In this paper, we present work in progress towards implementing a benchmark called BEFRIEND (BEnchmark For Reverse engInEering tools workiNg on source coDe) with which the outputs of reverse engineering tools can be easily and efficiently evaluated and compared. It supports different kinds of tool families, programming languages and software systems, and it enables the users to define their own evaluation criteria. Furthermore, it is a freely available web-application open to the community. We hope that in the future it will be accepted and used by the community members to evaluate and compare their tools with each other

    Evaluating and Improving Reverse Engineering Tools

    Get PDF
    Developers tend to leave some important steps and actions (e.g. properly designing the system's architecture, code review and testing) out of the software development process, and use risky practices (e.g. the copy-paste technique) so that the software can be released as fast as possible. However, these practices may turn out to be critical from the viewpoint of maintainability of the software system. In such cases, a cost-effective solution might be to re-engineer the system. Re-engineering consists of two stages, namely reverse-engineering information from the current system and, based on this information, forward-engineering the system to a new form. In this way, successful re-engineering significantly depends on the reverse engineering phase. Therefore, it is vital to guarantee correctness, and to improve the results of the reverse engineering step. Otherwise, the re-engineering of the software system could fail due to the bad results of reverse engineering. The above issues motivated us to develop a method which extends and improves one of our reverse engineering tools, and to develop benchmarks and to perform experiments on evaluating and comparing reverse engineering tools

    Automatic differentiation in machine learning: a survey

    Get PDF
    Derivatives, mostly in the form of gradients and Hessians, are ubiquitous in machine learning. Automatic differentiation (AD), also called algorithmic differentiation or simply "autodiff", is a family of techniques similar to but more general than backpropagation for efficiently and accurately evaluating derivatives of numeric functions expressed as computer programs. AD is a small but established field with applications in areas including computational fluid dynamics, atmospheric sciences, and engineering design optimization. Until very recently, the fields of machine learning and AD have largely been unaware of each other and, in some cases, have independently discovered each other's results. Despite its relevance, general-purpose AD has been missing from the machine learning toolbox, a situation slowly changing with its ongoing adoption under the names "dynamic computational graphs" and "differentiable programming". We survey the intersection of AD and machine learning, cover applications where AD has direct relevance, and address the main implementation techniques. By precisely defining the main differentiation techniques and their interrelationships, we aim to bring clarity to the usage of the terms "autodiff", "automatic differentiation", and "symbolic differentiation" as these are encountered more and more in machine learning settings.Comment: 43 pages, 5 figure
    • …
    corecore