1,013 research outputs found

    Structured Review of Code Clone Literature

    Get PDF
    This report presents the results of a structured review of code clone literature. The aim of the review is to assemble a conceptual model of clone-related concepts which helps us to reason about clones. This conceptual model unifies clone concepts from a wide range of literature, so that findings about clones can be compared with each other

    Stack Overflow in Github: Any Snippets There?

    Full text link
    When programmers look for how to achieve certain programming tasks, Stack Overflow is a popular destination in search engine results. Over the years, Stack Overflow has accumulated an impressive knowledge base of snippets of code that are amply documented. We are interested in studying how programmers use these snippets of code in their projects. Can we find Stack Overflow snippets in real projects? When snippets are used, is this copy literal or does it suffer adaptations? And are these adaptations specializations required by the idiosyncrasies of the target artifact, or are they motivated by specific requirements of the programmer? The large-scale study presented on this paper analyzes 909k non-fork Python projects hosted on Github, which contain 290M function definitions, and 1.9M Python snippets captured in Stack Overflow. Results are presented as quantitative analysis of block-level code cloning intra and inter Stack Overflow and GitHub, and as an analysis of programming behaviors through the qualitative analysis of our findings.Comment: 14th International Conference on Mining Software Repositories, 11 page

    Reverse Proxy Framework using Sanitization Technique for Intrusion Prevention in Database

    Full text link
    With the increasing importance of the internet in our day to day life, data security in web application has become very crucial. Ever increasing on line and real time transaction services have led to manifold rise in the problems associated with the database security. Attacker uses illegal and unauthorized approaches to hijack the confidential information like username, password and other vital details. Hence the real time transaction requires security against web based attacks. SQL injection and cross site scripting attack are the most common application layer attack. The SQL injection attacker pass SQL statement through a web applications input fields, URL or hidden parameters and get access to the database or update it. The attacker take a benefit from user provided data in such a way that the users input is handled as a SQL code. Using this vulnerability an attacker can execute SQL commands directly on the database. SQL injection attacks are most serious threats which take users input and integrate it into SQL query. Reverse Proxy is a technique which is used to sanitize the users inputs that may transform into a database attack. In this technique a data redirector program redirects the users input to the proxy server before it is sent to the application server. At the proxy server, data cleaning algorithm is triggered using a sanitizing application. In this framework we include detection and sanitization of the tainted information being sent to the database and innovate a new prototype.Comment: 9 pages, 6 figures, 3 tables; CIIT 2013 International Conference, Mumba

    Recommending Stack Overflow Posts for Fixing Runtime Exceptions using Failure Scenario Matching

    Full text link
    Using online Q&A forums, such as Stack Overflow (SO), for guidance to resolve program bugs, among other development issues, is commonplace in modern software development practice. Runtime exceptions (RE) is one such important class of bugs that is actively discussed on SO. In this work we present a technique and prototype tool called MAESTRO that can automatically recommend an SO post that is most relevant to a given Java RE in a developer's code. MAESTRO compares the exception-generating program scenario in the developer's code with that discussed in an SO post and returns the post with the closest match. To extract and compare the exception scenario effectively, MAESTRO first uses the answer code snippets in a post to implicate a subset of lines in the post's question code snippet as responsible for the exception and then compares these lines with the developer's code in terms of their respective Abstract Program Graph (APG) representations. The APG is a simplified and abstracted derivative of an abstract syntax tree, proposed in this work, that allows an effective comparison of the functionality embodied in the high-level program structure, while discarding many of the low-level syntactic or semantic differences. We evaluate MAESTRO on a benchmark of 78 instances of Java REs extracted from the top 500 Java projects on GitHub and show that MAESTRO can return either a highly relevant or somewhat relevant SO post corresponding to the exception instance in 71% of the cases, compared to relevant posts returned in only 8% - 44% instances, by four competitor tools based on state-of-the-art techniques. We also conduct a user experience study of MAESTRO with 10 Java developers, where the participants judge MAESTRO reporting a highly relevant or somewhat relevant post in 80% of the instances. In some cases the post is judged to be even better than the one manually found by the participant

    Effective Compiler Error Message Enhancement for Novice Programming Students

    Get PDF
    Programming is an essential skill that all computing students must master. However programming can be difficult to learn. Compiler error messages are crucial for correcting errors, but are often difficult to understand and pose a barrier to progress for many novices. High frequencies of errors, particularly repeated errors, have been shown to be indicators of students who are struggling with learning to program. This study involves a custom IDE that enhances Java compiler error messages, intended to be more useful to novices than those supplied by the compiler. The effectiveness of this approach was tested in an empirical control/intervention study of approximately 200 students generating almost 50,000 errors. The design allows for direct comparisons between enhanced and non-enhanced error messages. Results show that the intervention group experienced reductions in the number of overall errors, errors per student, and several repeated error metrics. This work is important for two reasons. First, the effects of error message enhancement have been recently debated in the literature. This study provides substantial evidence that it can be effective. Second, these results should be generalizable at least in part, to other programming languages, students and institutions, as we show that the control group of this study is comparable to several others using Java and other languages

    Towards Automating Precision Studies of Clone Detectors

    Full text link
    Current research in clone detection suffers from poor ecosystems for evaluating precision of clone detection tools. Corpora of labeled clones are scarce and incomplete, making evaluation labor intensive and idiosyncratic, and limiting inter tool comparison. Precision-assessment tools are simply lacking. We present a semi-automated approach to facilitate precision studies of clone detection tools. The approach merges automatic mechanisms of clone classification with manual validation of clone pairs. We demonstrate that the proposed automatic approach has a very high precision and it significantly reduces the number of clone pairs that need human validation during precision experiments. Moreover, we aggregate the individual effort of multiple teams into a single evolving dataset of labeled clone pairs, creating an important asset for software clone research.Comment: Accepted to be published in the 41st ACM/IEEE International Conference on Software Engineerin

    A comparison of code similarity analysers

    Get PDF
    Copying and pasting of source code is a common activity in software engineering. Often, the code is not copied as it is and it may be modified for various purposes; e.g. refactoring, bug fixing, or even software plagiarism. These code modifications could affect the performance of code similarity analysers including code clone and plagiarism detectors to some certain degree. We are interested in two types of code modification in this study: pervasive modifications, i.e. transformations that may have a global effect, and local modifications, i.e. code changes that are contained in a single method or code block. We evaluate 30 code similarity detection techniques and tools using five experimental scenarios for Java source code. These are (1) pervasively modified code, created with tools for source code and bytecode obfuscation, and boiler-plate code, (2) source code normalisation through compilation and decompilation using different decompilers, (3) reuse of optimal configurations over different data sets, (4) tool evaluation using ranked-based measures, and (5) local + global code modifications. Our experimental results show that in the presence of pervasive modifications, some of the general textual similarity measures can offer similar performance to specialised code similarity tools, whilst in the presence of boiler-plate code, highly specialised source code similarity detection techniques and tools outperform textual similarity measures. Our study strongly validates the use of compilation/decompilation as a normalisation technique. Its use reduced false classifications to zero for three of the tools. Moreover, we demonstrate that optimal configurations are very sensitive to a specific data set. After directly applying optimal configurations derived from one data set to another, the tools perform poorly on the new data set. The code similarity analysers are thoroughly evaluated not only based on several well-known pair-based and query-based error measures but also on each specific type of pervasive code modification. This broad, thorough study is the largest in existence and potentially an invaluable guide for future users of similarity detection in source code
    • โ€ฆ
    corecore