
    Choosing Code Segments to Exclude from Code Similarity Detection

    When student programs are compared for similarity as a step in the detection of academic misconduct, certain segments of code are always sure to be similar but are no cause for suspicion. Some of these segments are boilerplate code (e.g. public static void main(String[] args)) and some will be code that was provided to students as part of the assessment specification. This working group explores these and other types of code that are legitimately common in student assessments and can therefore be excluded from similarity checking. From their own institutions, working group members collected assessment submissions that together encompass a wide variety of assessment tasks in a wide variety of programming languages. The submissions were analysed to determine what sorts of code segments arose frequently in each assessment task. The group has found that common code can arise in programming assessment tasks when it is required for compilation purposes; when it reflects an intuitive way to undertake part or all of the task in question; when it can be legitimately copied from external sources; and when it has been suggested by people with whom many of the students have been in contact. A further finding is that the nature and size of the common code fragments vary with course level and with task complexity. An informal survey of programming educators confirms the group's findings and gives some reasons why various educators include code when setting programming assignments. Peer reviewed.
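    The exclusion idea in this abstract can be sketched as a pre-processing step before similarity scoring. The patterns, the token-set Jaccard measure, and the two toy submissions below are illustrative assumptions, not the working group's actual tooling:

```python
import re

# Assumed examples of segments legitimately common to all submissions
# (boilerplate, instructor-provided starter code) to strip before comparing.
EXCLUDED_SEGMENTS = [
    r"public\s+static\s+void\s+main\s*\(\s*String\s*\[\]\s*\w+\s*\)",  # Java boilerplate
    r"import\s+java\.util\.\*;",  # a common provided import
]

def strip_excluded(source: str) -> str:
    """Remove code segments that are no cause for suspicion."""
    for pattern in EXCLUDED_SEGMENTS:
        source = re.sub(pattern, "", source)
    return source

def jaccard_similarity(a: str, b: str) -> float:
    """Crude whitespace-token set similarity between two submissions."""
    ta, tb = set(a.split()), set(b.split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

sub1 = "public static void main(String[] args) { int x = helper(); }"
sub2 = "public static void main(String[] args) { int y = other(); }"
raw = jaccard_similarity(sub1, sub2)
cleaned = jaccard_similarity(strip_excluded(sub1), strip_excluded(sub2))
# Stripping the shared boilerplate lowers the apparent similarity,
# leaving only the genuinely student-written overlap to be judged.
```

    In a real detector the excluded segments would come from the assessment specification itself rather than a hand-written pattern list.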

    Algorithm Diversity for Resilient Systems

    Diversity can significantly increase the resilience of systems, by reducing the prevalence of shared vulnerabilities and making vulnerabilities harder to exploit. Work on software diversity for security typically creates variants of a program using low-level code transformations. This paper is the first to study algorithm diversity for resilience. We first describe how a method based on high-level invariants and systematic incrementalization can be used to create algorithm variants. Executing multiple variants in parallel and comparing their outputs provides greater resilience than executing one variant. To prevent different parallel schedules from causing variants' behaviors to diverge, we present a synchronized execution algorithm for DistAlgo, an extension of Python for high-level, precise, executable specifications of distributed algorithms. We propose static and dynamic metrics for measuring diversity. An experimental evaluation of algorithm diversity combined with implementation-level diversity for several sequential algorithms and distributed algorithms shows the benefits of algorithm diversity.
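    The core resilience mechanism described above — executing multiple variants in parallel and comparing their outputs — can be sketched as follows. The two sorting variants and the agreement check are assumed illustrations, not the paper's DistAlgo implementation:

```python
from concurrent.futures import ThreadPoolExecutor

def sort_variant_builtin(xs):
    """Variant 1: the library sort."""
    return sorted(xs)

def sort_variant_insertion(xs):
    """Variant 2: an algorithmically different insertion sort."""
    out = []
    for x in xs:
        i = len(out)
        while i > 0 and out[i - 1] > x:
            i -= 1
        out.insert(i, x)
    return out

def resilient_run(variants, data):
    """Run all variants in parallel; flag divergence instead of trusting one."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda f: f(list(data)), variants))
    if any(r != results[0] for r in results[1:]):
        raise RuntimeError("variant outputs diverged")
    return results[0]

print(resilient_run([sort_variant_builtin, sort_variant_insertion], [3, 1, 2]))
# prints [1, 2, 3]
```

    A vulnerability (or fault) that affects only one variant surfaces as a divergence rather than silently propagating, which is the resilience benefit the abstract describes.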

    CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines

    Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and socio-economic perspective. The technical perspective includes an up-to-date view of content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives that measure the performance of multimedia search engines. From a socio-economic perspective, we take stock of the impact and legal consequences of these technical advances and point out future directions of research.

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that embraces a broad range of applications, including but not limited to code recommendation, duplicate code, plagiarism, malware, and smell detection. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight were publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase. Comment: 49 pages, 10 figures, 6 tables.
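    As a minimal illustration of one of the technique families such surveys cover, token-insensitive k-gram similarity can be sketched as follows; the k-gram measure and the code snippets below are assumptions for illustration, not a tool from the review:

```python
def kgrams(code: str, k: int = 5):
    """Character k-grams over whitespace-normalized code."""
    text = "".join(code.split())  # ignore layout/whitespace changes
    return {text[i:i + k] for i in range(len(text) - k + 1)}

def similarity(a: str, b: str, k: int = 5) -> float:
    """Jaccard overlap of k-gram sets, in [0, 1]."""
    ga, gb = kgrams(a, k), kgrams(b, k)
    return len(ga & gb) / max(1, len(ga | gb))

original  = "for i in range(10): total += i"
renamed   = "for j in range(10): total += j"          # identifier renamed
unrelated = "def parse(line): return line.split(',')"
# A renamed copy stays far more similar to the original than unrelated code,
# which is the property clone detectors exploit.
```

    Production tools refine this idea with lexing, fingerprint selection (e.g. winnowing), and AST- or graph-based comparison, which are among the eight technique families the survey classifies.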

    Identifying developers’ habits and expectations in copy and paste programming practice

    Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos (Master's Degree in Research and Innovation in Computational Intelligence and Interactive Systems). Both novice and experienced developers rely more and more on external sources of code, copying and pasting code snippets into their programs. This behavior differs from the traditional software design approach, where cohesion was achieved via a conscious design effort. Because of this, it is essential to know how copy-and-paste programming practices are actually carried out, so that IDEs (Integrated Development Environments) and code recommenders can be designed to fit developer expectations and habits.

    A user-centred evaluation of DisCERN: discovering counterfactuals for code vulnerability detection and correction.

    Counterfactual explanations highlight actionable knowledge which helps to understand how a machine learning model outcome could be altered to a more favourable outcome. Understanding actionable corrections in source code analysis can be critical to proactively mitigate security attacks that are caused by known vulnerabilities. In this paper, we present the DisCERN explainer for discovering counterfactuals for code vulnerability correction. Given a vulnerable code segment, DisCERN finds counterfactual (i.e. non-vulnerable) code segments and recommends actionable corrections. DisCERN uses feature attribution knowledge to identify potentially vulnerable code statements. Subsequently, it applies a substitution-focused correction, suggesting suitable fixes by analysing the nearest-unlike neighbour. Overall, DisCERN aims to identify vulnerabilities and correct them while preserving both the code syntax and the original functionality of the code. A user study evaluated the utility of counterfactuals for vulnerability detection and correction compared to more commonly used feature attribution explainers. The study revealed that counterfactuals foster positive shifts in mental models, effectively guiding users toward making vulnerability corrections. Furthermore, counterfactuals significantly reduced the cognitive load when detecting and correcting vulnerabilities in complex code segments. Despite these benefits, the user study showed that feature attribution explanations are still more widely accepted than counterfactuals, possibly due to the greater familiarity with the former and the novelty of the latter. These findings encourage further research and development into counterfactual explanations, as they demonstrate the potential for acceptability over time among developers as a reliable resource for both coding and training.
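    The nearest-unlike-neighbour substitution idea described above can be sketched abstractly; the feature encoding, distance function, and tiny dataset below are hypothetical stand-ins, not DisCERN's actual representation of code:

```python
def nearest_unlike_neighbour(query, label, dataset, distance):
    """Closest training instance whose label differs from the query's."""
    unlike = [(x, y) for x, y in dataset if y != label]
    return min(unlike, key=lambda xy: distance(query, xy[0]))[0]

def counterfactual(query, label, dataset, distance, attributed):
    """Substitute only the feature-attributed positions from the NUN,
    keeping the rest of the instance (and thus its structure) intact."""
    nun = nearest_unlike_neighbour(query, label, dataset, distance)
    cf = list(query)
    for i in attributed:  # indices flagged by a feature-attribution explainer
        cf[i] = nun[i]
    return cf

hamming = lambda a, b: sum(x != y for x, y in zip(a, b))

# Hypothetical binary feature vectors for code segments with labels.
data = [((0, 1, 1), "vulnerable"), ((0, 0, 1), "safe"), ((1, 0, 0), "safe")]
fix = counterfactual((0, 1, 1), "vulnerable", data, hamming, attributed=[1])
# The query is nudged toward its nearest "safe" neighbour, (0, 0, 1),
# by changing only the attributed feature.
```

    The substitution-only design mirrors the abstract's goal of preserving syntax and functionality: everything not flagged by feature attribution is left untouched.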

    Rotation Periods of 34,030 Kepler Main-Sequence Stars: The Full Autocorrelation Sample

    We analyzed 3 years of data from the Kepler space mission to derive rotation periods of main-sequence stars below 6500 K. Our automated autocorrelation-based method detected rotation periods between 0.2 and 70 days for 34,030 (25.6%) of the 133,030 main-sequence Kepler targets (excluding known eclipsing binaries and Kepler Objects of Interest), making this the largest sample of stellar rotation periods to date. In this paper we consider the detailed features of the now well-populated period-temperature distribution and demonstrate that the period bimodality, first seen by McQuillan, Aigrain & Mazeh (2013) in the M-dwarf sample, persists to higher masses, becoming less visible above 0.6 M_sun. We show that these results are globally consistent with the existing ground-based rotation-period data and find that the upper envelope of the period distribution is broadly consistent with a gyrochronological age of 4.5 Gyr, based on the isochrones of Barnes (2007), Mamajek & Hillenbrand (2008) and Meibom et al. (2009). We also performed a detailed comparison of our results to those of Reinhold et al. (2013) and Nielsen et al. (2013), who have measured rotation periods of field stars observed by Kepler. We examined the amplitude of periodic variability for the stars with detected rotation periods, and found a typical range between ~950 ppm (5th percentile) and ~22,700 ppm (95th percentile), with a median of ~5,600 ppm. We found typically higher amplitudes for shorter periods and lower effective temperatures, with an excess of low-amplitude stars above ~5400 K. Comment: Accepted ApJS 20th Feb 2014, submitted 13th Jan 2014. 15 pages, 12 Figures, 6 Tables. Tables 1 & 2 are available in their entirety in a machine-readable form in the online supplementary material or from http://www.astro.tau.ac.il/~amy
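    The autocorrelation method at the heart of this survey can be illustrated on a synthetic light curve: a star's rotation period shows up as the first peak of the light curve's autocorrelation function. The toy spot-modulated signal and pure-Python ACF below are assumptions for illustration, not the paper's pipeline:

```python
import math

def autocorrelation(flux, lag):
    """Normalized sample autocorrelation at a given lag (in samples)."""
    n = len(flux)
    mean = sum(flux) / n
    var = sum((f - mean) ** 2 for f in flux)
    cov = sum((flux[i] - mean) * (flux[i + lag] - mean) for i in range(n - lag))
    return cov / var

def first_acf_peak(flux, max_lag):
    """First local maximum of the ACF after lag 0 ~ the period in samples."""
    acf = [autocorrelation(flux, lag) for lag in range(max_lag)]
    for lag in range(1, max_lag - 1):
        if acf[lag - 1] < acf[lag] >= acf[lag + 1]:
            return lag
    return None

# Synthetic star: 12-day rotation period sampled once per day,
# with spot modulation as a small sinusoidal brightness variation.
period = 12
flux = [1.0 + 0.01 * math.sin(2 * math.pi * t / period) for t in range(200)]
print(first_acf_peak(flux, 60))  # recovers the 12-sample (12-day) period
```

    Real light curves add noise, spot evolution, and data gaps, which is why the survey's automated method includes selection and validation steps beyond this bare peak search.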