Choosing Code Segments to Exclude from Code Similarity Detection
When student programs are compared for similarity as a step in the detection of academic misconduct, certain segments of code are always sure to be similar but are no cause for suspicion. Some of these segments are boilerplate code (e.g. public static void main(String[] args)) and some will be code that was provided to students as part of the assessment specification. This working group explores these and other types of code that are legitimately common in student assessments and can therefore be excluded from similarity checking. From their own institutions, working group members collected assessment submissions that together encompass a wide variety of assessment tasks in a wide variety of programming languages. The submissions were analysed to determine what sorts of code segment arose frequently in each assessment task. The group has found that common code can arise in programming assessment tasks when it is required for compilation purposes; when it reflects an intuitive way to undertake part or all of the task in question; when it can be legitimately copied from external sources; and when it has been suggested by people with whom many of the students have been in contact. A further finding is that the nature and size of the common code fragments vary with course level and with task complexity. An informal survey of programming educators confirms the group's findings and gives some reasons why various educators include code when setting programming assignments.
Peer reviewed
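As a rough illustration of excluding provided code before a similarity check, the sketch below drops every line of a submission that also appears in an instructor template and then compares what is left. It is not the working group's method: the function names (`strip_common`, `similarity`), the line-level matching, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def _norm(line: str) -> str:
    """Collapse runs of whitespace so formatting differences are ignored."""
    return " ".join(line.split())


def strip_common(source: str, template: str) -> list[str]:
    """Return the normalized, non-blank lines of `source` that do not occur in `template`.

    Crude by design: any line matching a template line is dropped wherever it
    occurs, so a lone closing brace in the template removes every closing brace.
    """
    template_lines = {_norm(l) for l in template.splitlines() if l.strip()}
    return [
        _norm(l)
        for l in source.splitlines()
        if l.strip() and _norm(l) not in template_lines
    ]


def similarity(a: str, b: str, template: str = "") -> float:
    """Line-level similarity in [0, 1] after excluding template lines."""
    rest_a = strip_common(a, template)
    rest_b = strip_common(b, template)
    if not rest_a and not rest_b:
        # Nothing left to compare; treat as no evidence of similarity.
        return 0.0
    return SequenceMatcher(None, rest_a, rest_b).ratio()


# Hypothetical template handed out with the assignment.
TEMPLATE = """public class Main {
    public static void main(String[] args) {
    }
}"""

STUDENT_A = """public class Main {
    public static void main(String[] args) {
        System.out.println(1 + 2);
    }
}"""

STUDENT_B = """public class Main {
    public static void main(String[] args) {
        int total = 3;
        System.out.println(total);
    }
}"""

if __name__ == "__main__":
    print("with boilerplate:   ", similarity(STUDENT_A, STUDENT_B))
    print("boilerplate removed:", similarity(STUDENT_A, STUDENT_B, TEMPLATE))
```

In this toy case the two submissions look similar only because of the shared scaffold; once the template lines are removed, nothing in common remains.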
Algorithm Diversity for Resilient Systems
Diversity can significantly increase the resilience of systems, by reducing
the prevalence of shared vulnerabilities and making vulnerabilities harder to
exploit. Work on software diversity for security typically creates variants of
a program using low-level code transformations. This paper is the first to
study algorithm diversity for resilience. We first describe how a method based
on high-level invariants and systematic incrementalization can be used to
create algorithm variants. Executing multiple variants in parallel and
comparing their outputs provides greater resilience than executing one variant.
To prevent different parallel schedules from causing variants' behaviors to
diverge, we present a synchronized execution algorithm for DistAlgo, an
extension of Python for high-level, precise, executable specifications of
distributed algorithms. We propose static and dynamic metrics for measuring
diversity. An experimental evaluation of algorithm diversity combined with
implementation-level diversity for several sequential algorithms and
distributed algorithms shows the benefits of algorithm diversity.
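The idea of executing several variants in parallel and comparing their outputs can be sketched in plain Python. This is not DistAlgo and not the paper's synchronized execution algorithm; the three variant functions, the `run_variants` helper, and the majority-vote rule are hypothetical choices made only to illustrate output comparison.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def max_by_scan(xs):
    """Variant 1: single pass keeping the running maximum."""
    best = xs[0]
    for x in xs[1:]:
        if x > best:
            best = x
    return best


def max_by_sort(xs):
    """Variant 2: sort a copy and take the last element."""
    return sorted(xs)[-1]


def max_faulty(xs):
    """Variant 3: deliberately wrong, standing in for a compromised variant."""
    return xs[0]


def run_variants(variants, data):
    """Run every variant on the same input and compare the outputs.

    Returns the majority result and the names of the variants that disagreed.
    Assumes results are hashable; raises if no strict majority exists.
    """
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda f: f(data), variants))
    value, votes = Counter(results).most_common(1)[0]
    if votes <= len(variants) // 2:
        raise RuntimeError("variants disagree and no majority exists")
    dissenters = [f.__name__ for f, r in zip(variants, results) if r != value]
    return value, dissenters


if __name__ == "__main__":
    print(run_variants([max_by_scan, max_by_sort, max_faulty], [3, 9, 4]))
```

A single variant would return whatever it computes, right or wrong; with three variants the disagreeing one is outvoted and can be flagged.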
CHORUS Deliverable 2.1: State of the Art on Multimedia Search Engines
Based on the information provided by European projects and national initiatives related to multimedia search, as well as domain experts who participated in the CHORUS Think-Tanks and workshops, this document reports on the state of the art in multimedia content search from a technical and a socio-economic perspective.
The technical perspective includes an up-to-date view on content-based indexing and retrieval technologies, multimedia search in the context of mobile devices and peer-to-peer networks, and an overview of current evaluation and benchmark initiatives to measure the performance of multimedia search engines.
From a socio-economic perspective we inventory the impact and legal consequences of these technical advances and point out future directions of research.
A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Measuring and evaluating source code similarity is a fundamental software
engineering activity that embraces a broad range of applications, including but
not limited to code recommendation, duplicate code, plagiarism, malware, and
smell detection. This paper proposes a systematic literature review and
meta-analysis on code similarity measurement and evaluation techniques to shed
light on the existing approaches and their characteristics in different
applications. We initially found over 10000 articles by querying four digital
libraries and ended up with 136 primary studies in the field. The studies were
classified according to their methodology, programming languages, datasets,
tools, and applications. A deep investigation reveals 80 software tools,
working with eight different techniques on five application domains. Nearly 49%
of the tools work on Java programs and 37% support C and C++, while there is no
support for many programming languages. A noteworthy point was the existence of
12 datasets related to source code similarity measurement and duplicate codes,
of which only eight datasets were publicly accessible. The lack of reliable
datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm
languages are the main challenges in the field. Emerging applications of code
similarity measurement concentrate on the development phase in addition to the
maintenance.
Comment: 49 pages, 10 figures, 6 tables
Identifying developers’ habits and expectations in copy and paste programming practice
Máster Universitario en Investigación e Innovación en Inteligencia Computacional y Sistemas Interactivos
Both novice and experienced developers rely more and more on external
sources of code to include in their programs by copying and pasting code snippets. This
behavior differs from the traditional software design approach where cohesion was
achieved via a conscious design effort. Due to this fact, it is essential to know how copy
and paste programming practices are actually carried out, so that IDEs (Integrated
Development Environments) and code recommenders can be designed to fit
developer expectations and habits.
A user-centred evaluation of DisCERN: discovering counterfactuals for code vulnerability detection and correction.
Counterfactual explanations highlight actionable knowledge which helps to understand how a machine learning model outcome could be altered to a more favourable outcome. Understanding actionable corrections in source code analysis can be critical to proactively mitigate security attacks that are caused by known vulnerabilities. In this paper, we present the DisCERN explainer for discovering counterfactuals for code vulnerability correction. Given a vulnerable code segment, DisCERN finds counterfactual (i.e. non-vulnerable) code segments and recommends actionable corrections. DisCERN uses feature attribution knowledge to identify potentially vulnerable code statements. Subsequently, it applies a substitution-focused correction, suggesting suitable fixes by analysing the nearest-unlike neighbour. Overall, DisCERN aims to identify vulnerabilities and correct them while preserving both the code syntax and the original functionality of the code. A user study evaluated the utility of counterfactuals for vulnerability detection and correction compared to more commonly used feature attribution explainers. The study revealed that counterfactuals foster positive shifts in mental models, effectively guiding users toward making vulnerability corrections. Furthermore, counterfactuals significantly reduced the cognitive load when detecting and correcting vulnerabilities in complex code segments. Despite these benefits, the user study showed that feature attribution explanations are still more widely accepted than counterfactuals, possibly due to the greater familiarity with the former and the novelty of the latter. These findings encourage further research and development into counterfactual explanations, as they demonstrate the potential for acceptability over time among developers as a reliable resource for both coding and training.
Rotation Periods of 34,030 Kepler Main-Sequence Stars: The Full Autocorrelation Sample
We analyzed 3 years of data from the Kepler space mission to derive rotation
periods of main-sequence stars below 6500 K. Our automated
autocorrelation-based method detected rotation periods between 0.2 and 70 days
for 34,030 (25.6%) of the 133,030 main-sequence Kepler targets (excluding known
eclipsing binaries and Kepler Objects of Interest), making this the largest
sample of stellar rotation periods to date. In this paper we consider the
detailed features of the now well-populated period-temperature distribution and
demonstrate that the period bimodality, first seen by McQuillan, Aigrain &
Mazeh (2013) in the M-dwarf sample, persists to higher masses, becoming less
visible above 0.6 M_sun. We show that these results are globally consistent
with the existing ground-based rotation-period data and find that the upper
envelope of the period distribution is broadly consistent with a
gyrochronological age of 4.5 Gyrs, based on the isochrones of Barnes (2007),
Mamajek & Hillenbrand (2008) and Meibom et al. (2009). We also performed a
detailed comparison of our results to those of Reinhold et al. (2013) and
Nielsen et al. (2013), who have measured rotation periods of field stars
observed by Kepler. We examined the amplitude of periodic variability for the
stars with detected rotation periods, and found a typical range between ~950
ppm (5th percentile) and ~22,700 ppm (95th percentile), with a median of ~5,600
ppm. We found typically higher amplitudes for shorter periods and lower
effective temperatures, with an excess of low-amplitude stars above ~5400 K.
Comment: Accepted ApJS 20th Feb 2014, submitted 13th Jan 2014. 15 pages, 12
Figures, 6 Tables. Tables 1 & 2 are available in their entirety in a
machine-readable form in the online supplementary material or from
http://www.astro.tau.ac.il/~amy
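To give a feel for what an autocorrelation-based period search involves, the sketch below computes the autocorrelation function (ACF) of an evenly sampled light curve and reports the lag of its first positive local maximum as the period. The abstract only states that the method is autocorrelation-based, so everything else here is an assumption: the function names, the first-peak rule, the absence of detrending or smoothing, and the noise-free synthetic signal.

```python
import math


def autocorrelation(flux, lag):
    """Normalized autocorrelation of an evenly sampled series at an integer lag."""
    n = len(flux)
    mean = sum(flux) / n
    dev = [f - mean for f in flux]
    denom = sum(d * d for d in dev)
    return sum(dev[i] * dev[i + lag] for i in range(n - lag)) / denom


def estimate_period(flux, cadence_days, max_lag):
    """Return the lag (in days) of the first positive local maximum of the ACF.

    Returns None if no such peak is found up to max_lag samples.
    """
    acf = [autocorrelation(flux, k) for k in range(max_lag + 1)]
    for k in range(1, max_lag):
        if acf[k] > 0 and acf[k] > acf[k - 1] and acf[k] >= acf[k + 1]:
            return k * cadence_days
    return None


if __name__ == "__main__":
    # Synthetic light curve: a pure sinusoid with a 10-day period,
    # sampled every 0.5 days for 200 days (hypothetical values).
    cadence = 0.5
    true_period = 10.0
    flux = [math.sin(2 * math.pi * i * cadence / true_period) for i in range(400)]
    print(estimate_period(flux, cadence, max_lag=60))
```

Real photometry is noisy and unevenly affected by systematics, so a practical pipeline would need preprocessing and a more careful peak-selection rule than this first-peak heuristic.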