A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges
Measuring and evaluating source code similarity is a fundamental software
engineering activity with a broad range of applications, including but
not limited to code recommendation and the detection of duplicate code,
plagiarism, malware, and code smells. This paper presents a systematic literature review and
meta-analysis on code similarity measurement and evaluation techniques to shed
light on the existing approaches and their characteristics in different
applications. We initially found over 10000 articles by querying four digital
libraries and ended up with 136 primary studies in the field. The studies were
classified according to their methodology, programming languages, datasets,
tools, and applications. A deep investigation reveals 80 software tools,
working with eight different techniques on five application domains. Nearly 49%
of the tools work on Java programs and 37% support C and C++, while there is no
support for many programming languages. A noteworthy point was the existence of
12 datasets related to source code similarity measurement and duplicate code,
of which only eight were publicly accessible. The lack of reliable
datasets, empirical evaluations, hybrid methods, and focus on multi-paradigm
languages are the main challenges in the field. Emerging applications of code
similarity measurement concentrate on the development phase in addition to the
maintenance phase.
Comment: 49 pages, 10 figures, 6 tables
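Token-based comparison is among the simplest of the similarity-measurement techniques such surveys classify. As a rough illustration only (the regex tokenizer and the Jaccard index below are generic textbook choices, not the method of any specific surveyed tool):

```python
import re

def tokens(source):
    # Crude lexer: identifiers, numbers, and single-character operators.
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\s\w]", source))

def jaccard_similarity(a, b) -> float:
    # Jaccard index over token sets: |A ∩ B| / |A ∪ B|.
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

snippet1 = "total = 0\nfor x in items:\n    total += x"
snippet2 = "s = 0\nfor x in values:\n    s += x"
score = jaccard_similarity(snippet1, snippet2)  # high despite renamed variables
```

Real clone detectors normalize identifiers and compare token sequences or trees rather than sets, but the thresholding idea is the same.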
Towards A Practical High-Assurance Systems Programming Language
Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make matters worse, formally reasoning about its correctness properties introduces yet another level of complexity, requiring considerable expertise in both systems programming and formal verification. Without appropriate tools that provide abstraction and automation, development can be extremely costly due to the sheer complexity of the systems and the nuances within them.
Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code.
To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof with a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally, we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process.
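Cogent's actual property-based testing framework targets Cogent and C code; the Python sketch below only illustrates the underlying idea, checking a "low-level" implementation against a functional specification on random inputs (all names and the toy property are hypothetical, not part of the Cogent ecosystem):

```python
import random

def spec_sum(xs):
    # Abstract specification: the mathematical sum of a list.
    return sum(xs)

def impl_sum(xs):
    # Implementation under test: manual accumulation loop,
    # standing in for a low-level systems routine.
    acc = 0
    for x in xs:
        acc += x
    return acc

def check_property(trials=1000):
    # Property-based test: on random inputs, implementation == specification.
    rng = random.Random(42)
    for _ in range(trials):
        xs = [rng.randint(-100, 100) for _ in range(rng.randint(0, 20))]
        if impl_sum(xs) != spec_sum(xs):
            return False  # counterexample found
    return True
```

The value of such testing before full verification is that a failing random input is a cheap counterexample, found long before any proof effort is invested.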
Evaluation Methodologies in Software Protection Research
Man-at-the-end (MATE) attackers have full control over the system on which
the attacked software runs, and try to break the confidentiality or integrity
of assets embedded in the software. Both companies and malware authors want to
prevent such attacks. This has driven an arms race between attackers and
defenders, resulting in a plethora of different protection and analysis
methods. However, it remains difficult to measure the strength of protections
because MATE attackers can reach their goals in many different ways and a
universally accepted evaluation methodology does not exist. This survey
systematically reviews the evaluation methodologies of papers on obfuscation, a
major class of protections against MATE attacks. For 572 papers, we collected
113 aspects of their evaluation methodologies, ranging from sample set types
and sizes, through sample treatment, to the measurements performed. We provide
detailed insights into how the academic state of the art evaluates both the
protections and analyses thereon. In summary, there is a clear need for better
evaluation methodologies. We identify nine challenges for software protection
evaluations, which represent threats to the validity, reproducibility, and
interpretation of research results in the context of MATE attacks.
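For readers unfamiliar with the protection class under study, a toy instance of obfuscation is repeating-key XOR of string literals: it hides an asset from static inspection while remaining recoverable by the program, and by a MATE attacker, at run time. This sketch is purely illustrative and not drawn from any surveyed paper:

```python
def xor_obfuscate(data: bytes, key: bytes) -> bytes:
    # Repeating-key XOR; applying it twice with the same key is the identity,
    # so the same routine both obfuscates and deobfuscates.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

secret = b"license-check-endpoint"   # asset to hide from static analysis
key = b"\x5a\xa5"                    # hypothetical embedded key
blob = xor_obfuscate(secret, key)    # what ships in the binary
recovered = xor_obfuscate(blob, key) # what the program computes at run time
```

Evaluating how much such a transformation actually slows down an attacker is exactly the methodological question the survey finds unsettled.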
Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques
The rapid growth of demanding applications in domains applying multimedia
processing and machine learning has marked a new era for edge and cloud
computing. These applications involve massive data and compute-intensive tasks,
and thus, typical computing paradigms in embedded systems and data centers are
stressed to meet the worldwide demand for high performance. Concurrently, the
landscape of the semiconductor field over the last 15 years has established power
as a first-class design concern. As a result, the computing-systems
community is forced to find alternative design approaches that facilitate
high-performance and/or power-efficient computing. Among the examined
solutions, Approximate Computing has attracted an ever-increasing interest,
with research works applying approximations across the entire traditional
computing stack, i.e., at the software, hardware, and architectural levels. Over
the last decade, a plethora of approximation techniques has emerged in software
(programs, frameworks, compilers, runtimes, languages), hardware (circuits,
accelerators), and architectures (processors, memories). The current article is
Part I of our comprehensive survey on Approximate Computing: it reviews the
field's motivation, terminology, and principles, and classifies and presents the
technical details of state-of-the-art software and hardware approximation
techniques.
Comment: Under Review at ACM Computing Surveys
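A minimal software-level example of such a technique is loop perforation, which trades accuracy for work by skipping loop iterations; the data and skip factor below are arbitrary assumptions for illustration:

```python
def exact_mean(xs):
    # Precise baseline: visit every element.
    return sum(xs) / len(xs)

def perforated_mean(xs, skip=2):
    # Loop perforation: process only every `skip`-th element,
    # doing roughly 1/skip of the work at some accuracy cost.
    sampled = xs[::skip]
    return sum(sampled) / len(sampled)

data = [float(i % 17) for i in range(10_000)]
relative_error = abs(perforated_mean(data) - exact_mean(data)) / exact_mean(data)
```

For error-tolerant workloads such as the multimedia and machine-learning applications the survey targets, a small relative error is often an acceptable price for halving the iteration count.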
Graph Neural Networks For Mapping Variables Between Programs -- Extended Version
Automated program analysis is a pivotal research domain in many areas of
Computer Science -- Formal Methods and Artificial Intelligence, in particular.
Due to the undecidability of the problem of program equivalence, comparing two
programs is highly challenging. Typically, in order to compare two programs, a
relation between both programs' sets of variables is required. Thus, mapping
variables between two programs is useful for a panoply of tasks such as program
equivalence, program analysis, program repair, and clone detection. In this
work, we propose using graph neural networks (GNNs) to map the set of variables
between two programs based on both programs' abstract syntax trees (ASTs). To
demonstrate the strength of variable mappings, we present three use-cases of
these mappings on the task of program repair to fix well-studied and recurrent
bugs among novice programmers in introductory programming assignments (IPAs).
Experimental results on a dataset of 4166 pairs of incorrect/correct programs
show that our approach correctly maps 83% of the evaluation dataset. Moreover,
our experiments show that the current state-of-the-art on program repair,
greatly dependent on the programs' structure, can only repair about 72% of the
incorrect programs. In contrast, our approach, which is solely based on
variable mappings, can repair around 88.5% of them.
Comment: Extended version of "Graph Neural Networks For Mapping Variables
Between Programs", paper accepted at ECAI 2023. GitHub:
https://github.com/pmorvalho/ecai23-GNNs-for-mapping-variables-between-programs.
11 pages, 5 figures, 4 tables and 3 listings
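The paper's own GNN encoding is not reproduced here; the sketch below merely shows, using Python's standard ast module, how a program's AST can be flattened into the node/edge lists a GNN would consume, with variable names kept as node features:

```python
import ast

def ast_to_graph(source):
    # Flatten an AST into (node labels, parent->child edge list),
    # the kind of graph a GNN layer would take as input.
    tree = ast.parse(source)
    nodes, index = [], {}
    for node in ast.walk(tree):
        index[id(node)] = len(nodes)
        label = type(node).__name__
        if isinstance(node, ast.Name):
            label += f":{node.id}"  # keep variable names as node features
        nodes.append(label)
    edges = []
    for node in ast.walk(tree):
        for child in ast.iter_child_nodes(node):
            edges.append((index[id(node)], index[id(child)]))
    return nodes, edges

nodes, edges = ast_to_graph("y = x + 1")
```

Running the same extraction on two programs yields two graphs whose `Name:*` nodes are the candidates the GNN learns to map onto each other.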
IR Design for Application-Specific Natural Language: A Case Study on Traffic Data
In the realm of software applications in the transportation industry,
Domain-Specific Languages (DSLs) have enjoyed widespread adoption due to their
ease of use and various other benefits. With the ceaseless progress in computer
performance and the rapid development of large-scale models, the possibility of
programming using natural language in specified applications - referred to as
Application-Specific Natural Language (ASNL) - has emerged. ASNL exhibits
greater flexibility and freedom, which, in turn, leads to an increase in
computational complexity for parsing and a decrease in processing performance.
To tackle this issue, our paper proposes a design for an intermediate
representation (IR) that caters to ASNL and can uniformly process
transportation data into a graph data format, improving data-processing
performance. Experimental comparisons reveal that, in standard data query
operations, our proposed IR design achieves a speed improvement of over
forty times compared to direct use of standard XML-format data.
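As a loose illustration of the idea (the road-network schema and field names below are invented, not the paper's IR): converting XML records once into a graph, here an adjacency map, turns subsequent queries into dictionary lookups instead of repeated tree scans.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

XML = """<roads>
  <road from="A" to="B" length="3"/>
  <road from="B" to="C" length="2"/>
  <road from="A" to="C" length="6"/>
</roads>"""

def xml_to_graph(xml_text):
    # One-time pass: index each <road> record into an adjacency map.
    graph = defaultdict(dict)
    for road in ET.fromstring(xml_text).iter("road"):
        graph[road.get("from")][road.get("to")] = float(road.get("length"))
    return graph

graph = xml_to_graph(XML)
length_ab = graph["A"]["B"]  # O(1) lookup instead of re-scanning the XML tree
```

Amortized over many queries, the up-front conversion cost is what makes a graph-shaped IR pay off.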
A next-generation liquid xenon observatory for dark matter and neutrino physics
The nature of dark matter and properties of neutrinos are among the most pressing issues in contemporary particle physics. The dual-phase xenon time-projection chamber is the leading technology to cover the available parameter space for weakly interacting massive particles, while featuring extensive sensitivity to many alternative dark matter candidates. These detectors can also study neutrinos through neutrinoless double-beta decay and through a variety of astrophysical sources. A next-generation xenon-based detector will therefore be a true multi-purpose observatory to significantly advance particle physics, nuclear physics, astrophysics, solar physics, and cosmology. This review article presents the science cases for such a detector
System with Context-free Session Types
We study increasingly expressive type systems, from an extension
of the polymorphic lambda calculus with equirecursive types to
the higher-order polymorphic lambda calculus with
equirecursive types and context-free session types. Type equivalence is given
by a standard bisimulation defined over a novel labelled transition system for
types. Our system subsumes the contractive fragment
studied in the literature. Decidability results for type equivalence of the
various type languages are obtained from the translation of types into objects
of an appropriate computational model: finite-state automata, simple grammars
and deterministic pushdown automata. We show that type equivalence is decidable
for a significant fragment of the type language. We further propose a
message-passing, concurrent functional language equipped with the expressive
type language and show that it enjoys preservation and absence of runtime
errors for typable processes.
Comment: 38 pages, 13 figures
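For the finite-state fragment, type equivalence reduces to bisimilarity of deterministic labelled transition systems, which the sketch below checks with a simple worklist algorithm. The encoding of types as transition tables is an illustrative assumption; the context-free session types of the paper require the simple-grammar and pushdown-automata constructions it describes, not this regular-case check:

```python
def bisimilar(delta_a, start_a, delta_b, start_b):
    # Deterministic LTS bisimilarity: two states are bisimilar iff they
    # enable the same labels and their successors are pairwise bisimilar.
    seen = set()
    stack = [(start_a, start_b)]
    while stack:
        p, q = stack.pop()
        if (p, q) in seen:
            continue
        seen.add((p, q))
        if set(delta_a[p]) != set(delta_b[q]):
            return False
        for label, p_next in delta_a[p].items():
            stack.append((p_next, delta_b[q][label]))
    return True

# rec x. !int.x  versus a three-state unrolling of the same type
t1 = {"s": {"!int": "s"}}
t2 = {"u0": {"!int": "u1"}, "u1": {"!int": "u2"}, "u2": {"!int": "u0"}}
equiv = bisimilar(t1, "s", t2, "u0")  # True: both send int forever
```

The equirecursive view is what makes the unrolled and folded types interchangeable, exactly the equivalence the paper's bisimulation captures on its richer type language.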
The use of Algonquian-language dictionaries as ethnographic sources: a case study of Illinois legal culture and juridical practices at the turn of the 18th century
Abstract: Miami-Illinois is an Algonquian language that, in the 18th century, was spoken to the south of the Great Lakes. In the 19th century, forced removals and the reservation system led to cultural and linguistic fragmentation amongst Miami-Illinois speakers. Against this current, language revitalization efforts began in the mid-1990s, and, thanks to these, Miami-Illinois is again a spoken language. A wealth of documentation is now available through the Indigenous Languages Digital Archive. This thesis draws on this resource, in conjunction with other archival and published sources, to elicit an understanding of the legal culture of the people called Illinois by the French, particularly that of the Kaskaskias (kaahkaahkiaki), as it was in the early 18th century. Building on linguistic and historical sources, this work explores their jurispractice in relation to cases that have been preserved in the archives of the French overseas empire. Three points are addressed, namely: (i) how the Illinois (and Myaamia) thought about justice in the early 18th century; (ii) how the Miami-Illinois and French dictionaries can provide a new depth of understanding about this; and (iii) the limits to our ability to elicit abstract concepts from a fragmentary historical record.