11,757 research outputs found

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that spans a broad range of applications, including code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics across different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deeper investigation reveals 80 software tools, working with eight different techniques on five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. Notably, we identified 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and attention to multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to the maintenance phase.
    Comment: 49 pages, 10 figures, 6 tables
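
    To make one of the surveyed technique families concrete, the sketch below measures similarity between two snippets with a token-set Jaccard index, a simplified stand-in for token-based clone detection. It is illustrative only, not taken from any tool in the review; the tokenizer choice and the absence of identifier normalization are simplifications for the example.

        # Minimal token-based code-similarity sketch (illustrative only;
        # not a tool from the survey). Lexes two snippets and compares
        # their token sets with the Jaccard index.
        import io
        import tokenize

        def tokens(source: str) -> set[tuple[int, str]]:
            """Lex Python source into a set of (token type, token text) pairs."""
            layout = {tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
                      tokenize.DEDENT, tokenize.ENDMARKER}
            keep = set()
            for tok in tokenize.generate_tokens(io.StringIO(source).readline):
                if tok.type not in layout:  # drop pure layout tokens
                    keep.add((tok.type, tok.string))
            return keep

        def jaccard(a: str, b: str) -> float:
            ta, tb = tokens(a), tokens(b)
            return len(ta & tb) / len(ta | tb) if ta | tb else 1.0

        snippet_a = "def add(a, b):\n    return a + b\n"
        snippet_b = "def plus(x, y):\n    return x + y\n"
        print(f"token similarity: {jaccard(snippet_a, snippet_b):.2f}")

    Real clone detectors typically normalize identifiers and work on token sequences or ASTs at scale; this sketch only shows the core comparison step.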

    Towards A Practical High-Assurance Systems Programming Language

    Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make matters worse, formally reasoning about its correctness properties introduces yet another level of complexity to the task: it requires considerable expertise in both systems programming and formal verification. Without appropriate tools that provide abstraction and automation, development can be extremely costly due to the sheer complexity of the systems and the nuances within them. Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code. To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof via a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users with a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally, we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process.
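
    Purely as a flavor of the property-based testing workflow mentioned above, and emphatically not Cogent's own framework, here is a minimal sketch in Python using the hypothesis library that checks a low-level-style implementation against a purely functional specification; all function names are invented for the example.

        # Hedged sketch of property-based testing a "low-level" implementation
        # against a purely functional specification, in the spirit of checking
        # systems code against its abstract semantics. Requires the third-party
        # `hypothesis` library (pip install hypothesis); not Cogent tooling.
        from hypothesis import given, strategies as st

        def spec_sum(xs: list[int]) -> int:
            """Purely functional specification."""
            return sum(xs)

        def impl_sum(xs: list[int]) -> int:
            """'Low-level' implementation with explicit looping state."""
            acc, i = 0, 0
            while i < len(xs):
                acc += xs[i]
                i += 1
            return acc

        @given(st.lists(st.integers()))
        def test_impl_refines_spec(xs):
            # Refinement property: the implementation agrees with the spec.
            assert impl_sum(xs) == spec_sum(xs)

        if __name__ == "__main__":
            test_impl_refines_spec()  # hypothesis runs many generated cases
            print("all generated cases passed")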

    Evaluation Methodologies in Software Protection Research

    Man-at-the-end (MATE) attackers have full control over the system on which the attacked software runs, and they try to break the confidentiality or integrity of assets embedded in the software. Both companies and malware authors want to prevent such attacks. This has driven an arms race between attackers and defenders, resulting in a plethora of different protection and analysis methods. However, it remains difficult to measure the strength of protections, because MATE attackers can reach their goals in many different ways and a universally accepted evaluation methodology does not exist. This survey systematically reviews the evaluation methodologies of papers on obfuscation, a major class of protections against MATE attacks. For 572 papers, we collected 113 aspects of their evaluation methodologies, ranging from sample set types and sizes, through sample treatment, to the measurements performed. We provide detailed insights into how the academic state of the art evaluates both the protections and the analyses applied to them. In summary, there is a clear need for better evaluation methodologies. We identify nine challenges for software protection evaluations, which represent threats to the validity, reproducibility, and interpretation of research results in the context of MATE attacks.
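
    For readers unfamiliar with the protections being evaluated, a one-construct illustration may help: the opaque predicate, a classic obfuscation whose outcome is fixed by construction yet non-obvious to an analyzer. The Python rendering below is generic textbook material, not an artifact from any of the 572 surveyed papers.

        # Illustrative sketch of one classic MATE protection studied in the
        # obfuscation literature: an opaque predicate, i.e. a condition whose
        # value is constant (here: always True) but requires arithmetic
        # reasoning to prove. Not drawn from any surveyed paper.

        def opaque_true(x: int) -> bool:
            # For any integer x, x*x mod 4 is 0 or 1, so it never equals 2;
            # the test below is therefore always True, yet a static analyzer
            # must reason about the arithmetic identity to see that.
            return (x * x) % 4 != 2

        def protected(secret: int, x: int) -> int:
            if opaque_true(x):              # real path, always executed
                return secret * 2
            return secret ^ 0xDEADBEEF      # dead decoy path, never executed

        print(protected(21, 7))  # -> 42, regardless of x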

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    The rapid growth of demanding applications in domains such as multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus, typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, over the last 15 years the semiconductor field has established power as a first-class design concern. As a result, the computing-systems community has been forced to find alternative design approaches that facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted ever-increasing interest, with research applying approximations across the entire traditional computing stack, i.e., at the software, hardware, and architectural levels. Over the last decade, a plethora of approximation techniques has emerged in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). This article is Part I of our comprehensive survey on Approximate Computing: it reviews the field's motivation, terminology, and principles, and it classifies and presents the technical details of state-of-the-art software and hardware approximation techniques.
    Comment: Under review at ACM Computing Surveys
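
    As a concrete taste of one software-level approximation technique of the kind this survey classifies, the sketch below applies loop perforation: visit only a fraction of the loop iterations and accept a bounded error in exchange for speed. The data and stride parameter are invented for illustration.

        # Sketch of loop perforation, a well-known software approximation
        # technique: skip a fraction of loop iterations and rescale,
        # trading accuracy for performance.

        def mean_exact(xs: list[float]) -> float:
            return sum(xs) / len(xs)

        def mean_perforated(xs: list[float], stride: int = 4) -> float:
            # Visit only every `stride`-th element: roughly stride-times
            # fewer iterations, at the cost of a sampling error.
            sampled = xs[::stride]
            return sum(sampled) / len(sampled)

        data = [float(i % 100) for i in range(1_000_000)]
        exact = mean_exact(data)
        approx = mean_perforated(data, stride=4)
        print(f"exact={exact:.3f} approx={approx:.3f} "
              f"relative error={abs(exact - approx) / exact:.2%}")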

    Graph Neural Networks For Mapping Variables Between Programs -- Extended Version

    Automated program analysis is a pivotal research domain in many areas of Computer Science, Formal Methods and Artificial Intelligence in particular. Due to the undecidability of program equivalence, comparing two programs is highly challenging. Typically, comparing two programs requires a relation between their sets of variables. Thus, mapping variables between two programs is useful for a panoply of tasks such as program equivalence checking, program analysis, program repair, and clone detection. In this work, we propose using graph neural networks (GNNs) to map the sets of variables between two programs based on their abstract syntax trees (ASTs). To demonstrate the strength of these variable mappings, we present three use cases on the task of program repair: fixing well-studied and recurrent bugs made by novice programmers in introductory programming assignments (IPAs). Experimental results on a dataset of 4166 pairs of incorrect/correct programs show that our approach correctly maps 83% of the evaluation dataset. Moreover, our experiments show that the current state of the art in program repair, which depends heavily on the programs' structure, can only repair about 72% of the incorrect programs, whereas our approach, based solely on variable mappings, can repair around 88.5%.
    Comment: Extended version of "Graph Neural Networks For Mapping Variables Between Programs", accepted at ECAI 2023. GitHub: https://github.com/pmorvalho/ecai23-GNNs-for-mapping-variables-between-programs. 11 pages, 5 figures, 4 tables, and 3 listings
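
    To make the variable-mapping task concrete, though emphatically not the paper's GNN method, the sketch below computes a crude mapping between two toy Python programs by comparing the AST parent contexts in which each assigned variable occurs.

        # Simplified, non-GNN sketch of the variable-mapping task: profile
        # each assigned variable by the AST parent-node types it occurs
        # under, then match variables across programs by profile overlap.
        # The paper learns this mapping with GNNs over the full ASTs.
        import ast
        from collections import Counter

        def var_contexts(source: str) -> dict[str, Counter]:
            """For each assigned variable, count the AST parent-node types around it."""
            tree = ast.parse(source)
            assigned = {n.id for n in ast.walk(tree)
                        if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Store)}
            contexts = {name: Counter() for name in assigned}
            for parent in ast.walk(tree):
                for child in ast.iter_child_nodes(parent):
                    if isinstance(child, ast.Name) and child.id in assigned:
                        contexts[child.id][type(parent).__name__] += 1
            return contexts

        def similarity(ca: Counter, cb: Counter) -> float:
            """Multiset Jaccard overlap between two context profiles."""
            union = sum((ca | cb).values())
            return sum((ca & cb).values()) / union if union else 0.0

        def map_variables(src_a: str, src_b: str) -> dict[str, str]:
            ctx_a, ctx_b = var_contexts(src_a), var_contexts(src_b)
            return {va: max(ctx_b, key=lambda vb: similarity(ca, ctx_b[vb]))
                    for va, ca in ctx_a.items()}

        prog_a = "n = int(input())\ntotal = 0\nfor i in range(n):\n    total += i\nprint(total)\n"
        prog_b = "m = int(input())\ns = 0\nfor j in range(m):\n    s += j\nprint(s)\n"
        print(map_variables(prog_a, prog_b))  # maps n->m, total->s, i->j

    A learned GNN mapping generalizes this idea by propagating information across the whole AST instead of relying on local parent counts.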

    IR Design for Application-Specific Natural Language: A Case Study on Traffic Data

    In the realm of software applications for the transportation industry, Domain-Specific Languages (DSLs) have enjoyed widespread adoption due to their ease of use and various other benefits. With the ceaseless progress in computer performance and the rapid development of large-scale models, it has become possible to program specific applications in natural language, referred to as Application-Specific Natural Language (ASNL). ASNL offers greater flexibility and freedom, which in turn increases the computational complexity of parsing and decreases processing performance. To tackle this issue, our paper proposes a design for an intermediate representation (IR) that caters to ASNL and uniformly converts transportation data into a graph data format, thereby improving data-processing performance. Experimental comparisons reveal that, for standard data query operations, our proposed IR design achieves a speedup of more than forty times compared with the direct use of standard XML-format data.
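
    The abstract does not spell out the IR itself, so the following is a hypothetical sketch of the underlying idea only: normalize XML transportation records into an adjacency-list graph once, so that subsequent queries are dictionary lookups rather than repeated XML scans. The schema and attribute names are invented.

        # Hypothetical sketch of graph-normalizing traffic data (invented
        # schema; not the paper's IR). One parsing pass builds the graph;
        # every later query avoids touching the XML again.
        import xml.etree.ElementTree as ET
        from collections import defaultdict

        RAW = """
        <roads>
          <segment src="A" dst="B" travel_time="5"/>
          <segment src="B" dst="C" travel_time="3"/>
          <segment src="A" dst="C" travel_time="9"/>
        </roads>
        """

        def build_graph(xml_text: str) -> dict[str, list[tuple[str, int]]]:
            graph = defaultdict(list)
            for seg in ET.fromstring(xml_text).iter("segment"):
                graph[seg.get("src")].append(
                    (seg.get("dst"), int(seg.get("travel_time"))))
            return graph

        graph = build_graph(RAW)  # one-time normalization pass
        print(graph["A"])         # queries are now plain lookups: [('B', 5), ('C', 9)]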

    A next-generation liquid xenon observatory for dark matter and neutrino physics

    The nature of dark matter and the properties of neutrinos are among the most pressing issues in contemporary particle physics. The dual-phase xenon time-projection chamber is the leading technology to cover the available parameter space for weakly interacting massive particles, while featuring extensive sensitivity to many alternative dark matter candidates. These detectors can also study neutrinos through neutrinoless double-beta decay and through a variety of astrophysical sources. A next-generation xenon-based detector will therefore be a true multi-purpose observatory to significantly advance particle physics, nuclear physics, astrophysics, solar physics, and cosmology. This review article presents the science cases for such a detector.

    System $F^\mu_\omega$ with Context-free Session Types

    We study increasingly expressive type systems, from $F^\mu$, an extension of the polymorphic lambda calculus with equirecursive types, to $F^{\mu;}_\omega$, the higher-order polymorphic lambda calculus with equirecursive types and context-free session types. Type equivalence is given by a standard bisimulation defined over a novel labelled transition system for types. Our system subsumes the contractive fragment of $F^\mu_\omega$ as studied in the literature. Decidability results for type equivalence of the various type languages are obtained by translating types into objects of an appropriate computational model: finite-state automata, simple grammars, and deterministic pushdown automata. We show that type equivalence is decidable for a significant fragment of the type language. We further propose a message-passing, concurrent functional language equipped with this expressive type language and show that it enjoys preservation and absence of runtime errors for typable processes.
    Comment: 38 pages, 13 figures
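
    For a taste of bisimulation-based type equivalence, far simpler than the paper's setting, the sketch below decides equivalence of plain equirecursive types, i.e. the regular fragment corresponding to finite-state automata; the paper's context-free session types require simple grammars and deterministic pushdown automata instead. The encoding and the memoized coinductive algorithm (in the style of Amadio and Cardelli) are standard textbook material, not taken from the paper.

        # Toy bisimulation-style equivalence for *plain* equirecursive types
        # (regular fragment only). Types are tuples:
        #   ('int',) | ('arrow', t1, t2) | ('var', x) | ('mu', x, t)
        # assumed closed, contractive, and with distinctly named binders.

        def subst(t, name, repl):
            kind = t[0]
            if kind == 'var':
                return repl if t[1] == name else t
            if kind == 'arrow':
                return ('arrow', subst(t[1], name, repl), subst(t[2], name, repl))
            if kind == 'mu':
                return t if t[1] == name else ('mu', t[1], subst(t[2], name, repl))
            return t  # base types carry no variables

        def unfold(t):
            # mu X. T  ==>  T[mu X. T / X]  (terminates for contractive types)
            while t[0] == 'mu':
                t = subst(t[2], t[1], t)
            return t

        def equiv(t1, t2, seen=None):
            # Memoized coinductive check: a revisited pair lies on a cycle
            # of the candidate bisimulation and counts as equal.
            seen = set() if seen is None else seen
            if (t1, t2) in seen:
                return True
            seen.add((t1, t2))
            u1, u2 = unfold(t1), unfold(t2)
            if u1[0] != u2[0]:
                return False
            if u1[0] == 'arrow':
                return equiv(u1[1], u2[1], seen) and equiv(u1[2], u2[2], seen)
            return u1 == u2

        # mu X. int -> X  versus  mu Y. int -> (int -> Y): same infinite tree
        a = ('mu', 'X', ('arrow', ('int',), ('var', 'X')))
        b = ('mu', 'Y', ('arrow', ('int',), ('arrow', ('int',), ('var', 'Y'))))
        print(equiv(a, b))  # True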

    The Use of Algonquian-Language Dictionaries as Ethnographic Sources: A Case Study of Illinois Legal Culture and Legal Practices at the Turn of the Eighteenth Century

    Abstract: Miami-Illinois is an Algonquian language that, in the 18th century, was spoken to the south of the Great Lakes. In the 19th century, forced removals and the reservation system led to cultural and linguistic fragmentation among Miami-Illinois speakers. Against this current, language revitalization efforts began in the mid-1990s, and thanks to them Miami-Illinois is once again a spoken language. A wealth of documentation is now available through the Indigenous Languages Digital Archive. This thesis draws on that resource, in conjunction with other archival and published sources, to elicit an understanding of the legal culture of the people the French called Illinois, particularly the Kaskaskias (kaahkaahkiaki), as it was in the early 18th century. Building on linguistic and historical sources, this work explores their jurispractice in relation to cases preserved in the archives of the French overseas empire. Three points are addressed, namely: (i) how the Illinois (and Myaamia) thought about justice in the early 18th century; (ii) how the Jesuit Miami-Illinois and French dictionaries can provide a new depth of understanding about this; and (iii) the limits of our ability to elicit abstract concepts from a fragmentary historical record.