12,068 research outputs found

    A systematic literature review on source code similarity measurement and clone detection: techniques, applications, and challenges

    Measuring and evaluating source code similarity is a fundamental software engineering activity that spans a broad range of applications, including but not limited to code recommendation and the detection of duplicate code, plagiarism, malware, and code smells. This paper presents a systematic literature review and meta-analysis of code similarity measurement and evaluation techniques to shed light on the existing approaches and their characteristics in different applications. We initially found over 10,000 articles by querying four digital libraries and ended up with 136 primary studies in the field. The studies were classified according to their methodology, programming languages, datasets, tools, and applications. A deep investigation reveals 80 software tools, working with eight different techniques in five application domains. Nearly 49% of the tools work on Java programs and 37% support C and C++, while many programming languages have no support at all. A noteworthy point was the existence of 12 datasets related to source code similarity measurement and duplicate code, of which only eight are publicly accessible. The lack of reliable datasets, empirical evaluations, hybrid methods, and attention to multi-paradigm languages are the main challenges in the field. Emerging applications of code similarity measurement concentrate on the development phase in addition to maintenance.
    Comment: 49 pages, 10 figures, 6 tables
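    The abstract does not name a concrete similarity metric, so the following is only a minimal sketch of one common token-based baseline in this space: Jaccard similarity over lexical token sets. The tokenizer and function names are illustrative assumptions, not artifacts of the surveyed studies.

```python
# Minimal sketch of token-based source code similarity (Jaccard over token sets).
# Illustrative only; not a tool or metric proposed by the surveyed papers.
import re
from typing import Set

TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|[^\s\w]")

def tokenize(source: str) -> Set[str]:
    """Split source code into a set of lexical tokens."""
    return set(TOKEN_RE.findall(source))

def jaccard_similarity(code_a: str, code_b: str) -> float:
    """Similarity in [0, 1]: size of token intersection over union."""
    a, b = tokenize(code_a), tokenize(code_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

if __name__ == "__main__":
    original = "def add(x, y):\n    return x + y"
    clone = "def add(a, b):\n    return a + b"  # renamed-identifier clone
    print(f"similarity = {jaccard_similarity(original, clone):.2f}")
```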

    Towards A Practical High-Assurance Systems Programming Language

    Writing correct and performant low-level systems code is a notoriously demanding job, even for experienced developers. To make matters worse, formally reasoning about its correctness properties introduces yet another level of complexity to the task and requires considerable expertise in both systems programming and formal verification. Without appropriate tools that provide abstraction and automation, development can be extremely costly due to the sheer complexity of these systems and the nuances within them. Cogent is designed to alleviate the burden on developers when writing and verifying systems code. It is a high-level functional language with a certifying compiler, which automatically proves the correctness of the compiled code and also provides a purely functional abstraction of the low-level program to the developer. Equational reasoning techniques can then be used to prove functional correctness properties of the program on top of this abstract semantics, which is notably less laborious than directly verifying the C code. To make Cogent a more approachable and effective tool for developing real-world systems, we further strengthen the framework by extending the core language and its ecosystem. Specifically, we enrich the language to allow users to control the memory representation of algebraic data types, while retaining the automatic proof via a data layout refinement calculus. We repurpose existing tools in a novel way and develop an intuitive foreign function interface, which provides users with a seamless experience when using Cogent in conjunction with native C. We augment the Cogent ecosystem with a property-based testing framework, which helps developers better understand the impact formal verification has on their programs and enables a progressive approach to producing high-assurance systems. Finally, we explore refinement type systems, which we plan to incorporate into Cogent for more expressiveness and better integration of systems programmers with the verification process.
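    The property-based testing framework mentioned above is not described in detail here; the following is only a hand-rolled sketch of the general idea (checking an implementation against a purely functional specification on many random inputs), assuming nothing about Cogent's actual API.

```python
# Hand-rolled property-based test: compare an implementation under test against a
# simple functional specification on many random inputs. This only illustrates the
# general idea behind property-based testing; it is not Cogent's framework.
import random

def spec_sum(xs):
    """Purely functional specification: fold over the list."""
    total = 0
    for x in xs:
        total += x
    return total

def impl_sum(xs):
    """Stand-in for a low-level implementation under test."""
    return sum(xs)

def check_property(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        assert impl_sum(xs) == spec_sum(xs), f"counterexample: {xs}"
    print(f"property held on {trials} random inputs")

if __name__ == "__main__":
    check_property()
```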

    Evaluation Methodologies in Software Protection Research

    Man-at-the-end (MATE) attackers have full control over the system on which the attacked software runs, and they try to break the confidentiality or integrity of assets embedded in the software. Both companies and malware authors want to prevent such attacks. This has driven an arms race between attackers and defenders, resulting in a plethora of different protection and analysis methods. However, it remains difficult to measure the strength of protections, because MATE attackers can reach their goals in many different ways and no universally accepted evaluation methodology exists. This survey systematically reviews the evaluation methodologies of papers on obfuscation, a major class of protections against MATE attacks. For 572 papers, we collected 113 aspects of their evaluation methodologies, ranging from sample set types and sizes, through sample treatment, to the measurements performed. We provide detailed insights into how the academic state of the art evaluates both the protections and the analyses thereon. In summary, there is a clear need for better evaluation methodologies. We identify nine challenges for software protection evaluations, which represent threats to the validity, reproducibility, and interpretation of research results in the context of MATE attacks.
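    For readers unfamiliar with the protections being evaluated, the toy sketch below shows one classic obfuscation idiom studied in this literature, an opaque predicate guarding an asset check. It is purely illustrative (real obfuscators transform binaries or compiler IR) and is not drawn from any surveyed tool; all names are hypothetical.

```python
# Toy example of an opaque predicate: a condition whose value is fixed (here,
# always True, because n*(n+1) is always even) but whose constancy is meant to be
# non-obvious to an analyst or a naive analysis. Purely illustrative.
import random

def opaquely_true(n: int) -> bool:
    """n*(n+1) is a product of consecutive integers, hence always even."""
    return (n * (n + 1)) % 2 == 0

def protected_check(license_key: str) -> bool:
    n = random.randint(1, 1 << 20)
    if opaquely_true(n):                 # always taken at runtime
        return license_key == "SECRET"   # hypothetical asset check
    # Dead branch inserted only to mislead reverse engineering.
    return len(license_key) % 7 == 3

if __name__ == "__main__":
    print(protected_check("SECRET"), protected_check("guess"))
```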

    Approximate Computing Survey, Part I: Terminology and Software & Hardware Approximation Techniques

    The rapid growth of demanding applications in domains applying multimedia processing and machine learning has marked a new era for edge and cloud computing. These applications involve massive data and compute-intensive tasks, and thus typical computing paradigms in embedded systems and data centers are stressed to meet the worldwide demand for high performance. Concurrently, the landscape of the semiconductor field over the last 15 years has established power as a first-class design concern. As a result, the computing systems community is forced to find alternative design approaches that facilitate high-performance and/or power-efficient computing. Among the examined solutions, Approximate Computing has attracted ever-increasing interest, with research works applying approximations across the entire traditional computing stack, i.e., at the software, hardware, and architectural levels. Over the last decade, a plethora of approximation techniques has emerged in software (programs, frameworks, compilers, runtimes, languages), hardware (circuits, accelerators), and architectures (processors, memories). The current article is Part I of our comprehensive survey on Approximate Computing: it reviews its motivation, terminology, and principles, and it classifies and presents the technical details of state-of-the-art software and hardware approximation techniques.
    Comment: Under review at ACM Computing Surveys
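    To make the notion of a software-level approximation technique concrete, the sketch below applies loop perforation, processing only a fraction of the input and accepting a small error in exchange for less work. The data and skip factor are illustrative assumptions, not taken from the surveyed papers.

```python
# Loop perforation: process only every k-th element and rescale the result,
# trading accuracy for roughly a 1/k reduction in work. A classic software
# approximation technique; this sketch is illustrative only.

def exact_mean(values):
    return sum(values) / len(values)

def perforated_mean(values, skip_factor=4):
    """Process only every skip_factor-th element (about 1/skip_factor of the work)."""
    sampled = values[::skip_factor]
    return sum(sampled) / len(sampled)

if __name__ == "__main__":
    data = [float(i % 97) for i in range(1_000_000)]
    exact = exact_mean(data)
    approx = perforated_mean(data, skip_factor=4)
    print(f"exact={exact:.4f} approx={approx:.4f} "
          f"relative error={abs(exact - approx) / exact:.2%}")
```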

    Eunomia: Enabling User-specified Fine-Grained Search in Symbolically Executing WebAssembly Binaries

    Although existing techniques have proposed automated approaches to alleviate the path explosion problem of symbolic execution, users still need to optimize symbolic execution by carefully applying various searching strategies. As existing approaches mainly support only coarse-grained global searching strategies, they cannot efficiently traverse complex code structures. In this paper, we propose Eunomia, a symbolic execution technique that allows users to specify local domain knowledge to enable fine-grained search. In Eunomia, we design an expressive DSL, Aes, that lets users precisely pinpoint local searching strategies for different parts of the target program. To further optimize local searching strategies, we design an interval-based algorithm that automatically isolates the context of variables for different local searching strategies, avoiding conflicts between local searching strategies for the same variable. We implement Eunomia as a symbolic execution platform targeting WebAssembly, which enables us to analyze applications that are written in various languages (such as C and Go) but can be compiled to WebAssembly. To the best of our knowledge, Eunomia is the first symbolic execution engine that supports the full feature set of the WebAssembly runtime. We evaluate Eunomia with a dedicated microbenchmark suite for symbolic execution and six real-world applications. Our evaluation shows that Eunomia accelerates bug detection in real-world applications by up to three orders of magnitude. According to the results of a comprehensive user study, users can significantly improve the efficiency and effectiveness of symbolic execution by writing a simple and intuitive Aes script. Besides verifying six known real-world bugs, Eunomia also detected two new zero-day bugs in a popular open-source project, Collections-C.
    Comment: Accepted by the ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA) 2023
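    The Aes DSL itself is not shown in the abstract. As a rough, hypothetical sketch of the underlying idea, the worklist scheduler below lets a user attach a local prioritization rule to a specific code region while a global strategy handles everything else; the names and structure are assumptions for illustration, not Eunomia's actual design.

```python
# Rough sketch of fine-grained search: a worklist scheduler in which user-supplied
# local strategies override the global one inside designated code regions.
# Names and structure are illustrative assumptions, not Eunomia's implementation.
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class State:
    priority: float
    pc: int = field(compare=False)      # program location (e.g., a wasm offset)
    depth: int = field(compare=False)

def global_strategy(state: State) -> float:
    return state.depth                  # default: shallow states first

def make_local_strategy(lo: int, hi: int, boost: float = -100.0):
    """Prefer states whose pc falls inside [lo, hi), e.g. a user-marked hot loop."""
    def strategy(state: State) -> float:
        return boost + state.depth if lo <= state.pc < hi else global_strategy(state)
    return strategy

def schedule(states, local_strategies):
    heap = []
    for s in states:
        prio = min((strat(s) for strat in local_strategies),
                   default=global_strategy(s))
        heapq.heappush(heap, State(prio, s.pc, s.depth))
    while heap:
        yield heapq.heappop(heap)

if __name__ == "__main__":
    pending = [State(0, pc=10, depth=3), State(0, pc=250, depth=5), State(0, pc=40, depth=1)]
    local = [make_local_strategy(200, 300)]   # user marks pc range 200-300 as interesting
    for st in schedule(pending, local):
        print(f"explore pc={st.pc} depth={st.depth} priority={st.priority}")
```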

    DATA AUGMENTATION FOR SYNTHETIC APERTURE RADAR USING ALPHA BLENDING AND DEEP LAYER TRAINING

    Human-based object detection in synthetic aperture radar (SAR) imagery is complex and technical, laboriously slow yet time-critical: the perfect application for machine learning (ML). Training an ML network for object detection requires very large image datasets with embedded objects that are accurately and precisely labeled. Unfortunately, no such SAR datasets exist. Therefore, this paper proposes a method to synthesize wide field of view (FOV) SAR images by combining two existing datasets: SAMPLE, which is composed of both real and synthetic single-object chips, and MSTAR Clutter, which is composed of real wide-FOV SAR images. Synthetic objects are extracted from SAMPLE using threshold-based segmentation before being alpha-blended onto patches from MSTAR Clutter. To validate the novel synthesis method, individual object chips are created and classified using a simple convolutional neural network (CNN); testing is performed against the measured SAMPLE subset. A novel technique is also developed to investigate training activity in deep layers. The proposed data augmentation technique produces a 17% increase in the accuracy of measured SAR image classification. This improvement shows that any residual artifacts from segmentation and blending do not negatively affect ML, which is promising for future use in wide-area SAR synthesis.
    Outstanding Thesis
    Major, United States Air Force
    Approved for public release. Distribution is unlimited.
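    A minimal NumPy sketch of the two synthesis steps described above, threshold-based segmentation of an object chip followed by alpha blending onto a clutter patch, is given below. Array shapes, the threshold, and the blending weight are placeholders, not the thesis' actual parameters.

```python
# Minimal sketch of the described augmentation steps: threshold-based segmentation
# of a single-object chip, then alpha blending onto a clutter patch. Shapes,
# threshold, and alpha are illustrative assumptions.
import numpy as np

def segment_object(chip: np.ndarray, threshold: float) -> np.ndarray:
    """Binary mask of pixels whose magnitude exceeds the threshold."""
    return (chip > threshold).astype(chip.dtype)

def alpha_blend(clutter_patch: np.ndarray, chip: np.ndarray,
                mask: np.ndarray, alpha: float = 0.8) -> np.ndarray:
    """Blend the masked object into the clutter: out = a*object + (1-a)*clutter."""
    blended = clutter_patch.copy()
    obj = mask.astype(bool)
    blended[obj] = alpha * chip[obj] + (1.0 - alpha) * clutter_patch[obj]
    return blended

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clutter = rng.rayleigh(scale=1.0, size=(128, 128))   # stand-in clutter patch
    chip = rng.rayleigh(scale=1.0, size=(128, 128))
    chip[48:80, 48:80] += 5.0                            # bright "target" region
    mask = segment_object(chip, threshold=3.0)
    synthetic = alpha_blend(clutter, chip, mask, alpha=0.8)
    print(synthetic.shape, float(synthetic.max()))
```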

    Solidification behavior of high nitrogen stainless steels and establishment of a one-dimensional heat transfer framework

    Duplex stainless steel (DSS) has excellent corrosion resistance and mechanical properties due to its dual-phase structure. The solidification process is key to determining the structure of the material, and an in-depth investigation of solidification can help us better understand its properties. The melting and solidification processes of S32101 DSS were investigated using high-temperature confocal microscopy (HTCM).
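    The abstract does not detail the one-dimensional heat transfer framework named in the title, so the following is only a generic sketch of the kind of model such a framework typically solves: an explicit finite-difference scheme for 1-D transient conduction. The material properties, grid, and boundary temperatures are placeholders, not values from the thesis.

```python
# Generic explicit (FTCS) finite-difference solver for 1-D transient heat conduction,
# dT/dt = alpha * d2T/dx2. A placeholder illustration of a 1-D heat transfer
# framework; properties, grid, and boundary values are not from the thesis.
import numpy as np

def solve_1d_conduction(alpha, length, n_nodes, dt, steps, T_init, T_left, T_right):
    dx = length / (n_nodes - 1)
    r = alpha * dt / dx**2
    assert r <= 0.5, "explicit scheme unstable for r > 0.5"
    T = np.full(n_nodes, T_init, dtype=float)
    T[0], T[-1] = T_left, T_right           # fixed-temperature boundaries
    for _ in range(steps):
        T[1:-1] = T[1:-1] + r * (T[2:] - 2.0 * T[1:-1] + T[:-2])
    return T

if __name__ == "__main__":
    # Placeholder values roughly in the range of steel thermal diffusivity.
    profile = solve_1d_conduction(alpha=5e-6, length=0.01, n_nodes=51, dt=0.001,
                                  steps=2000, T_init=1700.0,
                                  T_left=1450.0, T_right=1700.0)
    print(profile[:5])
```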

    Gaussian Control Barrier Functions: A Gaussian Process-based Approach to Safety for Robots

    In recent years, the need for safety of autonomous and intelligent robots has increased. Today, as robots are increasingly deployed in close proximity to humans, there is an exigency for safety since human lives may be at risk, e.g., with self-driving vehicles or surgical robots. The objective of this thesis is to present a safety framework for dynamical systems that leverages tools from control theory and machine learning. More formally, the thesis presents a data-driven framework for designing safety function candidates which ensure properties of forward invariance. The potential benefits of the results presented in this thesis are expected to help applications such as safe exploration, collision avoidance, manipulation tasks, and planning, to name a few. We utilize Gaussian processes (GPs) to place a prior on the desired safety function candidate, which is to be utilized as a control barrier function (CBF). The resultant formulation, called Gaussian CBFs, resides in a reproducing kernel Hilbert space. A key concept behind Gaussian CBFs is the incorporation of both safety belief and safety uncertainty, which former barrier function formulations did not consider. This is achieved by using robust posterior estimates from a GP, where the posterior mean and variance serve as surrogates for the safety belief and uncertainty, respectively. We synthesize safe controllers by framing a convex optimization problem in which the kernel-based representation of GPs allows computing the derivatives analytically in closed form. Finally, in addition to the theoretical and algorithmic frameworks in this thesis, we rigorously test our methods in hardware on a quadrotor platform. The platform used is a Crazyflie 2.1, a versatile palm-sized quadrotor. We provide our insights and detailed discussions on the hardware implementations, which will be useful for large-scale deployment of the techniques presented in this dissertation.
    Ph.D.
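    The sketch below illustrates the core idea in a stripped-down 1-D setting: a GP posterior over a safety function supplies both a belief (the mean) and an uncertainty (the standard deviation), and a conservative barrier value of mean minus a scaled standard deviation enters an affine constraint on the control input. The kernel, training data, dynamics, and the single-variable analytic "QP" are illustrative assumptions, not the thesis' actual formulation.

```python
# Sketch of the idea behind Gaussian CBFs: a GP posterior over a safety function
# gives a belief (mean) and an uncertainty (std); a conservative barrier uses
# mean - kappa*std inside a CBF-style constraint. All numbers are placeholders.
import numpy as np

def rbf(XA, XB, length=0.5, var=1.0):
    d = XA[:, None] - XB[None, :]
    return var * np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    k_star = rbf(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mu = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)
    var = rbf(x_query, x_query).diagonal() - np.sum(v * v, axis=0)
    return mu, np.sqrt(np.maximum(var, 0.0))

def safe_control(u_nom, a, b):
    """Minimally modify u_nom subject to the affine CBF constraint a*u + b >= 0."""
    return u_nom if a * u_nom + b >= 0 else -b / a

if __name__ == "__main__":
    x_train = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
    y_train = 1.0 - x_train**2                 # "measured" safety values: safe near 0
    mu, std = gp_posterior(x_train, y_train, np.array([0.9]))
    h_conservative = mu[0] - 2.0 * std[0]      # belief minus scaled uncertainty
    # Toy single-integrator CBF condition: dh/dt + gamma*h >= 0 with dh/dt = grad_h*u.
    grad_h, gamma, u_nominal = -1.8, 1.0, 1.0
    u = safe_control(u_nominal, a=grad_h, b=gamma * h_conservative)
    print(f"h={h_conservative:.3f}, safe input u={u:.3f} (nominal {u_nominal})")
```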

    A Methodology to Enable Concurrent Trade Space Exploration of Space Campaigns and Transportation Systems

    Space exploration campaigns detail the ways and means to achieve the goals of our human spaceflight programs. Significant strategic, financial, and programmatic investments over long timescales are required to execute them, and these investments must therefore be justified to decision makers. To enable an informed down-selection, many alternative campaign designs are presented at the conceptual level, each as a set and sequence of individual missions that meets the technical and programmatic goals and constraints of the campaign. Each mission is executed by in-space transportation systems, which deliver either crew or cargo payloads to various destinations. The design of each of these transportation systems is highly dependent on campaign goals, and even small changes in subsystem design parameters can prompt significant changes in the overall campaign strategy. However, the current state of the art describes campaign and vehicle design processes that are generally performed independently, which limits the ability to assess these coupled impacts. The objective of this research is to establish a methodology for space exploration campaign design that represents transportation systems as collections of subsystems and integrates their design process to enable concurrent trade space exploration. More specifically, the goal is to identify existing campaign and vehicle design processes to use as a foundation for improvement and eventual integration. In the past two decades, researchers have adapted terrestrial logistics and supply chain optimization processes to the space campaign design problem by accounting for the challenges that accompany space travel. Fundamentally, a space campaign is formulated as a network design problem in which destinations, such as orbits or the surfaces of planetary bodies, are represented as nodes, with the routes between them as arcs. The objective of this design problem is to optimize the flow of commodities within the network using the available transportation systems. Given the dynamic nature and the number of commodities involved, each campaign can be modeled as a time-expanded, generalized multi-commodity network flow and solved using a mixed-integer programming algorithm. To address the challenge of modeling complex concepts of operations (ConOps), this formulation was extended to include paths as sets of arcs, further enabling the inclusion of vehicle stacks and payload transfers in the campaign optimization process. Further, given this research's focus on transportation systems, the typically fixed orbital nodes in the logistics network are modified to represent ranges of orbits categorized by their characteristic energy. This enables the vehicle design process to vary each orbit in the mission as needed to find the best one for each vehicle. By extension, once the processes are integrated, the arc costs of dV and dT are updated at each iteration. Once campaign goals and external constraints are included, the formulated campaign design process generates alternatives at the conceptual level, where each one identifies the optimal set and sequence of missions to perform. Representing transportation systems as collections of subsystems introduces challenges in the design of each vehicle, with a high degree of coupling between the subsystems as well as with the driving mission. Additionally, the sizing of each subsystem can have many inputs and outputs linked across the system, resulting in a complex multidisciplinary analysis and optimization problem.
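    As a stripped-down sketch of this network formulation, the example below solves a single-commodity, static min-cost flow from Earth to the lunar surface with SciPy. The thesis' actual model is a time-expanded, generalized multi-commodity flow solved as a mixed-integer program; the nodes, arcs, and costs here are placeholders.

```python
# Stripped-down sketch of the network formulation: a single-commodity, static
# min-cost flow from Earth to the lunar surface. Nodes, arcs, and costs are
# illustrative placeholders, not the thesis' time-expanded MILP.
from scipy.optimize import linprog

nodes = ["Earth", "LEO", "NRHO", "LunarSurface"]
# (tail, head, cost): cost stands in for an aggregate arc cost (e.g., propellant).
arcs = [("Earth", "LEO", 9.5), ("LEO", "NRHO", 3.9),
        ("LEO", "LunarSurface", 6.3), ("NRHO", "LunarSurface", 2.7)]

demand = {"Earth": -1.0, "LunarSurface": 1.0}   # ship one unit of cargo

c = [cost for _, _, cost in arcs]
# Conservation at each node: inflow - outflow = demand.
A_eq = [[(-1.0 if tail == n else 1.0 if head == n else 0.0)
         for tail, head, _ in arcs] for n in nodes]
b_eq = [demand.get(n, 0.0) for n in nodes]

res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * len(arcs), method="highs")
for (tail, head, _), flow in zip(arcs, res.x):
    if flow > 1e-9:
        print(f"{tail} -> {head}: {flow:.1f} unit(s)")
print(f"total arc cost: {res.fun:.2f}")
```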
    By leveraging the ontology within the Dynamic Rocket Equation Tool (DYREQT), this problem can be solved rapidly by defining each system as a hierarchy of elements and subelements, the latter corresponding to external subsystem-level sizing models. DYREQT also enables the construction of individual missions as a series of events, which can be directly driven and generated by the mission set found by the campaign optimization process. This process produces sized vehicles iteratively using the mission input, the subsystem-level sizing models, and the ideal rocket equation. A literature review of campaign and vehicle design processes identifies the different pieces of the overall methodology, but not its structure. The specific iterative solver, the corresponding convergence criteria, and the initialization scheme are therefore the primary areas of experimentation in this thesis. Using NASA's reference 3-element Human Landing System campaign, the results of these experiments show that the methodology performs best when the vehicle sizing and synthesis process initializes the iteration and the initial path guess minimizes dV. Further, a converged solution is found faster using nonlinear Gauss-Seidel fixed-point iteration rather than Jacobi iteration, together with a set of convergence criteria that covers vehicle masses and mission data. To show improvement over the state of the art, and how it enables concurrent trade studies, this methodology is applied at scale in a demonstration using NASA's Design Reference Architecture 5.0. The LH2 nuclear thermal propulsion (NTP) option is traded against NH3 and H2O at the vehicle level to show the impacts of alternative propellants on vehicle sizing and campaign strategy. Martian surface stay duration is traded at the campaign level through two options: long-stay and short-stay. The methodology produced four alternative campaigns over the course of two weeks, providing data about the launch and aggregation strategy, mission profiles, high-level figures of merit, and subsystem-level vehicle sizes for each alternative. As expected, with their lower specific impulses, the alternative NTP propellants showed significant growth in the overall mass required to execute each campaign, which in turn was reflected in the number of drop tanks and launches. Further, the short-stay campaign option required an overall mass similar to that of its long-stay counterpart, but at higher overall cost even though fewer elements were required. Both trade studies supported the overall hypothesis that integrating the campaign and vehicle design processes addresses the coupling between them and directly shows the impacts of their sensitivities on each other. As a result, the research objective was fulfilled by producing a methodology that addresses the key gaps identified in the current state of the art.
    Ph.D.
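    The sketch below illustrates the kind of fixed-point sizing loop described above: the propellant mass from the ideal rocket equation depends on the vehicle's inert mass, and the inert mass in turn scales with propellant, so the two are iterated to convergence. The payload, specific impulse, and inert-mass fraction are placeholder numbers, and the model is far simpler than DYREQT's element/subelement hierarchy.

```python
# Minimal sketch of iterative vehicle sizing with the ideal rocket equation:
# propellant depends on burnout mass, inert (tank/structure) mass depends on
# propellant, so the two are resolved by fixed-point iteration until the masses
# converge. Numbers and the inert-mass model are illustrative, not DYREQT's.
import math

G0 = 9.80665  # standard gravity, m/s^2

def size_stage(payload_kg, delta_v_mps, isp_s, inert_fraction=0.12,
               tol_kg=1e-3, max_iter=100):
    """Return (propellant_kg, inert_kg) for a single stage, or raise if diverged."""
    mass_ratio = math.exp(delta_v_mps / (G0 * isp_s))
    propellant, inert = 0.0, 0.0
    for _ in range(max_iter):
        final_mass = payload_kg + inert                   # burnout mass
        new_propellant = final_mass * (mass_ratio - 1.0)  # ideal rocket equation
        new_inert = inert_fraction * new_propellant       # simple inert-mass model
        if abs(new_propellant - propellant) < tol_kg:
            return new_propellant, new_inert
        propellant, inert = new_propellant, new_inert     # Gauss-Seidel-style update
    raise RuntimeError("sizing iteration did not converge")

if __name__ == "__main__":
    # Placeholder case: 20 t payload, 4 km/s maneuver, NTP-like Isp of 900 s.
    prop, inert = size_stage(payload_kg=20_000.0, delta_v_mps=4_000.0, isp_s=900.0)
    print(f"propellant = {prop/1000:.1f} t, inert = {inert/1000:.1f} t")
```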