876 research outputs found
Recommended from our members
Uncovering Features in Behaviorally Similar Programs
The detection of similar code can support many so ware engineering tasks such as program understanding and program classification. Many excellent approaches have been proposed to detect programs having similar syntactic features. However, these approaches are unable to identify programs dynamically or statistically close to each other, which we call behaviorally similar programs. We believe the detection of behaviorally similar programs can enhance or even automate the tasks relevant to program classification. In this thesis, we will discuss our current approaches to identify programs having similar behavioral features in multiple perspectives.
We first discuss how to detect programs having similar functionality. While the definition of a programâs functionality is undecidable, we use inputs and outputs (I/Os) of programs as the proxy of their functionality. We then use I/Os of programs as a behavioral feature to detect which programs are functionally similar: two programs are functionally similar if they share similar inputs and outputs. This approach has been studied and developed in the C language to detect functionally equivalent programs having equivalent I/Os. Nevertheless, some natural problems in Object Oriented languages, such as input generation and comparisons between application-specific data types, hinder the development of this approach. We propose a new technique, in-vivo detection, which uses existing and meaningful inputs to drive applications systematically and then applies a novel similarity model considering both inputs and outputs of programs, to detect functionally similar programs. We develop the tool, HitoshiIO, based on our in-vivo detection. In the subjects that we study, HitoshiIO correctly detect 68.4% of functionally similar programs, where its false positive rate is only 16.6%.
In addition to functional I/Os of programs, we attempt to discover programs having similar execution behavior. Again, the execution behavior of a program can be undecidable, so we use instructions executed at run-time as a behavioral feature of a program. We create DyCLINK, which observes program executions and encodes them in dynamic instruction graphs. A vertex in a dynamic instruction graph is an instruction and an edge is a type of dependency between two instructions. The problem to detect which programs have similar executions can then be reduced to a problem of solving inexact graph isomorphism. We propose a link analysis based algorithm, LinkSub, which vectorizes each dynamic instruction graph by the importance of every instruction, to solve this graph isomorphism problem efficiently. In a K Nearest Neighbor (KNN) based program classification experiment, DyCLINK achieves 90 + % precision.
Because HitoshiIO and DyCLINK both rely on dynamic analysis to expose program behavior, they have better capability to locate and search for behaviorally similar programs than traditional static analysis tools. However, they suffer from some common problems of dynamic analysis, such as input generation and run-time overhead. These problems may make our approaches challenging to scale. Thus, we create the system, Macneto, which integrates static analysis with machine topic modeling and deep learning to approximate program behaviors from their binaries without truly executing programs. In our deobfuscation experiments considering two commercial obfuscators that alter lexical information and syntax in programs, Macneto achieves 90 + % precision, where the groundtruth is that the behavior of a program before and after obfuscation should be the same.
In this thesis, we offer a more extensive view of similar programs than the traditional definitions. While the traditional definitions of similar programs mostly use static features, such as syntax and lexical information, we propose to leverage the power of dynamic analysis and machine learning models to trace/collect behavioral features of pro- grams. These behavioral features of programs can then apply to detect behaviorally similar programs. We believe the techniques we invented in this thesis to detect behaviorally similar programs can improve the development of software engineering and security applications, such as code search and deobfuscation
Tagungsband zum 21. Kolloquium Programmiersprachen und Grundlagen der Programmierung
Das 21. Kolloquium Programmiersprachen und Grundlagen der Programmierung (KPS 2021) setzt eine traditionelle Reihe von Arbeitstagungen fort, die 1980 von den Forschungsgruppen der Professoren Friedrich L. Bauer (TU MĂŒnchen), Klaus Indermark (RWTH Aachen) und Hans Langmaack(CAU Kiel) ins Leben gerufen wurde.Die Veranstaltung ist ein offenes Forum fĂŒr alle interessierten deutschsprachigen Wissenschaftlerinnen und Wissenschaftler zum zwanglosen Austausch neuer Ideen und Ergebnisse aus den Forschungsbereichen Entwurf und Implementierung von Programmiersprachen sowie Grundlagen und Methodik des Programmierens. Dieser Tagungsband enthĂ€lt die wissenschaftlichen BeitrĂ€ge,die bei dem 21. Kolloquium dieser Tagungsreihe prĂ€sentiert wurden, welches vom 27. bis 29. September 2021 in Kiel stattfand und von der Arbeitsgruppe Programmiersprachen und Ăbersetzerkonstruktion der Christian-Albrechts-UniversitĂ€t zu Kiel organisiert wurde
Towards a Unifying Framework for Tuning Analysis Precision by Program Transformation
Static and dynamic program analyses attempt to extract useful information on programâs behaviours. Static analysis uses an abstract model of programs to reason on their runtime behaviour without actually running them, while dynamic analysis reasons on a test set of real program executions. For this reason, the precision of static analysis is limited by the presence of false positives (executions allowed by the abstract model that cannot happen at runtime), while the precision of dynamic analysis is limited by the presence of false negatives (real executions that are not in the test set). Researchers have developed many analysis techniques and tools in the attempt to increase the precision of program verification. Software protection is an interesting scenario where programs need to be protected from adversaries that use program analysis to understand their inner working and then exploit this knowledge to perform some illicit actions. Program analysis plays a dual role in program verification and software protection: in program verification we want the analysis to be as precise as possible, while in software protection we want to degrade the results of analysis as much as possible. Indeed, in software protection researchers usually recur to a special class of program transformations, called code obfuscation, to modify a program in order to make it more difficult to analyse while preserving its intended functionality. In this setting, it is interesting to study how program transformations that preserve the intended behaviour of programs can affect the precision of both static and dynamic analysis. While some works have been done in order to formalise the efficiency of code obfuscation in degrading static analysis and in the possibility of transforming programs in order to avoid or increase false positives, less attention has been posed to formalise the relation between program transformations and false negatives in dynamic analysis. In this work we are setting the scene for a formal investigation of the syntactic and semantic program features that affect the presence of false negatives in dynamic analysis. We believe that this understanding would be useful for improving the precision of existing dynamic analysis tools and in the design of program transformations that complicate the dynamic analysis
The Design, modeling and simulation of switching fabrics: For an ATM network switch
The requirements of today\u27s telecommunication systems to support high bandwidth and added flexibility brought about the expansion of (Asynchronous Transfer Mode) ATM as a new method of high-speed data transmission. Various analytical and simulation methods may be used to estimate the performance of ATM switches. Analytical methods considerably limit the range of parameters to be evaluated due to extensive formulae used and time consuming iterations. They are not as effective for large networks because of excessive computations that do not scale linearly with network size. One the other hand, simulation-based methods allow determining a bigger range of performance parameters in a shorter amount of time even for large networks. A simulation model, however, is more elaborate in terms of implementation. Instead of using formulae to obtain results, it has to operate software or hardware modules requiring a certain amount of effort to create. In this work simulation is accomplished by utilizing the ATM library - an object oriented software tool, which uses software chips for building ATM switches. The distinguishing feature of this approach is cut-through routing realized on the bit level abstraction treating ATM protocol data units, called cells, as groups of 424 bits. The arrival events of cells to the system are not instantaneous contrary to commonly used methods of simulation that consider cells as instant messages. The simulation was run for basic multistage interconnection network types with varying source arrival rate and buffer sizes producing a set of graphs of cell delays, throughput, cell loss probability, and queue sizes. The techniques of rearranging and sorting were considered in the simulation. The results indicate that better performance is always achieved by bringing additional stages of elements to the switching system
Responsible Composition and Optimization of Integration Processes under Correctness Preserving Guarantees
Enterprise Application Integration deals with the problem of connecting
heterogeneous applications, and is the centerpiece of current on-premise, cloud
and device integration scenarios. For integration scenarios, structurally
correct composition of patterns into processes and improvements of integration
processes are crucial. In order to achieve this, we formalize compositions of
integration patterns based on their characteristics, and describe optimization
strategies that help to reduce the model complexity, and improve the process
execution efficiency using design time techniques. Using the formalism of timed
DB-nets - a refinement of Petri nets - we model integration logic features such
as control- and data flow, transactional data storage, compensation and
exception handling, and time aspects that are present in reoccurring solutions
as separate integration patterns. We then propose a realization of optimization
strategies using graph rewriting, and prove that the optimizations we consider
preserve both structural and functional correctness. We evaluate the
improvements on a real-world catalog of pattern compositions, containing over
900 integration processes, and illustrate the correctness properties in case
studies based on two of these processes.Comment: 37 page
- âŠ