
    Integrating the Local Property and Topological Structure in the Minimum Spanning Tree Brain Functional Network for Classification of Early Mild Cognitive Impairment

    Abnormalities in brain connectivity in patients with neurodegenerative diseases, such as early mild cognitive impairment (EMCI), have been widely reported. Current research shows that combining multiple features of a threshold connectivity network can improve the classification accuracy for such diseases. However, when constructing a threshold connectivity network, the choice of threshold is critical, and an unreasonable setting can seriously affect the final classification results. Recent neuroscience research suggests that the minimum spanning tree (MST) brain functional network is helpful because it avoids methodological biases when comparing networks. In this paper, we propose a multikernel framework that integrates multiple properties of the MST brain functional network to improve classification performance. First, Kruskal's algorithm was used to construct an unbiased MST brain functional network. Next, a vector kernel and a graph kernel were used to quantify two complementary properties of the network: its local connectivity and its topological structure. Finally, a multikernel support vector machine (SVM) was adopted to combine the two kernels for EMCI classification. We tested the proposed method on Alzheimer's Disease Neuroimaging Initiative (ADNI) datasets. The results showed a significant performance improvement, with a classification accuracy of 85%. The abnormal brain regions included the right hippocampus, left parahippocampal gyrus, left posterior cingulate gyrus, middle temporal gyrus, and other regions known to be important in EMCI. Our results suggest that combining multiple features of the MST brain functional connectivity network offers better classification performance for EMCI.
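    The abstract outlines a three-step pipeline: build an MST from each subject's functional connectivity matrix with Kruskal's algorithm, extract a local-connectivity feature vector and a topological descriptor, and combine the two resulting kernels in a precomputed-kernel SVM. The sketch below is a minimal reconstruction of that pipeline under stated assumptions, not the authors' implementation: nodal degree stands in for the paper's local-connectivity features, an eccentricity histogram stands in for the graph-kernel embedding, and the mixing weight `lam` is a hypothetical parameter.

```python
# Minimal sketch of the MST multikernel pipeline (illustrative only).
# Assumptions: `corrs` is a list of (n_rois, n_rois) correlation matrices
# and `y` holds EMCI/control labels; degree and eccentricity features stand
# in for the paper's vector and graph kernels.
import numpy as np
import networkx as nx
from sklearn.svm import SVC

def build_mst(corr):
    """MST via Kruskal's algorithm; strong correlations become short edges."""
    dist = 1.0 - np.abs(corr)            # similarity -> distance
    G = nx.from_numpy_array(dist)
    return nx.minimum_spanning_tree(G, algorithm="kruskal")

def local_features(mst):
    """Local connectivity property: the nodal degree sequence."""
    return np.array([d for _, d in sorted(mst.degree())], dtype=float)

def topo_features(mst):
    """Topological property: a normalized histogram of node eccentricities."""
    ecc = np.fromiter(nx.eccentricity(mst).values(), dtype=float)
    hist, _ = np.histogram(ecc, bins=10, range=(0, mst.number_of_nodes()))
    return hist / hist.sum()

def normalize(K):
    """Scale a kernel matrix so its diagonal is 1 (cosine normalization)."""
    d = np.sqrt(np.diag(K))
    return K / np.outer(d, d)

def fit_multikernel_svm(corrs, y, lam=0.5):
    msts = [build_mst(c) for c in corrs]
    Xv = np.vstack([local_features(m) for m in msts])
    Xg = np.vstack([topo_features(m) for m in msts])
    Kv = normalize(Xv @ Xv.T)            # vector kernel
    Kg = normalize(Xg @ Xg.T)            # graph kernel
    K = lam * Kv + (1.0 - lam) * Kg      # multikernel combination
    return SVC(kernel="precomputed").fit(K, y)
```

    In practice, `lam` and the SVM hyperparameters would be tuned by cross-validation, and predicting on new subjects requires the cross-kernel between test and training subjects rather than the training kernel alone.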

    Efficient, Scalable, and Accurate Program Fingerprinting in Binary Code

    Why was this binary written? Which compiler was used? Which free software packages did the developer use? Which sections of the code were borrowed? Who wrote the binary? These questions are of paramount importance to security analysts and reverse engineers, and binary fingerprinting approaches may provide valuable insights that help answer them. This thesis advances the state of the art by addressing some of the most fundamental problems in program fingerprinting for binary code, notably reusable binary code discovery, fingerprinting free open source software packages, and authorship attribution.

    First, to tackle the problem of discovering reusable binary code, we employ a technique that identifies reused functions by matching traces of a novel representation of binary code known as the semantic integrated graph. This graph enhances the control flow graph, the register flow graph, and the function call graph, key concepts from classical program analysis, and merges them with other structural information into a joint data structure.

    Second, we approach the problem of fingerprinting free open source software (FOSS) packages by proposing a novel, resilient, and efficient system with three components. The first extracts the syntactic features of functions by considering opcode frequencies and performing a hidden Markov model statistical test. The second applies a neighborhood hash graph kernel to random walks derived from control flow graphs, with the goal of extracting the semantics of the functions. The third applies the z-score to normalized instructions to extract the behavior of the instructions in a function. The components are then integrated using a Bayesian network model that synthesizes their results to identify the FOSS function, making it possible to detect user-related functions.

    Third, with these elements in place, we present a framework capable of decoupling binary program functionality from the coding habits of authors. To capture coding habits, the framework leverages a set of features based on collections of functionality-independent choices made by authors during coding.

    Finally, it is well known that techniques such as refactoring and code transformation can significantly alter the structure of code, even for simple programs. Applying such techniques, or changing the compiler and compilation settings, can significantly affect the accuracy of available binary analysis tools, which severely limits their practicality, especially when applied to malware. To address these issues, we design a technique that extracts the semantics of binary code in terms of both data and control flow. The proposed technique allows more robust binary analysis because the extracted semantics are largely immune to code transformation, refactoring, and changes of compiler or compilation settings. Specifically, it employs data-flow analysis to extract the semantic flow of the registers as well as the semantic components of the control flow graph, which are then synthesized into a novel representation called the semantic flow graph (SFG).

    We evaluate the framework on large-scale datasets extracted from selected open source C++ projects on GitHub, Google Code Jam events, Planet Source Code contests, and students' programming projects, and find that it outperforms existing methods in several respects. First, it is able to detect reused functions. Second, it can identify FOSS packages in real-world projects and reused binary functions with high precision. Third, it decouples authorship from functionality, so it can be applied to real malware binaries to automatically generate evidence of similar coding habits. Fourth, compared to existing research contributions, it successfully attributes a larger number of authors with significantly higher accuracy. Finally, the new framework is more robust than previous methods in the sense that there is no significant drop in accuracy when the code is subjected to refactoring techniques, code transformation methods, or different compilers.
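    The second component of the FOSS fingerprinting system, the neighborhood hash graph kernel, has a compact core: every basic block gets an integer label, each pass replaces a node's label with a bit rotation of itself XORed with its neighbors' labels, and two graphs are compared by how many hashed labels they share. The sketch below is an illustrative reconstruction of that idea, not the thesis implementation; the 32-bit width, the round count, and the toy labels are all assumptions.

```python
# Illustrative neighborhood-hash sketch over NetworkX graphs (not the thesis
# code). Node labels are assumed to be integers, e.g. hashes of each basic
# block's opcode sequence.
from collections import Counter
import networkx as nx

BITS = 32
MASK = (1 << BITS) - 1

def rot1(x):
    """Rotate a BITS-wide label left by one bit."""
    return ((x << 1) | (x >> (BITS - 1))) & MASK

def neighborhood_hash(G, labels, rounds=2):
    """Each round folds the neighbors' labels into every node's label,
    so a label comes to encode the node's local graph structure."""
    for _ in range(rounds):
        new = {}
        for v in G:
            h = rot1(labels[v])
            for u in G.neighbors(v):
                h ^= labels[u]
            new[v] = h
        labels = new
    return labels

def nh_similarity(G1, l1, G2, l2):
    """Multiset Jaccard similarity over the hashed node labels."""
    c1 = Counter(neighborhood_hash(G1, l1).values())
    c2 = Counter(neighborhood_hash(G2, l2).values())
    common = sum((c1 & c2).values())
    return common / (sum(c1.values()) + sum(c2.values()) - common)

# Toy usage: two three-block control-flow fragments with one differing block.
G1, G2 = nx.path_graph(3), nx.path_graph(3)
l1 = {0: 0xAAAA, 1: 0xBBBB, 2: 0xCCCC}
l2 = {0: 0xAAAA, 1: 0xBBBB, 2: 0xDDDD}
print(nh_similarity(G1, l1, G2, l2))
```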
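    The first and third components of the same system reduce to simple statistics over a function's instruction stream: relative opcode frequencies, and z-scores recording which opcodes a function over- or under-uses relative to a corpus. A minimal sketch under those assumptions follows; the opcode vocabulary and toy functions are invented for illustration, and the hidden Markov model test and Bayesian network fusion from the thesis are not reproduced.

```python
# Illustrative opcode-frequency and z-score features (not the thesis code).
import numpy as np

def opcode_frequencies(opcodes, vocab):
    """Relative frequency of each vocabulary opcode in one function."""
    counts = np.array([opcodes.count(op) for op in vocab], dtype=float)
    return counts / max(len(opcodes), 1)

def zscore_profile(F):
    """Z-score each opcode column across a corpus of functions."""
    return (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-9)

# Toy corpus: three "functions" over a small x86-style opcode vocabulary.
vocab = ["mov", "push", "pop", "call", "ret", "xor", "jmp"]
funcs = [
    ["push", "mov", "mov", "call", "pop", "ret"],
    ["xor", "mov", "jmp", "mov", "ret"],
    ["push", "push", "call", "call", "pop", "pop", "ret"],
]
F = np.vstack([opcode_frequencies(f, vocab) for f in funcs])
Z = zscore_profile(F)   # rows: functions; columns: opcode usage deviations
```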