Integrating the Local Property and Topological Structure in the Minimum Spanning Tree Brain Functional Network for Classification of Early Mild Cognitive Impairment
Abnormalities in brain connectivity in patients with neurodegenerative diseases, such as early mild cognitive impairment (EMCI), have been widely reported. Current research shows that combining multiple features of the threshold connectivity network can improve disease classification accuracy. However, when constructing a threshold connectivity network, the choice of threshold is critical, and an unreasonable setting can seriously affect the final classification results. Recent neuroscience research suggests that the minimum spanning tree (MST) brain functional network is helpful, as it avoids methodological biases when comparing networks. In this paper, by employing the multikernel method, we propose a framework that integrates multiple properties of the MST brain functional network to improve classification performance. Initially, the Kruskal algorithm was used to construct an unbiased MST brain functional network. Subsequently, a vector kernel and a graph kernel were used to quantify two complementary properties of the network, namely the local connectivity property and the topological property. Finally, a multikernel support vector machine (SVM) was adopted to combine the two kernels for EMCI classification. We tested the performance of our proposed method on Alzheimer's Disease Neuroimaging Initiative (ADNI) data. The results showed that our method achieved a significant performance improvement, with a classification accuracy of 85%. The abnormal brain regions included the right hippocampus, left parahippocampal gyrus, left posterior cingulate gyrus, middle temporal gyrus, and other regions known to be important in EMCI. Our results suggest that combining multiple features of the MST brain functional connectivity network offers better classification performance in EMCI.
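The MST construction step described above can be illustrated with a minimal sketch. This is not the paper's implementation: the tiny connectivity matrix, the function name, and the choice of keeping the strongest correlations (Kruskal on descending edge weights, equivalent to a minimum spanning tree over one minus the correlation) are illustrative assumptions.

```python
def kruskal_mst(weights):
    """Build a spanning tree from a symmetric connectivity matrix.

    For functional connectivity the strongest correlations are kept,
    so edges are processed in descending weight order (equivalent to
    Kruskal's MST on 1 - correlation). Uses union-find to reject
    edges that would create a cycle.
    """
    n = len(weights)
    parent = list(range(n))

    def find(x):
        # Path-halving union-find lookup.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted(((weights[i][j], i, j)
                    for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    mst = []
    for w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:          # edge connects two components: keep it
            parent[ri] = rj
            mst.append((i, j, w))
            if len(mst) == n - 1:
                break
    return mst
```

In a real pipeline the input would be a region-by-region correlation matrix (e.g. 90 AAL regions), and the resulting n - 1 tree edges would feed the vector- and graph-kernel computations.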
Efficient, Scalable, and Accurate Program Fingerprinting in Binary Code
Why was this binary written? Which compiler was used? Which free software
packages did the developer use? Which sections of the code were borrowed? Who wrote
the binary? These questions are of paramount importance to security analysts and reverse
engineers, and binary fingerprinting approaches may provide valuable insights that can
help answer them. This thesis advances the state of the art by addressing some of the
most fundamental problems in program fingerprinting for binary code, notably, reusable
binary code discovery, fingerprinting free open source software packages, and authorship
attribution.
First, to tackle the problem of discovering reusable binary code, we employ a technique
for identifying reused functions by matching traces of a novel representation of binary
code known as the semantic integrated graph. This graph builds on the control flow
graph, the register flow graph, and the function call graph, key structures from classical program analysis, and merges them with other structural information into a joint data
structure. Second, we approach the problem of fingerprinting free open source software
(FOSS) packages by proposing a novel resilient and efficient system that incorporates
three components. The first extracts the syntactical features of functions by considering
opcode frequencies and performing a hidden Markov model statistical test. The second
applies a neighborhood hash graph kernel to random walks derived from control flow
graphs, with the goal of extracting the semantics of the functions. The third applies the
z-score to normalized instructions to extract the behavior of the instructions in a function.
Then, the components are integrated using a Bayesian network model which synthesizes
the results to determine the FOSS function, making it possible to detect user-related functions.
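The second component above can be sketched as a toy neighborhood hash graph kernel, in the spirit of Hido and Kashima's neighborhood hashing. Everything concrete here is an assumption, not the thesis implementation: the dictionary-based graph representation, the 32-bit rotation, the two hashing rounds, and the simple set-intersection kernel.

```python
from collections import Counter

def rot1(x, bits=32):
    # Left bit rotation by one, constrained to `bits` bits.
    return ((x << 1) | (x >> (bits - 1))) & ((1 << bits) - 1)

def neighborhood_hash(adj, labels, rounds=2):
    """Iteratively re-label nodes from their neighborhoods.

    adj:    {node: [neighbor nodes]}  (e.g. basic blocks of a CFG)
    labels: {node: int}               (e.g. a hash of the block's opcodes)
    Each round replaces a node's label with ROT1(own label) XOR'd
    with the XOR of its neighbors' labels, so the label summarizes a
    growing neighborhood.
    """
    h = dict(labels)
    for _ in range(rounds):
        new = {}
        for v, nbrs in adj.items():
            agg = 0
            for u in nbrs:
                agg ^= h[u]
            new[v] = rot1(h[v]) ^ agg
        h = new
    return h

def nh_kernel(h1, h2):
    """Kernel value = number of hashed labels shared by the two graphs."""
    c1, c2 = Counter(h1.values()), Counter(h2.values())
    return sum(min(c1[k], c2[k]) for k in c1)
```

Two functions whose control flow graphs (or random walks over them) produce many matching neighborhood hashes then score a high kernel value, which is what lets the semantic component match FOSS functions across binaries.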
Third, with these elements now in place, we present a framework capable of decoupling
binary program functionality from the coding habits of authors. To capture coding habits,
the framework leverages a set of features that are based on collections of functionality-independent
choices made by authors during coding. Finally, it is well known that techniques
such as refactoring and code transformations can significantly alter the structure
of code, even for simple programs. Applying such techniques or changing the compiler
and compilation settings can significantly affect the accuracy of available binary analysis
tools, which severely limits their practicability, especially when applied to malware. To
address these issues, we design a technique that extracts the semantics of binary code in terms of both data and control flow. The proposed technique allows more robust binary
analysis because the extracted semantics of the binary code is largely invariant
to code transformations, refactoring, and changes of compiler or compilation settings.
Specifically, it employs data-flow analysis to extract the semantic flow of the registers as
well as the semantic components of the control flow graph, which are then synthesized
into a novel representation called the semantic flow graph (SFG).
We evaluate the framework on large-scale datasets extracted from selected open source
C++ projects on GitHub, Google Code Jam events, Planet Source Code contests, and students’
programming projects and found that it outperforms existing methods in several
respects. First, it is able to detect the reused functions. Second, it can identify FOSS
packages in real-world projects and reused binary functions with high precision. Third, it
decouples authorship from functionality so that it can be applied to real malware binaries
to automatically generate evidence of similar coding habits. Fourth, compared to existing
research contributions, it successfully attributes a larger number of authors with a significantly
higher accuracy. Finally, the new framework is more robust than previous methods
in the sense that there is no significant drop in accuracy when the code is subjected to
refactoring techniques, code transformation methods, and different compilers.