Building Program Vector Representations for Deep Learning
Deep learning has made significant breakthroughs in various fields of
artificial intelligence. Advantages of deep learning include the ability to
capture highly complicated features, weak involvement of human engineering,
etc. However, it is still virtually impossible to use deep learning to analyze
programs since deep architectures cannot be trained effectively with pure back
propagation. In this pioneering paper, we propose the "coding criterion" to
build program vector representations, which are the premise of deep learning
for program analysis. Our representation learning approach directly makes deep
learning a reality in this new field. We evaluate the learned vector
representations both qualitatively and quantitatively. We conclude, based on
the experiments, that the coding criterion is successful in building program
representations. To evaluate whether deep learning is beneficial for program
analysis, we feed the representations to deep neural networks, and achieve
higher accuracy in the program classification task than "shallow" methods, such
as logistic regression and the support vector machine. This result confirms the
feasibility of using deep learning to analyze programs. It also gives primary
evidence of its success in this new field. We believe deep learning will become
an outstanding technique for program analysis in the near future.
Comment: This paper was submitted to ICSE'1
Code Flows: Visualizing Structural Evolution of Source Code
Understanding detailed changes made to source code is of great importance in software maintenance. We present Code Flows, a method to visualize the evolution of source code, geared to the understanding of fine and mid-level scale changes across several file versions. We enhance an existing visual metaphor to depict software structure changes with techniques that emphasize both following unchanged code and detecting and highlighting important events such as code drift, splits, merges, insertions and deletions. The method is illustrated with the analysis of a real-world C++ code system.
On the Reverse Engineering of the Citadel Botnet
Citadel is an advanced information-stealing malware which targets financial
information. This malware poses a real threat against the confidentiality and
integrity of personal and business data. A joint operation was recently
conducted by the FBI and the Microsoft Digital Crimes Unit in order to take
down Citadel command-and-control servers. The operation caused some disruption
in the botnet but has not stopped it completely. Due to the complex structure
and advanced anti-reverse engineering techniques, the Citadel malware analysis
process is both challenging and time-consuming. This allows cyber criminals to
carry on with their attacks while the analysis is still in progress. In this
paper, we present the results of the Citadel reverse engineering and provide
additional insight into the functionality, inner workings, and open source
components of the malware. In order to accelerate the reverse engineering
process, we propose a clone-based analysis methodology. Citadel is an offspring
of a previously analyzed malware called Zeus; thus, using the former as a
reference, we can measure and quantify the similarities and differences of the
new variant. Two types of code analysis techniques are provided in the
methodology, namely assembly to source code matching and binary clone
detection. The methodology can help reduce the number of functions requiring
manual analysis. The analysis results prove that the approach is promising in
Citadel malware analysis. Furthermore, the same approach is applicable to
similar malware analysis scenarios.
Comment: 10 pages, 17 figures. This is an updated / edited version of a paper
that appeared in FPS 201
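The binary clone detection idea mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' method: it assumes functions are given as lists of x86-style assembly strings, normalizes operands into coarse classes, and scores two functions by the Jaccard similarity of their instruction n-grams. The names `normalize`, `ngrams`, `similarity`, and the sample functions are hypothetical.

```python
import re

# Assumption: a small, illustrative register set; a real tool would cover the full ISA.
REGISTERS = {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}


def normalize(instruction: str) -> str:
    """Replace concrete operands with coarse classes (REG, MEM, IMM, SYM)."""
    mnemonic, _, operand_text = instruction.strip().lower().partition(" ")
    classes = []
    for operand in (part.strip() for part in operand_text.split(",")):
        if not operand:
            continue
        if operand in REGISTERS:
            classes.append("REG")
        elif "[" in operand:
            classes.append("MEM")
        elif re.fullmatch(r"0x[0-9a-f]+|\d+", operand):
            classes.append("IMM")
        else:
            classes.append("SYM")
    return mnemonic + (" " + ",".join(classes) if classes else "")


def ngrams(function, n=2):
    """Return the set of n-grams over the normalized instruction sequence."""
    normalized = [normalize(instruction) for instruction in function]
    return {tuple(normalized[i:i + n]) for i in range(len(normalized) - n + 1)}


def similarity(function_a, function_b, n=2):
    """Jaccard similarity of the two functions' n-gram sets, in [0, 1]."""
    grams_a, grams_b = ngrams(function_a, n), ngrams(function_b, n)
    if not grams_a and not grams_b:
        return 1.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


# Hypothetical sample functions: same structure, different registers and constants.
reference_function = ["push ebp", "mov ebp, esp", "mov eax, [ebp+8]",
                      "add eax, 4", "pop ebp", "ret"]
variant_function = ["push ebp", "mov ebp, esp", "mov ecx, [ebp+12]",
                    "add ecx, 16", "pop ebp", "ret"]
unrelated_function = ["xor eax, eax", "ret"]
```

Functions from the new variant whose score against a previously analyzed function exceeds a chosen threshold could be skipped during manual analysis, which is the kind of workload reduction the abstract describes.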
On the Feasibility of Transfer-learning Code Smells using Deep Learning
Context: A substantial amount of work has been done to detect smells in
source code using metrics-based and heuristics-based methods. Machine learning
methods have been recently applied to detect source code smells; however, the
current practices are considered far from mature. Objective: First, explore the
feasibility of applying deep learning models to detect smells without extensive
feature engineering, just by feeding the source code in tokenized form. Second,
investigate the possibility of applying transfer-learning in the context of
deep learning models for smell detection. Method: We use existing metric-based
state-of-the-art methods for detecting three implementation smells and one
design smell in C# code. Using these results as the annotated gold standard, we
train smell detection models on three different deep learning architectures.
These architectures use Convolutional Neural Networks (CNNs) of one or two
dimensions, or Recurrent Neural Networks (RNNs) as their principal hidden
layers. For the first objective of our study, we perform training and
evaluation on C# samples, whereas for the second objective, we train the models
from C# code and evaluate the models over Java code samples. We perform the
experiments with various combinations of hyper-parameters for each model.
Results: We find it feasible to detect smells using deep learning methods. Our
comparative experiments find that there is no clearly superior method between
CNN-1D and CNN-2D. We also observe that performance of the deep learning models
is smell-specific. Our transfer-learning experiments show that
transfer-learning is definitely feasible for implementation smells with
performance comparable to that of direct-learning. This work opens up a new
paradigm to detect code smells by transfer-learning, especially for
programming languages where comprehensive code smell detection tools are
not available.
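As a rough illustration of "feeding the source code in tokenized form", the sketch below turns a code snippet into a fixed-length sequence of integer IDs, the kind of input a 1D CNN or an RNN typically consumes. The tokenizer regex, the `<pad>`/`<unk>` conventions, and the function names are assumptions made for illustration; the abstract does not specify how the authors tokenize.

```python
import re

# Assumption: identifiers, integer literals, and single punctuation characters
# are each one token; whitespace is discarded.
TOKEN_PATTERN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(source: str):
    """Split source text into a flat list of tokens."""
    return TOKEN_PATTERN.findall(source)


def build_vocabulary(samples):
    """Map each distinct token to an integer ID; 0 and 1 are reserved."""
    vocabulary = {"<pad>": 0, "<unk>": 1}
    for sample in samples:
        for token in tokenize(sample):
            vocabulary.setdefault(token, len(vocabulary))
    return vocabulary


def encode(source: str, vocabulary, length: int):
    """Encode a snippet as exactly `length` IDs, truncating or padding."""
    ids = [vocabulary.get(token, vocabulary["<unk>"]) for token in tokenize(source)]
    ids = ids[:length]
    return ids + [vocabulary["<pad>"]] * (length - len(ids))


# Hypothetical usage: the vocabulary is built from training samples,
# and unseen tokens in new samples fall back to <unk>.
vocabulary = build_vocabulary(["int x = 42;"])
encoded = encode("int y = 42;", vocabulary, 8)
```

Under this scheme, identifiers that never appeared in training collapse to `<unk>`, while keywords and punctuation shared by syntactically similar languages keep their IDs, which is one plausible reason a model trained on C# tokens could be evaluated on Java samples.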
- …