14,963 research outputs found
Feature Engineering for Predictive Modeling using Reinforcement Learning
Feature engineering is a crucial step in the process of predictive modeling.
It involves the transformation of given feature space, typically using
mathematical functions, with the objective of reducing the modeling error for a
given target. However, there is no well-defined basis for performing effective
feature engineering. It involves domain knowledge, intuition, and most of all,
a lengthy process of trial and error. The human attention involved in
overseeing this process significantly influences the cost of model generation.
We present a new framework to automate feature engineering. It is based on
performance driven exploration of a transformation graph, which systematically
and compactly enumerates the space of given options. A highly efficient
exploration strategy is derived through reinforcement learning on past
examples
CryptoKnight:generating and modelling compiled cryptographic primitives
Cryptovirological augmentations present an immediate, incomparable threat. Over the last decade, the substantial proliferation of crypto-ransomware has had widespread consequences for consumers and organisations alike. Established preventive measures perform well, however, the problem has not ceased. Reverse engineering potentially malicious software is a cumbersome task due to platform eccentricities and obfuscated transmutation mechanisms, hence requiring smarter, more efficient detection strategies. The following manuscript presents a novel approach for the classification of cryptographic primitives in compiled binary executables using deep learning. The model blueprint, a Dynamic Convolutional Neural Network (DCNN), is fittingly configured to learn from variable-length control flow diagnostics output from a dynamic trace. To rival the size and variability of equivalent datasets, and to adequately train our model without risking adverse exposure, a methodology for the procedural generation of synthetic cryptographic binaries is defined, using core primitives from OpenSSL with multivariate obfuscation, to draw a vastly scalable distribution. The library, CryptoKnight, rendered an algorithmic pool of AES, RC4, Blowfish, MD5 and RSA to synthesise combinable variants which automatically fed into its core model. Converging at 96% accuracy, CryptoKnight was successfully able to classify the sample pool with minimal loss and correctly identified the algorithm in a real-world crypto-ransomware applicatio
- …