    Type Learning for Binaries and its Applications

    Reviewer Integration and Performance Measurement for Malware Detection

    We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system's ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparably to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
    Comment: 20 pages, 11 figures, accepted at the 13th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2016)
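
The abstract above reports detection "at a 0.5% false positive rate". As a rough illustration of what such a figure means, the sketch below picks a score threshold that keeps the false positive rate on benign samples within a budget and then measures detection on malicious samples. The function name `detection_at_fpr`, the score lists, and the thresholding rule are assumptions made for illustration; they are not taken from the paper's implementation.

```python
def detection_at_fpr(benign_scores, malicious_scores, max_fpr):
    """Return (threshold, detection_rate) with FPR on benign_scores <= max_fpr.

    Hypothetical helper: a sample is flagged as malicious when its score is
    strictly greater than the returned threshold.
    """
    benign = sorted(benign_scores, reverse=True)
    # Number of benign samples we are allowed to flag without exceeding max_fpr.
    allowed = int(max_fpr * len(benign))
    # Use the score of the first benign sample that must NOT be flagged.
    threshold = benign[allowed] if allowed < len(benign) else float("-inf")
    detected = sum(1 for s in malicious_scores if s > threshold)
    return threshold, detected / len(malicious_scores)


# Toy scores, invented for this example only.
benign = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
malicious = [0.65, 0.8, 0.95, 0.3]
threshold, rate = detection_at_fpr(benign, malicious, max_fpr=0.25)
print(threshold, rate)
```

Fixing the false positive budget first and reading off detection second makes systems comparable at the same operating point, which is presumably why the abstract quotes both numbers together.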

    Gaia Eclipsing Binary and Multiple Systems. A study of detectability and classification of eclipsing binaries with Gaia

    In the new era of large-scale astronomical surveys, automated methods of analysis and classification of bulk data are a fundamental tool for fast and efficient production of deliverables. This becomes ever more pressing as we enter the Gaia era. We investigate the potential detectability of eclipsing binaries with Gaia using a data set of all Kepler eclipsing binaries sampled with Gaia cadence and folded with the Kepler period. The performance of the fitting methods is evaluated by comparison with real Kepler data parameters, and a classification scheme is proposed for the potentially detectable sources based on the geometry of the light curve fits. The polynomial chain (polyfit) and two-Gaussian models are used for light curve fitting of the data set. Classification is performed with a combination of the t-SNE (t-distributed Stochastic Neighbor Embedding) and DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithms. We find that approximately 68% of Kepler Eclipsing Binary sources are potentially detectable by Gaia when folded with the Kepler period, and we propose a classification scheme for the detectable sources based on the morphological type indicated by the light curve, with subclasses that reflect the properties of the fitted model (presence and visibility of eclipses, their width, depth, etc.).
    Comment: 9 pages, 18 figures, accepted for publication in Astronomy & Astrophysics
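
The abstract above mentions light curves "folded with the Kepler period". A minimal sketch of phase folding, assuming a known period and reference epoch, is given below. The function name `phase_fold`, the parameter `t0`, and the sample values are hypothetical and are not drawn from the paper's pipeline.

```python
def phase_fold(times, fluxes, period, t0=0.0):
    """Fold a light curve on a known period.

    Returns (phase, flux) pairs sorted by phase, with phase in [0, 1).
    `t0` is an assumed reference epoch corresponding to phase 0.
    """
    phases = [((t - t0) / period) % 1.0 for t in times]
    return sorted(zip(phases, fluxes))


# Sparse, irregular sampling (values invented for illustration).
times = [0.0, 2.5, 6.25]
fluxes = [1.0, 0.8, 0.9]
folded = phase_fold(times, fluxes, period=5.0)
print(folded)
```

Folding maps observations from many orbital cycles onto a single cycle, which is what lets a sparsely sampled survey trace out an eclipse shape that a model can then be fitted to.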

    CryptoKnight: generating and modelling compiled cryptographic primitives

    Cryptovirological augmentations present an immediate, incomparable threat. Over the last decade, the substantial proliferation of crypto-ransomware has had widespread consequences for consumers and organisations alike. Established preventive measures perform well; however, the problem has not ceased. Reverse engineering potentially malicious software is a cumbersome task due to platform eccentricities and obfuscated transmutation mechanisms, hence requiring smarter, more efficient detection strategies. The following manuscript presents a novel approach for the classification of cryptographic primitives in compiled binary executables using deep learning. The model blueprint, a Dynamic Convolutional Neural Network (DCNN), is fittingly configured to learn from variable-length control flow diagnostics output from a dynamic trace. To rival the size and variability of equivalent datasets, and to adequately train our model without risking adverse exposure, a methodology for the procedural generation of synthetic cryptographic binaries is defined, using core primitives from OpenSSL with multivariate obfuscation, to draw a vastly scalable distribution. The library, CryptoKnight, rendered an algorithmic pool of AES, RC4, Blowfish, MD5 and RSA to synthesise combinable variants which automatically fed into its core model. Converging at 96% accuracy, CryptoKnight was successfully able to classify the sample pool with minimal loss and correctly identified the algorithm in a real-world crypto-ransomware application.