
    Reviewer Integration and Performance Measurement for Malware Detection

    We present and evaluate a large-scale malware detection system integrating machine learning with expert reviewers, treating reviewers as a limited labeling resource. We demonstrate that even in small numbers, reviewers can vastly improve the system's ability to keep pace with evolving threats. We conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years and containing 1.1 million binaries with 778GB of raw feature data. Without reviewer assistance, we achieve 72% detection at a 0.5% false positive rate, performing comparably to the best vendors on VirusTotal. Given a budget of 80 accurate reviews daily, we improve detection to 89% and are able to detect 42% of malicious binaries undetected upon initial submission to VirusTotal. Additionally, we identify a previously unnoticed temporal inconsistency in the labeling of training datasets. We compare the impact of training labels obtained at the same time training data is first seen with training labels obtained months later. We find that using training labels obtained well after samples appear, and thus unavailable in practice for current training data, inflates measured detection by almost 20 percentage points. We release our cluster-based implementation, as well as a list of all hashes in our evaluation and 3% of our entire dataset.
    Comment: 20 pages, 11 figures, accepted at the 13th Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA 2016)
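The headline metric in the abstract above, a detection rate quoted at a fixed false positive rate, can be illustrated with a minimal sketch. The function name `detection_at_fpr`, the convention that a higher score means more suspicious, and the toy data are all assumptions made for illustration; this is not the authors' released implementation.

```python
def detection_at_fpr(scores, labels, max_fpr):
    """Pick the lowest threshold whose false positive rate stays within
    max_fpr, and report the detection rate at that threshold.

    scores: classifier scores (assumed: higher means more suspicious)
    labels: 1 for malicious, 0 for benign
    A sample is flagged when its score is strictly above the threshold.
    """
    benign = sorted((s for s, y in zip(scores, labels) if y == 0), reverse=True)
    malicious = [s for s, y in zip(scores, labels) if y == 1]
    if not benign or not malicious:
        raise ValueError("need at least one benign and one malicious sample")

    # Number of benign samples we may flag without exceeding max_fpr.
    allowed = int(max_fpr * len(benign))

    # Using the (allowed + 1)-th highest benign score as a strict threshold
    # flags at most `allowed` benign samples.
    threshold = benign[allowed] if allowed < len(benign) else float("-inf")

    detected = sum(1 for s in malicious if s > threshold)
    return threshold, detected / len(malicious)


if __name__ == "__main__":
    # Toy data, invented for this example.
    toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
    toy_labels = [1, 1, 0, 1, 0, 1, 0, 0]
    thr, rate = detection_at_fpr(toy_scores, toy_labels, max_fpr=0.25)
    print(f"threshold={thr}, detection rate={rate:.2f}")
```

Fixing the false positive budget first and then reading off detection makes results comparable across systems, which is presumably why the abstract reports detection at a stated false positive rate rather than a raw accuracy.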

    Evolutionary algorithm-based analysis of gravitational microlensing lightcurves

    A new algorithm developed to perform autonomous fitting of gravitational microlensing lightcurves is presented. The new algorithm is conceptually simple, versatile and robust, and parallelises trivially; it combines features of extant evolutionary algorithms with some novel ones, and fares well on the problem of fitting binary-lens microlensing lightcurves, as well as on a number of other difficult optimisation problems. Success rates in excess of 90% are achieved when fitting synthetic though noisy binary-lens lightcurves, allowing no more than 20 minutes per fit on a desktop computer; this success rate is shown to compare very favourably with that of both a conventional (iterated simplex) algorithm, and a more state-of-the-art, artificial neural network-based approach. As such, this work provides proof of concept for the use of an evolutionary algorithm as the basis for real-time, autonomous modelling of microlensing events. Further work is required to investigate how the algorithm will fare when faced with more complex and realistic microlensing modelling problems; it is, however, argued here that the use of parallel computing platforms, such as inexpensive graphics processing units, should allow fitting times to be constrained to under an hour, even when dealing with complicated microlensing models. In any event, it is hoped that this work might stimulate some interest in evolutionary algorithms, and that the algorithm described here might prove useful for solving microlensing and/or more general model-fitting problems.
    Comment: 14 pages, 3 figures; accepted for publication in MNRAS

    Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting

    For person re-identification, existing deep networks often focus on representation learning. However, without transfer learning, the learned model is fixed as is, which is not adaptable for handling various unseen scenarios. In this paper, beyond representation learning, we consider how to formulate person image matching directly in deep feature maps. We treat image matching as finding local correspondences in feature maps, and construct query-adaptive convolution kernels on the fly to achieve local matching. In this way, the matching process and results are interpretable, and this explicit matching is more generalizable than representation features to unseen scenarios, such as unknown misalignments, pose or viewpoint changes. To facilitate end-to-end training of this architecture, we further build a class memory module to cache feature maps of the most recent samples of each class, so as to compute image matching losses for metric learning. Through direct cross-dataset evaluation, the proposed Query-Adaptive Convolution (QAConv) method gains large improvements over popular learning methods (about 10%+ mAP), and achieves comparable results to many transfer learning methods. In addition, a model-free score weighting method based on temporal co-occurrence, called TLift, is proposed; it improves performance further, achieving state-of-the-art results in cross-dataset person re-identification. Code is available at https://github.com/ShengcaiLiao/QAConv.
    Comment: This is the ECCV 2020 version, including the appendix

    Likelihood-ratio ranking of gravitational-wave candidates in a non-Gaussian background

    We describe a general approach to detection of transient gravitational-wave signals in the presence of non-Gaussian background noise. We prove that under quite general conditions, the ratio of the likelihood that the observed data contain a signal to the likelihood that they are a noise fluctuation provides optimal ranking for the candidate events found in an experiment. The likelihood-ratio ranking allows us to combine different kinds of data into a single analysis. We apply the general framework to the problem of unifying the results of independent experiments and the problem of accounting for non-Gaussian artifacts in the searches for gravitational waves from compact binary coalescence in LIGO data. We show analytically and confirm through simulations that in both cases the likelihood ratio statistic results in an improved analysis.
    Comment: 10 pages, 6 figures
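The ranking idea in the abstract above can be sketched in a few lines: each candidate is scored by the ratio of its likelihood under a signal model to its likelihood under a noise model, and candidates are sorted by that ratio. The densities below are toy assumptions chosen for illustration (a Gaussian signal model and a two-component noise mixture standing in for a non-Gaussian background); the function names are hypothetical and nothing here is taken from the paper's analysis.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution at x."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def noise_pdf(x):
    # Toy non-Gaussian background (assumed): a narrow Gaussian core plus a
    # broad component that mimics occasional loud artifacts.
    return 0.9 * gaussian_pdf(x, 0.0, 1.0) + 0.1 * gaussian_pdf(x, 0.0, 5.0)


def signal_pdf(x):
    # Toy signal model (assumed): detection statistic clustered around 4.
    return gaussian_pdf(x, 4.0, 1.0)


def likelihood_ratio(x):
    """Likelihood of x under the signal model divided by that under noise."""
    return signal_pdf(x) / noise_pdf(x)


def rank_candidates(candidates):
    """Return candidates ordered from most to least signal-like."""
    return sorted(candidates, key=likelihood_ratio, reverse=True)


if __name__ == "__main__":
    for x in rank_candidates([0.0, 4.0, 10.0]):
        print(x, likelihood_ratio(x))
```

In this toy setup the loudest candidate is not ranked first, because the broad noise component explains very large values better than the signal model does. That is the kind of behaviour a likelihood-ratio statistic is meant to capture when the background has non-Gaussian tails.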