Reviewer Integration and Performance Measurement for Malware Detection
We present and evaluate a large-scale malware detection system integrating
machine learning with expert reviewers, treating reviewers as a limited
labeling resource. We demonstrate that even in small numbers, reviewers can
vastly improve the system's ability to keep pace with evolving threats. We
conduct our evaluation on a sample of VirusTotal submissions spanning 2.5 years
and containing 1.1 million binaries with 778GB of raw feature data. Without
reviewer assistance, we achieve 72% detection at a 0.5% false positive rate,
performing comparably to the best vendors on VirusTotal. Given a budget of 80
accurate reviews daily, we improve detection to 89% and are able to detect 42%
of malicious binaries undetected upon initial submission to VirusTotal.
Additionally, we identify a previously unnoticed temporal inconsistency in the
labeling of training datasets. We compare the impact of training labels
obtained at the same time training data is first seen with training labels
obtained months later. We find that using training labels obtained well after
samples appear, and thus unavailable in practice for current training data,
inflates measured detection by almost 20 percentage points. We release our
cluster-based implementation, as well as a list of all hashes in our evaluation
and 3% of our entire dataset.
Comment: 20 pages, 11 figures, accepted at the 13th Conference on Detection
of Intrusions and Malware & Vulnerability Assessment (DIMVA 2016)
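The limited-reviewer idea above can be sketched as a simple uncertainty-sampling policy: spend the daily budget on the binaries the classifier is least sure about. This is a minimal illustration under assumed names (`select_for_review`, `budget`, the toy scores), not the paper's actual selection strategy.

```python
# Hypothetical sketch: spending a daily review budget on the samples the
# classifier is least certain about. All names here are illustrative.

def select_for_review(scores, budget):
    """Pick the `budget` samples whose malware scores lie closest to the
    decision boundary (0.5), i.e. the ones the model is least sure about."""
    by_uncertainty = sorted(scores.items(), key=lambda kv: abs(kv[1] - 0.5))
    return [sample for sample, _ in by_uncertainty[:budget]]

# Toy scores for five binaries; a budget of 2 picks the two most uncertain.
scores = {"a": 0.98, "b": 0.51, "c": 0.03, "d": 0.47, "e": 0.80}
print(select_for_review(scores, 2))  # ['b', 'd']
```

The reviewed labels would then feed back into the next training round, which is what lets a small number of accurate reviews lift detection substantially.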
Evolutionary algorithm-based analysis of gravitational microlensing lightcurves
A new algorithm developed to perform autonomous fitting of gravitational
microlensing lightcurves is presented. The new algorithm is conceptually
simple, versatile and robust, and parallelises trivially; it combines features
of extant evolutionary algorithms with some novel ones, and fares well on the
problem of fitting binary-lens microlensing lightcurves, as well as on a number
of other difficult optimisation problems. Success rates in excess of 90% are
achieved when fitting synthetic though noisy binary-lens lightcurves, allowing
no more than 20 minutes per fit on a desktop computer; this success rate is
shown to compare very favourably with that of both a conventional (iterated
simplex) algorithm, and a more state-of-the-art, artificial neural
network-based approach. As such, this work provides proof of concept for the
use of an evolutionary algorithm as the basis for real-time, autonomous
modelling of microlensing events. Further work is required to investigate how
the algorithm will fare when faced with more complex and realistic microlensing
modelling problems; it is, however, argued here that the use of parallel
computing platforms, such as inexpensive graphics processing units, should
allow fitting times to be constrained to under an hour, even when dealing with
complicated microlensing models. In any event, it is hoped that this work might
stimulate some interest in evolutionary algorithms, and that the algorithm
described here might prove useful for solving microlensing and/or more general
model-fitting problems.
Comment: 14 pages, 3 figures; accepted for publication in MNRAS
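The structure of such an evolutionary algorithm can be sketched on a toy 1-D objective; the real task fits multi-parameter binary-lens models to lightcurve data, and every name below (and the truncation-selection-plus-Gaussian-mutation scheme) is illustrative rather than the paper's actual algorithm.

```python
import random

# Minimal evolutionary algorithm sketch: minimise a 1-D objective, as a
# stand-in for chi-square fitting of a lightcurve model.

def evolve(objective, lo, hi, pop_size=30, generations=200, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(lo, hi) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=objective)                              # best first
        parents = pop[: pop_size // 2]                       # truncation selection
        children = [p + rng.gauss(0, 0.1) for p in parents]  # Gaussian mutation
        pop = parents + children                             # parents kept (elitism)
    return min(pop, key=objective)

# Toy objective with its minimum at x = 3.
best = evolve(lambda x: (x - 3.0) ** 2, -10.0, 10.0)
```

Because each candidate's objective evaluation is independent, a scheme like this parallelises trivially across cores or GPU threads, which is the property the abstract leans on for real-time fitting.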
Local search: A guide for the information retrieval practitioner
There are a number of combinatorial optimisation problems in information retrieval (IR) in which the use of local search methods is worthwhile. The purpose of this paper is to show how local search can be used to solve some well known tasks in IR, to show how previous research in the field is piecemeal, bereft of structure and methodologically flawed, and to suggest more rigorous ways of applying local search methods to IR problems. We provide a query-based taxonomy for analysing the use of local search in IR tasks, and an overview of issues such as fitness functions, statistical significance and test collections when conducting experiments on combinatorial optimisation problems. The paper gives a guide to the pitfalls and problems facing IR practitioners who wish to use local search in their research, and gives practical advice on the use of such methods. The query-based taxonomy is a novel structure which the IR practitioner can use to examine the use of local search in IR.
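As a hedged illustration of local search in an IR-flavoured setting, a steepest-ascent hill climber over term-subset choices might look like the sketch below. The fitness function is a made-up stand-in for a real retrieval measure such as MAP, and all names are hypothetical.

```python
import random

# Steepest-ascent hill climbing on a toy IR-style task: choose a subset of
# query terms maximising a fitness score (a placeholder for a retrieval
# measure evaluated on a test collection).

def hill_climb(fitness, n_terms, seed=0):
    rng = random.Random(seed)
    state = [rng.random() < 0.5 for _ in range(n_terms)]
    while True:
        # Neighbourhood: states reachable by flipping one term's inclusion flag.
        neighbours = []
        for i in range(n_terms):
            nb = state.copy()
            nb[i] = not nb[i]
            neighbours.append(nb)
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(state):
            return state                     # local optimum reached
        state = best

# Toy fitness: terms 0 and 2 help retrieval, term 1 hurts it.
gains = [0.4, -0.2, 0.3]
fitness = lambda s: sum(g for g, used in zip(gains, s) if used)
print(hill_climb(fitness, 3))  # [True, False, True]
```

The choice of neighbourhood and fitness function, and the statistical testing of results across queries, are exactly the methodological issues the paper's taxonomy is meant to structure.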
Interpretable and Generalizable Person Re-Identification with Query-Adaptive Convolution and Temporal Lifting
For person re-identification, existing deep networks often focus on
representation learning. However, without transfer learning, the learned model
is fixed once trained, and so cannot adapt to the variety of unseen scenarios.
In this paper, beyond representation learning, we consider how to formulate
person image matching directly in deep feature maps. We treat image matching as
finding local correspondences in feature maps, and construct query-adaptive
convolution kernels on the fly to achieve local matching. In this way, the
matching process and results are interpretable, and this explicit matching is
more generalizable than representation features to unseen scenarios, such as
unknown misalignments, pose or viewpoint changes. To facilitate end-to-end
training of this architecture, we further build a class memory module to cache
feature maps of the most recent samples of each class, so as to compute image
matching losses for metric learning. Through direct cross-dataset evaluation,
the proposed Query-Adaptive Convolution (QAConv) method gains large
improvements over popular learning methods (about 10%+ mAP), and achieves
comparable results to many transfer learning methods. In addition, a model-free
temporal co-occurrence based score weighting method called TLift is proposed,
which further improves performance, achieving state-of-the-art results in
cross-dataset person re-identification. Code is available at
https://github.com/ShengcaiLiao/QAConv.
Comment: This is the ECCV 2020 version, including the appendix
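The local-matching idea can be sketched in miniature: treat each local feature of the query image as a kernel, correlate it with every gallery location, and aggregate the best local correspondences into a matching score. This pure-Python toy omits the learned layers and spatial structure of the actual QAConv; the function names and feature values are illustrative only.

```python
# Toy sketch of query-adaptive local matching: each query local feature acts
# as a 1x1 kernel correlated against all gallery locations; the best local
# correspondences are averaged into an image-matching score.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def qaconv_score(query_map, gallery_map):
    """query_map / gallery_map: lists of L2-normalised local feature vectors."""
    best = [max(dot(q, g) for g in gallery_map) for q in query_map]
    return sum(best) / len(best)   # average of the best local matches

# Two toy "feature maps" with 2-D local features.
query = [(1.0, 0.0), (0.0, 1.0)]
same_id = [(0.9, 0.436), (0.1, 0.995)]   # roughly aligned local parts
other = [(-1.0, 0.0), (0.0, -1.0)]       # dissimilar parts
print(qaconv_score(query, same_id) > qaconv_score(query, other))  # True
```

Because the kernels come from the query itself rather than from fixed learned weights, the matching adapts to each query, which is what makes the correspondences interpretable and robust to misalignment.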
Likelihood-ratio ranking of gravitational-wave candidates in a non-Gaussian background
We describe a general approach to detection of transient gravitational-wave
signals in the presence of non-Gaussian background noise. We prove that under
quite general conditions, the ratio of the likelihood of observed data to
contain a signal to the likelihood of it being a noise fluctuation provides
optimal ranking for the candidate events found in an experiment. The
likelihood-ratio ranking allows us to combine different kinds of data into a
single analysis. We apply the general framework to the problem of unifying the
results of independent experiments and the problem of accounting for
non-Gaussian artifacts in the searches for gravitational waves from compact
binary coalescence in LIGO data. We show analytically and confirm through
simulations that in both cases the likelihood ratio statistic results in an
improved analysis.
Comment: 10 pages, 6 figures
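A toy version of likelihood-ratio ranking: score each candidate event by p(x | signal) / p(x | noise) and sort by that ratio. The Gaussian densities below are placeholders chosen only to make the ratio computable; the paper's models for non-Gaussian LIGO backgrounds are far richer, and all names are hypothetical.

```python
import math

# Rank candidate events by the likelihood ratio
#   Lambda(x) = p(x | signal) / p(x | noise)
# using toy placeholder densities over a single detection statistic (SNR).

def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def likelihood_ratio(snr):
    signal = gauss_pdf(snr, mu=8.0, sigma=2.0)   # signals cluster at high SNR
    noise = gauss_pdf(snr, mu=0.0, sigma=3.0)    # broad, glitchy background
    return signal / noise

# Rank three candidates: a higher Lambda means more signal-like.
candidates = {"e1": 2.0, "e2": 9.0, "e3": 5.0}
ranked = sorted(candidates, key=lambda e: likelihood_ratio(candidates[e]), reverse=True)
print(ranked)  # ['e2', 'e3', 'e1']
```

Because the ranking depends only on the ratio, heterogeneous data (different detectors, different searches) can be folded into the two likelihoods and still yield a single ordered candidate list, which is the unification the abstract describes.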