Evading Classifiers by Morphing in the Dark
Learning-based systems have been shown to be vulnerable to evasion through
adversarial data manipulation. These attacks have been studied under
assumptions that the adversary has certain knowledge of the target model's
internals, its training dataset, or at least the classification scores it
assigns to input samples. In this paper, we investigate a much more
constrained and realistic attack scenario wherein the target classifier is
minimally exposed to the adversary, revealing only its final classification
decision (e.g., reject or accept an input sample). Moreover, the adversary can
only manipulate malicious samples using a blackbox morpher. That is, the
adversary has to evade the target classifier by morphing malicious samples "in
the dark". We present a scoring mechanism that assigns each sample a
real-valued score reflecting its evasion progress, based on the limited
information available. Leveraging this scoring mechanism, we propose an
evasion method -- EvadeHC -- and evaluate it against two PDF malware
detectors, namely PDFRate and Hidost. The experimental evaluation demonstrates
that the proposed evasion attacks are effective, attaining a high evasion rate
on the evaluation dataset. Interestingly, EvadeHC outperforms known classifier
evasion techniques that operate on the classification scores output by the
classifiers. Although our evaluations are conducted on PDF malware
classifiers, the proposed approaches are domain-agnostic and of wider
applicability to other learning-based systems.
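The abstract describes EvadeHC only at a high level. The following is a toy
Python sketch (not the paper's actual algorithm) of hill climbing against a
decision-only black box: the detector, morpher, threshold, and `score`
function are all hypothetical, with `score` estimating evasion progress from
the average number of random morphing steps a rejected sample needs before it
is first accepted.

```python
import random

random.seed(0)

THRESHOLD = 10  # hypothetical detector: reject when the feature sum exceeds this

def classifier_rejects(sample):
    """Black-box detector: only the accept/reject decision is visible."""
    return sum(sample) > THRESHOLD

def morph(sample):
    """Black-box morpher: randomly perturb one feature."""
    s = list(sample)
    i = random.randrange(len(s))
    s[i] = max(0, s[i] + random.choice([-1, 1]))
    return s

def score(sample, trials=20, max_steps=50):
    """Proxy score: average number of random morphing steps until the
    detector first accepts -- fewer steps suggests evasion is closer."""
    total = 0
    for _ in range(trials):
        s, steps = sample, 0
        while classifier_rejects(s) and steps < max_steps:
            s = morph(s)
            steps += 1
        total += steps
    return total / trials

def evade_hill_climb(sample, rounds=200):
    """Hill climbing guided only by the proxy score."""
    best, best_score = sample, score(sample)
    for _ in range(rounds):
        cand = morph(best)
        if not classifier_rejects(cand):
            return cand  # evading sample found
        c_score = score(cand)
        if c_score < best_score:
            best, best_score = cand, c_score
    return None

evaded = evade_hill_climb([5, 4, 4])  # initially rejected (sum 13 > 10)
print(evaded is not None and not classifier_rejects(evaded))
```

The key point the sketch illustrates is that no classification score is ever
read from the detector; all guidance comes from accept/reject decisions alone.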
Binary Search Tree and Its Applications: A Survey
Binary search trees are used as a data structure for rapid access to stored data. Arrays, vectors, and linked lists are limited by a trade-off between the ability to perform fast search and to resize easily. Complete and nearly complete binary search trees are of particular significance. A new version of the insert-delete pair maintains a random binary tree in a manner where all grandparents in the tree always have both sub-trees full. In the worst case, a binary search tree degenerates into a linear linked list, reducing search to sequential scan. In particular, we obtain a BST data structure that is O(log log n)-competitive, satisfies the working-set bound, the dynamic-finger bound, and the unified bound with an additive O(log log n) factor, and performs each access in worst-case O(log n) time.
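As a concrete illustration of the worst-case degeneration mentioned above,
here is a minimal unbalanced-BST sketch in Python; all names are illustrative
and not taken from the survey.

```python
class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    """Standard unbalanced BST insert (iterative)."""
    if root is None:
        return Node(key)
    cur = root
    while True:
        if key < cur.key:
            if cur.left is None:
                cur.left = Node(key)
                break
            cur = cur.left
        elif key > cur.key:
            if cur.right is None:
                cur.right = Node(key)
                break
            cur = cur.right
        else:
            break  # duplicate key: nothing to do
    return root

def search(root, key):
    """Return the depth at which key is found, or -1 if absent."""
    depth = 0
    while root is not None:
        if key == root.key:
            return depth
        root = root.left if key < root.key else root.right
        depth += 1
    return -1

import random
random.seed(1)

# Random insertion order: expected O(log n) access depth.
balanced = None
keys = list(range(1000))
random.shuffle(keys)
for k in keys:
    balanced = insert(balanced, k)

# Sorted insertion order: the tree degenerates into a linked list.
degenerate = None
for k in range(1000):
    degenerate = insert(degenerate, k)

print(search(balanced, 999))    # shallow: roughly logarithmic depth
print(search(degenerate, 999))  # depth 999: effectively sequential search
```

The two trees hold the same keys; only the insertion order differs, which is
exactly the gap between the expected and worst-case bounds discussed above.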
COMET: A Recipe for Learning and Using Large Ensembles on Massive Data
COMET is a single-pass MapReduce algorithm for learning on large-scale data.
It builds multiple random forest ensembles on distributed blocks of data and
merges them into a mega-ensemble. This approach is appropriate when learning
from massive-scale data that is too large to fit on a single machine. To get
the best accuracy, IVoting should be used instead of bagging to generate the
training subset for each decision tree in the random forest. Experiments with
two large datasets (5GB and 50GB compressed) show that COMET compares favorably
(in both accuracy and training time) to learning on a subsample of data using a
serial algorithm. Finally, we propose a new Gaussian approach for lazy ensemble
evaluation which dynamically decides how many ensemble members to evaluate per
data point; this can reduce evaluation cost by 100X or more.
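One plausible reading of the "Gaussian approach for lazy ensemble evaluation"
is early stopping once a normal-approximation confidence interval around the
positive-vote fraction excludes the decision boundary. The Python toy below
illustrates that idea with a hypothetical ensemble; it is a sketch of the
statistical trick, not COMET's implementation.

```python
import math
import random

random.seed(42)

def lazy_vote(members, x, z=2.58, min_votes=10):
    """Lazily evaluate ensemble members on x, stopping early once a
    Gaussian (normal-approximation) confidence interval around the
    positive-vote fraction excludes 0.5."""
    positives = 0
    for n, member in enumerate(members, start=1):
        positives += member(x)
        if n >= min_votes:
            p = positives / n
            half_width = z * math.sqrt(p * (1 - p) / n)
            if p - half_width > 0.5 or p + half_width < 0.5:
                break  # the outcome is statistically settled
    return positives / n >= 0.5, n

# Hypothetical ensemble of 1000 noisy members that agree with the true
# label (x > 0) about 80% of the time.
members = [lambda x, r=random.random(): (x > 0) == (r < 0.8)
           for _ in range(1000)]

decision, evaluated = lazy_vote(members, 1.0)
print(decision, evaluated)  # True, after far fewer than 1000 evaluations
```

When the vote is lopsided the interval tightens quickly and only a small
prefix of the ensemble is ever evaluated, which is the source of the claimed
cost reduction.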
Synthesizing Short-Circuiting Validation of Data Structure Invariants
This paper presents incremental verification-validation, a novel approach for
checking rich data structure invariants expressed as separation logic
assertions. Incremental verification-validation combines static verification of
separation properties with efficient, short-circuiting dynamic validation of
arbitrarily rich data constraints. A data structure invariant checker is an
inductive predicate in separation logic with an executable interpretation; a
short-circuiting checker is an invariant checker that stops checking whenever
it detects at run time that an assertion for some sub-structure has been fully
proven statically. At a high level, our approach does two things: it statically
proves the separation properties of data structure invariants using a static
shape analysis in a standard way but then leverages this proof in a novel
manner to synthesize short-circuiting dynamic validation of the data
properties. As a consequence, we enable dynamic validation to make up for
imprecision in sound static analysis while simultaneously leveraging the static
verification to make the remaining dynamic validation efficient. We show
empirically that short-circuiting can yield asymptotic improvements in dynamic
validation, with low overhead over no validation, even in cases where static
verification is incomplete.
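A minimal sketch of the short-circuiting idea, assuming a hypothetical
`verified` flag that marks sub-structures whose invariant the static analysis
has already proven (illustrative only, not the paper's system): the dynamic
checker walks a sorted linked list and stops as soon as it reaches a proven
suffix, checking only the boundary condition there.

```python
class Node:
    def __init__(self, val, nxt=None, verified=False):
        self.val = val
        self.next = nxt
        # Set when static analysis has already proven the invariant
        # for the sub-structure rooted here (hypothetical annotation).
        self.verified = verified

def check_sorted(node, lower=float("-inf")):
    """Dynamic checker for 'list is sorted'; short-circuits as soon as
    it reaches a sub-list covered by the static proof."""
    checked = 0
    while node is not None:
        if node.verified and lower <= node.val:
            return True, checked  # rest is proven statically: stop here
        if node.val < lower:
            return False, checked
        lower = node.val
        checked += 1
        node = node.next
    return True, checked

# Build 1 -> 2 -> 3 -> 4, with the suffix from 3 onward statically verified.
tail = Node(3, Node(4, verified=True), verified=True)
lst = Node(1, Node(2, tail))
ok, visited = check_sorted(lst)
print(ok, visited)  # True 2 -- only the unproven prefix was validated
```

The asymptotic win in the paper comes from exactly this effect: when most of
the structure is statically proven, dynamic validation touches only the small
unproven fragment plus one boundary check.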