Adversarial Sample Detection for Deep Neural Network through Model Mutation Testing
Deep neural networks (DNNs) have been shown to be useful in a wide range of
applications. However, they are also known to be vulnerable to adversarial
samples. By transforming a normal sample with carefully crafted,
human-imperceptible perturbations, even highly accurate DNNs can be made to
produce wrong decisions. Multiple defense mechanisms have been proposed to
hinder the generation of such adversarial samples. However, recent work shows
that most of them are ineffective. In this work, we propose an alternative approach to
detect adversarial samples at runtime. Our main observation is that adversarial
samples are much more sensitive than normal samples if we impose random
mutations on the DNN. We thus first propose a measure of 'sensitivity' and show
empirically that normal samples and adversarial samples have distinguishable
sensitivity. We then integrate statistical hypothesis testing and model
mutation testing to check whether an input sample is likely to be normal or
adversarial at runtime by measuring its sensitivity. We evaluated our approach
on the MNIST and CIFAR10 datasets. The results show that our approach detects
adversarial samples generated by state-of-the-art attacking methods efficiently
and accurately.
Comment: Accepted by ICSE 201
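To make the detection idea concrete, the following is a minimal sketch of the sensitivity measure, assuming a hypothetical `predict(weights, x)` callable that returns a class label and Gaussian weight perturbation as the mutation operator; the paper's mutation operators are richer, and its decision procedure uses a sequential hypothesis test rather than the fixed-sample threshold shown here.

```python
import numpy as np

def mutate_weights(weights, std, rng):
    """Return a randomly perturbed copy of a weight list (one model mutant)."""
    return [w + rng.normal(0.0, std, size=w.shape) for w in weights]

def label_change_rate(predict, weights, x, n_mutants=50, std=0.05, seed=0):
    """Fraction of weight-mutated models whose label on x differs from the
    unmutated model's label; adversarial samples tend to score higher."""
    rng = np.random.default_rng(seed)
    original = predict(weights, x)
    changes = sum(
        predict(mutate_weights(weights, std, rng), x) != original
        for _ in range(n_mutants)
    )
    return changes / n_mutants

def looks_adversarial(predict, weights, x, threshold=0.05):
    """Flag x when its sensitivity exceeds a threshold (a simplified
    stand-in for the paper's statistical hypothesis test)."""
    return label_change_rate(predict, weights, x) > threshold
```

In practice the threshold would be calibrated on held-out normal samples so they rarely cross it, and a sequential test lets the detector stop after only a few mutants for clear-cut inputs.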
PASS-JOIN: A Partition-based Method for Similarity Joins
As an essential operation in data cleaning, the similarity join has attracted
considerable attention from the database community. In this paper, we study
string similarity joins with edit-distance constraints, which find similar
string pairs from two large sets of strings whose edit distance is within a
given threshold. Existing algorithms are efficient either for short strings or
for long strings, and there is no algorithm that can efficiently and adaptively
support both short strings and long strings. To address this problem, we
propose a partition-based method called Pass-Join. Pass-Join partitions a
string into a set of segments and creates inverted indices for the segments.
Then for each string, Pass-Join selects some of its substrings and uses the
selected substrings to find candidate pairs using the inverted indices. We
devise efficient techniques to select the substrings and prove that our method
can minimize the number of selected substrings. We develop novel pruning
techniques to efficiently verify the candidate pairs. Experimental results show
that our algorithms are efficient for both short strings and long strings, and
outperform state-of-the-art methods on real datasets.
Comment: VLDB201
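A minimal self-join sketch of the partitioning idea follows, assuming all strings are longer than the threshold tau. By the pigeonhole principle, if two strings are within edit distance tau, one must contain at least one of the other's tau+1 segments verbatim; the real PASS-JOIN additionally restricts which substrings are probed using position and length bounds, whereas this sketch probes all substrings for clarity.

```python
def partition(s, tau):
    """Split s into tau+1 roughly even segments."""
    k = tau + 1
    base, extra = divmod(len(s), k)
    segments, pos = [], 0
    for i in range(k):
        length = base + (1 if i >= k - extra else 0)
        segments.append(s[pos:pos + length])
        pos += length
    return segments

def edit_distance(a, b):
    """Classic dynamic-programming edit distance in O(|a| * |b|)."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (ca != cb))
    return dp[-1]

def similarity_join(strings, tau):
    """All pairs within edit distance tau (simplified PASS-JOIN self-join)."""
    strings = sorted(strings, key=len)
    index = {}                      # segment text -> ids of indexed strings
    pairs = []
    for sid, s in enumerate(strings):
        # Probe: any substring of s matching an indexed segment is a candidate.
        candidates = set()
        for i in range(len(s)):
            for j in range(i + 1, len(s) + 1):
                candidates |= index.get(s[i:j], set())
        # Verify candidates with the exact edit distance.
        for cid in candidates:
            if len(s) - len(strings[cid]) <= tau and \
               edit_distance(strings[cid], s) <= tau:
                pairs.append((strings[cid], s))
        # Index this string's segments for the strings that follow.
        for seg in partition(s, tau):
            if seg:
                index.setdefault(seg, set()).add(sid)
    return pairs

print(similarity_join(["hello", "hallo", "help"], tau=1))
# [('hello', 'hallo')]
```

Processing strings in length order means each string only probes segments of strings no longer than itself, which is the asymmetry PASS-JOIN exploits to bound the number of substrings that must be checked.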
Unsupervised String Transformation Learning for Entity Consolidation
Data integration has been a long-standing challenge in data management with
many applications. A key step in data integration is entity consolidation. It
takes a collection of clusters of duplicate records as input and produces a
single "golden record" for each cluster, which contains the canonical value for
each attribute. Truth discovery and data fusion methods, as well as Master Data
Management (MDM) systems, can be used for entity consolidation. However, to
achieve better results, the variant values (i.e., values that are logically the
same with different formats) in the clusters need to be consolidated before
applying these methods.
For this purpose, we propose a data-driven method to standardize the variant
values based on two observations: (1) the variant values usually can be
transformed to the same representation (e.g., "Mary Lee" and "Lee, Mary") and
(2) the same transformation often appears repeatedly across different clusters
(e.g., transposing the first and last names). Our approach first uses an
unsupervised method to generate groups of value pairs that can be transformed
in the same way (i.e., they share a transformation). Then the groups are
presented to a human for verification and the approved ones are used to
standardize the data. In a real-world dataset with 17,497 records, our method
achieved 75% recall and 99.5% precision in standardizing variant values by
asking a human 100 yes/no questions, substantially outperforming a
state-of-the-art data wrangling tool.
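As an illustration of the unsupervised grouping step, here is a minimal sketch using a simple word tokenizer and a positional transformation signature; both helpers are hypothetical simplifications of the paper's transformation model.

```python
import re
from collections import defaultdict

def tokens(value):
    """Split a value into word tokens (simplified tokenizer)."""
    return re.findall(r"\w+", value)

def transformation_signature(src, dst):
    """Describe each token of dst as a position into src's tokens, or a literal.

    "Mary Lee" -> "Lee, Mary" yields (('POS', 1), ('POS', 0)): a name swap.
    Pairs with the same signature share a transformation.
    """
    src_toks = tokens(src)
    return tuple(
        ("POS", src_toks.index(t)) if t in src_toks else ("LIT", t)
        for t in tokens(dst)
    )

def group_pairs(pairs):
    """Group candidate (variant, target) value pairs by shared transformation."""
    groups = defaultdict(list)
    for src, dst in pairs:
        groups[transformation_signature(src, dst)].append((src, dst))
    return dict(groups)

pairs = [("Mary Lee", "Lee, Mary"), ("John Smith", "Smith, John")]
print(group_pairs(pairs))
# {(('POS', 1), ('POS', 0)): [('Mary Lee', 'Lee, Mary'),
#                             ('John Smith', 'Smith, John')]}
```

Each resulting group can then be shown to a human as a single yes/no question, so one approval standardizes every value pair that shares the transformation.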
Asymptotic Soft Filter Pruning for Deep Convolutional Neural Networks
Deeper and wider Convolutional Neural Networks (CNNs) achieve superior
performance but incur high computational cost. Accelerating such
over-parameterized neural networks has therefore received increasing attention. A typical
pruning algorithm is a three-stage pipeline, i.e., training, pruning, and
retraining. Prevailing approaches fix the pruned filters to zero during
retraining, and thus significantly reduce the optimization space. Moreover, they
directly prune a large number of filters at the outset, which causes
unrecoverable information loss. To solve these problems, we propose an
Asymptotic Soft Filter Pruning (ASFP) method to accelerate the inference
procedure of the deep neural networks. First, we update the pruned filters
during the retraining stage. As a result, the optimization space of the pruned
model is not reduced but remains the same as that of the original model. In
this way, the model has enough capacity to learn from the training data.
Second, we prune the network asymptotically: only a few filters are pruned at
first, and progressively more are pruned as training proceeds. With
asymptotic pruning, the information of the training set is gradually
concentrated in the remaining filters, so the subsequent training and pruning
process remains stable. Experiments show the effectiveness of our ASFP on
image classification benchmarks. Notably, on ILSVRC-2012, ASFP reduces more
than 40% of the FLOPs of ResNet-50 with only 0.14% top-5 accuracy degradation,
outperforming soft filter pruning (SFP) by 8%.
Comment: Extended Journal Version of arXiv:1808.0686
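A minimal numpy sketch of the two ingredients, soft pruning and an asymptotic rate schedule, is given below. The exponential ramp is an assumed schedule for illustration (the paper's exact schedule may differ), and conv_weight stands for one convolutional layer's filter tensor of shape (out_channels, in_channels, kH, kW).

```python
import numpy as np

def asymptotic_rate(epoch, total_epochs, target_rate, k=3.0):
    """Pruning-rate schedule: near zero early, approaching target_rate.

    Assumed exponential ramp; any monotone schedule with the same endpoints
    captures the 'prune few filters first' idea.
    """
    return target_rate * (1.0 - np.exp(-k * epoch / total_epochs))

def soft_prune_filters(conv_weight, rate):
    """Zero out the filters with the smallest L2 norms, in place.

    'Soft' pruning: zeroed filters still receive gradient updates in the
    next epoch, so the optimization space stays that of the full model and
    a wrongly pruned filter can recover.
    """
    n_filters = conv_weight.shape[0]
    n_prune = int(n_filters * rate)
    if n_prune == 0:
        return conv_weight
    norms = np.linalg.norm(conv_weight.reshape(n_filters, -1), axis=1)
    conv_weight[np.argsort(norms)[:n_prune]] = 0.0
    return conv_weight

# Training-loop sketch: train an epoch (pruned filters update too), then
# softly prune at the scheduled rate; zeroed filters are physically removed
# only after the final epoch, which is when the FLOPs savings materialize.
# for epoch in range(total_epochs):
#     train_one_epoch(model)  # hypothetical training step
#     rate = asymptotic_rate(epoch, total_epochs, target_rate=0.4)
#     for conv_weight in conv_layer_weights(model):  # hypothetical accessor
#         soft_prune_filters(conv_weight, rate)
```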