247 research outputs found

    Adversarial Sample Detection for Deep Neural Network through Model Mutation Testing

    Full text link
    Deep neural networks (DNN) have been shown to be useful in a wide range of applications. However, they are also known to be vulnerable to adversarial samples. By transforming a normal sample with some carefully crafted human imperceptible perturbations, even highly accurate DNN make wrong decisions. Multiple defense mechanisms have been proposed which aim to hinder the generation of such adversarial samples. However, a recent work show that most of them are ineffective. In this work, we propose an alternative approach to detect adversarial samples at runtime. Our main observation is that adversarial samples are much more sensitive than normal samples if we impose random mutations on the DNN. We thus first propose a measure of `sensitivity' and show empirically that normal samples and adversarial samples have distinguishable sensitivity. We then integrate statistical hypothesis testing and model mutation testing to check whether an input sample is likely to be normal or adversarial at runtime by measuring its sensitivity. We evaluated our approach on the MNIST and CIFAR10 datasets. The results show that our approach detects adversarial samples generated by state-of-the-art attacking methods efficiently and accurately.Comment: Accepted by ICSE 201

    PASS-JOIN: A Partition-based Method for Similarity Joins

    Full text link
    As an essential operation in data cleaning, the similarity join has attracted considerable attention from the database community. In this paper, we study string similarity joins with edit-distance constraints, which find similar string pairs from two large sets of strings whose edit distance is within a given threshold. Existing algorithms are efficient either for short strings or for long strings, and there is no algorithm that can efficiently and adaptively support both short strings and long strings. To address this problem, we propose a partition-based method called Pass-Join. Pass-Join partitions a string into a set of segments and creates inverted indices for the segments. Then for each string, Pass-Join selects some of its substrings and uses the selected substrings to find candidate pairs using the inverted indices. We devise efficient techniques to select the substrings and prove that our method can minimize the number of selected substrings. We develop novel pruning techniques to efficiently verify the candidate pairs. Experimental results show that our algorithms are efficient for both short strings and long strings, and outperform state-of-the-art methods on real datasets.Comment: VLDB201

    PTE: Axiomatic Semantics based Compiler Testing

    Full text link
    The correctness of a compiler affects the correctness of every program written in the language, and thus must be thoroughly evaluated. Existing automatic compiler testing methods however either rely on weak oracles (e.g., a program behaves the same if only dead code is modified), or require substantial initial effort (e.g., having a complete operational language semantics). While the former prevents a comprehensive correctness evaluation, the latter makes those methods irrelevant in practice. In this work, we propose an axiomatic semantics based approach for testing compilers, called PTE. The idea is to incrementally develop a set of ``axioms'' capturing anecdotes of the language semantics in the form of \emph{(\textbf{p}recondition, \textbf{t}ransformation, \textbf{e}xpectation) triples, which allows us to test the compiler automatically.} Such axioms are written in the same language whose compiler is under test, and can be developed either based on the language specification, or by generalizing the bug reports. PTE has been applied to a newly developed compiler (i.e., Cangjie) and a mature compiler (i.e., Java), and successfully identified 42 implementation bugs and 9 potential language design issues

    Unsupervised String Transformation Learning for Entity Consolidation

    Full text link
    Data integration has been a long-standing challenge in data management with many applications. A key step in data integration is entity consolidation. It takes a collection of clusters of duplicate records as input and produces a single "golden record" for each cluster, which contains the canonical value for each attribute. Truth discovery and data fusion methods, as well as Master Data Management (MDM) systems, can be used for entity consolidation. However, to achieve better results, the variant values (i.e., values that are logically the same with different formats) in the clusters need to be consolidated before applying these methods. For this purpose, we propose a data-driven method to standardize the variant values based on two observations: (1) the variant values usually can be transformed to the same representation (e.g., "Mary Lee" and "Lee, Mary") and (2) the same transformation often appears repeatedly across different clusters (e.g., transpose the first and last name). Our approach first uses an unsupervised method to generate groups of value pairs that can be transformed in the same way (i.e., they share a transformation). Then the groups are presented to a human for verification and the approved ones are used to standardize the data. In a real-world dataset with 17,497 records, our method achieved 75% recall and 99.5% precision in standardizing variant values by asking a human 100 yes/no questions, which completely outperformed a state of the art data wrangling tool
    • …
    corecore