6 research outputs found

    What If We Simply Swap the Two Text Fragments? A Straightforward yet Effective Way to Test the Robustness of Methods to Confounding Signals in Nature Language Inference Tasks

    Natural language inference (NLI) is the task of predicting the inference relationship between a pair of natural language sentences. With the increasing popularity of NLI, many state-of-the-art predictive models have been proposed with impressive performance. However, several works have noticed statistical irregularities in the collected NLI data sets that may result in over-estimated performance of these models, and have proposed remedies. In this paper, we further investigate these statistical irregularities, which we refer to as confounding factors, in NLI data sets. Based on the belief that some NLI labels should be preserved under swapping operations, we propose a simple yet effective way of evaluating NLI predictive models (swapping the two text fragments) that naturally mitigates the observed problems. Further, we continue to train the predictive models with swapped inputs and propose to use the deviation of a model's evaluation performance under different percentages of swapped training text fragments to describe its robustness. Our evaluation metric leads to some interesting insights into recently published NLI methods. Finally, we also apply the swapping operation to NLI models to assess the effectiveness of this straightforward method in mitigating confounding-factor problems when training generic sentence embeddings for other NLP transfer tasks.
    Comment: 8 pages, to appear at AAAI 1
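    A minimal sketch of the swap-based evaluation idea described above (not the authors' released code): swap the premise and hypothesis for a configurable fraction of examples and measure how a model's accuracy shifts. The dataset format (dicts with premise/hypothesis/label), the `predict(premise, hypothesis)` callable, the set of labels assumed to survive swapping, and the max-minus-min deviation measure are all illustrative assumptions rather than the paper's exact protocol.

```python
# Sketch only: swap-fragment evaluation for NLI robustness.
# Assumes examples are dicts with 'premise', 'hypothesis', 'label' keys and
# `predict(premise, hypothesis)` returns a label string.
import random

SYMMETRIC_LABELS = {"contradiction", "neutral"}  # assumed to be preserved under a swap


def swap_fragments(examples, swap_ratio=1.0, seed=0):
    """Return a copy of `examples` with roughly `swap_ratio` of the
    (premise, hypothesis) pairs swapped, restricted to labels assumed symmetric."""
    rng = random.Random(seed)
    swapped = []
    for ex in examples:
        ex = dict(ex)
        if ex["label"] in SYMMETRIC_LABELS and rng.random() < swap_ratio:
            ex["premise"], ex["hypothesis"] = ex["hypothesis"], ex["premise"]
        swapped.append(ex)
    return swapped


def accuracy(predict, examples):
    correct = sum(predict(ex["premise"], ex["hypothesis"]) == ex["label"] for ex in examples)
    return correct / len(examples)


def robustness_gap(predict, examples, ratios=(0.0, 0.5, 1.0)):
    """Deviation of accuracy across swap ratios; a smaller gap suggests the model
    relies less on asymmetric, possibly confounding, surface signals."""
    scores = [accuracy(predict, swap_fragments(examples, r)) for r in ratios]
    return max(scores) - min(scores)
```

    The deviation across swap ratios mirrors the robustness measure the abstract describes: a model whose accuracy collapses once fragments are swapped is likely exploiting confounding signals rather than genuine inference relationships.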

    Documentation-Guided Fuzzing for Testing Deep Learning API Functions

    Widely used deep learning (DL) libraries demand reliability. Thus, it is essential to test DL libraries' API functions. Despite the effectiveness of fuzz testing, few techniques specialize in fuzzing the API functions of DL libraries. To fill this gap, we design and implement a fuzzing technique called DocTer for API functions of DL libraries. Fuzzing DL API functions is challenging because many API functions expect structured inputs that follow DL-specific constraints. If a fuzzer is (1) unaware of these constraints or (2) incapable of using these constraints to fuzz, it is practically impossible to generate valid inputs, i.e., inputs that follow these DL-specific constraints, and to explore deeply enough to test the core functionality of API functions. DocTer extracts DL-specific constraints from API documents and uses these constraints to guide the fuzzing to generate valid inputs automatically. DocTer also generates inputs that violate these constraints to test the input validity checking code. To reduce manual effort, DocTer applies a sequential pattern mining technique on API documents to help DocTer users create rules to extract constraints from API documents automatically. Our evaluation on three popular DL libraries (TensorFlow, PyTorch, and MXNet) shows that DocTer's accuracy in extracting input constraints is 82.2-90.5%. DocTer detects 46 bugs, while a baseline fuzzer without input constraints detects only 19 bugs. Most (33) of the 46 bugs were previously unknown, 26 of which have been fixed or confirmed by developers after we reported them. In addition, DocTer detects 37 inconsistencies within documents, including 25 fixed or confirmed after we reported them.
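    A minimal sketch of constraint-guided fuzzing in the spirit of the abstract (not DocTer's actual implementation): given per-parameter constraints such as dtype, rank, and value range, generate inputs that either satisfy or deliberately violate them and exercise a target API function. The constraint schema, generator functions, and alternating valid/invalid strategy are hypothetical simplifications introduced here for illustration.

```python
# Sketch only: constraint-guided fuzzing of a callable API function.
# The per-parameter constraint dicts (ndim / range / dtype) are a hypothetical
# simplification of constraints a tool might extract from API documentation.
import numpy as np


def gen_valid(constraint, rng):
    """Generate one NumPy array satisfying a per-parameter constraint dict."""
    shape = tuple(rng.integers(1, 4, size=constraint.get("ndim", 1)))
    low, high = constraint.get("range", (-10.0, 10.0))
    return rng.uniform(low, high, size=shape).astype(constraint.get("dtype", "float32"))


def gen_invalid(constraint, rng):
    """Deliberately violate the dtype constraint to exercise input-validation code."""
    return gen_valid(constraint, rng).astype("object")  # wrong dtype on purpose


def fuzz(api_fn, constraints, iterations=100, seed=0):
    """Call `api_fn` with constraint-conforming and constraint-violating inputs.
    Graceful Python exceptions are recorded for triage; a hard crash (e.g. a
    segfault in native code) would terminate the process and thus stand out."""
    rng = np.random.default_rng(seed)
    failures = []
    for i in range(iterations):
        gen = gen_valid if i % 2 == 0 else gen_invalid
        args = {name: gen(c, rng) for name, c in constraints.items()}
        try:
            api_fn(**args)
        except Exception as exc:
            failures.append((args, exc))
    return failures
```

    In DocTer itself, as the abstract notes, the constraints are not hand-written but extracted from API documents with the help of sequential pattern mining; this sketch simply takes such constraints as given.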