
    A Novel Framework for Robustness Analysis of Visual QA Models

    Deep neural networks play an essential role in many computer vision tasks, including Visual Question Answering (VQA). Until recently, research focused mainly on their accuracy; there is now a trend toward assessing the robustness of these models against adversarial attacks by evaluating their tolerance to varying noise levels. In VQA, adversarial attacks can target the image and/or the main question, yet the latter has received little systematic analysis. In this work, we propose a flexible framework that focuses on the language part of VQA and uses semantically relevant questions, dubbed basic questions, as controllable noise to evaluate the robustness of VQA models. We hypothesize that the level of noise is positively correlated with the similarity of a basic question to the main question. Hence, to apply noise to any given main question, we rank a pool of basic questions by their similarity, casting this ranking task as a LASSO optimization problem. We then propose a novel robustness measure, R_score, and two large-scale basic question datasets (BQDs) in order to standardize robustness analysis for VQA models.
    Comment: Accepted by the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) as an oral paper.
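    As an illustration of the ranking step, here is a minimal sketch of how the similarity ranking could be cast as a LASSO problem in Python. The sentence embeddings, the rank_basic_questions helper, and the parameter values are assumptions for illustration, not the authors' implementation: the main question embedding b is regressed onto the matrix A of basic-question embeddings with an L1 penalty (min_x ||Ax - b||^2 + alpha * ||x||_1), and candidates are ranked by coefficient magnitude.

        # Sketch: rank basic questions by LASSO coefficients (illustrative only).
        import numpy as np
        from sklearn.linear_model import Lasso

        def rank_basic_questions(main_emb, basic_embs, alpha=0.01):
            # main_emb:   (d,) embedding of the main question.
            # basic_embs: (n, d) embeddings of the n candidate basic questions.
            A = basic_embs.T                        # (d, n): one column per candidate
            lasso = Lasso(alpha=alpha, positive=True, fit_intercept=False,
                          max_iter=10000)
            lasso.fit(A, main_emb)                  # solve min ||Ax - b||^2 + a||x||_1
            weights = lasso.coef_                   # sparse similarity weights, shape (n,)
            return np.argsort(-weights), weights    # most similar candidates first

        # Toy usage with random vectors standing in for real sentence embeddings.
        rng = np.random.default_rng(0)
        order, w = rank_basic_questions(rng.normal(size=300),
                                        rng.normal(size=(50, 300)))

    The L1 penalty drives most coefficients to zero, so only a handful of basic questions receive nonzero weight, which yields a naturally sparse ranking of noise candidates.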

    STARC: Structured Annotations for Reading Comprehension

    We present STARC (Structured Annotations for Reading Comprehension), a new annotation framework for assessing reading comprehension with multiple-choice questions. Our framework introduces a principled structure for the answer choices and ties them to textual span annotations. The framework is implemented in OneStopQA, a new high-quality dataset for the evaluation and analysis of reading comprehension in English. We use this dataset to demonstrate that STARC can be leveraged for a key new application in the development of SAT-like reading comprehension materials: automatic annotation quality probing via span ablation experiments. We further show that it enables in-depth analyses and comparisons between machine and human reading comprehension behavior, including error distributions and guessing ability. Our experiments also reveal that RACE, the standard multiple-choice dataset in NLP, is limited in its ability to measure reading comprehension: 47% of its questions can be guessed by machines without access to the passage, and 18% are unanimously judged by humans as not having a unique correct answer. OneStopQA provides an alternative reading comprehension test set that alleviates these shortcomings and has a substantially higher human ceiling performance.
    Comment: ACL 2020. The OneStopQA dataset, STARC guidelines, and human experiments data are available at https://github.com/berzak/onestop-q
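    The span ablation probing mentioned above can be sketched as follows; the example layout and the answer_question model interface are hypothetical stand-ins rather than the STARC codebase. The idea: remove the annotated critical span from the passage and re-run the model. If accuracy barely drops, the questions (or their span annotations) are suspect, since they can be answered without the text they are supposed to depend on.

        # Sketch: annotation quality probing via span ablation (illustrative only).
        from typing import Callable, List, Tuple

        def ablate_span(passage: str, span: Tuple[int, int]) -> str:
            # Remove the annotated critical span, given as character offsets.
            start, end = span
            return passage[:start] + passage[end:]

        def span_ablation_probe(examples: List[dict],
                                answer_question: Callable[[str, str, List[str]], int]):
            # Each example is assumed to look like:
            #   {"passage": str, "question": str, "choices": [str, ...],
            #    "label": int, "span": (start_char, end_char)}
            correct_full = correct_ablated = 0
            for ex in examples:
                pred = answer_question(ex["passage"], ex["question"], ex["choices"])
                pred_abl = answer_question(ablate_span(ex["passage"], ex["span"]),
                                           ex["question"], ex["choices"])
                correct_full += pred == ex["label"]
                correct_ablated += pred_abl == ex["label"]
            n = len(examples)
            return correct_full / n, correct_ablated / n   # accuracy: intact vs. ablated

    Running the same loop with the entire passage removed would give a passage-blind guessing baseline in the spirit of the 47% figure reported above.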

    How software engineering research aligns with design science: A review

    Background: Assessing and communicating software engineering research can be challenging. Design science is recognized as an appropriate research paradigm for applied research but is seldom referred to in software engineering. Applying the design science lens to software engineering research may improve the assessment and communication of research contributions. Aim: The aim of this study is 1) to understand whether the design science lens helps summarize and assess software engineering research contributions, and 2) to characterize different types of design science contributions in the software engineering literature. Method: In previous research, we developed a visual abstract template summarizing the core constructs of the design science paradigm. In this study, we use this template in a review of 38 top software engineering publications to extract and analyze their design science contributions. Results: We identified five clusters of papers, classified according to their alignment with the design science paradigm. Conclusions: The design science lens helps emphasize the theoretical contribution of research output, in terms of technological rules, and supports reflection on the practical relevance, novelty, and rigor of the rules proposed by the research.
    Comment: 32 pages, 10 figures.