A Novel Framework for Robustness Analysis of Visual QA Models
Deep neural networks have been playing an essential role in many computer
vision tasks including Visual Question Answering (VQA). Until recently, the
study of their accuracy was the main focus of research but now there is a trend
toward assessing the robustness of these models against adversarial attacks by
evaluating their tolerance to varying noise levels. In VQA, adversarial attacks
can target the image and/or the proposed main question, yet there is a lack
of proper analysis of the latter. In this work, we propose a flexible framework
that focuses on the language part of VQA and uses semantically relevant
questions, dubbed basic questions, acting as controllable noise to evaluate the
robustness of VQA models. We hypothesize that the level of noise is positively
correlated with the similarity of a basic question to the main question. Hence,
to apply noise on any given main question, we rank a pool of basic questions
based on their similarity by casting this ranking task as a LASSO optimization
problem. Then, we propose a novel robustness measure, R_score, and two
large-scale basic question datasets (BQDs) in order to standardize robustness
analysis for VQA models.
Comment: Accepted by the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI-19) as an oral paper.
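The LASSO-based ranking step can be sketched as follows. This is a minimal illustration under assumptions: the toy embeddings, the plain coordinate-descent solver, and the function names are all stand-ins, not the authors' implementation.

```python
# Sketch: rank "basic questions" by similarity to a main question by casting
# ranking as a LASSO problem: min_w ||q - B w||^2 + lam * ||w||_1, where q is
# the main-question embedding and the columns of B are basic-question
# embeddings. Basic questions with larger |w_j| are ranked as more similar.
# All names and the tiny solver below are illustrative assumptions.

def soft_threshold(x, lam):
    """Soft-thresholding operator used in the L1 proximal step."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def lasso_rank(main_vec, basic_vecs, lam=0.1, iters=200):
    """Solve the LASSO by coordinate descent, then rank basic questions
    by the magnitude of their weights (largest weight = most similar)."""
    n, d = len(basic_vecs), len(main_vec)
    w = [0.0] * n
    for _ in range(iters):
        for j in range(n):
            bj = basic_vecs[j]
            # Residual with coordinate j excluded from the reconstruction.
            r = [main_vec[k] - sum(w[i] * basic_vecs[i][k]
                                   for i in range(n) if i != j)
                 for k in range(d)]
            rho = sum(bj[k] * r[k] for k in range(d))
            z = sum(v * v for v in bj)
            w[j] = soft_threshold(rho, lam / 2) / z if z else 0.0
    order = sorted(range(n), key=lambda j: -abs(w[j]))
    return order, w

# A basic question identical to the main question should rank first.
order, weights = lasso_rank([1.0, 0.0],
                            [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
```

In this toy example the sparsity penalty zeroes out the weights of the orthogonal and near-duplicate basic questions, leaving the exact match ranked first.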
STARC: Structured Annotations for Reading Comprehension
We present STARC (Structured Annotations for Reading Comprehension), a new
annotation framework for assessing reading comprehension with multiple choice
questions. Our framework introduces a principled structure for the answer
choices and ties them to textual span annotations. The framework is implemented
in OneStopQA, a new high-quality dataset for evaluation and analysis of reading
comprehension in English. We use this dataset to demonstrate that STARC can be
leveraged for a key new application in the development of SAT-like reading
comprehension materials: automatic annotation-quality probing via span ablation
experiments. We further show that it enables in-depth analyses and comparisons
between machine and human reading comprehension behavior, including error
distributions and guessing ability. Our experiments also reveal that the
standard multiple choice dataset in NLP, RACE, is limited in its ability to
measure reading comprehension. 47% of its questions can be guessed by machines
without accessing the passage, and 18% are unanimously judged by humans as not
having a unique correct answer. OneStopQA provides an alternative test set for
reading comprehension which alleviates these shortcomings and has a
substantially higher human ceiling performance.
Comment: ACL 2020. The OneStopQA dataset, STARC guidelines, and human experiments data are available at https://github.com/berzak/onestop-q
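The guessing analysis described above (answering multiple-choice questions with the passage withheld) can be sketched like this. The data layout and the longest-choice baseline are illustrative assumptions, not the paper's models or evaluation code.

```python
# Sketch: measure how many multiple-choice questions can be "guessed"
# without access to the passage, as in the RACE analysis above. The question
# format and the baseline below are illustrative assumptions.

def no_passage_accuracy(questions, answer_fn):
    """Fraction of questions answered correctly when the answerer sees only
    the question text and the answer choices (the passage is ablated)."""
    correct = 0
    for q in questions:
        pred = answer_fn(q["question"], q["choices"])  # passage withheld
        if pred == q["answer"]:
            correct += 1
    return correct / len(questions)

def longest_choice(question, choices):
    """Placeholder no-passage baseline: pick the longest answer choice,
    a well-known surface cue in multiple-choice datasets."""
    return max(range(len(choices)), key=lambda i: len(choices[i]))

toy_questions = [
    {"question": "q1", "choices": ["a", "bbbb"], "answer": 1},
    {"question": "q2", "choices": ["cccc", "d"], "answer": 1},
]
rate = no_passage_accuracy(toy_questions, longest_choice)
```

A high no-passage accuracy for a dataset suggests its questions can be answered from answer-choice artifacts rather than comprehension, which is the shortcoming the abstract reports for RACE.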
How software engineering research aligns with design science: A review
Background: Assessing and communicating software engineering research can be
challenging. Design science is recognized as an appropriate research paradigm
for applied research but is seldom referred to in software engineering.
Applying the design science lens to software engineering research may improve
the assessment and communication of research contributions. Aim: The aim of
this study is 1) to understand whether the design science lens helps summarize
and assess software engineering research contributions, and 2) to characterize
different types of design science contributions in the software engineering
literature. Method: In previous research, we developed a visual abstract
template, summarizing the core constructs of the design science paradigm. In
this study, we use this template in a review of a set of 38 top software
engineering publications to extract and analyze their design science
contributions. Results: We identified five clusters of papers, classifying them
according to their alignment with the design science paradigm. Conclusions: The
design science lens helps emphasize the theoretical contribution of research
output---in terms of technological rules---and reflect on the practical
relevance, novelty, and rigor of the rules proposed by the research.
Comment: 32 pages, 10 figures.