28 research outputs found
Adversarial Black-Box Attacks on Automatic Speech Recognition Systems using Multi-Objective Evolutionary Optimization
Fooling deep neural networks with adversarial inputs has exposed a
significant vulnerability in current state-of-the-art systems across multiple
domains. Both black-box and white-box approaches have been used either to
replicate the model itself or to craft examples that cause the model to fail.
In this work, we propose a framework that uses multi-objective evolutionary
optimization to perform both targeted and untargeted black-box attacks on
Automatic Speech Recognition (ASR) systems. We apply this framework to two ASR
systems, DeepSpeech and Kaldi-ASR, increasing their Word Error Rate (WER) by
up to 980%, indicating the potency of our approach. In untargeted and targeted
attacks, the adversarial samples maintain high acoustic similarities with the
original audio of 0.98 and 0.97, respectively.
Comment: Published in Interspeech 2019
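To make the attack setup concrete, the following is a minimal sketch of an evolutionary black-box attack loop of this kind. The `asr_transcribe` callable is a hypothetical stand-in for any black-box ASR API, the WER and cosine-similarity objectives are textbook simplifications, and the scalarized selection step replaces the paper's true multi-objective search; none of this is the authors' actual code.

```python
# Hypothetical sketch of an evolutionary black-box attack on an ASR system.
# `asr_transcribe` is an assumed stand-in for a black-box ASR API; the paper's
# actual operators and multi-objective selection are not reproduced here.
import numpy as np

def word_error_rate(ref: str, hyp: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    r, h = ref.split(), hyp.split()
    d = np.zeros((len(r) + 1, len(h) + 1), dtype=int)
    d[:, 0], d[0, :] = np.arange(len(r) + 1), np.arange(len(h) + 1)
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i, j] = min(d[i - 1, j] + 1, d[i, j - 1] + 1, d[i - 1, j - 1] + cost)
    return d[len(r), len(h)] / max(len(r), 1)

def acoustic_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Cosine similarity as a crude stand-in for the paper's acoustic metric."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y) + 1e-9))

def untargeted_attack(audio, ref_text, asr_transcribe, pop=20, gens=100, eps=0.002):
    """Evolve additive perturbations that raise WER while staying acoustically close."""
    rng = np.random.default_rng(0)
    population = [rng.normal(0, eps, audio.shape) for _ in range(pop)]
    for _ in range(gens):
        scored = []
        for delta in population:
            adv = np.clip(audio + delta, -1.0, 1.0)
            scored.append((word_error_rate(ref_text, asr_transcribe(adv)),
                           acoustic_similarity(audio, adv), delta))
        # Scalarized fitness for brevity; the paper uses true multi-objective search.
        scored.sort(key=lambda t: -(t[0] + t[1]))
        parents = [delta for _, _, delta in scored[:pop // 2]]
        children = [p + rng.normal(0, eps / 2, audio.shape) for p in parents]
        population = parents + children
    return np.clip(audio + population[0], -1.0, 1.0)
```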
Model-based annotation of coreference
Humans do not make inferences over texts, but over models of what texts are
about. When annotators are asked to annotate coreferent spans of text, it is
therefore a somewhat unnatural task. This paper presents an alternative in
which we preprocess documents, linking entities to a knowledge base, and turn
the coreference annotation task -- in our case limited to pronouns -- into a
task where annotators assign pronouns to entities. Model-based annotation is
shown to lead to faster annotation and higher inter-annotator agreement, and
we argue that it also opens up an alternative approach to coreference
resolution. We present two new coreference benchmark datasets, for English
Wikipedia and English teacher-student dialogues, and evaluate state-of-the-art
coreference resolvers on them.
Comment: To appear in LREC 2020
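The task transformation is easy to picture as a data structure: each pronoun becomes an item with a set of linked candidate entities, and the annotator picks an entity identifier rather than a text span. A minimal sketch, with illustrative field names and placeholder KB identifiers rather than the paper's actual schema:

```python
# Illustrative sketch of model-based pronoun annotation items; field names
# and KB identifiers are hypothetical, not the paper's actual data format.
from dataclasses import dataclass

@dataclass
class Entity:
    kb_id: str        # e.g., a Wikipedia or Wikidata identifier
    mention: str      # surface form found by the entity linker

@dataclass
class PronounItem:
    context: str      # text shown to the annotator
    pronoun: str      # the pronoun to be resolved
    candidates: list  # entities linked in the preceding text
    answer: str = ""  # kb_id chosen by the annotator

item = PronounItem(
    context="Ada Lovelace met Charles Babbage; she admired his engine.",
    pronoun="she",
    candidates=[Entity("KB:Ada_Lovelace", "Ada Lovelace"),
                Entity("KB:Charles_Babbage", "Charles Babbage")],
)
item.answer = "KB:Ada_Lovelace"  # the annotator assigns an entity, not a span
```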
A Visual Programming Paradigm for Abstract Deep Learning Model Development
Deep learning is one of the fastest-growing technologies in computer science,
with a plethora of applications. But this unprecedented growth has so far been
limited to deep learning experts, primarily because of the steep learning
curve of the programming libraries and the lack of intuitive systems that
enable non-experts to use deep learning. To address this, we study the
effectiveness of a no-code paradigm for designing deep learning models. In
particular, we find a visual drag-and-drop interface to be more efficient than
both traditional programming and alternative visual programming paradigms. We
conduct user studies with participants of different expertise levels to
measure the entry barrier and developer workload across programming paradigms.
We obtain a System Usability Scale (SUS) score of 90 and a NASA Task Load
Index (TLX) score of 21 for the proposed visual programming paradigm, compared
to 68 and 52, respectively, for traditional programming methods.
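For context on the reported numbers, SUS and (raw) NASA-TLX are computed with standard formulas; the sketch below shows the conventional scoring, not code from the study itself.

```python
# Standard scoring formulas for SUS and raw (unweighted) NASA-TLX;
# shown only as background for the numbers reported above.

def sus_score(responses):
    """SUS from ten 1-5 Likert responses (responses[0] is item 1).
    Odd-numbered items contribute (r - 1), even-numbered items (5 - r)."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 0 else (5 - r)
                for i, r in enumerate(responses))
    return total * 2.5  # 0-100 scale; ~68 is the commonly cited average

def raw_tlx(subscales):
    """Raw TLX: mean of the six 0-100 workload subscales (lower = less load)."""
    assert len(subscales) == 6
    return sum(subscales) / 6.0
```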
Sanskrit Sandhi Splitting using seq2(seq)^2
In Sanskrit, small words (morphemes) are combined to form compound words
through a process known as Sandhi. Sandhi splitting is the process of splitting
a given compound word into its constituent morphemes. Although rules governing
word splitting exist in the language, it is highly challenging to identify the
locations of the splits in a compound word. Though existing Sandhi splitting
systems incorporate these pre-defined splitting rules, they have low accuracy,
as the same compound word can often be broken down in multiple ways that all
yield syntactically correct splits.
In this research, we propose a novel deep learning architecture called Double
Decoder RNN (DD-RNN), which (i) predicts the location of the split(s) with 95%
accuracy, and (ii) predicts the constituent words (learning the Sandhi
splitting rules) with 79.5% accuracy, outperforming the state of the art by
20%. Additionally, we demonstrate the generalization capability of our model
by achieving competitive results on the related problem of Chinese word
segmentation.
Comment: Accepted in EMNLP 2018
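The double-decoder idea can be pictured as one shared character-level encoder feeding two heads: a per-character tagger that marks split locations and an autoregressive decoder that generates the constituent morphemes. The PyTorch sketch below is a minimal illustration under those assumptions; layer sizes and wiring are not the paper's exact DD-RNN configuration.

```python
# Minimal, hypothetical sketch of a two-decoder seq2seq model;
# not the paper's exact DD-RNN architecture.
import torch
import torch.nn as nn

class DoubleDecoderRNN(nn.Module):
    def __init__(self, vocab_size, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True)
        # Decoder 1: per-character binary tagger for split locations.
        self.split_head = nn.Linear(hidden, 2)
        # Decoder 2: autoregressive generator for the split word sequence.
        self.decoder = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, src, tgt):
        enc_out, enc_h = self.encoder(self.embed(src))
        split_logits = self.split_head(enc_out)            # (B, T_src, 2)
        dec_out, _ = self.decoder(self.embed(tgt), enc_h)  # teacher forcing
        char_logits = self.out(dec_out)                    # (B, T_tgt, V)
        return split_logits, char_logits

model = DoubleDecoderRNN(vocab_size=60)
src = torch.randint(0, 60, (2, 12))  # compound words as character ids
tgt = torch.randint(0, 60, (2, 15))  # gold split sequences, shifted right
split_logits, char_logits = model(src, tgt)
```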
Ellipsis Resolution as Question Answering: An Evaluation
Most, if not all, forms of ellipsis (e.g., "so does Mary") are similar to
reading comprehension questions ("what does Mary do?"), in that resolving them
requires identifying an appropriate text span in the preceding discourse.
Following this observation, we present an alternative approach to English
ellipsis resolution that relies on architectures developed for question
answering (QA). We present both single-task models and joint models trained on
auxiliary QA and coreference resolution datasets, clearly outperforming the
current state of the art for Sluice Ellipsis (from 70.00 to 86.01 F1) and Verb
Phrase Ellipsis (from 72.89 to 78.66 F1).
Comment: To appear in EACL 2021
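The reduction itself is straightforward to demonstrate with any off-the-shelf extractive QA model: phrase the ellipsis site as a question and let the model extract the antecedent span from the preceding discourse. A minimal sketch using a generic SQuAD-trained checkpoint, not the paper's trained system:

```python
# Sketch of ellipsis resolution as extractive QA, using a generic
# SQuAD-trained checkpoint rather than the paper's models or data.
from transformers import pipeline

qa = pipeline("question-answering",
              model="distilbert-base-cased-distilled-squad")

context = "John fixed the printer before lunch, and so did Mary."
# The VP ellipsis "so did Mary" is recast as a comprehension question.
result = qa(question="What did Mary do?", context=context)
print(result["answer"])  # an extracted span such as "fixed the printer"
```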