A Context Aware Approach for Generating Natural Language Attacks
We study an important task of attacking natural language processing models in
a black box setting. We propose an attack strategy that crafts semantically
similar adversarial examples on text classification and entailment tasks. Our
proposed attack finds candidate words by considering the information of both
the original word and its surrounding context. It jointly leverages masked
language modelling and next sentence prediction for context understanding. In
comparison to attacks proposed in prior literature, we are able to generate
high-quality adversarial examples that perform significantly better in terms
of both success rate and word perturbation percentage. Comment: Accepted as
Student Poster at AAAI 202
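The candidate-selection step described above, which weighs information from the original word together with its surrounding context, can be sketched as follows. This is a minimal illustration, not the paper's method: both scoring functions are toy stand-ins (a real attack would query a masked language model and a next-sentence-prediction head), and all names are hypothetical.

```python
# Illustrative sketch: rank replacement candidates for one target word by
# combining (a) similarity to the original word and (b) fit with the
# surrounding context. Both scorers below are toy proxies.

def word_similarity(original, candidate):
    # Toy proxy: character-overlap (Jaccard) ratio; a real attack would
    # compare word embeddings or masked-LM probabilities.
    a, b = set(original), set(candidate)
    return len(a & b) / len(a | b)

def context_fit(left_context, candidate, right_context):
    # Toy proxy: reward candidates that already co-occur with context words.
    context = set(left_context.split()) | set(right_context.split())
    return 1.0 if candidate in context else 0.5

def rank_candidates(sentence, position, candidates, alpha=0.7):
    words = sentence.split()
    original = words[position]
    left = " ".join(words[:position])
    right = " ".join(words[position + 1:])
    scored = [
        (alpha * word_similarity(original, c)
         + (1 - alpha) * context_fit(left, c, right), c)
        for c in candidates
    ]
    return [c for _, c in sorted(scored, reverse=True)]

ranked = rank_candidates("the movie was great fun", 3,
                         ["grand", "terrible", "excellent"])
print(ranked[0])  # highest-scoring replacement for "great"
```

The `alpha` weight trades off faithfulness to the original word against contextual fit; the paper's actual balance between the masked-LM and next-sentence signals is not reproduced here.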
Generating Natural Language Attacks in a Hard Label Black Box Setting
We study an important and challenging task of attacking natural language
processing models in a hard label black box setting. We propose a
decision-based attack strategy that crafts high quality adversarial examples on
text classification and entailment tasks. Our proposed attack strategy
leverages a population-based optimization algorithm to craft plausible and
semantically similar adversarial examples while observing only the top label
predicted by the target model. At each iteration, the optimization procedure
allows word replacements that maximize the overall semantic similarity between
the original and the adversarial text. Further, our approach does not rely on
substitute models or any training data. We demonstrate the
efficacy of our proposed approach through extensive experimentation and
ablation studies on five state-of-the-art target models across seven benchmark
datasets. In comparison to attacks proposed in prior literature, we achieve a
higher success rate with a lower word perturbation percentage, even in this
highly restricted setting. Comment: Accepted at AAAI 2021 (Main Conference)
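The hard-label setting above, where only the model's top predicted label is visible, can be sketched with a simple population-based search. This is a toy illustration under stated assumptions: the classifier, synonym table, and random perturbation scheme are hypothetical stand-ins, and unchanged-word fraction is used as a crude proxy for semantic similarity.

```python
import random

# Minimal sketch of a decision-based (hard-label) word-substitution attack:
# only the target model's top label is observed, and among perturbations
# that flip the label we keep the one closest to the original text.

def toy_classifier(words):
    # Hypothetical target model: "positive" iff it sees the word "good".
    return "positive" if "good" in words else "negative"

def similarity(orig, adv):
    # Crude proxy for semantic similarity: fraction of unchanged words.
    same = sum(o == a for o, a in zip(orig, adv))
    return same / len(orig)

def hard_label_attack(sentence, synonyms, iters=100, pop_size=8, seed=0):
    rng = random.Random(seed)
    orig = sentence.split()
    target = toy_classifier(orig)          # label the attack must flip
    best = None
    for _ in range(iters):
        for _ in range(pop_size):          # sample a population of candidates
            adv = list(orig)
            for i, w in enumerate(adv):
                if w in synonyms and rng.random() < 0.3:
                    adv[i] = rng.choice(synonyms[w])
            # Keep only label-flipping candidates that improve similarity.
            if toy_classifier(adv) != target:
                if best is None or similarity(orig, adv) > similarity(orig, best):
                    best = adv
    return " ".join(best) if best else None

adv = hard_label_attack("the food was good today",
                        {"good": ["fine", "decent"], "food": ["meal"]})
print(adv)  # an adversarial sentence that flips the toy label
```

The paper's actual optimizer is population-based with principled replacement moves; this sketch only shows the hard-label constraint (queries return a label, never a score) and the similarity-maximizing selection pressure.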
Evaluating Generalizability of Deep Learning Models Using Indian-COVID-19 CT Dataset
Computed tomography (CT) has been routinely used for the diagnosis of lung
diseases and recently, during the pandemic, for detecting the infectivity and
severity of COVID-19 disease. One of the major concerns in using machine
learning (ML) approaches for automatic processing of CT scan images in clinical
settings is that these methods are trained on limited and biased subsets of
publicly available COVID-19 data. This has raised concerns regarding the
generalizability of these models on external datasets not seen by the model
during training. To address some of these issues, in this work CT scan images
from confirmed COVID-19 cases obtained from one of the largest public
repositories, COVIDx CT 2A, were used for training and internal validation of
machine learning models. For external validation we generated the
Indian-COVID-19 CT dataset, an open-source repository containing 3D CT volumes
and 12096 chest CT images from 288 COVID-19 patients from India. A comparative
performance evaluation of four state-of-the-art machine learning models, viz.,
a lightweight convolutional neural network (CNN) and three deeper CNN-based
deep learning (DL) models, VGG-16, ResNet-50, and Inception-v3, in
classifying CT images into three classes, viz., normal, non-COVID pneumonia,
and COVID-19, is carried out on these two datasets. Our analysis showed that
the performance of all the models is comparable on the hold-out COVIDx CT 2A
test set, with accuracies of 90%-99% (96% for the lightweight CNN), while on
the external Indian-COVID-19 CT dataset a drop in performance (8%-19%) is
observed for all the models. The lightweight CNN performed best on the
external dataset (accuracy 88%) in comparison to the deeper DL models,
indicating that a lightweight CNN generalizes better to unseen data. The data
and code are made available at https://github.com/aleesuss/c19
FPrep: Fuzzy clustering driven efficient automated pre-processing for fuzzy association rule mining
Conventional Association Rule Mining (ARM) algorithms usually deal with datasets with binary values, and expect any numerical values to be converted to binary ones using sharp partitions, like Age = 25 to 60. To mitigate this constraint, fuzzy logic is used to convert quantitative attribute values into fuzzy ones, eliminating the loss of information that arises from sharp partitioning, especially at partition boundaries, before fuzzy association rules are generated. However, before any fuzzy ARM algorithm can be used, the original dataset (with crisp attributes) needs to be transformed into a form with fuzzy attributes. This paper describes a methodology, called FPrep, to do this pre-processing, which first uses fuzzy clustering to generate fuzzy partitions, and then uses these partitions to obtain a fuzzy version (with fuzzy records) of the original dataset. Ultimately, the fuzzy data (fuzzy records) are represented in a standard manner such that they can be used as input to any fuzzy ARM algorithm, irrespective of how it works and processes fuzzy data. We also show that FPrep is much faster than other comparable transformation techniques, which in turn depend on non-fuzzy techniques such as hard clustering (CLARANS and CURE). Moreover, we illustrate the quality of the fuzzy partitions generated using FPrep, and the number of frequent itemsets generated by a fuzzy ARM algorithm when preceded by FPrep.
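The core transformation described above, turning a crisp attribute value into graded memberships in fuzzy partitions, can be sketched with the standard fuzzy c-means membership formula. This is a minimal sketch under stated assumptions: the partition centers are fixed by hand rather than learned by fuzzy clustering as in FPrep, and the "young"/"middle"/"old" Age partitions are hypothetical.

```python
# Sketch of fuzzifying one numeric attribute: cluster centers induce fuzzy
# membership degrees, so a crisp value like Age = 27 gets graded memberships
# in each partition instead of falling into one sharp bucket.

def fuzzy_memberships(value, centers, m=2.0):
    # Standard fuzzy c-means membership: u_i = 1 / sum_k (d_i / d_k)^(2/(m-1)),
    # where d_i is the distance from the value to center i.
    dists = [abs(value - c) for c in centers]
    if any(d == 0 for d in dists):        # value sits exactly on a center
        return [1.0 if d == 0 else 0.0 for d in dists]
    memberships = []
    for d_i in dists:
        s = sum((d_i / d_k) ** (2.0 / (m - 1.0)) for d_k in dists)
        memberships.append(1.0 / s)
    return memberships

# Hypothetical Age partitions centered at 25 ("young"), 45 ("middle"),
# and 65 ("old"): Age = 27 is mostly "young", slightly "middle".
u = fuzzy_memberships(27, [25, 45, 65])
print([round(x, 3) for x in u])
```

The memberships sum to 1 for each record, which is what lets a fuzzy ARM algorithm count partial support contributions at partition boundaries instead of losing them to a sharp cut-off.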
On the Efficiency of Association-rule Mining Algorithms
In this paper, we first focus our attention on the question of how much room remains for performance improvement over current association rule mining algorithms. Our strategy is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent itemsets in the database and only needs to gather their actual supports to complete the mining process. Our experimental results show that current mining algorithms do not perform uniformly well with respect to the Oracle for all database characteristics and support thresholds. In many cases there is a substantial gap between the Oracle's performance and that of the current mining algorithms. Second, we present a new mining algorithm, called ARMOR, that is constructed by making minimal changes to the Oracle algorithm. ARMOR consistently performs within a factor of two of the Oracle on both real and synthetic datasets over practical ranges of support specifications.
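The Oracle idea above can be made concrete: once the identities of all frequent itemsets are known in advance, mining reduces to a single counting pass over the database to gather their actual supports. The sketch below illustrates just that counting pass; the transaction data and itemsets are illustrative, not from the paper's benchmarks.

```python
# Sketch of the "Oracle algorithm" counting pass: given the frequent
# itemsets up front, one scan over the database suffices to compute each
# itemset's actual support (fraction of transactions containing it).

def gather_supports(transactions, known_frequent_itemsets):
    counts = {itemset: 0 for itemset in known_frequent_itemsets}
    for transaction in transactions:
        t = set(transaction)
        for itemset in known_frequent_itemsets:
            if set(itemset) <= t:          # itemset contained in transaction
                counts[itemset] += 1
    n = len(transactions)
    return {itemset: c / n for itemset, c in counts.items()}

db = [["bread", "milk"], ["bread", "butter"],
      ["bread", "milk", "butter"], ["milk"]]
supports = gather_supports(db, [("bread",), ("milk",), ("bread", "milk")])
print(supports[("bread", "milk")])  # 0.5
```

Everything a real miner spends its time on, candidate generation and pruning, is absent here, which is why the Oracle's cost is a lower bound that algorithms like ARMOR are measured against.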