Determining Sequence of Image Processing Technique (IPT) to Detect Adversarial Attacks
Developing machine learning models that are secure against adversarial examples
is challenging, as new methods for generating adversarial attacks are
continually being developed. In this work, we propose an evolutionary approach
to automatically determine an Image Processing Technique Sequence (IPTS) for
detecting malicious inputs. We first use a diverse set of attack methods,
including adaptive attacks on our own defense, to generate adversarial samples
from the clean dataset. A detection framework based on a genetic algorithm (GA)
is developed to find the optimal IPTS, where optimality is estimated by
different fitness measures such as Euclidean distance, entropy loss, average
histogram, local binary pattern, and loss functions. The "image difference"
between the original and processed images is used to extract features, which
are then fed to a classification scheme to determine whether the input sample
is adversarial or clean. This paper describes our methodology and reports
experiments on multiple datasets tested against several adversarial attacks.
For each attack type and dataset, the framework generates a unique IPTS. At
test time, a set of IPTS is selected dynamically and works as a filter for
adversarial attacks. Our empirical experiments show promising results,
indicating that the approach can be used efficiently as a processing step for
any AI model.
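A toy version of the GA search helps make the idea concrete. The sketch below is a stdlib-only illustration under stated assumptions, not the authors' implementation: an "image" is a flat list of pixel values in [0, 1], the three techniques (`smooth`, `quantize`, `clip`) are hypothetical stand-ins for real image processing operations, and the fitness is assumed to be how much more a candidate sequence changes adversarial inputs than clean ones, measured with the Euclidean "image difference" (only one of the fitness measures the abstract lists).

```python
import math
import random

# Hypothetical image processing techniques; an "image" is a flat list of
# pixel intensities in [0, 1].
def smooth(img):
    """3-tap mean filter with edge replication."""
    n = len(img)
    return [(img[max(i - 1, 0)] + img[i] + img[min(i + 1, n - 1)]) / 3 for i in range(n)]

def quantize(img, levels=8):
    """Snap each pixel to the nearest of `levels` evenly spaced values."""
    step = 1 / (levels - 1)
    return [round(p / step) * step for p in img]

def clip_extremes(img, lo=0.05, hi=0.95):
    return [min(max(p, lo), hi) for p in img]

TECHNIQUES = {"smooth": smooth, "quantize": quantize, "clip": clip_extremes}

def apply_sequence(seq, img):
    for name in seq:
        img = TECHNIQUES[name](img)
    return img

def image_difference(seq, img):
    """Euclidean distance between the original and the processed image."""
    processed = apply_sequence(seq, img)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(img, processed)))

def fitness(seq, clean, adversarial):
    """Assumed fitness: the IPTS should alter adversarial inputs more than clean ones."""
    adv = sum(image_difference(seq, x) for x in adversarial) / len(adversarial)
    cln = sum(image_difference(seq, x) for x in clean) / len(clean)
    return adv - cln

def crossover(a, b, rng):
    cut = rng.randrange(1, len(a))  # one-point crossover
    return a[:cut] + b[cut:]

def mutate(seq, rng):
    seq = list(seq)
    seq[rng.randrange(len(seq))] = rng.choice(list(TECHNIQUES))
    return seq

def evolve_ipts(clean, adversarial, length=3, pop_size=12, generations=20, seed=0):
    rng = random.Random(seed)
    names = list(TECHNIQUES)
    population = [[rng.choice(names) for _ in range(length)] for _ in range(pop_size)]

    def score(seq):
        return fitness(seq, clean, adversarial)

    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # keep the fitter half (elitism)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return max(population, key=score)

# Toy data: smooth ramps versus the same ramps with alternating +/-0.1 noise
clean = [[0.2 + 0.04 * i for i in range(16)] for _ in range(4)]
adversarial = [[p + 0.1 * (-1) ** i for i, p in enumerate(img)] for img in clean]
best = evolve_ipts(clean, adversarial)
print(best)
```

On this toy data, sequences containing `smooth` score well because smoothing removes the alternating perturbation while barely changing the clean ramps; the resulting difference values could then feed a clean-versus-adversarial classifier.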
CIDMP: Completely Interpretable Detection of Malaria Parasite in Red Blood Cells using Lower-dimensional Feature Space
Predicting whether red blood cells (RBCs) are infected with the malaria parasite
is an important problem in pathology. Recently, supervised machine learning
approaches have been applied to this problem with reasonable success. In
particular, state-of-the-art methods such as Convolutional Neural Networks
automatically extract increasingly complex feature hierarchies from the image
pixels. While such generalized automatic feature extraction methods have
significantly reduced the burden of feature engineering in many domains, for
niche tasks such as the one we consider in this paper they cause two major
problems. First, they use a very large number of features (which may or may
not be relevant), so training such models is computationally expensive.
Second, and more importantly, the large feature space makes it very hard to
interpret which features are truly important for predictions. Thus, a common
criticism of such methods is that learning algorithms act as opaque black boxes
to their users, in this case medical experts: the recommendations of such
algorithms are easy to read, but the reasons behind them are not clear. This
is the problem of model non-interpretability, and the best-performing
algorithms are usually the least interpretable. To address these issues, in
this paper we propose an approach that extracts a very small number of
aggregated features that are easy to interpret and compute, and we empirically
show that high prediction accuracy is obtained even with a significantly
reduced feature space.
Comment: Accepted in The 2020 International Joint Conference on Neural
Networks (IJCNN 2020), Glasgow (UK)
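As a hypothetical illustration of "a very small number of aggregated features", the sketch below summarizes a grayscale cell image with three numbers and learns a single-threshold rule (a decision stump) that a medical expert could read directly. The abstract does not name the paper's features, so the three used here, the `DARK_CUTOFF` value, and the toy cells are all assumptions.

```python
from statistics import mean, pstdev

DARK_CUTOFF = 0.35  # hypothetical: pixels darker than this count as stained

def aggregated_features(cell):
    """Summarize a grayscale cell image (rows of values in [0, 1]) with 3 numbers."""
    pixels = [p for row in cell for p in row]
    return {
        "mean_intensity": mean(pixels),
        "intensity_std": pstdev(pixels),
        "dark_fraction": sum(p < DARK_CUTOFF for p in pixels) / len(pixels),
    }

def predict(rule, feats):
    """Apply a rule (feature, direction, threshold, accuracy); 1 means infected."""
    name, direction, threshold, _ = rule
    value = feats[name]
    return int(value > threshold) if direction == ">" else int(value <= threshold)

def fit_stump(samples, labels):
    """Choose the single feature/threshold rule with the best training accuracy."""
    best = None
    for name in samples[0]:
        for threshold in sorted({s[name] for s in samples}):
            for direction in (">", "<="):
                rule = (name, direction, threshold, None)
                hits = sum(predict(rule, s) == y for s, y in zip(samples, labels))
                acc = hits / len(labels)
                if best is None or acc > best[3]:
                    best = (name, direction, threshold, acc)
    return best

# Toy cells: uninfected cells are uniformly bright; infected ones contain dark spots
cells = [
    [[0.9, 0.9], [0.9, 0.9]],
    [[0.8, 0.9], [0.9, 0.8]],
    [[0.9, 0.9], [0.9, 0.1]],
    [[0.2, 0.9], [0.9, 0.1]],
]
labels = [0, 0, 1, 1]
samples = [aggregated_features(c) for c in cells]
rule = fit_stump(samples, labels)
print(rule)
```

The learned rule is a single comparison on a named feature, so the reason for each prediction can be stated in one sentence, which is the kind of interpretability the abstract argues for.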
Study of Different Deep Learning Approach with Explainable AI for Screening Patients with COVID-19 Symptoms: Using CT Scan and Chest X-ray Image Dataset
The outbreak of COVID-19 has caused more than 100,000 deaths so far in the USA
alone. Initial screening of patients with COVID-19 symptoms is necessary to
control the spread of the disease. However, conducting tests with the available
testing kits is becoming laborious because of the growing number of patients.
Some studies have proposed CT scan or chest X-ray images as an alternative.
It is therefore essential to use every available resource, rather than only
CT scans or only chest X-rays, to conduct a large number of tests
simultaneously. Accordingly, this study aims to develop a deep learning-based
model that can detect COVID-19 patients with better accuracy on both CT scan
and chest X-ray image datasets. In this work, eight deep learning approaches,
VGG16, InceptionResNetV2, ResNet50, DenseNet201, VGG19, MobileNetV2,
NASNetMobile, and ResNet152V2, have been tested on two datasets: one of 400 CT
scan images and another of 400 chest X-ray images. In addition, Local
Interpretable Model-agnostic Explanations (LIME) is used to explain the
models' predictions. Test results with LIME demonstrate that the top features
driving each prediction can be interpreted, which helps build a trustworthy AI
framework for distinguishing patients with COVID-19 symptoms from other
patients.
Comment: This is a work in progress; it should not be relied upon without
context to guide clinical practice or health-related behavior and should not
be reported in news media as established information without consulting
multiple experts in the field
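The idea behind LIME can be sketched without any deep learning library: switch interpretable components of an input (for images, superpixels) on and off at random, query the black-box model on each perturbed input, weight each sample by its closeness to the original, and fit a weighted linear surrogate whose coefficients rank the components. The toy black box, the four "regions", the distance (fraction of regions switched off), the exponential kernel width, and the small ridge term are assumptions for illustration; this is neither the `lime` package's API nor the study's code.

```python
import math
import random

def solve(a, b):
    """Solve a @ x = b with Gauss-Jordan elimination and partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def lime_weights(predict, n_regions, n_samples=500, kernel_width=0.75, ridge=1e-3, seed=0):
    """Fit a locally weighted linear surrogate; returns one weight per region + intercept."""
    rng = random.Random(seed)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        mask = [rng.randint(0, 1) for _ in range(n_regions)]  # 1 = region kept
        distance = 1 - sum(mask) / n_regions  # 0 for the unperturbed image
        rows.append(mask + [1])               # trailing 1 is the intercept column
        targets.append(predict(mask))
        weights.append(math.exp(-(distance ** 2) / kernel_width ** 2))
    d = n_regions + 1
    # Weighted ridge normal equations: (X^T W X + ridge*I) beta = X^T W y,
    # with no penalty on the intercept.
    xtwx = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights))
             + (ridge if i == j < n_regions else 0.0) for j in range(d)] for i in range(d)]
    xtwy = [sum(w * r[i] * y for r, w, y in zip(rows, weights, targets)) for i in range(d)]
    return solve(xtwx, xtwy)

def toy_black_box(mask):
    """Hypothetical classifier score that relies mostly on region 0, partly on region 2."""
    return 0.05 + 0.6 * mask[0] + 0.3 * mask[2]

coefs = lime_weights(toy_black_box, n_regions=4)
print([round(c, 3) for c in coefs[:-1]])
```

Because the toy black box is linear, the surrogate recovers its weights; with a real CNN, the regions would be superpixels of the CT or X-ray image and `predict` would return the model's score for the masked image.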
Case Study-Based Approach of Quantum Machine Learning in Cybersecurity: Quantum Support Vector Machine for Malware Classification and Protection
Quantum machine learning (QML) is an emerging field of research that
leverages quantum computing to improve classical machine learning approaches
to complex real-world problems. QML has the potential to address
cybersecurity-related challenges. Given the novelty and complex architecture
of QML, dedicated resources that can guide cybersecurity learners in building
effective knowledge of this emerging technology are not yet available. In this
research, we design and develop ten QML-based learning modules covering
various cybersecurity topics by adopting a student-centered, case-study-based
learning approach. We apply one QML subtopic to a cybersecurity topic through
pre-lab, lab, and post-lab activities, providing learners with hands-on QML
experience in solving real-world security problems. To engage and motivate
students in a learning environment that encourages all students to learn, the
pre-lab offers a brief introduction to both the QML subtopic and the
cybersecurity problem. In this paper, we use a quantum support vector machine
(QSVM) for malware classification and protection, with the open-source
PennyLane QML framework and the drebin215 dataset. We demonstrate our QSVM
model, which achieves an accuracy of 95% in malware classification and
protection. We will develop all the modules and introduce them to the
cybersecurity community in the coming days.
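A QSVM replaces the classical kernel with the overlap between quantum states that encode the inputs. For a hypothetical angle-encoding feature map that applies RY(x_i) to qubit i of |0...0>, the state is a product state, so it can be simulated classically for a sketch, and the fidelity kernel reduces to the product of cos^2((x_i - x'_i)/2). The feature map, the two-feature toy data, and the kernel perceptron (used here as a lightweight stand-in for an SVM's quadratic-program solver) are assumptions; the code does not use PennyLane and is not the study's drebin215 pipeline.

```python
import math

def statevector(x):
    """Amplitudes of RY(x_0) (x) ... (x) RY(x_{n-1}) applied to |0...0> (all real)."""
    state = [1.0]
    for xi in x:
        c, s = math.cos(xi / 2), math.sin(xi / 2)
        state = [a * b for a in state for b in (c, s)]  # tensor product
    return state

def quantum_kernel(x, y):
    """Fidelity kernel |<psi(x)|psi(y)>|^2 from simulated statevectors."""
    overlap = sum(a * b for a, b in zip(statevector(x), statevector(y)))
    return overlap ** 2

def train_kernel_perceptron(xs, labels, epochs=20):
    """Dual perceptron on the precomputed quantum Gram matrix (labels are +1/-1)."""
    alphas = [0] * len(xs)
    gram = [[quantum_kernel(a, b) for b in xs] for a in xs]
    for _ in range(epochs):
        mistakes = 0
        for i, yi in enumerate(labels):
            score = sum(al * yj * gram[j][i] for j, (al, yj) in enumerate(zip(alphas, labels)))
            if yi * score <= 0:
                alphas[i] += 1
                mistakes += 1
        if mistakes == 0:
            break
    return alphas

def classify(x, xs, labels, alphas):
    score = sum(a * y * quantum_kernel(xj, x) for a, y, xj in zip(alphas, labels, xs))
    return 1 if score > 0 else -1

# Toy two-feature data (angles in radians): class +1 near (0, 0), class -1 near (pi, pi)
xs = [[0.1, 0.2], [0.2, 0.0], [3.0, 2.9], [2.8, 3.1]]
labels = [1, 1, -1, -1]
alphas = train_kernel_perceptron(xs, labels)
print(classify([0.15, 0.1], xs, labels, alphas))
```

Any kernel method can consume the same Gram matrix, so swapping the perceptron for a proper SVM solver would leave `quantum_kernel` unchanged.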
Effect of Data Scaling Methods on Machine Learning Algorithms and Model Performance
Heart disease, one of the main causes of the high mortality rate around the world, requires a sophisticated and expensive diagnosis process. In the recent past, much literature has presented machine learning approaches as an opportunity to diagnose heart disease patients efficiently. However, dataset challenges such as missing data, inconsistent data, and mixed data (numerical and categorical values with inconsistent missing entries) are often obstacles in medical diagnosis. Such inconsistency leads to a higher probability of misprediction and misleading results. Data preprocessing steps like feature reduction, data conversion, and data scaling are employed to form a standard dataset; such measures play a crucial role in reducing inaccuracy in the final prediction. This paper evaluates eleven machine learning (ML) algorithms and six data scaling methods on a dataset of patients with heart disease. The algorithms are Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Naive Bayes (NB), Support Vector Machine (SVM), XGBoost (XGB), Random Forest Classifier (RF), Gradient Boost (GB), AdaBoost (AB), and Extra Tree Classifier (ET); the scaling methods are Normalization (NR), Standard Scaler (SS), MinMax (MM), MaxAbs (MA), Robust Scaler (RS), and Quantile Transformer (QT). The results show that CART, combined with RS or QT, outperforms all other ML algorithms, with 100% accuracy, 100% precision, 99% recall, and 100% F1 score. The study outcomes demonstrate that the model's performance varies with the data scaling method. Open Access fees paid for in whole or in part by the University of Oklahoma Libraries.
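The six scaling methods differ in how each feature is shifted and divided. The stdlib sketch below restates their common textbook definitions, assuming NR means per-sample L2 normalization (as in scikit-learn's `Normalizer`) and SS means standardization to zero mean and unit variance. The function names are illustrative, and library implementations differ in details; scikit-learn's `QuantileTransformer`, for instance, interpolates against a fitted grid of quantiles rather than using raw ranks.

```python
import math
from statistics import mean, median, pstdev, quantiles

# Column-wise scalers: each takes one feature column (a list of numbers).
# A zero denominator (constant column) falls back to 1.0.
def min_max(col):
    lo, hi = min(col), max(col)
    return [(v - lo) / ((hi - lo) or 1.0) for v in col]

def standard(col):
    mu, sigma = mean(col), pstdev(col)  # population standard deviation
    return [(v - mu) / (sigma or 1.0) for v in col]

def max_abs(col):
    m = max(abs(v) for v in col)
    return [v / (m or 1.0) for v in col]

def robust(col):
    q1, _, q3 = quantiles(col, n=4, method="inclusive")
    med = median(col)
    return [(v - med) / ((q3 - q1) or 1.0) for v in col]

def quantile_uniform(col):
    """Map each value to its rank position in [0, 1]; ties share the mean rank."""
    ordered = sorted(col)
    n = len(col)

    def position(v):
        first = ordered.index(v)
        last = n - 1 - ordered[::-1].index(v)
        return (first + last) / 2 / max(n - 1, 1)

    return [position(v) for v in col]

# Row-wise scaler: each sample (row) is rescaled to unit Euclidean length.
def l2_normalize_rows(rows):
    out = []
    for row in rows:
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        out.append([v / norm for v in row])
    return out

col = [1, 2, 3, 4, 100]
print(min_max(col))
print(robust(col))
```

With an outlier of 100 in the column, min-max scaling squeezes the four ordinary values into roughly [0, 0.03], whereas robust scaling keeps them spread from -1 to 0.5, one concrete way the choice of scaler changes what a model sees.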
Introduction of Medical Imaging Modalities
The diagnosis and treatment of various diseases have been expedited with the
help of medical imaging. This chapter presents different medical imaging
modalities, including X-ray, Computed Tomography (CT), Magnetic Resonance
Imaging (MRI), Nuclear Imaging, Ultrasound, Electrical Impedance Tomography
(EIT), and emerging technologies for in vivo imaging, as well as some advanced
techniques such as contrast-enhanced MRI, MR approaches for osteoarthritis,
cardiovascular imaging, and medical imaging data mining and search. Despite
the important role and potential effectiveness of medical images as a
diagnostic tool, reading and interpreting them is often tedious and difficult
for radiologists because of the large heterogeneity of diseases and the
limitations of image quality or resolution. Besides introducing and
discussing the basic principles, typical clinical applications, advantages,
and limitations of each modality used in current clinical practice, this
chapter also highlights the importance of emerging technologies in medical
imaging and the role of data mining and search in supporting translational
clinical research, improving patient care, and increasing the efficiency of
the healthcare system.
Comment: 19 pages, 7 figures, 1 table; Acceptance of the chapter for the
Springer book "Data-driven approaches to medical imaging"
Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh]
Background: Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence-based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers developed using deep transcriptome sequencing, and their application in genetic diversity analysis and mapping.
Results: In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem, and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified, for which 2,877 PCR primer pairs were designed for marker development. Dinucleotide repeats were the most common motif (60.41%), followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%), and pentanucleotide (0.76%) repeats. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties, and 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was performed on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4 to 10, and the polymorphism information content (PIC) values ranged from 0.46 to 0.72. A neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines enabled in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population.
Conclusion: We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationships among species of the genus Cajanus. This comprehensive set of genic-SSR markers is an important genomic resource for diversity analysis and genetic mapping in pigeonpea.
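Two computations in this pipeline are simple enough to sketch: scanning sequences for perfect microsatellites and summarizing marker informativeness. The code below is a minimal illustration, not the SSR-mining tool the study used. It assumes perfect (uninterrupted) repeats of 2 to 6 bp motifs, skips homopolymers as the abstract does (compound repeats are not handled), uses the 18 bp threshold the abstract applies to the loci chosen for primer testing, and computes PIC with the widely used simplified form 1 - sum(p_i^2); the abstract does not say which PIC formula was used, and Botstein's original form includes an extra correction term.

```python
def find_ssrs(seq, min_len=18, motif_sizes=range(2, 7)):
    """Return (start, motif, copies) for perfect repeats spanning >= min_len bp."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None
        for k in motif_sizes:
            motif = seq[i:i + k]
            if len(motif) < k or len(set(motif)) == 1:
                continue  # too close to the end, or a homopolymer such as "AAA"
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == motif:
                copies += 1
            span = copies * k
            if span >= min_len and (best is None or span > best[2] * len(best[1])):
                best = (i, motif, copies)  # ties keep the shorter motif
        if best:
            hits.append(best)
            i += best[2] * len(best[1])  # continue after the repeat
        else:
            i += 1
    return hits

def pic(allele_counts):
    """Simplified polymorphism information content: 1 - sum(p_i^2)."""
    total = sum(allele_counts)
    return 1 - sum((c / total) ** 2 for c in allele_counts)

seq = "GG" + "AT" * 10 + "CCGT" + "CAG" * 6 + "T"
print(find_ssrs(seq))
print(pic([1, 1, 1, 1]))
```

A marker with four equally frequent alleles scores 0.75 under this formula; values in the 0.46 to 0.72 range reported above correspond to loci with several alleles of unequal frequency.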
Harmful and beneficial aspects of Parthenium hysterophorus: an update
Parthenium hysterophorus is a noxious weed in America, Asia, Africa and Australia. This weed is considered a cause of allergic respiratory problems, contact dermatitis, and mutagenicity in humans and livestock. Crop production is drastically reduced owing to its allelopathy, and its aggressive dominance also threatens biodiversity. Eradication of P. hysterophorus by burning, chemical herbicides, and eucalyptus oil, as well as biological control by leaf-feeding beetles, stem-galling moths, stem-boring weevils, and fungi, has been carried out with variable degrees of success. Recently, many innovative uses of this hitherto notorious plant have been discovered. Parthenium hysterophorus confers many health benefits, viz. remedies for skin inflammation, rheumatic pain, diarrhoea, urinary tract infections, dysentery, malaria, and neuralgia. Its prospects as a nanomedicine are being explored, with some preliminary success so far. Other potential uses include removal of heavy metals and dyes from the environment, eradication of aquatic weeds, use as a substrate for commercial enzyme production, as an additive in cattle manure for biogas production, as a biopesticide, and as green manure and compost. The active compounds responsible for its hazardous properties have been summarized. The aim of this review article is to explore the problem P. hysterophorus poses as a weed and the effective control measures that can be implemented, and to unravel the latent beneficial prospects of this weed.
Robust filtering schemes for machine learning systems to defend Adversarial Attacks
Defenses against adversarial attacks are essential to ensure the reliability of machine learning models as their applications expand into different domains. Existing ML defense techniques have several limitations in practical use. I proposed a trustworthy framework that employs an adaptive strategy to inspect both inputs and decisions. In particular, data streams are examined by a series of diverse filters before being sent to the learning system, and the system's output is then cross-checked through a diverse set of filters before the final decision is made. My experimental results illustrated that the proposed active learning-based defense strategy could mitigate adaptive or advanced adversarial manipulations, both at the input and after the model's decision, for a wide range of ML attacks with higher accuracy. Moreover, inspecting the output decision boundary with a classification technique automatically reaffirms the reliability and increases the trustworthiness of any ML-based decision support system. Unlike other defense strategies, my defense technique does not require adversarial sample generation, and updating the decision boundary for detection makes the defense system robust to traditional adaptive attacks.
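The input-and-output inspection loop described above can be sketched as a small wrapper. Everything here is hypothetical: filters are plain callables that return True when they flag an input (or a model decision) as suspicious, a request is rejected once `votes_needed` filters in a bank agree, and the toy model and filters are placeholders rather than the thesis's actual filters or its active learning component.

```python
def flagged(filters, args, votes_needed):
    """True when at least `votes_needed` filters flag the given arguments."""
    return sum(1 for f in filters if f(*args)) >= votes_needed

def guarded_predict(x, model, input_filters, output_filters, votes_needed=1):
    """Inspect the input, run the model, then cross-check its decision."""
    if flagged(input_filters, (x,), votes_needed):
        return "rejected-input", None
    label, confidence = model(x)
    if flagged(output_filters, (model, x, label, confidence), votes_needed):
        return "rejected-output", label
    return "accepted", label

# --- Hypothetical toy model and filters ---
def toy_model(x):
    m = sum(x) / len(x)
    return int(m > 0.5), min(1.0, abs(m - 0.5) * 2)

def out_of_range(x):
    return any(v < 0 or v > 1 for v in x)

def too_noisy(x, limit=0.3):
    # Mean absolute difference between neighbouring values
    return sum(abs(a - b) for a, b in zip(x, x[1:])) / max(len(x) - 1, 1) > limit

def many_extremes(x):
    return sum(v in (0.0, 1.0) for v in x) > len(x) // 2

def low_confidence(model, x, label, confidence, floor=0.2):
    return confidence < floor

def unstable_under_smoothing(model, x, label, confidence):
    # Cross-check: does the decision survive a mild smoothing of the input?
    smoothed = [(a + b) / 2 for a, b in zip(x, x[1:])]
    return model(smoothed)[0] != label

INPUT_FILTERS = [out_of_range, too_noisy, many_extremes]
OUTPUT_FILTERS = [low_confidence, unstable_under_smoothing]

print(guarded_predict([0.8, 0.82, 0.79, 0.81], toy_model, INPUT_FILTERS, OUTPUT_FILTERS))
```

Raising `votes_needed` trades fewer false rejections for weaker protection; keeping both filter banks diverse is what lets a single evasive perturbation fail at least one check.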