Op2Vec: An Opcode Embedding Technique and Dataset Design for End-to-End Detection of Android Malware
Android is one of the leading operating systems for smartphones in terms of
market share and usage. Unfortunately, it is also an appealing target for
attackers seeking to compromise its security through malicious applications. To
tackle this issue, domain experts and researchers are trying different
techniques to stop such attacks. All attempts to secure the Android platform
have been somewhat successful. However, existing detection techniques have
severe shortcomings, including the cumbersome process of feature engineering.
Designing representative features requires expert domain knowledge, so there is
a need to minimize human experts' intervention by circumventing handcrafted
feature engineering. Deep learning can be exploited to extract deep features
automatically. Previous work has shown that the operational codes (opcodes) of
executables provide key information that deep learning models can use to
detect malicious applications. The remaining challenge is how to feed opcode
information to deep learning models. Existing techniques use one-hot encoding
to tackle this challenge; however, the one-hot encoding scheme has severe
limitations. In this paper, we introduce (1) a novel technique for opcode
embedding, which we name Op2Vec, and (2) a dataset, developed from the learned
Op2Vec, for end-to-end detection of Android malware. The end-to-end Android
malware detection technique avoids expert-intensive handcrafted feature
extraction and ensures automation. Some of the recent deep learning-based
techniques showed significantly improved results when tested with the proposed
approach, achieving an average detection accuracy of 97.47%, a precision of
0.976 and an F1 score of 0.979.
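The difference between one-hot encoding and an opcode embedding can be sketched in a few lines of Python. This is a minimal illustration, not Op2Vec itself: the opcode vocabulary, the 2-dimensional embedding table and its values are hypothetical, and this abstract does not describe how Op2Vec actually learns its vectors. The sketch shows that a one-hot vector is as long as the vocabulary and orthogonal to every other opcode's vector, whereas an embedding lookup returns a short dense row, and that the lookup equals multiplying the one-hot matrix by the embedding table.

```python
# Toy opcode vocabulary; these Dalvik mnemonics are illustrative assumptions.
VOCAB = ["invoke-virtual", "move-result", "const-string", "return-void"]
INDEX = {op: i for i, op in enumerate(VOCAB)}


def one_hot(seq):
    """Encode each opcode as a sparse |V|-dimensional 0/1 vector."""
    rows = []
    for op in seq:
        row = [0.0] * len(VOCAB)
        row[INDEX[op]] = 1.0
        rows.append(row)
    return rows


def embed(seq, table):
    """Look up the dense d-dimensional row of a |V| x d table for each opcode."""
    return [list(table[INDEX[op]]) for op in seq]


def matmul(a, b):
    """Plain-Python matrix product, used to show one-hot @ table == lookup."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def dot(u, v):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


# Hypothetical 4 x 2 embedding table; in practice its values would be learned.
TABLE = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
seq = ["const-string", "invoke-virtual", "move-result"]

oh = one_hot(seq)
print(matmul(oh, TABLE) == embed(seq, TABLE))  # lookup equals the one-hot product
print(dot(oh[0], oh[1]))  # distinct one-hot vectors share no similarity (0.0)
```

With a realistic vocabulary of a few hundred opcodes, each one-hot vector grows to that length, while the embedding width stays a free design choice.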
Deep Learning Based Classification of Unsegmented Phonocardiogram Spectrograms Leveraging Transfer Learning
Cardiovascular diseases (CVDs) are the main cause of death all over the
world. Heart murmurs are the most common abnormalities detected during the
auscultation process. The two widely used publicly available phonocardiogram
(PCG) datasets are from the PhysioNet/CinC (2016) and PASCAL (2011) challenges.
The datasets differ significantly in terms of the tools used for data
acquisition, clinical protocols, digital storage and signal quality, which makes
them challenging to process and analyze. In this work, we have used short-time
Fourier transform (STFT) based spectrograms to learn the representative
patterns of normal and abnormal PCG signals. Spectrograms generated from both
datasets are used to perform three different studies: (i) train, validate and
test different variants of convolutional neural network (CNN) models with the
PhysioNet dataset, (ii) train, validate and test the best performing CNN
structure on the combined PhysioNet-PASCAL dataset and (iii) finally, employ
transfer learning to train the best performing pre-trained network from the
first study with the PASCAL dataset. We propose a novel, less complex and
relatively light custom CNN model for the classification of the PhysioNet,
combined and PASCAL datasets. The first study achieves an accuracy,
sensitivity, specificity, precision and F1 score of 95.4%, 96.3%, 92.4%, 97.6%
and 96.98% respectively, while the second study shows an accuracy,
sensitivity, specificity, precision and F1 score of 94.2%, 95.5%, 90.3%, 96.8%
and 96.1% respectively. Finally, the third study shows a precision of 98.29% on
the noisy PASCAL dataset with the transfer learning approach. All three
proposed approaches outperform most of the recent competing studies by
achieving comparatively high classification accuracy and precision, which makes
them suitable for screening CVDs using PCG signals.
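An STFT magnitude spectrogram of the kind described here can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions: the Hann window, frame length, hop size, the 1000 Hz sampling rate and the synthetic tone are illustrative choices, not the settings used in the study, and a naive DFT stands in for an FFT to keep the code self-contained.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=256, hop=128):
    """Magnitude STFT: one list of |X[k]|, k = 0..frame_len//2, per frame.

    Uses a naive O(frame_len^2) DFT for clarity; a real pipeline would use an FFT.
    """
    win = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * win[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def to_db(frames, eps=1e-10):
    """Log-compress magnitudes; eps avoids log10(0) in silent bins."""
    return [[20 * math.log10(m + eps) for m in frame] for frame in frames]


# Synthetic 125 Hz tone at a hypothetical 1000 Hz sampling rate (not PCG data).
fs = 1000
tone = [math.sin(2 * math.pi * 125 * t / fs) for t in range(fs)]
spec = stft_magnitude(tone, frame_len=64, hop=32)
peak_bin = max(range(len(spec[0])), key=spec[0].__getitem__)
print(len(spec), len(spec[0]), peak_bin * fs / 64)  # frames, bins, peak in Hz
```

For a pure 125 Hz tone the peak lands in the 125 Hz bin of every frame. Stacking the dB-scaled frames over time gives the 2D time-frequency image that a CNN would typically take as input.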
ImageCAS: A Large-Scale Dataset and Benchmark for Coronary Artery Segmentation based on Computed Tomography Angiography Images
Cardiovascular disease (CVD) accounts for about half of non-communicable
diseases. Vessel stenosis in the coronary artery is considered to be the major
risk factor for CVD. Computed tomography angiography (CTA) is one of the most
widely used noninvasive imaging modalities in coronary artery diagnosis due to
its superior image resolution. Clinically, segmentation of coronary arteries is
essential for the diagnosis and quantification of coronary artery disease.
Recently, a variety of works have been proposed to address this problem.
However, on the one hand, most works rely on in-house datasets, and the few
works that have published their datasets contain only tens of images. On the
other hand, their source code has not been published, and most follow-up works
have not been compared with existing works, which makes it difficult to judge
the effectiveness of the methods and hinders further exploration of this
challenging yet critical problem in the community. In this paper, we propose a
large-scale dataset for coronary artery segmentation on CTA images. In
addition, we have built a benchmark in which we have implemented several
typical existing methods as faithfully as possible. Furthermore, we propose a
strong baseline method that combines multi-scale patch fusion and two-stage
processing to extract the details of vessels. Comprehensive experiments show
that the proposed method achieves better performance than existing works on the
proposed large-scale dataset. The benchmark and the dataset are published at
https://github.com/XiaoweiXu/ImageCAS-A-Large-Scale-Dataset-and-Benchmark-for-Coronary-Artery-Segmentation-based-on-CT.
Comment: 17 pages, 12 figures, 4 tables
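Overlapping patch inference, one common way to fuse patch-level predictions into a full-size segmentation map, can be sketched as follows. This is a generic 2D illustration and not the ImageCAS baseline: the averaging rule, the half-patch stride, the patch sizes and the thresholding `toy_predict` stand-in for a network are all hypothetical. CTA volumes would be processed in 3D, and the abstract's two-stage processing is not modeled here.

```python
def patch_starts(length, patch, stride):
    """Start offsets tiling [0, length); the last patch is shifted flush to the edge.

    Assumes stride <= patch so that every position is covered at least once.
    """
    if patch >= length:
        return [0]
    starts = list(range(0, length - patch + 1, stride))
    if starts[-1] != length - patch:
        starts.append(length - patch)
    return starts


def fuse_patches(image, patch, stride, predict):
    """Run `predict` on overlapping square patches and average the overlaps."""
    h, w = len(image), len(image[0])
    ph, pw = min(patch, h), min(patch, w)
    acc = [[0.0] * w for _ in range(h)]
    cnt = [[0] * w for _ in range(h)]
    for y in patch_starts(h, patch, stride):
        for x in patch_starts(w, patch, stride):
            crop = [row[x:x + pw] for row in image[y:y + ph]]
            pred = predict(crop)  # same shape as crop, per-pixel probabilities
            for i in range(ph):
                for j in range(pw):
                    acc[y + i][x + j] += pred[i][j]
                    cnt[y + i][x + j] += 1
    return [[acc[i][j] / cnt[i][j] for j in range(w)] for i in range(h)]


def multiscale_fuse(image, patch_sizes, predict):
    """Average the fused maps from several patch sizes (stride = half a patch)."""
    maps = [fuse_patches(image, p, max(1, p // 2), predict) for p in patch_sizes]
    h, w = len(image), len(image[0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(w)]
            for i in range(h)]


def toy_predict(crop):
    """Hypothetical stand-in for a segmentation network: intensity threshold."""
    return [[1.0 if v > 0.5 else 0.0 for v in row] for row in crop]
```

Averaging overlapping predictions smooths seams at patch borders; with a real network, larger patches see more context while smaller ones preserve fine vessel detail, which is one motivation for fusing several scales.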
Deep learning based classification of unsegmented phonocardiogram spectrograms leveraging transfer learning
Objective. Cardiovascular diseases (CVDs) are a leading cause of death all over the world. This research focuses on computer-aided analysis of phonocardiogram (PCG) signals based on deep learning, which can enable improved and timely detection of heart abnormalities. The two widely used publicly available PCG datasets are from the PhysioNet/CinC (2016) and PASCAL (2011) challenges. The datasets differ significantly in terms of the tools used for data acquisition, clinical protocols, digital storage and signal quality, which makes them challenging to process and analyze.
Approach. In this work, we have used short-time Fourier transform-based spectrograms to learn the representative patterns of normal and abnormal PCG signals. Spectrograms generated from both datasets are used to perform four different studies: (i) train, validate and test different variants of convolutional neural network (CNN) models with the PhysioNet dataset, (ii) train, validate and test the best performing CNN structure on the PASCAL dataset, (iii) do the same on the combined PhysioNet-PASCAL dataset and (iv) finally, employ transfer learning to train the best performing pre-trained network from the first study with the PASCAL dataset.
Main results. The first study achieves an accuracy, sensitivity, specificity, precision and F1 score of 95.75%, 96.3%, 94.1%, 97.52% and 96.93%, respectively, while the second study shows an accuracy, sensitivity, specificity, precision and F1 score of 75.25%, 74.2%, 76.4%, 76.73% and 75.42%, respectively. The third study shows an accuracy, sensitivity, specificity, precision and F1 score of 92.7%, 94.98%, 89.95%, 95.3% and 94.6%, respectively. Finally, the fourth study shows a precision of 96.98% on the noisy PASCAL dataset with the transfer learning approach.
Significance. The proposed approach employs a less complex and relatively light custom CNN model that outperforms most of the recent competing studies by achieving comparatively high classification accuracy and precision, making it suitable for screening CVDs using PCG signals.
2021 Institute of Physics and Engineering in Medicine.
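The metrics reported in these abstracts all follow from binary confusion-matrix counts, as sketched here. Treating abnormal recordings as the positive class is an assumption for illustration, and the counts in the usage lines are hypothetical, not taken from the study.

```python
import math


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def binary_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts.

    The positive class is assumed to be "abnormal" (illustrative choice).
    """
    sensitivity = tp / (tp + fn)  # recall, true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)  # positive predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1_score(precision, sensitivity),
    }


m = binary_metrics(tp=90, fp=10, tn=80, fn=20)  # hypothetical counts
print({name: round(value, 4) for name, value in m.items()})
# Cross-check: F1 from the first study's reported precision and sensitivity.
print(round(100 * f1_score(0.9752, 0.963), 1))
```

As a consistency check, the harmonic mean of the first study's precision (97.52%) and sensitivity (96.3%) is about 96.9%, close to the reported F1 score of 96.93%.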