62 research outputs found

DeepProteomics: Protein family classification using Shallow and Deep Networks

Authors: KP Soman, R Vinayakumar, Vazhayil Anu
Publication date: 11/09/2018

Knowledge of protein function is necessary because it gives a clear picture of biological processes. However, many protein sequences have been found and added to databases but lack functional annotation. Laboratory experiments take a considerable amount of time to annotate the sequences. This gives rise to the need for computational techniques that classify proteins based on their functions. In our work, we collected data from Swiss-Prot containing 40433 proteins grouped into 30 families. We pass it to recurrent neural network (RNN), long short-term memory (LSTM) and gated recurrent unit (GRU) models and compare them with a deep neural network and a shallow neural network applied to trigram representations of the same dataset. Through this approach, we achieved a maximum accuracy of around 78% for the classification of protein families.

arXiv.org e-Print Archive
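
The trigram representation mentioned in the DeepProteomics abstract can be illustrated with a minimal sketch. The abstract does not say how the trigrams are built, so overlapping 3-residue windows and raw counts are assumptions here, and the function names and toy sequences are hypothetical.

```python
from collections import Counter


def trigram_counts(sequence: str) -> Counter:
    """Count overlapping 3-residue windows in an amino-acid sequence."""
    seq = sequence.upper()
    # Sequences shorter than 3 residues yield an empty Counter.
    return Counter(seq[i:i + 3] for i in range(len(seq) - 2))


def vectorize(sequence: str, vocabulary: list) -> list:
    """Map a sequence to a fixed-length count vector over a trigram vocabulary."""
    counts = trigram_counts(sequence)
    return [counts.get(trigram, 0) for trigram in vocabulary]


# Toy sequences (hypothetical, not taken from Swiss-Prot).
toy_sequences = ["MKVLAAGMKV", "GGSMKVGGS"]

# The vocabulary is every trigram seen in the toy data, in sorted order.
vocabulary = sorted(set().union(*(trigram_counts(s) for s in toy_sequences)))
features = [vectorize(s, vocabulary) for s in toy_sequences]
```

Each row of `features` could then be fed to a shallow or deep feed-forward classifier, while the recurrent models named in the abstract would typically consume the residue sequence directly.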

DeepImageSpam: Deep Learning based Image Spam Detection

Authors: KP Soman, Kumar Amara Dinesh, R Vinayakumar
Publication date: 03/10/2018

Hackers and spammers employ innovative techniques to deceive novice and even knowledgeable internet users. Image spam is one such technique, in which the spammer alters some portion of an image so that it is indistinguishable from the original, fooling users. This paper proposes a deep learning based approach for image spam detection using convolutional neural networks. Using a dataset of 810 natural images and 928 spam images for classification, it achieves an accuracy of 91.7%, outperforming the existing image processing and machine learning techniques.
Comment: 4 pages

arXiv.org e-Print Archive

A Compendium on Network and Host based Intrusion Detection Systems

Authors: K Rahul-Vigneswaran, KP Soman, Poornachandran Prabaharan
Publication date: 06/04/2019

Deep learning techniques have become the state-of-the-art methodology for complicated tasks in computer vision, natural language processing, and several other areas. Because of their rapid development and promising benchmarks in those fields, researchers have started applying these techniques to other areas, especially intrusion detection related tasks. Deep learning is a subset and a natural extension of classical machine learning and an evolved model of neural networks. This paper surveys and discusses the methodologies related to leading-edge deep learning and neural network models as applied to Intrusion Detection Systems.
Comment: 8 pages, Accepted for ICDSMLA 201

arXiv.org e-Print Archive

Vector Space Model as Cognitive Space for Text Classification

Authors: HB Barathi Ganesh, KP Soman, M Anand Kumar
Publication date: 20/08/2017

In this era of digitization, knowing a user's sociolect aspects has become essential for building user-specific recommendation systems. These sociolect aspects can be found by mining the language a user shares as text in social media and reviews. This paper describes the experiment performed in the PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of users from their tweets. The sociolect aspects considered in this experiment are the user's gender and native language. Here, users' tweets written in a language different from their native language are represented as a Document-Term Matrix with document frequency as the constraint. Classification is then done using a Support Vector Machine, taking gender and native language as target classes. This experiment attains an average accuracy of 73.42% in gender prediction and 76.26% in the native language identification task.
Comment: 6 pages, 6 figures, 3 tables

arXiv.org e-Print Archive
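
The Document-Term Matrix with a document-frequency constraint described in the abstract above can be sketched as follows. The abstract gives no threshold or tokenizer, so the minimum document frequency `min_df`, whitespace tokenization, and the toy tweets are all assumptions for illustration.

```python
from collections import Counter


def build_dtm(documents, min_df=2):
    """Build a document-term count matrix, keeping only terms that
    appear in at least `min_df` documents (assumed constraint)."""
    tokenized = [doc.lower().split() for doc in documents]

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vocabulary = sorted(t for t, n in doc_freq.items() if n >= min_df)

    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        matrix.append([counts[term] for term in vocabulary])
    return vocabulary, matrix


# Toy tweets (hypothetical, not from the PAN 2017 data).
tweets = ["good morning world", "good night world", "hello there"]
vocabulary, matrix = build_dtm(tweets, min_df=2)
```

The rows of `matrix` would then serve as input features for a classifier such as the Support Vector Machine named in the abstract, with gender or native language as the target labels.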

A short review on Applications of Deep learning for Cyber security

Authors: KP Soman, R Mohammed Harun Babu, R Vinayakumar
Publication date: 29/01/2019

Deep learning is an advanced form of traditional machine learning. It has the capability to extract optimal feature representations from raw input samples. It has been applied to various use cases in cyber security, such as intrusion detection, malware classification, Android malware detection, spam and phishing detection, and binary analysis. This paper surveys the works related to deep learning based solutions for various cyber security use cases. Keywords: Deep learning, intrusion detection, malware detection, Android malware detection, spam & phishing detection, traffic analysis, binary analysis.
Comment: 15 pages

arXiv.org e-Print Archive

A Deep Learning Approach for Similar Languages, Varieties and Dialects

Authors: K Vidya Prasad, KP Soman, R Vinayakumar, S Akarsh
Publication date: 02/01/2019

Deep learning mechanisms are currently the prevailing approaches for various tasks in natural language processing, speech recognition, image processing and many others. To leverage this, we use deep learning based mechanisms, specifically Bidirectional Long Short-Term Memory (B-LSTM) for dialect identification in Arabic and German broadcast speech and Long Short-Term Memory (LSTM) for discriminating between similar languages. Two distinct B-LSTM models are created, one using Large-vocabulary Continuous Speech Recognition (LVCSR) based lexical features and the other using fixed-length (400 per utterance) bottleneck features generated by the i-vector framework. These models were evaluated on the VarDial 2017 datasets for Arabic and German dialect identification, with the dialects Egyptian, Gulf, Levantine, North African, and MSA for Arabic and Basel, Bern, Lucerne, and Zurich for German. The LSTM model was evaluated on the task of discriminating between similar languages such as Bosnian, Croatian and Serbian. The B-LSTM model showed an accuracy of 0.246 on lexical features and an accuracy of 0.577 on the bottleneck features of the i-vector framework.
Comment: 17 pages

arXiv.org e-Print Archive

A Brief Survey on Autonomous Vehicle Possible Attacks, Exploits and Vulnerabilities

Authors: Chebrolu Koti Naga Renu, KP Soman, Kumar Amara Dinesh, R Vinayakumar
Publication date: 03/10/2018

Advanced driver assistance systems are advancing at a rapid pace, and all major companies have started investing in the development of autonomous vehicles. However, their security and reliability are still uncertain and debatable. Consider what attackers could do if a vehicle were compromised: an attacker could control braking, acceleration and even steering, which could lead to catastrophic consequences. This paper gives a brief overview of most of the possible attacks on autonomous vehicle software and hardware and their potential implications.
Comment: 5 pages, 1 figure

arXiv.org e-Print Archive

Deep Learning Approach for Enhanced Cyber Threat Indicators in Twitter Stream

Authors: Balakrishna Prathiksha, K Simran, KP Soman, R Vinayakumar
Publication date: 30/03/2020

Recently, the amount of Cyber Security text data shared via social media resources, mainly Twitter, has increased. An accurate analysis of this data can help to develop a situational awareness framework for cyber threats. This work proposes a deep learning based approach for tweet data analysis. To convert the tweets into numerical representations, various text representations are employed. These features are fed into a deep learning architecture for optimal feature extraction as well as classification. Various hyperparameter tuning approaches are used to identify the optimal text representation method as well as the optimal network parameters and network structures for the deep learning models. For comparative analysis, a classical text representation method with a classical machine learning algorithm is employed. From the detailed analysis of experiments, we found that the deep learning architecture with advanced text representation methods performed better than the classical text representation and classical machine learning algorithms. The primary reason for this is that the advanced text representation methods can learn the sequential properties that exist in textual data, and the deep learning architectures learn the optimal features while decreasing the feature size.
Comment: 11 pages

arXiv.org e-Print Archive

Deep Learning Approach for Intelligent Named Entity Recognition of Cyber Security

Authors: K Simran, KP Soman, R Vinayakumar, S Sriram
Publication date: 30/03/2020

In recent years, the amount of Cyber Security data generated in the form of unstructured text, for example in social media resources, blogs, and articles, has increased exceptionally. Named Entity Recognition (NER) is an initial step towards converting this unstructured data into structured data that can be used by many applications. The existing methods for NER on Cyber Security data are based on rules and linguistic characteristics. A Deep Learning (DL) based approach embedded with Conditional Random Fields (CRFs) is proposed in this paper. Several DL architectures are evaluated to find the best-performing architecture. The combination of Bidirectional Gated Recurrent Unit (Bi-GRU), Convolutional Neural Network (CNN), and CRF performed better than various other DL frameworks on a publicly available benchmark dataset. This may be because the bidirectional structures preserve the features related to the future and previous words in a sequence.
Comment: 10 pages

arXiv.org e-Print Archive

Deep Learning based Frameworks for Handling Imbalance in DGA, Email, and URL Data Analysis

Authors: Balakrishna Prathiksha, K Simran, KP Soman, R Vinayakumar
Publication date: 30/03/2020

Deep learning is a state-of-the-art method for many applications. The main issue is that most real-time data is highly imbalanced in nature. To avoid bias in training, a cost-sensitive approach can be used. In this paper, we propose cost-sensitive deep learning based frameworks and evaluate their performance on three different Cyber Security use cases: Domain Generation Algorithm (DGA), Electronic mail (Email), and Uniform Resource Locator (URL). Various experiments were performed using cost-insensitive as well as cost-sensitive methods, and the parameters for both were set based on hyperparameter tuning. In all experiments, the cost-sensitive deep learning methods performed better than the cost-insensitive approaches. This is mainly because the cost-sensitive approach gives importance during training to the classes that have very few samples, which helps the model learn all the classes more effectively.
Comment: 12 pages

arXiv.org e-Print Archive
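
The idea of cost-sensitive training from the abstract above can be illustrated with a small sketch. The abstract does not state how the costs are chosen, so the inverse-frequency weighting below (weight = N / (K * n_c), a common heuristic) and the weighted cross-entropy are assumptions, not the authors' method. All names and the toy labels are hypothetical.

```python
import math
from collections import Counter


def class_weights(labels):
    """Inverse-frequency weights: rarer classes get larger weights.
    Assumed heuristic: weight_c = N / (K * n_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * n) for c, n in counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted average of per-sample negative log-likelihoods.
    `probs[i][c]` is the predicted probability of class c for sample i."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Toy imbalanced labels (hypothetical): nine benign samples, one malicious.
labels = [0] * 9 + [1]
weights = class_weights(labels)

# A model that predicts 50/50 for every sample, used only to exercise the loss.
uniform_probs = [[0.5, 0.5] for _ in labels]
loss = weighted_cross_entropy(uniform_probs, labels, weights)
```

With these weights, an error on the single minority sample contributes as much to the loss as errors on all nine majority samples together, which is the kind of emphasis on under-represented classes that the abstract describes.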