An intrusion detection system for packet and flow based networks using deep neural network approach
Research on deep neural networks and big data is now converging in several respects to enhance the capabilities of intrusion detection systems (IDS). Many IDS models have been introduced to provide security over big data. This study focuses on intrusion detection in computer networks using big datasets. The advent of big data has spurred comprehensive support for cyber security by providing a wealth of algorithms to classify and analyze patterns and to make better predictions more efficiently. In this study, a detection model based on deep neural networks is proposed to detect intrusions. We applied the proposed model to the latest data set available online, which contains packet-based data, flow-based data, and some additional metadata. The data set is labeled and imbalanced, with 79 attributes and some classes having far fewer training samples than others. The proposed model is built using the Keras and Google TensorFlow deep learning environment. Experimental results show that intrusions are detected with an accuracy of over 99% for both binary and multi-class classification using the selected best features. The average scores for the receiver operating characteristic (ROC) and precision-recall curves are also 1. The outcome implies that deep neural networks offer a novel research model with high accuracy for intrusion detection, better than some models presented in the literature.
Statistics in the Big Data era
It is estimated that about 90% of the currently available data have been produced over the last two years. Of these, only 0.5% is effectively analysed and used. However, this data can be a great wealth, the oil of 21st century, when analysed with the right approach. In this article, we illustrate some specificities of these data and the great interest that they can represent in many fields. Then we consider some challenges to statistical analysis that emerge from their analysis, suggesting some strategies
Distributed Correlation-Based Feature Selection in Spark
CFS (Correlation-Based Feature Selection) is an FS algorithm that has been
successfully applied to classification problems in many domains. We describe
Distributed CFS (DiCFS) as a completely redesigned, scalable, parallel and
distributed version of the CFS algorithm, capable of dealing with the large
volumes of data typical of big data applications. Two versions of the algorithm
were implemented and compared using the Apache Spark cluster computing model,
currently gaining popularity due to its much faster processing times than
Hadoop's MapReduce model. We tested our algorithms on four publicly available
datasets, each consisting of a large number of instances and two also
consisting of a large number of features. The results show that our algorithms
were superior in terms of both time-efficiency and scalability. In leveraging a
computer cluster, they were able to handle larger datasets than the
non-distributed WEKA version while maintaining the quality of the results,
i.e., exactly the same features were returned by our algorithms when compared
to the original algorithm available in WEKA.
Comment: 25 pages, 5 figures
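The abstract above does not spell out the scoring function, so the following is only a minimal sketch of the standard CFS merit heuristic as it is commonly defined: merit = k * r_cf / sqrt(k + k(k-1) * r_ff), where k is the subset size, r_cf the mean feature-class correlation, and r_ff the mean feature-feature correlation. The function name `cfs_merit` and its list-based inputs are assumptions made for illustration; this is not the DiCFS or WEKA implementation.

```python
from math import sqrt


def cfs_merit(feature_class_corr, feature_feature_corr):
    """Score a feature subset with the standard CFS merit heuristic.

    feature_class_corr: absolute correlations between each feature in the
        subset and the class (one value per feature).
    feature_feature_corr: absolute correlations for each pair of features
        in the subset (empty when the subset has a single feature).
    """
    k = len(feature_class_corr)
    if k == 0:
        return 0.0  # an empty subset has no merit
    r_cf = sum(feature_class_corr) / k
    # With no feature pairs there is no redundancy to penalise.
    r_ff = (
        sum(feature_feature_corr) / len(feature_feature_corr)
        if feature_feature_corr
        else 0.0
    )
    return (k * r_cf) / sqrt(k + k * (k - 1) * r_ff)


if __name__ == "__main__":
    # Two equally relevant features: redundancy lowers the merit.
    print(cfs_merit([0.5, 0.5], [0.0]))  # uncorrelated pair
    print(cfs_merit([0.5, 0.5], [1.0]))  # fully redundant pair
```

The heuristic rewards subsets whose features correlate with the class but not with each other, which is why a fully redundant second feature adds nothing over the first one alone.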
Personality cannot be predicted from the power of resting state EEG
In the present study we asked whether it is possible to decode personality
traits from resting state EEG data. EEG was recorded from a large sample of
subjects (N = 309) who had answered questionnaires measuring personality trait
scores of the 5 dimensions as well as the 10 subordinate aspects of the Big
Five. Machine learning algorithms were used to build a classifier to predict
each personality trait from power spectra of the resting state EEG data. The
results indicate that the five dimensions as well as their subordinate aspects
could not be predicted from the resting state EEG data. Finally, to demonstrate
that this result is not due to systematic algorithmic or implementation
mistakes, the same methods were used to successfully classify whether the
subject had eyes open or eyes closed and whether the subject was male or
female. These results indicate that the extraction of personality traits from
the power spectra of resting state EEG is extremely noisy, if possible at all.
Comment: 14 pages, 4 figures
Predicting customer's gender and age depending on mobile phone data
In the age of data-driven solutions, customer demographic attributes, such
as gender and age, play a core role that may enable companies to enhance
their service offers and target the right customer at the right time and
place. In marketing campaigns, companies want to target the real user of
the GSM (global system for mobile communications) line, not the line owner,
since the two are sometimes not the same. This work proposes a method that predicts
users' gender and age based on their behavior, services and contract
information. We used call detail records (CDRs), customer relationship
management (CRM) and billing information as a data source to analyze telecom
customer behavior, and applied different types of machine learning algorithms
to provide marketing campaigns with more accurate information about customer
demographic attributes. The model is built using a reliable data set of 18,000
users provided by SyriaTel Telecom Company, for training and testing. The model
was applied using big data technology and achieved 85.6% accuracy for
user gender prediction and 65.5% for user age prediction. The main contributions
of this work are the improvement in the accuracy of user gender
prediction and user age prediction based on mobile phone data, and an end-to-end
solution that approaches customer data from multiple aspects in the telecom
domain
Smart Asset Management for Electric Utilities: Big Data and Future
This paper discusses future challenges in terms of big data and new
technologies. Utilities have been collecting data in large amounts, but the data
are hardly utilized because of their sheer volume and the uncertainty
associated with them. Condition monitoring of assets collects large amounts of
data during daily operations. The question arises: "How to extract information
from large chunk of data?" The concept of "rich data and poor information" is
being challenged by big data analytics with the advent of machine learning
techniques. Along with technological advancements like Internet of Things
(IoT), big data analytics will play an important role for electric utilities.
In this paper, challenges are answered by pathways and guidelines to make the
current asset management practices smarter for the future.
Comment: 13 pages, 3 figures, Proceedings of 12th World Congress on
Engineering Asset Management (WCEAM) 201
Mapping Big Data into Knowledge Space with Cognitive Cyber-Infrastructure
Big data research has attracted great attention in science, technology,
industry and society. It is developing with the evolving scientific paradigm,
the fourth industrial revolution, and the transformational innovation of
technologies. However, its nature and fundamental challenge have not been
recognized, and its own methodology has not been formed. This paper explores
and answers the following questions: What is big data? What are the basic
methods for representing, managing and analyzing big data? What is the
relationship between big data and knowledge? Can we find a mapping from big
data into knowledge space? What kind of infrastructure is required to support
not only big data management and analysis but also knowledge discovery, sharing
and management? What is the relationship between big data and science paradigm?
What is the nature and fundamental challenge of big data computing? A
multi-dimensional perspective is presented toward a methodology of big data
computing.
Comment: 59 pages