Combining exposure indicators and predictive analytics for threats detection in real industrial IoT sensor networks
We present a framework that combines exposure indicators and predictive analytics, using AI tools and big data architectures, for threat detection inside a real industrial IoT sensor network. The framework, which bridges these two worlds, provides mechanisms to internally assess and evaluate products and services and to share the results without disclosing any sensitive or private information. We analyze the current state of the art and possible future research on top of a real-world scenario implemented in a technological platform being developed under the H2020 ECHO project for sharing and evaluating cybersecurity-relevant information, increasing trust and transparency among different stakeholders.
Network Anomaly Detection by Means of Machine Learning: Random Forest Approach with Apache Spark
Network security is now a crucial issue, and traditional intrusion detection systems are no longer sufficient. Intelligent detection systems should therefore play a major role in network security by processing network big data and predicting anomalous behavior as fast as possible. In this paper, we implement the well-known supervised Random Forest classifier with Apache Spark on the NSL-KDD dataset provided by the University of New Brunswick, obtaining an accuracy of 78.69% and a false-negative ratio of 35.2%. Empirical results show that this approach is well suited for use in an intrusion detection system; we also seek the best number of trees for the Random Forest classifier, trading off higher accuracy against lower cost for the intrusion detection system.
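A minimal sketch of the workflow this abstract describes, using scikit-learn as a stand-in for Spark MLlib's RandomForestClassifier; the synthetic data, feature count, and tree counts below are illustrative assumptions, not the paper's NSL-KDD setup:

```python
# Sketch: Random Forest intrusion classifier with a sweep over tree counts,
# as the paper does to balance accuracy against cost. Synthetic binary
# "normal vs. anomaly" data stands in for NSL-KDD.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Sweep the number of trees to observe the accuracy/cost trade-off.
for n_trees in (10, 50, 100):
    clf = RandomForestClassifier(n_estimators=n_trees, random_state=0)
    clf.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, clf.predict(X_te))
    print(f"{n_trees} trees -> accuracy {acc:.3f}")
```

In the paper itself this loop would run distributed via Spark MLlib rather than on a single machine.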
Model Checking for Data Anomaly Detection
Abstract Data typically evolve according to specific processes, with the consequent possibility of identifying a profile of evolution: the values the data may assume, the frequencies at which they change, the temporal variation in relation to other data, or other constraints directly connected to the reference domain. A violation of these conditions could signal different threats to the system, such as: attempts at tampering or a cyber attack, a failure in system operation, or a bug in the applications that manage the life cycle of the data. Detecting such violations is not straightforward, as the processes may be unknown or hard to extract. In this paper we propose an approach to detect data anomalies. We represent data user behaviours in terms of labelled transition systems, and through model checking techniques we demonstrate that the proposed modelling can be exploited to successfully detect data anomalies.
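The idea can be sketched with a toy labelled transition system: a data profile is a set of allowed (state, label) transitions, and a trace is anomalous when it leaves that profile. The states and labels below are illustrative assumptions, not from the paper:

```python
# Toy LTS modelling the legal evolution of a sensor reading:
# idle -> rising -> peak -> falling -> idle. Any unlisted transition
# is a profile violation (e.g. tampering, failure, or an application bug).
LTS = {
    ("idle", "rise"): "rising",
    ("rising", "peak"): "peak",
    ("peak", "fall"): "falling",
    ("falling", "settle"): "idle",
}

def trace_is_anomalous(trace, start="idle"):
    """Return True if the observed trace violates the modelled data profile."""
    state = start
    for label in trace:
        nxt = LTS.get((state, label))
        if nxt is None:  # no transition with this label from this state
            return True
        state = nxt
    return False

print(trace_is_anomalous(["rise", "peak", "fall", "settle"]))  # legal trace
print(trace_is_anomalous(["rise", "fall"]))                    # illegal jump
```

A real model checker would verify temporal-logic properties over such a system rather than replay a single trace, but the acceptance check above captures the core detection idea.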
Lessons learned from challenging data science case studies
In this chapter, we revisit the conclusions and lessons learned from the chapters presented in Part II of this book and analyze them systematically. The goal of the chapter is threefold: firstly, it serves as a directory to the individual chapters, allowing readers to identify which chapters to focus on when they are interested either in a certain stage of the knowledge discovery process or in a certain data science method or application area. Secondly, the chapter serves as a digested, systematic summary of data science lessons that are relevant for data science practitioners. And lastly, we reflect on the perceptions of a broader public towards the methods and tools that we covered in this book and dare to give an outlook towards the future developments that will be influenced by them.
Penggunaan Metode Backpropagation Pada Jaringan Syaraf Tiruan Untuk Intrusion Detection System (Using the Backpropagation Method in Artificial Neural Networks for an Intrusion Detection System)
Convenience is the main goal of technology these days, but over time these technological advancements have made user privacy increasingly difficult to protect and a growing cause of concern. An IDS (Intrusion Detection System) is believed to help achieve security in network usage; such a detection system works by observing abnormal network behavior. This work emphasizes the use of an artificial neural network to detect these attacks. In the research that has been done, an artificial neural network with a learning rate of 0.1 was applied, and the KDDCup-99 dataset was used to train and benchmark the network. For comparison, training was also carried out on the same dataset using several other learning algorithms. The number of layers in the artificial neural network ranges from 1 to 5, and the results were compared, leading to the conclusion that the artificial neural network with 3 layers outperforms the other machine learning algorithms. DOI: 10.29408/jit.v3i2.231
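A hedged sketch of the reported configuration, using scikit-learn's backpropagation-trained MLP in place of the paper's own implementation; the hidden-layer widths and the synthetic data standing in for KDDCup-99 are assumptions, while the three hidden layers and learning rate of 0.1 follow the abstract:

```python
# Sketch: backpropagation-trained MLP with three hidden layers and
# learning_rate_init=0.1, on synthetic data standing in for KDDCup-99.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

mlp = MLPClassifier(hidden_layer_sizes=(32, 16, 8),  # three hidden layers
                    learning_rate_init=0.1,          # as reported
                    max_iter=500, random_state=1)
mlp.fit(X_tr, y_tr)
print(f"test accuracy: {mlp.score(X_te, y_te):.3f}")
```

The paper's comparison would repeat this with 1 to 5 hidden layers and with the other learning algorithms on the same split.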
DQSOps: Data Quality Scoring Operations Framework for Data-Driven Applications
Data quality assessment has become a prominent component in the successful execution of complex data-driven artificial intelligence (AI) software systems. In practice, real-world applications generate huge volumes of data at high speeds. These data streams require analysis and preprocessing before being permanently stored or used in a learning task. Therefore, significant attention has been paid to the systematic management and construction of high-quality datasets. Nevertheless, managing voluminous and high-velocity data streams is usually performed manually (i.e. offline), making it an impractical strategy in production environments. To address this challenge, DataOps has emerged to achieve life-cycle automation of data processes using DevOps principles. However, determining the data quality based on a fitness scale constitutes a complex task within the framework of DataOps. This paper presents a novel Data Quality Scoring Operations (DQSOps) framework that yields a quality score for production data in DataOps workflows. The framework incorporates two scoring approaches: an ML prediction-based approach that predicts the data quality score, and a standard-based approach that periodically produces the ground-truth scores based on assessing several data quality dimensions. We deploy the DQSOps framework in a real-world industrial use case. The results show that DQSOps achieves significant computational speedup rates compared to the conventional approach of data quality scoring while maintaining high prediction performance.
Comment: 10 pages, The International Conference on Evaluation and Assessment in Software Engineering (EASE) conference
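The two scoring paths can be sketched as follows: a standard-based score computed from data-quality dimensions serves as ground truth, and an ML model learns to predict that score cheaply from batch features. The dimensions, weights, features, and model below are illustrative assumptions, not DQSOps internals:

```python
# Sketch of the two DQSOps scoring approaches on synthetic data batches.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def standard_based_score(batch):
    """Ground-truth score from two example dimensions: completeness, validity."""
    obs = batch[~np.isnan(batch)]
    completeness = obs.size / batch.size
    validity = float(np.mean(obs >= 0)) if obs.size else 0.0  # assume valid >= 0
    return 0.5 * completeness + 0.5 * validity

rng = np.random.default_rng(0)
features, scores = [], []
for _ in range(200):
    b = rng.normal(1.0, 1.0, size=50)
    b[rng.random(50) < rng.random() * 0.3] = np.nan       # inject missing values
    features.append([np.nanmean(b), np.mean(np.isnan(b))])  # cheap batch features
    scores.append(standard_based_score(b))                   # periodic ground truth

# ML prediction-based path: predict the score without the full assessment.
model = RandomForestRegressor(random_state=0).fit(features, scores)
print(f"predicted score for a clean batch: {model.predict([[1.0, 0.0]])[0]:.2f}")
```

The speedup reported in the paper comes from running the cheap predictor on every batch and the full standard-based assessment only periodically.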
Hoeffding Tree Algorithms for Anomaly Detection in Streaming Datasets: A Survey
This survey aims to deliver an extensive and well-constructed overview of using machine learning for the problem of detecting anomalies in streaming datasets. The objective is to demonstrate the effectiveness of Hoeffding Trees as a machine learning solution for detecting anomalies in streaming cyber datasets. In this survey we categorize the existing research works on Hoeffding Trees that are feasible for this type of study into the following: distributed Hoeffding Trees, ensembles of Hoeffding Trees, and existing techniques using Hoeffding Trees for anomaly detection. These categories are referred to as compositions within this paper and were selected based on their relation to streaming data and the flexibility of their techniques for use within different domains of streaming data. We discuss how combining the techniques of the proposed research works within these compositions can address the anomaly detection problem in streaming cyber datasets. The goal is to show how a combination of techniques from different compositions can solve a prominent problem: anomaly detection.
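The statistical device that lets Hoeffding Trees learn from a stream is the Hoeffding bound: after n observations of a statistic with range R, the true mean lies within epsilon of the observed mean with probability 1 - delta, so a split attribute can be chosen from a small sample. A small sketch of the bound (the R and delta values are illustrative):

```python
# The Hoeffding bound underlying Hoeffding Trees: a split on the leading
# attribute is taken once its observed information-gain lead over the
# runner-up exceeds epsilon, which shrinks as more stream items arrive.
import math

def hoeffding_bound(R, delta, n):
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n))"""
    return math.sqrt((R ** 2) * math.log(1.0 / delta) / (2.0 * n))

for n in (100, 1000, 10000):
    print(f"n={n:>6}: epsilon={hoeffding_bound(R=1.0, delta=1e-7, n=n):.4f}")
```

Because epsilon decreases with n, the tree commits to splits with bounded probability of error without ever storing the stream.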
Continuous Outlier Mining of Streaming Data in Flink
In this work, we focus on distance-based outliers in a metric space, where the status of an entity as an outlier is based on the number of other entities in its neighborhood. In recent years, several solutions have tackled the problem of distance-based outliers in data streams, where outliers must be mined continuously as new elements become available. An interesting research problem is to combine the streaming environment with massively parallel systems to provide scalable stream-based algorithms. However, none of the previously proposed techniques refer to a massively parallel setting. Our proposal fills this gap and investigates the challenges in transferring state-of-the-art techniques to Apache Flink, a modern platform for intensive streaming analytics. We thoroughly present the technical challenges encountered and the alternatives that may be applied. We show speed-ups of up to 117 (resp. 2076) times over a naive parallel (resp. non-parallel) solution in Flink, by using just an ordinary four-core machine and a real-world dataset. When moving to a three-machine cluster, due to less contention, we manage to achieve both better scalability in terms of the window slide size and the data dimensionality, and even higher speed-ups, e.g., by a factor of 510. Overall, our results demonstrate that outlier mining can be achieved in an efficient and scalable manner. The resulting techniques have been made publicly available as open-source software.
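The underlying definition can be sketched in a few lines: within the current window, a point is a distance-based outlier if fewer than k other points lie within distance R of it. This naive single-machine version (with illustrative R, k, and data) is exactly what the Flink-based techniques parallelize and incrementalize:

```python
# Naive distance-based outlier check over one window of a stream.
# A point is an outlier if it has fewer than k neighbors within radius R.
import math

def outliers_in_window(window, R=1.0, k=2):
    """Return indices of distance-based outliers in the window."""
    out = []
    for i, p in enumerate(window):
        neighbors = sum(
            1 for j, q in enumerate(window)
            if i != j and math.dist(p, q) <= R
        )
        if neighbors < k:
            out.append(i)
    return out

window = [(0.0, 0.0), (0.5, 0.1), (0.2, 0.4), (5.0, 5.0)]
print(outliers_in_window(window))  # the isolated point (5.0, 5.0) is flagged
```

The quadratic pairwise scan here is what makes the massively parallel, sliding-window formulations in the paper necessary at scale.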
Real-time big data processing for anomaly detection: a survey
The advent of connected devices and the omnipresence of the Internet have paved the way for intruders to attack networks, leading to cyber-attacks, financial loss, information theft in healthcare, and cyber war. Hence, network security analytics has become an important area of concern and has gained intensive attention among researchers of late, specifically in the domain of anomaly detection in networks, which is considered crucial for network security. However, preliminary investigations have revealed that existing approaches to detecting anomalies in networks are not effective enough, particularly in real time. The inefficacy of current approaches is mainly due to the amassment of massive volumes of data through the connected devices. Therefore, it is crucial to propose a framework that effectively handles real-time big data processing and detects anomalies in networks. In this regard, this paper attempts to address the issue of detecting anomalies in real time. Accordingly, this paper surveys the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of associated machine learning algorithms. The paper begins with an explanation of the essential contexts and a taxonomy of real-time big data processing, anomaly detection, and machine learning algorithms, followed by a review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Ltd.