174,252 research outputs found

Learning to Detect: A Data-driven Approach for Network Intrusion Detection

Author: Jiang Yushan
Song Houbing
Tauscher Zachary
Wang Jian
Zhang Kai
Publication venue: Scholarly Commons
Publication date: 18/08/2021

With massive data being generated daily and the ever-increasing interconnectivity of the world’s Internet infrastructures, a machine-learning-based intrusion detection system (IDS) has become a vital component to protect our economic and national security. In this paper, we perform a comprehensive study on NSL-KDD, a network traffic dataset, by visualizing patterns and employing different learning-based models to detect cyber attacks. Unlike previous shallow learning and deep learning models that use a single learning model for intrusion detection, we adopt a hierarchical strategy, in which intrusion and normal behavior are classified first, and then the specific types of attacks are classified. We demonstrate the advantage of the unsupervised representation learning model in binary intrusion detection tasks. In addition, we alleviate the data imbalance problem with the SVM-SMOTE oversampling technique in 4-class classification and further demonstrate the effectiveness and the drawback of the oversampling mechanism with a deep neural network as a base model. Index Terms—Intrusio
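
The two-stage idea in this abstract (first separate intrusion from normal traffic, then name the attack type) can be illustrated with a minimal sketch. This is not the authors' implementation: the function `hierarchical_predict`, the two toy classifiers, their thresholds, and the labels are all hypothetical placeholders standing in for trained models.

```python
from typing import Callable, Sequence

Features = Sequence[float]


def hierarchical_predict(
    x: Features,
    is_intrusion: Callable[[Features], bool],
    attack_type: Callable[[Features], str],
) -> str:
    """Stage 1 decides normal vs. intrusion; stage 2 runs only for intrusions."""
    if not is_intrusion(x):
        return "normal"
    return attack_type(x)


# Hypothetical stand-ins for trained models (thresholds are made up).
def toy_binary_detector(x: Features) -> bool:
    return x[0] > 0.5


def toy_attack_classifier(x: Features) -> str:
    return "DoS" if x[1] > 0.5 else "Probe"


if __name__ == "__main__":
    for sample in ([0.1, 0.9], [0.8, 0.9], [0.8, 0.2]):
        label = hierarchical_predict(sample, toy_binary_detector, toy_attack_classifier)
        print(sample, label)
```

In this arrangement the second-stage model never sees normal traffic, so it can be trained only on attack samples.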

Embry-Riddle Aeronautical University

A Machine Learning Enhanced Scheme for Intelligent Network Management

Author: Zuo Y
Publication venue: 'Division of Chemical Information and Computer Sciences'
Publication date: 25/11/2019

Versatile networking services have a huge influence on daily life, while the number and diversity of services make network systems highly complex. Network scale and complexity grow with the increasing infrastructure apparatuses, networking functions, network slices, and the evolution of the underlying architecture. The conventional approach is to maintain the large and complex platform through manual administration, which makes effective and insightful management troublesome. A feasible and promising scheme is to extract insightful information from the large volume of network data produced. The goal of this thesis is to use learning-based algorithms inspired by the machine learning community to discover valuable knowledge from substantial network data, which directly promotes intelligent management and maintenance. In the thesis, the management and maintenance focus on two schemes: network anomaly detection and root cause localization, and critical traffic resource control and optimization. Firstly, the abundant network data contain informative messages, but their heterogeneity and complexity make diagnosis challenging. For unstructured logs, abstract and formatted log templates are extracted to regularize log records. An in-depth analysis framework based on heterogeneous data is proposed in order to detect the occurrence of faults and anomalies. It employs representation learning methods to map unstructured data into numerical features, and fuses the extracted features for network anomaly and fault detection. The representation learning makes use of word2vec-based embedding technologies for semantic expression. Next, fault and anomaly detection only reveals that events have occurred and does not identify their root causes, so fault localization is needed to narrow down the source of systematic anomalies. 
The extracted features are formed into an anomaly degree, which is coupled with an importance ranking method to highlight the locations of anomalies in network systems. Two ranking modes, instantiated by PageRank and by operation errors, jointly highlight the locations of latent issues. Besides fault and anomaly detection, network traffic engineering deals with network communication and computation resources to optimize the efficiency of data traffic transfer. Especially when network traffic is constrained by communication conditions, a proactive path planning scheme helps with efficient traffic control. A learning-based traffic planning algorithm based on a sequence-to-sequence model is therefore proposed to discover hidden reasonable paths from abundant traffic history data over the Software Defined Network architecture. Finally, traffic engineering based merely on empirical data is likely to result in stale and sub-optimal solutions, and may even make the situation worse. A resilient mechanism is required to adapt network flows to a dynamic environment based on context. Thus, a reinforcement learning-based scheme is put forward for dynamic data forwarding that considers network resource status, and it shows a promising performance improvement. In the end, the proposed anomaly processing framework strengthens analysis and diagnosis for network system administrators through combined fault detection and root cause localization. The learning-based traffic engineering advances network flow management using historical data and further shows a promising direction of flexible traffic adjustment for ever-changing environments

Open Research Exeter

A Hybrid CNN-LSTM Model for Traffic Accident Frequency Forecasting During the Tourist Season

Author: Büyükgökoğlan Erdem
Uğuz Sinan
Publication venue: 'Mechanical Engineering Faculty in Slavonski Brod'
Publication date: 01/01/2022

Population density in major tourist centers of the world increases significantly during the tourist season. Estimating the frequency of traffic accidents during the upcoming tourist season is of particular interest to many stakeholders, such as local governments. The objective of this study is to propose a hybrid deep learning model, based on convolutional neural network (CNN) and long short-term memory (LSTM) models, to predict the frequency of traffic accidents during the tourist season. The dataset used in the study includes daily frequencies of traffic accidents with fatalities and injuries that occurred in Antalya between January 2012 and December 2017. In the next phase of the study, seasonal autoregressive integrated moving average (SARIMA), Facebook Prophet and deep learning methods including LSTM and the proposed hybrid CNN-LSTM were tested to predict traffic accident frequencies in Antalya. The experimental results show that the root mean square error (RMSE) of the proposed model is less than 2480, 13266 and 186 compared to SARIMA, Prophet and LSTM models, respectively. Also, the R-squared value of the proposed model is greater than 0.016, 0.103 and 0.001 compared to SARIMA, Prophet and LSTM models, respectively. It is clear that the proposed hybrid CNN-LSTM model was more successful in predicting traffic accidents when compared to the other models
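
For readers unfamiliar with the two metrics quoted in this abstract, the following minimal sketch shows how RMSE and R-squared are conventionally computed from observed and predicted values. The function names and the sample numbers are illustrative assumptions and are not taken from the study.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared residual."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up daily accident counts and forecasts, for illustration only.
    observed = [12.0, 15.0, 9.0, 20.0]
    forecast = [11.0, 16.0, 10.0, 18.0]
    print("RMSE:", rmse(observed, forecast))
    print("R^2:", r_squared(observed, forecast))
```

A lower RMSE and a higher R-squared both indicate a closer fit, which is the sense in which the abstract compares the models.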

HRČAK - Portal of Croatian Scientific and Professional Journals

Hrčak - Portal of scientific journals of Croatia

Applied deep learning in intelligent transportation systems and embedding exploration

Author: Liang Xiaoyuan
Publication venue: Digital Commons @ NJIT
Publication date: 31/08/2019

Deep learning techniques have achieved tremendous success in many real applications in recent years and show their great potential in many areas including transportation. Even though transportation becomes increasingly indispensable in people’s daily life, its related problems, such as traffic congestion and energy waste, have not been completely solved, yet some problems have become even more critical. This dissertation focuses on solving the following fundamental problems: (1) passenger demand prediction, (2) transportation mode detection, (3) traffic light control, in the transportation field using deep learning. The dissertation also extends the application of deep learning to an embedding system for visualization and data retrieval. The first part of this dissertation is about a Spatio-TEmporal Fuzzy neural Network (STEF-Net) which accurately predicts passenger demand by incorporating the complex interaction of all known important factors, such as temporal, spatial and external information. Specifically, a convolutional long short-term memory network is employed to simultaneously capture spatio-temporal feature interaction, and a fuzzy neural network to model external factors. A novel feature fusion method with convolution and an attention layer is proposed to keep the temporal relation and discriminative spatio-temporal feature interaction. Experiments on a large-scale real-world dataset show the proposed model outperforms the state-of-the-art approaches. The second part is a light-weight and energy-efficient system which detects transportation modes using only accelerometer sensors in smartphones. Understanding people’s transportation modes is beneficial to many civilian applications, such as urban transportation planning. The system collects accelerometer data in an efficient way and leverages a convolutional neural network to determine transportation modes. 
Different architectures and classification methods are tested with the proposed convolutional neural network to optimize the system design. Performance evaluation shows that the proposed approach achieves better accuracy than existing work in detecting people’s transportation modes. The third component of this dissertation is a deep reinforcement learning model, based on Q learning, to control the traffic light. Existing inefficient traffic light control causes numerous problems, such as long delay and waste of energy. In the proposed model, the complex traffic scenario is quantified as states by collecting data and dividing the whole intersection into grids. The timing changes of a traffic light are the actions, which are modeled as a high-dimension Markov decision process. The reward is the cumulative waiting time difference between two cycles. To solve the model, a convolutional neural network is employed to map states to rewards, which is further optimized by several components, such as dueling network, target network, double Q-learning network, and prioritized experience replay. The simulation results in Simulation of Urban MObility (SUMO) show the efficiency of the proposed model in controlling traffic lights. The last part of this dissertation studies the hierarchical structure in an embedding system. Traditional embedding approaches associate a real-valued embedding vector with each symbol or data point, which generates storage-inefficient representation and fails to effectively encode the internal semantic structure of data. A regularized autoencoder framework is proposed to learn compact Hierarchical K-way D-dimensional (HKD) discrete embedding of data points, aiming at capturing semantic structures of data. 
Experimental results on synthetic and real-world datasets show that the proposed HKD embedding can effectively reveal the semantic structure of data via visualization and greatly reduce the search space of nearest neighbor retrieval while preserving high accuracy
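
The traffic light part of this abstract defines the reward as the cumulative waiting time difference between two cycles. A minimal sketch of that quantity follows. The sign convention (previous cycle minus current cycle, so that shorter waits give a positive reward), the function name `cycle_reward`, and the per-vehicle lists are assumptions made here for illustration, not details confirmed by the abstract.

```python
from typing import Sequence


def cycle_reward(prev_cycle_waits: Sequence[float], curr_cycle_waits: Sequence[float]) -> float:
    """Reward = cumulative waiting time of the previous cycle minus that of the current one.

    Assumed sign convention: positive when the total waiting time decreased.
    """
    return sum(prev_cycle_waits) - sum(curr_cycle_waits)


if __name__ == "__main__":
    # Hypothetical per-vehicle waiting times (seconds) in two consecutive cycles.
    previous = [10.0, 20.0, 5.0]
    current = [8.0, 12.0, 5.0]
    print(cycle_reward(previous, current))
```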

Digital Commons @ New Jersey Institute of Technology (NJIT)

Data mining approach for predicting the daily Internet data traffic of a smart university

Author: Abolade Jeremiah O.
Adekitan Aderibigbe I.
Shobayo Olamilekan
Publication date: 01/01/2019

Internet traffic measurement and analysis generate datasets that are indicators of usage trends, and such datasets can be used for traffic prediction via various statistical analyses. In this study, an extensive analysis was carried out on the daily internet traffic data generated from January to December 2017 in a smart university in Nigeria. The dataset analysed contains seven key features: the month, the week, the day of the week, the daily IP traffic for the previous day, the average daily IP traffic for the two previous days, the traffic status classification (TSC) for the download and the TSC for the upload internet traffic data. The data mining analysis was performed using four learning algorithms: the Decision Tree, the Tree Ensemble, the Random Forest, and the Naïve Bayes Algorithm on the KNIME (Konstanz Information Miner) data mining application, and kNN, Neural Network, Random Forest, Naïve Bayes and CN2 Rule Inducer algorithms on the Orange platform. A comparative performance analysis for the models is presented using the confusion matrix, Cohen’s Kappa value, the accuracy of each model, Area under ROC Curve, etc. A minimum accuracy of 55.66% was observed for both the upload and the download IP data on the KNIME platform while minimum accuracies of 57.3% and 51.4% respectively were observed on the Orange platform
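
Two of the features listed in this abstract, the previous day's traffic and the average of the two previous days, are simple lag features. The sketch below shows one way such features could be derived from a daily series. The function name `lag_features`, the dictionary keys, and the sample values are hypothetical and do not come from the study's dataset.

```python
from typing import Dict, List, Sequence


def lag_features(daily_traffic: Sequence[float]) -> List[Dict[str, float]]:
    """Build one row per day from day index 2 onward, using the two preceding days."""
    rows = []
    for i in range(2, len(daily_traffic)):
        rows.append(
            {
                "day_index": i,
                "prev_day": daily_traffic[i - 1],
                "prev_two_day_avg": (daily_traffic[i - 1] + daily_traffic[i - 2]) / 2,
                "target": daily_traffic[i],
            }
        )
    return rows


if __name__ == "__main__":
    # Made-up daily traffic volumes, for illustration only.
    for row in lag_features([10.0, 20.0, 30.0, 40.0]):
        print(row)
```

The first two days are skipped because they lack a full two-day history.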

Covenant University Repository

Directory of Open Access Journals

Sheffield Hallam University Research Archive

An Approach for Optimizing Resource Allocation and Usage in Cloud Computing Systems by Predicting Traffic Flow

Author: Malele Vusumuzi
Sekwatlakwatla Sello Prince
Publication venue: 'Escuela Politecnica Nacional'
Publication date: 08/01/2024

The cloud provides computing resources as a service (scalable and cost-effective storage, management, and accessibility of data and applications) through the Internet. Even though cloud computing offers many opportunities for ICT (information and communication technology), many issues still remain, and the increasing demand for resource management and traffic flow is also becoming increasingly problematic. The amount of data in the cloud computing environment is increasing on a daily basis, which increases data traffic flow. Due to this problem, clients complained about the network speed. Autoregressive Integrated Moving Average (ARIMA), Monte Carlo, and Extreme Gradient Boosting regression (XGBoost) are used in this paper for predicting traffic flow. A Monte Carlo prediction of 84% outperformed ARIMA's prediction of 79.8% and XGBoost's prediction of 71.5%, indicating that Monte Carlo is more accurate than the other models when predicting traffic flow in organizational cloud computing systems. A machine learning model will be used for future studies, along with hourly monitoring and resource allocation

Latin American Journal of Computing

On the Relation Between Mobile Encounters and Web Traffic Patterns: A Data-driven Study

Author: Alipour Babak
Helmy Ahmed
Qathrady Mimonah Al
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/11/2018

Mobility and network traffic have been traditionally studied separately. Their interaction is vital for generations of future mobile services and effective caching, but has not been studied in depth with real-world big data. In this paper, we characterize mobility encounters and study the correlation between encounters and web traffic profiles using large-scale datasets (30TB in size) of WiFi and NetFlow traces. The analysis quantifies these correlations for the first time, across spatio-temporal dimensions, for device types grouped into on-the-go Flutes and sit-to-use Cellos. The results consistently show a clear relation between mobility encounters and traffic across different buildings over multiple days, with encountered pairs showing higher traffic similarity than non-encountered pairs, and long encounters being associated with the highest similarity. We also investigate the feasibility of learning encounters through web traffic profiles, with implications for dissemination protocols and contact tracing. This provides a compelling case to integrate both mobility and web traffic dimensions in future models, not only at an individual level, but also at pairwise and collective levels. We have released samples of code and data used in this study on GitHub, to support reproducibility and encourage further research (https://github.com/BabakAp/encounter-traffic). Comment: Technical report with details for conference paper at MSWiM 2018, v3 adds GitHub lin

arXiv.org e-Print Archive

Crossref