1,071 research outputs found
Real-time big data processing for anomaly detection : a survey
The advent of connected devices and omnipresence of Internet have paved way for intruders to attack networks, which leads to cyber-attack, financial loss, information theft in healthcare, and cyber war. Hence, network security analytics has become an important area of concern and has gained intensive attention among researchers, off late, specifically in the domain of anomaly detection in network, which is considered crucial for network security. However, preliminary investigations have revealed that the existing approaches to detect anomalies in network are not effective enough, particularly to detect them in real time. The reason for the inefficacy of current approaches is mainly due the amassment of massive volumes of data though the connected devices. Therefore, it is crucial to propose a framework that effectively handles real time big data processing and detect anomalies in networks. In this regard, this paper attempts to address the issue of detecting anomalies in real time. Respectively, this paper has surveyed the state-of-the-art real-time big data processing technologies related to anomaly detection and the vital characteristics of associated machine learning algorithms. This paper begins with the explanation of essential contexts and taxonomy of real-time big data processing, anomalous detection, and machine learning algorithms, followed by the review of big data processing technologies. Finally, the identified research challenges of real-time big data processing in anomaly detection are discussed. © 2018 Elsevier Lt
Street Smart in 5G : Vehicular Applications, Communication, and Computing
Recent advances in information technology have revolutionized the automotive industry, paving the way for next-generation smart vehicular mobility. Specifically, vehicles, roadside units, and other road users can collaborate to deliver novel services and applications that leverage, for example, big vehicular data and machine learning. Relatedly, fifth-generation cellular networks (5G) are being developed and deployed for low-latency, high-reliability, and high bandwidth communications. While 5G adjacent technologies such as edge computing allow for data offloading and computation at the edge of the network thus ensuring even lower latency and context-awareness. Overall, these developments provide a rich ecosystem for the evolution of vehicular applications, communications, and computing. Therefore in this work, we aim at providing a comprehensive overview of the state of research on vehicular computing in the emerging age of 5G and big data. In particular, this paper highlights several vehicular applications, investigates their requirements, details the enabling communication technologies and computing paradigms, and studies data analytics pipelines and the integration of these enabling technologies in response to application requirements.Peer reviewe
System Support For Stream Processing In Collaborative Cloud-Edge Environment
Stream processing is a critical technique to process huge amount of data in real-time manner.
Cloud computing has been used for stream processing due to its unlimited computation
resources. At the same time, we are entering the era of Internet of Everything (IoE). The emerging
edge computing benefits low-latency applications by leveraging computation resources at
the proximity of data sources. Billions of sensors and actuators are being deployed worldwide
and huge amount of data generated by things are immersed in our daily life. It has become
essential for organizations to be able to stream and analyze data, and provide low-latency analytics
on streaming data. However, cloud computing is inefficient to process all data in a centralized
environment in terms of the network bandwidth cost and response latency. Although
edge computing offloads computation from the cloud to the edge of the Internet, there is not
a data sharing and processing framework that efficiently utilizes computation resources in the
cloud and the edge. Furthermore, the heterogeneity of edge devices brings more difficulty to the development of collaborative cloud-edge applications.
To explore and attack the challenges of stream processing system in collaborative cloudedge
environment, in this dissertation we design and develop a series of systems to support
stream processing applications in hybrid cloud-edge analytics. Specifically, we develop an
hierarchical and hybrid outlier detection model for multivariate time series streams that automatically
selects the best model for different time series. We optimize one of the stream
processing system (i.e., Spark Streaming) to reduce the end-to-end latency. To facilitate the
development of collaborative cloud-edge applications, we propose and implement a new computing
framework, Firework that allows stakeholders to share and process data by leveraging
both the cloud and the edge. A vision-based cloud-edge application is implemented to demonstrate
the capabilities of Firework. By combining all these studies, we provide comprehensive
system support for stream processing in collaborative cloud-edge environment
Designing the next generation intelligent transportation sensor system using big data driven machine learning techniques
Accurate traffic data collection is essential for supporting advanced traffic management system operations. This study investigated a large-scale data-driven sequential traffic sensor health monitoring (TSHM) module that can be used to monitor sensor health conditions over large traffic networks. Our proposed module consists of three sequential steps for detecting different types of abnormal sensor issues. The first step detects sensors with abnormally high missing data rates, while the second step uses clustering anomaly detection to detect sensors reporting abnormal records. The final step introduces a novel Bayesian changepoint modeling technique to detect sensors reporting abnormal traffic data fluctuations by assuming a constant vehicle length distribution based on average effective vehicle length (AEVL). Our proposed method is then compared with two benchmark algorithms to show its efficacy. Results obtained by applying our method to the statewide traffic sensor data of Iowa show it can successfully detect different classes of sensor issues. This demonstrates that sequential TSHM modules can help transportation agencies determine traffic sensors’ exact problems, thereby enabling them to take the required corrective steps.
The second research objective will focus on the traffic data imputation after we discard the anomaly/missing data collected from failure traffic sensors. Sufficient high-quality traffic data are a crucial component of various Intelligent Transportation System (ITS) applications and research related to congestion prediction, speed prediction, incident detection, and other traffic operation tasks. Nonetheless, missing traffic data are a common issue in sensor data which is inevitable due to several reasons, such as malfunctioning, poor maintenance or calibration, and intermittent communications. Such missing data issues often make data analysis and decision-making complicated and challenging. In this study, we have developed a generative adversarial network (GAN) based traffic sensor data imputation framework (TSDIGAN) to efficiently reconstruct the missing data by generating realistic synthetic data. In recent years, GANs have shown impressive success in image data generation. However, generating traffic data by taking advantage of GAN based modeling is a challenging task, since traffic data have strong time dependency. To address this problem, we propose a novel time-dependent encoding method called the Gramian Angular Summation Field (GASF) that converts the problem of traffic time-series data generation into that of image generation. We have evaluated and tested our proposed model using the benchmark dataset provided by Caltrans Performance Management Systems (PeMS). This study shows that the proposed model can significantly improve the traffic data imputation accuracy in terms of Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) compared to state-of-the-art models on the benchmark dataset. Further, the model achieves reasonably high accuracy in imputation tasks even under a very high missing data rate (\u3e50%), which shows the robustness and efficiency of the proposed model.
Besides the loop and radar sensors, traffic cameras have shown great ability to provide insightful traffic information using the image and video processing techniques. Therefore, the third and final part of this work aimed to introduce an end to end real-time cloud-enabled traffic video analysis (IVA) framework to support the development of the future smart city. As Artificial intelligence (AI) growing rapidly, Computer vision (CV) techniques are expected to significantly improve the development of intelligent transportation systems (ITS), which are anticipated to be a key component of future Smart City (SC) frameworks. Powered by computer vision techniques, the converting of existing traffic cameras into connected ``smart sensors called intelligent video analysis (IVA) systems has shown the great capability of producing insightful data to support ITS applications. However, developing such IVA systems for large-scale, real-time application deserves further study, as the current research efforts are focused more on model effectiveness instead of model efficiency. Therefore, we have introduced a real-time, large-scale, cloud-enabled traffic video analysis framework using NVIDIA DeepStream, which is a streaming analysis toolkit for AI-based video and image analysis. In this study, we have evaluated the technical and economic feasibility of our proposed framework to help traffic agency to build IVA systems more efficiently. Our study shows that the daily operating cost for our proposed framework on Google Cloud Platform (GCP) is less than $0.14 per camera, and that, compared with manual inspections, our framework achieves an average vehicle-counting accuracy of 83.7% on sunny days
- …