9 research outputs found

    Supervised anomaly detection in uncertain pseudoperiodic data streams

    Uncertain data streams are widely generated in many Web applications. The uncertainty in these streams makes anomaly detection from sensor data far more challenging. In this paper, we present a novel framework that supports anomaly detection in uncertain data streams. The framework adopts an efficient uncertainty pre-processing procedure to identify and eliminate uncertainties in the data streams. Based on the corrected data streams, we develop effective period pattern recognition and feature extraction techniques to improve computational efficiency. We then use classification methods for anomaly detection in the corrected data stream. We also show empirically that the proposed approach achieves high anomaly detection accuracy on a number of real datasets.

    LSTM Learning with Bayesian and Gaussian Processing for Anomaly Detection in Industrial IoT

    The data generated by millions of sensors in the Industrial Internet of Things (IIoT) is extremely dynamic, heterogeneous, and large-scale. This poses great challenges for real-time analysis and decision making for anomaly detection in IIoT. In this paper, we propose an LSTM-Gauss-NBayes method, which combines the long short-term memory neural network (LSTM-NN) and the Gaussian Naive Bayes model for outlier detection in IIoT. In a nutshell, the LSTM-NN builds a model on normal time series, and outliers are detected by passing its predictive error to the Gaussian Naive Bayes model. Our method exploits the advantages of both models: the strong capability of the LSTM to predict future data points, and the excellent classification performance of the Gaussian Naive Bayes model on the predictive error. Empirical studies demonstrate that our solution outperforms the best-known competitors, making it a preferable choice for detecting anomalies.
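
The second stage described in this abstract, classifying prediction errors with a Gaussian Naive Bayes model, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `predicted` list is a hard-coded stand-in for LSTM output, the data and labels are invented, and the function names are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_nb(errors, labels):
    """Fit a one-feature Gaussian Naive Bayes model: prior, mean, variance per class."""
    model = {}
    for cls in set(labels):
        values = [e for e, y in zip(errors, labels) if y == cls]
        model[cls] = {
            "prior": len(values) / len(errors),
            "mean": mean(values),
            "var": pvariance(values) + 1e-9,  # small floor avoids division by zero
        }
    return model


def log_posterior(model, cls, error):
    """Unnormalised log posterior of class `cls` given a prediction error."""
    p = model[cls]
    log_likelihood = (
        -0.5 * math.log(2 * math.pi * p["var"])
        - (error - p["mean"]) ** 2 / (2 * p["var"])
    )
    return math.log(p["prior"]) + log_likelihood


def classify(model, error):
    """Return the class with the highest posterior for this prediction error."""
    return max(model, key=lambda cls: log_posterior(model, cls, error))


# Assumption: `predicted` stands in for one-step-ahead LSTM forecasts.
observed = [10.0, 10.2, 9.9, 10.1, 15.0, 10.0, 4.0, 10.3]
predicted = [10.1, 10.1, 10.0, 10.0, 10.1, 10.2, 10.0, 10.1]
labels = [0, 0, 0, 0, 1, 0, 1, 0]  # 0 = normal, 1 = anomaly (illustrative)

errors = [abs(o - p) for o, p in zip(observed, predicted)]
model = fit_gaussian_nb(errors, labels)
```

With this toy data, `classify(model, 0.15)` falls in the normal class and `classify(model, 5.0)` in the anomaly class. A real pipeline would replace `predicted` with forecasts from a trained sequence model.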

    A probabilistic method for emerging topic tracking in Microblog stream

    PU-shapelets: Towards pattern-based positive unlabeled classification of time series

    Real-world time series classification applications often involve positive unlabeled (PU) training data, where there is only a small set PL of positive labeled examples and a large set U of unlabeled ones. Most existing time series PU classification methods utilize all readings in the time series, making them sensitive to non-characteristic readings. Characteristic patterns named shapelets present a promising solution to this problem, yet discovering shapelets under PU settings is not easy. In this paper, we take on the challenging task of shapelet discovery with PU data. We propose a novel pattern ensemble technique utilizing both characteristic and non-characteristic patterns to rank U examples by their likelihood of being positive. We also present a novel stopping criterion to estimate the number of positive examples in U. These enable us to effectively label all U training examples and conduct supervised shapelet discovery. The shapelets are then used to build a one-nearest-neighbor classifier for online classification. Extensive experiments demonstrate the effectiveness of our method. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
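
The final step mentioned in this abstract, a one-nearest-neighbor classifier built on shapelets, might look like the sketch below. It assumes the shapelets and the training labels are already available (the PU ranking and stopping criterion are not shown), uses unnormalised Euclidean distance, and all names and data are illustrative rather than taken from the paper.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between the shapelet and any
    same-length sliding window of the series."""
    m = len(shapelet)
    return min(
        math.sqrt(sum((series[i + j] - shapelet[j]) ** 2 for j in range(m)))
        for i in range(len(series) - m + 1)
    )


def transform(series, shapelets):
    """Represent a series by its distance to each shapelet."""
    return [shapelet_distance(series, s) for s in shapelets]


def predict_1nn(train_features, train_labels, query_features):
    """Return the label of the training example closest in shapelet-distance space."""
    nearest = min(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query_features),
    )
    return train_labels[nearest]


# Assumption: one hand-picked "spike" shapelet and already-labeled training series.
shapelets = [[0.0, 5.0, 0.0]]
train_series = [
    [0.0, 0.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [1.0, 1.0, 1.0, 1.0, 1.0],
    [0.0, 5.0, 0.0, 0.0, 0.0, 0.0],
]
train_labels = ["positive", "negative", "negative", "positive"]
train_features = [transform(s, shapelets) for s in train_series]

spiky = predict_1nn(train_features, train_labels,
                    transform([0.0, 0.0, 0.0, 4.5, 0.0, 0.0], shapelets))
flat = predict_1nn(train_features, train_labels,
                   transform([0.5, 0.5, 0.5, 0.5, 0.5], shapelets))
```

Because each series is reduced to its distances to a few shapelets, a new series can be classified online with one pass of sliding-window comparisons per shapelet.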

    Cooperative Data Caching for Cloud Data Servers

    Thanks to advances in cloud computing technologies, users can access data stored at cloud data centers at any time and from anywhere. However, data centers are usually sparsely distributed over the Internet and are far away from end users. In this paper, we consider constructing a cache network from a large number of cache nodes close to the end users in order to minimize the data access delay. We first formulate the problem of placing replicas of data items on cache nodes as a mixed integer programming (MIP) problem. Then, we propose an efficient heuristic algorithm that allocates at least one replica of each data item in the cache network and attempts to allocate more data items so as to minimize the total data access cost. The simulation results show that our proposed algorithm performs much better than the well-known LRU algorithm and that its computational complexity is limited.

    Distributed Contextual Anomaly Detection from Big Event Streams

    The age of big digital data has emerged, and the volume of generated data is increasing rapidly, millisecond by millisecond, through Internet of Things (IoT) and Internet of Everything (IoE) objects. Specifically, most of today’s available data is generated in the form of streams by applications including sensor networks, bioinformatics, smart airports, smart highway traffic, smart home applications, e-commerce online shopping, and social media. In this context, processing and mining such high-volume data streams has become a research priority and a challenging task. On the one hand, processing high volumes of streaming data with low-latency response is a critical concern in most real-time applications, since important information must be captured before it is missed or disregarded. On the other hand, detecting events from data streams is becoming a new research challenge, since existing traditional anomaly detection methods mainly focus on: a) data of limited size, b) centralised detection with limited computing resources, and c) specific anomaly types, either point or collective, rather than the contextual behaviour of the data. Thus, detecting contextual events from high-volume sequences of data streams is one of the research concerns addressed in this thesis. As IoT data streams scale up to high volumes, it becomes impractical to apply existing data processing structures and anomaly detection methods, owing to the space, time and complexity of existing data processing models and learning algorithms. In this thesis, a novel distributed anomaly detection method and algorithm is proposed to detect contextual behaviours from sequences of bounded streams. The proposed solution is based, firstly, on capturing event streams and partitioning them over several windows to control the high rate of event streams and, secondly, on a parallel and distributed algorithm to detect contextual anomalous events. The experimental results are evaluated in terms of the algorithm’s performance, low-latency processing response, and accuracy in detecting contextual anomalous behaviour from the event streams. Finally, to address the scalability of contextual event detection, appropriate computational metrics are proposed to measure and evaluate the processing latency of the distributed method. The results provide evidence that distributed detection is effective at learning from high volumes of streams in real time.
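
The two ideas named in this abstract, partitioning an event stream into windows and detecting contextual anomalies in parallel, can be illustrated with the rough sketch below. The abstract does not specify the thesis's algorithm, so everything here is an assumption: tumbling (non-overlapping) windows, a median-based rule per context, a thread pool standing in for a distributed cluster, and invented event data.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from statistics import median


def tumbling_windows(events, width):
    """Group (timestamp, context, value) events into fixed-width,
    non-overlapping windows ordered by time."""
    windows = defaultdict(list)
    for event in events:
        windows[event[0] // width].append(event)
    return [windows[key] for key in sorted(windows)]


def detect_in_window(window, threshold=5.0):
    """Flag events whose value is far from the median of their own context
    within this window (a simple contextual rule, assumed for illustration)."""
    by_context = defaultdict(list)
    for _, context, value in window:
        by_context[context].append(value)
    medians = {c: median(v) for c, v in by_context.items()}
    return [e for e in window if abs(e[2] - medians[e[1]]) > threshold]


def detect(events, width, threshold=5.0):
    """Process each window independently; threads stand in for distributed workers."""
    windows = tumbling_windows(events, width)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda w: detect_in_window(w, threshold), windows))
    return [event for flagged in results for event in flagged]


# A reading of 30 is normal in the "day" context but anomalous at "night".
events = [
    (0, "day", 30), (1, "day", 31), (2, "day", 29),
    (3, "night", 10), (4, "night", 11), (5, "night", 30), (6, "night", 9),
    (10, "day", 30), (11, "day", 32), (12, "day", 31),
    (13, "night", 10), (14, "night", 12), (15, "night", 11),
]
anomalies = detect(events, width=10)
```

Only the event `(5, "night", 30)` is flagged here: the same value is unremarkable in the "day" context, which is what distinguishes a contextual anomaly from a point anomaly.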

    Concepts and Methods from Artificial Intelligence in Modern Information Systems – Contributions to Data-driven Decision-making and Business Processes

    Today, organizations are facing a variety of challenging, technology-driven developments, three of the most notable being the surge in uncertain data, the emergence of unstructured data, and a complex, dynamically changing environment. These developments require organizations to transform in order to stay competitive. Artificial Intelligence, with its fields of decision-making under uncertainty, natural language processing and planning, offers valuable concepts and methods to address these developments. The dissertation at hand utilizes and furthers these contributions in three focal points to address research gaps in the existing literature and to provide concrete concepts and methods that support organizations in the transformation and improvement of data-driven decision-making, business processes and business process management. In particular, the focal points are the assessment of data quality, the analysis of textual data and the automated planning of process models. With regard to data quality assessment, probability-based approaches for measuring consistency and identifying duplicates, as well as requirements for data quality metrics, are suggested. With respect to the analysis of textual data, the dissertation proposes a topic modeling procedure to gain knowledge from CVs as well as a model based on sentiment analysis to explain ratings from customer reviews. Regarding the automated planning of process models, concepts and algorithms are provided for the automated construction of parallelizations in process models, the automated adaptation of process models and the automated construction of multi-actor process models.