2,353 research outputs found

    Mining recurrent concepts in data streams using the discrete Fourier transform

    Get PDF
    In this research we address the problem of capturing recurring concepts in a data stream environment. Recurrence capture enables the re-use of previously learned classifiers without the need for re-learning while providing for better accuracy during the concept recurrence interval. We capture concepts by applying the Discrete Fourier Transform (DFT) to Decision Tree classifiers to obtain highly compressed versions of the trees at concept drift points in the stream and store such trees in a repository for future use. Our empirical results on real world and synthetic data exhibiting varying degrees of recurrence show that the Fourier compressed trees are more robust to noise and are able to capture recurring concepts with higher precision than a meta learning approach that chooses to re-use classifiers in their originally occurring form

    Use of Ensembles of Fourier Spectra in Capturing Recurrent Concepts in Data Streams

    Get PDF
    In this research, we apply ensembles of Fourier encoded spectra to capture and mine recurring concepts in a data stream environment. Previous research showed that compact versions of Decision Trees can be obtained by applying the Discrete Fourier Transform to accurately capture recurrent concepts in a data stream. However, in highly volatile environments where new concepts emerge often, the approach of encoding each concept in a separate spectrum is no longer viable due to memory overload and thus in this research we present an ensemble approach that addresses this problem. Our empirical results on real world data and synthetic data exhibiting varying degrees of recurrence reveal that the ensemble approach outperforms the single spectrum approach in terms of classification accuracy, memory and execution time

    Mining sequences in distributed sensors data for energy production.

    Get PDF
    Brief Overview of the Problem: The Environmental Protection Agency (EPA), a government funded agency, provides both legislative and judicial powers for emissions monitoring in the United States. The agency crafts laws based on self-made regulations to enforce companies to operate within the limits of the law resulting in environmentally safe operation. Specifically, power companies operate electric generating facilities under guidelines drawn-up and enforced by the EPA. Acid rain and other harmful factors require that electric generating facilities report hourly emissions recorded via a Supervisory Control and Data Acquisition (SCADA) system. SCADA is a control and reporting system that is present in all power plants consisting of sensors and control mechanisms that monitor all equipment within the plants. The data recorded by a SCADA system is collected by the EPA and allows them to enforce proper plant operation relating to emissions. This data includes a lot of generating unit and power plant specific details, including hourly generation. This hourly generation (termed grossunitload by the EPA) is the actual hourly average output of the generator on a per unit basis. The questions to be answered are do any of these units operate in tandem and do any of the units start, stop, or change operation as a result of another\u27s change in generation? These types of questions will be answered for the years of April 2002 through April 2003 for facilities that operate pipeline natural-gas-fired generating units. Purpose of Research The research conducted has dual uses if fruitful. First, the use of a local modeling between generating units would be highly profitable among energy traders. Betting that a plant will operate a unit based on another\u27s current characteristics would be sensationally profitable to energy traders. This profitability is variable due to fuel type. For instance, if the price of coal is extremely high due to shortages, the value of knowing a semioperating characteristic of two generating units is highly valuable. Second, this known characteristic can also be used in regulation and operational modeling. The second use is of great importance to government agencies. If regulatory committees can be aware of past (or current) similarities between power producers, they may be able to avoid a power struggle in a region caused by greedy traders or companies. Not considering profitable motives, the Department of Energy may use something similar to generate a model of power grid generation availability based on previous data for reliability purposes. Type of Problem: The problem tackled within this Master\u27s thesis is of multiple time series pattern recognition. This field is expansive and well studied, therefore the research performed will benefit from previously known techniques. The author has chosen to experiment with conventional techniques such as correlation, principal component analysis, and kmeans clustering for feature and eventually pattern extraction. For the primary analysis performed, the author chose to use a conventional sequence discovery algorithm. The sequence discovery algorithm has no prior knowledge of space limitations, therefore it searches over the entire space resulting in an expense but complete process. Prior to sequence discovery the author applies a uniform coding schema to the raw data, which is an adaption of a coding schema presented by Keogh. This coding and discovery process is deemed USD, or Uniform Sequence Discovery. The data is highly dimensional along with being extremely dynamic and sporadic with regards to magnitude. The energy market that demands power generation is profit and somewhat reliability driven. The obvious factors are more reliability based, for instance to keep system frequency at 60Hz, units may operate in an idle state resulting in a constant or very low value for a period of time (idle time). Also to avoid large frequency swings on the power grid, companies are require

    Multi-signal Anomaly Detection for Real-Time Embedded Systems

    Get PDF
    This thesis presents MuSADET, an anomaly detection framework targeting timing anomalies found in event traces from real-time embedded systems. The method leverages stationary event generators, signal processing, and distance metrics to classify inter-arrival time sequences as normal/anomalous. Experimental evaluation of traces collected from two real-time embedded systems provides empirical evidence of MuSADET’s anomaly detection performance. MuSADET is appropriate for embedded systems, where many event generators are intrinsically recurrent and generate stationary sequences of timestamp. To find timinganomalies, MuSADET compares the frequency domain features of an unknown trace to a normal model trained from well-behaved executions of the system. Each signal in the analysis trace receives a normal/anomalous score, which can help engineers isolate the source of the anomaly. Empirical evidence of anomaly detection performed on traces collected from an industrygrade hexacopter and the Controller Area Network (CAN) bus deployed in a real vehicle demonstrates the feasibility of the proposed method. In all case studies, anomaly detection did not require an anomaly model while achieving high detection rates. For some of the studied scenarios, the true positive detection rate goes above 99 %, with false-positive rates below one %. The visualization of classification scores shows that some timing anomalies can propagate to multiple signals within the system. Comparison to the similar method, Signal Processing for Trace Analysis (SiPTA), indicates that MuSADET is superior in detection performance and provides complementary information that can help link anomalies to the process where they occurred

    Periodic Pattern Mining a Algorithms and Applications

    Get PDF
    Owing to a large number of applications periodic pattern mining has been extensively studied for over a decade Periodic pattern is a pattern that repeats itself with a specific period in a give sequence Periodic patterns can be mined from datasets like biological sequences continuous and discrete time series data spatiotemporal data and social networks Periodic patterns are classified based on different criteria Periodic patterns are categorized as frequent periodic patterns and statistically significant patterns based on the frequency of occurrence Frequent periodic patterns are in turn classified as perfect and imperfect periodic patterns full and partial periodic patterns synchronous and asynchronous periodic patterns dense periodic patterns approximate periodic patterns This paper presents a survey of the state of art research on periodic pattern mining algorithms and their application areas A discussion of merits and demerits of these algorithms was given The paper also presents a brief overview of algorithms that can be applied for specific types of datasets like spatiotemporal data and social network

    TFAD: A Decomposition Time Series Anomaly Detection Architecture with Time-Frequency Analysis

    Full text link
    Time series anomaly detection is a challenging problem due to the complex temporal dependencies and the limited label data. Although some algorithms including both traditional and deep models have been proposed, most of them mainly focus on time-domain modeling, and do not fully utilize the information in the frequency domain of the time series data. In this paper, we propose a Time-Frequency analysis based time series Anomaly Detection model, or TFAD for short, to exploit both time and frequency domains for performance improvement. Besides, we incorporate time series decomposition and data augmentation mechanisms in the designed time-frequency architecture to further boost the abilities of performance and interpretability. Empirical studies on widely used benchmark datasets show that our approach obtains state-of-the-art performance in univariate and multivariate time series anomaly detection tasks. Code is provided at https://github.com/DAMO-DI-ML/CIKM22-TFAD.Comment: Accepted by the ACM International Conference on Information and Knowledge Management (CIKM 2022
    corecore