135 research outputs found

    A novel symbolization technique for time-series outlier detection

    The detection of outliers in time series data is a core component of many data-mining applications and is broadly applied in industry. Large data sets require algorithms that are efficient in both time and space. Symbolization as a pre-processing step can reduce both run time and storage costs, and it also opens up the use of an array of discrete algorithms. With this common pre-processing step in mind, this work highlights that (1) existing symbolization approaches are designed to address problems other than outlier detection and are hence sub-optimal for it, and (2) use of off-the-shelf symbolization techniques can therefore lead to significant unnecessary data corruption and potential performance loss when outlier detection is a key aspect of the data mining task at hand. To address this, a novel symbolization method is motivated that specifically targets the end-use application of outlier detection. The method is empirically shown to outperform existing approaches.
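For orientation, symbolization as a pre-processing step can be sketched as follows. This is a minimal SAX-style illustration of an off-the-shelf approach (z-normalization, piecewise averaging, Gaussian breakpoints), not the novel method of the paper, which the abstract does not specify. The function name `sax_symbolize` and its parameters are assumptions made for this example.

```python
from bisect import bisect_right
from statistics import NormalDist, mean, pstdev


def sax_symbolize(series, n_segments, alphabet_size):
    """Turn a numeric series into a short string of symbols (SAX-style sketch).

    Assumes len(series) is divisible by n_segments and alphabet_size <= 26.
    """
    mu, sigma = mean(series), pstdev(series)
    # z-normalize so that breakpoints from the standard normal apply
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]
    # piecewise aggregate approximation: mean of each equal-length segment
    seg_len = len(z) // n_segments
    paa = [mean(z[i * seg_len:(i + 1) * seg_len]) for i in range(n_segments)]
    # breakpoints that split the standard normal into equiprobable bins
    breakpoints = [NormalDist().inv_cdf(k / alphabet_size)
                   for k in range(1, alphabet_size)]
    # map each segment mean to the letter of the bin it falls into
    return "".join(chr(ord("a") + bisect_right(breakpoints, v)) for v in paa)


ramp = sax_symbolize(list(range(1, 9)), n_segments=4, alphabet_size=4)
spike = sax_symbolize([0] * 7 + [10], n_segments=8, alphabet_size=4)
```

Because every value above the highest breakpoint maps to the same top symbol, and segment averaging dilutes isolated spikes, a generic scheme like this can discard exactly the information an outlier detector needs, which is consistent with the concern the abstract raises.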

    A Deep Learning Framework for Generation and Analysis of Driving Scenario Trajectories

    We propose a unified deep learning framework for the generation and analysis of driving scenario trajectories, and validate its effectiveness in a principled way. To model and generate scenarios of trajectories with different lengths, we develop two approaches. First, we adapt the Recurrent Conditional Generative Adversarial Networks (RC-GAN) by conditioning on the length of the trajectories. This gives us the flexibility to generate variable-length driving trajectories, a desirable feature for scenario test case generation in the verification of autonomous driving. Second, we develop an architecture based on a Recurrent Autoencoder with GANs to obviate the variable-length issue, wherein we train a GAN to learn and generate the latent representations of the original trajectories. In this approach, we train an integrated feed-forward neural network to estimate the length of the trajectories so that they can be brought back from the latent space representation. In addition to trajectory generation, we employ the trained autoencoder as a feature extractor for clustering and anomaly detection, to obtain further insights into the collected scenario dataset. We experimentally investigate the performance of the proposed framework on real-world scenario trajectories obtained from in-field data collection.

    DEVELOPMENT OF DIAGNOSTIC AND PROGNOSTIC METHODOLOGIES FOR ELECTRONIC SYSTEMS BASED ON MAHALANOBIS DISTANCE

    Diagnostic and prognostic capabilities are one aspect of the many interrelated and complementary functions in the field of Prognostic and Health Management (PHM). These capabilities are sought after by industries in order to provide maximum operational availability of their products, maximum usage life, minimum periodic maintenance inspections, lower inventory cost, accurate tracking of part life, and no false alarms. Several challenges associated with the development and implementation of these capabilities are the consideration of a system's dynamic behavior under various operating environments; complex system architecture where the components that form the overall system have complex interactions with each other with feed-forward and feedback loops of instructions; the unavailability of failure precursors; unseen events; and the absence of unique mathematical techniques that can address fault and failure events in various multivariate systems. The Mahalanobis distance methodology distinguishes multivariable data groups in a multivariate system by a univariate distance measure calculated from the normalized value of performance parameters and their correlation coefficients. The Mahalanobis distance measure does not suffer from the scaling effect--a situation where the variability of one parameter masks the variability of another parameter, which happens when the measurement ranges or scales of two parameters are different. A literature review showed that the Mahalanobis distance has been used for classification purposes. In this thesis, the Mahalanobis distance measure is utilized for fault detection, fault isolation, degradation identification, and prognostics. For fault detection, a probabilistic approach is developed to establish threshold Mahalanobis distance, such that presence of a fault in a product can be identified and the product can be classified as healthy or unhealthy. 
A technique is presented to construct a control chart for Mahalanobis distance for detecting trends and bias in system health or performance. An error function is defined to establish a fault-specific threshold Mahalanobis distance. A fault isolation approach is developed to isolate faults by identifying the parameters that are associated with each fault. This approach utilizes the design-of-experiment concept for calculating the residual Mahalanobis distance for each parameter (i.e., the contribution of each parameter to a system's health determination). An expected contribution range for each parameter, estimated from the distribution of the residual Mahalanobis distance, is used to isolate the parameters that are responsible for a system's anomalous behavior. A methodology to detect degradation in a system's health using a health indicator is developed. The health indicator is defined as the weighted sum of a histogram bin's fractional contribution. The histogram's optimal bin width is determined from the number of data points in a moving window. This moving window approach is utilized for progressive estimation of the health indicator over time. The health indicator is compared with a threshold value defined from the system's healthy data to indicate the system's health or performance degradation. A symbolic time series-based health assessment approach is developed. Prognostic measures are defined for detecting anomalies in a product and predicting a product's time and probability of approaching a faulty condition. These measures are computed from a hidden Markov model developed from the symbolic representation of product dynamics. The symbolic representation of a product's dynamics is obtained by representing a Mahalanobis distance time series in symbolic form. Case studies were performed to demonstrate the capability of the proposed methodology for real-time health monitoring. 
Notebook computers were exposed to a set of environmental conditions representative of the extremes of their life cycle profiles. The performance parameters were monitored in situ during the experiments, and the resulting data were used as a training dataset. The dataset was also used to identify specific parameter behavior, estimate correlation among parameters, and extract features for defining a healthy baseline. Field-returned computer data and data corresponding to artificially injected faults in computers were used as test data.
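The Mahalanobis distance described in this abstract, a univariate measure built from normalized parameter values and their correlation coefficients, can be illustrated for the two-parameter case. The sketch below is a textbook bivariate form, not the thesis's implementation; the function names `fit_baseline` and `mahalanobis`, the toy "healthy" data, and the lack of any scaling by the number of parameters are all assumptions made for this example. The thesis's probabilistic threshold is not reproduced.

```python
from math import sqrt
from statistics import mean, pstdev


def fit_baseline(xs, ys):
    """Estimate means, standard deviations and correlation from healthy data."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    r = mean((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)
    return mx, my, sx, sy, r


def mahalanobis(point, baseline):
    """Bivariate Mahalanobis distance of `point` from the healthy baseline."""
    mx, my, sx, sy, r = baseline
    # normalize each parameter, which removes the effect of differing scales
    z1 = (point[0] - mx) / sx
    z2 = (point[1] - my) / sy
    # account for the correlation between the two parameters
    return sqrt((z1 * z1 - 2 * r * z1 * z2 + z2 * z2) / (1 - r * r))


# hypothetical healthy observations of two positively correlated parameters
baseline = fit_baseline([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
with_trend = mahalanobis((5, 5), baseline)     # follows the correlation
against_trend = mahalanobis((5, 1), baseline)  # violates the correlation
```

Both test points lie equally far from the mean in normalized units, yet the one that breaks the learned correlation receives a much larger distance. This is the property that makes a threshold on the distance usable for classifying a product as healthy or unhealthy.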

    Features Extraction from Time Series

    Time series can be found in various domains such as medicine, engineering, and finance. Generally speaking, a time series is a sequence of data that represents recorded values of a phenomenon over time. This thesis studies time series mining, including transformation and distance measures, anomaly detection, clustering, and remaining useful life estimation. For the first mining task (transformation and distance measures), we introduce a novel calculation of the distance between symbols in order to increase the accuracy of the distance measure between transformed (symbolic) series. When this newly defined method is integrated into symbolic aggregate approximation and its extensions, the experimental results show that the proposed method is promising. For the second mining task (anomaly detection), we propose a distance measure method and an anomaly detection calculation to improve detection accuracy. These proposed methods, together with previously published anomaly detection methods, are applied to real ECG data selected from the MIT-BIH database. The experimental results show that our proposed method outperforms the other methods. For the third mining task (clustering), we present an automatic clustering method, called AT-means, which can automatically carry out clustering for a given time series dataset: from the calculation of the global average time series to the setting of initial centres and the determination of the number of clusters. The performance of the proposed method was tested on 10 benchmark time series datasets obtained from the UCR database. For comparison, the K-means method with three different conditions is also applied to the same datasets. The experimental results show that the proposed method outperforms the compared K-means approaches. 
For the fourth mining task (remaining useful life estimation), all the original data are transformed into a low-dimensional space through principal components analysis. We then propose a novel multidimensional time series distance measure, called multivariate time series warping distance (MTWD), for remaining useful life estimation. This whole process is tested on the CMAPSS (Commercial Modular Aero Propulsion System Simulation) datasets and the performance is compared with two existing methods. The experimental results show that the estimated remaining useful life (RUL) values are closer to the real RUL values than those of the comparison methods. Our work contributes to time series mining by introducing novel approaches to distance measures, anomaly detection, clustering, and RUL estimation. We furthermore apply our proposed methods and related methods to benchmark datasets. The experimental results show that our methods are better than previously published methods in terms of accuracy and efficiency.
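As background for warping-based distances between multidimensional time series, the following is a minimal sketch of classic dynamic time warping (DTW) with Euclidean point costs. It is not the thesis's MTWD, whose definition the abstract does not give; the function name `dtw_distance` and the toy sequences are assumptions made for this example.

```python
from math import dist, inf


def dtw_distance(a, b):
    """Classic DTW between two sequences of equal-dimension points (tuples)."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of the first i points of a with the first j of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])  # Euclidean distance between points
            # extend the cheapest of: step in a only, step in b only, step in both
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# the same two-dimensional trajectory, sampled at different speeds
slow = [(0, 0), (0, 0), (1, 1), (2, 2)]
fast = [(0, 0), (1, 1), (2, 2)]
same_shape = dtw_distance(slow, fast)
offset = dtw_distance([(0, 0)], [(3, 4)])
```

Warping lets sequences of different lengths be compared by shape rather than by index, which is useful when runs degrade at different rates, as can happen in run-to-failure data.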