
    Information theoretic novelty detection

    We present a novel approach to online change detection problems when the training sample size is small. The proposed approach is based on estimating the expected information content of a new data point, and it allows accurate control of the false positive rate even for small data sets. In the case of the Gaussian distribution, our approach is analytically tractable and closely related to classical statistical tests. We then propose an approximation scheme that extends our approach to mixtures of Gaussians. We evaluate our approach extensively on synthetic data and on three real benchmark data sets. The experimental validation shows that our method maintains good overall accuracy while significantly improving control over the false positive rate.
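    The Gaussian case mentioned in the abstract can be sketched in a few lines: score a new point by its information content (negative log-density) under a Gaussian fitted to a small training sample, so that more surprising points receive higher scores. This is an illustrative reading of the abstract, not the paper's actual estimator; the function name and the one-dimensional setting are our assumptions.

```python
import numpy as np

def gaussian_novelty_score(train, x):
    """Information content (negative log-density, in nats) of x under a
    Gaussian fitted to the training sample; higher means more novel."""
    mu = np.mean(train)
    sigma = np.std(train, ddof=1)  # unbiased estimate, matters for small samples
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu)**2 / (2 * sigma**2)

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=20)   # deliberately small training sample
# A point near the training mean is far less surprising than a distant one.
print(gaussian_novelty_score(train, 0.0) < gaussian_novelty_score(train, 5.0))
```

    Thresholding this score at a quantile of its distribution under the fitted model is one simple way to target a chosen false positive rate, in the spirit of the classical tests the abstract alludes to.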

    Kernel Ellipsoidal Trimming

    Ellipsoid estimation is an issue of primary importance in many practical areas such as control, system identification, visual/audio tracking, experimental design, data mining, robust statistics, and novelty/outlier detection. This paper presents a new method, kernel information matrix ellipsoid estimation (KIMEE), that finds an ellipsoid in a kernel-defined feature space based on a centered information matrix. Although the method is very general and can be applied to many of the aforementioned problems, the main focus of this paper is the problem of novelty or outlier detection associated with fault detection. A simple iterative algorithm based on Titterington's minimum volume ellipsoid method is proposed for practical implementation. The KIMEE method demonstrates very good performance on a set of real-life and simulated datasets compared with support vector machine methods.
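    As an illustration of the kind of iterative scheme the abstract refers to, here is a simplified multiplicative-update version of Titterington's minimum volume ellipsoid algorithm in plain feature space. This is only a sketch of the classical building block, not KIMEE itself, which works with a centered information matrix in a kernel-defined feature space; the function name and data are ours.

```python
import numpy as np

def mve_weights(X, n_iter=50):
    """Titterington-style multiplicative updates for the minimum volume
    ellipsoid: weight concentrates on points near the ellipsoid boundary,
    so gross outliers attract large weight."""
    n, d = X.shape
    w = np.full(n, 1.0 / n)
    for _ in range(n_iter):
        c = w @ X                                # weighted centre
        Xc = X - c
        S = (Xc * w[:, None]).T @ Xc             # weighted scatter matrix
        # squared Mahalanobis distance of every point under (c, S)
        m = np.einsum('ij,jk,ik->i', Xc, np.linalg.inv(S), Xc)
        w = w * m / d                            # multiplicative update
        w /= w.sum()                             # guard against round-off drift
    return w

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), [[8.0, 8.0]]])  # one gross outlier
w = mve_weights(X)
print(np.argmax(w))  # the appended outlier carries the largest weight
```

    In an outlier detection setting, points whose final weight (or Mahalanobis distance under the fitted ellipsoid) is large are the candidates to flag.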

    Producing PID controllers for testing clustering - Investigating novelty detection for use in classifying PID parameters

    The performance of a PID controller depends on how it is tuned. Tuning a controller is not easy either: many rely on experience and intuition, or on automatic tuning software. We present a way to test the quality of controllers using statistics. The method combines multivariate extreme value statistics with novelty detection. With the analyser presented in this paper, one can compare fresh PID parameters to parameters that have been tuned well, which can help in troubleshooting PID controller tuning. Conventional novelty detection methods use a Gaussian mixture model; the analyser here uses a variational mixture model instead, which makes the fitting process easier for the user. Part of this work was to create PID parameter configurations with which to test the analyser. We needed both well-tuned and poorly tuned parameters for testing the algorithm, and several examples of each case. A genetic algorithm was seen as a tool that would meet these requirements; genetic algorithms have previously been used both for test parameter generation and for PID controller tuning in many applications. The genetic algorithm was written in Matlab, because its fitness function uses a Simulink model of a PID control process. The parameters were simulated and plots of their step responses were drawn. The best configurations according to the genetic algorithm had little error compared to the reference value, and the error seemed to rise with the index of goodness used by the genetic algorithm. We set three criteria on the parameters: maximum overshoot, settling time, and sum of absolute error. Each criterion had a threshold, and each parameter configuration that crossed at least one threshold was classed as abnormal. The performance of the analyser was assessed with these parameters: it was first trained with a set of normal parameters, then tested with a set of normal and a set of abnormal parameters.
    The results showed 2 false alarms out of 104 possible in both cases, giving an accuracy of 98%, which is very high for a novelty detection method.
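    The three tuning criteria can be sketched as a simple classifier over a simulated step response. The threshold values below are illustrative placeholders, not those used in the thesis, and the closed-form exponential responses stand in for the Simulink simulations.

```python
import numpy as np

def classify_step_response(t, y, ref=1.0,
                           max_overshoot=0.2, max_settle=5.0, max_iae=3.0):
    """Label a step response abnormal if it violates any of the three
    criteria: maximum overshoot, settling time, or integral of absolute
    error. All thresholds are illustrative, not the thesis values."""
    err = ref - y
    overshoot = max(0.0, y.max() - ref) / ref
    iae = float(np.sum(np.abs(err)) * (t[1] - t[0]))   # rectangle-rule integral
    inside = np.abs(err) <= 0.02 * ref                 # 2 % settling band
    settle = t[-1]                                     # pessimistic default
    for i in range(len(t)):
        if inside[i:].all():                           # stays in band from here on
            settle = t[i]
            break
    return overshoot > max_overshoot or settle > max_settle or iae > max_iae

t = np.linspace(0.0, 10.0, 500)
good = 1 - np.exp(-2.0 * t)    # well-damped response: settles quickly, tiny IAE
bad = 1 - np.exp(-0.3 * t)     # sluggish response: never settles within 5 s
print(classify_step_response(t, good), classify_step_response(t, bad))
```

    A novelty detector trained only on responses that pass all three criteria would then flag configurations resembling the second case.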

    A survey of outlier detection methodologies

    Outlier detection has been used for centuries to detect and, where appropriate, remove anomalous observations from data. Outliers arise from mechanical faults, changes in system behaviour, fraudulent behaviour, human error, instrument error, or simply natural deviation in populations. Their detection can identify system faults and fraud before they escalate with potentially catastrophic consequences, and it can identify errors and remove their contaminating effect on the data set, thus purifying the data for processing. The original outlier detection methods were arbitrary, but principled and systematic techniques are now used, drawn from the full gamut of computer science and statistics. In this paper, we present a survey of contemporary techniques for outlier detection, identify their respective motivations, and distinguish their advantages and disadvantages in a comparative review.

    Anomaly Detection in Multivariate Non-stationary Time Series for Automatic DBMS Diagnosis

    Anomaly detection in database management systems (DBMSs) is difficult because of the increasing number of statistics (stat) and event metrics in big data systems. In this paper, I propose an automatic DBMS diagnosis system that detects anomalous periods with abnormal DB stat metrics and finds the causal events within those periods. Reconstruction error from a deep autoencoder and a statistical process control approach are applied to detect time periods with anomalies. Related events are found using time series similarity measures between events and abnormal stat metrics. After training the deep autoencoder on DBMS metric data, the efficacy of anomaly detection is investigated on other DBMSs containing anomalies. Experimental results show the effectiveness of the proposed model, especially the batch temporal normalization layer. The proposed model is used for publishing automatic DBMS diagnosis reports to guide DBMS configuration and SQL tuning.
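    A minimal stand-in for the detection step can be sketched as follows: a linear autoencoder (PCA) replaces the paper's deep autoencoder, and a Shewhart-style 3-sigma control limit on the reconstruction error plays the role of the statistical process control step. All names, dimensions, and data here are synthetic illustrations, not the paper's setup.

```python
import numpy as np

def fit_linear_ae(X, k=2):
    """Fit a linear 'autoencoder' (top-k principal directions) and return a
    function giving the per-row reconstruction error of new observations."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:k].T                        # d x k encoder/decoder weights

    def recon_error(Y):
        Z = (Y - mu) @ W                # encode into the k-dim latent space
        R = Z @ W.T + mu                # decode back to metric space
        return np.linalg.norm(Y - R, axis=1)

    return recon_error

rng = np.random.default_rng(2)
# Synthetic 'DB stat metrics': 200 time steps of 5 correlated metrics
latent = rng.normal(size=(200, 2))
normal = latent @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
err_fn = fit_linear_ae(normal)

e = err_fn(normal)
ucl = e.mean() + 3 * e.std()            # Shewhart-style upper control limit
anomaly = normal[:1] + 4.0              # an out-of-pattern observation
print(err_fn(anomaly)[0] > ucl)         # flagged as an anomalous time step
```

    Consecutive time steps whose error exceeds the control limit would form the anomaly periods in which causal events are then searched for.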