3,212 research outputs found

    Basics of Feature Selection and Statistical Learning for High Energy Physics

    This document introduces the basics of data preparation, feature selection and statistical learning for high energy physics tasks. The emphasis is on feature selection by principal component analysis, information gain and significance measures for features. As examples of basic statistical learning algorithms, the maximum a posteriori and maximum likelihood classifiers are presented. Furthermore, a simple rule-based classification is introduced as a means of automated cut finding. Finally, two toolboxes for applying statistical learning techniques are introduced. Comment: 12 pages, 8 figures. Part of the proceedings of the Track 'Computational Intelligence for HEP Data Analysis' at iCSC 200
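
    Information gain links two of the topics above: it ranks features, and the cut that maximises it on a single feature is the simplest form of automated cut finding. The sketch below assumes binary labels (1 = signal, 0 = background) and one one-sided cut per feature. The function names `entropy` and `best_cut` are hypothetical and are not taken from either of the toolboxes the paper mentions; the paper's exact definitions may differ.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(values, labels):
    """Scan thresholds on one feature and return (cut, information gain).

    A cut splits events into values <= cut and values > cut. The gain is the
    label entropy minus the size-weighted entropy of the two sides.
    """
    base = entropy(labels)
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, 0.0)
    for i in range(1, n):
        lo_val, hi_val = pairs[i - 1][0], pairs[i][0]
        if lo_val == hi_val:
            continue  # no threshold can separate identical values
        cut = 0.5 * (lo_val + hi_val)  # midpoint between neighbouring values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        remainder = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        gain = base - remainder
        if gain > best[1]:
            best = (cut, gain)
    return best


# Toy feature: three background events at low values, three signal events at high values.
cut, gain = best_cut([1.0, 1.2, 1.4, 3.0, 3.1, 3.5], [0, 0, 0, 1, 1, 1])
```

    Here the cut falls midway between 1.4 and 3.0 and recovers the full 1 bit of label entropy, because both sides of the cut are pure. Ranking several features by the gain of their best cut is one simple way to use this as a feature-selection criterion.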

    Utilizing visible and near infrared spectroscopy based on multi-class support vector machines classification to characterize olive oil adulteration

    Rapid, non-destructive adulteration detection is of particular importance to the oil industry. This paper presents an application of visible and near-infrared spectroscopy (VNIR) to detect adulteration levels in olive oil. Sunflower oil was used as the adulterant, and samples with adulteration levels ranging from 0 to 40% were prepared for the experiments. The spectra were first restricted to the 500-900 nm range and then smoothed and normalized to reduce light-scattering effects. Principal component analysis (PCA) was performed on the spectra for initial data visualization and feature extraction. The extracted PCA scores were used to calculate the Mahalanobis distances of the adulterated samples from the pure sample. The PCA scores were also fed to a multi-class support vector machine (SVM) model to classify the samples by adulteration level. The results showed that spectral normalization highlighted the regions of the spectrum affected by the adulteration. The PCA score biplots showed differences between samples with different amounts of adulterant. Moreover, the Mahalanobis distance provided a quantitative measure of the differences between the adulterated and pure oil samples. The SVM modelling further supported the classification of the different adulteration levels. Consequently, VNIR spectroscopy combined with SVM could support the development of classification protocols for detecting adulteration in olive oils.
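
    The PCA-then-Mahalanobis step described above can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions. The spectra are synthetic and hypothetical: a Gaussian "oil band" plus a scaled "adulterant signature" and small noise on an 81-point grid from 500 to 900 nm. No smoothing or normalization is applied, and the multi-class SVM stage is omitted. The function names `pca_scores` and `mahalanobis_from_reference` are illustrative only.

```python
import numpy as np


def pca_scores(spectra, n_components):
    """Project mean-centred spectra (rows = samples) onto the leading principal components."""
    mean = spectra.mean(axis=0)
    centred = spectra - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:n_components]
    return centred @ components.T, mean, components


def mahalanobis_from_reference(scores, reference_scores):
    """Mahalanobis distance of each score vector from the reference (pure-oil) cloud."""
    mu = reference_scores.mean(axis=0)
    cov = np.cov(reference_scores, rowvar=False)
    cov_inv = np.linalg.pinv(np.atleast_2d(cov))
    diff = scores - mu
    quad = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)  # diff_i^T S^-1 diff_i per row
    return np.sqrt(np.maximum(quad, 0.0))


# Hypothetical synthetic spectra: 10 samples at each adulteration level.
rng = np.random.default_rng(0)
wl = np.linspace(500, 900, 81)                 # wavelength grid in nm
base = np.exp(-((wl - 670) / 60) ** 2)         # hypothetical pure-oil band
signature = np.exp(-((wl - 780) / 40) ** 2)    # hypothetical adulterant signature
levels = np.repeat([0.0, 0.1, 0.2, 0.4], 10)
spectra = base + levels[:, None] * signature + 0.01 * rng.standard_normal((levels.size, wl.size))

scores, _, _ = pca_scores(spectra, n_components=2)
d = mahalanobis_from_reference(scores, scores[levels == 0.0])
```

    In this toy setting, the mean distance from the pure-oil cloud grows with the adulteration level. That growth is the kind of quantitative separation the abstract attributes to the Mahalanobis distance. Using a pseudo-inverse guards against a near-singular covariance when only a few reference samples are available.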

    Energy Consumption Data Based Machine Anomaly Detection

    Fault detection in operating helicopter drive train components based on support vector data description

    The objective of the paper is to develop a vibration-based automated procedure for the early detection of mechanical degradation in helicopter drive train components using Health and Usage Monitoring Systems (HUMS) data. The paper develops an anomaly-detection method that quantifies how far the mechanical state of a component deviates from its nominal condition. The method is based on an Anomaly Score (AS) formed by combining a set of statistical features correlated with specific damage types, also known as Condition Indicators (CI); operational variability is thus implicitly included in the model through the CI correlation. Fault detection is then recast as a one-class classification problem in the space spanned by a set of CI, with the aim of globally separating normal from anomalous observations, related respectively to healthy and supposedly faulty components. The procedure relies on an efficient one-class classification method that makes no assumption about the data distribution. Its core is Support Vector Data Description (SVDD), which provides an efficient data description without requiring a large amount of statistical data. Several analyses have been carried out to validate the proposed procedure, using flight vibration data collected from an in-service H135 (formerly known as EC135) helicopter, for which micro-pitting damage on a gear was detected by HUMS and assessed through visual inspection. The paper also analyses whether the proposed approach gives a better trade-off between false alarm rates and missed detection rates than individual CI and than the AS obtained by assuming jointly Gaussian-distributed CI.
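
    To make the SVDD idea concrete, here is a minimal sketch of its hard-margin special case. This case has no slack variables and is equivalent to a box constraint C >= 1, so it does not tolerate outliers in the training set. The paper's own formulation, kernel and parameters are not given in the abstract, so everything here is an assumption. With a Gaussian (RBF) kernel, k(x, x) = 1, and the SVDD dual reduces to minimising alpha^T K alpha over the probability simplex; the sketch solves it with Frank-Wolfe iterations. The class `HardMarginSVDD`, the kernel width `gamma`, and the 3-dimensional "condition indicator" data are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """Gaussian (RBF) kernel matrix with entries exp(-gamma * ||a_i - b_j||^2)."""
    sq = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


class HardMarginSVDD:
    """Hypothetical minimal SVDD without slack (minimum enclosing ball in RBF feature space)."""

    def __init__(self, gamma=0.02, n_iter=500):
        self.gamma = gamma
        self.n_iter = n_iter

    def fit(self, x):
        self.x = np.asarray(x, dtype=float)
        k = rbf_kernel(self.x, self.x, self.gamma)
        alpha = np.full(len(self.x), 1.0 / len(self.x))
        for _ in range(self.n_iter):
            ka = k @ alpha                       # half the gradient of alpha^T K alpha
            i = int(np.argmin(ka))               # best simplex vertex for the linearised objective
            d = -alpha
            d[i] += 1.0                          # search direction e_i - alpha
            dkd = d @ k @ d
            if dkd <= 1e-12:
                break
            step = min(1.0, max(0.0, -(d @ ka) / dkd))  # exact line search on the quadratic
            if step == 0.0:
                break                            # no descent direction left: converged
            alpha = alpha + step * d             # stays on the simplex (convex combination)
        self.alpha = alpha
        self.aka = alpha @ k @ alpha
        # Squared radius: the largest training distance, so every training point lies inside.
        self.radius2 = self._dist2(k).max()
        return self

    def _dist2(self, k_xz):
        """Squared feature-space distance of each column point z from the centre."""
        # ||phi(z) - a||^2 = k(z, z) - 2 sum_i alpha_i k(x_i, z) + alpha^T K alpha, with k(z, z) = 1
        return 1.0 - 2.0 * (self.alpha @ k_xz) + self.aka

    def anomaly_score(self, z):
        """Positive values fall outside the description and can be flagged as anomalous."""
        z = np.atleast_2d(np.asarray(z, dtype=float))
        return self._dist2(rbf_kernel(self.x, z, self.gamma)) - self.radius2


# Hypothetical data: 200 healthy observations of 3 condition indicators.
rng = np.random.default_rng(1)
healthy = rng.normal(0.0, 1.0, size=(200, 3))
model = HardMarginSVDD(gamma=0.02).fit(healthy)
scores = model.anomaly_score([[0.0, 0.0, 0.0], [8.0, 8.0, 8.0]])
```

    In this sketch, a point at the centre of the healthy cloud scores below zero, and a point far from it scores above zero. The sign of the score then serves as a normal/anomalous decision. A soft-margin version would add the upper bound C on each alpha_i, so that a fraction of training points may fall outside the description.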

    Anomaly Detection in Sequential Data: A Deep Learning-Based Approach

    Anomaly detection has been researched in various domains, with applications in intrusion detection, fraud detection, system health management, and bioinformatics. Conventional anomaly detection methods analyze each data instance independently (univariate or multivariate) and ignore the sequential characteristics of the data. Some anomalies can only be detected by grouping individual data instances into sequences, so the conventional approach of analyzing independent instances cannot detect them. Currently: (1) deep learning-based algorithms are widely used for anomaly detection, but training incurs significant computational overhead because the batch size and learning rate are held constant for every epoch; (2) the threshold that decides whether an event is normal or malicious is often static, which can drastically increase the false alarm rate if it is set too low, or reduce the true alarm rate if it is set too high; (3) real-life data is messy, and it is impossible to learn all of its features by training just one algorithm, so several one-class algorithms need to be trained and their outputs combined into an ensemble, whose prediction accuracy can be increased by giving a proper weight to each algorithm's output. By extending state-of-the-art learning-based techniques, this dissertation provides the following solutions: (i) to address (1), a hybrid, dynamic batch size and learning rate tuning algorithm that reduces the overall training time of the neural network; (ii) to address (2), an adaptive thresholding algorithm that reduces high false alarm rates; and (iii) to address (3), a multilevel hybrid ensemble anomaly detection framework that increases the anomaly detection rate on high-dimensional datasets.
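
    The abstract does not describe the dissertation's adaptive thresholding algorithm. As a stand-in, here is a common generic scheme, not the author's method, that illustrates how an adaptive threshold differs from a static one. Each anomaly score is compared against the mean plus k standard deviations of a sliding window of recent non-anomalous scores. The function `adaptive_flags` and its parameters (`window`, `k`, `warmup`) are hypothetical choices.

```python
import statistics
from collections import deque


def adaptive_flags(scores, window=20, k=3.0, warmup=5):
    """Flag scores that exceed mean + k * std of the recent non-flagged scores.

    Only non-flagged scores enter the window, so a burst of anomalies does not
    immediately inflate the threshold it is tested against.
    """
    history = deque(maxlen=window)
    flags = []
    for s in scores:
        if len(history) >= warmup:
            threshold = statistics.fmean(history) + k * statistics.pstdev(history)
            is_anomaly = s > threshold
        else:
            is_anomaly = False  # not enough history yet: treat as normal
        flags.append(is_anomaly)
        if not is_anomaly:
            history.append(s)
    return flags


# Hypothetical anomaly scores from a detector, with one spike at index 6.
flags = adaptive_flags([1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 5.0, 1.0, 1.02])
```

    Because the threshold tracks the recent score distribution, it rises and falls with the normal operating level instead of staying fixed. This is the property that addresses the false-alarm trade-off described in point (2). The `warmup` guard simply avoids thresholding on too few samples.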