Fault detection in operating helicopter drive train components based on support vector data description
The objective of this paper is to develop a vibration-based automated procedure for the early detection of mechanical degradation in helicopter drive train components using Health and Usage Monitoring Systems (HUMS) data. An anomaly-detection method is developed to quantify the degree of deviation of a component's mechanical state from its nominal condition. The method is based on an Anomaly Score (AS) formed by combining a set of statistical features correlated with specific damage types, known as Condition Indicators (CI); operational variability is thus implicitly included in the model through the CI correlation. The problem of fault detection is then recast as a one-class classification problem in the space spanned by a set of CI, with the aim of globally separating normal from anomalous observations, related respectively to healthy and supposedly faulty components. The paper uses a procedure based on an efficient one-class classification method that requires no assumption about the data distribution. The core of the approach is the Support Vector Data Description (SVDD), which provides an efficient data description without requiring a large amount of statistical data. Several analyses have been carried out to validate the proposed procedure, using flight vibration data collected from an in-service H135 (formerly EC135) helicopter, for which micro-pitting damage on a gear was detected by HUMS and confirmed through visual inspection. The capability of the proposed approach to provide a better trade-off between false-alarm and missed-detection rates than individual CI, and than the AS obtained under the assumption of jointly Gaussian-distributed CI, has also been analysed.
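For illustration, the sketch below shows the kind of one-class classification the paper builds on, using scikit-learn's OneClassSVM with an RBF kernel as a stand-in for SVDD (the two coincide for Gaussian kernels); the CI vectors here are synthetic placeholders, not the HUMS flight data used in the paper.

    # Minimal sketch of one-class classification on Condition Indicator (CI) vectors.
    # OneClassSVM with an RBF kernel stands in for SVDD; the data are synthetic.
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    rng = np.random.default_rng(0)

    # Pretend each row is a vector of CIs computed from one flight's vibration data.
    ci_healthy = rng.normal(loc=0.0, scale=1.0, size=(200, 5))     # nominal condition
    ci_test = np.vstack([rng.normal(0.0, 1.0, (20, 5)),            # healthy test flights
                         rng.normal(3.0, 1.0, (5, 5))])            # supposedly faulty flights

    scaler = StandardScaler().fit(ci_healthy)
    model = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale")
    model.fit(scaler.transform(ci_healthy))

    # decision_function > 0 means "inside the description" (healthy);
    # its negation can serve as an anomaly score for thresholding.
    anomaly_score = -model.decision_function(scaler.transform(ci_test))
    flagged = anomaly_score > 0
    print(flagged)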
Aspects of Bayesian inference, classification and anomaly detection
The primary objective of this thesis is to develop rigorous Bayesian tools for common statistical challenges arising in modern science, where there is a heightened demand for precise inference in the presence of large, known uncertainties. The thesis explores two arenas in detail. The first is the development and testing of a unified Bayesian anomaly detection and classification framework (BADAC), which allows principled anomaly detection in the presence of measurement uncertainties, which are rarely incorporated into machine learning algorithms. BADAC handles uncertainties by marginalising over the unknown true value of the data. Using simulated data with Gaussian noise as an example, BADAC is shown to be superior to standard algorithms in both classification and anomaly-detection performance in the presence of uncertainties. In addition, BADAC provides well-calibrated classification probabilities, valuable for use in scientific pipelines. BADAC is therefore ideal where computational cost is not a limiting factor and statistical rigour is important. We discuss approximations to speed up BADAC, such as the use of Gaussian processes, and introduce a new metric, the Rank-Weighted Score (RWS), that is particularly suited to evaluating an algorithm's ability to detect anomalies. The second major part of the thesis presents methods for rigorous statistical inference in the presence of classification uncertainties and errors. Although this is explored specifically through supernova cosmology, the framework is general. Supernova cosmology without spectra will be an important component of future surveys, owing to the massive increase in data volume expected from next-generation facilities such as the Vera C. Rubin Observatory. The lack of supernova spectra results in uncertainty both in the redshifts and in the types of the supernovae, which, if ignored, leads to significantly biased estimates of the cosmological parameters. We present a hierarchical Bayesian formalism, zBEAMS, which addresses this problem by marginalising over the unknown or uncertain supernova redshifts and types to produce unbiased cosmological estimates that are competitive with those from fully spectroscopically confirmed supernova data. zBEAMS thus provides a unified treatment of photometric redshifts, classification uncertainty and host-galaxy misidentification, effectively correcting the inevitable contamination of the Hubble diagram with little or no loss of statistical power.
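As a rough illustration of the marginalisation idea behind BADAC (not the actual BADAC code), the sketch below computes class-conditional likelihoods for a single Gaussian feature by adding the measurement error in quadrature to an assumed intrinsic class scatter, and flags points that no class explains well; class parameters and thresholds are placeholders.

    # Illustration of marginalising over the unknown true value of a measurement
    # with Gaussian noise: for a Gaussian class model N(mu_c, tau_c^2) and
    # measurement error sigma^2, the marginal likelihood is N(x | mu_c, tau_c^2 + sigma^2).
    import numpy as np
    from scipy.stats import norm

    classes = {"A": (0.0, 1.0), "B": (4.0, 0.5)}   # assumed class means / intrinsic scatter
    prior = {"A": 0.5, "B": 0.5}

    def classify(x_obs, sigma_obs, anomaly_threshold=1e-3):
        # Marginal likelihood per class, with measurement error added in quadrature.
        like = {c: norm.pdf(x_obs, loc=mu, scale=np.hypot(tau, sigma_obs))
                for c, (mu, tau) in classes.items()}
        evidence = sum(prior[c] * like[c] for c in classes)
        post = {c: prior[c] * like[c] / evidence for c in classes}
        # Flag as anomalous if no class explains the point well.
        return post, max(like.values()) < anomaly_threshold

    print(classify(x_obs=2.0, sigma_obs=0.8))   # ambiguous point
    print(classify(x_obs=12.0, sigma_obs=0.8))  # likely anomaly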
On Practical Machine Learning and Data Analysis
This thesis discusses and addresses some of the difficulties associated with practical machine learning and data analysis. Introducing data-driven methods in, e.g., industrial and business applications can lead to large gains in productivity and efficiency, but the cost and complexity are often overwhelming. Creating machine learning applications in practice often involves a large amount of manual labour, which typically needs to be performed by an experienced analyst who has little prior experience with the application area. We discuss some of the hurdles faced in a typical analysis project and suggest measures and methods to simplify the process.
One of the most important issues when applying machine learning methods to complex data, such as that found in industrial applications, is that the processes generating the data are modelled in an appropriate way. Relevant aspects have to be formalised and represented in a way that allows us to perform our calculations efficiently. We present a statistical modelling framework, Hierarchical Graph Mixtures, based on a combination of graphical models and mixture models. It allows us to create consistent, expressive statistical models that simplify the modelling of complex systems. By taking a Bayesian approach, we allow prior knowledge to be encoded and make the models applicable in situations where relatively little data are available.
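The Hierarchical Graph Mixture framework itself is not reproduced here; as a minimal stand-in, the sketch below shows only the mixture-with-priors aspect, using scikit-learn's variational Bayesian Gaussian mixture on a small synthetic data set.

    # Stand-in for the mixture-with-priors idea: a variational Bayesian Gaussian
    # mixture fitted to a small sample from two (imaginary) process regimes.
    import numpy as np
    from sklearn.mixture import BayesianGaussianMixture

    rng = np.random.default_rng(1)
    data = np.vstack([rng.normal([0, 0], 0.5, (30, 2)),
                      rng.normal([2, 2], 0.5, (30, 2))])

    # The Dirichlet weight prior lets the model switch off unneeded components,
    # one way prior structure keeps the model usable with little data.
    gmm = BayesianGaussianMixture(n_components=5, weight_concentration_prior=0.1,
                                  random_state=0).fit(data)
    print(np.round(gmm.weights_, 3))   # most mass on ~2 effective components
    print(gmm.predict(data[:5]))       # cluster assignments for a few points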
Detecting structures in data, such as clusters and dependency structure, is very important both for understanding an application area and for specifying the structure of, e.g., a hierarchical graph mixture. We discuss how this structure can be extracted for sequential data. By using the inherent dependency structure of sequential data, we construct an information-theoretic measure of correlation that does not suffer from the problems most common correlation measures have with this type of data.
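The exact measure constructed in the thesis is not reproduced here; the sketch below only illustrates the general idea of an information-theoretic dependency score between two discretised sequences, evaluated over a range of lags.

    # Mutual information between x[t] and y[t+lag] after simple binning,
    # as a generic information-theoretic dependency score for sequences.
    import numpy as np
    from sklearn.metrics import mutual_info_score

    def lagged_mi(x, y, max_lag=5, bins=8):
        scores = {}
        for lag in range(max_lag + 1):
            a, b = x[:len(x) - lag], y[lag:]
            a_d = np.digitize(a, np.histogram_bin_edges(a, bins))
            b_d = np.digitize(b, np.histogram_bin_edges(b, bins))
            scores[lag] = mutual_info_score(a_d, b_d)
        return scores

    rng = np.random.default_rng(2)
    x = rng.normal(size=500)
    y = np.roll(x, 3) + 0.2 * rng.normal(size=500)   # y depends on x with a lag of 3
    scores = lagged_mi(x, y)
    print(max(scores, key=scores.get))               # should recover the true lag of 3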
In many diagnosis situations it is desirable to perform classification in an iterative and interactive manner. The matter is often complicated by very limited amounts of knowledge and examples when a new system to be diagnosed is first brought into use. We describe how to create an incremental classification system based on a statistical model that is trained from empirical data, and show how the limited background information that is available can still be used initially to obtain a functioning diagnosis system.
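The thesis builds on its own statistical model; as a simple stand-in, the sketch below illustrates the incremental idea with a naive Bayes classifier whose class priors encode the limited background knowledge and which is updated as confirmed diagnoses arrive. Classes, priors and data are hypothetical.

    # Incremental classification sketch: background knowledge as class priors,
    # confirmed diagnoses folded in via partial_fit without full retraining.
    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    classes = np.array([0, 1, 2])                 # hypothetical fault classes
    clf = GaussianNB(priors=[0.8, 0.1, 0.1])      # "no fault" believed most common

    rng = np.random.default_rng(3)
    for batch in range(5):                        # new confirmed cases arrive over time
        y = rng.choice(classes, size=10)
        X = rng.normal(loc=y[:, None].astype(float), size=(10, 4))
        clf.partial_fit(X, y, classes=classes)    # incremental update
        print(batch, clf.predict(rng.normal(size=(1, 4))))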
To minimise the effort with which results are achieved in data analysis projects, we need to address not only the models used, but also the methodology and applications that can help simplify the process. We present a methodology for data preparation and a software library intended for rapid analysis, prototyping, and deployment.
Finally, we study a few example applications, presenting tasks in classification, prediction, and anomaly detection. The examples include demand prediction for supply chain management, approximating complex simulators to speed up parameter optimisation, and fraud detection and classification within a media-on-demand system.
Steganographer Identification
Conventional steganalysis detects the presence of steganography within single objects. In the real world, however, we may face a more complex scenario in which one or more of multiple users, called actors, are guilty of using steganography; this is typically defined as the Steganographer Identification Problem (SIP). One might use conventional steganalysis algorithms to separate stego objects from cover objects and then identify the guilty actors, but the guilty actors may be lost among a number of false alarms. To deal with the SIP, most state-of-the-art approaches use unsupervised learning. In these solutions, each actor holds multiple digital objects, from which a set of feature vectors can be extracted. Well-defined distances between these feature sets are computed to measure the similarity between the corresponding actors. By applying clustering or outlier detection, the most suspicious actor(s) are judged to be the steganographer(s). Although the SIP needs further study, existing works are well able to identify the steganographer(s) when non-adaptive steganographic embedding has been applied. In this chapter, we present foundational concepts and review advanced methodologies for the SIP. The chapter is self-contained and intended as a tutorial introducing the SIP in the context of media steganography. (Comment: a tutorial, 30 pages.)
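A minimal sketch of the unsupervised pipeline described above is given below, with placeholder features: real systems extract steganalysis features (e.g. SPAM or rich-model features) from each actor's objects and use set distances such as MMD, whereas here each actor is reduced to the mean of synthetic feature vectors and the most isolated actor is flagged.

    # SIP sketch: per-actor feature sets, pairwise distances, outlier flagged.
    import numpy as np

    rng = np.random.default_rng(4)
    n_actors, n_objects, dim = 10, 50, 16

    # Innocent actors' features cluster together; the last actor's objects carry a
    # small embedding-induced shift (a stand-in for non-adaptive steganography).
    actors = [rng.normal(0.0, 1.0, (n_objects, dim)) for _ in range(n_actors - 1)]
    actors.append(rng.normal(0.6, 1.0, (n_objects, dim)))

    centroids = np.stack([a.mean(axis=0) for a in actors])        # one point per actor
    dists = np.linalg.norm(centroids[:, None] - centroids[None, :], axis=-1)
    avg_dist = dists.sum(axis=1) / (n_actors - 1)                 # mean distance to others

    suspect = int(np.argmax(avg_dist))
    print("most suspicious actor:", suspect, "scores:", np.round(avg_dist, 2))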