Directional outlyingness applied to distances between genomic words
The detection of outlying curves and images is crucial in many areas, such as environmental, meteorological, medical, and economic applications. In the functional framework, outlying observations are not only those that contain atypically high or low values, but also curves that present a different shape or pattern from the rest of the sample. In this short paper, we review some recent methods for outlier detection in functional data and apply a recently proposed measure, the directional outlyingness, together with the functional outlier map, to detect words with outlying distance distributions in the human genome.
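To make the two tools concrete, here is a minimal pure-Python sketch for curves observed on a common grid. At each grid point, the directional outlyingness of a value y relative to the cross-sectional sample is DO(y) = (y - med)/s_a if y >= med and (med - y)/s_b otherwise, where s_a and s_b are one-sided scales. As a simplifying assumption, each one-sided scale is taken here as the median of the absolute deviations on that side of the median, not the robust one-sided scale estimates of the original proposal. Each curve gets fDO, the equally weighted average of its pointwise DO values, and vDO = sd(DO)/(1 + fDO); the functional outlier map plots vDO against fDO. The function names are hypothetical.

```python
from statistics import median, pstdev


def directional_outlyingness(value, sample):
    """Simplified univariate DO of `value` relative to `sample`.

    Assumption: each one-sided scale is the median of the absolute
    deviations on that side of the median (not the original estimator).
    """
    med = median(sample)
    if value >= med:
        side = [x - med for x in sample if x > med]
        dev = value - med
    else:
        side = [med - x for x in sample if x < med]
        dev = med - value
    scale = median(side) if side else 0.0
    # Degenerate side with zero spread: return 0.0 (a simplification).
    return dev / scale if scale > 0 else 0.0


def functional_outlier_map(curves):
    """Return (fDO, vDO) for each curve observed on a common grid."""
    n_points = len(curves[0])
    # Cross-sectional sample of all curves at each grid point.
    columns = [[curve[j] for curve in curves] for j in range(n_points)]
    result = []
    for curve in curves:
        do = [directional_outlyingness(curve[j], columns[j]) for j in range(n_points)]
        fdo = sum(do) / n_points        # equal weights over the grid
        vdo = pstdev(do) / (1.0 + fdo)  # variability of DO, scaled by 1 + fDO
        result.append((fdo, vdo))
    return result
```

Roughly, a curve shifted away from the bulk shows up with a large fDO and a small vDO, whereas a curve that is outlying only over part of the domain tends to have a larger vDO.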
Robust functional regression based on principal components
Functional data analysis is a fast-evolving branch of modern statistics, and the functional linear model has become popular in recent years. However, most estimation methods for this model rely on generalized least squares procedures and are therefore sensitive to atypical observations. To remedy this, we propose a two-step estimation procedure that combines robust functional principal components and robust linear regression. Moreover, we propose a transformation that reduces the curvature of the estimators and can be advantageous in many settings. For these estimators we prove Fisher-consistency at elliptical distributions and consistency under mild regularity conditions. The influence function of the estimators is also investigated. Simulation experiments show that the proposed estimators have reasonable efficiency, protect against outlying observations, produce smooth estimates, and perform well in comparison to existing approaches. Comment: 33 pages, including the appendix and references.
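The two-step idea (robust functional principal components first, then a robust regression of the response on the component scores) can be illustrated with a pure-Python sketch for curves sampled on a common grid. The specific estimators are stand-ins chosen for brevity and are not necessarily those of the paper. Step 1 uses spherical principal components: pointwise-median centring, each centred curve scaled to unit norm, then ordinary principal directions via power iteration. Step 2 uses a Huber M-estimator fitted by iteratively reweighted least squares with a MAD residual scale. The curvature-reducing transformation is omitted, and all function names are hypothetical.

```python
import math
from statistics import median


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def spherical_fpca(curves, n_components, n_iter=200):
    """Step 1 (stand-in): pointwise-median centre plus spherical PCA directions."""
    p = len(curves[0])
    center = [median([c[j] for c in curves]) for j in range(p)]
    # Scale each centred curve to unit norm so no single curve dominates.
    sphere = []
    for c in curves:
        z = [x - m for x, m in zip(c, center)]
        norm = math.sqrt(_dot(z, z))
        sphere.append([v / norm for v in z] if norm > 0 else [0.0] * p)
    components = []
    for _comp in range(n_components):
        v = [float(j + 1) for j in range(p)]
        start_norm = math.sqrt(_dot(v, v))
        v = [x / start_norm for x in v]
        # Power iteration for the leading eigenvector of sum_i z_i z_i^T.
        for _it in range(n_iter):
            w = [0.0] * p
            for z in sphere:
                proj = _dot(z, v)
                for j in range(p):
                    w[j] += proj * z[j]
            norm = math.sqrt(_dot(w, w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflate so the next direction is orthogonal to this one.
        projs = [_dot(z, v) for z in sphere]
        sphere = [[zj - pr * vj for zj, vj in zip(z, v)] for z, pr in zip(sphere, projs)]
    return center, components


def _solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x


def _wls(X, y, w):
    """Weighted least squares via the normal equations."""
    k = len(X[0])
    A = [[sum(wi * xi[r] * xi[c] for wi, xi in zip(w, X)) for c in range(k)] for r in range(k)]
    b = [sum(wi * xi[r] * yi for wi, xi, yi in zip(w, X, y)) for r in range(k)]
    return _solve(A, b)


def huber_regression(X, y, c=1.345, n_iter=50):
    """Step 2 (stand-in): Huber M-estimator via IRLS with a MAD residual scale."""
    coef = _wls(X, y, [1.0] * len(y))  # ordinary least squares start
    for _ in range(n_iter):
        resid = [yi - _dot(xi, coef) for xi, yi in zip(X, y)]
        scale = 1.4826 * median([abs(r) for r in resid])
        if scale < 1e-12:
            break
        # Huber weights: 1 inside [-c*scale, c*scale], downweighted outside.
        w = [1.0 if abs(r) <= c * scale else c * scale / abs(r) for r in resid]
        coef = _wls(X, y, w)
    return coef


def robust_functional_regression(curves, y, n_components=2):
    """Two-step sketch: robust FPCA scores, then robust regression on them."""
    center, comps = spherical_fpca(curves, n_components)
    X = []
    for c in curves:
        z = [x - m for x, m in zip(c, center)]
        X.append([1.0] + [_dot(z, phi) for phi in comps])
    coef = huber_regression(X, y)
    p = len(center)
    # Slope function on the grid: beta(t_j) = sum_k b_k * phi_k(t_j).
    beta = [sum(coef[k + 1] * comps[k][j] for k in range(len(comps))) for j in range(p)]
    return center, coef[0], beta
```

A prediction for a new curve x is then coef[0] + sum_j (x(t_j) - m(t_j)) beta(t_j), with m the pointwise median; the grid spacing is absorbed into beta in this sketch.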
Automated data inspection in jet engines
Rolls-Royce accumulates a large amount of sensor data throughout the testing and deployment of its engines. The availability of this rich source of data offers exciting opportunities to automate the monitoring and testing of the engines. In this thesis we have developed statistical models to draw meaningful insights from engine test data. We have built a classification model to identify different types of engine running in Pass-Off tests. The labels can be used for post-analysis and to highlight problematic engine tests. The model has been applied to two different types of engine, for which it achieves close to perfect classification accuracy. We have also created an unsupervised approach for cases in which there are no defined classes of engine running. These models have been incorporated into Rolls-Royce systems. Early warnings of potential issues can enable relatively cheap maintenance to be performed and reduce the risk of irreparable engine damage. We have therefore developed an outlier detection model to identify abnormal temperature behaviour. The capabilities of the model are shown theoretically and tested on experimental and real data. Lastly, during a test, engineers make decisions to ensure that the engine complies with certain standards. To support the engineers, we have developed a predictive model to identify segments of the engine test that should be retested. The model is tested against the current decision making of the engineers and gives good predictive performance. It highlights the possibility of automating the decision-making process within a test.
Novel Methods for the Detection of Emergent Phenomena in Streaming Data
In today's fast-paced, data-rich world there is an increased demand for methods that analyse a stream of data in real time. In particular, there is a desire for methods that can identify phenomena in the data stream as they emerge. These emergent phenomena can be viewed as observations that are surprising when compared with the history of the data. Motivated by challenges in the telecommunications sector, we develop methods that operate when the stream does not follow classical assumptions, for example when the data are not independent or identically distributed, or when the phenomena occur gradually over time. This thesis makes three contributions to the field of anomaly detection for streaming data. The first, Non-Parametric Unbounded Change (NUNC), provides a non-parametric method for identifying changes in the distribution of a data stream. The second, Functional Anomaly Sequential Test (FAST), provides a method for identifying deviations from an expected shape in a stream of partially observed functional data. The third, mvFAST, extends FAST to the multivariate functional data setting.
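The idea of non-parametric change detection on a stream can be illustrated with a sliding window: for every candidate split point inside the window, compare the empirical distribution functions before and after the split with a scaled two-sample Kolmogorov-Smirnov distance, and flag a change when the largest value exceeds a threshold. This is a simplified stand-in written for illustration, not the NUNC statistic itself. The function names, window size, minimum segment length, and threshold are hypothetical and would need calibration in practice.

```python
import math
from collections import deque


def split_statistic(window, min_seg=5):
    """Largest scaled two-sample KS distance over split points in the window."""
    n = len(window)
    grid = sorted(set(window))
    best = 0.0
    for tau in range(min_seg, n - min_seg + 1):
        pre, post = window[:tau], window[tau:]
        # Sup-distance between the two empirical CDFs; it is attained at an observed value.
        d = max(
            abs(sum(x <= q for x in pre) / len(pre) - sum(x <= q for x in post) / len(post))
            for q in grid
        )
        best = max(best, math.sqrt(tau * (n - tau) / n) * d)
    return best


def monitor(stream, window_size=40, threshold=1.5):
    """Return the index at which a change is first flagged, or None."""
    window = deque(maxlen=window_size)
    for t, x in enumerate(stream):
        window.append(x)
        if len(window) == window_size and split_statistic(list(window)) > threshold:
            return t
    return None
```

Because only empirical distribution functions are compared, the sketch makes no parametric assumption about the stream, although, unlike the methods described above, it does not address dependence in the data.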
Discussion of “Multivariate Functional Outlier Detection”, by Mia Hubert, Peter Rousseeuw and Pieter Segaert
CLADAG 2021 Book of Abstracts and Short Papers
The book collects the short papers presented at the 13th Scientific Meeting of the Classification and Data Analysis Group (CLADAG) of the Italian Statistical Society (SIS). The meeting was organized by the Department of Statistics, Computer Science and Applications of the University of Florence, under the auspices of the Italian Statistical Society and the International Federation of Classification Societies (IFCS). CLADAG is a member of the IFCS, a federation of national, regional, and linguistically based classification societies. It is a non-profit, non-political scientific organization whose aims are to further classification research.