ATHMoS: Automated Telemetry Health Monitoring System at GSOC using Outlier Detection and Supervised Machine Learning
Knowing which telemetry parameters are behaving as expected and which are behaving out of the ordinary is vital for continued mission success. For a large number of parameters, it is not possible to monitor all of them manually. One of the simplest methods of monitoring the behavior of telemetry is the Out Of Limit (OOL) check, which monitors whether a value exceeds its upper or lower limit. A fundamental problem occurs when a telemetry parameter shows signs of abnormal behavior, yet its values are not extreme enough for the OOL check to detect the problem. By the time the OOL threshold is reached, it could be too late for the operators to react.
To solve this problem, the Automated Telemetry Health Monitoring System (ATHMoS) is in development at the German Space Operations Center (GSOC). At the heart of the framework is a novel algorithm for statistical outlier detection which makes use of the so-called Intrinsic Dimensionality (ID) of a data set. Using an ID measure as the core data mining technique allows us not only to run ATHMoS on a parameter-by-parameter basis, but also to monitor and flag anomalies in multi-parameter interactions.
By aggregating past telemetry data and applying these techniques, ATHMoS uses a supervised machine learning approach to construct three databases: Historic Nominal data, Recent Nominal data and past Anomaly data. Once new telemetry is received, the algorithm distinguishes between nominal behaviour and new, potentially dangerous behaviour, the latter of which is then flagged to mission engineers. ATHMoS continually learns to distinguish between new nominal behavior and true anomaly events throughout the mission lifetime. To this end, we present an overview of the algorithms ATHMoS uses as well as an example in which we successfully detected both previously unknown and known anomalies for an ongoing mission at GSOC.
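The Out Of Limit (OOL) check described in the abstract above, and the blind spot it leaves, can be illustrated with a minimal sketch. The function name `out_of_limit`, the sample values, and the limits of 0 and 30 are all hypothetical and chosen only for illustration; this is not the ATHMoS algorithm, which the abstract does not specify in enough detail to reproduce.

```python
import numpy as np

def out_of_limit(values, lower, upper):
    """Return a boolean mask that is True where a sample violates its limits."""
    values = np.asarray(values, dtype=float)
    return (values < lower) | (values > upper)

# Hypothetical telemetry channel with assumed limits of 0 and 30.
drifting = np.linspace(20.0, 29.0, 10)  # steadily rising, but still inside the limits
spiking = np.array([20.0, 21.0, 35.0])  # last sample exceeds the upper limit

drift_flags = out_of_limit(drifting, lower=0.0, upper=30.0)
spike_flags = out_of_limit(spiking, lower=0.0, upper=30.0)
```

In this sketch the spike is flagged, while the steady drift towards the upper limit raises no flag at all, which is the kind of behaviour the abstract says an OOL check cannot detect in time.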
Dimension Estimation Using Random Connection Models
Information about intrinsic dimension is crucial to perform dimensionality
reduction, compress information, design efficient algorithms, and do
statistical adaptation. In this paper we propose an estimator for the intrinsic
dimension of a data set. The estimator is based on binary neighbourhood
information about the observations in the form of two adjacency matrices, and
does not require any explicit distance information. The underlying graph is
modelled according to a subset of a specific random connection model, sometimes
referred to as the Poisson blob model. Computationally the estimator scales
like n log n, and we specify its asymptotic distribution and rate of
convergence. A simulation study on both real and simulated data shows that our
approach compares favourably with some competing methods from the literature,
including approaches that rely on distance information.
The generalized ratios intrinsic dimension estimator
Modern datasets are characterized by numerous features related by complex dependency structures. To deal with these data, dimensionality reduction techniques are essential. Many of these techniques rely on the concept of intrinsic dimension (id), a measure of the complexity of the dataset. However, the estimation of this quantity is not trivial: often, the id depends rather dramatically on the scale of the distances among data points. At short distances, the id can be grossly overestimated due to the presence of noise, becoming smaller and approximately scale-independent only at large distances. A straightforward way to examine the scale dependence is to decimate the dataset, which unavoidably induces non-negligible statistical errors at large scales. This article introduces a novel statistical method, Gride, that allows estimating the id as an explicit function of the scale without performing any decimation. Our approach is based on rigorous distributional results that enable the quantification of uncertainty of the estimates. Moreover, our method is simple and computationally efficient since it relies only on the distances among data points. Through simulation studies, we show that Gride is asymptotically unbiased, provides comparable estimates to other state-of-the-art methods, and is more robust to short-scale noise than other likelihood-based approaches.
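To make the idea of estimating the id from ratios of neighbour distances concrete, here is a minimal sketch of the simplest estimator of this kind: the maximum-likelihood estimate based on the ratio of each point's second to first nearest-neighbour distance. This is an assumption-laden illustration, not the authors' Gride implementation, and it does not cover the scale-dependent analysis the abstract describes; the function name `two_nn_id` and the synthetic data are hypothetical.

```python
import numpy as np

def two_nn_id(points):
    """Estimate intrinsic dimension from ratios of 2nd to 1st neighbour distances.

    Under a locally uniform density the ratio mu = r2 / r1 follows a Pareto law
    with shape d, so the maximum-likelihood estimate is d = N / sum(log(mu)).
    """
    points = np.asarray(points, dtype=float)
    # Brute-force pairwise Euclidean distances (fine for small illustrative sets).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    np.fill_diagonal(dist, np.inf)  # exclude each point's distance to itself
    # After partitioning at index 1, columns 0 and 1 hold the two smallest distances.
    nearest = np.partition(dist, 1, axis=1)[:, :2]
    mu = nearest[:, 1] / nearest[:, 0]
    return len(mu) / np.log(mu).sum()

# Synthetic check: a 2-D plane embedded in 5-D space, so the id is 2 by construction.
rng = np.random.default_rng(0)
plane = rng.uniform(size=(1000, 2))
embedded = np.hstack([plane, np.zeros((1000, 3))])
estimate = two_nn_id(embedded)
```

Because the sketch uses only the two nearest neighbours, it probes the shortest scale in the data, which is exactly where the abstract warns that noise can inflate the estimate.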
A Modern Approach to Visualise Structured and Unstructured Space Missions' Data
In this paper the Visualisation and Data Analysis (ViDA) project, currently being developed at the German Space Operations Center (GSOC), is presented. ViDA is a modern, interactive, web-based frontend tool designed to efficiently explore various types of data generated by space missions. It is more than just a telemetry display tool and, as such, includes features from business intelligence, data science and AI tools, while being focused on the multi-spacecraft operations use case. The paper describes how the big data challenges (volume, variety, variability, complexity, value) in the context of spacecraft operations have been addressed and how the adopted solutions have been integrated into ViDA. It also highlights the importance of contextual knowledge as a crucial point for the design and implementation of ViDA. The techniques used for creating appropriate visual representations of the data and their relations are described. Such visualisations are specifically designed to deliver interpretable results to the users, thus helping them to quickly extract knowledge during their analytical process. Finally, the integration of ViDA into the ground system and its connections to the other tools in the telemetry/telecommand chain are discussed.
Image Classification with Deep Learning in the Presence of Noisy Labels: A Survey
Image classification systems recently made a giant leap with the advancement
of deep neural networks. However, these systems require an excessive amount of
labeled data to be adequately trained. Gathering a correctly annotated dataset
is not always feasible due to several factors, such as the cost of the
labeling process or the difficulty of correctly classifying data, even for
experts. Because of these practical challenges, label noise is a common problem
in real-world datasets, and numerous methods to train deep neural networks with
label noise have been proposed in the literature. Although deep neural networks are
known to be relatively robust to label noise, their tendency to overfit data
makes them vulnerable to memorizing even random noise. Therefore, it is crucial
to consider the existence of label noise and develop algorithms that mitigate
its adverse effects in order to train deep neural networks efficiently. Even though
an extensive survey of machine learning techniques under label noise exists,
the literature lacks a comprehensive survey of methodologies centered
explicitly around deep learning in the presence of noisy labels. This paper
aims to present these algorithms while categorizing them into one of the two
subgroups: noise-model-based and noise-model-free methods. Algorithms in the
first group aim to estimate the noise structure and use this information to
avoid the adverse effects of noisy labels. In contrast, methods in the second
group aim to design inherently noise-robust algorithms by using
approaches such as robust losses, regularizers, or other learning paradigms.
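The two subgroups named in the abstract above can each be illustrated with one well-known loss. The sketch below is only an illustration under stated assumptions: the survey abstract does not name specific methods, so forward loss correction (standing in for the noise-model-based group) and generalized cross entropy (standing in for the robust-loss, noise-model-free group) are choices made here, and the function names, the transition matrix `T`, and the toy probabilities are hypothetical.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Standard per-sample cross entropy; unbounded as the label probability -> 0."""
    return -np.log(probs[np.arange(len(labels)), labels])

def generalized_cross_entropy(probs, labels, q=0.7):
    """Noise-model-free robust loss: (1 - p_y**q) / q, bounded above by 1 / q."""
    p_y = probs[np.arange(len(labels)), labels]
    return (1.0 - p_y ** q) / q

def forward_corrected_cross_entropy(probs, labels, T):
    """Noise-model-based loss: push predictions through an assumed noise model.

    T[i, j] is the assumed probability that true class i is observed as label j.
    """
    noisy_probs = probs @ T
    return -np.log(noisy_probs[np.arange(len(labels)), labels])

# Toy predictions for two samples; the second label disagrees with a confident model.
probs = np.array([[0.90, 0.10],
                  [0.01, 0.99]])
labels = np.array([0, 0])

ce = cross_entropy(probs, labels)
gce = generalized_cross_entropy(probs, labels, q=0.7)
# Assumed symmetric noise model: each label is flipped with probability 0.2.
T = np.array([[0.8, 0.2],
              [0.2, 0.8]])
fwd = forward_corrected_cross_entropy(probs, labels, T)
```

For the second sample, plain cross entropy grows without bound as the predicted probability of the given label shrinks, while the generalized loss stays below 1/q and the forward-corrected loss is softened by the assumed flip probability. Both therefore limit how strongly a single possibly mislabeled example can pull on training.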