A Survey on Explainable Anomaly Detection
In the past two decades, most research on anomaly detection has focused on
improving the accuracy of the detection, while largely ignoring the
explainability of the corresponding methods and thus leaving the explanation of
outcomes to practitioners. As anomaly detection algorithms are increasingly
used in safety-critical domains, providing explanations for the high-stakes
decisions made in those domains has become an ethical and regulatory
requirement. Therefore, this work provides a comprehensive and structured
survey on state-of-the-art explainable anomaly detection techniques. We propose
a taxonomy based on the main aspects that characterize each explainable anomaly
detection technique, aiming to help practitioners and researchers find the
explainable anomaly detection method that best suits their needs.
Comment: Paper accepted by the ACM Transactions on Knowledge Discovery from
Data (TKDD) for publication (preprint version)
A survey on explainable anomaly detection
NWOAlgorithms and the Foundations of Software technolog
Current Time Series Anomaly Detection Benchmarks are Flawed and are Creating the Illusion of Progress
Time series anomaly detection has been a perennially important topic in data
science, with papers dating back to the 1950s. However, in recent years there
has been an explosion of interest in this topic, much of it driven by the
success of deep learning in other domains and for other time series tasks. Most
of these papers test on one or more of a handful of popular benchmark datasets,
created by Yahoo, Numenta, NASA, etc. In this work we make a surprising claim.
The majority of the individual exemplars in these datasets suffer from one or
more of four flaws. Because of these four flaws, we believe that many published
comparisons of anomaly detection algorithms may be unreliable, and more
importantly, much of the apparent progress in recent years may be illusionary.
In addition to demonstrating these claims, with this paper we introduce the UCR
Time Series Anomaly Archive. We believe that this resource will perform a
similar role as the UCR Time Series Classification Archive, by providing the
community with a benchmark that allows meaningful comparisons between
approaches and a meaningful gauge of overall progress
Automated Quality Control for Sensor Based Symptom Measurement Performed Outside the Lab
The use of wearable sensing technology for objective, non-invasive and remote clinimetric testing of symptoms has considerable potential. However, the accuracy achievable with such technology is highly reliant on separating the useful from irrelevant sensor data. Monitoring patient symptoms using digital sensors outside of controlled, clinical lab settings creates a variety of practical challenges, such as recording unexpected user behaviors. These behaviors often violate the assumptions of clinimetric testing protocols, where these protocols are designed to probe for specific symptoms. Such violations are frequent outside the lab and affect the accuracy of the subsequent data analysis and scientific conclusions. To address these problems, we report on a unified algorithmic framework for automated sensor data quality control, which can identify those parts of the sensor data that are sufficiently reliable for further analysis. Combining both parametric and nonparametric signal processing and machine learning techniques, we demonstrate that across 100 subjects and 300 clinimetric tests from three different types of behavioral clinimetric protocols, the system shows an average segmentation accuracy of around 90%. By extracting reliable sensor data, it is possible to strip the data of confounding factors in the environment that may threaten reproducibility and replicability
Causal Discovery from Temporal Data: An Overview and New Perspectives
Temporal data, representing chronological observations of complex systems,
is a common data structure generated in many
domains, such as industry, medicine and finance. Analyzing this type of data is
extremely valuable for various applications. Thus, different temporal data
analysis tasks, e.g., classification, clustering and prediction, have been
proposed in the past decades. Among them, causal discovery, learning the causal
relations from temporal data, is considered an interesting yet critical task
and has attracted much research attention. Existing causal discovery works can
be divided into two highly correlated categories according to whether the
temporal data is calibrated, i.e., multivariate time series causal discovery and
event sequence causal discovery. However, most previous surveys focus only
on time series causal discovery and ignore the second category. In
this paper, we specify the correlation between the two categories and provide a
systematic overview of existing solutions. Furthermore, we provide public
datasets, evaluation metrics and new perspectives for temporal data causal
discovery.
Comment: 52 pages, 6 figures
Digital Oculomotor Biomarkers in Dementia
Dementia is an umbrella term that covers a number of neurodegenerative syndromes featuring gradual disturbance of various cognitive functions that are severe enough to interfere with tasks of daily life. The diagnosis of dementia frequently occurs when pathological changes have been developing for years, symptoms of cognitive impairment are evident and the quality of life of the patients has already deteriorated significantly. Although brain imaging and fluid biomarkers allow the monitoring of disease progression in vivo, they are expensive, invasive and not necessarily diagnostic in isolation. Recent studies suggest that eye-tracking technology is an innovative tool that holds promise for accelerating early detection of the disease, as well as supporting the development of strategies that minimise impairment during everyday activities. However, the optimal methods for quantitative evaluation of oculomotor behaviour during complex and naturalistic tasks in dementia have yet to be determined. This thesis investigates the development of computational tools and techniques to analyse eye movements of dementia patients and healthy controls under naturalistic and less constrained scenarios to identify novel digital oculomotor biomarkers. Three key contributions are made. First, the evaluation of the role of environment during navigation in patients with typical Alzheimer disease and Posterior Cortical Atrophy compared to a control group using a combination of eye movement and egocentric video analysis. Second, the development of a novel method of extracting salient features directly from the raw eye-tracking data of a mixed sample of dementia patients during a novel instruction-less cognitive test to detect oculomotor biomarkers of dementia-related cognitive dysfunction. Third, the application of unsupervised anomaly detection techniques for visualisation of oculomotor anomalies during various cognitive tasks.
The work presented in this thesis furthers our understanding of dementia-related oculomotor dysfunction and gives future research direction for the development of computerised cognitive tests and ecological interventions
Artificial intelligence for dementia prevention
INTRODUCTION:
A wide range of modifiable risk factors for dementia have been identified. Considerable debate remains about these risk factors, possible interactions between them or with genetic risk, and causality, and how they can help in clinical trial recruitment and drug development. Artificial intelligence (AI) and machine learning (ML) may refine understanding.
METHODS:
ML approaches are being developed in dementia prevention. We discuss exemplar uses and evaluate the current applications and limitations in the dementia prevention field.
RESULTS:
Risk-profiling tools may help identify high-risk populations for clinical trials; however, their performance needs improvement. New risk-profiling and trial-recruitment tools underpinned by ML models may be effective in reducing costs and improving future trials. ML can inform drug-repurposing efforts and prioritization of disease-modifying therapeutics.
DISCUSSION:
ML is not yet widely used but has considerable potential to enhance precision in dementia prevention
Automated Intelligent Cueing Device to Improve Ambient Gait Behaviors for Patients with Parkinson's Disease
Freezing of gait (FoG) is a common motor dysfunction in individuals with Parkinson’s disease (PD). FoG impairs walking and is associated with increased fall risk. Although pharmacological treatments have shown promise during ON-medication periods, FoG remains difficult to treat during the medication OFF state and in advanced stages of the disease. External cueing therapy, in visual, auditory, and vibrotactile forms, has been effective in treating gait deviations. Intelligent (or on-demand) cueing devices are novel systems that analyze gait patterns in real time and activate cues only at moments when specific gait alterations are detected. In this study we developed methods to analyze gait signals collected through wearable sensors and accurately identify FoG episodes. We also investigated the potential of predicting the symptoms before their actual occurrence.
We collected data from seven participants with PD using two Inertial Measurement Units (IMUs) on ankles. In our first study, we extracted engineered features from the signals and used machine learning (ML) methods to identify FoG episodes. We tested the performance of models using patient-dependent and patient-independent paradigms. The former models achieved 92.5% and 89.0% for average sensitivity and specificity, respectively. However, the conventional binary classification methods fail to accurately classify data if only data from normal gait periods are available. In order to identify FoG episodes in participants who did not freeze during data collection sessions, we developed a Deep Gait Anomaly Detector (DGAD) to identify anomalies (i.e., FoG) in the signals. DGAD was formed of convolutional layers and trained to automatically learn features from signals. The convolutional layers are followed by fully connected layers to reduce the dimensions of the features. A k-nearest neighbors (kNN) classifier is then used to classify the data as normal or FoG. The models identified 87.4% of FoG onsets, with 21.9% being predicted on average for each participant. This study demonstrates our algorithm's potential for delivery of preventive cues. The DGAD algorithm was then implemented in an Android application to monitor gait patterns of PD patients in ambient environments. The phone triggered vibrotactile and auditory cues on a connected smartwatch if an FoG episode was identified. A 6-week in-home study showed the potential for effective treatment of FoG severity in ambient environments using intelligent cueing devices
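The abstract above does not say how the kNN stage is applied when only normal gait data are available. One common reading, offered here purely as an illustrative assumption and not as the authors' DGAD implementation, is distance-based scoring: a window is flagged when its feature vector lies unusually far from its k nearest normal neighbours. In the sketch below the function names, the choice of k, and the quantile threshold are all hypothetical, and random vectors stand in for the learned features.

```python
import numpy as np

def knn_scores(train, query, k=5):
    """Mean Euclidean distance from each query row to its k nearest training rows."""
    # Pairwise distances, shape (n_query, n_train).
    d = np.linalg.norm(query[:, None, :] - train[None, :, :], axis=2)
    # np.partition places the k smallest distances of each row in the first k columns.
    nearest = np.partition(d, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)

def fit_threshold(normal, k=5, quantile=0.99):
    """Choose a score threshold from normal data only (leave-one-out scoring)."""
    d = np.linalg.norm(normal[:, None, :] - normal[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point must not count as its own neighbour
    nearest = np.partition(d, k - 1, axis=1)[:, :k]
    return np.quantile(nearest.mean(axis=1), quantile)

def is_anomalous(train, query, threshold, k=5):
    """True where the kNN score exceeds the threshold fitted on normal data."""
    return knn_scores(train, query, k) > threshold

# Hypothetical stand-in for feature vectors extracted from normal gait windows.
rng = np.random.default_rng(0)
normal_features = rng.normal(0.0, 1.0, size=(200, 8))
threshold = fit_threshold(normal_features)

# Windows far from the normal cluster are flagged; a central one is not.
far_windows = np.full((3, 8), 10.0)
central_window = np.zeros((1, 8))
print(is_anomalous(normal_features, far_windows, threshold))
print(is_anomalous(normal_features, central_window, threshold))
```

Fitting the threshold on leave-one-out scores of the normal data keeps the procedure usable for participants with no recorded freezing episodes, which is the setting the abstract describes.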
Unmasking Clever Hans Predictors and Assessing What Machines Really Learn
Current learning machines have successfully solved hard application problems,
reaching high accuracy and displaying seemingly "intelligent" behavior. Here we
apply recent techniques for explaining decisions of state-of-the-art learning
machines and analyze various tasks from computer vision and arcade games. This
showcases a spectrum of problem-solving behaviors ranging from naive and
short-sighted, to well-informed and strategic. We observe that standard
performance evaluation metrics can be oblivious to distinguishing these diverse
problem solving behaviors. Furthermore, we propose our semi-automated Spectral
Relevance Analysis that provides a practically effective way of characterizing
and validating the behavior of nonlinear learning machines. This helps to
assess whether a learned model indeed delivers reliably for the problem that it
was conceived for. Furthermore, our work intends to add a voice of caution to
the ongoing excitement about machine intelligence and pledges to evaluate and
judge some of these recent successes in a more nuanced manner.
Comment: Accepted for publication in Nature Communications
Behaviour Profiling using Wearable Sensors for Pervasive Healthcare
In recent years, sensor technology has advanced in terms of hardware sophistication and miniaturisation. This has led to the incorporation of unobtrusive, low-power sensors into networks centred on human participants, called Body Sensor Networks. Amongst the most important applications of these networks is their use in healthcare and healthy living. The technology has the possibility of decreasing burden on the healthcare systems by providing care at home, enabling early detection of symptoms, monitoring recovery remotely, and avoiding serious chronic illnesses by promoting healthy living through objective feedback. In this thesis, machine learning and data mining techniques are developed to estimate medically relevant parameters from a participant’s activity and behaviour parameters, derived from simple, body-worn sensors.
The first abstraction from raw sensor data is the recognition and analysis of activity. Machine learning analysis is applied to a study of activity profiling to detect impaired limb and torso mobility. One of the advances in this thesis to activity recognition research is in the application of machine learning to the analysis of 'transitional activities': transient activity that occurs as people change their activity. A framework is proposed for the detection and analysis of transitional activities. To demonstrate the utility of transition analysis, we apply the algorithms to a study of participants undergoing and recovering from surgery. We demonstrate that it is possible to see meaningful changes in the transitional activity as the participants recover.
Assuming long-term monitoring, we expect a large historical database of activity to quickly accumulate. We develop algorithms to mine temporal associations to activity patterns. This gives an outline of the user’s routine. Methods for visual and quantitative analysis of routine using this summary data structure are proposed and validated. The activity and routine mining methodologies developed for specialised sensors are adapted to a smartphone application, enabling large-scale use. Validation of the algorithms is performed using datasets collected in laboratory settings, and free living scenarios. Finally, future research directions and potential improvements to the techniques developed in this thesis are outlined