22 research outputs found
Raising the ClaSS of Streaming Time Series Segmentation
Ubiquitous sensors today emit high frequency streams of numerical
measurements that reflect properties of human, animal, industrial, commercial,
and natural processes. Shifts in such processes, e.g. caused by external events
or internal state changes, manifest as changes in the recorded signals. The
task of streaming time series segmentation (STSS) is to partition the stream
into consecutive variable-sized segments that correspond to states of the
observed processes or entities. The partitioning operation itself must be
efficient enough to keep pace with the input frequency of the signals. We
introduce ClaSS, a novel, efficient, and highly accurate algorithm for STSS.
ClaSS assesses the homogeneity of potential partitions using self-supervised
time series classification and applies statistical tests to detect significant
change points (CPs). In our experimental evaluation using two large benchmarks
and six real-world data archives, we found ClaSS to be significantly more
precise than eight state-of-the-art competitors. Its space and time complexity
is independent of segment sizes and linear only in the sliding window size. We
also provide ClaSS as a window operator with an average throughput of 538 data
points per second for the Apache Flink streaming engine.
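The core scoring idea — judging a candidate split by how well a self-supervised classifier separates the two sides — can be sketched as follows. This is a minimal illustration in the spirit of classification-based segmentation, not the authors' implementation: the windowed 1-NN classifier, the window size, and the omission of ClaSS's statistical significance test are all simplifications of ours.

```python
import numpy as np

def subsequences(ts, w):
    """All overlapping subsequences of length w (a strided view, no copy)."""
    return np.lib.stride_tricks.sliding_window_view(ts, w)

def split_score(ts, split, w=10):
    """Self-supervised homogeneity score for a candidate split: label each
    subsequence by the side of the split it starts on, then measure how well
    a leave-one-out 1-NN classifier recovers those labels. High accuracy
    means the two sides are distinguishable, i.e. a likely change point."""
    X = subsequences(np.asarray(ts, dtype=float), w)
    y = (np.arange(len(X)) >= split).astype(int)   # hypothesised labels
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(D, np.inf)                    # exclude self-matches
    pred = y[np.argmin(D, axis=1)]                 # nearest neighbour's label
    return float((pred == y).mean())

rng = np.random.default_rng(0)
ts = np.concatenate([rng.normal(0, 1, 100), rng.normal(3, 1, 100)])
print(round(split_score(ts, split=95), 2))   # high: near the true change
print(round(split_score(ts, split=20), 2))   # lower: a poor split
```

A streaming version would maintain this score incrementally over a sliding window and only report a change point when a statistical test deems the score significant.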
Opening the black-box of artificial intelligence predictions on clinical decision support systems
Cardiovascular diseases are the leading cause of death worldwide. Their treatment and prevention
rely on electrocardiogram interpretation, which is subject to inter-physician variability.
Subjectivity is intrinsic to electrocardiogram interpretation, which is hence prone to
errors. To assist physicians in making precise and well-founded decisions, artificial intelligence
is being deployed to develop models that can interpret extensive datasets and provide
accurate decisions. However, the lack of interpretability of most machine learning models
stands as one of the drawbacks to their deployment, particularly in the medical domain.
Furthermore, most currently deployed explainable artificial intelligence methods
assume independence between features, which amounts to temporal independence when dealing
with time series. This inherent characteristic of time series cannot be ignored, as it
carries importance for the human decision-making process.
This dissertation focuses on the explanation of heartbeat classification using several
adaptations of state-of-the-art model-agnostic methods, to locally explain time series classification.
To address the explanation of time series classifiers, a preliminary conceptual
framework is proposed, and the use of the derivative is suggested as a complement to
add temporal dependency between samples. The results were validated on an extensive
public dataset, first through the 1-D Jaccard index, which compares the subsequences
extracted from an interpretable model with those highlighted by the explanation methods,
and second through the decrease in performance, which evaluates whether the explanation fits
the model’s behaviour. To assess models with distinct internal logic, the validation was
conducted on a more transparent model and a more opaque one, in both binary and multiclass
settings. The results show that including the signal’s derivative
to introduce temporal dependency between samples in the explanations is promising for models with
simpler internal logic.
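The derivative-as-complement idea can be illustrated with a small sketch: the signal is stacked with its first derivative before a perturbation-based (occlusion-style) explanation is computed, so that flattening a region also disturbs its local temporal structure. The toy `model_score` and the occlusion scheme below are stand-ins of ours, not the dissertation's adapted model-agnostic methods or its trained heartbeat classifiers.

```python
import numpy as np

def with_derivative(x):
    """Stack a signal with its first derivative as a second channel, so that
    perturbations also disturb local temporal structure, not just amplitude."""
    return np.stack([x, np.gradient(x)])

def model_score(xd):
    # toy stand-in "classifier": total absolute slope, loosely mimicking a
    # detector that reacts to the steep slopes of a QRS-like complex
    return float(np.sum(np.abs(xd[1])))

def occlusion_importance(x, width=5):
    """Per-sample importance: accumulated score drop when a sliding window
    around the sample is flattened to its mean (occlusion)."""
    base = model_score(with_derivative(x))
    imp = np.zeros(len(x))
    for start in range(len(x) - width + 1):
        xp = x.copy()
        xp[start:start + width] = xp[start:start + width].mean()
        imp[start:start + width] += base - model_score(with_derivative(xp))
    return imp

t = np.linspace(0, 1, 200)
beat = np.exp(-((t - 0.5) ** 2) / 0.001)   # a narrow spike as a crude "R peak"
imp = occlusion_importance(beat)
print(int(np.argmax(imp)))                 # an index near the spike's peak
```

Without the derivative channel, flattening a monotone flank barely changes a value-only score; the derivative channel is what makes the explanation sensitive to where the signal changes, not only to how large it is.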
Flexible Time Series Matching for Clinical and Behavioral Data
Time series data has become widely used by the research community in recent decades, following
a massive growth in its availability. This rise has demanded improvements to
existing analysis techniques which, in the medical domain, help specialists
evaluate their patients’ condition. One of the key tasks in time series analysis is pattern
recognition (segmentation and classification). Traditional methods typically perform subsequence
matching, making use of a pattern template and a similarity metric to search
for similar sequences throughout the time series. However, real-world data is noisy and variable
(exhibiting morphological distortions), making template-based exact matching a rudimentary
approach. Aiming to increase flexibility and generalize pattern-search tasks
across domains, this dissertation proposes two Deep Learning-based frameworks to solve
pattern segmentation and anomaly detection problems.
Regarding pattern segmentation, a Convolution/Deconvolution Neural Network is
proposed, learning to distinguish, point-by-point, desired sub-patterns from background
content within a time series. The proposed framework was validated in two use-cases:
electrocardiogram (ECG) and inertial sensor-based human activity (IMU) signals. It outperformed
two conventional matching techniques, reliably detecting the
targeted cycles even in noise-corrupted or severely distorted signals, without using any
reference template or hand-coded similarity scores.
Concerning anomaly detection, the proposed unsupervised framework uses the reconstruction
ability of Variational Autoencoders and a local similarity score to identify
unlabeled abnormalities. The proposal was validated on two public ECG datasets (MIT-BIH
Arrhythmia and ECG5000), performing cardiac arrhythmia identification. Results
indicated competitiveness relative to recent techniques, achieving detection AUC scores
of 98.84% (ECG5000) and 93.32% (MIT-BIH Arrhythmia).
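The reconstruction-based scoring can be sketched as follows. A trained VAE would supply the `reconstruct` function; here a moving-average smoother stands in for it purely so the sketch runs — it crudely mimics a model that reproduces smooth normal morphology well and sharp abnormal detail poorly, and it is NOT the dissertation's model.

```python
import numpy as np

def reconstruct(beat, k=9):
    # placeholder for a trained VAE's decoder output: a smoother that keeps
    # the smooth normal shape but cannot reproduce sharp abnormal detail
    pad = np.pad(beat, k // 2, mode="edge")
    return np.convolve(pad, np.ones(k) / k, mode="valid")

def local_anomaly_score(beat, w=16):
    """Maximum windowed reconstruction error: a local similarity score that
    flags a short abnormal segment even when most of the beat reconstructs well."""
    err = (beat - reconstruct(beat)) ** 2
    windows = np.lib.stride_tricks.sliding_window_view(err, w)
    return float(windows.mean(axis=1).max())

t = np.linspace(0, 1, 180)
normal = np.sin(2 * np.pi * t)                    # smooth beat: low error
abnormal = normal.copy()
abnormal[60:66] += 1.5 * (-1.0) ** np.arange(6)   # short spiky ectopic-like burst
print(local_anomaly_score(normal) < local_anomaly_score(abnormal))   # True
```

Taking the maximum over local windows, rather than the global mean error, is what lets a brief abnormality stand out against an otherwise well-reconstructed beat.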
Shapelet Transforms for Univariate and Multivariate Time Series Classification
Time Series Classification (TSC) is a growing field of machine learning research. One particular algorithm from the TSC literature is the Shapelet Transform (ST). Shapelets are phase-independent subsequences extracted from time series to form discriminatory features. It has been shown that using shapelets to transform datasets into a new space can improve performance. One of the major problems with ST is that the algorithm is O(n²m⁴), where n is the number of time series and m is the length of the series. As a problem increases in size, or additional dimensions are added, the algorithm quickly becomes computationally infeasible.
The research question addressed is whether the shapelet transform can be improved in terms of accuracy and speed. Making algorithmic improvements to shapelets will enable the development of multivariate shapelet algorithms that can attempt to solve much larger problems in realistic time frames.
In support of this thesis, a new distance early-abandon method is proposed. A class balancing algorithm is implemented, which uses a one-vs-all multi-class information gain that enables heuristics originally developed for two-class problems. To support these improvements, a large-scale analysis of the best shapelet algorithms is conducted as part of a larger experimental evaluation. ST is shown to be one of the most accurate algorithms in TSC on the UCR-UEA datasets. Contract classification is proposed for shapelets, where a fixed run time is set and the number of shapelets is bounded. Four search algorithms are evaluated with fixed run times of one hour and one day, three of which are not significantly worse than a full enumeration. Finally, three multivariate shapelet algorithms are developed and compared to benchmark results and multivariate dynamic time warping.
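The early-abandon idea can be sketched in a few lines: while scanning a series for the best-matching alignment of a shapelet, the running squared distance is compared against the best alignment found so far, and the inner loop stops as soon as it can no longer win. This is a simplified illustration of ours — the thesis's full distance also involves z-normalization and shapelet-quality measures, which are omitted here.

```python
def shapelet_distance(shapelet, series):
    """Minimum squared Euclidean distance between `shapelet` and any
    equal-length subsequence of `series`, with early abandoning."""
    m = len(shapelet)
    best = float("inf")
    for start in range(len(series) - m + 1):
        acc = 0.0
        for j in range(m):
            acc += (series[start + j] - shapelet[j]) ** 2
            if acc >= best:          # early abandon: this alignment cannot win
                break
        else:
            best = acc               # inner loop completed: new best alignment
    return best

print(shapelet_distance([1.0, 2.0, 1.0], [0.0, 1.0, 2.0, 1.0, 0.0]))  # 0.0
```

Because distances accumulate monotonically, abandoning never changes the result, only the amount of work — which is what makes contract (fixed run time) shapelet search practical.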
Towards a multipurpose neural network approach to novelty detection
Novelty detection, the identification of data that is unusual or different in some way, is relevant in a wide number of real-world scenarios, ranging from identifying unusual weather conditions to detecting evidence of damage in mechanical systems. However, utilising novelty detection approaches in a particular scenario presents significant challenges to the non-expert user. They must first select an appropriate approach from the novelty detection literature for their scenario. Then, suitable values must be determined for any parameters of the chosen approach. These challenges are at best time consuming and at worst prohibitively difficult for the user. Worse still, if no suitable approach can be found from the literature, then the user is left with the impossible task of designing a novelty detector themselves. In order to make novelty detection more accessible, an approach is required which does not pose the above challenges. This thesis presents such an approach, which aims to automatically construct novelty detectors for specific applications. The approach combines a neural network model, recently proposed to explain a phenomenon observed in the neural pathways of the retina, with an evolutionary algorithm that is capable of simultaneously evolving the structure and weights of a neural network in order to optimise its performance in a particular task. The proposed approach was evaluated over a number of very different novelty detection tasks. It was found that, in each task, the approach successfully evolved novelty detectors which outperformed a number of existing techniques from the literature. A number of drawbacks with the approach were also identified, and suggestions were given on ways in which these may potentially be overcome.
EThOS - Electronic Theses Online Service, United Kingdom
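The evolutionary ingredient alone can be sketched with a toy (1+λ) evolution strategy that tunes the weights of a tiny fixed-structure novelty scorer. Everything here is illustrative of ours: the thesis additionally evolves network structure and uses a retina-inspired neural model, both omitted, and the data, fitness, and parameter choices are assumptions, not the thesis's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 2))   # toy "normal" observations
novel = rng.normal(4.0, 1.0, size=(50, 2))     # toy "novel" observations

def detector(w, X):
    """Flag a point as novel when its distance from a learned centre (w[:2])
    exceeds a learned radius (w[2])."""
    return np.linalg.norm(X - w[:2], axis=1) > w[2]

def fitness(w):
    # reward flagging novel points while accepting normal ones
    return detector(w, novel).mean() + (~detector(w, normal)).mean()

w = rng.normal(size=3)                                # random initial "genome"
for _ in range(300):                                  # (1 + 8) ES generations
    offspring = w + rng.normal(0, 0.3, size=(8, 3))   # mutate the weights
    scores = [fitness(c) for c in offspring]
    if max(scores) >= fitness(w):                     # elitist selection
        w = offspring[int(np.argmax(scores))]
print(round(fitness(w), 2))
```

The point of the sketch is the automation: the user supplies only examples and a fitness function, and the evolutionary loop, rather than manual parameter tuning, shapes the detector.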
Advances in Data Mining Knowledge Discovery and Applications
Advances in Data Mining Knowledge Discovery and Applications aims to help data miners, researchers, scholars, and PhD students who wish to apply data mining techniques. Its primary contribution is to highlight frontier fields and implementations of knowledge discovery and data mining. Although similar themes recur across chapters, the same approaches and techniques can serve different fields and areas of expertise. Data mining draws on statistics, machine learning, data management and databases, pattern recognition, artificial intelligence, and other areas, and most of these are covered here through different data mining applications. The eighteen chapters are organized in two parts: Knowledge Discovery and Data Mining Applications.