9 research outputs found

    Generalised Interpretable Shapelets for Irregular Time Series

    The shapelet transform is a form of feature extraction for time series, in which a time series is described by its similarity to each of a collection of 'shapelets'. However, it has previously suffered from a number of limitations, such as being limited to regularly-spaced, fully-observed time series, and having to choose between efficient training and interpretability. Here, we extend the method to continuous time, and in doing so handle the general case of irregularly-sampled, partially-observed multivariate time series. Furthermore, we show that a simple regularisation penalty may be used to train efficiently without sacrificing interpretability. The continuous-time formulation additionally allows for learning the length of each shapelet (previously a discrete object) in a differentiable manner. Finally, we demonstrate that the measure of similarity between time series may be generalised to a learnt pseudometric. We validate our method by demonstrating its performance and interpretability on several datasets; for example, we discover (purely from data) that the digits 5 and 6 may be distinguished by the chirality of their bottom loop, and that a kind of spectral gap exists in spoken audio classification.
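    As a rough illustration of the feature extraction described in this abstract, the sketch below implements the classic discrete shapelet transform for a regularly-sampled univariate series: each feature is the minimum Euclidean distance between a shapelet and any same-length window of the series. This is not the continuous-time method the paper proposes. The function names `shapelet_distance` and `shapelet_transform`, and the toy data, are assumptions made for illustration.

```python
import math


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any
    same-length contiguous window of the series (hypothetical helper)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    # Slide the shapelet over every window of length m.
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((w - s) ** 2 for w, s in zip(window, shapelet)))
        best = min(best, d)
    return best


def shapelet_transform(series, shapelets):
    """Describe a series by its distance to each shapelet in a collection."""
    return [shapelet_distance(series, s) for s in shapelets]


# Toy data (assumed): a single bump, and two candidate shapelets.
series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
shapelets = [[1.0, 2.0, 1.0], [2.0, 0.0]]
features = shapelet_transform(series, shapelets)
print(features)
```

    The resulting feature vector can be passed to any ordinary classifier. Under the assumptions above, the first shapelet matches the bump exactly, so its feature is zero.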

    GENDIS: genetic discovery of shapelets

    In the time series classification domain, shapelets are subsequences that are discriminative of a certain class. It has been shown that classifiers can achieve state-of-the-art results by taking the distances from the input time series to different discriminative shapelets as input. Additionally, these shapelets can be visualized and are therefore interpretable, making them appealing in critical domains where longitudinal data are ubiquitous. In this study, a new paradigm for shapelet discovery is proposed, which is based on evolutionary computation. The advantages of the proposed approach are that: (i) it is gradient-free, which could allow escaping from local optima more easily and supports non-differentiable objectives; (ii) no brute-force search is required, making the algorithm scalable; (iii) the total number of shapelets and the length of each shapelet are evolved jointly with the shapelets themselves, alleviating the need to specify these beforehand; (iv) entire sets are evaluated at once, as opposed to single shapelets, which results in smaller final sets with fewer similar shapelets while achieving similar predictive performance; and (v) the discovered shapelets do not need to be subsequences of the input time series. We present the results of the experiments, which validate the enumerated advantages.

    LSTM Models to Support the Selective Antibiotic Treatment Strategy of Dairy Cows in the Dry Period

    Dissertation presented as the partial requirement for obtaining a Master's degree in Data Science and Advanced Analytics, specialization in Data Science.
    Udder inflammation, known as mastitis, is the most significant disease of dairy cows worldwide, causing substantial economic losses. The current common strategy to reduce this problem is the prophylactic antibiotic treatment of cows during their dry period. Paradoxically, the indiscriminate use of antibiotics in animals and humans has been the leading cause of antimicrobial resistance, a concern for several public health organizations. In light of these concerns, at the beginning of 2022 the European Union made it illegal to routinely administer antibiotics on farms, under Regulation 2019/6 of 11 December 2018. Considering this new scenario, the objective of this study was to produce a model that supports the decisions of veterinarians when administering antibiotics in the dry period of dairy cows. Deep learning models were used, namely LSTM layers that operate on dynamic features from milk recordings and a dense layer that uses static features. Two approaches were chosen to deal with this problem. The first is a binary classification model that considers the occurrence of mastitis within 60 days after calving. The second is a multiclass classification model based on veterinary expert judgment. In each approach, three models were implemented: a Vanilla LSTM, a Stacked LSTM, and a Stacked LSTM with a dense layer working in parallel. The best performances of the binary and multiclass approaches were 65% and 84% accuracy, respectively. It was possible to conclude that the models of the multiclass approach performed better than those of the binary approach. The capture of long- and short-term dependencies in the LSTM models, especially in combination with static features, obtained promising results, which will undoubtedly contribute to producing a machine learning system with a prompt and affordable response, allowing the administration of antibiotics in dairy cows to be reduced to what is strictly necessary.

    Symbolic modeling of morphological patterns for time series classification (Modelagem simbólica de padrões morfológicos para classificação de séries temporais)

    Advisor: Prof. Dr. Fabiano Silva. Thesis (Doctorate) - Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 14/09/2015. Includes references: pp. 149-167.
    Abstract: The continuous storage of data over time, such as time series, has motivated the development of new approaches based on data mining methods. In this context, a new research area has emerged over the last two decades: time series data mining. In particular, approaches based on machine learning techniques have attracted growing interest among researchers. Among data mining tasks, time series classification has been widely explored, and recent studies using non-symbolic learning algorithms have reported significant results in terms of classification accuracy. However, in applications that support decision making, such as medical diagnosis, industrial production control, and safety monitoring systems in aircraft or power plants, it is necessary to make the reasoning used in the classification process understandable. To this end, the shapelet primitive has been proposed in the literature as a descriptor of local morphological characteristics, which is closer to human perception when identifying patterns in time series. Nevertheless, most existing work on shapelets has been dedicated to developing approaches that are more efficient in terms of time and accuracy, disregarding the need for interpretability of the classifiers.
    In this work, we propose a method that uses the shapelet transform to build symbolic models for time series classification. The method is based on a hybrid approach that combines the decision tree representation with the nearest neighbor algorithm. We also developed strategies, based on feature selection algorithms, to improve the representation quality of the shapelet transform when it is used with symbolic classifiers such as decision trees. To assess these proposals, we performed an experimental evaluation comparing them with algorithms considered state of the art, using datasets widely studied in the time series classification literature. Based on the results and analysis carried out in this thesis, we found that improving the shapelet identification process allows the construction of interpretable and competitive classifiers. Moreover, we found that hybrid methods can help to provide symbolic models with performance equivalent or even superior to that of non-symbolic methods.
    Keywords: data mining; machine learning; time series; classification; symbolic models.