7 research outputs found

    Clustering and Forecasting of Covid-19 Data in Indonesia

    Indonesia reported its first case of Covid-19 in March 2020; the patient was suspected to have been infected by a foreign visitor. Because Indonesia is an archipelagic country, cases are unevenly distributed across its many provinces, though some provinces share the same pattern of case characteristics. The case counts also form a time series, so both clustering and forecasting analyses can be applied to Covid-19 data in Indonesia. The analysis was carried out in two stages: clustering with a hierarchical clustering method, followed by forecasting with the ARIMA method. Using 288 daily observations from January 1, 2021 to October 15, 2021, the results show that daily Covid-19 cases by province can be grouped into 2 clusters. For forecasting, one province was taken from each cluster to determine the model: Banten for cluster 1 and West Java for cluster 2. Using R software, the models obtained were ARIMA(0,1,1) for cluster 1 and ARIMA(2,1,2) for cluster 2. Forecasts through October 30, 2021 show that the number of cases tends to remain constant.
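
    As a worked note on the cluster 1 model: assuming no drift term and the sign convention of R's `arima` (the MA coefficient is added), an ARIMA(0,1,1) model and its forecasts from the last observation $T$ can be written as below. The symbols $\theta$ and $\hat{\varepsilon}_T$ are generic placeholders, not values reported in the abstract.

```latex
\begin{aligned}
y_t &= y_{t-1} + \varepsilon_t + \theta\,\varepsilon_{t-1}, \\
\hat{y}_{T+1\mid T} &= y_T + \theta\,\hat{\varepsilon}_T, \\
\hat{y}_{T+h\mid T} &= \hat{y}_{T+1\mid T} \qquad (h \ge 2).
\end{aligned}
```

    Under these assumptions every horizon beyond the first repeats the one-step forecast, which is consistent with forecasts that stay roughly constant.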

    Valve Health Identification Using Sensors and Machine Learning Methods

    Predictive maintenance models attempt to identify developing issues with industrial equipment before they become critical. In this paper, we describe both supervised and unsupervised approaches to predictive maintenance for subsea valves in the oil and gas industry. The supervised approach is appropriate for valves that have a long operating history along with manual assessments of their state, while the unsupervised approach addresses the cold start problem that arises when new valves, for which we have no operational history, come online. For the supervised prediction problem, we attempt to distinguish between healthy and unhealthy valve actuators using sensor data measuring hydraulic pressures and flows during valve opening and closing events. Unlike previous approaches that rely solely on raw sensor data, we derive frequency and time domain features, and experiment with a range of classification algorithms and different feature subsets. The best-performing models for the supervised approach were found to be AdaBoost and Random Forest ensembles. In the unsupervised approach, the goal is to detect abrupt changes in valve behaviour by comparing the sensor readings from consecutive opening or closing events. Our novel methodology compares the sequences of sensor readings captured during these events, using raw sensor readings as well as normalised and first-derivative versions of the sequences. We evaluate the effectiveness of a number of well-known time series similarity measures and find that discrete Fréchet distance or dynamic time warping leads to the best results, with the Bray-Curtis similarity measure giving only marginally poorer change detection while requiring considerably less computational effort.
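
    The unsupervised comparison of consecutive events can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the equal-length assumption, and the fixed threshold are all hypothetical, and the code computes the Bray-Curtis dissimilarity (0 for identical sequences) rather than a similarity score.

```python
from typing import List, Sequence


def first_derivative(seq: Sequence[float]) -> List[float]:
    """Differences between neighbouring samples (a discrete first derivative)."""
    return [b - a for a, b in zip(seq, seq[1:])]


def bray_curtis(u: Sequence[float], v: Sequence[float]) -> float:
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(|u_i + v_i|)."""
    if len(u) != len(v):
        raise ValueError("sequences must have equal length")
    denominator = sum(abs(a + b) for a, b in zip(u, v))
    if denominator == 0:
        # Both sequences are all zeros (or cancel exactly); treat as identical.
        return 0.0
    return sum(abs(a - b) for a, b in zip(u, v)) / denominator


def flag_changes(events: Sequence[Sequence[float]], threshold: float) -> List[int]:
    """Indices i where event i differs from event i-1 by more than threshold.

    `threshold` is an illustrative tuning parameter, not a value from the paper.
    """
    return [
        i
        for i in range(1, len(events))
        if bray_curtis(events[i - 1], events[i]) > threshold
    ]
```

    The same `flag_changes` call could be applied to `first_derivative` versions of each event to mirror the derivative-based variant described above.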

    GestureMeter: Evaluating Gesture Password Selection on Smartphones with Strength Meter

    Department of Human Factors Engineering. Gestures are a potential authentication method for touchscreen devices and common tasks such as phone lock. While many studies have indicated that gesture passwords can achieve high usability, evaluating their security remains a grey area. Key challenges stem from the small sample sizes in current gesture password studies and the requirement to use similarity-based recognition metrics, which prevent the application of traditional entropy assessment methods. To overcome these problems, we perform a large-scale online study (N=2594). With the resulting data set, we develop a novel multi-stage discretization method and n-gram Markov models that enable us to assess the partial guessing entropy of gesture passwords and to create a novel clustering-based dictionary attack. We report that while partial guessing entropy appears to be greater than that of other common phone lock methods (e.g., PIN, pattern), gestures are highly susceptible to dictionary attack. To improve the security of gesture passwords, we develop a novel gesture password strength meter. Password strength meters have previously been proposed as an effective password policy that can improve the security of other authentication techniques such as passwords or patterns. Using the meter, we propose various mandated compliance levels in which users are required to meet a certain level of strength: default (none), weak, fair, and strong. We validate the effect of gesture strength meter designs on security by performing a follow-up online study and applying the security framework and attacks established in the first study. The default policy improves gesture password security at a small cost in usability. This thesis concludes that gesture password meters can be an effective technique for improving the security of gesture authentication systems and that they deserve further study.

    Data Analytics for Automated Near Real Time Detection of Blockages in Smart Wastewater Systems

    Blockage events account for a substantial portion of the reported failures in the wastewater network, causing flooding, loss of service, environmental pollution and significant clean-up costs. Increasing telemetry in Combined Sewer Overflows (CSOs) provides the opportunity for near real-time data-driven modelling of the sewer network. The research work presented in this thesis describes the development and testing of a novel system designed for the automatic detection of blockages and other unusual events in near real-time. The methodology utilises an Evolutionary Artificial Neural Network (EANN) model for short-term CSO level predictions and Statistical Process Control (SPC) techniques to analyse unusual CSO level behaviour. The system is designed to mimic the work of a trained, experienced human technician in determining whether a blockage event has occurred. The detection system has been applied to real blockage events from a UK wastewater network. The results obtained illustrate that the methodology can identify different types of blockage events in a reliable and timely manner, and with a low number of false alarms. In addition, a model has been developed for the prediction of water levels in a CSO chamber and the generation of alerts for upcoming spill events. The model consists of a bi-model committee evolutionary artificial neural network (CEANN), composed of two EANN models optimised for wet and dry weather, respectively. The models are combined using a non-linear weighted averaging approach to overcome bias arising from imbalanced data. Both methodologies are designed to be generic and self-learning, so they can be applied to any CSO location without requiring input from a human operator. It is envisioned that the technology will allow utilities to respond proactively to developing blockage events, thus reducing potential harm to the sewer network and the surrounding environment.
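
    One common SPC technique, a Shewhart-style control chart on prediction residuals, can be sketched as follows. The abstract does not state which SPC rules the thesis uses, so the 3-sigma limits, the function names, and the use of a fault-free baseline window are all assumptions made for illustration; the predicted levels stand in for the output of any forecasting model.

```python
from statistics import mean, stdev
from typing import List, Sequence, Tuple


def control_limits(baseline_residuals: Sequence[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower and upper control limits: mean +/- k sample standard deviations.

    `baseline_residuals` are (observed - predicted) levels from a period
    assumed to be free of blockages; k = 3 is the conventional default.
    """
    centre = mean(baseline_residuals)
    spread = stdev(baseline_residuals)
    return centre - k * spread, centre + k * spread


def out_of_control(
    observed: Sequence[float],
    predicted: Sequence[float],
    limits: Tuple[float, float],
) -> List[int]:
    """Indices of time steps whose residual falls outside the control limits."""
    lower, upper = limits
    residuals = [o - p for o, p in zip(observed, predicted)]
    return [i for i, r in enumerate(residuals) if r < lower or r > upper]
```

    A practical detector would likely require several consecutive out-of-limit points before raising an alarm, which is one way to keep the false-alarm rate low.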

    Uloga mera sličnosti u analizi vremenskih serija

    The subject of this dissertation is a comprehensive overview and analysis of the impact of the Sakoe-Chiba global constraint on the most commonly used elastic similarity measures in time-series data mining, with a focus on classification accuracy. The choice of similarity measure is one of the most significant aspects of time-series analysis: it should correctly reflect the resemblance between data presented in the form of time series. Similarity measures are a critical component of many time-series mining tasks, including classification, clustering, prediction, and anomaly detection. The research covered by this dissertation focuses on several issues: 1. a review of the effects of global constraints on the performance of computing similarity measures; 2. a detailed analysis of the influence of constraining the elastic similarity measures on the accuracy of classical classification techniques; 3. an extensive study of the impact of different weighting schemes on the classification of time series; 4. development of an open-source library (Framework for Analysis and Prediction, FAP) that integrates the main techniques and methods required for the analysis and mining of time series, and which was used to carry out these experiments.
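
    To illustrate what a Sakoe-Chiba constraint does to an elastic measure, here is a minimal sketch of dynamic time warping restricted to a band around the diagonal. It is not code from the FAP library; the function name, the absolute-difference local cost, and the `window` parameter are assumptions chosen for the example.

```python
import math
from typing import Optional, Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float], window: Optional[int] = None) -> float:
    """DTW cost between a and b, optionally limited to a Sakoe-Chiba band.

    `window` is the maximum allowed |i - j| between aligned indices.
    None means unconstrained; 0 forces a point-by-point alignment.
    """
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # The band must be at least |n - m| wide, or no path reaches the end.
    w = max(n, m) if window is None else max(window, abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells inside the band are filled; the rest stay infinite.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] aligned with an earlier b element too
                cost[i][j - 1],      # b[j-1] aligned with an earlier a element too
                cost[i - 1][j - 1],  # both sequences advance together
            )
    return cost[n][m]
```

    Narrowing the band reduces the number of cells computed per pair of series and can also change the resulting distance, which is why the constraint can affect both running time and classification accuracy.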

    Modelagem simbólica de padrões morfológicos para classificação de séries temporais

    Advisor: Prof. Dr. Fabiano Silva. Thesis (Doctorate), Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defense: Curitiba, 14/09/2015. Includes references: f. 149-167.
    Abstract: The large amount of data stored over time, such as time series, has motivated the development of new approaches based on data mining methods. In this context, a new research area has emerged over the last two decades: time series data mining. In particular, approaches based on machine learning techniques have attracted great interest among researchers. Among data mining tasks, time series classification has been widely explored, and recent studies using non-symbolic learning algorithms have reported significant results in terms of classification accuracy. However, in applications that support decision making, such as medical diagnosis, industrial production control, and safety monitoring systems in aircraft or power plants, it is necessary to make the reasoning used in the classification process understandable. To this end, the shapelet primitive has been proposed in the literature as a descriptor of local morphological characteristics, which is closer to human perception when identifying patterns in time series. Nevertheless, most existing work on shapelets has been dedicated to developing approaches that are more efficient in terms of time and accuracy, disregarding the need for interpretable classifiers. In this work, we propose a method that uses the shapelet transformation to build symbolic models for time series classification. The method is based on a hybrid approach that combines the decision tree representation with the nearest neighbor algorithm. We also developed strategies to improve the representation quality of the shapelet transformation using feature selection algorithms. We performed an experimental evaluation comparing our proposals with algorithms considered state of the art, using datasets widely studied in the time series classification literature. Based on the results and analysis carried out in this thesis, we found that improving the shapelet representation allows the construction of interpretable and competitive classifiers. Moreover, we found that hybrid methods can help to provide symbolic models with performance equivalent or even superior to non-symbolic methods. Keywords: data mining. machine learning. time series. classification. symbolic models.
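
    The shapelet transformation mentioned above can be sketched as follows: each series is mapped to a vector of its minimum distances to a set of shapelets, and an interpretable model such as a decision tree can then be trained on those vectors. This is a simplified illustration, not the thesis method: the function names are hypothetical, the shapelets are taken as given rather than discovered, and the subsequence normalisation that shapelet methods commonly apply is omitted.

```python
import math
from typing import List, Sequence


def subsequence_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any window of the series."""
    length = len(shapelet)
    if length == 0 or length > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - length + 1):
        squared = sum((series[start + k] - shapelet[k]) ** 2 for k in range(length))
        best = min(best, squared)
    return math.sqrt(best)


def shapelet_transform(
    dataset: Sequence[Sequence[float]],
    shapelets: Sequence[Sequence[float]],
) -> List[List[float]]:
    """One row per series, one column per shapelet (distance to best match)."""
    return [[subsequence_distance(s, sh) for sh in shapelets] for s in dataset]
```

    Each column of the transformed data answers "how closely does this series contain that local shape?", so a split in a decision tree trained on it reads as a statement about the presence of a specific pattern.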