
    SE-shapelets: Semi-supervised Clustering of Time Series Using Representative Shapelets

    Shapelets, which discriminate time series using local features (subsequences), are promising for time series clustering. Existing time series clustering methods may fail to capture representative shapelets because they discover shapelets from a large pool of uninformative subsequences, which results in low clustering accuracy. This paper proposes a Semi-supervised Clustering of Time Series Using Representative Shapelets (SE-Shapelets) method, which uses a small number of labeled time series, together with pseudo-labeled time series obtained through label propagation, to help discover representative shapelets and thereby improve clustering accuracy. In SE-Shapelets, we propose two techniques to discover representative shapelets for the effective clustering of time series: 1) a salient subsequence chain (SSC), which extracts salient subsequences of a labeled or pseudo-labeled time series as candidate shapelets and thus removes a large number of uninformative subsequences from the pool; and 2) a linear discriminant selection (LDS) algorithm, which identifies shapelets that capture representative local features of time series in different classes, making the subsequent clustering straightforward. Experiments on UCR time series datasets demonstrate that SE-Shapelets discovers representative shapelets and achieves higher clustering accuracy than counterpart semi-supervised time series clustering methods.
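Shapelet-based clustering commonly represents each time series by its distance to each shapelet; whether SE-Shapelets uses exactly this representation is an assumption here. The minimal sketch below shows only that standard building block, not the SSC or LDS techniques: the distance between a series and a shapelet is the smallest Euclidean distance over all equal-length sliding windows. The function names are hypothetical, and, as a simplification, subsequences are not z-normalized (many shapelet methods do normalize them).

```python
import math
from typing import List, Sequence


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Minimum Euclidean distance between `shapelet` and every
    equal-length sliding window of `series` (no z-normalization)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best = math.inf
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, shapelet)))
        best = min(best, d)
    return best


def shapelet_transform(
    dataset: Sequence[Sequence[float]], shapelets: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Map each series to a vector with one distance per shapelet."""
    return [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]
```

For example, `shapelet_distance([0, 0, 1, 2, 1, 0, 0], [1, 2, 1])` returns 0.0 because the shapelet occurs exactly in the series. The resulting distance vectors can then be passed to any standard clustering algorithm.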

    Big Data Analysis for PV Applications

    With increasing photovoltaic (PV) installations, large amounts of time series data from utility-scale PV systems, such as meteorological data and string-level measurements, are collected [1, 2]. Due to fluctuations in irradiance and temperature, PV data are highly stochastic. The data also exhibit spatio-temporal differences with potential time-lagged correlation, because wind direction affects cloud movement [3]. When these variations are combined with differences between PV systems in power output and wiring configuration, as well as localised effects such as partial shading and module mismatch, the resulting lengthy time series from solar systems are highly multi-dimensional and challenging to process. In addition, these raw datasets can rarely be used directly because they may contain substantial noise and irrelevant information. Operating directly on the raw datasets is also difficult, especially when it comes to visualizing and analyzing them. Here the Pareto principle, better known as the 80/20 rule, commonly applies: researchers and solar engineers often spend most of their time collecting, cleaning, filtering, reducing and formatting the data. In this work, a data analytics algorithm is applied to mitigate some of these complexities and make sense of the large volumes of time series data from PV systems. Each time series is treated as an individual entity that can be characterized by a set of generic or application-specific features. This reduces the dimension of the data, i.e., from hundreds of samples per time series to a few descriptive features. It is also easier to visualize big time series data in the feature space than with traditional time series visualization methods such as the spaghetti plot and horizon plot, which are informative but do not scale well.
The time series data are processed through feature extraction and clustering to identify correspondences between specific measurements and the geographical locations of the PV systems. This characterisation of the time series data can be used for several PV applications, namely (1) PV fault identification, (2) PV network design, and (3) PV type pre-design for PV installations in locations with different geographical attributes.
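Treating each time series as an entity described by a few features can be sketched as follows. The feature set here (mean, standard deviation, minimum, maximum, and mean absolute step-to-step change) is a hypothetical generic choice for illustration, not the feature set used in this work; application-specific features would be added to the returned dictionary in the same way.

```python
import statistics
from typing import Dict, Sequence


def describe_series(values: Sequence[float]) -> Dict[str, float]:
    """Reduce one time series to a handful of generic descriptive features.

    The chosen features are illustrative assumptions, not the paper's set.
    """
    if not values:
        raise ValueError("series must be non-empty")
    diffs = [b - a for a, b in zip(values, values[1:])]
    return {
        "mean": statistics.fmean(values),
        "std": statistics.pstdev(values),  # population standard deviation
        "min": float(min(values)),
        "max": float(max(values)),
        # mean absolute change between consecutive samples: a rough volatility measure
        "mean_abs_change": sum(abs(d) for d in diffs) / len(diffs) if diffs else 0.0,
    }
```

Applied to every string-level measurement series, a function like this turns hundreds of samples per series into a five-dimensional vector that can be scatter-plotted or clustered directly.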

    Spatial Clustering Algorithm for Time Series Rainfall Data Using X-Means Data Splitting

    The aim of this study is to present a new spatial clustering process for time series data. Such clustering has become an important and demanding application when the data involve long chronological time series and huge datasets. A great challenge in clustering is to achieve an optimal solution when searching for similarity along the series. Furthermore, it also involves very large-scale data analysis. Unfortunately, existing time series clustering algorithms have become impractical because they do not scale well to longer time series. The performance of a clustering algorithm gets even worse if it operates on the actual (untransformed) data, and many clustering algorithms have difficulty handling high-dimensional data. In the case of spatial time series, the problem can be addressed with unsupervised approaches rather than supervised classification, combined with appropriate preprocessing techniques that transform the actual data. An unsupervised solution based on time series clustering can extract valuable information and identify structure in complex, massive datasets such as spatial time series. Therefore, a clustering algorithm that introduces a data transformation based on X-means data splitting is proposed to investigate the spatial homogeneity of time series rainfall data. Hierarchical clustering was used to demonstrate the similarity after the data were divided into training and testing sets. The proposed algorithm is compared with five data transformation techniques: the mean and median of monthly data, and the binary, cumulative, and actual values of daily data. The results indicate that, with hierarchical clustering, data transformation using X-means data splitting outperformed the other transformation techniques and was more consistent between the training and testing datasets based on similarity measures.
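The baseline transformations the study compares against can be illustrated with a small sketch (the "actual values" baseline is simply the untransformed daily series). The explicit days-per-month lengths and the wet-day threshold of 0 for the binary series are assumptions, the function names are hypothetical, and the X-means data splitting of the proposed method is not shown.

```python
import statistics
from itertools import accumulate
from typing import List, Sequence


def monthly_summary(
    daily: Sequence[float], days_per_month: Sequence[int], how: str = "mean"
) -> List[float]:
    """Aggregate a daily rainfall series into one value per month."""
    if how not in ("mean", "median"):
        raise ValueError("how must be 'mean' or 'median'")
    if sum(days_per_month) != len(daily):
        raise ValueError("month lengths must cover the daily series exactly")
    agg = statistics.fmean if how == "mean" else statistics.median
    out, start = [], 0
    for n_days in days_per_month:
        out.append(agg(daily[start:start + n_days]))
        start += n_days
    return out


def binary(daily: Sequence[float], threshold: float = 0.0) -> List[int]:
    """1 for a wet day (rainfall above the assumed threshold), 0 otherwise."""
    return [1 if x > threshold else 0 for x in daily]


def cumulative(daily: Sequence[float]) -> List[float]:
    """Running total of daily rainfall."""
    return list(accumulate(daily))
```

Each transformed series, one per rain gauge, would then be the input to hierarchical clustering; the transformations differ mainly in how much day-to-day variability they preserve.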

    Investigation Of Multi-Criteria Clustering Techniques For Smart Grid Datasets

    The processing of data arising from connected smart grid technology is an important area of research for the next-generation power system. The volume of data allows for increased awareness and efficiency of operation but poses challenges for analyzing the data and turning it into meaningful information. This thesis showcases the utility of clustering algorithms applied to three separate smart grid datasets and analyzes their ability to improve awareness and operational efficiency. Hierarchical clustering is identified as an appropriate method for fault and anomaly detection in phasor measurement unit (PMU) datasets. It showed an increase in anomaly detection efficiency according to the Dunn Index (DI) and more favorable computational requirements than currently employed techniques such as Density-Based Spatial Clustering of Applications with Noise (DBSCAN). The efficacy of betweenness-centrality (BC)-based clustering, used in a novel scheme for determining microgrids within large-scale bus systems, is demonstrated and compared against a number of other graph clustering algorithms. BC-based clustering yielded an overall decrease in economic dispatch cost compared with the other graph clustering methods. Additionally, the utility of BC for identifying critical buses is showcased. Finally, this work demonstrates the utility of partitional dynamic time warping (DTW) and k-shape clustering methods for classifying the power demand profiles of households with and without electric vehicles (EVs). DTW time-series clustering was compared against other time-series clustering methods and evaluated through demand forecasting with traditional and deep-learning techniques. Additionally, a novel process was developed for selecting an optimal time-series clustering scheme based on a scaled sum of cluster validity indices (CVIs).
Forecasting schemes based on DTW and k-shape demand profiles showed an overall increase in forecast accuracy. In summary, the use of clustering methods for three distinct types of smart grid datasets is demonstrated. Using clustering algorithms to process data can lead to methods that improve forecasting, economic dispatch, event detection, and overall system operation. Ultimately, the techniques demonstrated in this thesis give analytical insights and foster data-driven management and automation for the smart grid power systems of the future.
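DTW, the distance behind the partitional clustering described above, aligns two profiles by warping the time axis before summing differences, so a demand peak that occurs slightly earlier in one household can still match the same peak in another. A minimal dynamic-programming sketch, assuming absolute difference as the local cost (squared differences are another common choice) and no warping-window constraint; it runs in O(n·m) time:

```python
import math
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic dynamic time warping distance with |x - y| as local cost."""
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both series must be non-empty")
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # extend from a match, an insertion, or a deletion step
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

For example, `dtw_distance([0, 0, 5, 0], [0, 5, 0, 0])` is 0.0 because the shifted peaks are aligned, whereas the point-by-point Euclidean distance between the same two profiles is sqrt(50), about 7.07. In a partitional scheme, this distance would replace Euclidean distance in the step that assigns each profile to its nearest cluster center.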

    TCGAN: Convolutional Generative Adversarial Network for Time Series Classification and Clustering

    Recent works have demonstrated the superiority of supervised Convolutional Neural Networks (CNNs) in learning hierarchical representations from time series data for successful classification. These methods require sufficiently large labeled datasets for stable learning; however, acquiring high-quality labeled time series data can be costly and potentially infeasible. Generative Adversarial Networks (GANs) have achieved great success in enhancing unsupervised and semi-supervised learning. Nonetheless, to the best of our knowledge, it remains unclear how effectively GANs can serve as a general-purpose solution for learning representations for time series recognition, i.e., classification and clustering. These considerations motivated us to introduce a Time-series Convolutional GAN (TCGAN). TCGAN learns by playing an adversarial game between two one-dimensional CNNs (i.e., a generator and a discriminator) in the absence of label information. Parts of the trained TCGAN are then reused to construct a representation encoder that empowers linear recognition methods. We conducted comprehensive experiments on synthetic and real-world datasets. The results demonstrate that TCGAN is faster and more accurate than existing time-series GANs. The learned representations enable simple classification and clustering methods to achieve superior and stable performance. Furthermore, TCGAN retains high efficacy in scenarios with few-labeled and imbalanced-labeled data. Our work provides a promising path to effectively utilizing abundant unlabeled time series data.
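TCGAN's generator and discriminator are one-dimensional CNNs, and parts of the trained network are reused as a representation encoder. The sketch below shows only what one such convolutional layer computes when used as an encoder: a "valid" 1-D convolution (cross-correlation, as in deep-learning libraries), a ReLU, and global average pooling, producing one feature per kernel. The kernels are hand-picked for illustration rather than learned adversarially, and the pooling choice and function names are assumptions, not TCGAN's architecture.

```python
from typing import List, Sequence


def conv1d(
    x: Sequence[float], kernel: Sequence[float], bias: float = 0.0, stride: int = 1
) -> List[float]:
    """'Valid' 1-D cross-correlation, as computed by CNN conv layers."""
    k = len(kernel)
    if k == 0 or k > len(x):
        raise ValueError("kernel must be non-empty and no longer than the input")
    return [
        sum(x[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(0, len(x) - k + 1, stride)
    ]


def relu(values: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in values]


def encode(x: Sequence[float], kernels: Sequence[Sequence[float]]) -> List[float]:
    """One feature per kernel: conv -> ReLU -> global average pooling."""
    features = []
    for kernel in kernels:
        h = relu(conv1d(x, kernel))
        features.append(sum(h) / len(h))
    return features
```

With a rising-edge kernel `[-1, 1]` and a falling-edge kernel `[1, -1]`, `encode([0, 1, 2, 3, 3, 3, 3], [[-1, 1], [1, -1]])` yields `[0.5, 0.0]`. A linear classifier or k-means can then operate on such fixed-length vectors regardless of the original series length.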

    Sensor Relationship Inference in Single Resident Smart Homes Using Time Series

    Determining sensor relationships in smart environments is complex due to the variety and volume of the time series information that sensors provide. Moreover, identifying sensor relationships in order to connect sensors with actuators is difficult for smart home users who may not have technical experience. Yet gathering information on sensor relationships is a crucial intermediate step towards more advanced smart home applications such as advanced policy generation or automatic sensor configuration. Therefore, in this thesis, I propose a novel unsupervised learning approach, named SeReIn, that automatically groups sensors by their inherent relationships using only time series data from single resident smart homes. SeReIn extracts three features from smart home time series data: Frequent Next Event (FNE), Time Delta (TD), and Frequency (FQ). It then applies Spectral Clustering, K-Means clustering, and DBSCAN to group the related sensors. Because it relies on unsupervised learning, the approach can operate anywhere in the smart home domain regardless of sensor types and deployment scenarios. SeReIn works on both large deployments of around 70 sensors and small deployments of only 10 sensors. Evaluation of SeReIn on real-world smart home datasets has shown that it can recognize inherent spatial relationships. Using three unsupervised clustering evaluation metrics (the Calinski-Harabasz Score, Silhouette Score, and Davies-Bouldin Score), I verify that SeReIn successfully builds clusters based on sensor relationships.
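Of the three evaluation metrics, the Silhouette Score is the simplest to state: for each point, a is its mean distance to the other members of its own cluster, b is its smallest mean distance to the members of any other cluster, and its score is (b - a) / max(a, b). The dataset-level score is the mean over all points, so values near 1 indicate compact, well-separated clusters. A minimal sketch with Euclidean distance, where each point would be one sensor's feature vector; points in singleton clusters score 0, the same convention scikit-learn's `silhouette_score` follows:

```python
import math
from typing import Dict, List, Sequence


def silhouette_score(points: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    if len(points) != len(labels):
        raise ValueError("points and labels must have the same length")
    n_clusters = len(set(labels))
    if not 2 <= n_clusters <= len(points) - 1:
        raise ValueError("need 2 <= number of clusters <= n_samples - 1")
    scores = []
    for i, p in enumerate(points):
        # distances from p to every other point, grouped by cluster label
        by_cluster: Dict[int, List[float]] = {}
        for j, q in enumerate(points):
            if i != j:
                by_cluster.setdefault(labels[j], []).append(math.dist(p, q))
        own = by_cluster.get(labels[i], [])
        if not own:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for c, d in by_cluster.items() if c != labels[i])
        scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)
```

On four points `(0, 0), (0, 1), (10, 0), (10, 1)`, the natural grouping `[0, 0, 1, 1]` scores about 0.90, while the crossed grouping `[0, 1, 0, 1]` scores below zero, which is the signal such an evaluation relies on.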