    Identification and clustering of seasonality patterns for demand forecasting

    Time series are essential in various domains and applications. In retail business especially, demand forecasting is a crucial task for making appropriate business decisions. In this thesis we focus on a sub-problem of demand forecasting: we attempt to form clusters of products that reflect the products’ annual seasonality patterns. We believe that these clusters can help us build more accurate forecast models. The seasonality patterns are identified from weekly sales time series, which in many cases are very sparse and noisy. To separate the seasonality patterns from all the other factors contributing to a product’s sales, we build a pipeline that preprocesses the data accordingly. This pipeline first aggregates the sales of individual products over several stores to strengthen the sales signal, and then solves a regularized weighted least squares objective to smooth the aggregates. Finally, the seasonality patterns are extracted using the STL decomposition procedure. These seasonality patterns are then used as input for the k-means algorithm and several hierarchical agglomerative clustering algorithms. We evaluate the clusters using two distinct approaches. In the first approach we manually label a subset of the data and compare these labeled subsets against the clusters produced by the clustering algorithms. In the second approach we build a simple forecast model that fits the clusters’ seasonality patterns back to the observed sales time series of individual products. In this approach we also build a secondary validation forecast model with the same objective, but instead of the clusters produced by the algorithms, it uses predetermined product categories as the clusters. 
These product categories should naturally provide a valid baseline for groups of products with similar seasonality, as they reflect how similar products are placed close together in physical stores. Our results indicate that we were able to find clear seasonal structure in the clusters. In particular, the k-means algorithm and the hierarchical agglomerative clustering algorithms with complete linkage and Ward’s method formed reasonable clusters, whereas hierarchical agglomerative clustering with single linkage proved unsuitable for our data.
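The extraction-and-clustering stage of this pipeline can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the thesis's implementation: the seasonal profile is computed by plain per-position averaging over complete cycles (a simplified stand-in for STL, which additionally separates trend and remainder with local regression), the store aggregation and regularized weighted least squares smoothing steps are omitted, and the helper names `seasonal_profile` and `kmeans`, the toy data, and the period of 4 (weekly data with an annual cycle would use 52) are hypothetical.

```python
from statistics import fmean


def seasonal_profile(series, period):
    """Average each position of the cycle over all complete cycles and
    remove the overall level, leaving only the shape of the season.
    Simplified stand-in for the seasonal component of STL (assumption)."""
    n_cycles = len(series) // period
    if n_cycles == 0:
        raise ValueError("series is shorter than one period")
    trimmed = series[: n_cycles * period]
    profile = [fmean(trimmed[i::period]) for i in range(period)]
    level = fmean(profile)
    return [p - level for p in profile]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # next seed: the point farthest from every centroid chosen so far
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # assignment step: nearest centroid for every point
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [fmean(col) for col in zip(*members)]
    return labels


# Hypothetical toy data with period 4: two "summer" and two "winter" products
summer = [[1, 5, 1, 0] * 3, [2, 6, 2, 1] * 3]
winter = [[5, 1, 0, 1] * 3, [7, 2, 1, 2] * 3]
profiles = [seasonal_profile(s, 4) for s in summer + winter]
print(kmeans(profiles, k=2))  # [0, 0, 1, 1]
```

Removing the level before clustering is what lets a high-volume and a low-volume product with the same annual shape land in the same cluster.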

    Wide-Area Measurement-Based Applications for Power System Monitoring and Dynamic Modeling

    Due to the increasingly complex behavior exhibited by large-scale power systems as more uncertain renewables are introduced to the grid, wide-area measurement systems (WAMS) have been used to complement the traditional supervisory control and data acquisition (SCADA) system and improve operators’ situational awareness. By providing wide-area GPS-time-synchronized measurements of grid status at high time resolution, WAMS can reveal power system dynamics that could not be captured before, and it has become an essential tool for dealing with current and future power grid challenges. Depending on their time requirements, power system applications can be roughly divided into online applications (e.g., data visualization, fast disturbance and oscillation detection, and system response prediction and reduction) and offline applications (e.g., measurement-driven dynamic modeling and validation, post-event analysis, and statistical analysis of historical data). In this dissertation, various wide-area measurement-based applications are presented. First, a pioneering WAMS deployed at the distribution level, the frequency monitoring network (FNET/GridEye), is introduced. For conventional large-scale power grid dynamic simulation, the two major challenges are 1) the accuracy of detailed dynamic models, and 2) the computational burden of online dynamic assessment. To overcome the limitations of the traditional approach, a measurement-based system response prediction tool using a Multivariate AutoRegressive (MAR) model is developed. It is followed by a measurement-based power system dynamic reduction tool that uses an autoregressive model to represent the external system. In addition, phasor measurement unit (PMU) data are employed in a generator dynamic model validation study, which uses both simulation data and measurement data to explore the potential and limitations of the proposed approach. 
As an innovative application of wide-area power system measurements, digital recordings can be authenticated by comparing the frequency and phase angle extracted from the recordings with a power system measurement database. This part comprises four research studies: oscillator error removal, electric network frequency (ENF) phenomenology, tampering detection, and frequency localization. Finally, several preliminary data analytics studies are presented, including inertia estimation and analysis, fault-induced delayed voltage recovery (FIDVR) detection, and statistical analysis of an oscillation database.
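Measurement-driven response prediction replaces detailed dynamic simulation with an autoregressive model fitted to recorded signals and rolled forward in time. The sketch below assumes the simplest multivariate case: a first-order model x_t = A x_{t-1} over two measured channels, fitted by ordinary least squares as A = (Σ x_t x_{t-1}^T)(Σ x_{t-1} x_{t-1}^T)^{-1}. The model order, the two channels, the synthetic data, and the names `fit_var1` and `predict` are illustrative assumptions; the dissertation's MAR tool may use higher orders, more channels, and a different estimator.

```python
def fit_var1(series):
    """Least-squares fit of a 2-channel first-order autoregressive model
    x_t = A x_{t-1}. `series` is a list of (a, b) samples; returns A."""
    s1 = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_t x_{t-1}^T
    s0 = [[0.0, 0.0], [0.0, 0.0]]  # sum of x_{t-1} x_{t-1}^T
    for prev, cur in zip(series, series[1:]):
        for i in range(2):
            for j in range(2):
                s1[i][j] += cur[i] * prev[j]
                s0[i][j] += prev[i] * prev[j]
    det = s0[0][0] * s0[1][1] - s0[0][1] * s0[1][0]
    if abs(det) < 1e-12:
        raise ValueError("history does not excite both channels")
    inv = [[s0[1][1] / det, -s0[0][1] / det],
           [-s0[1][0] / det, s0[0][0] / det]]
    # A = S1 * S0^{-1}
    return [[sum(s1[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def predict(a, x, steps):
    """Roll the model forward `steps` samples starting from state x."""
    out = []
    for _ in range(steps):
        x = (a[0][0] * x[0] + a[0][1] * x[1], a[1][0] * x[0] + a[1][1] * x[1])
        out.append(x)
    return out


true_a = [[0.9, 0.1], [-0.2, 0.8]]  # hypothetical coupling between two channels
history = [(1.0, 0.0)] + predict(true_a, (1.0, 0.0), 20)
a_hat = fit_var1(history)
forecast = predict(a_hat, history[-1], 5)
print(a_hat)
```

Because the synthetic history is noise-free, least squares recovers the true coefficient matrix; with real PMU or FNET data the fit would be approximate and the model order would need to be selected.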

    Feature-based Time Series Analytics

    Time series analytics is a fundamental prerequisite for decision-making and automation, and it occurs in several applications such as energy load control, weather research, and consumer behavior analysis. It encompasses time series engineering, i.e., representing time series so that their important characteristics are captured, and data mining, i.e., applying that representation to a specific task. Driven by the extensive data gathering that results from the "Industry 4.0" vision and its shift towards automation and digitalization, time series analytics is undergoing a revolution. Big datasets with very long time series are being gathered, which challenges existing engineering techniques. Traditionally, one focus has been on raw-data-based or shape-based engineering. These techniques assess the similarity of time series in shape, which is only suitable for short time series. Another focus has been on model-based engineering, which assesses the similarity of time series in structure; it suits long time series but requires large models or time-consuming modeling. Feature-based engineering tackles these challenges by representing time series efficiently and comparing their similarity in structure. However, current feature-based techniques are unsatisfactory because they are designed for specific data-mining tasks. In this work, we introduce a novel feature-based engineering technique that efficiently provides a short representation of time series, focusing on their structural similarity. Based on a design rationale, we derive important time series characteristics, such as long-term and cyclically repeated characteristics as well as distribution and correlation characteristics. Moreover, we define a feature-based distance measure for comparing them. Both the representation technique and the distance measure have desirable properties regarding storage and runtime. 
Subsequently, we introduce techniques based on our feature-based engineering and apply them to important data-mining tasks such as time series generation, time series matching, time series classification, and time series clustering. First, our feature-based generation technique outperforms state-of-the-art techniques regarding the accuracy of evolved datasets. Second, with our features, a matching method retrieves a match for a time series query much faster than with current representations. Third, our features provide discriminative characteristics that classify datasets as accurately as state-of-the-art techniques, but orders of magnitude faster. Finally, our features recommend an appropriate clustering of time series, which is crucial for subsequent data-mining tasks. All these techniques are assessed on datasets from the energy, weather, and economic domains, demonstrating their applicability to real-world use cases. The findings demonstrate the versatility of our feature-based engineering and suggest several courses of action for designing and improving analytical systems for the paradigm shift of Industry 4.0.
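A feature-based representation maps each series, whatever its length, to a short fixed-length vector, and similarity is then measured between vectors rather than between raw points. The sketch below is not the technique introduced in this work: the five features (mean and standard deviation for distribution, lag-1 autocorrelation for correlation, autocorrelation at the seasonal lag for cyclic repetition, and a least-squares slope for the long-term trend), the Euclidean distance, and the function names are illustrative, hypothetical choices.

```python
import math
from statistics import fmean, pstdev


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag (0.0 for a flat series)."""
    m = fmean(x)
    num = sum((a - m) * (b - m) for a, b in zip(x, x[lag:]))
    den = sum((a - m) ** 2 for a in x)
    return num / den if den else 0.0


def trend_slope(x):
    """Ordinary least-squares slope of x against its time index."""
    n = len(x)
    t_mean = (n - 1) / 2
    x_mean = fmean(x)
    num = sum((t - t_mean) * (v - x_mean) for t, v in enumerate(x))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def features(x, period):
    """Map a series of any length to a fixed 5-element vector:
    distribution (mean, spread), correlation (lag-1 autocorrelation),
    cyclic repetition (seasonal-lag autocorrelation), long-term trend."""
    return [fmean(x), pstdev(x), autocorr(x, 1), autocorr(x, period), trend_slope(x)]


def feature_distance(f, g):
    """Euclidean distance between two feature vectors."""
    return math.dist(f, g)


short = [(0, 1, 0, -1)[i % 4] for i in range(96)]
long_ = [(0, 1, 0, -1)[i % 4] for i in range(192)]  # same structure, twice as long
ramp = [i / 10 for i in range(96)]                   # pure trend, no cycle
fs, fl, fr = (features(s, 4) for s in (short, long_, ramp))
print(feature_distance(fs, fl) < feature_distance(fs, fr))  # True
```

The two periodic series differ in length but end up close in feature space, while the ramp does not; in practice the features would also need scaling so that level-type features such as the mean do not dominate the distance.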

    PROGNOSIS - Historical Pattern Matching for Economic Forecasting and Trading

    In recent years, financial markets have become complex environments that change continuously and quickly. The strong link between this continuous change and the danger of losing money when trading has made financial studies a domain that attracts increasing scientific and business attention. In this context, developing computational techniques that can monitor recent financial events, process them according to their similarity with historical data recordings, and support financial decision making is a challenging problem. The principal idea of this work for tackling this problem is to integrate 'current' market information, as derived from the market's recent past, with historical information. A robust technique based on flexible pattern matching, segmented data representations, time warping, and time series embedding dimension measures is proposed. Complementary features derived from the time series, concerning trend structures, temporal considerations, and statistical measures, are systematically combined in this technique. All these components have been integrated into a software package, which I called PROGNOSIS, that can selectively monitor its application and allows systematic evaluation in terms of financial forecasting and trading performance. In addition, two other topics are discussed in this thesis. First, in Chapter 3, a neural network known as the Growing Neural Gas network is employed for financial forecasting and trading. To my knowledge, this network has never been applied to financial problems before. Based on it, a neural network forecasting and trading benchmark was constructed for comparison purposes. Second, the last chapter of the thesis proposes a novel approach to the well-established co-integration theory. This method enhances co-integration theory by integrating into it local time relations between two time series. 
These local time dependencies are identified using dynamic time warping. The hypothesis tested is that local time shifts, delays, shrinks, or stretches, if identified, may help reveal co-integrating movement between the two time series. I called this type of co-integration time-warped co-integration. The time-warped co-integration framework is presented as an error correction model and is tested on arbitrage trading opportunities within PROGNOSIS.
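Dynamic time warping aligns two series by letting one index advance while the other stays put, which is how local delays, shrinks, and stretches become visible. A minimal sketch of the classic dynamic-programming formulation follows, assuming absolute difference as the local cost and no warping-window constraint; it is not the PROGNOSIS implementation, and the function names and example series are hypothetical.

```python
import math


def dtw_matrix(a, b):
    """Cumulative-cost matrix of classic DTW with |x - y| as the local cost."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: match (diagonal), step in a only, step in b only
            d[i][j] = cost + min(d[i - 1][j - 1], d[i - 1][j], d[i][j - 1])
    return d


def dtw(a, b):
    """Return (DTW distance, warping path as 0-based index pairs)."""
    d = dtw_matrix(a, b)
    i, j = len(a), len(b)
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        # step back to the cheapest predecessor cell (diagonal preferred on ties)
        i, j = min([(i - 1, j - 1), (i - 1, j), (i, j - 1)], key=lambda c: d[c[0]][c[1]])
        path.append((i - 1, j - 1))
    path.reverse()
    return d[len(a)][len(b)], path


price = [0, 1, 2, 3, 2, 1, 0]
lagged = [0, 0, 1, 2, 3, 2, 1, 0]  # same movement, delayed by one step
dist, path = dtw(price, lagged)
print(dist)      # 0.0
print(path[:3])  # [(0, 0), (0, 1), (1, 2)]
```

The pair (0, 1) in the path maps the extra leading value of `lagged` onto the first point of `price`, which is exactly the one-step delay that a lockstep comparison would penalize.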

    Multivariate Correlation Discovery in Streaming Data


    Integrated data-driven techniques for environmental pollution monitoring

    The adverse health effects of tropospheric ozone around urban zones pose a substantial risk for many segments of the population. This makes short-term forecasts necessary so that evasive action can be taken on days conducive to ozone formation. It is therefore important to study ozone formation mechanisms and predict ozone levels in a geographic region. Multivariate statistical techniques provide a very effective framework for the classification and monitoring of systems with multiple variables. Cluster analysis, sequence analysis, and hidden Markov models (HMMs) are statistical methods that have been used in a wide range of studies to model data structure. In this dissertation, we formulate, implement, and apply a data-driven computational framework for air quality monitoring and forecasting, with application to ozone formation. The proposed framework integrates, in a unique way, advanced statistical data processing and analysis tools to investigate ozone formation mechanisms and predict ozone levels in a geographic region. This dissertation focuses on cluster analysis for identifying and classifying the underlying mechanisms of a system, and on HMMs for predicting the occurrence of an extreme event in a system. The usefulness of the proposed methodology for air quality monitoring is demonstrated by applying it to the ozone problem in the Houston, Texas and Baton Rouge, Louisiana regions. Hierarchical clustering is used to visualize air flow patterns at two time scales relevant for ozone buildup. First, clustering is performed at the hourly time scale to identify surface flow patterns. Then, sequencing is performed at the daily time scale to identify groups of days sharing similar diurnal cycles of surface flow. Selecting appropriate numbers of air flow patterns allowed inference of regional transport and dispersion patterns for understanding population exposure to ozone. 
This dissertation proposes to build HMMs for ozone prediction using air quality and meteorological measurements obtained from a network of surface monitors. The case study of the Houston, Texas region for the 2004 and 2005 ozone seasons indicates the capability of HMMs as a simpler forecasting tool.
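An HMM forecast of this kind filters the hidden regime from past observations with the forward algorithm and then applies the transition matrix once more to obtain tomorrow's regime probabilities. In the sketch below, the two hidden states (low- and high-ozone regimes), the binary meteorological observation symbols, and every probability are hypothetical; a model like the dissertation's would estimate its parameters from the surface monitor network's data.

```python
def normalize(v):
    """Scale a list of non-negative weights so they sum to 1."""
    total = sum(v)
    return [x / total for x in v]


def filter_states(obs, start, trans, emit):
    """Normalized forward algorithm: P(hidden state today | observations so far)."""
    n = len(start)
    belief = normalize([start[j] * emit[j][obs[0]] for j in range(n)])
    for o in obs[1:]:
        # propagate yesterday's belief through the transition matrix ...
        prior = [sum(belief[i] * trans[i][j] for i in range(n)) for j in range(n)]
        # ... then weight each state by how well it explains today's symbol
        belief = normalize([prior[j] * emit[j][o] for j in range(n)])
    return belief


def prob_next_state(belief, trans, state):
    """One-step-ahead probability that tomorrow is in `state`."""
    return sum(belief[i] * trans[i][state] for i in range(len(belief)))


# Hypothetical model: state 0 = low-ozone regime, state 1 = high-ozone regime.
# Observation symbols: 0 = cool/windy day, 1 = hot/stagnant day.
START = [0.8, 0.2]
TRANS = [[0.9, 0.1], [0.3, 0.7]]
EMIT = [[0.7, 0.3], [0.2, 0.8]]
HIGH = 1

forecasts = [prob_next_state(filter_states(days, START, TRANS, EMIT), TRANS, HIGH)
             for days in ([1], [1, 1], [1, 1, 1])]
print([round(p, 3) for p in forecasts])
```

Each additional hot, stagnant day raises the predicted probability of a high-ozone regime tomorrow, which is the kind of signal that could trigger evasive action.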