
    Efficient In-Database Maintenance of ARIMA Models

    Forecasting is an important analysis task, and there is a need to integrate time series models and estimation methods into database systems. The main issue is the computationally expensive maintenance of model parameters when new data is inserted. In this paper, we examine how an important class of time series models, the AutoRegressive Integrated Moving Average (ARIMA) models, can be maintained with respect to inserts. To this end, we propose a novel approach, on-demand estimation, for the efficient maintenance of maximum likelihood estimates from numerically implemented estimators. We present an extensive experimental evaluation on both real and synthetic data, which shows that our approach yields a substantial speedup while sacrificing only a limited amount of predictive accuracy.
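
    As a rough illustration of the on-demand idea (defer re-estimation until a forecast is actually requested, and warm-start the optimizer from the previous estimate), consider the Python sketch below. The class and its warm-start policy are assumptions for illustration; statsmodels merely stands in for a numerically implemented maximum likelihood estimator.

```python
# Hypothetical sketch of "on-demand estimation": inserts are cheap, and the
# maximum likelihood fit is refreshed lazily, warm-started from the previous
# parameter estimate. Not the paper's implementation.
from statsmodels.tsa.arima.model import ARIMA

class OnDemandARIMA:
    def __init__(self, series, order=(1, 1, 1)):
        self.order = order
        self.series = list(series)
        self.params = None   # last ML estimate, reused as optimizer start values
        self.stale = True    # set on insert, cleared after re-estimation

    def insert(self, value):
        # Inserts only append data and mark the model as stale.
        self.series.append(value)
        self.stale = True

    def forecast(self, steps=1):
        if self.stale:
            model = ARIMA(self.series, order=self.order)
            # Warm-starting from the previous estimate typically needs far
            # fewer optimizer iterations than fitting from scratch.
            res = model.fit(start_params=self.params)
            self.params = res.params
            self._res = res
            self.stale = False
        return self._res.forecast(steps=steps)
```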

    F2DB: The Flash-Forward Database System

    Forecasts are important to decision-making and risk assessment in many domains. Since current database systems do not provide integrated support for forecasting, it is usually done outside the database system by specially trained experts using forecast models. However, integrating model-based forecasting as a first-class citizen inside a DBMS speeds up the forecasting process by avoiding data export and by applying database-related optimizations such as reusing created forecast models. In particular, it allows subsequent processing of forecast results inside the database. In this demo, we present our prototype F2DB, based on PostgreSQL, which allows for transparent processing of forecast queries. Our system automatically takes care of model maintenance when the underlying dataset changes. In addition, we offer optimizations that save maintenance costs and increase accuracy by using derivation schemes for multidimensional data. Our approach reduces the required expert knowledge by enabling arbitrary users to apply forecasting in a declarative way.
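
    F2DB itself lives inside PostgreSQL; the following Python sketch only illustrates the model-reuse idea behind transparent forecast queries. All names (ForecastQueryProcessor, on_insert) are hypothetical, and the drop-on-insert maintenance policy is a deliberate simplification of the system's cheaper incremental strategies.

```python
# Minimal sketch (not F2DB's code) of answering forecast queries from a
# model cache: a model is created once, reused by later queries, and
# invalidated when the underlying data changes.
class ForecastQueryProcessor:
    def __init__(self, fit_model):
        self.fit_model = fit_model   # e.g. a function fitting an ARIMA model
        self.models = {}             # (table, column) -> fitted model

    def forecast(self, table, column, data, steps):
        key = (table, column)
        if key not in self.models:   # create the model once, reuse afterwards
            self.models[key] = self.fit_model(data)
        return self.models[key].forecast(steps)

    def on_insert(self, table, column):
        # Maintenance hook: simply drop the model so the next query rebuilds
        # it; the real system applies cheaper incremental maintenance.
        self.models.pop((table, column), None)
```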

    Sample-Based Forecasting Exploiting Hierarchical Time Series

    Time series forecasting is challenging, as sophisticated forecast models are computationally expensive to build. Recent research has addressed the integration of forecasting inside a DBMS. One main benefit is that models can be created once and then repeatedly used to answer forecast queries. Forecast queries are often submitted on higher aggregation levels, e.g., forecasts of sales over all locations. To answer such a forecast query, we have two possibilities. First, we can aggregate all base time series (sales in Austria, sales in Belgium, ...) and create only one model for the aggregate time series. Second, we can create models for all base time series and aggregate the base forecast values. The second possibility might lead to a higher accuracy, but it is usually too expensive due to the high number of base time series. However, we do not actually need all base models to achieve a high accuracy; a sample of base models is enough. With this approach, we still achieve a better accuracy than an aggregate model, very similar to using all models, but we need to create and maintain fewer models in the database. We further improve this approach when new actual values of the base time series arrive at different points in time: with each new actual value, we can refine the aggregate forecast and eventually converge towards the real actual value. Our experimental evaluation using several real-world data sets shows a high accuracy of our approaches and a fast convergence towards the optimal value with increasing sample sizes and increasing numbers of actual values, respectively.
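
    A minimal sketch of the sampling idea, assuming each base model exposes a forecast(steps) method and that simple inverse-rate scaling is an adequate estimator for the aggregate; the paper's refinement with newly arriving actual values is omitted here.

```python
import random

def sampled_aggregate_forecast(base_series, fit_model, steps,
                               sample_frac=0.2, seed=0):
    """Forecast the aggregate of many base time series from a sample of base
    models, scaling the sampled forecast sum up to the full population.
    Assumption: fit_model(series) returns an object with .forecast(steps)."""
    rng = random.Random(seed)
    n = max(1, int(sample_frac * len(base_series)))
    sample = rng.sample(base_series, n)
    forecasts = [fit_model(s).forecast(steps) for s in sample]
    # Sum the sampled base forecasts and scale by the inverse sampling rate.
    scale = len(base_series) / n
    return [scale * sum(f[h] for f in forecasts) for h in range(steps)]
```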

    Indexing forecast models for matching and maintenance

    Forecasts are important to decision-making and risk assessment in many domains. There has been recent interest in integrating forecast queries inside a DBMS. Answering a forecast query requires the creation of forecast models. Creating a forecast model is an expensive process and may require several scans over the base data as well as expensive operations to estimate model parameters. However, if forecast queries are issued repeatedly, answer times can be reduced significantly if forecast models are reused. Due to the possibly high number of forecast queries, existing models need to be found quickly. Therefore, we propose a model index that efficiently stores forecast models and allows for the efficient reuse of existing ones. Our experiments illustrate that the model index incurs negligible overhead for update transactions while yielding significant improvements during query execution.
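
    A dictionary-based sketch of the lookup/registration interface such an index could expose; the key structure and the coarse invalidation on updates are assumptions, not the paper's design.

```python
# Illustrative model index: fitted forecast models are stored under a key
# derived from the query (relation, aggregation level, model class), so a
# later forecast query can locate a reusable model quickly.
class ModelIndex:
    def __init__(self):
        self._index = {}

    @staticmethod
    def key(relation, group_by, model_class):
        return (relation, tuple(sorted(group_by)), model_class)

    def lookup(self, relation, group_by, model_class):
        return self._index.get(self.key(relation, group_by, model_class))

    def register(self, relation, group_by, model_class, model):
        self._index[self.key(relation, group_by, model_class)] = model

    def invalidate(self, relation):
        # Called by update transactions; kept cheap so the overhead on
        # inserts stays negligible.
        for k in [k for k in self._index if k[0] == relation]:
            del self._index[k]
```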

    Hybrid Data-Flow Graphs for Procedural Domain-Specific Query Languages

    Domain-specific query languages (DSQLs) let users express custom business logic. Relational databases provide only a limited set of options to execute business logic: usually stored procedures or a series of queries combined with glue code. Both methods have drawbacks, and business logic is therefore often still executed on the application side, transferring large amounts of data between application and database, which is expensive. We translate a DSQL into a hybrid data-flow execution plan containing relational operators mixed with procedural ones. A cost model is used to drive the translation towards an optimal mixture of relational and procedural plan operators.
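
    A toy illustration of the cost-driven translation: for each DSQL construct, keep whichever of the relational or procedural implementation has the lower estimated cost. The cost formulas and numbers below are invented purely for illustration.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    kind: str        # "relational" or "procedural"
    construct: str
    cost: float

def estimate_relational_cost(construct, rows):
    # Illustrative cost model: set-oriented operators pay a setup cost but
    # scale well with data size.
    return 10.0 + 0.001 * rows

def estimate_procedural_cost(construct, rows):
    # Row-at-a-time logic is cheap to start but pays per processed row.
    return 1.0 + 0.1 * rows

def choose_operator(construct, rows):
    rel = estimate_relational_cost(construct, rows)
    proc = estimate_procedural_cost(construct, rows)
    if rel <= proc:
        return Operator("relational", construct, rel)
    return Operator("procedural", construct, proc)

# For small inputs the procedural variant wins; for bulk data the
# relational one does.
print(choose_operator("filter", rows=50))       # procedural
print(choose_operator("filter", rows=100_000))  # relational
```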

    Exploiting big data in time series forecasting: A cross-sectional approach

    Forecasting time series data is an integral component of management, planning and decision making. Following the Big Data trend, large amounts of time series data are available from many heterogeneous data sources in more and more application domains. The highly dynamic and often fluctuating character of these domains, in combination with the logistic problems of collecting such data from a variety of sources, imposes new challenges on forecasting. Traditional approaches heavily rely on extensive and complete historical data to build time series models and are thus no longer applicable if time series are short or, even more importantly, intermittent. In addition, large numbers of time series have to be forecast on different aggregation levels with preferably low latency, while forecast accuracy should remain high. This is almost impossible when keeping the traditional focus on creating one forecast model for each individual time series. In this paper we tackle these challenges by presenting a novel forecasting approach called cross-sectional forecasting. This method is especially designed for Big Data sets with a multitude of time series. Our approach breaks with existing concepts by creating only one model for a whole set of time series and by requiring only a fraction of the available data to provide accurate forecasts. By utilizing available data from all time series of a data set, missing values can be compensated and accurate forecasting results can be calculated quickly on arbitrary aggregation levels.
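
    A minimal sketch of the one-model-for-many-series idea: lagged windows from all series are pooled into a single training set, so one model can forecast any series, even a short or intermittent one. The linear model and lag count are stand-in assumptions, not the paper's method.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def fit_cross_sectional(series_set, lags=3):
    """One model for a whole set of time series: training examples are
    (last `lags` values -> next value) windows pooled across all series,
    so sparse history in one series is compensated by the others."""
    X, y = [], []
    for s in series_set:
        for t in range(lags, len(s)):
            X.append(s[t - lags:t])
            y.append(s[t])
    return LinearRegression().fit(np.array(X), np.array(y))

def forecast_next(model, series, lags=3):
    # Works even for a short series, as long as `lags` values exist.
    return model.predict(np.array([series[-lags:]]))[0]
```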

    Global Slope Change Synopses for Measurement Maps

    Quality control using scalar quality measures is standard practice in manufacturing. However, there are also quality measures that are determined at a large number of positions on a product, since the spatial distribution is important. We denote such a mapping of local coordinates on the product to values of a measure as a measurement map. In this paper, we examine how measurement maps can be clustered according to a novel notion of similarity, mapscape similarity, which considers the overall course of the measure on the map. We present a class of synopses called global slope change that uses the profile of the measure along several lines from a reference point to different points on the border to represent a measurement map. We conduct an evaluation of global slope change using a real-world data set from manufacturing and demonstrate its superiority over other synopses.
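
    A hedged sketch of a synopsis in this spirit: sample the map along rays from a reference point to the border and keep the slope changes of each profile. The ray construction and the use of second differences are assumptions, not the paper's exact definition.

```python
import numpy as np

def global_slope_change(grid, n_rays=16, n_steps=32):
    """Illustrative synopsis for a measurement map given as a 2-D grid:
    sample the measure along rays from a reference point (here: the map
    centre) to points on the border and record how the slope changes
    along each ray."""
    h, w = grid.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    synopsis = []
    for k in range(n_rays):
        angle = 2 * np.pi * k / n_rays
        ts = np.linspace(0.0, 1.0, n_steps)
        # Walk from the reference point towards the border along the ray.
        ys = np.clip((cy + ts * cy * np.sin(angle)).astype(int), 0, h - 1)
        xs = np.clip((cx + ts * cx * np.cos(angle)).astype(int), 0, w - 1)
        profile = grid[ys, xs]
        # Second differences approximate the slope changes along the ray.
        synopsis.append(np.diff(profile, n=2))
    return np.array(synopsis)
```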

    DYNAMIC DISPERSION MODELLING OF ODOURS AND AEROSOLS

    The transmission of dust particles is one of the interesting processes in the dispersion of aerosols. Because it is impossible to follow the track of every single particle, many effects and their parameters must be known to simulate the dispersion. A dynamic model to simulate the dispersion of odours was previously presented at the Harmo 7 conference. This model is based on a numerical solution of the Navier-Stokes equations. Building on this effort, the dispersion model was enhanced so that it is now possible to simulate the dispersion of aerosol particles. Extensive modifications were necessary to consider the aerodynamic and physical characteristics of polydisperse aerosols. Effects such as sedimentation, deposition, resuspension and agglomeration of aerosols are or will be integrated into the simulation model. To validate such a complex dispersion model, our research group is developing two independent aerosol tracer systems. Primary attention is paid to the environmental compatibility of the tracer dust. Both procedures are based on fluorescence-marked particles, but they differ from each other with regard to their methods of detection. This enables us to apply both procedures at the same time. Both the dispersion model and the validation methods are subjects of this paper.
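
    One of the listed effects, sedimentation, can be illustrated with the textbook Stokes settling-velocity formula; this is a standard physical result, not the paper's dispersion model.

```python
# Stokes' law for the terminal settling velocity of a small sphere:
# v = (rho_p - rho_f) * g * d**2 / (18 * mu)
def stokes_settling_velocity(d, rho_p, rho_f=1.2, mu=1.8e-5, g=9.81):
    """d: particle diameter [m], rho_p: particle density [kg/m^3],
    rho_f: fluid density [kg/m^3] (default: air),
    mu: dynamic viscosity [Pa s] (default: air)."""
    return (rho_p - rho_f) * g * d ** 2 / (18 * mu)

# A 10 um dust particle of density 2000 kg/m^3 settles at roughly 6 mm/s.
print(stokes_settling_velocity(d=10e-6, rho_p=2000.0))
```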

    Clustering Uncertain Data with Possible Worlds

    The topic of managing uncertain data has been explored in many ways. Different methodologies for data storage and query processing have been proposed. As the availability of management systems grows, research on analytics of uncertain data is gaining in importance. Similar to the challenges faced in the field of data management, algorithms for mining uncertain data also suffer a high performance degradation compared to their counterparts for certain data. To overcome this problem, the MCDB approach was developed for uncertain data management based on possible-worlds semantics. As this methodology shows significant performance and scalability enhancements, we adopt it for mining uncertain data. In this paper, we introduce a clustering methodology for uncertain data and illustrate current issues with this approach within the field of clustering uncertain data.
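
    A sketch of the MCDB-style possible-worlds recipe applied to clustering: sample concrete worlds from the uncertain data, cluster each world with a standard certain-data algorithm, and collect the per-world results for aggregation. The Gaussian error model and k-means are illustrative choices, not the paper's specific method.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_possible_worlds(sample_world, n_worlds=100, k=3, seed=0):
    """Draw `n_worlds` possible worlds (concrete data sets) from the
    uncertain data, cluster each one, and return the per-world labels.
    Aggregating these results is the open issue the paper discusses."""
    rng = np.random.default_rng(seed)
    results = []
    for _ in range(n_worlds):
        world = sample_world(rng)                        # one possible world
        km = KMeans(n_clusters=k, n_init=10).fit(world)  # certain-data step
        results.append(km.labels_)
    return results

# Example: uncertain points as means with a Gaussian error model.
means = np.array([[0, 0], [0, 1], [5, 5], [5, 6], [9, 0], [9, 1]], float)
draw = lambda rng: means + rng.normal(scale=0.3, size=means.shape)
labels_per_world = cluster_possible_worlds(draw, n_worlds=20)
```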

    Drift-Aware Ensemble Regression

    Regression models are often required for controlling production processes by predicting parameter values. However, the implicit assumption of standard regression techniques, namely that the data set used for parameter estimation comes from a stationary joint distribution, may not hold in this context, because manufacturing processes are subject to physical changes like wear and aging, denoted as process drift. This can cause the estimated model to deviate significantly from the current state of the modeled system. In this paper, we discuss the problem of estimating regression models from drifting processes and present ensemble regression, an approach that maintains a set of regression models estimated from different ranges of the data set and weights them according to their predictive performance. We extensively evaluate our approach on synthetic and real-world data.
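
    A compact sketch of the ensemble idea, assuming fixed window sizes and inverse-error weighting on the most recent observations; both choices are illustrative rather than the paper's exact scheme.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

class EnsembleRegression:
    """Models fitted on different (recent) ranges of the data, combined
    with weights derived from their error on the newest observations, so
    that models matching the current process state dominate."""
    def __init__(self, window_sizes=(50, 100, 200)):
        self.window_sizes = window_sizes
        self.models = []

    def fit(self, X, y):
        self.models = [LinearRegression().fit(X[-w:], y[-w:])
                       for w in self.window_sizes]
        # Weight each model by its inverse error on the most recent data.
        recent_X, recent_y = X[-20:], y[-20:]
        errors = [np.mean((m.predict(recent_X) - recent_y) ** 2) + 1e-9
                  for m in self.models]
        w = np.array([1.0 / e for e in errors])
        self.weights = w / w.sum()
        return self

    def predict(self, X):
        preds = np.stack([m.predict(X) for m in self.models])
        return self.weights @ preds
```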