424 research outputs found

    Efficient In-Database Maintenance of ARIMA Models

    Forecasting is an important analysis task, and there is a need to integrate time series models and estimation methods into database systems. The main issue is the computationally expensive maintenance of model parameters when new data is inserted. In this paper, we examine how an important class of time series models, the AutoRegressive Integrated Moving Average (ARIMA) models, can be maintained with respect to inserts. To this end, we propose a novel approach, on-demand estimation, for the efficient maintenance of maximum likelihood estimates obtained from numerically implemented estimators. We present an extensive experimental evaluation on both real and synthetic data, which shows that our approach yields a substantial speedup while sacrificing only a limited amount of predictive accuracy.
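
    A minimal sketch of the on-demand idea (not the paper's actual algorithm; the AR(1) model, the conditional likelihood, and all names below are illustrative simplifications): inserts only buffer new values and mark the model stale, and the numerical estimator is re-run, warm-started from the previous parameter estimate, only when a forecast is actually requested.

```python
# Hedged sketch of on-demand maintenance for a simple AR(1) model; class name,
# model choice, and the conditional likelihood are illustrative assumptions.
import numpy as np
from scipy.optimize import minimize

class OnDemandAR1:
    def __init__(self, series):
        self.y = list(series)
        self.phi = 0.0        # current parameter estimate
        self.stale = True     # set by inserts, cleared by re-estimation

    def insert(self, value):
        # cheap: no numerical re-estimation on the insert path
        self.y.append(value)
        self.stale = True

    @staticmethod
    def _neg_loglik(phi, y):
        # conditional Gaussian log-likelihood of an AR(1), up to constants
        resid = y[1:] - phi[0] * y[:-1]
        return 0.5 * len(resid) * np.log(np.mean(resid ** 2))

    def forecast(self, h=1):
        if self.stale:
            # on demand: warm-start the optimizer from the old estimate
            y = np.asarray(self.y, dtype=float)
            res = minimize(self._neg_loglik, x0=[self.phi], args=(y,))
            self.phi = float(res.x[0])
            self.stale = False
        last = self.y[-1]
        return [last * self.phi ** k for k in range(1, h + 1)]
```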

    A mobile seismograph array

    A movable array composed of Ranger-type seismometers, 60-day film recorders, and 7-day magnetic tape recorders housed in compact trailers has been developed. The array is useful for research requiring frequent instrument relocation, such as P-delay, micro-seismicity, aftershock, and signal-to-noise ratio studies. The array unit combines the functions found in conventional fixed stations with a high degree of mobility. Conveniences such as solid-state amplifiers, radio and clock circuitry, internal calibration, and minimum installation time are special features. With the battery supply provided, a one-week period of unattended film and tape recording is possible. With commercial power, the instruments can operate unattended for up to sixty days. Useful magnification up to several million is available, depending on the frequency band selected.

    Sample-Based Forecasting Exploiting Hierarchical Time Series

    Time series forecasting is challenging, as sophisticated forecast models are computationally expensive to build. Recent research has addressed the integration of forecasting inside a DBMS. One main benefit is that models can be created once and then repeatedly used to answer forecast queries. Forecast queries are often submitted on higher aggregation levels, e.g., forecasts of sales over all locations. To answer such a forecast query, we have two possibilities. First, we can aggregate all base time series (sales in Austria, sales in Belgium...) and create only one model for the aggregate time series. Second, we can create models for all base time series and aggregate the base forecast values. The second possibility might lead to higher accuracy, but it is usually too expensive due to the high number of base time series. However, we actually do not need all base models to achieve high accuracy; a sample of base models is enough. With this approach, we still achieve better accuracy than an aggregate model, very close to using all models, but fewer models need to be created and maintained in the database. We further improve this approach when new actual values of the base time series arrive at different points in time. With each new actual value, we can refine the aggregate forecast and eventually converge towards the real actual value. Our experimental evaluation on several real-world data sets shows the high accuracy of our approaches and a fast convergence towards the optimal value with increasing sample sizes and increasing numbers of actual values, respectively.
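
    A rough sketch of the sampling idea (the proportional scaling by historic share is my own simplification, not the paper's estimator): only a sample of the base time series is forecast with individual models, and the summed sample forecast is scaled up to the level of the full aggregate.

```python
# Hedged sketch: forecast a random sample of base series and scale up; the
# scaling rule and all names are illustrative assumptions.
import random

def sample_based_aggregate_forecast(base_series, forecast_one, sample_ratio=0.2):
    """base_series: dict mapping series name -> list of historic values
       forecast_one: callable(history) -> one-step-ahead forecast for one series"""
    names = list(base_series)
    k = max(1, int(sample_ratio * len(names)))
    sample = random.sample(names, k)

    # build and apply models only for the sampled base series
    sample_forecast = sum(forecast_one(base_series[n]) for n in sample)

    # scale the sample forecast by the sample's historic share of the total
    sample_history = sum(sum(base_series[n]) for n in sample)
    total_history = sum(sum(s) for s in base_series.values())
    return sample_forecast * (total_history / sample_history)
```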

    Topology-aware optimization of big sparse matrices and matrix multiplications on main-memory systems

    Since the data sizes of analytical applications are continuously growing, many data scientists are switching from customized micro-solutions to scalable alternatives, such as statistical and scientific databases. However, many algorithms in data mining and science are expressed in terms of linear algebra, which is barely supported by major database vendors and big data solutions. On the other hand, conventional linear algebra algorithms and legacy matrix representations are often not suitable for very large matrices. We propose a strategy for large matrix processing on modern multicore systems that is based on a novel, adaptive tile matrix representation (AT MATRIX). Our solution uses multiple techniques inspired by database technology, such as multidimensional data partitioning, cardinality estimation, indexing, and dynamic rewrites, to optimize execution time. Based thereon, we present the matrix multiplication operator ATMULT, which outperforms alternative approaches. The aim of our solution is to relieve data scientists of the burden of selecting appropriate algorithms and matrix storage representations. We evaluated AT MATRIX together with ATMULT on several real-world and synthetic random matrices.
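
    The tiling idea in a few lines (tile size and density threshold are placeholders, not AT MATRIX's actual heuristics): the matrix is split into fixed-size tiles, and each non-empty tile is stored dense or sparse depending on its local density, so that a multiplication operator can later pick a kernel per tile pair.

```python
# Hedged sketch of density-adaptive tiling; names and thresholds are assumptions.
from scipy.sparse import csr_matrix

def tile_matrix(A, tile=1024, dense_threshold=0.05):
    """Split A into fixed-size tiles; return dict (row_offset, col_offset) ->
       tile stored as a dense ndarray or a CSR matrix, depending on density."""
    A = csr_matrix(A)
    tiles = {}
    for i in range(0, A.shape[0], tile):
        for j in range(0, A.shape[1], tile):
            block = A[i:i + tile, j:j + tile]
            if block.nnz == 0:
                continue  # empty tiles are not materialized at all
            density = block.nnz / (block.shape[0] * block.shape[1])
            tiles[(i, j)] = block.toarray() if density > dense_threshold else block
    return tiles
```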

    Bringing Linear Algebra Objects to Life in a Column-Oriented In-Memory Database

    Large numeric matrices and multidimensional data arrays appear in many science domains, as well as in applications of financial and business warehousing. Common applications include eigenvalue determination of large matrices, which decomposes into a set of linear algebra operations. With the rise of in-memory databases, it is now feasible to execute these complex analytical queries directly in a relational database system without transferring data out of the system or being restricted by hard disk latencies for random accesses. In this paper, we present a way to integrate linear algebra operations and large matrices as first-class citizens into an in-memory database following a two-layered architectural model. The architecture consists of a logical component receiving manipulation statements and linear algebra expressions, and of a physical layer, which autonomously manages multiple matrix storage representations. A cost-based hybrid storage representation is presented, and an experimental implementation is evaluated for matrix-vector multiplications.
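
    A toy illustration of the general direction (not the described two-layer architecture): a matrix kept as three value columns, the way a column store might hold a (row, col, val) relation, with a matrix-vector multiplication expressed as a scan over those columns.

```python
# Hedged sketch of a columnar (row, col, val) matrix; names are assumptions.
import numpy as np

class ColumnarMatrix:
    def __init__(self, rows, cols, vals, shape):
        self.rows = np.asarray(rows)                 # one array per column
        self.cols = np.asarray(cols)
        self.vals = np.asarray(vals, dtype=float)
        self.shape = shape

    def matvec(self, x):
        # gather x by the col column, multiply, scatter-add by the row column
        x = np.asarray(x, dtype=float)
        y = np.zeros(self.shape[0])
        np.add.at(y, self.rows, self.vals * x[self.cols])
        return y

A = ColumnarMatrix(rows=[0, 0, 2], cols=[1, 2, 0], vals=[3.0, 1.0, 5.0], shape=(3, 3))
print(A.matvec([1.0, 2.0, 3.0]))   # -> [9. 0. 5.]
```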

    SLACID - Sparse Linear Algebra in a Column-Oriented In-Memory Database System

    Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the shift of primary storage from disk to memory, it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches to storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row (CSR) representation integrates well with a columnar database design, and that the resulting architecture is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, such as online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main structure and a write-optimized delta structure, and we evaluate its performance for dynamic sparse matrix workloads using workflows from nuclear science and network graphs.
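
    A compact sketch of a main/delta split (not SLACID's actual storage layout or interfaces): the read-optimized main is a CSR matrix, inserts append to a write-optimized coordinate-list delta, and a merge periodically folds the delta into the main.

```python
# Hedged sketch of a read-optimized main plus write-optimized delta; all names
# are illustrative, not the system's interfaces.
import numpy as np
from scipy.sparse import csr_matrix, coo_matrix

class MainDeltaMatrix:
    def __init__(self, shape):
        self.shape = shape
        self.main = csr_matrix(shape)   # read-optimized, rebuilt on merge
        self.delta = ([], [], [])       # (rows, cols, vals), append-only

    def insert(self, i, j, v):
        # cheap append; no CSR restructuring on the insert path
        r, c, d = self.delta
        r.append(i); c.append(j); d.append(v)

    def merge(self):
        # fold the delta into the read-optimized main structure
        r, c, d = self.delta
        self.main = (self.main + coo_matrix((d, (r, c)), shape=self.shape)).tocsr()
        self.delta = ([], [], [])

    def matvec(self, x):
        # queries read both structures; the delta part is a short scan
        x = np.asarray(x, dtype=float)
        y = self.main @ x
        for i, j, v in zip(*self.delta):
            y[i] += v * x[j]
        return y
```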

    F2DB: The Flash-Forward Database System

    Forecasts are important to decision-making and risk assessment in many domains. Since current database systems do not provide integrated support for forecasting, it is usually done outside the database system by specially trained experts using forecast models. However, integrating model-based forecasting as a first-class citizen inside a DBMS speeds up the forecasting process by avoiding data export and by applying database-related optimizations such as reusing created forecast models. In particular, it allows subsequent processing of forecast results inside the database. In this demo, we present our prototype F2DB, based on PostgreSQL, which allows for transparent processing of forecast queries. Our system automatically takes care of model maintenance when the underlying dataset changes. In addition, we offer optimizations to reduce maintenance costs and increase accuracy by using derivation schemes for multidimensional data. Our approach reduces the required expert knowledge by enabling arbitrary users to apply forecasting in a declarative way.
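
    A toy sketch of the "create once, reuse" idea (not F2DB's architecture or query syntax): the first forecast query for a series builds a model, later queries reuse it, and changes to the underlying data only flag the cached model for maintenance until it is queried again.

```python
# Hedged sketch of forecast model reuse; names and the model.forecast(horizon)
# interface are illustrative assumptions, not F2DB's API.
class ForecastModelPool:
    def __init__(self, build_model):
        self.build_model = build_model   # callable(history) -> fitted model
        self.cache = {}                  # series key -> (model, is_stale)

    def on_insert(self, key):
        # data changed: defer maintenance until the model is queried again
        if key in self.cache:
            model, _ = self.cache[key]
            self.cache[key] = (model, True)

    def forecast(self, key, history, horizon):
        model, stale = self.cache.get(key, (None, True))
        if model is None or stale:
            model = self.build_model(history)   # (re)build only when needed
            self.cache[key] = (model, False)
        return model.forecast(horizon)
```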

    Technology Time Machine 2012: Paving the Path for the Future Technology Developments
