Efficient In-Database Maintenance of ARIMA Models
Forecasting is an important analysis task, and there is a need to integrate time series models and estimation methods into database systems. The main issue is the computationally expensive maintenance of model parameters when new data is inserted. In this paper, we examine how an important class of time series models, the AutoRegressive Integrated Moving Average (ARIMA) models, can be maintained with respect to inserts. To this end, we propose a novel approach, on-demand estimation, for the efficient maintenance of maximum likelihood estimates from numerically implemented estimators. We present an extensive experimental evaluation on both real and synthetic data, which shows that our approach yields a substantial speedup while sacrificing only a limited amount of predictive accuracy.
A mobile seismograph array
A movable array composed of Ranger-type seismometers, 60-day film recorders and 7-day magnetic tape recorders, housed in compact trailers, has been developed. The array is useful for research requiring frequent instrument relocation, such as P-delay, micro-seismicity, aftershock and signal-to-noise ratio studies.
The array unit combines the functions found in conventional fixed stations with a high degree of mobility. Conveniences such as solid state amplifiers, radio and clock circuitry, internal calibration, and minimum installation time are special features.
With the battery supply provided, a one-week period of unattended film and tape recording is possible. With commercial power, the instruments can operate unattended for up to sixty days.
Useful magnification up to several million is available, depending on the frequency band selected.
Sample-Based Forecasting Exploiting Hierarchical Time Series
Time series forecasting is challenging because sophisticated forecast models are computationally expensive to build. Recent research has addressed the integration of forecasting inside a DBMS. One main benefit is that models can be created once and then repeatedly used to answer forecast queries. Forecast queries are often submitted on higher aggregation levels, e.g., forecasts of sales over all locations. To answer such a forecast query, we have two possibilities. First, we can aggregate all base time series (sales in Austria, sales in Belgium...) and create only one model for the aggregate time series. Second, we can create models for all base time series and aggregate the base forecast values. The second possibility might lead to a higher accuracy, but it is usually too expensive due to the high number of base time series. However, we do not actually need all base models to achieve a high accuracy; a sample of base models is enough. With this approach, we still achieve a better accuracy than an aggregate model, very similar to using all models, but we need to create and maintain fewer models in the database. We further improve this approach for the case that new actual values of the base time series arrive at different points in time. With each new actual value, we can refine the aggregate forecast and eventually converge towards the real actual value. Our experimental evaluation using several real-world data sets shows a high accuracy of our approaches and a fast convergence towards the optimal value with increasing sample sizes and an increasing number of actual values, respectively.
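The idea of answering an aggregate forecast query from a sample of base models can be illustrated with a minimal Python sketch. It is not the method of the paper: the last-value forecast, the scale-up of the sampled sum by the ratio of population size to sample size, and all function names are assumptions made purely for illustration.

```python
import random


def naive_forecast(series):
    """Hypothetical stand-in for a real forecast model: repeat the last value."""
    return series[-1]


def sample_based_aggregate_forecast(base_series, sample_size, seed=0):
    """Estimate the forecast of the sum over all base series from a sample.

    Assumption: the sampled forecasts are summed and scaled by N / k,
    where N is the number of base series and k the sample size.
    """
    names = sorted(base_series)
    rng = random.Random(seed)
    sampled = rng.sample(names, sample_size)
    sample_sum = sum(naive_forecast(base_series[name]) for name in sampled)
    return sample_sum * len(names) / sample_size


def refined_aggregate(base_series, actuals, sample_size, seed=0):
    """Replace forecasts by actual values wherever they have already arrived.

    Series with a known actual value contribute exactly; the remaining
    series are still estimated from a sample of base models.
    """
    known = sum(actuals.values())
    remaining = {n: s for n, s in base_series.items() if n not in actuals}
    if not remaining:
        return known
    k = min(sample_size, len(remaining))
    return known + sample_based_aggregate_forecast(remaining, k, seed)


# Made-up toy data: four base series, one per location.
base = {"AT": [1, 2, 3], "BE": [2, 3, 4], "CH": [3, 3, 3], "DE": [5, 5, 5]}
estimate = sample_based_aggregate_forecast(base, sample_size=2)
refined = refined_aggregate(base, {"AT": 4, "BE": 5}, sample_size=2)
```

In this sketch the estimate becomes exact once an actual value is known for every base series, which mirrors the convergence behaviour described in the abstract in a very simplified form.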
Topology-aware optimization of big sparse matrices and matrix multiplications on main-memory systems
Since data sizes of analytical applications are continuously growing, many data scientists are switching from customized micro-solutions to scalable alternatives, such as statistical and scientific databases. However, many algorithms in data mining and science are expressed in terms of linear algebra, which is barely supported by major database vendors and big data solutions. On the other hand, conventional linear algebra algorithms and legacy matrix representations are often not suitable for very large matrices. We propose a strategy for large matrix processing on modern multicore systems that is based on a novel, adaptive tile matrix representation (AT MATRIX). Our solution utilizes multiple techniques inspired by database technology, such as multidimensional data partitioning, cardinality estimation, indexing, dynamic rewrites, and many more, in order to optimize the execution time. Building on this representation, we present a matrix multiplication operator, ATMULT, which outperforms alternative approaches. The aim of our solution is to relieve data scientists of the burden of selecting appropriate algorithms and matrix storage representations. We evaluated AT MATRIX together with ATMULT on several real-world and synthetic random matrices.
Bringing Linear Algebra Objects to Life in a Column-Oriented In-Memory Database
Large numeric matrices and multidimensional data arrays appear in many science domains, as well as in applications of financial and business warehousing. Common applications include eigenvalue determination of large matrices, which decomposes into a set of linear algebra operations. With the rise of in-memory databases, it is now feasible to execute these complex analytical queries directly in a relational database system, without the need to transfer data out of the system and without being restricted by hard disc latencies for random accesses. In this paper, we present a way to integrate linear algebra operations and large matrices as first-class citizens into an in-memory database following a two-layered architectural model. The architecture consists of a logical component, which receives manipulation statements and linear algebra expressions, and a physical layer, which autonomously administers multiple matrix storage representations. A cost-based hybrid storage representation is presented, and an experimental implementation is evaluated for matrix-vector multiplications.
F2DB: The Flash-Forward Database System
Forecasts are important to decision-making and risk assessment in many domains. Since current database systems do not provide integrated support for forecasting, it is usually done outside the database system by specially trained experts using forecast models. However, integrating model-based forecasting as a first-class citizen inside a DBMS speeds up the forecasting process by avoiding exporting the data and by applying database-related optimizations like reusing created forecast models. It especially allows subsequent processing of forecast results inside the database. In this demo, we present our prototype F2DB based on PostgreSQL, which allows for transparent processing of forecast queries. Our system automatically takes care of model maintenance when the underlying dataset changes. In addition, we offer optimizations to save maintenance costs and increase accuracy by using derivation schemes for multidimensional data. Our approach reduces the required expert knowledge by enabling arbitrary users to apply forecasting in a declarative way
Topical TMPRSS2 inhibition prevents SARS-CoV-2 infection in differentiated human airway cultures
Background There are limited effective prophylactic/early treatments for SARS-CoV-2 infection. Viral entry requires spike protein binding to the ACE2 receptor and cleavage by TMPRSS2, a cell surface serine protease. Targeting of TMPRSS2 by either androgen blockade or direct inhibition is in clinical trials in early SARS-CoV-2 infection.
Methods We used differentiated primary human airway epithelial cells at the air-liquid interface to test the impact of targeting TMPRSS2 on the prevention of SARS-CoV-2 infection.
Results We first modelled the systemic delivery of compounds. Enzalutamide, an oral androgen receptor antagonist, had no impact on SARS-CoV-2 infection. By contrast, camostat mesylate, an orally available serine protease inhibitor, blocked SARS-CoV-2 entry. However, oral camostat is rapidly metabolised in the circulation, with poor airway bioavailability. We therefore modelled local airway administration by applying camostat to the apical surface of differentiated airway cultures. We demonstrated that a brief exposure to topical camostat effectively restricts SARS-CoV-2 infection.
Conclusion These experiments demonstrate a potential therapeutic role for topical camostat for pre- or post-exposure prophylaxis of SARS-CoV-2, which can now be evaluated in a clinical trial.
SARS-CoV-2/human/Liverpool/REMRQ0001/2020 was a kind gift from Lance Turtle (University of Liverpool) and David Matthews and Andrew Davidson (University of Bristol). SARS-CoV-2 England/ATACCC 174/2020 was a kind gift from Greg Towers (University College London), and we are also grateful to Ajit Lalvani, Jake Dunning, Maria Zambon and colleagues at Public Health England and Giada Mattiuzzo at the National Institute for Biological Standards and Controls and Wendy Barclay and Jonathan Brown and all colleagues in the United Kingdom Research Institute funded collaboration Genotype to Phenotype. Sheep anti-SARS-CoV-2 nucleoprotein antibody (DA114) was a kind gift from Paul Davies (obtained from MRC PPU Reagents and Services, University of Dundee). LNCaP cells were a kind gift from Charlie Massie. We gratefully acknowledge the support from Dr Ravindra Mahadeva and Ms Jacqui Galloway in establishing the primary cells from patients. We are grateful for the generous support of the UKRI COVID Immunology Consortium, Addenbrooke’s Charitable Trust (15/20A) and the NIHR Cambridge Biomedical Research Centre. This work was supported by a Wellcome Trust Principal Research Fellowship (084957/Z/08/Z) and MRC research grant MR/V011561/1 to P.J.L. This work was supported by the NC3Rs NC/S001204/1 project grant and the Roy Castle Lung Cancer Foundation grant (2015/10/McCaughan) to FM.
This paper presents independent research supported by the NIHR Cambridge BRC. The NIHR Cambridge Biomedical Research Centre (BRC) is a partnership between Cambridge University Hospitals NHS Foundation Trust and the University of Cambridge, funded by the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care
SLACID - Sparse Linear Algebra in a Column-Oriented In-Memory Database System
Scientific computations and analytical business applications are often based on linear algebra operations on large, sparse matrices. With the hardware shift of the primary storage from disc into memory, it is now feasible to execute linear algebra queries directly in the database engine. This paper presents and compares different approaches to storing sparse matrices in an in-memory column-oriented database system. We show that a system layout derived from the compressed sparse row representation integrates well with a columnar database design and that the resulting architecture is moreover amenable to a wide range of non-numerical use cases when dictionary encoding is used. Dynamic matrix manipulation operations, like online insertion or deletion of elements, are not covered by most linear algebra frameworks. Therefore, we present a hybrid architecture that consists of a read-optimized main and a write-optimized delta structure, and we evaluate the performance for dynamic sparse matrix workloads by applying workflows from nuclear science and network graphs.
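As a rough illustration of the main/delta idea, the following Python sketch keeps a compressed sparse row (CSR) "main" as three flat columns and buffers updates in a dictionary-based "delta". It is a toy model under stated assumptions, not the system described in the paper: the class name HybridSparseMatrix, its methods, and the convention that a stored zero in the delta marks a deletion are all hypothetical, and dictionary encoding is left out.

```python
class HybridSparseMatrix:
    """Toy main/delta sparse matrix (hypothetical, for illustration only)."""

    def __init__(self, n_rows, n_cols, triples):
        self.n_rows, self.n_cols = n_rows, n_cols
        self.delta = {}  # write-optimized part: (row, col) -> value
        self._build_main(triples)

    def _build_main(self, triples):
        # Read-optimized part: CSR stored as three flat "columns".
        # Assumes at most one triple per (row, col) position.
        entries = sorted((r, c, v) for r, c, v in triples if v != 0)
        self.col_idx = [c for _, c, _ in entries]
        self.values = [v for _, _, v in entries]
        self.row_ptr = [0] * (self.n_rows + 1)
        for r, _, _ in entries:
            self.row_ptr[r + 1] += 1  # count entries per row
        for r in range(self.n_rows):
            self.row_ptr[r + 1] += self.row_ptr[r]  # prefix sums

    def set(self, row, col, value):
        """Insert, update, or (with value 0) delete an element via the delta."""
        self.delta[(row, col)] = value

    def _main_entries(self):
        for r in range(self.n_rows):
            for k in range(self.row_ptr[r], self.row_ptr[r + 1]):
                yield r, self.col_idx[k], self.values[k]

    def matvec(self, x):
        """Compute y = A x over main and delta; delta entries take precedence."""
        y = [0.0] * self.n_rows
        for r, c, v in self._main_entries():
            if (r, c) not in self.delta:
                y[r] += v * x[c]
        for (r, c), v in self.delta.items():
            y[r] += v * x[c]
        return y

    def merge(self):
        """Fold the delta into a freshly built main and clear the delta."""
        combined = {(r, c): v for r, c, v in self._main_entries()}
        combined.update(self.delta)
        self.delta = {}
        self._build_main((r, c, v) for (r, c), v in combined.items())


# Made-up 2 x 3 example matrix.
m = HybridSparseMatrix(2, 3, [(0, 0, 1.0), (0, 2, 2.0), (1, 1, 3.0)])
before = m.matvec([1.0, 1.0, 1.0])
m.set(1, 1, 0.0)  # delete element (1, 1)
m.set(1, 0, 4.0)  # insert element (1, 0)
after = m.matvec([1.0, 1.0, 1.0])
m.merge()
```

Reads have to consult both structures until merge() is called, which is the usual trade-off of such a design: writes stay cheap because the CSR arrays are never modified in place, at the price of a periodic rebuild.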