Search CORE

52 research outputs found

DROP: Dimensionality Reduction Optimization for Time Series

Author: Bailis Peter
Suri Sahaana
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Dimensionality reduction is a critical step in scaling machine learning pipelines. Principal component analysis (PCA) is a standard tool for dimensionality reduction, but performing PCA over a full dataset can be prohibitively expensive. As a result, theoretical work has studied the effectiveness of iterative, stochastic PCA methods that operate over data samples. However, termination conditions for stochastic PCA either execute for a predetermined number of iterations, or until convergence of the solution, frequently sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to efficiently terminate after operating over small (e.g., 1%) subsamples of input data, reducing whole workload runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over Singular-Value-Decomposition-based PCA techniques, and exceeds conventional approaches like FFT and PAA by up to 16x in end-to-end workloads

arXiv.org e-Print Archive

Crossref

Deep integration of machine learning Into column stores

Author: Holanda P.T. (Pedro)
Manegold S. (Stefan)
Mühleisen H.F. (Hannes)
Raasveldt M. (Mark)
Publication venue
Publication date: 01/01/2018
Field of study

We leverage vectorized User-Defined Functions (UDFs) to efficiently integrate unchanged machine learning pipelines into an analytical data management system. The entire pipelines including data, models, parameters and evaluation outcomes are stored and executed inside the database system. Experiments using our MonetDB/Python UDFs show greatly improved performance due to reduced data movement and parallel processing opportunities. In addition, this integration enables meta-analysis of models using relational queries

CWI's Institutional Repository

Leiden University Scholary Publications