Search CORE

6,505 research outputs found

DROP: Dimensionality Reduction Optimization for Time Series

Author: Bailis Peter
Suri Sahaana
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2019
Field of study

Dimensionality reduction is a critical step in scaling machine learning pipelines. Principal component analysis (PCA) is a standard tool for dimensionality reduction, but performing PCA over a full dataset can be prohibitively expensive. As a result, theoretical work has studied the effectiveness of iterative, stochastic PCA methods that operate over data samples. However, termination conditions for stochastic PCA either execute for a predetermined number of iterations, or until convergence of the solution, frequently sampling too many or too few datapoints for end-to-end runtime improvements. We show how accounting for downstream analytics operations during DR via PCA allows stochastic methods to efficiently terminate after operating over small (e.g., 1%) subsamples of input data, reducing whole workload runtime. Leveraging this, we propose DROP, a DR optimizer that enables speedups of up to 5x over Singular-Value-Decomposition-based PCA techniques, and exceeds conventional approaches like FFT and PAA by up to 16x in end-to-end workloads

arXiv.org e-Print Archive

Crossref

Data Driven Surrogate Based Optimization in the Problem Solving Environment WBCSim

Author: Deshpande S.
Kamke F. A.
Ramakrishnan N.
Shu J.
Watson L.T.
Publication venue
Publication date: 01/01/2009
Field of study

Large scale, multidisciplinary, engineering designs are always difficult due to the complexity and dimensionality of these problems. Direct coupling between the analysis codes and the optimization routines can be prohibitively time consuming due to the complexity of the underlying simulation codes. One way of tackling this problem is by constructing computationally cheap(er) approximations of the expensive simulations, that mimic the behavior of the simulation model as closely as possible. This paper presents a data driven, surrogate based optimization algorithm that uses a trust region based sequential approximate optimization (SAO) framework and a statistical sampling approach based on design of experiment (DOE) arrays. The algorithm is implemented using techniques from two packages—SURFPACK and SHEPPACK that provide a collection of approximation algorithms to build the surrogates and three different DOE techniques—full factorial (FF), Latin hypercube sampling (LHS), and central composite design (CCD)—are used to train the surrogates. The results are compared with the optimization results obtained by directly coupling an optimizer with the simulation code. The biggest concern in using the SAO framework based on statistical sampling is the generation of the required database. As the number of design variables grows, the computational cost of generating the required database grows rapidly. A data driven approach is proposed to tackle this situation, where the trick is to run the expensive simulation if and only if a nearby data point does not exist in the cumulatively growing database. Over time the database matures and is enriched as more and more optimizations are performed. Results show that the proposed methodology dramatically reduces the total number of calls to the expensive simulation runs during the optimization process

Computer Science Technical Reports @Virginia Tech

CiteSeerX

Neural Networks for Modeling and Control of Particle Accelerators

Author: Biedron S. G.
Chase B. E.
Edelen A. L.
Edstrom D.
Milton S. V.
Stabile P.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 19/10/2016
Field of study

We describe some of the challenges of particle accelerator control, highlight recent advances in neural network techniques, discuss some promising avenues for incorporating neural networks into particle accelerator control systems, and describe a neural network-based control system that is being developed for resonance control of an RF electron gun at the Fermilab Accelerator Science and Technology (FAST) facility, including initial experimental results from a benchmark controller.Comment: 21 p

arXiv.org e-Print Archive

CERN Document Server

Data mining as a tool for environmental scientists

Author: Athanasiadis Ioannis
Comas Joaquim
Frank Eibe
Gibert Karina
Letcher Rebecca
Spate Jessica
Sànchez-Marrè Miquel
Publication venue: International Environmental Modelling and Software Society
Publication date: 01/01/2006
Field of study

Over recent years a huge library of data mining algorithms has been developed to tackle a variety of problems in fields such as medical imaging and network traffic analysis. Many of these techniques are far more flexible than more classical modelling approaches and could be usefully applied to data-rich environmental problems. Certain techniques such as Artificial Neural Networks, Clustering, Case-Based Reasoning and more recently Bayesian Decision Networks have found application in environmental modelling while other methods, for example classification and association rule extraction, have not yet been taken up on any wide scale. We propose that these and other data mining techniques could be usefully applied to difficult problems in the field. This paper introduces several data mining concepts and briefly discusses their application to environmental modelling, where data may be sparse, incomplete, or heterogenous

Research Commons@Waikato