2,259 research outputs found
Optimizing Photonic Nanostructures via Multi-fidelity Gaussian Processes
We apply numerical methods in combination with finite-difference-time-domain
(FDTD) simulations to optimize transmission properties of plasmonic mirror
color filters using a multi-objective figure of merit over a five-dimensional
parameter space by utilizing a novel multi-fidelity Gaussian process approach.
We compare these results with conventional derivative-free global search
algorithms, such as a (single-fidelity) Gaussian process optimization scheme
and Particle Swarm Optimization, a method commonly used in the nanophotonics
community and implemented in the Lumerical commercial photonics software. We
demonstrate the performance of various numerical optimization approaches on
several pre-collected real-world datasets and show that by properly trading off
expensive information sources with cheap simulations, one can more effectively
optimize the transmission properties with a fixed budget.
Comment: NIPS 2018 Workshop on Machine Learning for Molecules and Materials.
arXiv admin note: substantial text overlap with arXiv:1811.0075
Hierarchical Bin Buffering: Online Local Moments for Dynamic External Memory Arrays
Local moments are used for local regression, to compute statistical measures
such as sums, averages, and standard deviations, and to approximate probability
distributions. We consider the case where the data source is a very large I/O
array of size n and we want to compute the first N local moments, for some
constant N. Without precomputation, this requires O(n) time. We develop a
sequence of algorithms of increasing sophistication that use precomputation and
additional buffer space to speed up queries. The simpler algorithms partition
the I/O array into consecutive ranges called bins, and they are applicable not
only to local-moment queries, but also to algebraic queries (MAX, AVERAGE, SUM,
etc.). With N buffers of size sqrt(n), time complexity drops to O(sqrt(n)). A
more sophisticated approach uses hierarchical buffering and has a logarithmic
time complexity (O(b log_b n)), when using N hierarchical buffers of size n/b.
Using Overlapped Bin Buffering, we show that only a single buffer is needed, as
with wavelet-based algorithms, but using much less storage. Applications exist
in multidimensional and statistical databases over massive data sets,
interactive image processing, and visualization.
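The simplest idea in the abstract above, partitioning the array into bins and precomputing one value per bin, can be sketched for SUM queries as follows. This is a generic square-root decomposition written for illustration, not the paper's Hierarchical or Overlapped Bin Buffering; the class name `BinBuffer` and its methods are hypothetical.

```python
import math


class BinBuffer:
    """Range-sum queries over an array using about sqrt(n) precomputed bin sums.

    Illustrative sketch only: names and layout are assumptions, not the paper's API.
    """

    def __init__(self, data):
        self.data = list(data)
        n = len(self.data)
        # Bin width of roughly sqrt(n); at least 1 so tiny arrays still work.
        self.width = max(1, math.isqrt(n))
        # One precomputed sum per consecutive bin (the last bin may be shorter).
        self.bins = [
            sum(self.data[i:i + self.width]) for i in range(0, n, self.width)
        ]

    def update(self, i, value):
        """Set data[i] = value and patch the single affected bin sum."""
        self.bins[i // self.width] += value - self.data[i]
        self.data[i] = value

    def range_sum(self, lo, hi):
        """Return sum(data[lo:hi]) touching O(sqrt(n)) cells and bins."""
        total = 0
        i = lo
        while i < hi:
            if i % self.width == 0 and i + self.width <= hi:
                # The whole bin lies inside the query: use its buffered sum.
                total += self.bins[i // self.width]
                i += self.width
            else:
                # Partial bin at either end: read the raw cells.
                total += self.data[i]
                i += 1
        return total
```

A query reads at most two partial bins plus the buffered sums in between, which is where the O(sqrt(n)) bound for this simple scheme comes from. The same layout works for other algebraic queries such as MAX if the per-bin value and the combine step are changed accordingly.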
Approximating Clustering for Memory Management and Request Processing
Clustering is a crucial tool for analyzing data in virtually every scientific
and engineering discipline. More scalable solutions are being framed to enable
time and space clustering for future large-scale data analyses. As a
result, hardware and software innovations that can significantly improve the data
efficiency and performance of data clustering techniques are necessary to
make future large-scale data analysis practical. This paper proposes a
novel mechanism for computing bit-serial medians. We propose a novel method,
two-parameter terms that enables in computation within the data array
Structure-Aware Sampling: Flexible and Accurate Summarization
In processing large quantities of data, a fundamental problem is to obtain a
summary which supports approximate query answering. Random sampling yields
flexible summaries which naturally support subset-sum queries with unbiased
estimators and well-understood confidence bounds.
Classic sample-based summaries, however, are designed for arbitrary subset
queries and are oblivious to the structure in the set of keys. The particular
structure, such as hierarchy, order, or product space (multi-dimensional),
makes range queries much more relevant for most analysis of the data.
Dedicated summarization algorithms for range-sum queries have also been
extensively studied. They can outperform existing sampling schemes in terms of
accuracy on range queries per summary size. Their accuracy, however, rapidly
degrades when, as is often the case, the query spans multiple ranges. They are
also less flexible, being targeted at range-sum queries alone, and are often
quite costly to build and use.
In this paper we propose and evaluate variance-optimal sampling schemes that
are structure-aware. These summaries improve over the accuracy of existing
structure-oblivious sampling schemes on range queries while retaining the
benefits of sample-based summaries: flexible summaries, with high accuracy on
both range queries and arbitrary subset queries.
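The claim that random samples support subset-sum queries with unbiased estimators can be illustrated with a minimal sketch. It shows a structure-oblivious baseline (Poisson sampling with probability proportional to weight and a Horvitz-Thompson estimate), not the structure-aware schemes the paper proposes; the function names and the parameter `k` are assumptions made for this example.

```python
import random


def pps_sample(weights, k, rng):
    """Poisson sample with inclusion probability proportional to weight.

    weights: dict mapping key -> non-negative weight.
    k: target expected sample size (hypothetical parameter name).
    Returns a dict key -> adjusted weight w / p for the sampled keys.
    """
    total = sum(weights.values())
    sample = {}
    for key, w in weights.items():
        # Inclusion probability, capped at 1 for very heavy keys.
        p = min(1.0, k * w / total)
        if p > 0 and rng.random() < p:
            # Horvitz-Thompson adjusted weight: E[adjusted] = p * (w / p) = w.
            sample[key] = w / p
    return sample


def estimate_subset_sum(sample, predicate):
    """Unbiased estimate of the total weight of keys satisfying predicate."""
    return sum(adj for key, adj in sample.items() if predicate(key))
```

Because each key's adjusted weight has expectation equal to its true weight, the estimate is unbiased for any subset chosen independently of the sample, including a range of keys. The sampling step ignores key structure entirely, which is the limitation the abstract describes.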
Enabling Efficient and General Subpopulation Analytics in Multidimensional Data Streams
Today’s large-scale services (e.g., video streaming platforms, data centers, sensor grids) need diverse real-time summary statistics across multiple subpopulations of multidimensional datasets. However, state-of-the-art frameworks do not offer general and accurate analytics in real time at reasonable cost. The root cause is the combinatorial explosion of data subpopulations and the diversity of summary statistics that must be monitored simultaneously. We present Hydra, an efficient framework for multidimensional analytics that combines a “sketch of sketches”, which avoids the overhead of monitoring exponentially many subpopulations, with universal sketching, which ensures accurate estimates for multiple statistics. We build Hydra as an Apache Spark plugin and address practical system challenges to minimize overheads at scale. Across multiple real-world and synthetic multidimensional datasets, we show that Hydra can achieve robust error bounds and is an order of magnitude more efficient in terms of operational cost and memory footprint than existing frameworks (e.g., Spark, Druid) while ensuring interactive estimation times.