Algorithmic and Statistical Perspectives on Large-Scale Data Analysis
In recent years, ideas from statistics and scientific computing have begun to
interact in increasingly sophisticated and fruitful ways with ideas from
computer science and the theory of algorithms to aid in the development of
improved worst-case algorithms that are useful for large-scale scientific and
Internet data analysis problems. In this chapter, I will describe two recent
examples---one having to do with selecting good columns or features from a (DNA
Single Nucleotide Polymorphism) data matrix, and the other having to do with
selecting good clusters or communities from a data graph (representing a social
or information network)---that drew on ideas from both areas and that may serve
as a model for exploiting complementary algorithmic and statistical
perspectives in order to solve applied large-scale data analysis problems.
Comment: 33 pages. To appear in Uwe Naumann and Olaf Schenk, editors,
"Combinatorial Scientific Computing," Chapman and Hall/CRC Press, 201
A Domain Specific Approach to High Performance Heterogeneous Computing
Users of heterogeneous computing systems face two problems: first,
understanding the trade-offs between the observable characteristics of their
applications, such as latency and quality of the result, and second,
exploiting knowledge of these characteristics to allocate work to
distributed computing platforms efficiently. A domain specific
approach addresses both of these problems. By considering a subset of
operations or functions, models of the observable characteristics or domain
metrics may be formulated in advance, and populated at run-time for task
instances. These metric models can then be used to express the allocation of
work as a constrained integer program, which can be solved using heuristics,
machine learning or Mixed Integer Linear Programming (MILP) frameworks. These
claims are illustrated using the example domain of derivatives pricing in
computational finance, with the domain metrics of workload latency or makespan
and pricing accuracy. For a large, varied workload of 128 Black-Scholes and
Heston model-based option pricing tasks, running on a diverse array of 16
multicore CPU, GPU and FPGA platforms, predictions made by models of both
the makespan and accuracy are generally within 10% of the run-time performance.
When these models are used as inputs to machine learning and MILP-based
workload allocation approaches, latency improvements of up to 24 and 270 times
over the heuristic approach are seen.
Comment: 14 pages, preprint draft, minor revisio
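The allocation problem described in this abstract (assign each task to one platform so that the makespan is minimised) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the latency table is hypothetical, the greedy rule is a generic heuristic, and exhaustive search stands in for a MILP solver on a tiny instance. None of it is the authors' implementation.

```python
from itertools import product


def makespan(assignment, latency):
    """Makespan = total predicted latency on the busiest platform."""
    load = [0.0] * len(latency[0])
    for task, platform in enumerate(assignment):
        load[platform] += latency[task][platform]
    return max(load)


def greedy_allocate(latency):
    """Generic heuristic: place each task, in order, on the platform
    that would finish it earliest given the load assigned so far."""
    n_platforms = len(latency[0])
    load = [0.0] * n_platforms
    assignment = []
    for row in latency:
        best = min(range(n_platforms), key=lambda p: load[p] + row[p])
        load[best] += row[best]
        assignment.append(best)
    return assignment


def exact_allocate(latency):
    """Exhaustive search over every assignment. This is a stand-in for an
    integer-programming solver and is only feasible for tiny instances."""
    n_platforms = len(latency[0])
    candidates = product(range(n_platforms), repeat=len(latency))
    return list(min(candidates, key=lambda a: makespan(a, latency)))


# Hypothetical predicted latencies in seconds.
# Rows are tasks; columns are platforms (say, a CPU and a GPU).
latency = [
    [1.0, 1.0],
    [1.0, 1.0],
    [2.0, 2.5],
]

greedy = greedy_allocate(latency)
exact = exact_allocate(latency)
print("greedy:", greedy, "makespan", makespan(greedy, latency))
print("exact: ", exact, "makespan", makespan(exact, latency))
```

On this toy instance the greedy rule commits the two short tasks to different platforms before it sees the long one, so the exhaustive search finds a shorter makespan. The search space grows as platforms^tasks, which is why a workload of the size quoted in the abstract needs a proper solver or a learned policy rather than enumeration.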
Distributed Online Modified Greedy Algorithm for Networked Storage Operation under Uncertainty
The integration of intermittent and stochastic renewable energy resources
requires increased flexibility in the operation of the electric grid. Storage,
broadly speaking, provides the flexibility of shifting energy over time;
network, on the other hand, provides the flexibility of shifting energy over
geographical locations. The optimal control of storage networks in stochastic
environments is an important open problem. The key challenge is that, even in
small networks, the corresponding constrained stochastic control problems on
continuous spaces suffer from curses of dimensionality, and are intractable in
general settings. For large networks, no efficient algorithm is known to give
optimal or provably near-optimal performance for this problem. This paper
provides an efficient algorithm to solve this problem with performance
guarantees. We study the operation of storage networks, i.e., storage systems
interconnected via a power network. An online algorithm, termed the Online Modified
Greedy algorithm, is developed for the corresponding constrained stochastic
control problem. A sub-optimality bound for the algorithm is derived, and a
semidefinite program is constructed to minimize the bound. In many cases, the
bound approaches zero so that the algorithm is near-optimal. A task-based
distributed implementation of the online algorithm relying only on local
information and neighbor communication is then developed based on the
alternating direction method of multipliers. Numerical examples verify the
established theoretical performance bounds, and demonstrate the scalability of
the algorithm.
Comment: arXiv admin note: text overlap with arXiv:1405.778