Hydrographic Data Processing on a Robust, Network-Coupled Parallel Cluster
Increasing data volumes and adoption of computer-assisted hydrographic data processing algorithms necessitate higher data processing rates if gains in efficiency achieved in the last decade are to be maintained and enhanced. Recent advances in desktop computer architectures have made multi-core and multi-processor systems readily available, and some advances have been made in implementing multi-threaded versions of common hydrographic data processing algorithms. In many cases, however, although the algorithms might be ideal for parallel implementation (so-called ‘embarrassingly parallel’ tasks), limitations in memory, disc and network bandwidth within a single system can significantly limit the scalability of these solutions.
Offloading the computational requirements to a separate, clustered system of multiple computers is therefore appealing, since it has the potential for much higher net bandwidth, and robustness, without the collateral constraints of a desktop system. We consider, therefore, the advantages, potential efficiency gains, and difficulties, of processing hydrographic data in a robust, network-coupled, parallel cluster of computers. In particular, we address the problems of efficient and robust data distribution, compute load and network balancing, and of ensuring task- and system-level robustness in such a distributed system.
To illustrate the problem, we have considered two common processing tasks: pre-processing of raw Multibeam Echosounder (MBES) data to the stage of uncertainty-attributed resolved soundings in the local level, and computation of most-probable depths with a CUBE-like algorithm. These tasks illustrate a time-indexed and a spatially-indexed processing problem, respectively, which can engender differences in optimal data distribution and have different data- and network-use patterns. We demonstrate the gains and limitations of a clustered compute solution in these two cases, using the metrics of computational time as a function of processor resources committed, and robustness of processing in the face of intermittent random failures, as applied to (portions of) the Shallow Survey 2012 Common Data Set.
Robust Optimisation Monte Carlo
This paper is on Bayesian inference for parametric statistical models that
are defined by a stochastic simulator which specifies how data is generated.
Exact sampling is then possible but evaluating the likelihood function is
typically prohibitively expensive. Approximate Bayesian Computation (ABC) is a
framework to perform approximate inference in such situations. While basic ABC
algorithms are widely applicable, they are notoriously slow and much research
has focused on increasing their efficiency. Optimisation Monte Carlo (OMC) has
recently been proposed as an efficient and embarrassingly parallel method that
leverages optimisation to accelerate the inference. In this paper, we
demonstrate an important previously unrecognised failure mode of OMC: It
generates strongly overconfident approximations by collapsing regions of
similar or near-constant likelihood into a single point. We propose an
efficient, robust generalisation of OMC that corrects this. It makes fewer
assumptions, retains the main benefits of OMC, and can be performed either as
post-processing to OMC or as a stand-alone computation. We demonstrate the
effectiveness of the proposed Robust OMC on toy examples and tasks in
inverse-graphics where we perform Bayesian inference with a complex image
renderer.
Comment: 8 pages + 6 page appendix; v2: made clarifications, added a second
possible algorithm implementation and its results; v3: small clarifications,
to be published in AISTATS 202
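For orientation, the kind of basic ABC algorithm the abstract calls widely applicable but slow can be sketched as a rejection sampler. This is a minimal illustration, not the OMC or Robust OMC method of the paper. The Gaussian simulator, the uniform prior, the sample-mean summary statistic, and the names `simulate` and `abc_rejection` are all assumptions chosen for the example.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Hypothetical stochastic simulator: n draws from Normal(theta, 1)."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def abc_rejection(observed, num_accept, eps, rng):
    """Basic rejection ABC: keep prior draws whose simulated summary
    statistic lies within eps of the observed summary statistic."""
    obs_summary = statistics.fmean(observed)  # assumed summary: sample mean
    accepted = []
    while len(accepted) < num_accept:
        theta = rng.uniform(-5.0, 5.0)  # assumed prior: Uniform(-5, 5)
        sim_summary = statistics.fmean(simulate(theta, len(observed), rng))
        if abs(sim_summary - obs_summary) <= eps:
            accepted.append(theta)
    return accepted


rng = random.Random(0)
observed = simulate(2.0, 50, rng)  # synthetic "observed" data, true theta = 2
samples = abc_rejection(observed, num_accept=200, eps=0.1, rng=rng)
posterior_mean = statistics.fmean(samples)
```

Most prior draws are rejected, so the loop runs far more simulations than it keeps, which is the inefficiency that motivates methods such as OMC. Each draw is independent of the others, so the loop could also be split across workers.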
Research and Education in Computational Science and Engineering
Over the past two decades the field of computational science and engineering
(CSE) has penetrated both basic and applied research in academia, industry, and
laboratories to advance discovery, optimize systems, support decision-makers,
and educate the scientific and engineering workforce. Informed by centuries of
theory and experiment, CSE performs computational experiments to answer
questions that neither theory nor experiment alone is equipped to answer. CSE
provides scientists and engineers of all persuasions with algorithmic
inventions and software systems that transcend disciplines and scales. Carried
on a wave of digital technology, CSE brings the power of parallelism to bear on
troves of data. Mathematics-based advanced computing has become a prevalent
means of discovery and innovation in essentially all areas of science,
engineering, technology, and society; and the CSE community is at the core of
this transformation. However, a combination of disruptive
developments---including the architectural complexity of extreme-scale
computing, the data revolution that engulfs the planet, and the specialization
required to follow the applications to new frontiers---is redefining the scope
and reach of the CSE endeavor. This report describes the rapid expansion of CSE
and the challenges to sustaining its bold advances. The report also presents
strategies and directions for CSE research and education for the next decade.
Comment: Major revision, to appear in SIAM Revie
MPC for MPC: Secure Computation on a Massively Parallel Computing Architecture
Massively Parallel Computation (MPC) is a model of computation widely believed to best capture realistic parallel computing architectures such as large-scale MapReduce and Hadoop clusters. Motivated by the fact that many data analytics tasks performed on these platforms involve sensitive user data, we initiate the theoretical exploration of how to leverage MPC architectures to enable efficient, privacy-preserving computation over massive data. Clearly if a computation task does not lend itself to an efficient implementation on MPC even without security, then we cannot hope to compute it efficiently on MPC with security. We show, on the other hand, that any task that can be efficiently computed on MPC can also be securely computed with comparable efficiency. Specifically, we show the following results:
- any MPC algorithm can be compiled to a communication-oblivious counterpart while asymptotically preserving its round and space complexity, where communication-obliviousness ensures that any network intermediary observing the communication patterns learns no information about the secret inputs;
- assuming the existence of Fully Homomorphic Encryption with a suitable notion of compactness and other standard cryptographic assumptions, any MPC algorithm can be compiled to a secure counterpart that defends against an adversary who controls not only intermediate network routers but additionally up to a 1/3 - ε fraction of machines (for an arbitrarily small constant ε). Moreover, this compilation preserves the round complexity tightly, and preserves the space complexity up to a multiplicative blowup related to the security parameter.
As an initial exploration of this important direction, our work suggests new definitions and proposes novel protocols that blend algorithmic and cryptographic techniques.
Equivalence Classes and Conditional Hardness in Massively Parallel Computations
The Massively Parallel Computation (MPC) model serves as a common abstraction of many modern large-scale data processing frameworks, and has been receiving increasingly more attention over the past few years, especially in the context of classical graph problems. So far, the only way to argue lower bounds for this model is to condition on conjectures about the hardness of some specific problems, such as graph connectivity on promise graphs that are either one cycle or two cycles, usually called the one cycle vs. two cycles problem. This is unlike the traditional arguments based on conjectures about complexity classes (e.g., P ≠ NP), which are often more robust in the sense that refuting them would lead to groundbreaking algorithms for a whole bunch of problems.
In this paper we present connections between problems and classes of problems that allow the latter type of arguments. These connections concern the class of problems solvable in a sublogarithmic number of rounds in the MPC model, denoted by MPC(o(log N)), and some standard classes concerning space complexity, namely L and NL, and suggest conjectures that are robust in the sense that refuting them would lead to many surprisingly fast new algorithms in the MPC model. We also obtain new conditional lower bounds, and prove new reductions and equivalences between problems in the MPC model.
Deterministic Sampling and Range Counting in Geometric Data Streams
We present memory-efficient deterministic algorithms for constructing
epsilon-nets and epsilon-approximations of streams of geometric data. Unlike
probabilistic approaches, these deterministic samples provide guaranteed bounds
on their approximation factors. We show how our deterministic samples can be
used to answer approximate online iceberg geometric queries on data streams. We
use these techniques to approximate several robust statistics of geometric data
streams, including Tukey depth, simplicial depth, regression depth, the
Theil-Sen estimator, and the least median of squares. Our algorithms use only a
polylogarithmic amount of memory, provided the desired approximation factors
are inverse-polylogarithmic. We also include a lower bound for non-iceberg
geometric queries.
Comment: 12 pages, 1 figur