18,203 research outputs found
Accurate and efficient explicit approximations of the Colebrook flow friction equation based on the Wright omega-function
The Colebrook equation is a popular model for estimating friction loss coefficients in water and gas pipes. The model is implicit in the unknown flow friction factor, f. To date, the captured flow friction factor, f, can be extracted from the logarithmic form analytically only in the term of the Lambert W-function. The purpose of this study is to find an accurate and computationally efficient solution based on the shifted Lambert W-function also known as the Wright omega-function. The Wright omega-function is more suitable because it overcomes the problem with the overflow error by switching the fast growing term, y = W (e(x)), of the Lambert W-function to series expansions that further can be easily evaluated in computers without causing overflow run-time errors. Although the Colebrook equation transformed through the Lambert W-function is identical to the original expression in terms of accuracy, a further evaluation of the Lambert W-function can be only approximate. Very accurate explicit approximations of the Colebrook equation that contain only one or two logarithms are shown. The final result is an accurate explicit approximation of the Colebrook equation with a relative error of no more than 0.0096%. The presented approximations are in a form suitable for everyday engineering use, and are both accurate and computationally efficient.Web of Science71art. no. 3
Secure Numerical and Logical Multi Party Operations
We derive algorithms for efficient secure numerical and logical operations
using a recently introduced scheme for secure multi-party
computation~\cite{sch15} in the semi-honest model ensuring statistical or
perfect security. To derive our algorithms for trigonometric functions, we use
basic mathematical laws in combination with properties of the additive
encryption scheme in a novel way. For division and logarithm we use a new
approach to compute a Taylor series at a fixed point for all numbers. All our
logical operations such as comparisons and large fan-in AND gates are perfectly
secure. Our empirical evaluation yields speed-ups of more than a factor of 100
for the evaluated operations compared to the state-of-the-art
PI-BA Bundle Adjustment Acceleration on Embedded FPGAs with Co-observation Optimization
Bundle adjustment (BA) is a fundamental optimization technique used in many
crucial applications, including 3D scene reconstruction, robotic localization,
camera calibration, autonomous driving, space exploration, street view map
generation etc. Essentially, BA is a joint non-linear optimization problem, and
one which can consume a significant amount of time and power, especially for
large optimization problems. Previous approaches of optimizing BA performance
heavily rely on parallel processing or distributed computing, which trade
higher power consumption for higher performance. In this paper we propose
{\pi}-BA, the first hardware-software co-designed BA engine on an embedded
FPGA-SoC that exploits custom hardware for higher performance and power
efficiency. Specifically, based on our key observation that not all points
appear on all images in a BA problem, we designed and implemented a
Co-Observation Optimization technique to accelerate BA operations with
optimized usage of memory and computation resources. Experimental results
confirm that {\pi}-BA outperforms the existing software implementations in
terms of performance and power consumption.Comment: in Proceedings of IEEE FCCM 201
Optimizing Lossy Compression Rate-Distortion from Automatic Online Selection between SZ and ZFP
With ever-increasing volumes of scientific data produced by HPC applications,
significantly reducing data size is critical because of limited capacity of
storage space and potential bottlenecks on I/O or networks in writing/reading
or transferring data. SZ and ZFP are the two leading lossy compressors
available to compress scientific data sets. However, their performance is not
consistent across different data sets and across different fields of some data
sets: for some fields SZ provides better compression performance, while other
fields are better compressed with ZFP. This situation raises the need for an
automatic online (during compression) selection between SZ and ZFP, with a
minimal overhead. In this paper, the automatic selection optimizes the
rate-distortion, an important statistical quality metric based on the
signal-to-noise ratio. To optimize for rate-distortion, we investigate the
principles of SZ and ZFP. We then propose an efficient online, low-overhead
selection algorithm that predicts the compression quality accurately for two
compressors in early processing stages and selects the best-fit compressor for
each data field. We implement the selection algorithm into an open-source
library, and we evaluate the effectiveness of our proposed solution against
plain SZ and ZFP in a parallel environment with 1,024 cores. Evaluation results
on three data sets representing about 100 fields show that our selection
algorithm improves the compression ratio up to 70% with the same level of data
distortion because of very accurate selection (around 99%) of the best-fit
compressor, with little overhead (less than 7% in the experiments).Comment: 14 pages, 9 figures, first revisio
4.45 Pflops Astrophysical N-Body Simulation on K computer -- The Gravitational Trillion-Body Problem
As an entry for the 2012 Gordon-Bell performance prize, we report performance
results of astrophysical N-body simulations of one trillion particles performed
on the full system of K computer. This is the first gravitational trillion-body
simulation in the world. We describe the scientific motivation, the numerical
algorithm, the parallelization strategy, and the performance analysis. Unlike
many previous Gordon-Bell prize winners that used the tree algorithm for
astrophysical N-body simulations, we used the hybrid TreePM method, for similar
level of accuracy in which the short-range force is calculated by the tree
algorithm, and the long-range force is solved by the particle-mesh algorithm.
We developed a highly-tuned gravity kernel for short-range forces, and a novel
communication algorithm for long-range forces. The average performance on 24576
and 82944 nodes of K computer are 1.53 and 4.45 Pflops, which correspond to 49%
and 42% of the peak speed.Comment: 10 pages, 6 figures, Proceedings of Supercomputing 2012
(http://sc12.supercomputing.org/), Gordon Bell Prize Winner. Additional
information is http://www.ccs.tsukuba.ac.jp/CCS/eng/gbp201
Machine learning for ultrafast X-ray diffraction patterns on large-scale GPU clusters
The classical method of determining the atomic structure of complex molecules
by analyzing diffraction patterns is currently undergoing drastic developments.
Modern techniques for producing extremely bright and coherent X-ray lasers
allow a beam of streaming particles to be intercepted and hit by an ultrashort
high energy X-ray beam. Through machine learning methods the data thus
collected can be transformed into a three-dimensional volumetric intensity map
of the particle itself. The computational complexity associated with this
problem is very high such that clusters of data parallel accelerators are
required.
We have implemented a distributed and highly efficient algorithm for
inversion of large collections of diffraction patterns targeting clusters of
hundreds of GPUs. With the expected enormous amount of diffraction data to be
produced in the foreseeable future, this is the required scale to approach real
time processing of data at the beam site. Using both real and synthetic data we
look at the scaling properties of the application and discuss the overall
computational viability of this exciting and novel imaging technique
Constrained LQR for Low-Precision Data Representation
Performing computations with a low-bit number representation results in a faster implementation that uses less silicon, and hence allows an algorithm to be implemented in smaller and cheaper processors without loss of performance. We propose a novel formulation to efficiently exploit the low (or non-standard) precision number representation of some computer architectures when computing the solution to constrained LQR problems, such as those that arise in predictive control. The main idea is to include suitably-defined decision variables in the quadratic program, in addition to the states and the inputs, to allow for smaller roundoff errors in the solver. This enables one to trade off the number of bits used for data representation against speed and/or hardware resources, so that smaller numerical errors can be achieved for the same number of bits (same silicon area). Because of data dependencies, the algorithm complexity, in terms of computation time and hardware resources, does not necessarily increase despite the larger number of decision variables. Examples show that a 10-fold reduction in hardware resources is possible compared to using double precision floating point, without loss of closed-loop performance
Scan matching by cross-correlation and differential evolution
Scan matching is an important task, solved in the context of many high-level problems including pose estimation, indoor localization, simultaneous localization and mapping and others. Methods that are accurate and adaptive and at the same time computationally efficient are required to enable location-based services in autonomous mobile devices. Such devices usually have a wide range of high-resolution sensors but only a limited processing power and constrained energy supply. This work introduces a novel high-level scan matching strategy that uses a combination of two advanced algorithms recently used in this field: cross-correlation and differential evolution. The cross-correlation between two laser range scans is used as an efficient measure of scan alignment and the differential evolution algorithm is used to search for the parameters of a transformation that aligns the scans. The proposed method was experimentally validated and showed good ability to match laser range scans taken shortly after each other and an excellent ability to match laser range scans taken with longer time intervals between them.Web of Science88art. no. 85
- …