Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies
The nature of dark energy and the complete theory of gravity are two central
questions currently facing cosmology. A vital tool for addressing them is the
3-point correlation function (3PCF), which probes deviations from a spatially
random distribution of galaxies. However, the 3PCF's formidable computational
expense has prevented its application to astronomical surveys comprising
millions to billions of galaxies. We present Galactos, a high-performance
implementation of a novel, O(N^2) algorithm that uses a load-balanced k-d tree
and spherical harmonic expansions to compute the anisotropic 3PCF. Our
implementation is optimized for the Intel Xeon Phi architecture, exploiting
SIMD parallelism, instruction and thread concurrency, and significant L1 and L2
cache reuse, reaching 39% of peak performance on a single node. Galactos scales
to the full Cori system, achieving 9.8PF (peak) and 5.06PF (sustained) across
9636 nodes, making the 3PCF easily computable for all galaxies in the
observable universe.
Comment: 11 pages, 7 figures, accepted to SuperComputing 2017.
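For orientation, the core trick behind such O(N^2) algorithms is expanding each galaxy's neighbor field in spherical harmonics per radial bin. The sketch below is a minimal, single-threaded NumPy illustration of that idea in the simpler isotropic multipole basis (the anisotropic estimator additionally tracks the line of sight); the function name, binning, and normalization are assumptions for illustration, and none of the k-d tree, load balancing, or Xeon Phi tuning of Galactos appears here:

    import numpy as np
    from scipy.special import sph_harm

    def threepcf_multipoles(positions, r_edges, ell_max):
        """Multipole coefficients zeta_ell(r1, r2) of the 3PCF, accumulated
        by expanding each galaxy's neighbors in spherical harmonics."""
        n_bins = len(r_edges) - 1
        zeta = np.zeros((ell_max + 1, n_bins, n_bins))
        for p in positions:                    # O(N^2) pair sweep overall
            d = positions - p
            r = np.linalg.norm(d, axis=1)
            keep = (r > r_edges[0]) & (r < r_edges[-1])
            d, r = d[keep], r[keep]
            polar = np.arccos(np.clip(d[:, 2] / r, -1.0, 1.0))
            azim = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
            bins = np.digitize(r, r_edges) - 1
            for ell in range(ell_max + 1):
                for m in range(-ell, ell + 1):
                    # scipy's sph_harm takes the azimuthal angle first
                    ylm = sph_harm(m, ell, azim, polar)
                    a_lm = (np.bincount(bins, ylm.real, n_bins)
                            + 1j * np.bincount(bins, ylm.imag, n_bins))
                    # sum over m of a_lm(r1) a_lm*(r2) builds the multipole
                    zeta[ell] += np.outer(a_lm, a_lm.conj()).real / (2 * ell + 1)
        return zeta

Reducing each neighbor shell to a few a_lm coefficients is what collapses the naive O(N^3) triplet count to an O(N^2) pair sweep followed by cheap outer products over radial bins.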
Tradeoffs for nearest neighbors on the sphere
We consider tradeoffs between the query and update complexities for the
(approximate) nearest neighbor problem on the sphere, extending the recent
spherical filters to sparse regimes and generalizing the scheme and analysis to
account for different tradeoffs. In a nutshell, for the sparse regime the
tradeoff between the query complexity and update complexity
for data sets of size is given by the following equation in
terms of the approximation factor and the exponents and :
For small , minimizing the time for updates leads to a linear
space complexity at the cost of a query time complexity .
Balancing the query and update costs leads to optimal complexities
, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner,
IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn,
STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A
subpolynomial query time complexity can be achieved at the cost of a
space complexity of the order , matching the bound
of [Andoni-Indyk-Patrascu, FOCS'06] and
[Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of
[Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].
For large , minimizing the update complexity results in a query complexity
of , improving upon the related exponent for large of
[Kapralov, PODS'15] by a factor , and matching the bound
of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal
complexities , while a minimum query time complexity can be
achieved with update complexity , improving upon the
previous best exponents of Kapralov by a factor .Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580
[cs.DS] (along with arXiv:1605.02701 [cs.DS]
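For intuition, the tradeoff curve above is easy to evaluate numerically. The sketch below (a hypothetical helper, not code from the paper) solves the displayed equation for the query exponent rho_q given an update exponent rho_u, and checks that the balanced point reproduces the optimal exponent 1/(2c^2 - 1):

    import numpy as np

    def query_exponent(c, rho_u):
        """Solve c^2 sqrt(rho_q) + (c^2 - 1) sqrt(rho_u) = sqrt(2c^2 - 1)
        for rho_q, given approximation factor c and update exponent rho_u."""
        rest = np.sqrt(2 * c**2 - 1) - (c**2 - 1) * np.sqrt(rho_u)
        if rest <= 0:
            return 0.0  # past this point queries are already subpolynomial
        return (rest / c**2) ** 2

    c = 2.0
    print(query_exponent(c, 0.0))           # linear space: (2c^2-1)/c^4 = 0.4375
    rho = 1.0 / (2 * c**2 - 1)              # balanced exponent, here 1/7
    print(np.isclose(query_exponent(c, rho), rho))  # True: on the curve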
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a
large database, the data items whose distances to a query item are the
smallest. Various methods have been developed to address this problem, and
recently much effort has been devoted to approximate search. In this paper, we
present a survey of one of the main solutions, hashing, which has been widely
studied since the pioneering work on locality sensitive hashing. We divide the
hashing algorithms into two main categories: locality sensitive hashing, which
designs hash functions without exploring the data distribution, and learning
to hash, which learns hash functions according to the data distribution. We
review them from various aspects, including hash function design, distance
measures, and search schemes in the hash coding space.
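To make the first category concrete, here is a minimal random-hyperplane LSH (SimHash) index for cosine similarity. All names are illustrative, and a practical index would use several hash tables (and multi-probe variants) to trade memory for recall; the point is that the hash functions are drawn without ever looking at the data distribution, which is what separates locality sensitive hashing from learning to hash:

    import numpy as np

    rng = np.random.default_rng(0)

    def build_simhash(data, n_bits=16):
        """Random-hyperplane LSH: bit i is the sign of the projection onto a
        random direction, chosen independently of the data distribution."""
        planes = rng.standard_normal((n_bits, data.shape[1]))
        codes = (data @ planes.T > 0).astype(np.uint8)
        table = {}
        for i, bits in enumerate(codes):
            table.setdefault(bits.tobytes(), []).append(i)
        return planes, table

    def query(q, planes, table):
        bits = (planes @ q > 0).astype(np.uint8)
        return table.get(bits.tobytes(), [])   # candidate ids in q's bucket

    data = rng.standard_normal((1000, 64))
    planes, table = build_simhash(data)
    print(query(data[0], planes, table))       # bucket containing item 0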
Spherical harmonic transform with GPUs
We describe an algorithm for computing an inverse spherical harmonic
transform suitable for graphics processing units (GPUs). We use CUDA and base
our implementation on a Fortran90 routine included in a publicly available
parallel package, S2HAT. We focus our attention on the two major sequential
steps involved in the computation of the transform, retaining the efficient
parallel
framework of the original code. We detail optimization techniques used to
enhance the performance of the CUDA-based code and contrast them with those
implemented in the Fortran90 version. We also present performance comparisons
of a single CPU plus GPU unit against the S2HAT code running on either one or
four processors. In particular, we find that use of the latest generation of
GPUs, such as the NVIDIA GF100 (Fermi), can accelerate the spherical harmonic
transforms by as much as 18 times with respect to S2HAT executed on one core,
and by as much as 5.5 times with respect to S2HAT on four cores, with the
overall performance being limited by the fast Fourier transforms. The work
presented here has been performed in the context of Cosmic Microwave
Background simulations and
analysis. However, we expect that the developed software will be of more
general interest and applicability.
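To make the two sequential steps concrete, here is a deliberately naive NumPy reference for the inverse transform: for each order m, a Legendre-type sum over degrees ell produces a per-ring function g_m(theta), and one inverse FFT per isolatitude ring then recovers the map. Everything here (grid, names, real-field convention) is an illustrative assumption; it is not the S2HAT or CUDA implementation:

    import numpy as np
    from scipy.special import sph_harm

    def inverse_sht(alm, lmax, n_theta, n_phi):
        """Naive inverse SHT on an equiangular grid, organized as the two
        sequential steps the abstract refers to: (1) for each order m, a
        Legendre-type sum over degrees ell; (2) one FFT in phi per ring.
        alm[ell, m] holds coefficients for m >= 0 (real field assumed)."""
        theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta  # ring colatitudes
        g = np.zeros((n_theta, n_phi // 2 + 1), dtype=complex)
        for m in range(lmax + 1):                 # step 1: Legendre sums
            for ell in range(m, lmax + 1):
                # normalized P_lm(cos theta), obtained as Y_lm at phi = 0
                plm = sph_harm(m, ell, 0.0, theta).real
                g[:, m] += alm[ell, m] * plm
        # step 2: one inverse real FFT over the azimuthal direction per ring
        return np.fft.irfft(g, n=n_phi, axis=1) * n_phi

    lmax = 15
    alm = np.zeros((lmax + 1, lmax + 1), dtype=complex)
    alm[2, 1] = 1.0                        # a single Y_{2,1} mode
    f = inverse_sht(alm, lmax, n_theta=32, n_phi=64)

The Legendre sums dominate the arithmetic, which is why GPU ports focus on them, while the per-ring FFTs eventually bound the overall performance, as the abstract notes.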
Wavemoth -- Fast spherical harmonic transforms by butterfly matrix compression
We present Wavemoth, an experimental open source code for computing scalar
spherical harmonic transforms (SHTs). Such transforms are ubiquitous in
astronomical data analysis. Our code performs substantially better than
existing publicly available codes due to improvements on two fronts. First, the
computational core is made more efficient by using small amounts of precomputed
data, as well as paying attention to CPU instruction pipelining and cache
usage. Second, Wavemoth makes use of a fast and numerically stable algorithm
based on compressing a set of linear operators in a precomputation step. The
resulting SHT scales as O(L^2 (log L)^2) for the resolution range of practical
interest, where L denotes the spherical harmonic truncation degree. For low and
medium-range resolutions, Wavemoth tends to be twice as fast as libpsht, which
is the current state of the art implementation for the HEALPix grid. At the
resolution of the Planck experiment, L ~ 4000, Wavemoth is between three and
six times faster than libpsht, depending on the computer architecture and the
required precision. Due to the experimental nature of the project, only
spherical harmonic synthesis is currently supported, although adding support
for spherical harmonic analysis should be trivial.
Comment: 13 pages, 6 figures, accepted by ApJ.
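The compression step can be illustrated compactly: blocks of the precomputed Legendre operator are numerically low-rank, so factoring them once makes every later application cheap. The toy below uses a truncated SVD on a generic smooth block as a stand-in (the function and test matrix are illustrative only; Wavemoth itself uses interpolative decompositions arranged hierarchically in a butterfly scheme):

    import numpy as np

    def compress_block(A, tol=1e-10):
        """Low-rank factorization of one operator block: keep singular values
        above tol * s_max, so applying the block costs O(r * (m + n)) flops
        instead of O(m * n)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = int(np.sum(s > tol * s[0]))
        return U[:, :r] * s[:r], Vt[:r]       # A is approximately L @ R

    # A smooth kernel standing in for a well-separated block of the
    # associated Legendre matrix P_lm(cos theta_i).
    x = np.linspace(0.1, 0.9, 200)
    A = 1.0 / (x[:, None] + x[None, :] + 2.0)
    L, R = compress_block(A)
    print(L.shape[1])                          # numerical rank << 200
    print(np.linalg.norm(A - L @ R) / np.linalg.norm(A))  # ~1e-10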
Relativistic MHD with Adaptive Mesh Refinement
This paper presents a new computer code to solve the general relativistic
magnetohydrodynamics (GRMHD) equations using distributed parallel adaptive mesh
refinement (AMR). The fluid equations are solved using a finite difference
Convex ENO method (CENO) in 3+1 dimensions, and the AMR is Berger-Oliger.
Hyperbolic divergence cleaning is used to control the div(B) = 0 constraint.
We present results from three flat space tests, and examine the
accretion of a fluid onto a Schwarzschild black hole, reproducing the Michel
solution. The AMR simulations substantially improve performance while
reproducing the results of equivalent-resolution unigrid simulations. Finally,
we discuss strong scaling results for parallel unigrid and AMR runs.
Comment: 24 pages, 14 figures, 3 tables.
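For readers new to the technique, hyperbolic divergence cleaning (the GLM form of Dedner et al. 2002, on which such schemes are based) couples an auxiliary scalar psi to the magnetic field so that div(B) errors propagate away at a chosen speed and damp. The toy update below evolves just that subsystem on a periodic 2D grid; the discretization and parameters are illustrative choices, not the scheme of this paper:

    import numpy as np

    def glm_clean_step(Bx, By, psi, dx, dt, ch=1.0, cp=0.3):
        """One explicit step of the divergence-cleaning subsystem
            dB/dt   = -grad(psi)
            dpsi/dt = -ch^2 div(B) - (ch^2 / cp^2) psi
        on a periodic square grid: div(B) errors are carried off at
        speed ch and damped on the timescale cp^2 / ch^2."""
        def ddx(f): return (np.roll(f, -1, 0) - np.roll(f, 1, 0)) / (2 * dx)
        def ddy(f): return (np.roll(f, -1, 1) - np.roll(f, 1, 1)) / (2 * dx)
        divB = ddx(Bx) + ddy(By)
        return (Bx - dt * ddx(psi),
                By - dt * ddy(psi),
                psi - dt * (ch**2 * divB + (ch**2 / cp**2) * psi))

    # Seed a div(B) error blob and let the cleaning field disperse it.
    n = 64
    dx = 1.0 / n
    X, Y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
    Bx = np.exp(-100 * ((X - 0.5) ** 2 + (Y - 0.5) ** 2))  # not divergence-free
    By = np.zeros((n, n))
    psi = np.zeros((n, n))
    for _ in range(100):
        Bx, By, psi = glm_clean_step(Bx, By, psi, dx, dt=0.4 * dx)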