Galactos: Computing the Anisotropic 3-Point Correlation Function for 2 Billion Galaxies
The nature of dark energy and the complete theory of gravity are two central
questions currently facing cosmology. A vital tool for addressing them is the
3-point correlation function (3PCF), which probes deviations from a spatially
random distribution of galaxies. However, the 3PCF's formidable computational
expense has prevented its application to astronomical surveys comprising
millions to billions of galaxies. We present Galactos, a high-performance
implementation of a novel, O(N^2) algorithm that uses a load-balanced k-d tree
and spherical harmonic expansions to compute the anisotropic 3PCF. Our
implementation is optimized for the Intel Xeon Phi architecture, exploiting
SIMD parallelism, instruction and thread concurrency, and significant L1 and L2
cache reuse, reaching 39% of peak performance on a single node. Galactos scales
to the full Cori system, achieving 9.8PF (peak) and 5.06PF (sustained) across
9636 nodes, making the 3PCF easily computable for all galaxies in the
observable universe.
Comment: 11 pages, 7 figures, accepted to SuperComputing 2017.
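For orientation, the core trick behind such O(N^2) algorithms is expanding each galaxy's neighbor field in spherical harmonics per radial bin. The sketch below is a minimal, single-threaded NumPy illustration of that idea in the simpler isotropic multipole basis (the anisotropic estimator additionally tracks the line of sight); the function name, binning, and normalization are assumptions for illustration, and none of the k-d tree, load balancing, or Xeon Phi tuning of Galactos appears here:

    import numpy as np
    from scipy.special import sph_harm

    def threepcf_multipoles(positions, r_edges, ell_max):
        """Multipole coefficients zeta_ell(r1, r2) of the 3PCF, accumulated
        by expanding each galaxy's neighbors in spherical harmonics."""
        n_bins = len(r_edges) - 1
        zeta = np.zeros((ell_max + 1, n_bins, n_bins))
        for p in positions:                    # O(N^2) pair sweep overall
            d = positions - p
            r = np.linalg.norm(d, axis=1)
            keep = (r > r_edges[0]) & (r < r_edges[-1])
            d, r = d[keep], r[keep]
            polar = np.arccos(np.clip(d[:, 2] / r, -1.0, 1.0))
            azim = np.arctan2(d[:, 1], d[:, 0]) % (2 * np.pi)
            bins = np.digitize(r, r_edges) - 1
            for ell in range(ell_max + 1):
                for m in range(-ell, ell + 1):
                    # scipy's sph_harm takes the azimuthal angle first
                    ylm = sph_harm(m, ell, azim, polar)
                    a_lm = (np.bincount(bins, ylm.real, n_bins)
                            + 1j * np.bincount(bins, ylm.imag, n_bins))
                    # sum over m of a_lm(r1) a_lm*(r2) builds the multipole
                    zeta[ell] += np.outer(a_lm, a_lm.conj()).real / (2 * ell + 1)
        return zeta

Reducing each neighbor shell to a few a_lm coefficients is what collapses the naive O(N^3) triplet count to an O(N^2) pair sweep followed by cheap outer products over radial bins.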
Tradeoffs for nearest neighbors on the sphere
We consider tradeoffs between the query and update complexities for the
(approximate) nearest neighbor problem on the sphere, extending the recent
spherical filters to sparse regimes and generalizing the scheme and analysis to
account for different tradeoffs. In a nutshell, for the sparse regime the
tradeoff between the query complexity and update complexity
for data sets of size is given by the following equation in
terms of the approximation factor and the exponents and :
For small , minimizing the time for updates leads to a linear
space complexity at the cost of a query time complexity .
Balancing the query and update costs leads to optimal complexities
, matching bounds from [Andoni-Razenshteyn, 2015] and [Dubiner,
IEEE-TIT'10] and matching the asymptotic complexities of [Andoni-Razenshteyn,
STOC'15] and [Andoni-Indyk-Laarhoven-Razenshteyn-Schmidt, NIPS'15]. A
subpolynomial query time complexity can be achieved at the cost of a
space complexity of the order , matching the bound
of [Andoni-Indyk-Patrascu, FOCS'06] and
[Panigrahy-Talwar-Wieder, FOCS'10] and improving upon results of
[Indyk-Motwani, STOC'98] and [Kushilevitz-Ostrovsky-Rabani, STOC'98].
For large , minimizing the update complexity results in a query complexity
of , improving upon the related exponent for large of
[Kapralov, PODS'15] by a factor , and matching the bound
of [Panigrahy-Talwar-Wieder, FOCS'08]. Balancing the costs leads to optimal
complexities , while a minimum query time complexity can be
achieved with update complexity , improving upon the
previous best exponents of Kapralov by a factor .Comment: 16 pages, 1 table, 2 figures. Mostly subsumed by arXiv:1608.03580
[cs.DS] (along with arXiv:1605.02701 [cs.DS]
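For intuition, the tradeoff curve above is easy to evaluate numerically. The sketch below (a hypothetical helper, not code from the paper) solves the displayed equation for the query exponent rho_q given an update exponent rho_u, and checks that the balanced point reproduces the optimal exponent 1/(2c^2 - 1):

    import numpy as np

    def query_exponent(c, rho_u):
        """Solve c^2 sqrt(rho_q) + (c^2 - 1) sqrt(rho_u) = sqrt(2c^2 - 1)
        for rho_q, given approximation factor c and update exponent rho_u."""
        rest = np.sqrt(2 * c**2 - 1) - (c**2 - 1) * np.sqrt(rho_u)
        if rest <= 0:
            return 0.0  # past this point queries are already subpolynomial
        return (rest / c**2) ** 2

    c = 2.0
    print(query_exponent(c, 0.0))           # linear space: (2c^2-1)/c^4 = 0.4375
    rho = 1.0 / (2 * c**2 - 1)              # balanced exponent, here 1/7
    print(np.isclose(query_exponent(c, rho), rho))  # True: on the curve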
Hashing for Similarity Search: A Survey
Similarity search (nearest neighbor search) is the problem of finding, in a
large database, the data items whose distances to a query item are the
smallest. Various methods have been developed to address this problem, and
recently much effort has been devoted to approximate search. In this paper, we
present a survey of one of the main solutions, hashing, which has been widely
studied since the pioneering work on locality sensitive hashing. We divide the
hashing algorithms into two main categories: locality sensitive hashing, which
designs hash functions without exploring the data distribution, and learning
to hash, which learns hash functions according to the data distribution. We
review them from various aspects, including hash function design, distance
measures, and search schemes in the hash coding space.
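To make the first category concrete, here is a minimal random-hyperplane LSH (SimHash) index for cosine similarity. All names are illustrative, and a practical index would use several hash tables (and multi-probe variants) to trade memory for recall; the point is that the hash functions are drawn without ever looking at the data distribution, which is what separates locality sensitive hashing from learning to hash:

    import numpy as np

    rng = np.random.default_rng(0)

    def build_simhash(data, n_bits=16):
        """Random-hyperplane LSH: bit i is the sign of the projection onto a
        random direction, chosen independently of the data distribution."""
        planes = rng.standard_normal((n_bits, data.shape[1]))
        codes = (data @ planes.T > 0).astype(np.uint8)
        table = {}
        for i, bits in enumerate(codes):
            table.setdefault(bits.tobytes(), []).append(i)
        return planes, table

    def query(q, planes, table):
        bits = (planes @ q > 0).astype(np.uint8)
        return table.get(bits.tobytes(), [])   # candidate ids in q's bucket

    data = rng.standard_normal((1000, 64))
    planes, table = build_simhash(data)
    print(query(data[0], planes, table))       # bucket containing item 0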
Spherical harmonic transform with GPUs
We describe an algorithm for computing an inverse spherical harmonic
transform suitable for graphics processing units (GPUs). We use CUDA and base
our implementation on a Fortran90 routine included in a publicly available
parallel package, S2HAT. We focus our attention on the two major sequential
steps involved in the computation of the transform, retaining the efficient
parallel
framework of the original code. We detail optimization techniques used to
enhance the performance of the CUDA-based code and contrast them with those
implemented in the Fortran90 version. We also present performance comparisons
of a single CPU plus GPU unit against the S2HAT code running on either one or
four processors. In particular, we find that use of the latest generation of
GPUs, such as the NVIDIA GF100 (Fermi), can accelerate the spherical harmonic
transforms by as much as 18 times with respect to S2HAT executed on one core,
and by as much as 5.5 times with respect to S2HAT on four cores, with the
overall performance being limited by the fast Fourier transforms. The work
presented here has been performed in the context of Cosmic Microwave
Background simulations and
analysis. However, we expect that the developed software will be of more
general interest and applicability.
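To make the two sequential steps concrete, here is a deliberately naive NumPy reference for the inverse transform: for each order m, a Legendre-type sum over degrees ell produces a per-ring function g_m(theta), and one inverse FFT per isolatitude ring then recovers the map. Everything here (grid, names, real-field convention) is an illustrative assumption; it is not the S2HAT or CUDA implementation:

    import numpy as np
    from scipy.special import sph_harm

    def inverse_sht(alm, lmax, n_theta, n_phi):
        """Naive inverse SHT on an equiangular grid, organized as the two
        sequential steps the abstract refers to: (1) for each order m, a
        Legendre-type sum over degrees ell; (2) one FFT in phi per ring.
        alm[ell, m] holds coefficients for m >= 0 (real field assumed)."""
        theta = (np.arange(n_theta) + 0.5) * np.pi / n_theta  # ring colatitudes
        g = np.zeros((n_theta, n_phi // 2 + 1), dtype=complex)
        for m in range(lmax + 1):                 # step 1: Legendre sums
            for ell in range(m, lmax + 1):
                # normalized P_lm(cos theta), obtained as Y_lm at phi = 0
                plm = sph_harm(m, ell, 0.0, theta).real
                g[:, m] += alm[ell, m] * plm
        # step 2: one inverse real FFT over the azimuthal direction per ring
        return np.fft.irfft(g, n=n_phi, axis=1) * n_phi

    lmax = 15
    alm = np.zeros((lmax + 1, lmax + 1), dtype=complex)
    alm[2, 1] = 1.0                        # a single Y_{2,1} mode
    f = inverse_sht(alm, lmax, n_theta=32, n_phi=64)

The Legendre sums dominate the arithmetic, which is why GPU ports focus on them, while the per-ring FFTs eventually bound the overall performance, as the abstract notes.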
Wavemoth -- Fast spherical harmonic transforms by butterfly matrix compression
We present Wavemoth, an experimental open source code for computing scalar
spherical harmonic transforms (SHTs). Such transforms are ubiquitous in
astronomical data analysis. Our code performs substantially better than
existing publicly available codes due to improvements on two fronts. First, the
computational core is made more efficient by using small amounts of precomputed
data, as well as paying attention to CPU instruction pipelining and cache
usage. Second, Wavemoth makes use of a fast and numerically stable algorithm
based on compressing a set of linear operators in a precomputation step. The
resulting SHT scales as O(L^2 (log L)^2) for the resolution range of practical
interest, where L denotes the spherical harmonic truncation degree. For low and
medium-range resolutions, Wavemoth tends to be twice as fast as libpsht, which
is the current state of the art implementation for the HEALPix grid. At the
resolution of the Planck experiment, L ~ 4000, Wavemoth is between three and
six times faster than libpsht, depending on the computer architecture and the
required precision. Due to the experimental nature of the project, only
spherical harmonic synthesis is currently supported, although adding support
for spherical harmonic analysis should be trivial.
Comment: 13 pages, 6 figures, accepted by ApJ.
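The compression step can be illustrated compactly: blocks of the precomputed Legendre operator are numerically low-rank, so factoring them once makes every later application cheap. The toy below uses a truncated SVD on a generic smooth block as a stand-in (the function and test matrix are illustrative only; Wavemoth itself uses interpolative decompositions arranged hierarchically in a butterfly scheme):

    import numpy as np

    def compress_block(A, tol=1e-10):
        """Low-rank factorization of one operator block: keep singular values
        above tol * s_max, so applying the block costs O(r * (m + n)) flops
        instead of O(m * n)."""
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        r = int(np.sum(s > tol * s[0]))
        return U[:, :r] * s[:r], Vt[:r]       # A is approximately L @ R

    # A smooth kernel standing in for a well-separated block of the
    # associated Legendre matrix P_lm(cos theta_i).
    x = np.linspace(0.1, 0.9, 200)
    A = 1.0 / (x[:, None] + x[None, :] + 2.0)
    L, R = compress_block(A)
    print(L.shape[1])                          # numerical rank << 200
    print(np.linalg.norm(A - L @ R) / np.linalg.norm(A))  # ~1e-10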
Relativistic MHD with Adaptive Mesh Refinement
This paper presents a new computer code to solve the general relativistic
magnetohydrodynamics (GRMHD) equations using distributed parallel adaptive mesh
refinement (AMR). The fluid equations are solved using a finite difference
Convex ENO method (CENO) in 3+1 dimensions, and the AMR is Berger-Oliger.
Hyperbolic divergence cleaning is used to control the div(B) = 0 constraint.
We present results from three flat space tests, and examine the
accretion of a fluid onto a Schwarzschild black hole, reproducing the Michel
solution. The AMR simulations substantially improve performance while
reproducing the results of equivalent-resolution unigrid simulations. Finally,
we discuss strong scaling results for parallel unigrid and AMR runs.
Comment: 24 pages, 14 figures, 3 tables.
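For readers new to the technique, hyperbolic divergence cleaning (the GLM form of Dedner et al. 2002, on which such schemes are based) couples an auxiliary scalar psi to the magnetic field so that div(B) errors propagate away at a chosen speed and damp. The toy update below evolves just that subsystem on a periodic 2D grid; the discretization and parameters are illustrative choices, not the scheme of this paper:

    import numpy as np

    def glm_clean_step(Bx, By, psi, dx, dt, ch=1.0, cp=0.3):
        """One explicit step of the divergence-cleaning subsystem
            dB/dt   = -grad(psi)
            dpsi/dt = -ch^2 div(B) - (ch^2 / cp^2) psi
        on a periodic square grid: div(B) errors are carried off at
        speed ch and damped on the timescale cp^2 / ch^2."""
        def ddx(f): return (np.roll(f, -1, 0) - np.roll(f, 1, 0)) / (2 * dx)
        def ddy(f): return (np.roll(f, -1, 1) - np.roll(f, 1, 1)) / (2 * dx)
        divB = ddx(Bx) + ddy(By)
        return (Bx - dt * ddx(psi),
                By - dt * ddy(psi),
                psi - dt * (ch**2 * divB + (ch**2 / cp**2) * psi))

    # Seed a div(B) error blob and let the cleaning field disperse it.
    n = 64
    dx = 1.0 / n
    X, Y = np.meshgrid(np.linspace(0, 1, n), np.linspace(0, 1, n), indexing="ij")
    Bx = np.exp(-100 * ((X - 0.5) ** 2 + (Y - 0.5) ** 2))  # not divergence-free
    By = np.zeros((n, n))
    psi = np.zeros((n, n))
    for _ in range(100):
        Bx, By, psi = glm_clean_step(Bx, By, psi, dx, dt=0.4 * dx)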