17,084 research outputs found

    A fast, simple, and stable Chebyshev-Legendre transform using an asymptotic formula

    Get PDF
    A fast, simple, and numerically stable transform for converting between Legendre and Chebyshev coefficients of a degree NN polynomial in O(N(logN)2/loglogN)O(N(\log N)^{2}/ \log \log N) operations is derived. The basis of the algorithm is to rewrite a well-known asymptotic formula for Legendre polynomials of large degree as a weighted linear combination of Chebyshev polynomials, which can then be evaluated by using the discrete cosine transform. Numerical results are provided to demonstrate the efficiency and numerical stability. Since the algorithm evaluates a Legendre expansion at an N+1N+1 Chebyshev grid as an intermediate step, it also provides a fast transform between Legendre coefficients and values on a Chebyshev grid

    Landau Collision Integral Solver with Adaptive Mesh Refinement on Emerging Architectures

    Full text link
    The Landau collision integral is an accurate model for the small-angle dominated Coulomb collisions in fusion plasmas. We investigate a high order accurate, fully conservative, finite element discretization of the nonlinear multi-species Landau integral with adaptive mesh refinement using the PETSc library (www.mcs.anl.gov/petsc). We develop algorithms and techniques to efficiently utilize emerging architectures with an approach that minimizes memory usage and movement and is suitable for vector processing. The Landau collision integral is vectorized with Intel AVX-512 intrinsics and the solver sustains as much as 22% of the theoretical peak flop rate of the Second Generation Intel Xeon Phi, Knights Landing, processor

    Simultaneous floating-point sine and cosine for VLIW integer processors

    Get PDF
    Accepted for publication in the proceedings of the 23rd IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2012).International audienceGraphics and signal processing applications often require that sines and cosines be evaluated at a same floating-point argument, and in such cases a very fast computation of the pair of values is desirable. This paper studies how 32-bit VLIW integer architectures can be exploited in order to perform this task accurately for IEEE single precision. We describe software implementations for sinf, cosf, and sincosf over [-pi/4,pi/4] that have a proven 1-ulp accuracy and whose latency on STMicroelectronics' ST231 VLIW integer processor is 19, 18, and 19 cycles, respectively. Such performances are obtained by introducing a novel algorithm for simultaneous sine and cosine that combines univariate and bivariate polynomial evaluation schemes

    Spectral/hp element methods: recent developments, applications, and perspectives

    Get PDF
    The spectral/hp element method combines the geometric flexibility of the classical h-type finite element technique with the desirable numerical properties of spectral methods, employing high-degree piecewise polynomial basis functions on coarse finite element-type meshes. The spatial approximation is based upon orthogonal polynomials, such as Legendre or Chebychev polynomials, modified to accommodate C0-continuous expansions. Computationally and theoretically, by increasing the polynomial order p, high-precision solutions and fast convergence can be obtained and, in particular, under certain regularity assumptions an exponential reduction in approximation error between numerical and exact solutions can be achieved. This method has now been applied in many simulation studies of both fundamental and practical engineering flows. This paper briefly describes the formulation of the spectral/hp element method and provides an overview of its application to computational fluid dynamics. In particular, it focuses on the use the spectral/hp element method in transitional flows and ocean engineering. Finally, some of the major challenges to be overcome in order to use the spectral/hp element method in more complex science and engineering applications are discussed

    Max-Sliced Wasserstein Distance and its use for GANs

    Full text link
    Generative adversarial nets (GANs) and variational auto-encoders have significantly improved our distribution modeling capabilities, showing promise for dataset augmentation, image-to-image translation and feature learning. However, to model high-dimensional distributions, sequential training and stacked architectures are common, increasing the number of tunable hyper-parameters as well as the training time. Nonetheless, the sample complexity of the distance metrics remains one of the factors affecting GAN training. We first show that the recently proposed sliced Wasserstein distance has compelling sample complexity properties when compared to the Wasserstein distance. To further improve the sliced Wasserstein distance we then analyze its `projection complexity' and develop the max-sliced Wasserstein distance which enjoys compelling sample complexity while reducing projection complexity, albeit necessitating a max estimation. We finally illustrate that the proposed distance trains GANs on high-dimensional images up to a resolution of 256x256 easily.Comment: Accepted to CVPR 201
    corecore