994 research outputs found

    Multicore-optimized wavefront diamond blocking for optimizing stencil updates

    The importance of stencil-based algorithms in computational science has focused attention on optimized parallel implementations for multilevel cache-based processors. Temporal blocking schemes leverage the large bandwidth and low latency of caches to accelerate stencil updates and approach theoretical peak performance. A key ingredient is the reduction of data traffic across slow data paths, especially the main memory interface. In this work we combine the ideas of multi-core wavefront temporal blocking and diamond tiling to arrive at stencil update schemes that show large reductions in memory pressure compared to existing approaches. The resulting schemes show performance advantages in bandwidth-starved situations, which are exacerbated by the high bytes-per-lattice-update cost of the variable-coefficient case. Our thread groups concept provides a controllable trade-off between concurrency and memory usage, shifting the pressure between the memory interface and the CPU. We present performance results on a contemporary Intel processor.

    Encoding of arbitrary micrometric complex illumination patterns with reduced speckle

    In nonlinear microscopy, phase-only spatial light modulators (SLMs) allow simultaneous two-photon excitation and fluorescence emission from specific regions of interest (ROIs). However, because iterative Fourier transform algorithms (IFTAs) can only approximate the illumination of selected ROIs, image formation and/or signal acquisition can be strongly affected by the spatial irregularities of the illumination patterns and by speckle noise. To overcome these limitations, we propose an alternative complex illumination method (CIM) able to generate simultaneous excitation of large-area ROIs with full control over the amplitude and phase of light and with reduced speckle. As a proof of concept, we experimentally demonstrate single-photon and second harmonic generation (SHG) with structured illumination over large-area ROIs.

    Power-Balanced Hybrid Optics Boosted Design for Achromatic Extended-Depth-of-Field Imaging via Optimized Mixed OTF

    The power-balanced hybrid optical imaging system is a special design of a diffractive computational camera, introduced in this paper, with image formation by a refractive lens and a Multilevel Phase Mask (MPM). This system provides a long focal depth with low chromatic aberrations thanks to the MPM and a high concentration of light energy due to the refractive lens. We introduce the concept of optical power balance between the lens and the MPM, which controls the contribution of each element to modulating the incoming light. Additional unique features of our MPM design are the quantization of the MPM's shape in the number of levels and the Fresnel order (thickness) using a smoothing function. To optimize the optical power balance as well as the MPM, we build a fully differentiable image formation model for joint optimization of optical and imaging parameters for the proposed camera using neural network techniques. Additionally, we optimize a single Wiener-like optical transfer function (OTF), invariant to depth, to reconstruct a sharp image. We numerically and experimentally compare the designed system with its counterparts, lensless and lens-only optical systems, for the visible wavelength interval 400-700 nm and the depth-of-field range (0.5 m to ∞ for numerical and 0.5-2 m for experimental). The results demonstrate that the proposed system equipped with the optimal OTF outperforms its counterparts (even when they are used with an optimized OTF) in terms of reconstruction quality for off-focus distances. The simulation results also reveal that optimizing the optical power balance, the Fresnel order, and the number of levels is essential for system performance, attaining an improvement of up to 5 dB of PSNR using the optimized OTF compared with its lensless counterpart. Comment: 18 pages, 14 figures

    Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction

    Many sparse matrix computations can be sped up if the matrix is first reordered. Reordering was originally developed for direct methods, but it has recently become popular for improving the cache locality of parallel iterative solvers, since reordering the matrix to reduce bandwidth and wavefront can improve the locality of reference of sparse matrix-vector multiplication (SpMV), the key kernel in iterative solvers. In this paper, we present the first parallel implementations of two widely used reordering algorithms: Reverse Cuthill-McKee (RCM) and Sloan. On 16 cores of the Stampede supercomputer, our parallel RCM is 5.56 times faster on average than a state-of-the-art sequential implementation of RCM in the HSL library. Sloan is significantly more constrained than RCM, but our parallel implementation achieves an average speedup of 2.88X over sequential HSL-Sloan. Reordering the matrix using our parallel RCM and then performing 100 SpMV iterations is twice as fast as using HSL-RCM and then performing the SpMV iterations; it is also 1.5 times faster than performing the SpMV iterations without reordering the matrix.
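
To illustrate what bandwidth-reducing reordering means, here is a minimal sequential sketch of Reverse Cuthill-McKee in plain Python. It is not the parallel algorithm from the paper, nor the HSL implementation; the function names `bandwidth` and `reverse_cuthill_mckee` and the six-vertex example graph are assumptions made for illustration only.

```python
from collections import deque


def bandwidth(adj, order):
    """Largest distance |pos[u] - pos[v]| over all edges for a vertex order."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Sequential RCM sketch: breadth-first search that starts from a
    minimum-degree vertex and visits neighbours by increasing degree;
    the resulting order is reversed at the end."""
    degree = {v: len(adj[v]) for v in adj}
    visited = set()
    order = []
    # Iterate over candidate start vertices so every component is covered.
    for start in sorted(adj, key=lambda v: (degree[v], v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in sorted(adj[u], key=lambda v: (degree[v], v)):
                if w not in visited:
                    visited.add(w)
                    queue.append(w)
    order.reverse()
    return order


# Hypothetical example: the path 0-3-1-5-2-4 with scrambled vertex labels.
adj = {0: [3], 3: [0, 1], 1: [3, 5], 5: [1, 2], 2: [5, 4], 4: [2]}
natural_order = list(range(6))
rcm_order = reverse_cuthill_mckee(adj)

print(bandwidth(adj, natural_order))  # 4
print(bandwidth(adj, rcm_order))      # 1
```

In this toy case the reordering places neighbouring vertices next to each other, so the nonzeros of the corresponding matrix would sit on the first off-diagonals, which is the locality effect the abstract attributes to bandwidth reduction for SpMV.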

    Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning

    Multilevel partitioning methods that are inspired by principles of multiscaling are the most powerful practical hypergraph partitioning solvers. Hypergraph partitioning has many applications in disciplines ranging from scientific computing to data science. In this paper we introduce the concept of algebraic distance on hypergraphs and demonstrate its use as an algorithmic component in the coarsening stage of multilevel hypergraph partitioning solvers. The algebraic distance is a vertex distance measure that extends hyperedge weights to capture the local connectivity of vertices, which is critical for hypergraph coarsening schemes. The practical effectiveness of the proposed measure and the corresponding coarsening scheme is demonstrated through extensive computational experiments on a diverse set of problems. Finally, we propose a benchmark of hypergraph partitioning problems to compare the quality of other solvers.