Multicore-optimized wavefront diamond blocking for optimizing stencil updates
The importance of stencil-based algorithms in computational science has
focused attention on optimized parallel implementations for multilevel
cache-based processors. Temporal blocking schemes leverage the large bandwidth
and low latency of caches to accelerate stencil updates and approach
theoretical peak performance. A key ingredient is the reduction of data traffic
across slow data paths, especially the main memory interface. In this work we
combine the ideas of multi-core wavefront temporal blocking and diamond tiling
to arrive at stencil update schemes that show large reductions in memory
pressure compared to existing approaches. The resulting schemes show
performance advantages in bandwidth-starved situations, which are exacerbated
by the high bytes per lattice update case of variable coefficients. Our thread
groups concept provides a controllable trade-off between concurrency and memory
usage, shifting the pressure between the memory interface and the CPU. We
present performance results on a contemporary Intel processor.
Encoding of arbitrary micrometric complex illumination patterns with reduced speckle
In nonlinear microscopy, phase-only spatial light modulators (SLMs) allow
achieving simultaneous two-photon excitation and fluorescence emission from specific regions of interest (ROIs). However, as iterative Fourier transform algorithms (IFTAs) can only
approximate the illumination of selected ROIs, both image formation and signal acquisition
can be largely affected by the spatial irregularities of the illumination patterns and the speckle
noise. To overcome these limitations, we propose an alternative complex illumination method
(CIM) able to generate simultaneous excitation of large-area ROIs with full control over the
amplitude and phase of light and reduced speckle. As a proof-of-concept we experimentally
demonstrate single-photon and second harmonic generation (SHG) with structured
illumination over large-area ROIs.
Power-Balanced Hybrid Optics Boosted Design for Achromatic Extended-Depth-of-Field Imaging via Optimized Mixed OTF
The power-balanced hybrid optical imaging system is a special design of a
diffractive computational camera, introduced in this paper, with image
formation by a refractive lens and Multilevel Phase Mask (MPM). This system
provides a long focal depth with low chromatic aberrations thanks to MPM and a
high energy light concentration due to the refractive lens. We introduce the
concept of optical power balance between the lens and MPM which controls the
contribution of each element to modulate the incoming light. Additional unique
features of our MPM design are the inclusion of quantization of the MPM's shape
on the number of levels and the Fresnel order (thickness) using a smoothing
function. To optimize optical power-balance as well as the MPM, we build a
fully-differentiable image formation model for joint optimization of optical
and imaging parameters for the proposed camera using Neural Network techniques.
Additionally, we optimize a single Wiener-like optical transfer function (OTF)
invariant to depth to reconstruct a sharp image. We numerically and
experimentally compare the designed system with its counterparts, lensless and
just-lens optical systems, for the visible wavelength interval (400-700)nm and
the depth-of-field range (0.5-m for numerical and 0.5-2m for
experimental). The attained results demonstrate that the proposed system
equipped with the optimal OTF outperforms its counterparts (even when they are
used with optimized OTF) in terms of reconstruction quality for off-focus
distances. The simulation results also reveal that optimizing the optical
power-balance, Fresnel order, and number-of-levels parameters is essential
for system performance, attaining an improvement of up to 5 dB in PSNR using the
optimized OTF compared with its counterpart lensless setup. Comment: 18 pages, 14 figures
Parallelization of Reordering Algorithms for Bandwidth and Wavefront Reduction
Many sparse matrix computations can be sped up if the matrix is first reordered. Reordering was originally developed for direct methods, but it has recently become popular for improving the cache locality of parallel iterative solvers: reordering the matrix to reduce bandwidth and wavefront can improve the locality of reference of sparse matrix-vector multiplication (SpMV), the key kernel in iterative solvers. In this paper, we present the first parallel implementations of two widely used reordering algorithms: Reverse Cuthill-McKee (RCM) and Sloan. On 16 cores of the Stampede supercomputer, our parallel RCM is 5.56 times faster on average than a state-of-the-art sequential implementation of RCM in the HSL library. Sloan is significantly more constrained than RCM, but our parallel implementation achieves an average speedup of 2.88X over sequential HSL-Sloan. Reordering the matrix using our parallel RCM and then performing 100 SpMV iterations is twice as fast as using HSL-RCM and then performing the SpMV iterations; it is also 1.5 times faster than performing the SpMV iterations without reordering the matrix.
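For readers unfamiliar with the reordering this abstract refers to, the following is a minimal sequential sketch of Reverse Cuthill-McKee in plain Python. It is not the parallel algorithm from the paper and not the HSL implementation; the function names, the adjacency-dict representation, and the simple minimum-degree choice of start vertex (instead of a pseudo-peripheral vertex search) are assumptions made for illustration only.

```python
from collections import deque


def bandwidth(adj, order):
    """Maximum |pos[u] - pos[v]| over all edges under the given vertex ordering."""
    pos = {v: i for i, v in enumerate(order)}
    return max((abs(pos[u] - pos[v]) for u in adj for v in adj[u]), default=0)


def reverse_cuthill_mckee(adj):
    """Sequential RCM on an undirected graph given as {vertex: set(neighbors)}.

    Assumption: each component is started from a minimum-degree vertex,
    a common simplification of the usual pseudo-peripheral start.
    """
    visited = set()
    order = []
    for start in sorted(adj, key=lambda v: (len(adj[v]), v)):
        if start in visited:
            continue
        visited.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            # Enqueue unvisited neighbors in order of increasing degree.
            for v in sorted(adj[u] - visited, key=lambda w: (len(adj[w]), w)):
                visited.add(v)
                queue.append(v)
    order.reverse()  # the "reverse" in Reverse Cuthill-McKee
    return order


# Hypothetical example: a path graph whose vertex labels are scrambled.
path = [0, 4, 1, 5, 2, 6, 3, 7]
adj = {v: set() for v in path}
for a, b in zip(path, path[1:]):
    adj[a].add(b)
    adj[b].add(a)

natural_bw = bandwidth(adj, sorted(adj))
order = reverse_cuthill_mckee(adj)
rcm_bw = bandwidth(adj, order)
print(natural_bw, rcm_bw)
```

In this toy case the reordering places neighboring vertices next to each other, which is the locality effect the abstract credits for faster SpMV.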
Relaxation-Based Coarsening for Multilevel Hypergraph Partitioning
Multilevel partitioning methods that are inspired by principles of
multiscaling are the most powerful practical hypergraph partitioning solvers.
Hypergraph partitioning has many applications in disciplines ranging from
scientific computing to data science. In this paper we introduce the concept of
algebraic distance on hypergraphs and demonstrate its use as an algorithmic
component in the coarsening stage of multilevel hypergraph partitioning
solvers. The algebraic distance is a vertex distance measure that extends
hyperedge weights for capturing the local connectivity of vertices which is
critical for hypergraph coarsening schemes. The practical effectiveness of the
proposed measure and corresponding coarsening scheme is demonstrated through
extensive computational experiments on a diverse set of problems. Finally, we
propose a benchmark of hypergraph partitioning problems to compare the quality
of other solvers.