13,651 research outputs found
Throughput Scaling Of Convolution For Error-Tolerant Multimedia Applications
Convolution and cross-correlation are the basis of filtering and pattern or
template matching in multimedia signal processing. We propose two throughput
scaling options for any one-dimensional convolution kernel in programmable
processors by adjusting the imprecision (distortion) of computation. Our
approach is based on scalar quantization, followed by two forms of tight
packing in floating-point (one of which is proposed in this paper) that allow
for concurrent calculation of multiple results. We illustrate how our approach
can operate as an optional pre- and post-processing layer for off-the-shelf
optimized convolution routines. This is useful for multimedia applications that
are tolerant to processing imprecision and for cases where the input signals
are inherently noisy (error tolerant multimedia applications). Indicative
experimental results with a digital music matching system and an MPEG-7 audio
descriptor system demonstrate that the proposed approach offers up to 175%
increase in processing throughput against optimized (full-precision)
convolution with virtually no effect in the accuracy of the results. Based on
marginal statistics of the input data, it is also shown how the throughput and
distortion can be adjusted per input block of samples under constraints on the
signal-to-noise ratio against the full-precision convolution.Comment: IEEE Trans. on Multimedia, 201
k-Nearest Neighbour Classifiers: 2nd Edition (with Python examples)
Perhaps the most straightforward classifier in the arsenal or machine
learning techniques is the Nearest Neighbour Classifier -- classification is
achieved by identifying the nearest neighbours to a query example and using
those neighbours to determine the class of the query. This approach to
classification is of particular importance because issues of poor run-time
performance is not such a problem these days with the computational power that
is available. This paper presents an overview of techniques for Nearest
Neighbour classification focusing on; mechanisms for assessing similarity
(distance), computational issues in identifying nearest neighbours and
mechanisms for reducing the dimension of the data.
This paper is the second edition of a paper previously published as a
technical report. Sections on similarity measures for time-series, retrieval
speed-up and intrinsic dimensionality have been added. An Appendix is included
providing access to Python code for the key methods.Comment: 22 pages, 15 figures: An updated edition of an older tutorial on kN
ColDICE: a parallel Vlasov-Poisson solver using moving adaptive simplicial tessellation
Resolving numerically Vlasov-Poisson equations for initially cold systems can
be reduced to following the evolution of a three-dimensional sheet evolving in
six-dimensional phase-space. We describe a public parallel numerical algorithm
consisting in representing the phase-space sheet with a conforming,
self-adaptive simplicial tessellation of which the vertices follow the
Lagrangian equations of motion. The algorithm is implemented both in six- and
four-dimensional phase-space. Refinement of the tessellation mesh is performed
using the bisection method and a local representation of the phase-space sheet
at second order relying on additional tracers created when needed at runtime.
In order to preserve in the best way the Hamiltonian nature of the system,
refinement is anisotropic and constrained by measurements of local Poincar\'e
invariants. Resolution of Poisson equation is performed using the fast Fourier
method on a regular rectangular grid, similarly to particle in cells codes. To
compute the density projected onto this grid, the intersection of the
tessellation and the grid is calculated using the method of Franklin and
Kankanhalli (1993) generalised to linear order. As preliminary tests of the
code, we study in four dimensional phase-space the evolution of an initially
small patch in a chaotic potential and the cosmological collapse of a
fluctuation composed of two sinusoidal waves. We also perform a "warm" dark
matter simulation in six-dimensional phase-space that we use to check the
parallel scaling of the code.Comment: Code and illustration movies available at:
http://www.vlasix.org/index.php?n=Main.ColDICE - Article submitted to Journal
of Computational Physic
Working With Incremental Spatial Data During Parallel (GPU) Computation
Central to many complex systems, spatial actors require an awareness of their local environment to enable behaviours such as communication and navigation. Complex system simulations represent this behaviour with Fixed Radius Near Neighbours (FRNN) search. This algorithm allows actors to store data at spatial locations and then query the data structure to find all data stored within a fixed radius of the search origin.
The work within this thesis answers the question: What techniques can be used for improving the performance of FRNN searches during complex system simulations on Graphics Processing Units (GPUs)?
It is generally agreed that Uniform Spatial Partitioning (USP) is the most suitable data structure for providing FRNN search on GPUs. However, due to the architectural complexities of GPUs, the performance is constrained such that FRNN search remains one of the most expensive common stages between complex systems models.
Existing innovations to USP highlight a need to take advantage of recent GPU advances, reducing the levels of divergence and limiting redundant memory accesses as viable routes to improve the performance of FRNN search. This thesis addresses these with three separate optimisations that can be used simultaneously.
Experiments have assessed the impact of optimisations to the general case of FRNN search found within complex system simulations and demonstrated their impact in practice when applied to full complex system models. Results presented show the performance of the construction and query stages of FRNN search can be improved by over 2x and 1.3x respectively. These improvements allow complex system simulations to be executed faster, enabling increases in scale and model complexity
Multi-criteria scheduling of pipeline workflows
Mapping workflow applications onto parallel platforms is a challenging
problem, even for simple application patterns such as pipeline graphs. Several
antagonist criteria should be optimized, such as throughput and latency (or a
combination). In this paper, we study the complexity of the bi-criteria mapping
problem for pipeline graphs on communication homogeneous platforms. In
particular, we assess the complexity of the well-known chains-to-chains problem
for different-speed processors, which turns out to be NP-hard. We provide
several efficient polynomial bi-criteria heuristics, and their relative
performance is evaluated through extensive simulations
- …