TensorFlow Enabled Genetic Programming
Genetic Programming, a kind of evolutionary computation and machine learning
algorithm, is shown to benefit significantly from the application of vectorized
data and the TensorFlow numerical computation library on both CPU and GPU
architectures. The open-source, Python-based Karoo GP is employed for a series of 190
tests across 6 platforms, with real-world datasets ranging from 18 to 5.5M data
points. This body of tests demonstrates that datasets measured in tens and
hundreds of data points see 2-15x improvement when moving from the scalar/SymPy
configuration to the vector/TensorFlow configuration, with a single core
performing on par or better than multiple CPU cores and GPUs. A dataset
composed of 90,000 data points demonstrates a single vector/TensorFlow CPU core
performing 875x better than 40 scalar/SymPy CPU cores. And a dataset containing
5.5M data points sees GPU configurations outperforming CPU configurations on
average by 1.3x.
Comment: 8 pages, 5 figures; presented at GECCO 2017, Berlin, Germany
CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication
Sparse matrix-vector multiplication (SpMV) is a fundamental building block
for numerous applications. In this paper, we propose CSR5 (Compressed Sparse
Row 5), a new storage format, which offers high-throughput SpMV on various
platforms including CPUs, GPUs and Xeon Phi. First, the CSR5 format is
insensitive to the sparsity structure of the input matrix. Thus the single
format can support an SpMV algorithm that is efficient both for regular
matrices and for irregular matrices. Furthermore, we show that the overhead of
the format conversion from the CSR to the CSR5 can be as low as the cost of a
few SpMV operations. We compare the CSR5-based SpMV algorithm with 11
state-of-the-art formats and algorithms on four mainstream processors using 14
regular and 10 irregular matrices as a benchmark suite. For the 14 regular
matrices in the suite, we achieve comparable or better performance over the
previous work. For the 10 irregular matrices, the CSR5 obtains average
performance improvement of 17.6%, 28.5%, 173.0% and 293.3% (up to 213.3%,
153.6%, 405.1% and 943.3%) over the best existing work on dual-socket Intel
CPUs, an nVidia GPU, an AMD GPU and an Intel Xeon Phi, respectively. For
real-world applications such as a solver with only tens of iterations, the CSR5
format can be more practical because of its low overhead for format conversion.
The source code of this work is downloadable at
https://github.com/bhSPARSE/Benchmark_SpMV_using_CSR5
Comment: 12 pages, 10 figures, In Proceedings of the 29th ACM International
Conference on Supercomputing (ICS '15)
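For readers unfamiliar with the operation, the following is a minimal sketch of SpMV over the plain CSR format that the abstract names as the conversion source. It is not the CSR5 algorithm, whose layout the abstract does not describe. The function name `csr_spmv`, the array names, and the 3x3 example matrix are illustrative assumptions.

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in plain CSR form.

    values  : nonzero entries of A, stored row by row
    col_idx : column index of each entry in `values`
    row_ptr : row_ptr[i]..row_ptr[i+1] delimits row i inside `values`
    (Baseline CSR only; this is NOT the CSR5 layout from the paper.)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        total = 0.0
        # The work per row equals its number of nonzeros, so rows of very
        # different lengths do very different amounts of work.
        for k in range(row_ptr[row], row_ptr[row + 1]):
            total += values[k] * x[col_idx[k]]
        y[row] = total
    return y


# Hypothetical 3x3 example matrix:
# [[10, 0, 2],
#  [ 0, 3, 0],
#  [ 1, 0, 4]]
values = [10.0, 2.0, 3.0, 1.0, 4.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_spmv(values, col_idx, row_ptr, x)
print(y)  # [16.0, 6.0, 13.0]
```

In this row-wise loop the cost of each row depends on how many nonzeros it holds. That dependence on sparsity structure is presumably what the abstract's claim of insensitivity refers to, though the abstract itself gives no detail.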
Buffer overflow vulnerabilities in CUDA: a preliminary analysis
We present a preliminary study of buffer overflow vulnerabilities in CUDA
software running on GPUs. We show how an attacker can overrun a buffer to
corrupt sensitive data or steer the execution flow by overwriting function
pointers, e.g., manipulating the virtual table of a C++ object. In view of a
potential mass-market diffusion of GPU-accelerated software, this may be a major
concern.
Comment: 12 pages, 2 figures