Parallel evaluation of Pittsburgh rule-based classifiers on GPUs
In Pittsburgh rule-based classifiers, each individual represents a complete solution
to the classification problem as a variable-length set of rules. Consequently, these
systems usually demand substantial computational resources and run-time, both of which
grow with the complexity and size of the data sets. This computational cost is known
to stem mainly from the repeated evaluation of the rules and of the individuals as rule sets.
In this paper we propose a parallel evaluation model for rules and rule sets on
GPUs, based on the NVIDIA CUDA programming model, which significantly reduces
run-time and speeds up the algorithm. The results of the experimental study
demonstrate the high efficiency and performance of the GPU model, which scales
to multiple GPU devices.
The GPU model achieves a rule interpreter performance of up to 64 billion
operations per second, and the evaluation of the individuals is sped up by
up to 3.461× compared with the CPU model. This gives the GPU model a significant
advantage, especially for addressing large and complex problems within
reasonable time, where the CPU run-time is not acceptable.
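The evaluation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' CUDA implementation: each rule is assumed to be a conjunction of interval conditions (a hypothetical encoding), an individual is treated as a decision list with a default class, and a thread pool stands in for the GPU. What it shows is that every (rule, instance) coverage test is independent of the others, which is what makes the evaluation data-parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule encoding: (conditions, predicted_class), where conditions
# maps a feature index to an inclusive (low, high) interval.
Rule = tuple[dict[int, tuple[float, float]], int]


def covers(rule: Rule, x: list[float]) -> bool:
    """True if every interval condition of the rule holds for instance x."""
    conditions, _ = rule
    return all(lo <= x[f] <= hi for f, (lo, hi) in conditions.items())


def coverage_matrix(rules: list[Rule], data: list[list[float]]) -> list[list[bool]]:
    """Evaluate every (rule, instance) pair independently.

    Each cell depends on nothing else, which is the step a GPU would map to
    one thread per pair; here a thread pool simply maps over the rules.
    """
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda r: [covers(r, x) for x in data], rules))


def accuracy(rule_set: list[Rule], default_class: int,
             data: list[list[float]], labels: list[int]) -> float:
    """Decision-list semantics: the first covering rule predicts the class."""
    cov = coverage_matrix(rule_set, data)
    correct = 0
    for i, y in enumerate(labels):
        pred = next((rule_set[r][1] for r in range(len(rule_set)) if cov[r][i]),
                    default_class)
        correct += pred == y
    return correct / len(labels)


if __name__ == "__main__":
    data = [[1.0, 5.0], [2.0, 1.0], [8.0, 3.0], [9.0, 9.0]]
    labels = [0, 0, 1, 1]
    individual = [({0: (0.0, 4.0)}, 0), ({1: (2.0, 10.0)}, 1)]
    print(accuracy(individual, 1, data, labels))
```

Only the coverage matrix is computed in parallel here; the per-instance first-match reduction runs sequentially, whereas a GPU version would typically parallelise that reduction as well.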
Speeding up Multiple Instance Learning Classification Rules on GPUs
Multiple instance learning is a challenging task in supervised learning and data mining. However,
algorithms become slow when learning from large-scale and high-dimensional data sets.
Graphics processing units (GPUs) are increasingly used to reduce the computing time of algorithms. This paper
presents an implementation of the G3P-MI algorithm on GPUs for solving multiple instance problems
using classification rules. The proposed GPU model can be distributed across multiple GPUs to achieve
scalability on large-scale and high-dimensional data sets. The proposal is compared with the multi-threaded
CPU algorithm with SSE parallelism over a series of data sets. Experimental results show that the
computation time can be significantly reduced and scalability improved. Specifically, a speedup of up
to 149× over the multi-threaded CPU algorithm is achieved when using four GPUs, and the rule
interpreter achieves high efficiency, running over 108 billion Genetic Programming operations per second.
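The multiple-instance evaluation can be sketched as follows, assuming the standard MI hypothesis (a bag is labelled positive if at least one of its instances satisfies the rule) and a fitness equal to sensitivity × specificity. Both the fitness choice and the function names are assumptions for illustration, not the G3P-MI code; the thread pool only marks where bag-level evaluation is independent and could be offloaded to GPU threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Instance = list[float]
Bag = list[Instance]


def bag_covered(rule: Callable[[Instance], bool], bag: Bag) -> bool:
    # Standard MI assumption: a bag fires if at least one instance satisfies the rule.
    return any(rule(x) for x in bag)


def rule_fitness(rule: Callable[[Instance], bool],
                 bags: list[Bag], labels: list[bool]) -> float:
    """Sensitivity * specificity of one rule over labelled bags (assumed fitness)."""
    # Each bag is evaluated independently of the others.
    with ThreadPoolExecutor() as pool:
        preds = list(pool.map(lambda b: bag_covered(rule, b), bags))
    tp = sum(p and y for p, y in zip(preds, labels))
    fn = sum((not p) and y for p, y in zip(preds, labels))
    tn = sum((not p) and (not y) for p, y in zip(preds, labels))
    fp = sum(p and (not y) for p, y in zip(preds, labels))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity * specificity


if __name__ == "__main__":
    rule = lambda x: x[0] > 0.5  # hypothetical rule: feature 0 exceeds 0.5
    bags = [[[0.1], [0.9]], [[0.2], [0.3]], [[0.7]], [[0.4], [0.6]]]
    labels = [True, False, True, False]
    print(rule_fitness(rule, bags, labels))
```

In this example the rule finds both positive bags but also fires on one negative bag, so sensitivity is 1.0 and specificity 0.5.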
Association Rule Mining on GPUs
Outstanding Master's Thesis Award, academic year 2012-2013. Intelligent Systems
Mixing multi-core CPUs and GPUs for scientific simulation software
Recent technological and economic developments have led to widespread availability of
multi-core CPUs and specialist accelerator processors such as graphical processing units
(GPUs). The accelerated computational performance possible from these devices can be very
high for some application paradigms. Software languages and systems such as NVIDIA's
CUDA and the Khronos Group's Open Computing Language (OpenCL) support a number of
individual parallel application programming paradigms. To scale up the performance of some
complex systems simulations, a hybrid of multi-core CPUs for coarse-grained parallelism and
many-core GPUs for data parallelism is necessary. We describe our use of hybrid applications
that use threading approaches on multi-core CPUs to control independent GPU devices.
We present speed-up data, discuss multi-threading software issues for the application-level
programmer, and suggest some areas for language development and for integration
between coarse-grained and fine-grained multi-thread systems. We discuss results from three
common simulation algorithm areas: partial differential equations, graph cluster
metric calculations, and random number generation. We report on programming experiences
and selected performance for these algorithms on single and multiple GPUs, multi-core CPUs
and a CellBE, as well as with OpenCL. We discuss programmer usability issues and the outlook and
trends in multi-core programming for scientific application developers.
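The hybrid pattern of one host thread per GPU can be sketched in Python, using the random-number-generation theme as the workload. `SimulatedDevice` is a hypothetical stand-in for a GPU context, and the Monte Carlo estimate of π is an illustrative choice rather than one of the paper's benchmarks. Each host thread drives one device with its own seeded random stream, and the host reduces the partial counts at the end.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class SimulatedDevice:
    """Hypothetical stand-in for a GPU context owned by one host thread."""

    def __init__(self, device_id: int, seed: int):
        self.device_id = device_id
        # Independent, reproducible random stream per device.
        self.rng = random.Random(seed)

    def count_hits(self, samples: int) -> int:
        # "Kernel": count random points that fall inside the unit quarter circle.
        hits = 0
        for _ in range(samples):
            x, y = self.rng.random(), self.rng.random()
            hits += x * x + y * y <= 1.0
        return hits


def estimate_pi(num_devices: int, samples_per_device: int, base_seed: int = 42) -> float:
    devices = [SimulatedDevice(d, base_seed + d) for d in range(num_devices)]
    # Coarse-grained level: one host thread drives each device independently.
    with ThreadPoolExecutor(max_workers=num_devices) as pool:
        hits = list(pool.map(lambda dev: dev.count_hits(samples_per_device), devices))
    # Host-side reduction of the per-device partial results.
    return 4.0 * sum(hits) / (num_devices * samples_per_device)


if __name__ == "__main__":
    print(estimate_pi(4, 50_000))
```

In a real CUDA or OpenCL program each worker thread would bind to its own device and launch kernels there; in CPython the global interpreter lock means these threads illustrate only the control structure, not a speedup.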
A new parallelisation technique for heterogeneous CPUs
In recent years, parallelisation has moved into mainstream compilers, and the demand
for tools that perform automatic parallelisation more effectively is higher than
ever. During the last decade considerable attention has been focused on developing programming
tools that support both explicit and implicit parallelism to keep up with the
power of new multi-core technology. Yet success in developing automatic parallelising
compilers has been limited, mainly because of the complexity of the analysis
required to exploit available parallelism and to manage other parallelisation concerns such
as data partitioning, alignment and synchronisation.
This dissertation investigates the development of a programming tool that automatically parallelises
large data structures on a heterogeneous architecture, and whether a high-level programming
language compiler can use this tool to exploit implicit parallelism and harness
the performance potential of modern multicore technology. The work involved the
development of a fully automatic parallelisation tool, called VSM, that completely hides
the underlying details of general-purpose heterogeneous architectures. The VSM implementation
gives users direct and simple access to parallelise array operations on the
Cell's accelerators without any annotations or process directives. This work
also involved extending the Glasgow Vector Pascal compiler to work with the VSM
implementation as a single compiler system. The resulting compiler system, called
VP-Cell, takes a single source program and parallelises array expressions automatically.
Several experiments using Vector Pascal benchmarks were conducted to validate
the VSM approach. The VP-Cell system achieved significant runtime performance
on a single accelerator compared with the master processor, and near-linear
speedups when code runs across the Cell's accelerators. Although VSM was designed mainly for
building parallelising compilers, it also delivered considerable performance when running
C code on the Cell's accelerators.
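A minimal sketch of the kind of data partitioning that VSM is described as hiding from the user: an element-wise array expression (here `a*x + y`) is split into contiguous, block-aligned slices, one per worker, and each worker evaluates its slice independently. The alignment unit `ALIGN`, the worker threads standing in for the Cell's accelerators, and the function names are all assumptions; this is not VSM's actual interface.

```python
from concurrent.futures import ThreadPoolExecutor

ALIGN = 4  # assumed alignment unit in elements (e.g. 16 bytes of float32)


def partition(n: int, workers: int, align: int = ALIGN) -> list[range]:
    """Split [0, n) into at most `workers` contiguous slices starting on block boundaries."""
    if n <= 0:
        return []
    blocks = -(-n // align)             # ceil(n / align)
    per_worker = -(-blocks // workers)  # ceil(blocks / workers)
    chunk = per_worker * align
    return [range(start, min(start + chunk, n)) for start in range(0, n, chunk)]


def parallel_axpy(a: float, x: list[float], y: list[float], workers: int = 4) -> list[float]:
    """Evaluate the array expression a*x + y slice by slice, one slice per worker."""
    out = [0.0] * len(x)

    def run(r: range) -> None:
        # Each worker writes only its own slice, so no synchronisation is needed.
        for i in r:
            out[i] = a * x[i] + y[i]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(run, partition(len(x), workers)))  # list() surfaces worker errors
    return out


if __name__ == "__main__":
    print(partition(10, 3))
    print(parallel_axpy(2.0, [1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```

Rounding slice sizes up to whole blocks keeps every slice start aligned at the cost of a slightly uneven last slice, which is one simple way to satisfy alignment constraints without special-casing interior slices.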
Genetic improvement of GPU software
We survey genetic improvement (GI) of general-purpose computing on graphics cards. We summarise several experiments that demonstrate four themes. Experiments with the gzip program show that genetic programming can automatically port sequential C code to parallel code. Experiments with the StereoCamera program show that GI can upgrade legacy parallel code for new hardware and software. Experiments with NiftyReg and BarraCUDA show that GI can make substantial improvements to current parallel CUDA applications. Finally, experiments with the pknotsRG program show that, with semi-automated approaches, enormous speedups can sometimes be obtained by growing and grafting new code with genetic programming in combination with human input.
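The core loop of genetic improvement can be illustrated with a toy example: start from an existing program, apply random edits, and keep a variant only if it still passes the test suite and is no worse on a non-functional objective. In this sketch the program is a hypothetical list of Python statements, the edits are statement deletion and swapping, and program length stands in for run-time. The GI work surveyed above evolves real C and CUDA code with genetic programming and typically measures actual performance, so this is a deliberate simplification.

```python
import random

# Hypothetical "program": a list of Python statements that must set `result`.
ORIGINAL = [
    "acc = 0",
    "tmp = sorted(data)",       # redundant work with no effect on result
    "for x in data: acc += x",
    "acc = acc * 1",            # no-op
    "result = acc",
]

# Test suite: (input data, expected result) pairs.
TESTS = [([1, 2, 3], 6), ([], 0), ([5, -2], 3)]


def passes_tests(program: list[str]) -> bool:
    """Run the candidate on every test case; any exception counts as a failure."""
    for data, expected in TESTS:
        env = {"data": data}
        try:
            exec("\n".join(program), env)
        except Exception:
            return False
        if env.get("result") != expected:
            return False
    return True


def mutate(program: list[str], rng: random.Random) -> list[str]:
    """Apply one random edit: swap two statements or delete one."""
    child = list(program)
    if len(child) > 1 and rng.random() < 0.5:
        i, j = rng.sample(range(len(child)), 2)
        child[i], child[j] = child[j], child[i]
    elif child:
        del child[rng.randrange(len(child))]
    return child


def improve(program: list[str], generations: int = 200, seed: int = 0) -> list[str]:
    """Hill climbing: accept a variant that still passes and is no longer (cost = length)."""
    rng = random.Random(seed)
    best = program
    for _ in range(generations):
        child = mutate(best, rng)
        if passes_tests(child) and len(child) <= len(best):
            best = child
    return best


if __name__ == "__main__":
    print(improve(ORIGINAL))
```

Accepting equal-cost variants lets neutral edits (swaps) accumulate, which is a common way for such searches to escape plateaus; the test suite is the only guard on correctness, so weak tests would let incorrect variants through.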