Speeding up Multiple Instance Learning Classification Rules on GPUs
Multiple instance learning is a challenging task in supervised learning and data mining. However, algorithm performance degrades when learning from large-scale and high-dimensional data sets. Graphics processing units (GPUs) are increasingly used to reduce the computing time of such algorithms. This paper presents an implementation of the G3P-MI algorithm on GPUs for solving multiple instance problems using classification rules. The proposed GPU model is distributable to multiple GPUs, seeking scalability across large-scale and high-dimensional data sets. The proposal is compared to the multi-threaded CPU algorithm with SSE parallelism over a series of data sets. Experimental results show that the computation time can be significantly reduced and scalability improved. Specifically, a speedup of up to 149× is achieved over the multi-threaded CPU algorithm when using four GPUs, and the rules interpreter achieves great efficiency, running at over 108 billion Genetic Programming operations per second.
High Performance Twitter Sentiment Analysis Using CUDA Based Distance Kernel on GPUs
Sentiment analysis techniques are widely used for extracting the feelings of users in different domains such as social media content, surveys, and user reviews. This is mostly performed using classical text classification techniques. One of the major challenges in this field is the large and sparse feature space that stems from the sparse representation of texts. The high dimensionality of the feature space creates a serious problem in terms of time and performance for sentiment analysis. This is particularly important when the selected classifier requires intense calculations, as k-NN does. To cope with this problem, we applied sentiment analysis techniques to Turkish Twitter feeds using NVIDIA's CUDA technology. We employed our CUDA-based distance kernel implementation for k-NN, which is a widely used lazy classifier in this field. We conducted our experiments on four machines with different computing capacities in terms of GPU and CPU configuration to analyze the impact on speed-up.
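The core of a distance-kernel approach is that every query–corpus distance is independent, so each pair can map to one GPU thread. A minimal sketch of the computation being parallelized, in plain NumPy (all names here are illustrative, not from the paper):

```python
import numpy as np

def pairwise_euclidean(queries, corpus):
    """Distance matrix between query and corpus feature vectors.

    Each entry is independent of the others, which is what makes the
    k-NN distance step a natural fit for a one-thread-per-pair CUDA
    kernel. (Illustrative sketch only.)
    """
    # ||q - c||^2 = ||q||^2 - 2 q.c + ||c||^2, computed for all pairs at once
    q2 = np.sum(queries ** 2, axis=1)[:, None]
    c2 = np.sum(corpus ** 2, axis=1)[None, :]
    d2 = q2 - 2.0 * queries @ corpus.T + c2
    return np.sqrt(np.maximum(d2, 0.0))  # clamp tiny negatives from rounding

def knn_predict(queries, corpus, labels, k=3):
    """Majority vote among the k nearest corpus vectors."""
    d = pairwise_euclidean(queries, corpus)
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = labels[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])
```

On a GPU, only the distance matrix needs the kernel; the cheap top-k vote can stay on the CPU, which matches the "distance kernel" framing in the abstract.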
Exploiting Tournament Selection for Efficient Parallel Genetic Programming
Genetic Programming (GP) is a computationally intensive technique which is
naturally parallel. Consequently, many attempts have been made to
improve its run-time by exploiting highly parallel hardware such as GPUs.
A second methodology for improving the speed of GP is through
efficiency techniques such as subtree caching, but achieving both parallel
performance and efficiency is a difficult task. This paper demonstrates an
efficiency saving for GP, compatible with the harnessing of parallel CPU
hardware, obtained by exploiting tournament selection. Significant efficiency savings are
demonstrated whilst retaining the capability of a high-performance parallel
implementation of GP. Indeed, a 74% improvement in the speed of GP is achieved,
with a peak rate of 96 billion GPop/s for classification-type problems.
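One reading of the tournament-selection saving (our sketch, not necessarily the paper's exact mechanism): only individuals actually drawn into some tournament need their fitness computed, so evaluation can be deferred and cached:

```python
import random

def lazy_tournament_select(population, fitness_fn, n_winners, tsize=4, rng=None):
    """Tournament selection that evaluates fitness lazily.

    Only individuals sampled into some tournament are ever evaluated,
    and the cache ensures each is evaluated at most once.  With small
    tournament sizes, part of the population is never evaluated at all,
    which is the kind of saving the abstract describes (sketch only;
    all names here are hypothetical).
    """
    rng = rng or random.Random()
    cache = {}

    def fit(i):
        if i not in cache:
            cache[i] = fitness_fn(population[i])
        return cache[i]

    winners = []
    for _ in range(n_winners):
        contenders = [rng.randrange(len(population)) for _ in range(tsize)]
        winners.append(population[max(contenders, key=fit)])
    # also report how many evaluations were actually performed
    return winners, len(cache)
```

With `n_winners * tsize` draws from a large population, the number of evaluations is bounded by the draws, not the population size, and repeats cost nothing.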
Speeding Up Evolutionary Learning Algorithms using GPUs
This paper proposes a multithreaded Genetic
Programming classification evaluation model
using NVIDIA CUDA GPUs to reduce the
computational time caused by poor performance
on large problems. Two different classification
algorithms are benchmarked using
UCI Machine Learning data sets. Experimental
results compare the performance of
single-threaded and multithreaded Java, C, and
GPU code, and show the far better efficiency
obtained by our proposal.
Parallel evaluation of Pittsburgh rule-based classifiers on GPUs
Individuals from Pittsburgh rule-based classifiers represent a complete solution
to the classification problem, and each individual is a variable-length set
of rules. Therefore, these systems usually demand a high level of computational
resources and run-time, which increases with the complexity and the size
of the data sets. It is known that this computational cost is mainly due to
the recurring evaluation process of the rules and of the individuals as rule sets.
In this paper we propose a parallel evaluation model of rules and rule sets on
GPUs based on the NVIDIA CUDA programming model, which significantly
reduces the run-time and speeds up the algorithm. The results
obtained from the experimental study support the great efficiency and high
performance of the GPU model, which is scalable to multiple GPU devices.
The GPU model achieves a rule interpreter performance of up to 64 billion
operations per second, and the evaluation of the individuals is sped up by
up to 3.461× compared to the CPU model. This gives the GPU model
a significant advantage, especially in addressing large and complex
problems within reasonable time, where the CPU run-time is not acceptable.
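The "recurring evaluation process" the abstract refers to is checking every rule against every instance; each (rule, instance) pair is independent, which is what makes it GPU-friendly. A serial sketch of that computation (hypothetical names, not the paper's interpreter):

```python
def evaluate_rule(rule, instances, labels):
    """Confusion-matrix counts for one rule over a data set.

    `rule` is a (predicate, predicted_class) pair.  Every
    (rule, instance) check is independent, so on a GPU each pair can
    map to one thread; here we only show the serial computation being
    parallelized.  (Illustrative sketch only.)
    """
    predicate, predicted = rule
    tp = tn = fp = fn = 0
    for x, y in zip(instances, labels):
        if predicate(x):                 # rule fires on this instance
            if y == predicted: tp += 1
            else:              fp += 1
        else:
            if y == predicted: fn += 1
            else:              tn += 1
    return tp, tn, fp, fn

def rule_set_accuracy(rules, instances, labels, default):
    """A Pittsburgh individual: an ordered rule set plus a default class."""
    hits = 0
    for x, y in zip(instances, labels):
        for predicate, predicted in rules:
            if predicate(x):             # first matching rule decides
                hits += (predicted == y)
                break
        else:                            # no rule fired: default class
            hits += (default == y)
    return hits / len(instances)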
Enabling scalable stochastic gradient-based inference for Gaussian processes by employing the Unbiased LInear System SolvEr (ULISSE)
In applications of Gaussian processes where quantification of uncertainty is
of primary interest, it is necessary to accurately characterize the posterior
distribution over covariance parameters. This paper proposes an adaptation of
the Stochastic Gradient Langevin Dynamics algorithm to draw samples from the
posterior distribution over covariance parameters with negligible bias and
without the need to compute the marginal likelihood. In Gaussian process
regression, this has the enormous advantage that stochastic gradients can be
computed by solving linear systems only. A novel unbiased linear systems solver
based on parallelizable covariance matrix-vector products is developed to
accelerate the unbiased estimation of gradients. The results demonstrate the
possibility to enable scalable and exact (in a Monte Carlo sense)
quantification of uncertainty in Gaussian processes without imposing any
special structure on the covariance or reducing the number of input vectors.Comment: 10 pages - paper accepted at ICML 201
Scalable CAIM Discretization on Multiple GPUs Using Concurrent Kernels
CAIM(Class-Attribute InterdependenceMaximization) is one of the stateof-
the-art algorithms for discretizing data for which classes are known. However, it
may take a long time when run on high-dimensional large-scale data, with large number
of attributes and/or instances. This paper presents a solution to this problem by
introducing a GPU-based implementation of the CAIM algorithm that significantly
speeds up the discretization process on big complex data sets. The GPU-based implementation
is scalable to multiple GPU devices and enables the use of concurrent
kernels execution capabilities ofmodernGPUs. The CAIMGPU-basedmodel is evaluated
and compared with the original CAIM using single and multi-threaded parallel
configurations on 40 data sets with different characteristics. The results show great
speedup, up to 139 times faster using 4 GPUs, which makes discretization of big
data efficient and manageable. For example, discretization time of one big data set is
reduced from 2 hours to less than 2 minute
- …