Kinetic equations represent the natural theoretical and computational tool for the investigation of rarefaction effects in
gaseous flows. Their complex mathematical structure leads to numerical schemes of varying complexity whose common
feature is a considerable demand for computing resources. In the case of a dilute gas, the most complex term, i.e. the
collision term, has a spatially local structure. Hence, its time-consuming numerical evaluation or simulation can be
performed concurrently on multi-processor hardware platforms. Recent developments in hardware and software tools have
made the massively parallel architecture of graphics processing units (GPUs) available for low-cost scientific computing.
The paper aims to show that a particular class of numerical schemes, based on finite-difference discretization of the
distribution function combined with Monte Carlo evaluation of the collision integral, is very well suited to the
single-instruction, multiple-data (SIMD) structure of GPUs, reducing the computing time by two orders of magnitude
with respect to the single-threaded version of the same code. The implementation of the numerical scheme is discussed and its
application is illustrated by solving the full nonlinear unsteady Boltzmann equation in two-dimensional planar geometry
and by solving a system of coupled Boltzmann equations to investigate sound propagation in a binary mixture.
Strategies to correct the scheme's main drawbacks and to further improve its performance are discussed.
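
The two ingredients named above, Monte Carlo evaluation of a collision-type integral and the spatial locality that lets each cell be processed independently, can be illustrated with a minimal sketch. It is not the scheme of the paper: it assumes an equilibrium (Maxwellian) gas in every cell instead of a finite-difference distribution function, and it estimates only the mean relative speed, to which the hard-sphere collision frequency is proportional. All function names are hypothetical. The plain Python loop over cells stands in for what would be one GPU thread (or thread block) per cell.

```python
import math
import random


def sample_maxwellian(rng, rt=1.0):
    """Draw one velocity from a zero-mean Maxwellian with variance rt per component."""
    s = math.sqrt(rt)
    return (rng.gauss(0.0, s), rng.gauss(0.0, s), rng.gauss(0.0, s))


def mean_relative_speed(rng, n_pairs, rt=1.0):
    """Monte Carlo estimate of <|v - w|> over random velocity pairs in one cell."""
    total = 0.0
    for _ in range(n_pairs):
        v = sample_maxwellian(rng, rt)
        w = sample_maxwellian(rng, rt)
        total += math.dist(v, w)  # relative speed of the sampled pair
    return total / n_pairs


def estimate_all_cells(n_cells, n_pairs, seed=0):
    """Each cell uses only its own samples and its own random stream,
    so the iterations are independent and could run concurrently."""
    return [
        mean_relative_speed(random.Random(seed + i), n_pairs)
        for i in range(n_cells)
    ]


# Analytical mean relative speed for rt = 1: 4 / sqrt(pi)
EXACT = 4.0 / math.sqrt(math.pi)

if __name__ == "__main__":
    for i, est in enumerate(estimate_all_cells(n_cells=4, n_pairs=20000)):
        print(f"cell {i}: estimate = {est:.4f}, exact = {EXACT:.4f}")
```

Because no cell reads data belonging to another, the same instruction sequence is applied to different data in every cell, which is the access pattern a SIMD architecture favors. The statistical error of each estimate decreases as the inverse square root of the number of sampled pairs.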