Sample-Parallel Execution of EBCOT in Fast Mode
JPEG 2000’s most computationally expensive building
block is the Embedded Block Coder with Optimized Truncation
(EBCOT). This paper evaluates how encoders targeting a parallel
architecture such as a GPU can increase their throughput in use
cases with very high data rates. The compression
efficiency in the less significant bit-planes is then often poor and
it is beneficial to enable the Selective Arithmetic Coding Bypass
style (fast mode) in order to trade a small loss in compression
efficiency for a reduction of the computational complexity. More
importantly, this style exposes finer-grained parallelism
that can be exploited to execute the raw coding passes, including
bit-stuffing, in a sample-parallel fashion. For a latency- or
memory-critical application that encodes one frame at a time,
EBCOT’s tier-1 is sped up between 1.1x and 2.4x compared to an
optimized GPU-based implementation. When a low GPU
occupancy has already been addressed by encoding multiple
frames in parallel, the throughput can still be improved by 5%
for high-entropy images and 27% for low-entropy images. Best
results are obtained when enabling the fast mode after the fourth
significant bit-plane. For most of the test images the compression
rate is within 1% of the original.
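The bit-stuffing mentioned in this abstract follows JPEG 2000's rule that, in the raw (bypass) coding passes, a byte following 0xFF carries only seven data bits (its most significant bit is a stuffed zero), so no forbidden marker can appear in the codestream. A minimal sequential sketch of that packing rule (the function name is ours; the paper's contribution is performing this sample-parallel on the GPU, which this sketch does not show):

```python
def pack_raw_bits(bits):
    """Pack raw-pass bits MSB-first into bytes with JPEG 2000
    bit-stuffing: after an 0xFF byte, the next byte holds only
    7 data bits (MSB is a stuffed 0)."""
    out = []
    byte, nbits = 0, 0
    for b in bits:
        # A byte that follows 0xFF may carry only 7 data bits.
        limit = 7 if (out and out[-1] == 0xFF) else 8
        byte = (byte << 1) | (b & 1)
        nbits += 1
        if nbits == limit:
            out.append(byte)
            byte, nbits = 0, 0
    if nbits:  # flush a partially filled final byte, left-aligned
        limit = 7 if (out and out[-1] == 0xFF) else 8
        out.append(byte << (limit - nbits))
    return bytes(out)
```

Note how eight consecutive one-bits produce 0xFF, after which the next seven one-bits yield 0x7F rather than a second 0xFF.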
Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization
With the bottom-line goal of increasing the
throughput of a GPU-accelerated JPEG 2000 encoder, this paper
evaluates whether the post-compression rate control and
packetization routines should be carried out on the CPU or on
the GPU. Three co-processing models that differ in how the
workload is split between the CPU and GPU are introduced. Both
routines are discussed and algorithms for executing them in
parallel are presented. Experimental results for compressing a
detail-rich UHD sequence to 4 bits/sample indicate speed-ups of
200x for the rate control and 100x for the packetization
compared to the single-threaded implementation in the
commercial Kakadu library. These two routines executed on the
CPU take 4x as long as all remaining coding steps on the GPU
and therefore present a bottleneck. Even if the CPU bottleneck
could be avoided with multi-threading, it is still beneficial to
execute all coding steps on the GPU as this minimizes the
required device-to-host transfer and thereby speeds up the
critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to
22.4 fps for 0.16 bits/sample.
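Post-compression rate control in JPEG 2000 selects one truncation point per code-block so that the total rate meets a byte budget while minimizing distortion, usually via a Lagrangian threshold on the rate-distortion slope. A hedged sketch of that formulation (data layout, function names, and the bisection bounds are our assumptions, not the paper's or Kakadu's code):

```python
def select_truncation(blocks, slope):
    """For each code-block, pick the truncation point minimizing
    distortion + slope * rate. `blocks` is a list of lists of
    (rate_bytes, distortion) candidate truncation points."""
    return [min(points, key=lambda rd: rd[1] + slope * rd[0])
            for points in blocks]

def rate_control(blocks, budget):
    """Bisect the slope so the summed rate fits the byte budget:
    a larger slope penalizes rate more, shrinking the stream."""
    lo, hi = 0.0, 1e12
    for _ in range(60):
        mid = (lo + hi) / 2
        total = sum(r for r, _ in select_truncation(blocks, mid))
        if total > budget:
            lo = mid   # over budget: penalize rate more
        else:
            hi = mid   # feasible: try a gentler slope
    return select_truncation(blocks, hi)
```

Each block is treated independently under a common slope, which is what makes the routine attractive for the GPU parallelization the paper evaluates.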
A New Strategy to Improve the Performance of PDP-Systems Simulators
One of the major challenges that current P systems simulators
face is to be as efficient as possible. A P system
is syntactically described as a membrane structure delimiting regions
where multisets of objects evolve by means of evolution rules. Accordingly,
on each computation step, the applicability of the rules for
the current P system configuration must be calculated. In this paper we
extend previous works that use a Rete-based simulation algorithm in order
to reduce the time consumed during the checking phase in the selection
of rules. A new approach is presented, oriented to the acceleration of
Population Dynamics P Systems simulations.
Ministerio de Economía y Competitividad TIN2012-3743
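The checking phase referred to above decides, for each rule, whether the current configuration contains the rule's left-hand-side multiset. A minimal multiset-containment sketch (this is the baseline check; the paper's Rete-based approach accelerates it by sharing work across rules, which the sketch does not show):

```python
from collections import Counter

def applicable(rule_lhs, region):
    """A rule is applicable iff the region's multiset contains every
    object of the rule's left-hand side with at least the required
    multiplicity. Both arguments are Counter-like multisets."""
    return all(region[obj] >= n for obj, n in rule_lhs.items())

def max_applications(rule_lhs, region):
    """Maximal number of times the rule can fire in this region."""
    return min(region[obj] // n for obj, n in rule_lhs.items())
```

For example, a region with objects a³b²c supports the rule with left-hand side ab at most twice.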
An Improved GPU Simulator For Spiking Neural P Systems
Spiking Neural P (SNP) systems, variants of P systems (under membrane and natural computing), are computing models that draw abstraction and inspiration from the way neurons 'compute' or process information. Similar to other P system variants, SNP systems are Turing complete models that by nature compute non-deterministically and in a maximally parallel manner. P systems usually trade (often exponential) space for (polynomial to constant) time. Due to this nature, P system variants are currently limited to parallel simulations, and several variants have already been simulated on parallel devices. In this paper we present an improved SNP system simulator based on graphics processing units (GPUs). Among other reasons, current GPUs are architected for massively parallel computations, thus making GPUs very suitable for SNP system simulation. The computing model, hardware/software considerations, and simulation algorithm are presented, as well as comparisons of the CPU-only and CPU-GPU based simulators.
Ministerio de Ciencia e Innovación TIN2009-13192
Junta de Andalucía P08-TIC-0420
Movies Tags Extraction Using Deep Learning
Retrieving information from movies is becoming increasingly
demanding due to the enormous amount of multimedia
data generated each day. Not only does it help in efficient
search, archiving, and classification of movies, but it is also instrumental
in content censorship and recommendation systems.
Extracting key information from a movie and summarizing
it in a few tags which best describe the movie presents
a dedicated challenge and requires an intelligent approach
to automatically analyze the movie. In this paper, we formulate
the movie tag extraction problem as a machine learning
classification problem and train a Convolutional Neural Network
(CNN) on a carefully constructed tag vocabulary. Our
proposed technique first extracts key frames from a movie
and applies the trained classifier on the key frames. The
predictions from the classifier are assigned scores and are
filtered based on their relative strengths to generate a compact
set of most relevant key tags. We performed a rigorous
subjective evaluation of our proposed technique for a
wide variety of movies with different experiments. The evaluation
results presented in this paper demonstrate that our
proposed approach can efficiently extract the key tags of a
movie with good accuracy.
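The scoring-and-filtering step can be pictured as aggregating per-frame class probabilities and keeping only tags close in strength to the strongest one. A sketch under assumed data shapes (the 0.5 relative-strength threshold and function name are illustrative, not values from the paper):

```python
def extract_tags(frame_predictions, keep_ratio=0.5):
    """Aggregate per-frame {tag: probability} predictions into tag
    scores, then keep tags whose score is within `keep_ratio` of the
    top score -- a simple relative-strength filter."""
    scores = {}
    for preds in frame_predictions:
        for tag, p in preds.items():
            scores[tag] = scores.get(tag, 0.0) + p
    if not scores:
        return []
    top = max(scores.values())
    kept = [t for t, s in scores.items() if s >= keep_ratio * top]
    return sorted(kept, key=lambda t: -scores[t])
```

Tags predicted consistently across key frames accumulate high scores, while spurious single-frame detections fall below the relative threshold.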
Solving Sudoku with Membrane Computing
Sudoku is a very popular puzzle which consists of
placing several numbers in a squared grid according to some
simple rules. In this paper we present an efficient family of P
systems which solve Sudokus of any order that satisfy a specific
property. The solution is searched for by using a simple human-style
method. If the Sudoku cannot be solved by using this strategy, the
P system detects this drawback; the computation then stops
and returns No. Otherwise, the P system encodes the solution
and returns Yes in the last computation step.
Ministerio de Ciencia e Innovación TIN2008-04487-E
Ministerio de Ciencia e Innovación TIN2009-13192
Junta de Andalucía P08-TIC-0420
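A typical "human-style" rule of the kind such strategies build on is the naked single: fill any cell that has exactly one remaining candidate, and repeat until no cell changes. Whether this matches the paper's exact strategy is an assumption; the sketch only illustrates the style of deterministic reasoning (order 3, i.e. a 9x9 grid):

```python
def naked_singles(grid):
    """Repeatedly fill cells whose candidate set has a single value.
    `grid` is a 9x9 list of lists with 0 marking an empty cell.
    Returns True iff the grid is completely filled this way."""
    def candidates(r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(9)}
        br, bc = 3 * (r // 3), 3 * (c // 3)
        used |= {grid[br + i][bc + j] for i in range(3) for j in range(3)}
        return set(range(1, 10)) - used
    progress = True
    while progress:
        progress = False
        for r in range(9):
            for c in range(9):
                if grid[r][c] == 0:
                    cand = candidates(r, c)
                    if len(cand) == 1:
                        grid[r][c] = cand.pop()
                        progress = True
    return all(all(v != 0 for v in row) for row in grid)
```

If the loop stalls with empty cells left, the strategy fails on that puzzle, mirroring the "returns No" branch of the P system.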
Spiking Neural P Systems with Structural Plasticity: Attacking the Subset Sum Problem
Spiking neural P systems with structural plasticity (in short,
SNPSP systems) are models of computations inspired by the function and
structure of biological neurons. In SNPSP systems, neurons can create
or delete synapses using plasticity rules. We report two families of solutions,
a non-uniform and a uniform one, to the NP-complete problem
Subset Sum using SNPSP systems. Instead of the usual rule-level nondeterminism
(choosing which rule to apply) we use synapse-level nondeterminism
(choosing which synapses to create or delete). The nondeterminism
due to plasticity rules has the following improvements over a
previous solution: in our non-uniform solution, plasticity rules allow
a normal form to be used (i.e. without forgetting rules or rules with
delays; the system is simple, with only synapse-level nondeterminism); in our uniform
solution the number of neurons and the computation steps are
reduced.
Ministerio de Economía y Competitividad TIN2012-3743
Simulating FRSN P Systems with Real Numbers in P-Lingua on sequential and CUDA platforms
Fuzzy Reasoning Spiking Neural P systems (FRSN P systems,
for short) are a variant of Spiking Neural P systems incorporating
fuzzy logic elements that make them suitable to model the fuzzy diagnosis knowledge
and reasoning required for fault diagnosis applications. In this sense,
several FRSN P system variants have been proposed, dealing with real
numbers, trapezoidal numbers, weights, etc. The model incorporating
real numbers was the first introduced [13], presenting promising applications
in the field of fault diagnosis of electrical systems. For this variant,
a matrix-based algorithm was provided which, when executed on parallel
computing platforms, fully exploits the model's maximally parallel
capacities. In this paper we introduce a P-Lingua framework extension
to parse and simulate FRSN P systems with real numbers. Two simulators,
implementing a variant of the original matrix-based simulation
algorithm, are provided: a sequential one (written in Java), intended to
run on traditional CPUs, and a parallel one, intended to run on CUDA-enabled
devices.
Ministerio de Economía y Competitividad TIN2012-3743
When Matrices Meet Brains
Spiking neural P systems (SN P systems, for short) are a class of distributed
parallel computing devices inspired by the way neurons communicate by means of
spikes. In this work, a discrete structure representation of SN P systems is proposed.
Specifically, matrices are used to represent SN P systems. In order to represent the
computations of SN P systems by matrices, configuration vectors are defined to monitor
the number of spikes in each neuron at any given configuration; transition net gain vectors
are also introduced to quantify the total amount of spikes consumed and produced after
the chosen rules are applied. Nondeterminism of the systems is assured by a set of spiking
transition vectors that could be used at any given time during the computation. With
such matrix representation, it is quite convenient to determine the next configuration
from a given configuration, since it involves only multiplying a vector by a matrix and
adding vectors.
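In this representation one computation step is the update C' = C + s·M, where C is the configuration vector, s the chosen spiking transition vector, and M the matrix of net spike gains per rule and neuron. A small sketch with plain lists (variable names and the sign convention for M are our assumptions):

```python
def next_configuration(config, spiking_vector, transition_matrix):
    """One step of the matrix representation: C' = C + s * M.
    config: spikes per neuron; spiking_vector: 1 for each rule
    chosen to fire, 0 otherwise; transition_matrix[r][j]: net gain
    of spikes in neuron j when rule r fires (negative = consumed)."""
    n = len(config)
    gain = [sum(spiking_vector[r] * transition_matrix[r][j]
                for r in range(len(spiking_vector)))
            for j in range(n)]
    return [config[j] + gain[j] for j in range(n)]
```

The nondeterminism of the system lives entirely in the choice of the spiking vector s; once s is fixed, the step is a single vector-matrix multiply plus a vector add.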
The Reduction Problem in CUDA and Its Simulation with P Systems
We introduce P systems with dynamic communication graphs which simulate
the functioning of the CUDA architecture when solving the parallel reduction
problem.
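The parallel reduction pattern in question combines elements pairwise over log₂ n strided steps. A sequential Python sketch of the access pattern (a CUDA kernel would execute each inner loop's iterations concurrently, typically in shared memory; this sketch only traces which elements are combined):

```python
def tree_reduce(values, op=lambda a, b: a + b):
    """Pairwise (tree) reduction: each step combines elements a
    stride apart, halving the active set, so an associative `op`
    over n elements takes O(log n) parallel steps."""
    data = list(values)
    n = len(data)
    stride = 1
    while stride < n:
        # On a GPU, every iteration of this loop runs in parallel.
        for i in range(0, n - stride, 2 * stride):
            data[i] = op(data[i], data[i + stride])
        stride *= 2
    return data[0] if data else None
```

The dynamic communication graphs of the P system model would mirror how the pairs (i, i + stride) change from step to step.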