636 research outputs found
Sample-Parallel Execution of EBCOT in Fast Mode
JPEG 2000’s most computationally expensive building
block is the Embedded Block Coder with Optimized Truncation
(EBCOT). This paper evaluates how encoders targeting a parallel
architecture such as a GPU can increase their throughput in use
cases where very high data rates are used. The compression
efficiency in the less significant bit-planes is then often poor and
it is beneficial to enable the Selective Arithmetic Coding Bypass
style (fast mode) in order to trade a small loss in compression
efficiency for a reduction of the computational complexity. More
importantly, this style exposes a more finely grained parallelism
that can be exploited to execute the raw coding passes, including
bit-stuffing, in a sample-parallel fashion. For a latency- or
memory-critical application that encodes one frame at a time,
EBCOT’s tier-1 is sped up between 1.1x and 2.4x compared to an
optimized GPU-based implementation. When a low GPU
occupancy has already been addressed by encoding multiple
frames in parallel, the throughput can still be improved by 5%
for high-entropy images and 27% for low-entropy images. Best
results are obtained when enabling the fast mode after the fourth
significant bit-plane. For most of the test images the compression
rate is within 1% of the original.
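The bit-stuffing step mentioned in the abstract above can be illustrated with a minimal sequential sketch. In JPEG 2000 raw (bypass) segments, the byte that follows an 0xFF carries only seven payload bits, so its most significant bit is a stuffed zero. The function name `pack_raw_bits` and the zero padding of the final partial byte are assumptions made for illustration. This is a reference for the rule itself, not the sample-parallel GPU scheme the paper evaluates.

```python
def pack_raw_bits(bits):
    """Pack bits MSB-first, stuffing a zero bit after every 0xFF byte.

    Sequential reference only; the final partial byte is padded with
    zeros, which is a simplifying assumption of this sketch.
    """
    out = bytearray()
    cur = 0        # bits accumulated for the byte being built
    count = 0      # number of payload bits already placed in `cur`
    capacity = 8   # payload bits the current byte may hold (8 or 7)
    for b in bits:
        cur = (cur << 1) | (b & 1)
        count += 1
        if count == capacity:
            out.append(cur)
            # After an 0xFF byte the next byte holds only 7 payload bits,
            # which leaves its most significant bit at zero.
            capacity = 7 if cur == 0xFF else 8
            cur, count = 0, 0
    if count:
        # Left-align the leftover bits within the allowed payload width.
        out.append(cur << (capacity - count))
    return bytes(out)
```

Because the position of each stuffed bit depends on the bytes emitted before it, a parallel version has to resolve that dependency some other way; the sketch only pins down what output such a version must reproduce.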
Simulating Spiking Neural P systems without delays using GPUs
We present in this paper our work regarding simulating a type of P system
known as a spiking neural P system (SNP system) using graphics processing units
(GPUs). GPUs, because of their architectural optimization for parallel
computations, are well-suited for highly parallelizable problems. Due to the
advent of general purpose GPU computing in recent years, GPUs are not limited
to graphics and video processing alone, but include computationally intensive
scientific and mathematical applications as well. Moreover P systems, including
SNP systems, are inherently and maximally parallel computing models whose
inspirations are taken from the functioning and dynamics of a living cell. In
particular, SNP systems try to give a modest but formal representation of a
special type of cell known as the neuron and their interactions with one
another. The nature of SNP systems allowed their representation as matrices,
which is a crucial step in simulating them on highly parallel devices such as
GPUs. The highly parallel nature of SNP systems necessitates the use of hardware
intended for parallel computations. The simulation algorithms, design
considerations, and implementation are presented. Finally, simulation results,
observations, and analyses using an SNP system that generates all numbers in
ℕ - {1} are discussed, as well as recommendations for future work.
Comment: 19 pages in total, 4 figures, listings/algorithms, submitted at the
9th Brainstorming Week in Membrane Computing, University of Seville, Spain
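The matrix representation mentioned in the abstract above can be sketched as follows. For SNP systems without delays, one common formulation stores the net spike change caused by each rule as a row of a matrix and computes the next configuration as the current configuration plus the sum of the rows of the rules that fire. The three-neuron, five-rule matrix `M`, the initial configuration `C0`, and the function name below are illustrative assumptions, not values taken from the paper.

```python
# Hypothetical system: rows are rules, columns are neurons.
# Each entry is the net change in spikes when that rule fires.
M = [
    [-1,  1,  1],   # rule 1 (neuron 1): consume 1 spike, send 1 to neurons 2 and 3
    [-2,  1,  1],   # rule 2 (neuron 1): consume 2 spikes, send 1 to neurons 2 and 3
    [ 1, -1,  1],   # rule 3 (neuron 2): consume 1 spike, send 1 to neurons 1 and 3
    [ 0,  0, -1],   # rule 4 (neuron 3): consume 1 spike, send it to the environment
    [ 0,  0, -2],   # rule 5 (neuron 3): forget 2 spikes
]
C0 = [2, 1, 1]      # initial number of spikes in each neuron


def next_configuration(config, spiking_vector, matrix):
    """Return config + spiking_vector * matrix for a 0/1 spiking vector."""
    if len(spiking_vector) != len(matrix):
        raise ValueError("spiking vector needs one entry per rule")
    result = list(config)
    for fired, row in zip(spiking_vector, matrix):
        if fired:
            for j, delta in enumerate(row):
                result[j] += delta
    return result
```

Each entry of the result depends on one matrix column only, which is what makes this formulation a natural fit for one GPU thread per neuron or per matrix element.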
Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization
With the bottom-line goal of increasing the
throughput of a GPU-accelerated JPEG 2000 encoder, this paper
evaluates whether the post-compression rate control and
packetization routines should be carried out on the CPU or on
the GPU. Three co-processing models that differ in how the
workload is split among the CPU and GPU are introduced. Both
routines are discussed and algorithms for executing them in
parallel are presented. Experimental results for compressing a
detail-rich UHD sequence to 4 bits/sample indicate speed-ups of
200x for the rate control and 100x for the packetization
compared to the single-threaded implementation in the
commercial Kakadu library. These two routines executed on the
CPU take 4x as long as all remaining coding steps on the GPU
and therefore present a bottleneck. Even if the CPU bottleneck
could be avoided with multi-threading, it is still beneficial to
execute all coding steps on the GPU as this minimizes the
required device-to-host transfer and thereby speeds up the
critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to
22.4 fps for 0.16 bits/sample.
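For readers unfamiliar with post-compression rate control, the following is a minimal sequential sketch of the classic slope-threshold search. Each code-block offers a list of truncation points with decreasing distortion-rate slopes, and a bisection finds a threshold whose selected truncation points fit the byte budget. The data layout, function names, and iteration count are assumptions made for illustration; this is not the parallel algorithm presented in the paper.

```python
def bytes_for_threshold(blocks, lam):
    """Total bytes when each block keeps its last point with slope >= lam.

    `blocks` is a list of code-blocks; each block is a list of
    (cumulative_bytes, slope) pairs with strictly decreasing slopes.
    """
    total = 0
    for points in blocks:
        chosen = 0
        for length, slope in points:
            if slope >= lam:
                chosen = length
            else:
                break
        total += chosen
    return total


def find_threshold(blocks, budget, iters=50):
    """Bisect for a small slope threshold whose selection fits `budget`."""
    lo = 0.0
    # Above the steepest slope nothing is selected, so `hi` always fits.
    hi = max(slope for pts in blocks for _, slope in pts) + 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bytes_for_threshold(blocks, mid) <= budget:
            hi = mid
        else:
            lo = mid
    return hi
```

The per-block selection inside `bytes_for_threshold` is independent across code-blocks, which is the part that lends itself to parallel execution; the sum over blocks is a reduction.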
A New Strategy to Improve the Performance of PDP-Systems Simulators
One of the major challenges that current P systems simulators
have to deal with is to be as efficient as possible. A P system
is syntactically described as a membrane structure delimiting regions
where multisets of objects evolve by means of evolution rules. Accordingly,
in each computation step, the applicability of the rules to
the current P system configuration must be calculated. In this paper we
extend previous work that uses a Rete-based simulation algorithm in order
to reduce the time consumed during the checking phase in the selection
of rules. A new approach is presented, oriented to the acceleration of
Population Dynamics P Systems simulations.
Ministerio de Economía y Competitividad TIN2012-3743
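The checking phase described in the abstract above amounts to testing multiset inclusion: a rule is applicable in a region when the region holds at least the objects on the rule's left-hand side. A naive version of that check is sketched below, assuming multisets are stored as `collections.Counter` objects and every left-hand side is non-empty. The names are hypothetical, and this is the brute-force baseline that a Rete-based approach aims to improve on, not the paper's algorithm.

```python
from collections import Counter


def max_applications(lhs, region):
    """How many times a rule with left-hand side `lhs` fits into `region`.

    Both arguments are Counters mapping object names to multiplicities;
    `lhs` is assumed to be non-empty.
    """
    return min(region[obj] // count for obj, count in lhs.items())


def applicable_rules(rules, region):
    """Names of the rules that can fire at least once in `region`."""
    return [name for name, lhs in rules.items()
            if max_applications(lhs, region) > 0]
```

This check rescans every rule against the whole region at every step, which is the cost the checking phase has to pay when nothing is cached between steps.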
An Improved GPU Simulator For Spiking Neural P Systems
Spiking Neural P (SNP) systems, variants of P systems (under Membrane and Natural computing), are computing models that draw abstraction and inspiration from the way neurons 'compute' or process information. Similar to other P system variants, SNP systems are Turing complete models that by nature compute non-deterministically and in a maximally parallel manner. P systems usually trade (often exponential) space for (polynomial to constant) time. Due to this nature, P system variants are currently limited to parallel simulations, and several variants have already been simulated on parallel devices. In this paper we present an improved SNP system simulator based on graphics processing units (GPUs). Among other reasons, current GPUs are designed for massively parallel computations, which makes them very suitable for SNP system simulation. The computing model, hardware/software considerations, and simulation algorithm are presented, as well as comparisons of the CPU-only and CPU-GPU-based simulators.
Ministerio de Ciencia e Innovación TIN2009–13192; Junta de Andalucía P08-TIC-0420
Model Driven Evolution of an Agent-Based Home Energy Management System
Advanced smart home appliances and new models of energy tariffs imposed
by energy providers pose new challenges in the automation of home energy
management. Users need an assistant tool that helps them make complex decisions
with different goals, depending on the current situation. Multi-agent systems
have proved to be a suitable technology to develop self-management systems,
able to take the most adequate decision under different context-dependent situations,
such as home energy management. The heterogeneity of home appliances
and the changes in the energy policies of providers make it necessary to
explicitly model this variability. However, multi-agent systems lack mechanisms
to effectively deal with the different degrees of variability required by these kinds
of systems. Software Product Line technologies, including variability models, have
been successfully applied to different domains to explicitly model any kind of variability.
We have defined a software product line development process that performs
a model driven generation of agents embedded in heterogeneous smart objects with
different degrees of self-management. However, once deployed, the home energy
assistant system has to be able to evolve to self-adapt its decision making or devices
to new requirements. So, in this paper we propose a model driven mechanism to
automatically manage the evolution of multi-agent systems distributed among several
devices.
Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
Movies Tags Extraction Using Deep Learning
Retrieving information from movies is becoming increasingly
demanding due to the enormous amount of multimedia
data generated each day. Not only does it help in efficient
search, archiving and classification of movies, but it is also instrumental
in content censorship and recommendation systems.
Extracting key information from a movie and summarizing
it in a few tags which best describe the movie presents
a dedicated challenge and requires an intelligent approach
to automatically analyze the movie. In this paper, we formulate
the movie tag extraction problem as a machine learning
classification problem and train a Convolutional Neural Network
(CNN) on a carefully constructed tag vocabulary. Our
proposed technique first extracts key frames from a movie
and applies the trained classifier on the key frames. The
predictions from the classifier are assigned scores and are
filtered based on their relative strengths to generate a compact
set of most relevant key tags. We performed a rigorous
subjective evaluation of our proposed technique for a
wide variety of movies with different experiments. The evaluation
results presented in this paper demonstrate that our
proposed approach can efficiently extract the key tags of a
movie with good accuracy.
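The scoring and filtering step described in the abstract above could look roughly like the sketch below. It assumes that each key frame yields a list of (tag, confidence) pairs, that a tag's score is the sum of its confidences over all key frames, and that "relative strength" means a fraction of the best score. The function name, the `keep_ratio` parameter, and the scoring rule are all assumptions for illustration; the abstract does not specify them.

```python
from collections import defaultdict


def select_tags(frame_predictions, keep_ratio=0.5):
    """Aggregate per-frame predictions and keep the strongest tags.

    `frame_predictions` holds one list of (tag, confidence) pairs per
    key frame. A tag is kept when its summed confidence reaches
    `keep_ratio` times the best summed confidence (assumed rule).
    """
    scores = defaultdict(float)
    for predictions in frame_predictions:
        for tag, confidence in predictions:
            scores[tag] += confidence
    if not scores:
        return []
    top = max(scores.values())
    kept = [(tag, s) for tag, s in scores.items() if s >= keep_ratio * top]
    # Strongest tags first; ties are broken alphabetically.
    kept.sort(key=lambda item: (-item[1], item[0]))
    return [tag for tag, _ in kept]
```

A relative threshold keeps the tag set compact for movies with one dominant theme while still admitting several tags when the scores are close together.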
Solving Sudoku with Membrane Computing
Sudoku is a very popular puzzle which consists of
placing several numbers in a square grid according to some
simple rules. In this paper we present an efficient family of P
systems which solve sudokus of any order verifying a specific
property. The solution is searched by using a simple human-style
method. If the sudoku cannot be solved by using this strategy, the
P system detects this drawback, and the computation stops
and returns No. Otherwise, the P system encodes the solution
and returns Yes in the last computation step.
Ministerio de Ciencia e Innovación TIN2008-04487-E; Ministerio de Ciencia e Innovación TIN2009–13192; Junta de Andalucía P08-TIC-0420
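The abstract above does not say which human-style method is used. As one plausible reading, the sketch below fills in any cell that has exactly one remaining candidate and repeats until no such cell is left, answering Yes with the completed grid or No if the strategy gets stuck. The choice of strategy, the function name, and the use of 0 for empty cells are assumptions; this is a sequential illustration, not the P system family from the paper.

```python
def solve_by_singles(grid, n):
    """Solve an order-n sudoku (n*n by n*n grid, 0 = empty) using only
    cells that have a single remaining candidate.

    Returns ("Yes", solved_grid) or ("No", None) if the strategy stalls.
    """
    size = n * n
    grid = [row[:] for row in grid]  # work on a copy

    def candidates(r, c):
        used = set(grid[r]) | {grid[i][c] for i in range(size)}
        br, bc = n * (r // n), n * (c // n)
        used |= {grid[i][j]
                 for i in range(br, br + n)
                 for j in range(bc, bc + n)}
        return [v for v in range(1, size + 1) if v not in used]

    progress = True
    while progress:
        progress = False
        for r in range(size):
            for c in range(size):
                if grid[r][c] == 0:
                    cand = candidates(r, c)
                    if len(cand) == 1:
                        grid[r][c] = cand[0]
                        progress = True

    solved = all(v != 0 for row in grid for v in row)
    return ("Yes", grid) if solved else ("No", None)
```

The early No answer mirrors the behaviour described in the abstract: the procedure never guesses, so a puzzle outside the strategy's reach is reported rather than searched exhaustively.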
Spiking Neural P Systems with Structural Plasticity: Attacking the Subset Sum Problem
Spiking neural P systems with structural plasticity (in short,
SNPSP systems) are models of computations inspired by the function and
structure of biological neurons. In SNPSP systems, neurons can create
or delete synapses using plasticity rules. We report two families of solutions,
a non-uniform one and a uniform one, to the NP-complete problem
Subset Sum using SNPSP systems. Instead of the usual rule-level nondeterminism
(choosing which rule to apply) we use synapse-level nondeterminism
(choosing which synapses to create or delete). The nondeterminism
due to plasticity rules yields the following improvements over a
previous solution: in our non-uniform solution, plasticity rules allow
a normal form to be used (i.e. without forgetting rules or rules with
delays, the system is simple, and only synapse-level nondeterminism is used); in our uniform
solution the number of neurons and the number of computation steps are
reduced.
Ministerio de Economía y Competitividad TIN2012-3743
Simulating FRSN P Systems with Real Numbers in P-Lingua on sequential and CUDA platforms
Fuzzy Reasoning Spiking Neural P systems (FRSN P systems,
for short) are a variant of Spiking Neural P systems incorporating
fuzzy logic elements that make them suitable for modeling the fuzzy diagnosis knowledge
and reasoning required for fault diagnosis applications. In this sense,
several FRSN P system variants have been proposed, dealing with real
numbers, trapezoidal numbers, weights, etc. The model incorporating
real numbers was the first to be introduced [13], presenting promising applications
in the field of fault diagnosis of electrical systems. For this variant,
a matrix-based algorithm was provided which, when executed on parallel
computing platforms, fully exploits the model's maximally parallel
capacities. In this paper we introduce a P-Lingua framework extension
to parse and simulate FRSN P systems with real numbers. Two simulators,
implementing a variant of the original matrix-based simulation
algorithm, are provided: a sequential one (written in Java), intended to
run on traditional CPUs, and a parallel one, intended to run on CUDA-enabled
devices.
Ministerio de Economía y Competitividad TIN2012-3743