108 research outputs found

    Sample-Parallel Execution of EBCOT in Fast Mode

    JPEG 2000’s most computationally expensive building block is the Embedded Block Coder with Optimized Truncation (EBCOT). This paper evaluates how encoders targeting a parallel architecture such as a GPU can increase their throughput in use cases with very high data rates. The compression efficiency in the less significant bit-planes is then often poor, and it is beneficial to enable the Selective Arithmetic Coding Bypass style (fast mode) in order to trade a small loss in compression efficiency for a reduction in computational complexity. More importantly, this style exposes a more finely grained parallelism that can be exploited to execute the raw coding passes, including bit-stuffing, in a sample-parallel fashion. For a latency- or memory-critical application that encodes one frame at a time, EBCOT’s tier-1 is sped up by between 1.1x and 2.4x compared to an optimized GPU-based implementation. When low GPU occupancy has already been addressed by encoding multiple frames in parallel, the throughput can still be improved by 5% for high-entropy images and 27% for low-entropy images. Best results are obtained when enabling the fast mode after the fourth significant bit-plane. For most of the test images the compression rate is within 1% of the original.

    Evaluation of GPU/CPU Co-Processing Models for JPEG 2000 Packetization

    With the bottom-line goal of increasing the throughput of a GPU-accelerated JPEG 2000 encoder, this paper evaluates whether the post-compression rate control and packetization routines should be carried out on the CPU or on the GPU. Three co-processing models that differ in how the workload is split between the CPU and GPU are introduced. Both routines are discussed and algorithms for executing them in parallel are presented. Experimental results for compressing a detail-rich UHD sequence to 4 bits/sample indicate speed-ups of 200x for the rate control and 100x for the packetization compared to the single-threaded implementation in the commercial Kakadu library. These two routines executed on the CPU take 4x as long as all remaining coding steps on the GPU and therefore present a bottleneck. Even if the CPU bottleneck could be avoided with multi-threading, it is still beneficial to execute all coding steps on the GPU, as this minimizes the required device-to-host transfer and thereby speeds up the critical path from 17.2 fps to 19.5 fps for 4 bits/sample and to 22.4 fps for 0.16 bits/sample.

    A New Strategy to Improve the Performance of PDP-Systems Simulators

    One of the major challenges that current P systems simulators have to deal with is to be as efficient as possible. A P system is syntactically described as a membrane structure delimiting regions where multisets of objects evolve by means of evolution rules. Accordingly, at each computation step, the applicability of the rules for the current P system configuration must be calculated. In this paper we extend previous works that use a Rete-based simulation algorithm in order to reduce the time consumed by the checking phase of rule selection. A new approach is presented, oriented to the acceleration of Population Dynamics P Systems simulations. Ministerio de Economía y Competitividad TIN2012-3743

    An Improved GPU Simulator For Spiking Neural P Systems

    Spiking Neural P (SNP) systems, variants of P systems (under Membrane and Natural computing), are computing models that acquire abstraction and inspiration from the way neurons 'compute' or process information. Similar to other P system variants, SNP systems are Turing complete models that by nature compute non-deterministically and in a maximally parallel manner. P systems usually trade (often exponential) space for (polynomial to constant) time. Due to this nature, P system variants are currently limited to parallel simulations, and several variants have already been simulated on parallel devices. In this paper we present an improved SNP system simulator based on graphics processing units (GPUs). Among other reasons, current GPUs are designed for massively parallel computations, which makes them very suitable for SNP system simulation. The computing model, hardware/software considerations, and simulation algorithm are presented, as well as comparisons of the CPU-only and CPU-GPU-based simulators. Ministerio de Ciencia e Innovación TIN2009–13192; Junta de Andalucía P08-TIC-0420

    Movies Tags Extraction Using Deep Learning

    Retrieving information from movies is becoming increasingly demanding due to the enormous amount of multimedia data generated each day. Not only does it help in efficient search, archiving and classification of movies, but it is also instrumental in content censorship and recommendation systems. Extracting key information from a movie and summarizing it in a few tags which best describe the movie presents a dedicated challenge and requires an intelligent approach to automatically analyze the movie. In this paper, we formulate the movie tag extraction problem as a machine learning classification problem and train a Convolutional Neural Network (CNN) on a carefully constructed tag vocabulary. Our proposed technique first extracts key frames from a movie and applies the trained classifier to the key frames. The predictions from the classifier are assigned scores and are filtered based on their relative strengths to generate a compact set of the most relevant key tags. We performed a rigorous subjective evaluation of our proposed technique for a wide variety of movies with different experiments. The evaluation results presented in this paper demonstrate that our proposed approach can efficiently extract the key tags of a movie with good accuracy.

    Solving Sudoku with Membrane Computing

    Sudoku is a very popular puzzle which consists of placing several numbers in a square grid according to some simple rules. In this paper we present an efficient family of P systems which solve sudokus of any order verifying a specific property. The solution is searched for by using a simple human-style method. If the sudoku cannot be solved by using this strategy, the P system detects this drawback, and the computation then stops and returns No. Otherwise, the P system encodes the solution and returns Yes in the last computation step. Ministerio de Ciencia e Innovación TIN2008-04487-E; Ministerio de Ciencia e Innovación TIN2009–13192; Junta de Andalucía P08-TIC-0420

    Spiking Neural P Systems with Structural Plasticity: Attacking the Subset Sum Problem

    Spiking neural P systems with structural plasticity (in short, SNPSP systems) are models of computation inspired by the function and structure of biological neurons. In SNPSP systems, neurons can create or delete synapses using plasticity rules. We report two families of solutions, a non-uniform and a uniform one, to the NP-complete problem Subset Sum using SNPSP systems. Instead of the usual rule-level nondeterminism (choosing which rule to apply) we use synapse-level nondeterminism (choosing which synapses to create or delete). The nondeterminism due to plasticity rules yields the following improvements over a previous solution: in our non-uniform solution, plasticity rules allowed for a normal form to be used (i.e. without forgetting rules or rules with delays, system is simple, only synapse-level nondeterminism); in our uniform solution the number of neurons and the computation steps are reduced. Ministerio de Economía y Competitividad TIN2012-3743

    Simulating FRSN P Systems with Real Numbers in P-Lingua on sequential and CUDA platforms

    Fuzzy Reasoning Spiking Neural P systems (FRSN P systems, for short) are a variant of Spiking Neural P systems incorporating fuzzy logic elements that make them suitable for modeling the fuzzy diagnosis knowledge and reasoning required for fault diagnosis applications. In this sense, several FRSN P system variants have been proposed, dealing with real numbers, trapezoidal numbers, weights, etc. The model incorporating real numbers was the first introduced [13], presenting promising applications in the field of fault diagnosis of electrical systems. For this variant, a matrix-based algorithm was provided which, when executed on parallel computing platforms, fully exploits the model's maximally parallel capacities. In this paper we introduce a P-Lingua framework extension to parse and simulate FRSN P systems with real numbers. Two simulators, implementing a variant of the original matrix-based simulation algorithm, are provided: a sequential one (written in Java), intended to run on traditional CPUs, and a parallel one, intended to run on CUDA-enabled devices. Ministerio de Economía y Competitividad TIN2012-3743

    When Matrices Meet Brains

    Spiking neural P systems (SN P systems, for short) are a class of distributed parallel computing devices inspired by the way neurons communicate by means of spikes. In this work, a discrete structure representation of SN P systems is proposed. Specifically, matrices are used to represent SN P systems. In order to represent the computations of SN P systems by matrices, configuration vectors are defined to monitor the number of spikes in each neuron at any given configuration; transition net gain vectors are also introduced to quantify the total amount of spikes consumed and produced after the chosen rules are applied. Nondeterminism of the systems is assured by a set of spiking transition vectors that could be used at any given time during the computation. With such a matrix representation, it is quite convenient to determine the next configuration from a given configuration, since it involves only multiplying a vector by a matrix and adding vectors.
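The update described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical two-neuron system with one rule per neuron; the matrix layout (one row per rule, one column per neuron, negative entries for consumed spikes, positive entries for spikes received) and the names `next_configuration`, `spiking_vector` and `transition_matrix` are illustrative assumptions, not taken from the paper.

```python
def next_configuration(config, spiking_vector, transition_matrix):
    """Return config + spiking_vector * transition_matrix.

    config            -- spikes currently held by each neuron
    spiking_vector    -- 1 if the rule is applied in this step, else 0
    transition_matrix -- rows are rules, columns are neurons (assumed layout)
    """
    # Net gain per neuron: spikes produced minus spikes consumed
    # by the rules selected in spiking_vector.
    net_gain = [
        sum(spiking_vector[i] * transition_matrix[i][j]
            for i in range(len(spiking_vector)))
        for j in range(len(config))
    ]
    # The next configuration is the current one plus the net gain.
    return [c + g for c, g in zip(config, net_gain)]


# Hypothetical system: rule 0 (in neuron 0) consumes 1 spike and sends 1 to neuron 1;
# rule 1 (in neuron 1) consumes 2 spikes and sends 1 to neuron 0.
M = [[-1, 1],
     [1, -2]]
C0 = [1, 2]          # initial spikes in neurons 0 and 1
s = [1, 1]           # both rules fire in this step
C1 = next_configuration(C0, s, M)
print(C1)            # [1, 1]
```

Choosing a different valid spiking vector for the same configuration gives a different successor, which is how the nondeterminism mentioned in the abstract would show up in this representation.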

    The Reduction Problem in CUDA and Its Simulation with P Systems

    We introduce P systems with dynamic communication graphs which simulate the functioning of the CUDA architecture when solving the parallel reduction problem.
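For readers unfamiliar with the parallel reduction problem, the following is a minimal sequential sketch of the usual tree-shaped sum. It is not the paper's P system construction or an actual CUDA kernel; the function name `tree_reduce` is hypothetical, and each iteration of the inner loop stands in for what one GPU thread would do within a step.

```python
def tree_reduce(values):
    """Sum values with a pairwise tree; return (total, number_of_steps)."""
    data = list(values)
    if not data:
        raise ValueError("cannot reduce an empty sequence")
    n = len(data)
    stride = 1
    steps = 0
    while stride < n:
        # All additions in this loop touch disjoint elements, so on a GPU
        # they could run concurrently, one per thread, followed by a barrier.
        for i in range(0, n - stride, 2 * stride):
            data[i] += data[i + stride]
        stride *= 2
        steps += 1
    return data[0], steps


total, steps = tree_reduce([1, 2, 3, 4, 5, 6, 7, 8])
print(total, steps)  # 36 3
```

With n inputs the sketch needs about log2(n) steps rather than n - 1 sequential additions, which is the property a parallel implementation exploits.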