5,340 research outputs found
An Improved GPU Simulator For Spiking Neural P Systems
Spiking Neural P (SNP) systems, variants of Psystems (under Membrane and Natural computing), are computing models that acquire abstraction and inspiration from the way neurons 'compute' or process information. Similar to other P system variants, SNP systems are Turing complete models that by nature compute non-deterministically and in a maximally parallel manner. P systems usually trade (often exponential) space for (polynomial to constant) time. Due to this nature, P system variants are currently limited to parallel simulations, and several variants have already been simulated in parallel devices. In this paper we present an improved SNP system simulator based on graphics processing units (GPUs). Among other reasons, current GPUs are architectured for massively parallel computations, thus making GPUs very suitable for SNP system simulation. The computing model, hardware/software considerations, and simulation algorithm are presented, as well as the comparisons of the CPU only and CPU-GPU based simulators.Ministerio de Ciencia e Innovación TIN2009–13192Junta de Andalucía P08-TIC-0420
Marker-based filtering of bilingual phrase pairs for SMT
State-of-the-art statistical machine translation
systems make use of a large translation table obtained after scoring a set of bilingual phrase pairs automatically extracted from a parallel corpus. The number of bilingual phrase pairs extracted from a pair of aligned sentences grows exponentially as the length of the sentences increases; therefore, the number of entries in the phrase table used to carry out the translation may become unmanageable, especially when online, 'on demand' translation is required in real time. We describe
the use of closed-class words to filter the set of bilingual phrase pairs extracted from the parallel corpus by taking into account the alignment information
and the type of the words involved in the alignments. On four European language pairs, we show that our simple yet novel approach can filter the phrase table by up to
a third yet still provide competitive results compared to the baseline. Furthermore, it provides a nice balance between the unfiltered approach and pruning using stop
words, where the deterioration in translation quality is unacceptably high
High-Efficient Parallel CAVLC Encoders on Heterogeneous Multicore Architectures
This article presents two high-efficient parallel realizations of the context-based adaptive variable length coding (CAVLC) based on heterogeneous multicore processors. By optimizing the architecture of the CAVLC encoder, three kinds of dependences are eliminated or weaken, including the context-based data dependence, the memory accessing dependence and the control dependence. The CAVLC pipeline is divided into three stages: two scans, coding, and lag packing, and be implemented on two typical heterogeneous multicore architectures. One is a block-based SIMD parallel CAVLC encoder on multicore stream processor STORM. The other is a component-oriented SIMT parallel encoder on massively parallel architecture GPU. Both of them exploited rich data-level parallelism. Experiments results show that compared with the CPU version, more than 70 times of speedup can be obtained for STORM and over 50 times for GPU. The implementation of encoder on STORM can make a real-time processing for 1080p @30fps and GPU-based version can satisfy the requirements for 720p real-time encoding. The throughput of the presented CAVLC encoders is more than 10 times higher than that of published software encoders on DSP and multicore platforms
A Continuously Growing Dataset of Sentential Paraphrases
A major challenge in paraphrase research is the lack of parallel corpora. In
this paper, we present a new method to collect large-scale sentential
paraphrases from Twitter by linking tweets through shared URLs. The main
advantage of our method is its simplicity, as it gets rid of the classifier or
human in the loop needed to select data before annotation and subsequent
application of paraphrase identification algorithms in the previous work. We
present the largest human-labeled paraphrase corpus to date of 51,524 sentence
pairs and the first cross-domain benchmarking for automatic paraphrase
identification. In addition, we show that more than 30,000 new sentential
paraphrases can be easily and continuously captured every month at ~70%
precision, and demonstrate their utility for downstream NLP tasks through
phrasal paraphrase extraction. We make our code and data freely available.Comment: 11 pages, accepted to EMNLP 201
Analysis of P systems simulation on CUDA
GPUs (Graphics Processing Unit) have been con-
solidated as a massively data-parallel coprocessor to
develop many general purpose computations, and en-
able developers to utilize several levels of parallelism
to obtain better performance of their applications.
The massively parallel nature of certain computa-
tions leads to use GPUs as an underlying architec-
ture, becoming a good alternative to other paral-
lel approaches. P systems or membrane systems
are theoretical devices inspired in the way that liv-
ing cells work, providing computational models and
a high level computational modeling framework for
biological systems. They are massively parallel dis-
tributed, and non-deterministic systems. In this pa-
per, we evaluate the GPU as the underlying archi-
tecture to simulate the class of recognizer P systems
with active membranes. We analyze the performance
of three simulators implemented on CPU, CPU-GPU
and GPU respectively. We compare them using a pre-
sented P system as a benchmark, showing that the
GPU is better suited than the CPU to simulate those
P systems due to its massively parallel nature.Ministerio de Educación y Ciencia TIN2006-13425Junta de Andalucía P08–TIC0420
Implementing P Systems Parallelism by Means of GPUs
Software development for Membrane Computing is growing
up yielding new applications. Nowadays, the efficiency of P systems simulators
have become a critical point when working with instances of large
size. The newest generation of GPUs (Graphics Processing Units) provide
a massively parallel framework to compute general purpose computations.
We present GPUs as an alternative to obtain better performance
in the simulation of P systems and we illustrate it by giving a solution
to the N-Queens problem as an example.Ministerio de Educación y Ciencia TIN2006-13425Junta de Andalucía P08–TIC-0420
Relay: A New IR for Machine Learning Frameworks
Machine learning powers diverse services in industry including search,
translation, recommendation systems, and security. The scale and importance of
these models require that they be efficient, expressive, and portable across an
array of heterogeneous hardware devices. These constraints are often at odds;
in order to better accommodate them we propose a new high-level intermediate
representation (IR) called Relay. Relay is being designed as a
purely-functional, statically-typed language with the goal of balancing
efficient compilation, expressiveness, and portability. We discuss the goals of
Relay and highlight its important design constraints. Our prototype is part of
the open source NNVM compiler framework, which powers Amazon's deep learning
framework MxNet
- …