253,595 research outputs found
Simulating Spiking Neural P systems without delays using GPUs
We present in this paper our work regarding simulating a type of P system
known as a spiking neural P system (SNP system) using graphics processing units
(GPUs). GPUs, because of their architectural optimization for parallel
computations, are well-suited for highly parallelizable problems. Due to the
advent of general purpose GPU computing in recent years, GPUs are not limited
to graphics and video processing alone, but include computationally intensive
scientific and mathematical applications as well. Moreover P systems, including
SNP systems, are inherently and maximally parallel computing models whose
inspirations are taken from the functioning and dynamics of a living cell. In
particular, SNP systems try to give a modest but formal representation of a
special type of cell known as the neuron and their interactions with one
another. The nature of SNP systems allowed their representation as matrices,
which is a crucial step in simulating them on highly parallel devices such as
GPUs. The highly parallel nature of SNP systems necessitate the use of hardware
intended for parallel computations. The simulation algorithms, design
considerations, and implementation are presented. Finally, simulation results,
observations, and analyses using an SNP system that generates all numbers in
- {1} are discussed, as well as recommendations for future work.Comment: 19 pages in total, 4 figures, listings/algorithms, submitted at the
9th Brainstorming Week in Membrane Computing, University of Seville, Spai
Adaptive-optics Optical Coherence Tomography Processing Using a Graphics Processing Unit
Graphics processing units are increasingly being used for scientific computing for their powerful parallel processing abilities, and moderate price compared to super computers and computing grids. In this paper we have used a general purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving the super high resolution technology closer to clinical viability
JANUS: an FPGA-based System for High Performance Scientific Computing
This paper describes JANUS, a modular massively parallel and reconfigurable
FPGA-based computing system. Each JANUS module has a computational core and a
host. The computational core is a 4x4 array of FPGA-based processing elements
with nearest-neighbor data links. Processors are also directly connected to an
I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for,
but not limited to, the requirements of a class of hard scientific applications
characterized by regular code structure, unconventional data manipulation
instructions and not too large data-base size. We discuss the architecture of
this configurable machine, and focus on its use on Monte Carlo simulations of
statistical mechanics. On this class of application JANUS achieves impressive
performances: in some cases one JANUS processing element outperfoms high-end
PCs by a factor ~ 1000. We also discuss the role of JANUS on other classes of
scientific applications.Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted
to Computing in Science & Engineerin
Implement of a high-performance computing system for parallel processing of scientific applications and the teaching of multicore and parallel programming
[EN] Increasingly complex algorithms for the modeling and resolution of different problems, which are currently facing humanity, has made it necessary the advent of new data processing requirements and the consequent implementation of high performance computing systems; but due to the high economic cost of this type of equipment and considering that an education institution cannot acquire, it is necessary to develop and implement computable architectures that are economical and scalable in their construction, such as heterogeneous distributed computing systems, constituted by several clustering of multicore processing elements with shared and distributed memory systems. This paper presents the analysis, design and implementation of a high-performance computing system called Liebres InTELigentes, whose purpose is the design and execution of intrinsically parallel algorithms, which require high amounts of storage and excessive processing times. The proposed computer system is constituted by conventional computing equipment (desktop computers, lap top equipment and servers), linked by a high-speed network. The main objective of this research is to build technology for the purposes of scientific and educational research.This project is sponsored by Tecnologico Nacional de México TecNM. 2018-2 110Velarde Martinez, A. (2019). Implement of a high-performance computing system for parallel processing of scientific applications and the teaching of multicore and parallel programming. En INNODOCT/18. International Conference on Innovation, Documentation and Education. Editorial Universitat Politècnica de València. 203-213. https://doi.org/10.4995/INN2018.2018.8908OCS20321
Significantly reducing the processing times of high-speed photometry data sets using a distributed computing model
The scientific community is in the midst of a data analysis crisis. The increasing capacity of scientific CCD instrumentation and their falling costs is contributing to an explosive generation of raw photometric data. This data must go through a process of cleaning and reduction before it can be used for high precision photometric analysis. Many existing data processing pipelines either assume a relatively small dataset or are batch processed by a High Performance Computing centre. A radical overhaul of these processing pipelines is required to allow reduction and cleaning rates to process terabyte sized datasets at near capture rates using an elastic processing architecture. The ability to access computing resources and to allow them to grow and shrink as demand fluctuates is essential, as is exploiting the parallel nature of the datasets. A distributed data processing pipeline is required. It should incorporate lossless data compression, allow for data segmentation and support processing of data segments in parallel. Academic institutes can collaborate and provide an elastic computing model without the requirement for large centralized high performance computing data centers. This paper demonstrates how a base 10 order of magnitude improvement in overall processing time has been achieved using the ACN pipeline , a distributed pipeline spanning multiple academic institutes
Performance of scientific processing in networks of workstations
The growing processing power of standard workstations, along with the relatively easy way in which they can be available for parallel processing, have both contributed to their increasing use in computation intensive application areas. Usually, computation intensive areas have been referred to as scientific processing; one of them being linear algebra, where a great effort has been made to optimize solution methods for serial as well as for parallel computing.\nSince the appearance of software libraries for parallel environments such as PVM (Parallel Virtual Machine) [4] and implementations of MPI (Message Passing Interface) [5], the distributed processing power of networks of workstations has been available for parallel processing as well.\nAlso, a strong emphasis has been made on the heterogeneous computing facility provided by these libraries over networks of workstations. However, there is a lack of published results on the performance obtained on this kind of parallel (more specifically distributed) processing architectures.\nFrom the whole area of linear algebra applications, the most challenging (in terms of performance) operations to be solved are the so called Level 3 BLAS (Basic Linear Algebra Subprograms). In Level 3 BLAS, all of the processing can be expressed (and solved) in terms of matrix-matrix operations. Even more specifically, the most studied operation has been matrix multiplication, which is in fact a benchmark in this application area.Eje: Procesamiento Concurrente, paralelo y distribuido. Rede
- …