
    Simulating Spiking Neural P systems without delays using GPUs

    In this paper we present our work on simulating a type of P system known as a spiking neural P system (SNP system) using graphics processing units (GPUs). GPUs, because of their architectural optimization for parallel computation, are well suited to highly parallelizable problems. With the advent of general-purpose GPU computing in recent years, GPUs are no longer limited to graphics and video processing; they are also used for computationally intensive scientific and mathematical applications. Moreover, P systems, including SNP systems, are inherently and maximally parallel computing models inspired by the functioning and dynamics of a living cell. In particular, SNP systems give a modest but formal representation of a special type of cell, the neuron, and of the interactions of neurons with one another. The nature of SNP systems allows them to be represented as matrices, which is a crucial step in simulating them on highly parallel devices such as GPUs; this highly parallel nature makes hardware intended for parallel computation a natural fit. The simulation algorithms, design considerations, and implementation are presented. Finally, simulation results, observations, and analyses using an SNP system that generates all numbers in $\mathbb{N} - \{1\}$ are discussed, as well as recommendations for future work.
    Comment: 19 pages in total, 4 figures, listings/algorithms; submitted to the 9th Brainstorming Week on Membrane Computing, University of Seville, Spain
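    The matrix representation mentioned in the abstract admits a compact simulation step. Below is a minimal NumPy sketch of that idea, assuming the common formulation in which a configuration vector is updated as C_{k+1} = C_k + S_k · M for a spiking vector S_k and transition matrix M; the names and the example system are illustrative, not taken from the paper.

```python
# Hedged sketch: matrix-based simulation step for an SNP system without
# delays. The transition matrix M has one row per rule and one column per
# neuron; negative entries are spikes consumed, positive entries spikes sent.
import numpy as np

M = np.array([
    [-1,  1, 1],  # rule r1 in neuron 1: consume 1 spike, send 1 to neurons 2 and 3
    [-2,  1, 0],  # rule r2 in neuron 1: consume 2 spikes, send 1 to neuron 2
    [ 0, -1, 1],  # rule r3 in neuron 2: consume 1 spike, send 1 to neuron 3
], dtype=np.int64)

config = np.array([2, 1, 0], dtype=np.int64)   # C_k: spikes per neuron
spiking = np.array([0, 1, 1], dtype=np.int64)  # S_k: rules r2 and r3 fire

# One synchronous step: C_{k+1} = C_k + S_k * M.
config_next = config + spiking @ M
print(config_next)  # [0 1 1]
```

    On a GPU, each column (neuron) of the product can be computed independently, which is what makes the matrix form attractive for CUDA-style kernels.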

    Adaptive-optics Optical Coherence Tomography Processing Using a Graphics Processing Unit

    Graphics processing units are increasingly used for scientific computing because of their powerful parallel processing abilities and their moderate price compared to supercomputers and computing grids. In this paper we use a general-purpose graphics processing unit to process adaptive-optics optical coherence tomography (AOOCT) images in real time. Increasing the processing speed of AOOCT is an essential step in moving this super-high-resolution technology closer to clinical viability.
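    The abstract does not detail the processing chain, so the following is only a hedged NumPy sketch of a typical Fourier-domain OCT reconstruction (background subtraction, windowing, FFT along the spectral axis); the function name and shapes are illustrative, not the authors' implementation. Each A-scan is independent, which is the property that maps well onto a GPU.

```python
# Hedged sketch of a common Fourier-domain OCT reconstruction pipeline.
import numpy as np

def reconstruct_ascans(spectra: np.ndarray) -> np.ndarray:
    """spectra: (n_ascans, n_samples) raw spectrometer fringes."""
    # Remove the DC/background term (mean spectrum over all A-scans).
    fringes = spectra - spectra.mean(axis=0, keepdims=True)
    # Apodize to suppress FFT side lobes.
    fringes *= np.hanning(spectra.shape[1])
    # FFT along the wavenumber axis yields depth profiles.
    depth = np.fft.fft(fringes, axis=1)
    # Keep the positive-frequency half, log-scaled for display.
    half = spectra.shape[1] // 2
    return 20.0 * np.log10(np.abs(depth[:, :half]) + 1e-12)

image = reconstruct_ascans(np.random.rand(512, 1024))
print(image.shape)  # (512, 512)
```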

    JANUS: an FPGA-based System for High Performance Scientific Computing

    This paper describes JANUS, a modular, massively parallel, and reconfigurable FPGA-based computing system. Each JANUS module has a computational core and a host. The computational core is a 4x4 array of FPGA-based processing elements with nearest-neighbor data links. Processors are also directly connected to an I/O node attached to the JANUS host, a conventional PC. JANUS is tailored for, but not limited to, the requirements of a class of hard scientific applications characterized by regular code structure, unconventional data manipulation instructions, and a not-too-large database size. We discuss the architecture of this configurable machine and focus on its use for Monte Carlo simulations of statistical mechanics. On this class of applications JANUS achieves impressive performance: in some cases one JANUS processing element outperforms a high-end PC by a factor of ~1000. We also discuss the role of JANUS in other classes of scientific applications.
    Comment: 11 pages, 6 figures. Improved version, largely rewritten, submitted to Computing in Science & Engineering
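    The workload named in the abstract, Monte Carlo simulation of statistical mechanics, is dominated by single-spin Metropolis updates. The sketch below shows one such sweep for a 2D Ising lattice in Python; it illustrates the class of computation, not JANUS's FPGA implementation, and the lattice size and temperature are arbitrary.

```python
# Hedged sketch: one Metropolis sweep of a 2D Ising model, the kind of
# update an FPGA machine like JANUS performs massively in parallel.
import numpy as np

rng = np.random.default_rng(0)
L, beta = 32, 0.44                        # lattice side, inverse temperature
spins = rng.choice([-1, 1], size=(L, L))

def metropolis_sweep(spins, beta):
    for i in range(L):
        for j in range(L):
            # Sum of the four nearest neighbors (periodic boundaries).
            nn = (spins[(i + 1) % L, j] + spins[(i - 1) % L, j] +
                  spins[i, (j + 1) % L] + spins[i, (j - 1) % L])
            dE = 2 * spins[i, j] * nn     # energy cost of flipping (i, j)
            if dE <= 0 or rng.random() < np.exp(-beta * dE):
                spins[i, j] *= -1
    return spins

spins = metropolis_sweep(spins, beta)
print(spins.mean())                       # magnetization per spin
```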

    Implementation of a high-performance computing system for parallel processing of scientific applications and the teaching of multicore and parallel programming

    [EN] Increasingly complex algorithms for modeling and solving the problems currently facing humanity have created new data processing requirements and, consequently, the need to implement high-performance computing systems. Because such equipment is expensive and often beyond the means of an educational institution, it is necessary to develop and implement computing architectures that are economical and scalable to build, such as heterogeneous distributed computing systems made up of several clusters of multicore processing elements with shared and distributed memory. This paper presents the analysis, design, and implementation of a high-performance computing system called Liebres InTELigentes, whose purpose is the design and execution of intrinsically parallel algorithms that require large amounts of storage and long processing times. The proposed system consists of conventional computing equipment (desktop computers, laptops, and servers) linked by a high-speed network. The main objective of this research is to build technology for scientific and educational research.
    This project is sponsored by Tecnológico Nacional de México (TecNM), project 2018-2 110.
    Velarde Martinez, A. (2019). Implementation of a high-performance computing system for parallel processing of scientific applications and the teaching of multicore and parallel programming. In INNODOCT/18, International Conference on Innovation, Documentation and Education. Editorial Universitat Politècnica de València, 203-213. https://doi.org/10.4995/INN2018.2018.8908
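    As an illustration of the kind of exercise such a commodity cluster supports, here is a minimal mpi4py sketch, assuming mpi4py and a working MPI installation; it is not code from the paper. Each rank sums a strided slice of 1..N and rank 0 reduces the partial sums.

```python
# Hedged teaching example: distributed summation with MPI.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

N = 10_000_000
# Each rank sums its own strided slice of 1..N.
partial = sum(range(rank + 1, N + 1, size))
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum(1..{N}) = {total}")  # expect N * (N + 1) // 2

# Run with, e.g.: mpiexec -n 4 python sum_mpi.py
```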

    Significantly reducing the processing times of high-speed photometry data sets using a distributed computing model

    The scientific community is in the midst of a data analysis crisis. The increasing capacity of scientific CCD instrumentation and its falling cost are contributing to an explosive growth of raw photometric data. This data must go through a process of cleaning and reduction before it can be used for high-precision photometric analysis. Many existing data processing pipelines either assume a relatively small dataset or are batch processed by a high-performance computing centre. A radical overhaul of these processing pipelines is required so that terabyte-sized datasets can be reduced and cleaned at near-capture rates on an elastic processing architecture. The ability to access computing resources and to let them grow and shrink as demand fluctuates is essential, as is exploiting the parallel nature of the datasets. A distributed data processing pipeline is required. It should incorporate lossless data compression, allow for data segmentation, and support processing of data segments in parallel. Academic institutes can collaborate to provide an elastic computing model without the need for large centralized high-performance computing data centres. This paper demonstrates how an order-of-magnitude improvement in overall processing time has been achieved using the ACN pipeline, a distributed pipeline spanning multiple academic institutes.
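    The ACN pipeline itself is not reproduced here; the sketch below only illustrates its three stated ingredients, lossless compression, data segmentation, and parallel processing of segments, using Python's standard library. The segment contents and the reduction step are placeholders.

```python
# Hedged sketch: compress data segments losslessly, then process them in
# parallel. Workers here are local processes; in an elastic model they
# could equally be machines at collaborating institutes.
import gzip
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def reduce_segment(blob: bytes) -> float:
    """Decompress one segment and run a stand-in reduction step."""
    frame = np.frombuffer(gzip.decompress(blob), dtype=np.float32)
    return float(frame.mean())  # placeholder for bias/flat/photometry steps

if __name__ == "__main__":
    # Segment the raw data and compress each segment losslessly.
    raw = np.random.rand(8, 250_000).astype(np.float32)
    segments = [gzip.compress(row.tobytes()) for row in raw]

    # Process the segments in parallel across available cores.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(reduce_segment, segments))
    print(results)
```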

    Performance of scientific processing in networks of workstations

    The growing processing power of standard workstations, along with the relative ease with which they can be made available for parallel processing, has contributed to their increasing use in computation-intensive application areas. Computation-intensive areas are usually referred to as scientific processing; one of them is linear algebra, where great effort has been made to optimize solution methods for serial as well as parallel computing.
    Since the appearance of software libraries for parallel environments such as PVM (Parallel Virtual Machine) [4] and implementations of MPI (Message Passing Interface) [5], the distributed processing power of networks of workstations has been available for parallel processing as well. A strong emphasis has also been placed on the heterogeneous computing facility these libraries provide over networks of workstations. However, there is a lack of published results on the performance obtained on this kind of parallel (more specifically, distributed) processing architecture.
    Within the area of linear algebra applications, the most challenging operations in terms of performance are the so-called Level 3 BLAS (Basic Linear Algebra Subprograms), in which all of the processing can be expressed (and solved) in terms of matrix-matrix operations. The most studied such operation is matrix multiplication, which is in fact a benchmark in this application area.
    Track: Concurrent, parallel and distributed processing. Networks
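    Since the abstract singles out matrix multiplication over message-passing libraries as the benchmark, here is a hedged mpi4py sketch of the classic row-block decomposition, assuming mpi4py is available and that the number of ranks divides the matrix size; it is illustrative, not the paper's code.

```python
# Hedged sketch: distributed matrix multiplication C = A @ B with row-block
# decomposition. Each rank multiplies its block of rows of A by all of B.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n = 512                   # assume size divides n for brevity
rows = n // size

A = np.random.rand(n, n) if rank == 0 else None
B = np.random.rand(n, n) if rank == 0 else np.empty((n, n))
A_block = np.empty((rows, n))

comm.Scatter(A, A_block, root=0)  # hand each rank its row block of A
comm.Bcast(B, root=0)             # every rank needs all of B

C_block = A_block @ B             # local matrix-matrix (GEMM-like) product

C = np.empty((n, n)) if rank == 0 else None
comm.Gather(C_block, C, root=0)
if rank == 0:
    print("C:", C.shape)          # run with: mpiexec -n 4 python gemm_mpi.py
```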