
    Angpow: a software for the fast computation of accurate tomographic power spectra

    The statistical distribution of galaxies is a powerful probe to constrain cosmological models and gravity. In particular, the matter power spectrum P(k) carries information about both the cosmological distance evolution and galaxy clustering. However, building P(k) from galaxy catalogues requires a cosmological model to convert angles on the sky and redshifts into distances, which leads to difficulties when comparing data with the P(k) predicted by other cosmological models, and for photometric surveys like LSST. The angular power spectrum C_ℓ(z1, z2) between two bins located at redshifts z1 and z2 contains the same information as the matter power spectrum and is free from any cosmological assumption, but predicting C_ℓ(z1, z2) from P(k) is a costly computation when performed exactly. The Angpow software aims at computing the auto (z1 = z2) and cross (z1 ≠ z2) angular power spectra between redshift bins quickly and accurately. We describe the algorithm, based on expansions over the Chebyshev polynomial basis and on the Clenshaw-Curtis quadrature method. We validate the results against other codes and benchmark the performance. Angpow is flexible and can handle any user-defined power spectrum, transfer function, and redshift selection window. The code is fast enough to be embedded inside programs exploring large cosmological parameter spaces through the comparison of C_ℓ(z1, z2) with data. We emphasize that Limber's approximation, often used to speed up the computation, gives wrong C_ℓ values for cross-correlations. Comment: Published in Astronomy & Astrophysics
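
    The core numerical ingredient named in the abstract, Clenshaw-Curtis quadrature over Chebyshev nodes, can be illustrated with a short self-contained sketch. This is not Angpow's implementation (Angpow is a C++ code handling the full spherical-Bessel integrands); it only shows the quadrature rule itself, using NumPy.

```python
import numpy as np

def clenshaw_curtis(f, n):
    """Integrate f over [-1, 1] using n+1 Chebyshev-Lobatto nodes."""
    k = np.arange(n + 1)
    x = np.cos(np.pi * k / n)            # Chebyshev extreme points
    fx = f(x)
    # Chebyshev coefficients via the mirrored-FFT (DCT-I) trick
    v = np.concatenate([fx, fx[-2:0:-1]])
    c = np.real(np.fft.fft(v)) / n
    c[0] /= 2.0
    c[n] /= 2.0
    # Integral of T_k over [-1, 1] is 2 / (1 - k^2) for even k, 0 for odd k
    w = np.zeros(n + 1)
    w[::2] = 2.0 / (1.0 - k[::2] ** 2)
    return np.dot(w, c[: n + 1])

# Example: integral of exp(x) over [-1, 1] = e - 1/e ≈ 2.3504
print(clenshaw_curtis(np.exp, 16))
```

    The weights come from the exact integrals of the Chebyshev polynomials, which is why the rule converges rapidly for smooth integrands.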

    Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators

    We propose a distributed system based on low-power embedded FPGAs designed for edge computing applications, focused on exploring distributed scheduling optimizations for Deep Learning (DL) workloads to obtain the best performance in terms of latency and power efficiency. The cluster is modular: our implementations consist of up to 12 Zynq-7020-based boards and 5 UltraScale+ MPSoC FPGA boards connected through an Ethernet switch, and the cluster is used to evaluate the Versatile Tensor Accelerator (VTA), a configurable Deep Learning Accelerator (DLA). This adaptable distributed architecture is distinguished by its capacity to evaluate and manage neural network workloads in numerous configurations, which enables users to conduct multiple experiments tailored to their specific application needs. The proposed system can simultaneously execute diverse Neural Network (NN) models, arrange the computation graph in a pipeline structure, and manually allocate greater resources to the most computationally intensive layers of the NN graph. Comment: 4 pages of content, 1 page of references, 4 figures, 1 table. Conference paper (IEEE International Conference on Electro Information Technology (eit2023) at Lewis University in Romeoville, IL).
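
    The scheduling idea described in the abstract, splitting a network's computation graph into pipeline stages and giving heavier layers more resources, can be illustrated with a toy partitioner. The function and the per-layer cost numbers below are hypothetical and not part of the paper's framework; they only show one simple way to balance contiguous layer groups across boards.

```python
# Illustrative only: split a layer list into pipeline stages so that each
# stage's estimated cost (e.g. MAC count) is roughly balanced across boards.
def partition_pipeline(layer_costs, num_boards):
    """Greedy contiguous partition of layers into `num_boards` stages."""
    total = sum(layer_costs)
    target = total / num_boards
    stages, current, acc = [], [], 0.0
    for i, cost in enumerate(layer_costs):
        current.append(i)
        acc += cost
        remaining_layers = len(layer_costs) - i - 1
        remaining_boards = num_boards - len(stages) - 1
        # close the stage once the per-board budget is reached,
        # while keeping at least one layer for every remaining board
        if (acc >= target and remaining_boards > 0) or remaining_layers == remaining_boards:
            stages.append(current)
            current, acc = [], 0.0
    if current:
        stages.append(current)
    return stages

# Hypothetical per-layer MAC counts for a small CNN
costs = [2.0, 8.0, 8.0, 4.0, 1.0]
print(partition_pipeline(costs, 3))   # -> [[0, 1], [2], [3, 4]]
```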

    Finite Computational Structures and Implementations

    What is computable with limited resources? How can we verify the correctness of computations? How can we measure computational power with precision? Despite the immense scientific and engineering progress in computing, we still have only partial answers to these questions. In order to make these problems more precise, we describe an abstract algebraic definition of classical computation, generalizing traditional models to semigroups. The mathematical abstraction also allows the investigation of different computing paradigms (e.g. cellular automata, reversible computing) within the same framework. Here we summarize the main questions and recent results of research on finite computation. Comment: 12 pages, 3 figures, to be presented at CANDAR'16, with the final version published by the IEEE Computer Society
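
    The semigroup view of computation mentioned in the abstract can be made concrete with a small sketch: transformations of a finite set, composed associatively, form a semigroup, and a set of generator transformations yields a finite semigroup by closure under composition. The code below is an illustration of that general idea, not material from the paper.

```python
# Minimal sketch (not from the paper): transformations of {0, ..., n-1}
# represented as tuples, with composition as the semigroup operation.
def compose(f, g):
    """Return the transformation 'apply g, then f' on the same finite set."""
    return tuple(f[g[x]] for x in range(len(g)))

def generated_semigroup(generators):
    """Close a set of transformations under composition (naive fixed point)."""
    elems = set(generators)
    frontier = set(generators)
    while frontier:
        new = {compose(a, b) for a in elems for b in frontier} | \
              {compose(b, a) for a in elems for b in frontier}
        frontier = new - elems
        elems |= frontier
    return elems

# Two generators on a 3-element set: a cyclic shift and a collapsing map
shift = (1, 2, 0)
collapse = (0, 0, 2)
S = generated_semigroup([shift, collapse])
print(len(S))   # number of distinct transformations generated
```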

    Approximate FPGA-based LSTMs under Computation Time Constraints

    Recurrent Neural Networks, and in particular Long Short-Term Memory (LSTM) networks, have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks. However, the models are becoming increasingly demanding in terms of computational and memory load. Emerging latency-sensitive applications, including mobile robots and autonomous vehicles, often operate under stringent computation time constraints. In this paper, we address the challenge of deploying computationally demanding LSTMs under a constrained time budget by introducing an approximate computing scheme that combines iterative low-rank compression and pruning, along with a novel FPGA-based LSTM architecture. Combined in an end-to-end framework, the approximation method's parameters are optimised and the architecture is configured to address the problem of high-performance LSTM execution in time-constrained applications. Quantitative evaluation on a real-life image captioning application indicates that the proposed methods require up to 6.5x less time to achieve the same application-level accuracy compared to a baseline method, while achieving an average of 25x higher accuracy under the same computation time constraints. Comment: Accepted at the 14th International Symposium on Applied Reconfigurable Computing (ARC) 2018
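
    The two approximation ingredients named in the abstract, low-rank compression and pruning, can be illustrated on a generic weight matrix. The sketch below uses a one-shot truncated SVD followed by magnitude pruning of the factors; the paper's scheme is iterative and co-optimised with the FPGA architecture, so treat this purely as an illustration of the building blocks.

```python
import numpy as np

def low_rank_and_prune(W, rank, sparsity):
    """Approximate W with a rank-`rank` factorisation, then zero out the
    smallest-magnitude entries of the factors (illustrative only)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]          # shape (m, rank)
    B = Vt[:rank, :]                    # shape (rank, n)
    for M in (A, B):
        thresh = np.quantile(np.abs(M), sparsity)
        M[np.abs(M) < thresh] = 0.0     # magnitude pruning in place
    return A, B

# Toy example on a random "gate" weight matrix
rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))
A, B = low_rank_and_prune(W, rank=32, sparsity=0.5)
err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"relative reconstruction error: {err:.3f}")
```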

    Simulation of Rapidly-Exploring Random Trees in Membrane Computing with P-Lingua and Automatic Programming

    Methods based on Rapidly-exploring Random Trees (RRTs) have been widely used in robotics to solve motion planning problems. On the other hand, in the membrane computing framework, models based on Enzymatic Numerical P systems (ENPS) have been applied to robot controllers, but there is currently a lack of planning algorithms based on membrane computing for robotics. With this motivation, we provide a variant of ENPS called Random Enzymatic Numerical P systems with Proteins and Shared Memory (RENPSM), addressed at implementing RRT algorithms, and we illustrate it by simulating the bidirectional RRT algorithm. This paper is an extension of [21]. The software presented in [21] was an ad-hoc simulator, i.e., a tool for simulating computations of one and only one model that has been hard-coded. The main contribution of this paper with respect to [21] is the introduction of a novel solution for membrane computing simulators based on automatic programming. First, we have extended the P-Lingua syntax (a language to define membrane computing models) to write RENPSM models. Second, we have implemented a new parser based on Flex and Bison to read RENPSM models and produce C source code for multicore processors with OpenMP. Finally, additional experiments are presented. Ministerio de Economía, Industria y Competitividad TIN2017-89842-
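
    For readers unfamiliar with RRTs, the planning algorithm that the RENPSM models simulate can be summarised in a few lines. The sketch below is a plain single-tree RRT in an obstacle-free unit square; the paper simulates the bidirectional variant, which grows a second tree from the goal and tries to connect the two. It is illustrative only and unrelated to the P-Lingua/OpenMP simulator.

```python
import random, math

def rrt(start, goal, iters=2000, step=0.05, goal_tol=0.05):
    """Minimal RRT in the unit square with no obstacles (illustration only)."""
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        # sample a random point, occasionally biased towards the goal
        q = goal if random.random() < 0.1 else (random.random(), random.random())
        # nearest existing node
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], q))
        # steer a fixed step from the nearest node towards the sample
        d = math.dist(nodes[i], q)
        t = min(1.0, step / d) if d > 0 else 0.0
        new = (nodes[i][0] + t * (q[0] - nodes[i][0]),
               nodes[i][1] + t * (q[1] - nodes[i][1]))
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < goal_tol:
            # reconstruct the path back to the start
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return list(reversed(path))
    return None

print(rrt((0.1, 0.1), (0.9, 0.9)))
```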