Angpow: a software for the fast computation of accurate tomographic power spectra
The statistical distribution of galaxies is a powerful probe to constrain
cosmological models and gravity. In particular, the matter power spectrum $P(k)$
brings together information about the cosmological distance evolution and
galaxy clustering. However, building $P(k)$ from galaxy catalogues
requires a cosmological model to convert angles on the sky and redshifts into
distances, which leads to difficulties when comparing data with $P(k)$ predicted
from other cosmological models, and for photometric surveys like LSST.
The angular power spectrum $C_\ell(z_1, z_2)$ between two bins located at
redshifts $z_1$ and $z_2$ contains the same information as the matter power
spectrum and is free from any cosmological assumption, but the prediction of
$C_\ell(z_1, z_2)$ from $P(k)$ is a costly computation when performed exactly.
The Angpow software aims at computing quickly and accurately the auto
($z_1 = z_2$) and cross ($z_1 \neq z_2$) angular power spectra between redshift
bins. We describe the developed algorithm, based on expansions on the
Chebyshev polynomial basis and on the Clenshaw-Curtis quadrature method. We
validate the results with other codes, and benchmark the performance. Angpow is
flexible and can handle any user defined power spectra, transfer functions, and
redshift selection windows. The code is fast enough to be embedded inside
programs exploring large cosmological parameter spaces through the
comparison of angular power spectra with data. We emphasize that the Limber
approximation, often used to speed up the computation, gives wrong
angular power spectrum values for cross-correlations.
Comment: Published in Astronomy & Astrophysics
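As a minimal illustration of the Clenshaw-Curtis quadrature method named in the abstract above, the following Python sketch builds the rule on Chebyshev nodes and applies it to a smooth integrand. It is a generic textbook version written for this listing, not Angpow's implementation; the function names `clenshaw_curtis_weights` and `integrate` are hypothetical.

```python
import math


def clenshaw_curtis_weights(n):
    """Nodes and weights of the (n+1)-point Clenshaw-Curtis rule on [-1, 1].

    Assumes n is an even integer >= 2 (the closed-form weights below are
    the even-n formula).
    """
    if n < 2 or n % 2:
        raise ValueError("n must be an even integer >= 2")
    nodes, weights = [], []
    for j in range(n + 1):
        theta = j * math.pi / n
        # Sum over the even cosine modes; the last mode carries half weight.
        s = 0.0
        for k in range(1, n // 2 + 1):
            b = 1.0 if k == n // 2 else 2.0
            s += b / (4 * k * k - 1) * math.cos(2 * k * theta)
        c = 1.0 if j in (0, n) else 2.0  # endpoints count once
        nodes.append(math.cos(theta))    # Chebyshev extrema
        weights.append(c / n * (1.0 - s))
    return nodes, weights


def integrate(f, a, b, n=16):
    """Approximate the integral of f over [a, b] with an (n+1)-point rule."""
    nodes, weights = clenshaw_curtis_weights(n)
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    # Map each node from [-1, 1] to [a, b] and rescale by the half-width.
    return half * sum(w * f(mid + half * x) for x, w in zip(nodes, weights))


if __name__ == "__main__":
    print(integrate(math.cos, 0.0, math.pi / 2))  # close to 1
```

For smooth integrands the error of this rule falls off rapidly as `n` grows, which is the general reason such rules are attractive for oscillatory but smooth integrals.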
Reconfigurable Distributed FPGA Cluster Design for Deep Learning Accelerators
We propose a distributed system based on low-power embedded FPGAs designed for
edge computing applications, focused on exploring distributed scheduling
optimizations for Deep Learning (DL) workloads to obtain the best performance
regarding latency and power efficiency. Our cluster was modular throughout the
experiment, and we have implementations that consist of up to 12 Zynq-7020
chip-based boards as well as 5 UltraScale+ MPSoC FPGA boards connected through
an Ethernet switch, and the cluster evaluates the configurable Deep Learning
Accelerator (DLA) Versatile Tensor Accelerator (VTA). This adaptable
distributed architecture is distinguished by its capacity to evaluate and
manage neural network workloads in numerous configurations which enables users
to conduct multiple experiments tailored to their specific application needs.
The proposed system can simultaneously execute diverse Neural Network (NN)
models, arrange the computation graph in a pipeline structure, and manually
allocate greater resources to the most computationally intensive layers of the
NN graph.
Comment: 4 pages of content, 1 page for references. 4 Figures, 1 table.
Conference Paper (IEEE International Conference on Electro Information
Technology (eit2023) at Lewis University in Romeoville, IL)
Finite Computational Structures and Implementations
What is computable with limited resources? How can we verify the correctness
of computations? How to measure computational power with precision? Despite the
immense scientific and engineering progress in computing, we still have only
partial answers to these questions. In order to make these problems more
precise, we describe an abstract algebraic definition of classical computation,
generalizing traditional models to semigroups. The mathematical abstraction
also allows the investigation of different computing paradigms (e.g. cellular
automata, reversible computing) in the same framework. Here we summarize the
main questions and recent results of research on finite computation.
Comment: 12 pages, 3 figures, will be presented at CANDAR'16 and final version
published by IEEE Computer Society
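To make the semigroup view of finite computation in the abstract above concrete, here is a small Python sketch that enumerates the transformation semigroup generated by the input letters of a finite-state machine. The tuple representation and the names `compose` and `generate_semigroup` are assumptions made for this illustration and are not taken from the paper.

```python
def compose(f, g):
    """Apply f first, then g.

    A transformation of the states 0..n-1 is a tuple t with t[x] the image
    of state x.
    """
    return tuple(g[x] for x in f)


def generate_semigroup(generators):
    """Return every transformation obtainable as a product of generators."""
    elements = set(generators)
    frontier = list(elements)
    while frontier:
        s = frontier.pop()
        # Right-multiplying by generators reaches every finite product.
        for g in generators:
            t = compose(s, g)
            if t not in elements:
                elements.add(t)
                frontier.append(t)
    return elements


if __name__ == "__main__":
    swap = (1, 0)    # exchange the two states
    reset = (0, 0)   # send both states to state 0
    print(sorted(generate_semigroup([swap, reset])))
```

With the two generators shown, the closure contains all four maps on a two-element state set, while two constant maps alone generate only themselves.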
Approximate FPGA-based LSTMs under Computation Time Constraints
Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM)
networks have demonstrated state-of-the-art accuracy in several emerging
Artificial Intelligence tasks. However, the models are becoming increasingly
demanding in terms of computational and memory load. Emerging latency-sensitive
applications including mobile robots and autonomous vehicles often operate
under stringent computation time constraints. In this paper, we address the
challenge of deploying computationally demanding LSTMs at a constrained time
budget by introducing an approximate computing scheme that combines iterative
low-rank compression and pruning, along with a novel FPGA-based LSTM
architecture. Combined in an end-to-end framework, the approximation method's
parameters are optimised and the architecture is configured to address the
problem of high-performance LSTM execution in time-constrained applications.
Quantitative evaluation on a real-life image captioning application indicates
that the proposed methods require up to 6.5x less time to achieve the same
application-level accuracy compared to a baseline method, while achieving an
average of 25x higher accuracy under the same computation time constraints.
Comment: Accepted at the 14th International Symposium in Applied
Reconfigurable Computing (ARC) 201
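The two approximation ingredients named in the abstract above, low-rank compression and pruning, can be sketched in a few lines of plain Python. This is a generic illustration under stated assumptions: it uses a rank-1 power iteration and simple magnitude pruning, the function names are hypothetical, and it does not reproduce the paper's iterative scheme or its FPGA architecture.

```python
import math


def rank1_approx(W, iters=100):
    """Leading singular triple (sigma, u, v) of W by power iteration.

    sigma * u[i] * v[j] is then the best rank-1 approximation of W[i][j].
    """
    rows, cols = len(W), len(W[0])
    v = [1.0 / math.sqrt(cols)] * cols
    u, sigma = [0.0] * rows, 0.0
    for _ in range(iters):
        u = [sum(W[i][j] * v[j] for j in range(cols)) for i in range(rows)]
        norm_u = math.sqrt(sum(x * x for x in u))
        if norm_u == 0.0:
            break
        u = [x / norm_u for x in u]
        v = [sum(W[i][j] * u[i] for i in range(rows)) for j in range(cols)]
        sigma = math.sqrt(sum(x * x for x in v))
        if sigma == 0.0:
            break
        v = [x / sigma for x in v]
    return sigma, u, v


def magnitude_prune(W, keep_fraction):
    """Zero every entry whose magnitude is below the kept fraction's cutoff."""
    flat = sorted((abs(x) for row in W for x in row), reverse=True)
    k = max(1, int(keep_fraction * len(flat)))
    threshold = flat[k - 1]
    return [[x if abs(x) >= threshold else 0.0 for x in row] for row in W]


if __name__ == "__main__":
    W = [[3.0, 4.0], [6.0, 8.0]]
    sigma, u, v = rank1_approx(W)
    print(sigma, u, v)
    print(magnitude_prune([[0.1, -2.0], [0.5, 3.0]], 0.5))
```

Storing `u` and `v` instead of `W` cuts both memory and multiply count for a matrix-vector product, and pruning removes further work, which is the general trade-off between accuracy and computation time that the abstract describes.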
Simulation of Rapidly-Exploring Random Trees in Membrane Computing with P-Lingua and Automatic Programming
Methods based on Rapidly-exploring Random Trees (RRTs) have been
widely used in robotics to solve motion planning problems. On the other hand, in the
membrane computing framework, models based on Enzymatic Numerical P systems
(ENPS) have been applied to robot controllers, but today there is a lack of planning
algorithms based on membrane computing for robotics. With this motivation, we
provide a variant of ENPS called Random Enzymatic Numerical P systems with
Proteins and Shared Memory (RENPSM) designed to implement RRT algorithms,
and we illustrate it by simulating the bidirectional RRT algorithm. This paper is an
extension of [21]. The software presented in [21] was an ad-hoc simulator, i.e., a tool
for simulating computations of one and only one model that has been hard-coded.
The main contribution of this paper with respect to [21] is the introduction of a novel
solution for membrane computing simulators based on automatic programming. First,
we have extended the P-Lingua syntax –a language to define membrane computing
models– to write RENPSM models. Second, we have implemented a new parser based
on Flex and Bison to read RENPSM models and produce source code in C language
for multicore processors with OpenMP. Finally, additional experiments are presented.
Ministerio de Economía, Industria y Competitividad TIN2017-89842-
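For readers unfamiliar with Rapidly-exploring Random Trees, the sketch below shows a basic single-tree RRT in the unit square, written in plain Python. It is only meant to illustrate the planning idea mentioned in the abstract above: it is not the bidirectional variant, it does not use P-Lingua, RENPSM, or OpenMP, and all names and parameters (`rrt`, `is_free`, `step`, `goal_bias`) are assumptions made for this example.

```python
import math
import random


def rrt(start, goal, is_free, step=0.1, goal_tol=0.1, goal_bias=0.1,
        max_iters=5000, seed=0):
    """Grow a tree from start until a node lands within goal_tol of goal.

    Returns the list of points from start to that node, or None if the
    iteration budget runs out. Only new nodes are collision-checked, not
    the edges between them (a simplification for this sketch).
    """
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample the goal occasionally, otherwise a uniform random point.
        if rng.random() < goal_bias:
            sample = goal
        else:
            sample = (rng.random(), rng.random())
        i_near = min(range(len(nodes)),
                     key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Move from the nearest node toward the sample by at most `step`.
        t = min(1.0, step / d)
        new = (near[0] + t * (sample[0] - near[0]),
               near[1] + t * (sample[1] - near[1]))
        if not is_free(new):
            continue
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None


if __name__ == "__main__":
    # Hypothetical obstacle: a disc of radius 0.2 centred in the square.
    free = lambda p: math.dist(p, (0.5, 0.5)) > 0.2
    print(rrt((0.1, 0.1), (0.9, 0.9), free))
```

A bidirectional RRT grows a second tree from the goal and tries to connect the two, which typically finds a path in fewer iterations than the single tree shown here.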