A Many-Core Overlay for High-Performance Embedded Computing on FPGAs
In this work, we propose a configurable many-core overlay for
high-performance embedded computing. The size of internal memory, supported
operations and number of ports can be configured independently for each core of
the overlay. The overlay was evaluated with matrix multiplication, LU
decomposition and Fast-Fourier Transform (FFT) on a ZYNQ-7020 FPGA platform.
The results show that using a system-level many-core overlay avoids complex
hardware design and still provides good performance results.
Comment: Presented at First International Workshop on FPGAs for Software
Programmers (FSP 2014) (arXiv:1408.4423)
HALLS: An Energy-Efficient Highly Adaptable Last Level STT-RAM Cache for Multicore Systems
Spin-Transfer Torque RAM (STT-RAM) is widely considered a promising
alternative to SRAM in the memory hierarchy due to STT-RAM's non-volatility,
low leakage power, high density, and fast read speed. The STT-RAM's small
feature size is particularly desirable for the last-level cache (LLC), which
typically consumes a large area of silicon die. However, long write latency and
high write energy still remain challenges of implementing STT-RAMs in the CPU
cache. An increasingly popular method for addressing this challenge involves
trading off the non-volatility for reduced write latency and write energy by
relaxing the STT-RAM's data retention time. However, in order to maximize
energy saving potential, the cache configurations, including STT-RAM's
retention time, must be dynamically adapted to executing applications' variable
memory needs. In this paper, we propose a highly adaptable last level STT-RAM
cache (HALLS) that allows the LLC configurations and retention time to be
adapted to applications' runtime execution requirements. We also propose
low-overhead runtime tuning algorithms to dynamically determine the best
(lowest energy) cache configurations and retention times for executing
applications. Compared to prior work, HALLS reduced the average energy
consumption by 60.57% in a quad-core system, while introducing marginal latency
overhead.
Comment: To appear in IEEE Transactions on Computers (TC)
GPU acceleration of brain image processing
In recent years, GPUs have repeatedly demonstrated the high computational power
they offer for solving certain problems.
At the same time, there are fields that cannot fully benefit from the
improvements achieved by researchers, mainly because application execution
times become extremely long. This is the case, for example, of image
registration in medicine.
Although image registration has been accelerated, its use in clinical practice
is still limited. Among other reasons, this is due to the performance achieved.
The goal of this project is therefore to improve the execution times of an
application dedicated to medical image registration, in order to help alleviate
this problem.
Beyond XSPEC: Towards Highly Configurable Analysis
We present a quantitative comparison between software features of the de facto
standard X-ray spectral analysis tool, XSPEC, and ISIS, the Interactive
Spectral Interpretation System. Our emphasis is on customized analysis, with
ISIS offered as a strong example of configurable software. While noting that
XSPEC has been of immense value to astronomers, and that its scientific core is
moderately extensible--most commonly via the inclusion of user contributed
"local models"--we identify a series of limitations with its use beyond
conventional spectral modeling. We argue that from the viewpoint of the
astronomical user, the XSPEC internal structure presents a Black Box Problem,
with many of its important features hidden from the top-level interface, thus
discouraging user customization. Drawing from examples in custom modeling,
numerical analysis, parallel computation, visualization, data management, and
automated code generation, we show how a numerically scriptable, modular, and
extensible analysis platform such as ISIS facilitates many forms of advanced
astrophysical inquiry.
Comment: Accepted by PASP, for July 2008 (15 pages)