12,448 research outputs found
Demonstration of Run-time Spatial Mapping of Streaming Applications to a Heterogeneous Multi-Processor System-on-Chip (MPSoC)
In this paper, the problem of spatial mapping is defined. Reasons are presented to show why performing spatial mappings at run-time is both necessary and desirable and criteria for the qualitative comparison of spatial mappings are introduced. An algorithm is described that implements a preliminary spatial mapper. The methods used in the algorithm are demonstrated with an illustrative example
Time-Shared Execution of Realtime Computer Vision Pipelines by Dynamic Partial Reconfiguration
This paper presents an FPGA runtime framework that demonstrates the
feasibility of using dynamic partial reconfiguration (DPR) for time-sharing an
FPGA by multiple realtime computer vision pipelines. The presented time-sharing
runtime framework manages an FPGA fabric that can be round-robin time-shared by
different pipelines at the time scale of individual frames. In this new
use-case, the challenge is to achieve useful performance despite high
reconfiguration time. The paper describes the basic runtime support as well as
four optimizations necessary to achieve realtime performance given the
limitations of DPR on today's FPGAs. The paper provides a characterization of a
working runtime framework prototype on a Xilinx ZC706 development board. The
paper also reports the performance of realtime computer vision pipelines when
time-shared
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions
In the past decade, Convolutional Neural Networks (CNNs) have demonstrated
state-of-the-art performance in various Artificial Intelligence tasks. To
accelerate the experimentation and development of CNNs, several software
frameworks have been released, primarily targeting power-hungry CPUs and GPUs.
In this context, reconfigurable hardware in the form of FPGAs constitutes a
potential alternative platform that can be integrated in the existing deep
learning ecosystem to provide a tunable balance between performance, power
consumption and programmability. In this paper, a survey of the existing
CNN-to-FPGA toolflows is presented, comprising a comparative study of their key
characteristics which include the supported applications, architectural
choices, design space exploration methods and achieved performance. Moreover,
major challenges and objectives introduced by the latest trends in CNN
algorithmic research are identified and presented. Finally, a uniform
evaluation methodology is proposed, aiming at the comprehensive, complete and
in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal,
201
GraphR: Accelerating Graph Processing Using ReRAM
This paper presents GRAPHR, the first ReRAM-based graph processing
accelerator. GRAPHR follows the principle of near-data processing and explores
the opportunity of performing massive parallel analog operations with low
hardware and energy cost. The analog computation is suit- able for graph
processing because: 1) The algorithms are iterative and could inherently
tolerate the imprecision; 2) Both probability calculation (e.g., PageRank and
Collaborative Filtering) and typical graph algorithms involving integers (e.g.,
BFS/SSSP) are resilient to errors. The key insight of GRAPHR is that if a
vertex program of a graph algorithm can be expressed in sparse matrix vector
multiplication (SpMV), it can be efficiently performed by ReRAM crossbar. We
show that this assumption is generally true for a large set of graph
algorithms. GRAPHR is a novel accelerator architecture consisting of two
components: memory ReRAM and graph engine (GE). The core graph computations are
performed in sparse matrix format in GEs (ReRAM crossbars). The
vector/matrix-based graph computation is not new, but ReRAM offers the unique
opportunity to realize the massive parallelism with unprecedented energy
efficiency and low hardware cost. With small subgraphs processed by GEs, the
gain of performing parallel operations overshadows the wastes due to sparsity.
The experiment results show that GRAPHR achieves a 16.01x (up to 132.67x)
speedup and a 33.82x energy saving on geometric mean compared to a CPU baseline
system. Com- pared to GPU, GRAPHR achieves 1.69x to 2.19x speedup and consumes
4.77x to 8.91x less energy. GRAPHR gains a speedup of 1.16x to 4.12x, and is
3.67x to 10.96x more energy efficiency compared to PIM-based architecture.Comment: Accepted to HPCA 201
The Road Ahead for Networking: A Survey on ICN-IP Coexistence Solutions
In recent years, the current Internet has experienced an unexpected paradigm
shift in the usage model, which has pushed researchers towards the design of
the Information-Centric Networking (ICN) paradigm as a possible replacement of
the existing architecture. Even though both Academia and Industry have
investigated the feasibility and effectiveness of ICN, achieving the complete
replacement of the Internet Protocol (IP) is a challenging task.
Some research groups have already addressed the coexistence by designing
their own architectures, but none of those is the final solution to move
towards the future Internet considering the unaltered state of the networking.
To design such architecture, the research community needs now a comprehensive
overview of the existing solutions that have so far addressed the coexistence.
The purpose of this paper is to reach this goal by providing the first
comprehensive survey and classification of the coexistence architectures
according to their features (i.e., deployment approach, deployment scenarios,
addressed coexistence requirements and architecture or technology used) and
evaluation parameters (i.e., challenges emerging during the deployment and the
runtime behaviour of an architecture). We believe that this paper will finally
fill the gap required for moving towards the design of the final coexistence
architecture.Comment: 23 pages, 16 figures, 3 table
Real-time optical manipulation of cardiac conduction in intact hearts
Optogenetics has provided new insights in cardiovascular research, leading to new methods for cardiac pacing, resynchronization therapy and cardioversion. Although these interventions have clearly demonstrated the feasibility of cardiac manipulation, current optical stimulation strategies do not take into account cardiac wave dynamics in real time. Here, we developed an all‐optical platform complemented by integrated, newly developed software to monitor and control electrical activity in intact mouse hearts. The system combined a wide‐field mesoscope with a digital projector for optogenetic activation. Cardiac functionality could be manipulated either in free‐run mode with submillisecond temporal resolution or in a closed‐loop fashion: a tailored hardware and software platform allowed real‐time intervention capable of reacting within 2 ms. The methodology was applied to restore normal electrical activity after atrioventricular block, by triggering the ventricle in response to optically mapped atrial activity with appropriate timing. Real‐time intraventricular manipulation of the propagating electrical wavefront was also demonstrated, opening the prospect for real‐time resynchronization therapy and cardiac defibrillation. Furthermore, the closed‐loop approach was applied to simulate a re‐entrant circuit across the ventricle demonstrating the capability of our system to manipulate heart conduction with high versatility even in arrhythmogenic conditions. The development of this innovative optical methodology provides the first proof‐of‐concept that a real‐time optically based stimulation can control cardiac rhythm in normal and abnormal conditions, promising a new approach for the investigation of the (patho)physiology of the heart
On autonomic platform-as-a-service: characterisation and conceptual model
In this position paper, we envision a Platform-as-a-Service conceptual and architectural solution for large-scale and data intensive applications. Our architectural approach is based on autonomic principles, therefore, its ultimate goal is to reduce human intervention, the cost, and the perceived complexity by enabling the autonomic platform to manage such applications itself in accordance with highlevel policies. Such policies allow the platform to (i) interpret the application specifications; (ii) to map the specifications onto the target computing infrastructure, so that the applications are executed and their Quality of Service (QoS), as specified in their SLA, enforced; and, most importantly, (iii) to adapt automatically such previously established mappings when unexpected behaviours violate the expected. Such adaptations may involve modifications in the arrangement of the computational infrastructure, i.e. by re-designing a different communication network topology that dictates how computational resources interact, or even the live-migration to a different computational infrastructure. The ultimate goal of these challenges is to (de)provision computational machines, storage and networking links and their required topologies in order to supply for the application the virtualised infrastructure that better meets the SLAs. Generic architectural blueprints and principles have been provided for designing and implementing an autonomic computing system.We revisit them in order to provide a customised and specific viewfor PaaS platforms and integrate emerging paradigms such as DevOps for automate deployments, Monitoring as a Service for accurate and large-scale monitoring, or well-known formalisms such as Petri Nets for building performance models
- …