5,890 research outputs found
ParaFPGA 2013: Harnessing Programs, Power and Performance in Parallel FPGA applications
Future computing systems will require dedicated accelerators to achieve high-performance. The mini-symposium ParaFPGA explores parallel computing with FPGAs as an interesting avenue to reduce the gap between the architecture and the application. Topics discussed are the power of functional and dataflow languages, the performance of high-level synthesis tools, the automatic creation of hardware multi-cores using C-slow retiming, dynamic power management to control the energy consumption, real-time reconfiguration of streaming image processing filters and memory optimized event image segmentation
Workload-aware Automatic Parallelization for Multi-GPU DNN Training
Deep neural networks (DNNs) have emerged as successful solutions for variety
of artificial intelligence applications, but their very large and deep models
impose high computational requirements during training. Multi-GPU
parallelization is a popular option to accelerate demanding computations in DNN
training, but most state-of-the-art multi-GPU deep learning frameworks not only
require users to have an in-depth understanding of the implementation of the
frameworks themselves, but also apply parallelization in a straight-forward way
without optimizing GPU utilization. In this work, we propose a workload-aware
auto-parallelization framework (WAP) for DNN training, where the work is
automatically distributed to multiple GPUs based on the workload
characteristics. We evaluate WAP using TensorFlow with popular DNN benchmarks
(AlexNet and VGG-16), and show competitive training throughput compared with
the state-of-the-art frameworks, and also demonstrate that WAP automatically
optimizes GPU assignment based on the workload's compute requirements, thereby
improving energy efficiency.Comment: This paper is accepted in ICASSP201
Improving the scalability of parallel N-body applications with an event driven constraint based execution model
The scalability and efficiency of graph applications are significantly
constrained by conventional systems and their supporting programming models.
Technology trends like multicore, manycore, and heterogeneous system
architectures are introducing further challenges and possibilities for emerging
application domains such as graph applications. This paper explores the space
of effective parallel execution of ephemeral graphs that are dynamically
generated using the Barnes-Hut algorithm to exemplify dynamic workloads. The
workloads are expressed using the semantics of an Exascale computing execution
model called ParalleX. For comparison, results using conventional execution
model semantics are also presented. We find improved load balancing during
runtime and automatic parallelism discovery improving efficiency using the
advanced semantics for Exascale computing.Comment: 11 figure
Programs as Polypeptides
We describe a visual programming language for defining behaviors manifested
by reified actors in a 2D virtual world that can be compiled into programs
comprised of sequences of combinators that are themselves reified as actors.
This makes it possible to build programs that build programs from components of
a few fixed types delivered by diffusion using processes that resemble
chemistry as much as computation.Comment: in European Conference on Artificial Life (ECAL '15), York, UK, 201
- …