90,485 research outputs found
Extensible Component Based Architecture for FLASH, A Massively Parallel, Multiphysics Simulation Code
FLASH is a publicly available high performance application code which has
evolved into a modular, extensible software system from a collection of
unconnected legacy codes. FLASH has been successful because its capabilities
have been driven by the needs of scientific applications, without compromising
maintainability, performance, and usability. In its newest incarnation, FLASH3
consists of inter-operable modules that can be combined to generate different
applications. The FLASH architecture allows arbitrarily many alternative
implementations of its components to co-exist and interchange with each other,
resulting in greater flexibility. Further, a simple and elegant mechanism
exists for customization of code functionality without the need to modify the
core implementation of the source. A built-in unit test framework providing
verifiability, combined with a rigorous software maintenance process, allow the
code to operate simultaneously in the dual mode of production and development.
In this paper we describe the FLASH3 architecture, with emphasis on solutions
to the more challenging conflicts arising from solver complexity, portable
performance requirements, and legacy codes. We also include results from user
surveys conducted in 2005 and 2007, which highlight the success of the code.Comment: 33 pages, 7 figures; revised paper submitted to Parallel Computin
DPP-PMRF: Rethinking Optimization for a Probabilistic Graphical Model Using Data-Parallel Primitives
We present a new parallel algorithm for probabilistic graphical model
optimization. The algorithm relies on data-parallel primitives (DPPs), which
provide portable performance over hardware architecture. We evaluate results on
CPUs and GPUs for an image segmentation problem. Compared to a serial baseline,
we observe runtime speedups of up to 13X (CPU) and 44X (GPU). We also compare
our performance to a reference, OpenMP-based algorithm, and find speedups of up
to 7X (CPU).Comment: LDAV 2018, October 201
A 0.18µm CMOS DDCCII for Portable LV-LP Filters
In this paper a current mode very low voltage (LV) (1V) and low power (LP) (21 µW) differential difference second generation current conveyor (CCII) is presented. The circuit is developed by applying the current sensing technique to a fully balanced version of a differential difference amplifier (DDA) so to design a suitable LV LP integrated version of the so-called differential difference CCII (DDCCII). Post-layout results, using a 0.18µm SMIC CMOS technology, have shown good general circuit performances making the proposed circuit suitable for fully integration in battery portable systems as, for examples, fully differential Sallen-Key bandpass filter
Octopus - an energy-efficient architecture for wireless multimedia systems
Multimedia computing and mobile computing are two trends that will lead to a new application domain in the near future. However, the technological challenges to establishing this paradigm of computing are non-trivial. Personal mobile computing offers a vision of the future with a much richer and more exciting set of architecture research challenges than extrapolations of the current desktop architectures. In particular, these devices will have limited battery resources, will handle diverse data types, and will operate in environments that are insecure, dynamic and which vary significantly in time and location. The approach we made to achieve such a system is to use autonomous, adaptable modules, interconnected by a switch rather than by a bus, and to offload as much as work as possible from the CPU to programmable modules that is placed in the data streams. A reconfigurable internal communication network switch called Octopus exploits locality of reference and eliminates wasteful data copies
Learning from the Success of MPI
The Message Passing Interface (MPI) has been extremely successful as a
portable way to program high-performance parallel computers. This success has
occurred in spite of the view of many that message passing is difficult and
that other approaches, including automatic parallelization and directive-based
parallelism, are easier to use. This paper argues that MPI has succeeded
because it addresses all of the important issues in providing a parallel
programming model.Comment: 12 pages, 1 figur
- …