36,642 research outputs found
A study of the use of abstract types for the representation of engineering units in integration and test applications
Physical quantities using various units of measurement can be well represented in Ada by the use of abstract types. Computation involving these quantities (electric potential, mass, volume) can also automatically invoke the computation and checking of some of the implicitly associable attributes of measurements. Quantities can be held internally in SI units, transparently to the user, with automatic conversion. Through dimensional analysis, the type of the derived quantity resulting from a computation is known, thereby allowing dynamic checks of the equations used. The impact of the possible implementation of these techniques in integration and test applications is discussed. The overhead of computing and transporting measurement attributes is weighed against the advantages gained by their use. The construction of a run time interpreter using physical quantities in equations can be aided by the dynamic equation checks provided by dimensional analysis. The effects of high levels of abstraction on the generation and maintenance of software used in integration and test applications are also discussed
A Domain-Specific Language and Editor for Parallel Particle Methods
Domain-specific languages (DSLs) are of increasing importance in scientific
high-performance computing to reduce development costs, raise the level of
abstraction and, thus, ease scientific programming. However, designing and
implementing DSLs is not an easy task, as it requires knowledge of the
application domain and experience in language engineering and compilers.
Consequently, many DSLs follow a weak approach using macros or text generators,
which lack many of the features that make a DSL a comfortable for programmers.
Some of these features---e.g., syntax highlighting, type inference, error
reporting, and code completion---are easily provided by language workbenches,
which combine language engineering techniques and tools in a common ecosystem.
In this paper, we present the Parallel Particle-Mesh Environment (PPME), a DSL
and development environment for numerical simulations based on particle methods
and hybrid particle-mesh methods. PPME uses the meta programming system (MPS),
a projectional language workbench. PPME is the successor of the Parallel
Particle-Mesh Language (PPML), a Fortran-based DSL that used conventional
implementation strategies. We analyze and compare both languages and
demonstrate how the programmer's experience can be improved using static
analyses and projectional editing. Furthermore, we present an explicit domain
model for particle abstractions and the first formal type system for particle
methods.Comment: Submitted to ACM Transactions on Mathematical Software on Dec. 25,
201
High throughput spatial convolution filters on FPGAs
Digital signal processing (DSP) on field- programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations that can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, included additional hard blocks, such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility
GRAPE-6: The massively-parallel special-purpose computer for astrophysical particle simulation
In this paper, we describe the architecture and performance of the GRAPE-6
system, a massively-parallel special-purpose computer for astrophysical
-body simulations. GRAPE-6 is the successor of GRAPE-4, which was completed
in 1995 and achieved the theoretical peak speed of 1.08 Tflops. As was the case
with GRAPE-4, the primary application of GRAPE-6 is simulation of collisional
systems, though it can be used for collisionless systems. The main differences
between GRAPE-4 and GRAPE-6 are (a) The processor chip of GRAPE-6 integrates 6
force-calculation pipelines, compared to one pipeline of GRAPE-4 (which needed
3 clock cycles to calculate one interaction), (b) the clock speed is increased
from 32 to 90 MHz, and (c) the total number of processor chips is increased
from 1728 to 2048. These improvements resulted in the peak speed of 64 Tflops.
We also discuss the design of the successor of GRAPE-6.Comment: Accepted for publication in PASJ, scheduled to appear in Vol. 55, No.
Evaluating critical bits in arithmetic operations due to timing violations
Various error models are being used in simulation of voltage-scaled arithmetic units to examine application-level tolerance of timing violations. The selection of an error model needs further consideration, as differences in error models drastically affect the performance of the application. Specifically, floating point arithmetic units (FPUs) have architectural characteristics that characterize its behavior. We examine the architecture of FPUs and design a new error model, which we call Critical Bit. We run selected benchmark applications with Critical Bit and other widely used error injection models to demonstrate the differences
A Reconfigurable Vector Instruction Processor for Accelerating a Convection Parametrization Model on FPGAs
High Performance Computing (HPC) platforms allow scientists to model
computationally intensive algorithms. HPC clusters increasingly use
General-Purpose Graphics Processing Units (GPGPUs) as accelerators; FPGAs
provide an attractive alternative to GPGPUs for use as co-processors, but they
are still far from being mainstream due to a number of challenges faced when
using FPGA-based platforms. Our research aims to make FPGA-based high
performance computing more accessible to the scientific community. In this work
we present the results of investigating the acceleration of a particular
atmospheric model, Flexpart, on FPGAs. We focus on accelerating the most
computationally intensive kernel from this model. The key contribution of our
work is the architectural exploration we undertook to arrive at a solution that
best exploits the parallelism available in the legacy code, and is also
convenient to program, so that eventually the compilation of high-level legacy
code to our architecture can be fully automated. We present the three different
types of architecture, comparing their resource utilization and performance,
and propose that an architecture where there are a number of computational
cores, each built along the lines of a vector instruction processor, works best
in this particular scenario, and is a promising candidate for a generic
FPGA-based platform for scientific computation. We also present the results of
experiments done with various configuration parameters of the proposed
architecture, to show its utility in adapting to a range of scientific
applications.Comment: This is an extended pre-print version of work that was presented at
the international symposium on Highly Efficient Accelerators and
Reconfigurable Technologies (HEART2014), Sendai, Japan, June 911, 201
- …