521 research outputs found
Evaluating kernels on Xeon Phi to accelerate Gysela application
This work describes the challenges presented by porting parts ofthe Gysela
code to the Intel Xeon Phi coprocessor, as well as techniques used for
optimization, vectorization and tuning that can be applied to other
applications. We evaluate the performance of somegeneric micro-benchmark on Phi
versus Intel Sandy Bridge. Several interpolation kernels useful for the Gysela
application are analyzed and the performance are shown. Some memory-bound and
compute-bound kernels are accelerated by a factor 2 on the Phi device compared
to Sandy architecture. Nevertheless, it is hard, if not impossible, to reach a
large fraction of the peek performance on the Phi device,especially for
real-life applications as Gysela. A collateral benefit of this optimization and
tuning work is that the execution time of Gysela (using 4D advections) has
decreased on a standard architecture such as Intel Sandy Bridge.Comment: submitted to ESAIM proceedings for CEMRACS 2014 summer school version
reviewe
NASA SBIR abstracts of 1991 phase 1 projects
The objectives of 301 projects placed under contract by the Small Business Innovation Research (SBIR) program of the National Aeronautics and Space Administration (NASA) are described. These projects were selected competitively from among proposals submitted to NASA in response to the 1991 SBIR Program Solicitation. The basic document consists of edited, non-proprietary abstracts of the winning proposals submitted by small businesses. The abstracts are presented under the 15 technical topics within which Phase 1 proposals were solicited. Each project was assigned a sequential identifying number from 001 to 301, in order of its appearance in the body of the report. Appendixes to provide additional information about the SBIR program and permit cross-reference of the 1991 Phase 1 projects by company name, location by state, principal investigator, NASA Field Center responsible for management of each project, and NASA contract number are included
PROGRAPE-1: A Programmable, Multi-Purpose Computer for Many-Body Simulations
We have developed PROGRAPE-1 (PROgrammable GRAPE-1), a programmable
multi-purpose computer for many-body simulations. The main difference between
PROGRAPE-1 and "traditional" GRAPE systems is that the former uses FPGA (Field
Programmable Gate Array) chips as the processing elements, while the latter
rely on the hardwired pipeline processor specialized to gravitational
interactions. Since the logic implemented in FPGA chips can be reconfigured, we
can use PROGRAPE-1 to calculate not only gravitational interactions but also
other forms of interactions such as van der Waals force, hydrodynamical
interactions in SPH calculation and so on. PROGRAPE-1 comprises two Altera
EPF10K100 FPGA chips, each of which contains nominally 100,000 gates. To
evaluate the programmability and performance of PROGRAPE-1, we implemented a
pipeline for gravitational interaction similar to that of GRAPE-3. One pipeline
fitted into a single FPGA chip, which operated at 16 MHz clock. Thus, for
gravitational interaction, PROGRAPE-1 provided the speed of 0.96
Gflops-equivalent. PROGRAPE will prove to be useful for wide-range of
particle-based simulations in which the calculation cost of interactions other
than gravity is high, such as the evaluation of SPH interactions.Comment: 20 pages with 9 figures; submitted to PAS
NASA information sciences and human factors program
The FY-90 descriptions of technical accomplishments are contained in seven sections: Automation and Robotics, Communications, Computer Sciences, Controls and Guidance, Data Systems, Human Factors, and Sensor Technology
The instruction of systolic array (ISA) and simulation of parallel algorithms
Systolic arrays have proved to be well suited for Very Large
Scale Integrated technology (VLSI) since they:
-Consist of a regular network of simple processing cells,
-Use local communication between the processing cells only,
-Exploit a maximal degree of parallelism.
However, systolic arrays have one main disadvantage compared with
other parallel computer architectures: they are special purpose
architectures only capable of executing one algorithm, e.g., a
systolic array designed for sorting cannot be used to form matrix
multiplication.
Several approaches have been made to make systolic arrays more
flexible, in order to be able to handle different problems on a
single systolic array.
In this thesis an alternative concept to a VLSI-architecture
the Soft-Systolic Simulation System (SSSS), is introduced and
developed as a working model of virtual machine with the power to
simulate hard systolic arrays and more general forms of concurrency
such as the SIMD and MIMD models of computation.
The virtual machine includes a processing element consisting of
a soft-systolic processor implemented in the virtual.machine language.
The processing element considered here was a very general element
which allows the choice of a wide range of arithmetic and logical
operators and allows the simulation of a wide class of algorithms
but in principle extra processing cells can be added making a library
and this library be tailored to individual needs.
The virtual machine chosen for this implementation is the
Instruction Systolic Array (ISA). The ISA has a number of interesting
features, firstly it has been used to simulate all SIMD algorithms
and many MIMD algorithms by a simple program transformation technique,
further, the ISA can also simulate the so-called wavefront processor
algorithms, as well as many hard systolic algorithms. The ISA removes
the need for the broadcasting of data which is a feature of SIMD
algorithms (limiting the size of the machine and its cycle time) and also presents a fairly simple communication structure for MIMD
algorithms.
The model of systolic computation developed from the VLSI
approach to systolic arrays is such that the processing surface is
fixed, as are the processing elements or cells by virtue of their
being embedded in the processing surface.
The VLSI approach therefore freezes instructions and hardware
relative to the movement of data with the virtual machine and softsystolic
programming retaining the constructions of VLSI for array
design features such as regularity, simplicity and local communication,
allowing the movement of instructions with respect to data. Data can
be frozen into the structure with instructions moving systolically.
Alternatively both the data and instructions can move systolically
around the virtual processors, (which are deemed fixed relative to
the underlying architecture).
The ISA is implemented in OCCAM programs whose execution and
output implicitly confirm the correctness of the design.
The soft-systolic preparation comprises of the usual operating
system facilities for the creation and modification of files during
the development of new programs and ISA processor elements. We allow
any concurrent high level language to be used to model the softsystolic
program. Consequently the Replicating Instruction Systolic
Array Language (RI SAL) was devised to provide a very primitive program
environment to the ISA but adequate for testing. RI SAL accepts
instructions in an assembler-like form, but is fairly permissive
about the format of statements, subject of course to syntax.
The RI SAL compiler is adopted to transform the soft-systolic
program description (RISAL) into a form suitable for the virtual
machine (simulating the algorithm) to run.
Finally we conclude that the principles mentioned here can form
the basis for a soft-systolic simulator using an orthogonally
connected mesh of processors. The wide range of algorithms which the
ISA can simulate make it suitable for a virtual simulating grid
- …