VERILOG DESIGN AND FPGA PROTOTYPE OF A NANOCONTROLLER SYSTEM
Many new fabrication technologies, from nanotechnology and MEMS to printed organic semiconductors, center on constructing arrays of large numbers of sensors, actuators, or other devices on a single substrate. The utility of such an array could be greatly enhanced if each device could be managed by a programmable controller and all of these controllers could coordinate their actions as a massively parallel computer. The Kentucky Architecture nanocontroller array, with its very low per-controller circuit complexity, can provide efficient control of nanotechnology devices.
This thesis provides a detailed description of the control hierarchy of a digital system needed to build nanocontrollers suitable for controlling millions of devices on a single chip. A Verilog design and FPGA prototype of a nanocontroller system are provided to meet the constraints associated with a massively parallel programmable controller system.
Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program
We describe the parallel implementation of our generalized stellar atmosphere
and NLTE radiative transfer computer program PHOENIX. We discuss the parallel
algorithms we have developed for radiative transfer, spectral line opacity, and
NLTE opacity and rate calculations. Our implementation uses a MIMD design based
on a relatively small number of MPI library calls. We report the results of
test calculations on a number of different parallel computers and discuss the
results of scalability tests.
Comment: To appear in ApJ, 1997, vol 483. LaTeX, 34 pages, 3 Figures, uses
AASTeX macros and styles natbib.sty, and psfig.st
Turbomachinery CFD on parallel computers
The role of multistage turbomachinery simulation in the development of propulsion system models is discussed. In particular, the need for simulations with higher fidelity and faster turnaround time is highlighted. It is shown how such fast simulations can be used in engineering-oriented environments. The use of parallel processing to achieve the required turnaround times is discussed. Current work by several researchers in this area is summarized. Parallel turbomachinery CFD research at the NASA Lewis Research Center is then highlighted. These efforts are focused on implementing the average-passage turbomachinery model on MIMD, distributed memory parallel computers. Performance results are given for inviscid, single blade row and viscous, multistage applications on several parallel computers, including networked workstations.
A strategy for mapping unstructured mesh computational mechanics programs onto distributed memory parallel architectures
The motivation of this thesis was to develop strategies that would enable unstructured mesh based computational mechanics codes to exploit the computational advantages offered by distributed memory parallel processors. Strategies that successfully map structured mesh codes onto parallel machines have been developed over the previous decade and used to build a toolkit for automation of the parallelisation process. Extension of the capabilities of this toolkit to include unstructured mesh codes requires new strategies to be developed.
This thesis examines the method of parallelisation by geometric domain decomposition using the single program multi data programming paradigm with explicit message passing. This technique involves splitting (decomposing) the problem definition into P parts that may be distributed over P processors in a parallel machine. Each processor runs the same program and operates only on its part of the problem. Messages passed between the processors allow data exchange to maintain consistency with the original algorithm.
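The decomposition described above can be illustrated with a minimal sketch. It is not the thesis code: it assumes a 1D chain of cells rather than an unstructured mesh, simulates the P processors in a single Python process, and replaces real message passing with direct reads of a neighbour's boundary value. The names `split_ranges`, `smooth_serial`, and `smooth_decomposed` are hypothetical. The point it shows is that exchanging boundary ("halo") values lets the decomposed sweep reproduce the serial result exactly.

```python
def split_ranges(n, p):
    """Split n cells into p contiguous, near-equal [start, stop) ranges."""
    base, extra = divmod(n, p)
    ranges, start = [], 0
    for rank in range(p):
        stop = start + base + (1 if rank < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def smooth_serial(u):
    """One averaging sweep over the whole array; the two end values stay fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


def smooth_decomposed(u, p):
    """Same sweep, but each of p simulated processors updates only its own part.

    Assumes 1 <= p <= len(u), so that every part owns at least one cell.
    """
    n = len(u)
    ranges = split_ranges(n, p)
    parts = [u[a:b] for a, b in ranges]  # each processor's local (old) data
    result = []
    for rank, (a, _) in enumerate(ranges):
        local = parts[rank]
        # Stand-in for message passing: fetch one halo value from each neighbour.
        left = parts[rank - 1][-1] if rank > 0 else None
        right = parts[rank + 1][0] if rank < p - 1 else None
        new_local = list(local)
        for j in range(len(local)):
            g = a + j  # global index of local cell j
            if g == 0 or g == n - 1:
                continue  # global boundary cells are held fixed
            lo = local[j - 1] if j > 0 else left
            hi = local[j + 1] if j < len(local) - 1 else right
            new_local[j] = 0.5 * (lo + hi)
        result.extend(new_local)
    return result


if __name__ == "__main__":
    u = [float(i * i) for i in range(10)]
    print(smooth_decomposed(u, 3) == smooth_serial(u))
```

In a real SPMD code each rank would hold only its own part and the two halo reads would be explicit sends and receives. The arithmetic inside the loop would stay the same.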
The strategies developed to parallelise unstructured mesh codes should meet a number of requirements:
The algorithms are faithfully reproduced in parallel.
The code is largely unaltered in the parallel version.
The parallel efficiency is maximised.
The techniques should scale to highly parallel systems.
The parallelisation process should become automated.
Techniques and strategies that meet these requirements are developed and tested in this dissertation using a state of the art integrated computational fluid dynamics and solid mechanics code. The results presented demonstrate the importance of the problem partition in the definition of inter-processor communication and hence parallel performance.
The classical measure of partition quality based on the number of cut edges in the mesh partition can be inadequate for real parallel machines. Consideration of the topology of the parallel machine in the mesh partition is demonstrated to be a more significant factor than the number of cut edges in the achieved parallel efficiency. It is shown to be advantageous to allow an increase in the volume of communication in order to achieve an efficient mapping dominated by localised communications. The limitation to parallel performance resulting from communication startup latency is clearly revealed together with strategies to minimise the effect.
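A small sketch can make the point about cut edges concrete. The cost model below is an illustrative assumption, not the one used in the thesis: each pair of communicating processors pays a startup latency plus a per-item cost, scaled by the number of network hops between them. The function names and the `latency` and `per_item` values are hypothetical.

```python
def cut_edges(edges, part):
    """Classical partition quality: number of mesh edges whose ends lie in different parts."""
    return sum(1 for u, v in edges if part[u] != part[v])


def comm_cost(edges, part, hops, latency=1.0, per_item=0.01):
    """Toy communication cost (assumed model, not from the thesis).

    Each communicating processor pair costs hops * (latency + per_item * volume),
    where volume is the number of cut edges shared by that pair.
    """
    volume = {}
    for u, v in edges:
        a, b = part[u], part[v]
        if a != b:
            key = (min(a, b), max(a, b))
            volume[key] = volume.get(key, 0) + 1
    return sum(hops(a, b) * (latency + per_item * n) for (a, b), n in volume.items())


def chain_hops(a, b):
    """Hop count on an assumed linear chain of processors 0-1-2-3."""
    return abs(a - b)


# A path-shaped mesh of 8 nodes, partitioned two ways over 4 processors.
edges = [(i, i + 1) for i in range(7)]
topology_aware = [0, 0, 1, 1, 2, 2, 3, 3]    # mesh neighbours land on adjacent processors
topology_unaware = [0, 0, 3, 3, 1, 1, 2, 2]  # same cut count, longer network routes

if __name__ == "__main__":
    for name, part in [("aware", topology_aware), ("unaware", topology_unaware)]:
        print(name, cut_edges(edges, part), comm_cost(edges, part, chain_hops))
```

Both partitions cut the same number of edges, yet under this model the second one is more expensive because its messages travel further across the machine. That is the sense in which cut edges alone can be an inadequate measure.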
The generic application of the techniques to other unstructured mesh codes is discussed in the context of automation of the parallelisation process. Automation of parallelisation based on the developed strategies is presented as possible through the use of run-time inspector loops to accurately determine the dependencies that define the necessary inter-processor communication.
An Efficient Transport Protocol for delivery of Multimedia Content in Wireless Grids
A grid computing system is designed for solving complicated scientific and
commercial problems effectively, whereas mobile computing is a traditional
distributed system that combines computing capability with mobility and
wireless communication. The media and entertainment fields can take advantage
of both paradigms by applying them to gaming applications and multimedia data
management. Multimedia data has to be stored and retrieved in an efficient and
effective manner to be put to use. In this paper, we propose an application
layer protocol for the delivery of multimedia data in wireless grids, i.e. the
multimedia grid protocol (MMGP). To make streaming efficient, a new video
compression algorithm called dWave is designed and embedded in the proposed
protocol. This protocol will provide faster, reliable access and render an
imperceptible QoS in delivering multimedia in a wireless grid environment, and
it tackles challenging issues such as i) intermittent connectivity, ii) device
heterogeneity, iii) weak security and iv) device mobility.
Comment: 20 pages, 15 figures, Peer Reviewed Journa
Shared versus distributed memory multiprocessors
The question of whether multiprocessors should have shared or distributed memory has attracted a great deal of attention. Some researchers argue strongly for building distributed memory machines, while others argue just as strongly for programming shared memory multiprocessors. A great deal of research is underway on both types of parallel systems. Special emphasis is placed on systems with a very large number of processors for computation-intensive tasks, and research and implementation trends are considered. It appears that the two types of systems will likely converge to a common form for large scale multiprocessors.
A New Parallel N-body Gravity Solver: TPM
We have developed a gravity solver based on combining the well developed
Particle-Mesh (PM) method and TREE methods. It is designed for and has been
implemented on parallel computer architectures. The new code can deal with tens
of millions of particles on current computers, with the calculation done on a
parallel supercomputer or a group of workstations. Typically, the spatial
resolution is enhanced by more than a factor of 20 over the pure PM code with
mass resolution retained at nearly the PM level. This code runs much faster
than a pure TREE code with the same number of particles and maintains almost
the same resolution in high density regions. Multiple time step integration has
also been implemented with the code, with second order time accuracy. The
performance of the code has been checked in several kinds of parallel computer
configuration, including IBM SP1, SGI Challenge and a group of workstations,
with the speedup of the parallel code on a 32 processor IBM SP2 supercomputer
nearly linear (efficiency ) in the number of processors. The
computation/communication ratio is also very high (), which means the
code spends of its CPU time in computation.
Comment: 21 pages, LaTeX file. Figures available from anonymous ftp to
astro.princeton.edu under /xu/tpm.ps, POP-57