1,680 research outputs found

    Parallel Implementation of the PHOENIX Generalized Stellar Atmosphere Program. II: Wavelength Parallelization

    Get PDF
    We describe an important addition to the parallel implementation of our generalized NLTE stellar atmosphere and radiative transfer computer program PHOENIX. In a previous paper in this series we described data and task parallel algorithms we have developed for radiative transfer, spectral line opacity, and NLTE opacity and rate calculations. These algorithms divided the work spatially or by spectral lines, that is distributing the radial zones, individual spectral lines, or characteristic rays among different processors and employ, in addition task parallelism for logically independent functions (such as atomic and molecular line opacities). For finite, monotonic velocity fields, the radiative transfer equation is an initial value problem in wavelength, and hence each wavelength point depends upon the previous one. However, for sophisticated NLTE models of both static and moving atmospheres needed to accurately describe, e.g., novae and supernovae, the number of wavelength points is very large (200,000--300,000) and hence parallelization over wavelength can lead both to considerable speedup in calculation time and the ability to make use of the aggregate memory available on massively parallel supercomputers. Here, we describe an implementation of a pipelined design for the wavelength parallelization of PHOENIX, where the necessary data from the processor working on a previous wavelength point is sent to the processor working on the succeeding wavelength point as soon as it is known. Our implementation uses a MIMD design based on a relatively small number of standard MPI library calls and is fully portable between serial and parallel computers.Comment: AAS-TeX, 15 pages, full text with figures available at ftp://calvin.physast.uga.edu/pub/preprints/Wavelength-Parallel.ps.gz ApJ, in pres

    Scalable Parallel Computers for Real-Time Signal Processing

    Get PDF
    We assess the state-of-the-art technology in massively parallel processors (MPPs) and their variations in different architectural platforms. Architectural and programming issues are identified in using MPPs for time-critical applications such as adaptive radar signal processing. We review the enabling technologies. These include high-performance CPU chips and system interconnects, distributed memory architectures, and various latency hiding mechanisms. We characterize the concept of scalability in three areas: resources, applications, and technology. Scalable performance attributes are analytically defined. Then we compare MPPs with symmetric multiprocessors (SMPs) and clusters of workstations (COWs). The purpose is to reveal their capabilities, limits, and effectiveness in signal processing. We evaluate the IBM SP2 at MHPCC, the Intel Paragon at SDSC, the Gray T3D at Gray Eagan Center, and the Gray T3E and ASCI TeraFLOP system proposed by Intel. On the software and programming side, we evaluate existing parallel programming environments, including the models, languages, compilers, software tools, and operating systems. Some guidelines for program parallelization are provided. We examine data-parallel, shared-variable, message-passing, and implicit programming models. Communication functions and their performance overhead are discussed. Available software tools and communication libraries are also introducedpublished_or_final_versio

    Parallel algorithm for singular value decomposition as applied to failure tolerant manipulators, A

    Get PDF
    Includes bibliographical references (pages [348-349]).The system of equations that govern kinematically redundant manipulators is commonly solved by finding the singular value decomposition (SVD) of the corresponding Jacobian matrix. This can require considerable amounts of time to compute, thus a parallel SVD algorithm minimizing execution time is sought. The approach employed here lends itself to parallelization by using Givens rotations and information from previous decompositions. The key contributions of this research include the presentation and implementation of a new variation of a parallel SVD algorithm to compute the SVD for a set of post-fault Jacobians. Results from implementation of the algorithm on a MasPar MP-1 and an IBM SP2 are provided. Specific issues considered for each implementation include how data is mapped to the processing elements, the effect that increasing the number of processing elements has on execution time, and the type of parallel architecture used

    CASCH: a tool for computer-aided scheduling

    Get PDF
    A software tool called Computer-Aided Scheduling (CASCH) for parallel processing on distributed-memory multiprocessors in a complete parallel programming environment is presented. A compiler automatically converts sequential applications into parallel codes to perform program parallelization. The parallel code that executes on a target machine is optimized by CASCH through proper scheduling and mapping.published_or_final_versio

    An Analysis for Evaluating the Cost/Profit Effectiveness of Parallel Systems

    Get PDF
    A new domain of commercial applications demands the development of inexpensive parallel computing platforms to lower the cost of operations and increase the business profit. The calculation of returns on an IT investment is now important to justify the decision of upgrading or replacing parallel systems. This thesis presents a framework of the performance and economic factors that are considered when evaluating a parallel system. We introduce a metric called the cost/profit effective metric, which measures the effectiveness of a parallel system in terms of performance, cost and profit. This metric describes the profit obtained from the performance of three different domains for scaling: speed-up, throughput and/or scale-up. Cost is measured by the actual costs of a parallel system. We present two cases of study to demonstrate the application of this metric and analyze the results to support the evaluation of the parallel system on each case

    High Performance Computing Applications in Remote Sensing Studies for Land Cover Dynamics

    Get PDF
    Global and regional land cover studies require the ability to apply complex models on selected subsets of large amounts of multi-sensor and multi-temporal data sets that have been derived from raw instrument measurements using widely accepted pre-processing algorithms. The computational and storage requirements of most such studies far exceed what is possible on a single workstation environment. We have been pursuing a new approach that couples scalable and open distributed heterogeneous hardware with the development of high performance software for processing, indexing, and organizing remotely sensed data. Hierarchical data management tools are used to ingest raw data, create metadata, and organize the archived data so as to automatically achieve computational load balancing among the available nodes and minimize I/O overheads. We illustrate our approach with four specific examples. The first is the development of the first fast operational scheme for the atmospheric correction of Landsat TM scenes, while the second example focuses on image segmentation using a novel hierarchical connected components algorithm. Retrieval of global BRDF (Bidirectional Reflectance Distribution Function) in the red and near infrared wavelengths using four years (1983 to 1986) of Pathfinder AVHRR Land (PAL) data set is the focus of our third example. The fourth example is the development of a hierarchical data organization scheme that allows on-demand processing and retrieval of regional and global AVHRR data sets. Our results show that substantial improvements in computational times can be achieved by using the high performance computing technology

    Achieving Extreme Resolution in Numerical Cosmology Using Adaptive Mesh Refinement: Resolving Primordial Star Formation

    Full text link
    As an entry for the 2001 Gordon Bell Award in the "special" category, we describe our 3-d, hybrid, adaptive mesh refinement (AMR) code, Enzo, designed for high-resolution, multiphysics, cosmological structure formation simulations. Our parallel implementation places no limit on the depth or complexity of the adaptive grid hierarchy, allowing us to achieve unprecedented spatial and temporal dynamic range. We report on a simulation of primordial star formation which develops over 8000 subgrids at 34 levels of refinement to achieve a local refinement of a factor of 10^12 in space and time. This allows us to resolve the properties of the first stars which form in the universe assuming standard physics and a standard cosmological model. Achieving extreme resolution requires the use of 128-bit extended precision arithmetic (EPA) to accurately specify the subgrid positions. We describe our EPA AMR implementation on the IBM SP2 Blue Horizon system at the San Diego Supercomputer Center.Comment: 23 pages, 5 figures. Peer reviewed technical paper accepted to the proceedings of Supercomputing 2001. This entry was a Gordon Bell Prize finalist. For more information visit http://www.TomAbel.com/GB
    • …
    corecore