
    Parallel implementation of QRD algorithms on the Fujitsu AP1000

    This report addresses several important aspects of the parallel implementation of QR decomposition of a matrix on a distributed-memory MIMD machine, the Fujitsu AP1000. These include the following questions. Among the various QR decomposition algorithms, which one is most suitable for implementation on the AP1000? Given the total number of cells, what aspect ratio of the cell array achieves optimal performance? How efficient is the AP1000 in computing the QR decomposition of a matrix? To help answer these questions, we have implemented various orthogonal factorisation algorithms on a 128-cell AP1000 located at the Australian National University. Several interesting results obtained from extensive experiments are presented in the report.
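    As a point of reference for the algorithms compared above, here is a minimal serial sketch of Householder QR, one of the standard orthogonal factorisation methods. It is not the report's AP1000 code: the report's algorithm choices and how rows and columns are mapped onto the cell array are not reproduced, and the function name `householder_qr` is hypothetical. In a distributed version, the reflector application in the inner loops is the work that would be spread across the cells.

```python
import math


def householder_qr(a):
    """Householder QR of an m x n matrix (m >= n) given as a list of row lists.
    Returns (q, r) with q m x m orthogonal, r m x n upper triangular, and a = q r."""
    m, n = len(a), len(a[0])
    r = [[float(x) for x in row] for row in a]  # working copy; becomes R
    q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]
    for k in range(min(n, m - 1)):
        x = [r[i][k] for i in range(k, m)]
        norm_x = math.sqrt(sum(t * t for t in x))
        if norm_x == 0.0:
            continue  # column already zero on and below the diagonal
        alpha = -norm_x if x[0] >= 0 else norm_x  # sign chosen to avoid cancellation
        v = x[:]
        v[0] -= alpha  # v = x - alpha * e1, so H x = alpha * e1
        vv = sum(t * t for t in v)
        # R <- H R with H = I - 2 v v^T / (v^T v); only rows k.. and columns k.. change
        for j in range(k, n):
            f = 2.0 * sum(v[i - k] * r[i][j] for i in range(k, m)) / vv
            for i in range(k, m):
                r[i][j] -= f * v[i - k]
        for i in range(k + 1, m):
            r[i][k] = 0.0  # store exact zeros below the diagonal
        # Q <- Q H, so that at the end Q = H_1 H_2 ... and A = Q R
        for i in range(m):
            f = 2.0 * sum(q[i][l] * v[l - k] for l in range(k, m)) / vv
            for l in range(k, m):
                q[i][l] -= f * v[l - k]
    return q, r
```

    For example, `householder_qr([[12.0, -51.0, 4.0], [6.0, 167.0, -68.0], [-4.0, 24.0, -41.0]])` gives an `r` whose first diagonal entry has magnitude 14, the norm of the first column.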

    A parallel architecture for query processing over a terabyte of text

    The Parallel Document Retrieval Engine (PADRE) has previously demonstrated that full-text scanning methods supported by parallel hardware permit powerful query constructors and rapid response to changing document collections. Extensions to PADRE have been designed and implemented that use parallel secondary storage to allow each processing node to handle data up to 32 times the size of its primary memory. On the largest purchasable machine on which PADRE currently runs, these extensions increase the maximum possible collection size to one terabyte. This paper addresses the practicality of achieving this limit and the extent to which the performance, responsiveness, functionality and scalability of full-text-scanning PADRE are preserved in the extended version.
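    The extension described above lets a node scan a local collection larger than its primary memory by streaming it from secondary storage. The sketch below illustrates that idea for a single node under simple assumptions: documents arrive as strings from any iterable (in practice a disk reader), `memory_budget` is a hypothetical per-node limit measured in characters, and the scan is a plain substring match. It is not PADRE's actual I/O scheme or scanning code.

```python
from typing import Iterable, Iterator, List


def chunks(docs: Iterable[str], memory_budget: int) -> Iterator[List[str]]:
    """Group documents into batches whose total size fits in memory_budget characters.
    A single document larger than the budget still forms its own batch."""
    batch: List[str] = []
    used = 0
    for doc in docs:
        if batch and used + len(doc) > memory_budget:
            yield batch
            batch, used = [], 0
        batch.append(doc)
        used += len(doc)
    if batch:
        yield batch


def scan_node(docs: Iterable[str], term: str, memory_budget: int) -> List[int]:
    """Full-text scan of one node's collection, one memory-sized batch at a time.
    Returns the positions (in collection order) of documents containing term."""
    hits: List[int] = []
    offset = 0
    for batch in chunks(docs, memory_budget):
        for i, doc in enumerate(batch):
            if term in doc:
                hits.append(offset + i)
        offset += len(batch)
    return hits
```

    Because only one batch is held at a time, the collection size per node is bounded by secondary storage rather than by `memory_budget`; the 32x figure in the abstract is the paper's limit, not something this sketch enforces.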

    Automatic visual recognition using parallel machines

    Invariant features and fast matching algorithms are two major concerns in automatic visual recognition. The former reduce the size of an established model database, and the latter shorten the computation time. This dissertation discusses both line invariants under perspective projection and a parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated by the dramatically reduced time complexity. Our algorithms are implemented on the AP1000 MIMD parallel machine. For an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods. Invariants of four general lines under perspective projection are also discussed. In contrast to approaches that use epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, a step called transfer. A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture.
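    The O(n) versus O(n^2) claim above is typical of dynamic programming tables that can be filled along anti-diagonals (a wavefront): every cell on one anti-diagonal depends only on the two previous anti-diagonals, so with n processors each anti-diagonal takes constant time. The sketch below uses edit distance between two chain-code sequences (for instance Freeman direction digits 0-7) as a stand-in; the dissertation's actual matching cost and its AP1000 mapping are not given here, and `edit_distance_wavefront` is a hypothetical name.

```python
def edit_distance_wavefront(a, b):
    """Edit distance between two sequences, filled anti-diagonal by anti-diagonal.
    Returns (distance, number_of_wavefront_steps)."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(n + 1):
        d[0][j] = j  # insert all of b[:j]
    steps = 0
    # Cells with i + j == s depend only on anti-diagonals s - 1 and s - 2.
    for s in range(2, m + n + 1):
        steps += 1
        # Every iteration of this inner loop is independent and could run on its own processor.
        for i in range(max(1, s - n), min(m, s - 1) + 1):
            j = s - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
    return d[m][n], steps
```

    For two sequences of length n there are n^2 cells but only 2n - 1 wavefront steps, which is where an O(n) parallel time comes from when one processor is available per cell of an anti-diagonal.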

    Efficient algorithms and implementations for signal processing

    A scheme is presented to regain a finite number of lost samples from a Nyquist-rate-sampled band-limited signal f of finite energy by replenishing the same number of new sample values. The result can also be viewed as the solution to a special non-uniform sampling problem. A scheme is also presented to recover a band-limited function f of finite energy from its sample values on real sequences with an accumulation point. This result can also be viewed as an approach to the extrapolation problem of determining a band-limited function from its given values on a finite interval. An error estimate is also obtained. The existence of two kinds of frames, Weyl-Heisenberg frames and affine frames, is studied. The conditions given in this dissertation improve the known conditions and, in addition, are easy to verify. A parallel algorithm for the two-dimensional forward fast wavelet transform is developed and implemented on the AP1000 multiprocessor system. The algorithm is carefully analyzed before implementation. Experiments are performed with different input sizes on different numbers of processors, and the experimental results agree with the theoretical analysis. The parallel algorithm achieves the expected speedup on the mesh architecture. Further work is suggested.
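    One level of a separable 2-D forward wavelet transform applies a 1-D transform to every row and then to every column, and each row (or column) is independent, which is what a parallel version distributes. The sketch below uses the unnormalised Haar wavelet (pairwise averages and differences) because it is the simplest case; the filters, normalisation and data decomposition used on the AP1000 mesh in the dissertation are not specified here, and the function names are hypothetical. Image sides are assumed to be even.

```python
def haar_1d(v):
    """One level of the unnormalised Haar transform: pairwise averages, then differences."""
    half = len(v) // 2
    avg = [(v[2 * k] + v[2 * k + 1]) / 2 for k in range(half)]
    diff = [(v[2 * k] - v[2 * k + 1]) / 2 for k in range(half)]
    return avg + diff


def haar_2d_level(img):
    """One level of the 2-D forward transform on an image with even side lengths.
    Rows are transformed independently, then columns; either loop could be split
    across processors, with a transpose (data exchange) between the two phases."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

    For `[[1, 2], [3, 4]]` this returns `[[2.5, -0.5], [-1.0, 0.0]]`: the top-left entry is the average and the other three are detail coefficients. On a distributed-memory machine the row-to-column switch becomes an all-to-all exchange, which is usually the communication cost such an analysis has to account for.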

    The design and implementation of a parallel document retrieval engine

    Document retrieval as traditionally formulated is an inherently parallel task, because the document collection can be divided into N sub-collections, each of which may be searched independently. Document retrieval software can potentially exploit the power and capacity of a large-scale parallel machine to improve speed, to extend the size of the largest collection that can be processed, to respond quickly to changes in the document collection, and/or to increase the power and expressivity of the retrieval query language. This paper discusses the issues involved in designing a practical parallel document retrieval engine for a distributed-memory multicomputer and describes the implementation of PADRE, a retrieval engine for the Fujitsu AP1000. Performance results are presented and the scope of applicability of the techniques is discussed.
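    The partitioning argument above can be sketched as follows: split the collection into N sub-collections, scan each independently, and merge the hits. This is an illustration under stated assumptions, not PADRE's design: the round-robin split, the whitespace tokenisation and the AND-of-terms query are all hypothetical choices, and threads merely stand in for AP1000 cells (in CPython they do not give true CPU parallelism for this kind of work).

```python
from concurrent.futures import ThreadPoolExecutor


def partition(docs, n_nodes):
    """Round-robin split of (doc_id, text) pairs into n_nodes sub-collections."""
    parts = [[] for _ in range(n_nodes)]
    for k, doc in enumerate(docs):
        parts[k % n_nodes].append(doc)
    return parts


def search_part(part, terms):
    """Scan one sub-collection; return ids of documents containing every query term."""
    wanted = set(terms)
    return [doc_id for doc_id, text in part if wanted <= set(text.split())]


def parallel_search(docs, terms, n_nodes=4):
    """Search all sub-collections independently, then merge the per-node hit lists."""
    parts = partition(docs, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        per_node = pool.map(search_part, parts, [terms] * n_nodes)
        hits = [doc_id for node_hits in per_node for doc_id in node_hits]
    return sorted(hits)
```

    Because no sub-collection needs data from another during the scan, the only communication is the final merge, which is what makes the task inherently parallel.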

    A Multilevel in Space and Energy Solver for Multigroup Diffusion and Coarse Mesh Finite Difference Eigenvalue Problems

    In reactor physics, the efficient solution of the multigroup neutron diffusion eigenvalue problem is desired for various applications. The diffusion problem is a lower-order but reasonably accurate approximation to the higher-fidelity multigroup neutron transport eigenvalue problem. When the full fidelity of the transport solution is needed, the solution of the diffusion problem can be used to accelerate the convergence of transport solvers via methods such as Coarse Mesh Finite Difference (CMFD). The diffusion problem can have O(10^8) unknowns and, despite being orders of magnitude smaller than a typical transport problem, obtaining its solution is still not a trivial task. In the Michigan Parallel Characteristics Transport (MPACT) code, the lack of an efficient CMFD solver has resulted in a computational bottleneck at the CMFD step: solving the CMFD system can take 50% or more of the overall runtime in MPACT when the de facto default CMFD solver is used. Addressing this bottleneck is the motivation for our work. The primary focus of this thesis is the theory, development, implementation, and testing of a new Multilevel-in-Space-and-Energy Diffusion (MSED) method for efficiently solving multigroup diffusion and CMFD eigenvalue problems. As its name suggests, MSED efficiently converges multigroup diffusion and CMFD problems by leveraging lower-order systems with coarsened energy and/or spatial grids. The efficiency of MSED is verified via Fourier analyses of its components and via testing in a 1-D diffusion code. In the later chapters of this thesis, the MSED method is tested on a variety of reactor problems in MPACT. Compared to the default CMFD solver, our implementation of MSED in MPACT has resulted in an approximately 8-12x reduction in the CMFD runtime required by MPACT for single-statepoint calculations on 3-D, full-core, 51-group reactor models. The number of transport sweeps is also typically reduced by the use of MSED, which is able to converge the CMFD system better than the default CMFD solver. This leads to further savings in overall runtime that are not captured by the differences in CMFD runtime.
    PhD, Nuclear Engineering & Radiological Sciences, University of Michigan, Horace H. Rackham School of Graduate Studies
    https://deepblue.lib.umich.edu/bitstream/2027.42/146075/1/bcyee_1.pd
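    MSED exists to speed up the outer eigenvalue iteration of diffusion and CMFD problems. For context, the sketch below shows the plain, unaccelerated power (source) iteration on the simplest case: one energy group, a 1-D bare slab with zero flux at both edges, and a standard three-point finite-difference mesh. It is not MSED and not MPACT's CMFD solver; the cross-section values and function names are illustrative assumptions. Its slow convergence when the dominance ratio approaches 1 is the kind of behaviour multilevel methods are designed to fix.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system (lower[0] and upper[-1] are unused)."""
    n = len(diag)
    c = [0.0] * n
    d = [0.0] * n
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def power_iteration_k(n, length, D, sigma_a, nu_sigma_f, tol=1e-10, max_iter=10000):
    """k-eigenvalue of -D phi'' + sigma_a phi = (1/k) nu_sigma_f phi on a bare slab
    (zero flux at both edges), discretised with n interior mesh points."""
    h = length / (n + 1)
    off = -D / h**2
    lower = [off] * n
    upper = [off] * n
    diag = [2 * D / h**2 + sigma_a] * n  # loss operator: leakage + absorption
    phi = [1.0] * n
    k = 1.0
    for _ in range(max_iter):
        source = [nu_sigma_f * p / k for p in phi]  # fission source scaled by 1/k
        phi_new = solve_tridiagonal(lower, diag, upper, source)
        k_new = k * sum(phi_new) / sum(phi)  # uniform nu_sigma_f cancels in the ratio
        phi = phi_new
        if abs(k_new - k) < tol:
            return k_new, phi
        k = k_new
    return k, phi
```

    With 100 interior points on a 100 cm slab, D = 1.0, sigma_a = 0.07 and nu_sigma_f = 0.08, the result matches nu_sigma_f / (sigma_a + D * lambda_1), where lambda_1 = 4 sin^2(pi / (2(n+1))) / h^2 is the smallest eigenvalue of the discrete Laplacian.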

    Compilation techniques for multicomputers

    This thesis considers problems in process and data partitioning when compiling programs for distributed-memory parallel computers (multicomputers). These partitions may be specified by the user through language constructs or determined automatically by the compiler. Data and process partitioning techniques are developed for two models of compilation. The first compilation model focusses on the loop nests present in a serial program. Executing the iterations of these loop nests in parallel accounts for a significant amount of the parallelism that can be exploited in these programs. This parallelism is exploited by applying a set of transformations to the loop nests, after which their iterations are in a form that can readily be distributed among the processors of a multicomputer. The way the arrays referenced within these loop nests are partitioned between the processors is determined by the distribution of the loop iterations. The second compilation model is based on the data-parallel paradigm, in which operations are applied collectively to many different data items; High Performance Fortran is used as an example of this paradigm. Novel collective communication routines, developed as part of this thesis, are applied to provide the communication associated with the data partitions for both compilation models, and it is shown that these routines greatly simplify the communication associated with partitioning data on a multicomputer. The experimental context for this thesis is the development of a compiler for the Fujitsu AP1000 multicomputer. A prototype compiler is presented, and experimental results for a variety of applications are included.
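    The link between loop-iteration distribution and array partitioning can be illustrated with the two simplest data-parallel distributions, BLOCK and CYCLIC, combined with the owner-computes rule (a processor executes the iterations that write the array elements it owns). This is a generic sketch, not the thesis's compiler: the function names are hypothetical, and BLOCK uses ceiling-sized blocks as in High Performance Fortran's default BLOCK distribution.

```python
def block_size(n, p):
    """Ceiling-sized blocks: n elements over p processors."""
    return -(-n // p)


def block_owner(i, n, p):
    """Processor owning element i of an n-element array under a BLOCK distribution."""
    return i // block_size(n, p)


def cyclic_owner(i, p):
    """Processor owning element i under a CYCLIC distribution."""
    return i % p


def local_iterations(n, p, rank, dist="block"):
    """Owner-computes rule for `for i in range(n): a[i] = f(i)`:
    the iterations that processor `rank` executes, given how `a` is distributed."""
    if dist == "block":
        b = block_size(n, p)
        return list(range(rank * b, min(n, (rank + 1) * b)))
    if dist == "cyclic":
        return list(range(rank, n, p))
    raise ValueError(f"unknown distribution: {dist}")
```

    For 10 elements on 3 processors, BLOCK gives owners `[0, 0, 0, 0, 1, 1, 1, 1, 2, 2]` while CYCLIC gives processor 1 the iterations `[1, 4, 7]`. Any reference `a[i + 1]` inside such a loop then crosses a block boundary at most once per processor under BLOCK, which is the kind of communication a compiler must generate.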

    Research Projects, Technical Reports and Publications

    The Research Institute for Advanced Computer Science (RIACS) was established by the Universities Space Research Association (USRA) at the NASA Ames Research Center (ARC) on June 6, 1983. RIACS is privately operated by USRA, a consortium of universities with research programs in the aerospace sciences, under contract with NASA. The primary mission of RIACS is to provide research and expertise in computer science and scientific computing to support the scientific missions of NASA ARC. The research carried out at RIACS must change its emphasis from year to year in response to NASA ARC's changing needs and technological opportunities. A flexible scientific staff is provided through a university faculty visitor program, a postdoctoral program, and a student visitor program. This not only provides appropriate expertise but also introduces scientists outside of NASA to NASA problems. A small group of core RIACS staff provides continuity and interacts with an ARC technical monitor and scientific advisory group to determine the RIACS mission. RIACS activities are reviewed and monitored by a USRA advisory council and the ARC technical monitor. Research at RIACS is currently being done in the following areas: Advanced Methods for Scientific Computing, and High Performance Networks. During this report period, Professor Antony Jameson of Princeton University, Professor Wei-Pai Tang of the University of Waterloo, Professor Marsha Berger of New York University, Professor Tony Chan of UCLA, Associate Professor David Zingg of the University of Toronto, Canada, and Assistant Professor Andrew Sohn of the New Jersey Institute of Technology have visited RIACS. From January 1, 1996 through September 30, 1996, RIACS had three staff scientists, four visiting scientists, one postdoctoral scientist, three consultants, two research associates and one research assistant. RIACS held a joint workshop with Code 1 on 29-30 July 1996. The workshop was held to discuss needs and opportunities in basic research in computer science in and for NASA applications. There were 14 talks given by NASA, industry and university scientists, and three open discussion sessions, with approximately fifty participants. Proceedings are being prepared, and similar workshops are planned on an annual basis. RIACS technical reports are usually preprints of manuscripts that have been submitted to research journals or conference proceedings. A list of these reports for the period January 1, 1996 through September 30, 1996 is in the Reports and Abstracts section of this report.

    The scalability of parallel computers for sparse QR factorisation

    Sparse linear systems occur in areas such as finite element methods and statistics, and because of their size they are often solved on parallel computers. In this paper a theoretical analysis of parallel sparse QR factorisation using a multifrontal method is undertaken. The analysis is quantified by estimates of parallel speedup for various parallel computers. These estimates show that only moderate parallel speedups can be attained.
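    One simple way to see why multifrontal speedups tend to be moderate: the frontal matrices are processed along an elimination (assembly) tree, where a parent front can start only after its children finish, so tree-level parallelism is bounded by total work divided by the heaviest root-to-leaf path. The sketch below computes that generic work/critical-path bound; it is not the paper's analysis, it treats each front as a sequential task (ignoring parallelism within a front), and the tree and costs in the usage note are made-up numbers.

```python
def speedup_bound(children, work, root):
    """Upper bound on tree-level parallel speedup: total work / critical-path work.
    children: node -> list of child nodes; work: node -> cost of that node's frontal task."""

    def total(v):
        # All work in the subtree rooted at v.
        return work[v] + sum(total(c) for c in children.get(v, []))

    def critical(v):
        # Heaviest chain from v down to a leaf; children must finish before v starts.
        return work[v] + max((critical(c) for c in children.get(v, [])), default=0)

    return total(root) / critical(root)
```

    For a root front of cost 10 with children of cost 4 and 4, one of which has two leaf children of cost 1, the total work is 20 and the critical path is 15, so no schedule of whole fronts can exceed a speedup of about 1.33. Fronts near the root are often the largest, which is why such bounds stay small unless the work inside each front is also parallelised.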