388 research outputs found

    Parallel processing and expert systems

    Get PDF
    Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited

    A bibliography on parallel and vector numerical algorithms

    Get PDF
    This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also

    Acceleration of stereo-matching on multi-core CPU and GPU

    Get PDF
    This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa 1 research project. This research project focuses on the conception of a new clothes folding robot with real-time and high resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, maintaining 12x and 176x speed-up respectively for multi-core CPU and GPU, compared with non-SIMD singlethread CPU. To analyse the origin of the speed-up and gain deeper understanding about the choice of the optimal hardware, the algorithm was broken into key sub-tasks and the performance was tested for four different hardware architectures

    Dynamic Systolization for Developing Multiprocessor Supercomputers

    Get PDF
    A dynamic network approach is introduced for developing reconfigurable, systolic arrays or wavefront processors; This allows one to design very powerful and flexible processors to be used in a general-purpose, reconfigurable, and fault-tolerant, multiprocessor computer system. The concepts of macro-dataflow and multitasking can be integrated to handle variable-resolution granularities in computationally intensive algorithms. A multiprocessor architecture, Remps, is proposed based on these design methodologies. The Remps architecture is generalized from the Cedar, HEP, Cray X- MP, Trac, NYU ultracomputer, S-l, Pumps, Chip, and SAM projects. Our goal is to provide a multiprocessor research model for developing design methodologies, multiprocessing and multitasking supports, dynamic systolic/wavefront array processors, interconnection networks, reconfiguration techniques, and performance analysis tools. These system design and operational techniques should be useful to those who are developing or evaluating multiprocessor supercomputers

    Solution of partial differential equations on vector and parallel computers

    Get PDF
    The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed

    Low power processor architecture and multicore approach for embedded systems

    Get PDF
    13301甲第4319号博士(工学)金沢大学博士論文本文Full 以下に掲載:1.IEICE Transactions Vol. E98-C(7) pp.544-549 2015. IEICE. 共著者: S. Otani, H. Kondo. /2.Reuse 許可エビデンス送

    Techniques for Reducing Consistency-Related Communication in Distributed Shared Memory System

    Get PDF
    Distributed shared memory 8DSM) is an abstraction of shared memory on a distributed memory machine. Hardware DSM systems support this abstraction at the architecture level; software DSM systems support the abstraction within the runtime system. One of the key problems in building an efficient software DSM system is to reduce the amount of communication needed to keep the distributed memories consistent. In this paper we present four techniques for doing so: 1) software release consistency; 2) multiple consistency protocols; (3) write-shared protocols; and (4) an update-with-timeout mechanism. These techniques have been implemented in the Munin DSM system. We compare the performance of seven Munin application programs, first to their performance when implemented using message passing, and then to their performance when running on a conventional software DSM system that does not embody the above techniques. On a 16-processor cluster of workstations, Munin’s performance is within 5% of message passing for four out of the seven applications. For the other three, performance is within 29% to 33%. Detailed analysis of two of these three applications indicates that the addition of a function shipping capability would bring their performance to within 7% of the message passing performance. Compared to a conventional DSM system, Munin achieves performance improvements ranging from a few to several hundred percent, depending on the application

    Dynamic adaptive parallel architecture integrates advanced technologies for petaflops-scale computing

    Get PDF
    Teraflops-scale computing systems are becoming available to an increasingly broad range of users as the performance of the constituent processing elements increases and their relative cost (e.g. per Mflops) decreases. To the original DOE ASCI Red machine has been added the ASCI Blue systems and additional 1 Teraflops commercial systems at key national centers. Clusters of low cost PCs employing COTS network technologies (e.g. Beowulf-class systems) will make peak Teraflops performance available for less than 2M in the near future for certain classes of well behaved problems. Future larger systems include the Japanese Earth Simulator with a peak performance of 40 Teraflops and three larger ASCI systems anticipated to provide peak performance of 10, 30, and 100 Teraflops culminating in 2005. These systems use existing or near term conventional technologies and architectures with some specialized integration logic and networking. While the peak performance goals can be satisfied through this strategy over the next decade, two major challenges confront the high performance computing community: (1) how to aggressively accelerate performance to the operational regime beyond a Petaflops, and (2) how to achieve high efficiency for a wide range of applications. The Hybrid Technology Multithreaded (HTMT) computer is under development by an interdisciplinary team of investigators to address both problems through an innovative combination of advanced technologies and dynamic adaptive architecture. This paper describes the strategy embodied by the HTMT architecture and discusses the key factors that may enable it to achieve two to three orders of magnitude performance with respect to today's largest systems at a cost and power consumption of only a factor of two to three times those same present day systems
    corecore