625 research outputs found
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited
Some aspects of the efficient use of multiprocessor control systems
Computer technology, particularly at the circuit level, is fast
approaching its physical limitations. As future needs for greater
power from computing systems grows, increases in circuit switching
speed (and thus instruction speed) will be unable to match these
requirements.
Greater power can also be obtained by incorporating several processing
units into a single system. This ability to increase the performance
of a system by the addition of processing units is one of the major
advantages of multiprocessor systems. Four major characteristics of
multiprocessor systems have been identified (28) which demonstrate
their advantage. These are:-
Throughput
Flexibility
Availability
Reliability
The additional throughput obtained from a multiprocessor has been
mentioned above.. This increase in the power of the system can be
obtained in a modular fashion with extra processors being added as
greater processing needs arise. The addition of extra processors
also has (in general) the desirable advantage of giving a smoother
cost - performance curve ( 63). Flexibility is obtained from the
increased ability to construct a system matching the user 'requirements
at a given time without placing restrictions upon future expansion.
With multiprocessor systems; the potential also exists of making
greater use of the resources within the system.
Availability and reliability are inter-related. Increased availability
is achieved, in a well designed system, by ensuring that processing
capabilities can be provided to the user even if one (or more) of the
processing units has failed. The service provided, however, will
probably be degraded due to the reduction in processing capacity.
Increased reliability is obtained by the ability of the processing
units to compensate for the failure of one of their number. This
recovery may involve complex software checks and a consequent decrease
in available power even when all the units are functioning
Exploration of Dynamic Memory
Since the advent of the Java programming language and the development of real-time garbage collection, Java has become an option for implementing real-time applications. The memory management choices provided by real-time garbage collection allow for real-time eJava developers to spend more of their time implementing real-time solutions. Unfortunately, the real-time community is not convinced that real-time garbage collection works in managing memory for Java applications deployed in a real-time context. Consequently, the Real-Time for Java Expert Group formulated the Real-Time Specification for Java (RTSJ) standards to make Java a real-time programming language. In lieu of garbage collection, the RTSJ proposed a new memory model called scopes, and a new type of thread called NoHeapRealTimeThread (NHRT), which takes advantage of scopes. While scopes and NHRTs promise predictable allocation and deallocation behaviors, no asymptotic studies have been conducted to investigate the costs associated with these technologies. To understand the costs associated with using these technologies to manage memory, computations and analyses of time and space overheads associated with scopes and NHRTs are presented. These results provide a framework for comparing the RTSJ’s memory management model with real-time garbage collection. Another facet of this research concerns the optimization of novel approaches to garbage collection on multiprocessor systems. Such approaches yield features that are suitable for real-time systems. Although multiprocessor, concurrent garbage collection is not the same as real-time garbage collection, advancements in multiprocessor concurrent garbage collection have demonstrated the feasibility of building low latency multiprocessor real-time garbage collectors. In the nineteen-sixties, only three garbage collection schemes were available, namely reference counting garbage collection, mark-sweep garbage collection, and copying garbage collection. These classical approaches gave new insight into the discipline of memory management and inspired researchers to develop new, more elaborate memory-management techniques. Those insights resulted in a plethora of automatic memory management algorithms and techniques, and a lack of uniformity in the language used to reason about garbage collection. To bring a sense of uniformity to the language used to reason about garbage collection technologies, a taxonomy for comparing garbage collection technologies is presented
Architectures for reasoning in parallel
The research conducted has dealt with rule-based expert systems. The algorithms that may lead to effective parallelization of them were investigated. Both the forward and backward chained control paradigms were investigated in the course of this work. The best computer architecture for the developed and investigated algorithms has been researched. Two experimental vehicles were developed to facilitate this research. They are Backpac, a parallel backward chained rule-based reasoning system and Datapac, a parallel forward chained rule-based reasoning system. Both systems have been written in Multilisp, a version of Lisp which contains the parallel construct, future. Applying the future function to a function causes the function to become a task parallel to the spawning task. Additionally, Backpac and Datapac have been run on several disparate parallel processors. The machines are an Encore Multimax with 10 processors, the Concert Multiprocessor with 64 processors, and a 32 processor BBN GP1000. Both the Concert and the GP1000 are switch-based machines. The Multimax has all its processors hung off a common bus. All are shared memory machines, but have different schemes for sharing the memory and different locales for the shared memory. The main results of the investigations come from experiments on the 10 processor Encore and the Concert with partitions of 32 or less processors. Additionally, experiments have been run with a stripped down version of EMYCIN
The Lock-free -LSM Relaxed Priority Queue
Priority queues are data structures which store keys in an ordered fashion to
allow efficient access to the minimal (maximal) key. Priority queues are
essential for many applications, e.g., Dijkstra's single-source shortest path
algorithm, branch-and-bound algorithms, and prioritized schedulers.
Efficient multiprocessor computing requires implementations of basic data
structures that can be used concurrently and scale to large numbers of threads
and cores. Lock-free data structures promise superior scalability by avoiding
blocking synchronization primitives, but the \emph{delete-min} operation is an
inherent scalability bottleneck in concurrent priority queues. Recent work has
focused on alleviating this obstacle either by batching operations, or by
relaxing the requirements to the \emph{delete-min} operation.
We present a new, lock-free priority queue that relaxes the \emph{delete-min}
operation so that it is allowed to delete \emph{any} of the smallest
keys, where is a runtime configurable parameter. Additionally, the
behavior is identical to a non-relaxed priority queue for items added and
removed by the same thread. The priority queue is built from a logarithmic
number of sorted arrays in a way similar to log-structured merge-trees. We
experimentally compare our priority queue to recent state-of-the-art lock-free
priority queues, both with relaxed and non-relaxed semantics, showing high
performance and good scalability of our approach.Comment: Short version as ACM PPoPP'15 poste
Using a functional language and graph reduction to program multiprocessor machines or functional control of imperative programs
Journal ArticleThis paper describes an effective means for programming shared memory multiprocessors whereby a set of sequential activities are linked together for execution in parallel. The glue for this linkage is provided by a functional language implemented via graph reduction and demand evaluation. The full power of functional programming is used to obtain succinct, high level specifications of parallel computations. The imperative procedures that constitute the sequential activities facilitate efficient utilization of individual processing elements, while the mechanisms inherent in graph reduction synchronize and schedule these activities. The main contributions of this paper are: 1) an evaluation of the performance implications of parallel graph reduction; 2) a demonstration that the mechanisms of graph reduction can obtain multiprocessor performance uniformly surpassing the best uni-processor implementation of sequential algorithms running on a single node of the same machine, and 3) an illustration of our method used to program a real world fluid flow simulation problem
A survey of checkpointing algorithms for parallel and distributed computers
Checkpoint is defined as a designated place in a program at which normal processing is interrupted specifically to preserve the status information necessary to allow resumption of processing at a later time. Checkpointing is the process of saving the status information. This paper surveys the algorithms which have been reported in the literature for checkpointing parallel/distributed systems. It has been observed that most of the algorithms published for checkpointing in message passing systems are based on the seminal article by Chandy and Lamport. A large number of articles have been published in this area by relaxing the assumptions made in this paper and by extending it to minimise the overheads of coordination and context saving. Checkpointing for shared memory systems primarily extend cache coherence protocols to maintain a consistent memory. All of them assume that the main memory is safe for storing the context. Recently algorithms have been published for distributed shared memory systems, which extend the cache coherence protocols used in shared memory systems. They however also include methods for storing the status of distributed memory in stable storage. Most of the algorithms assume that there is no knowledge about the programs being executed. It is however felt that in development of parallel programs the user has to do a fair amount of work in distributing tasks and this information can be effectively used to simplify checkpointing and rollback recovery
Virtual Machine Support for Many-Core Architectures: Decoupling Abstract from Concrete Concurrency Models
The upcoming many-core architectures require software developers to exploit
concurrency to utilize available computational power. Today's high-level
language virtual machines (VMs), which are a cornerstone of software
development, do not provide sufficient abstraction for concurrency concepts. We
analyze concrete and abstract concurrency models and identify the challenges
they impose for VMs. To provide sufficient concurrency support in VMs, we
propose to integrate concurrency operations into VM instruction sets.
Since there will always be VMs optimized for special purposes, our goal is to
develop a methodology to design instruction sets with concurrency support.
Therefore, we also propose a list of trade-offs that have to be investigated to
advise the design of such instruction sets.
As a first experiment, we implemented one instruction set extension for
shared memory and one for non-shared memory concurrency. From our experimental
results, we derived a list of requirements for a full-grown experimental
environment for further research
- …