Space-Efficient Parallel Algorithms for Combinatorial Search Problems
We present space-efficient parallel strategies for two fundamental
combinatorial search problems, namely, backtrack search and branch-and-bound,
both involving the visit of an n-node tree of height h under the assumption
that a node can be accessed only through its father or its children. For both
problems we propose efficient algorithms that run on a p-processor
distributed-memory machine. For backtrack search, we give a deterministic
algorithm and a Las Vegas algorithm whose running time is optimal, with high
probability. Building on the backtrack search algorithm, we also derive a Las
Vegas algorithm for branch-and-bound with a high-probability bound on its
running time. A remarkable feature of our algorithms is the use of only
constant space per processor, which constitutes a significant improvement upon
previous algorithms, whose space requirements per processor depend on the size
of the (possibly huge) tree to be explored.
Comment: Extended version of the paper in the Proc. of the 38th International
Symposium on Mathematical Foundations of Computer Science (MFCS).
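The constant-space claim is easiest to appreciate in code. Below is a minimal sequential sketch of a stackless depth-first visit that touches a node only through its father or its children and keeps O(1) state (the current node plus a child index); the paper's distributed p-processor algorithms build on this idea but are substantially more involved. The accessors degree, child, parent and the visit callback are hypothetical stand-ins for whatever node interface the application provides.

```python
def backtrack_search(root, degree, child, parent, visit):
    """Stackless depth-first visit of an implicit tree in O(1) space.

    State is just the current node v and the index i of the next child
    to explore; no stack or visited set is kept.
    """
    visit(root)
    v, i = root, 0
    while True:
        if i < degree(v):            # descend into the next unexplored child
            v, i = child(v, i), 0
            visit(v)
        elif v == root:              # all of the root's children are done
            return
        else:                        # climb back to the father and resume
            u = parent(v)
            # recompute which child of u we just finished: this costs time
            # proportional to degree(u) but uses no extra space
            i = next(j for j in range(degree(u)) if child(u, j) == v) + 1
            v = u
```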
Parallel path consistency
Filtering algorithms are well accepted as a means of speeding up the solution of the consistent labeling problem (CLP). Although path consistency does a better job of filtering than arc consistency, arc consistency (AC) remains the preferred technique because it has a much lower time complexity. We are implementing parallel path consistency algorithms on multiprocessors and comparing their performance to the best sequential and parallel arc consistency algorithms. We also intend to characterize the relation between graph structure and algorithm performance. Preliminary work has shown linear speedups for parallelized path consistency, and has also shown that in many cases performance is significantly better than the theoretical worst case. These two results lead us to believe that parallel path consistency may be a superior filtering technique. Finally, we have explored an outer-product computational formulation of path consistency and have obtained excellent results using it on a Connection Machine.
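One compact way to realize the outer-product formulation mentioned above is as a boolean matrix product: with constraints stored as dense d x d boolean matrices, a single path-consistency revision of R[i][j] is the product R[i][k] @ R[k][j]. A minimal sequential PC-1-style sketch under that dense-representation assumption (names are illustrative; the parallel implementations compared in the paper are not reproduced here):

```python
import numpy as np

def path_consistency(R):
    """PC-1-style filtering over dense boolean constraint matrices.

    R[i][j][a, b] is True iff value a for variable i is compatible with
    value b for variable j (R[i][i] is assumed to be the identity).
    Iterates until a full pass changes nothing.
    """
    n = len(R)
    changed = True
    while changed:
        changed = False
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    # (a, b) survives only if some value c of variable k
                    # supports both (a, c) and (c, b)
                    support = (R[i][k].astype(int) @ R[k][j].astype(int)) > 0
                    tightened = R[i][j] & support
                    if not np.array_equal(tightened, R[i][j]):
                        R[i][j] = tightened
                        changed = True
    return R
```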
Recommended from our members
An Empirical Study of Dynamic Scheduling on Rings of Processors
The authors empirically analyze and compare two distributed low-overhead policies for scheduling dynamic tree-structured computations on rings of identical PEs. The experiments show that both policies give significant parallel speedup on large classes of computations, and that one yields almost optimal speedup on moderate-size rings. The authors believe that their methodology of experiment design and analysis will prove useful in other such studies.
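The abstract does not spell out the two policies, so the following is only an illustrative toy of the general setting: round-based expansion of a tree-structured computation on a ring, with a simple low-overhead rebalancing rule (an idle PE takes half of its clockwise neighbor's pending nodes). A sequential simulation sketch, with expand() as a hypothetical application hook:

```python
from collections import deque

def simulate_ring(p, root_children, expand):
    """Toy round-based simulation of dynamic tree expansion on a ring
    of p PEs. Returns the number of rounds taken to exhaust the tree."""
    queues = [deque() for _ in range(p)]
    for i, c in enumerate(root_children):
        queues[i % p].append(c)          # seed the ring with the root's children
    rounds = 0
    while any(queues):
        for me in range(p):
            if not queues[me]:           # idle: take half the neighbor's work
                donor = queues[(me + 1) % p]
                for _ in range(len(donor) // 2):
                    queues[me].append(donor.popleft())
            if queues[me]:               # expand one node per round
                queues[me].extend(expand(queues[me].pop()))
        rounds += 1
    return rounds
```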
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not suffice to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are that (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem-solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited.
Three Highly Parallel Computer Architectures and Their Suitability for Three Representative Artificial Intelligence Problems
Virtually all current Artificial Intelligence (AI) applications are designed to run on sequential (von Neumann) computer architectures. As a result, current systems do not scale up. As knowledge is added to these systems, a point is reached where their performance quickly degrades. The performance of a von Neumann machine is limited by the bandwidth between memory and processor (the von Neumann bottleneck). The bottleneck is avoided by distributing the processing power across the memory of the computer. In this scheme the memory becomes the processor (a "smart memory").
This paper highlights the relationship between three representative AI application domains, namely knowledge representation, rule-based expert systems, and vision, and their parallel hardware realizations. Three machines, covering a wide range of fundamental properties of parallel processors, namely module granularity, concurrency control, and communication geometry, are reviewed: the Connection Machine (a fine-grained SIMD hypercube), DADO (a medium-grained MIMD/SIMD/MSIMD tree machine), and the Butterfly (a coarse-grained MIMD butterfly-switch machine).
Performance of arc consistency algorithms on the CRAY
The consistent labeling problem arises in high-level computer vision when assigning semantic meaning to the regions of an image. One drawback of this method is that it is rather slow. By using the consistency tests of node, arc, and path consistency [9], the search space is drastically reduced; for large problems, however, this still takes a fair amount of time. To run these algorithms more efficiently, one can take two approaches. The first is to design special-purpose hardware to run these algorithms. The second is to use faster computers: here again, one can either take advantage of multiprocessors, which are becoming widely available, or use supercomputers like the CRAY, CDC, etc. Here, we present results on the performance of these algorithms on the CRAY supercomputer.
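For readers unfamiliar with the consistency tests, here is a compact sequential sketch of arc consistency in the style of the classic AC-3 algorithm; it is illustrative only, not the vectorized variants benchmarked on the CRAY, and the constraint representation (a predicate per directed arc) is an assumption of the sketch:

```python
from collections import deque

def ac3(domains, constraints):
    """Arc consistency in the style of AC-3.

    domains: dict mapping each variable to a set of values.
    constraints: dict mapping each *directed* arc (x, y) to a predicate
    allowed(vx, vy); both directions of every constraint must be present.
    Filters domains in place; returns False on a domain wipe-out.
    """
    queue = deque(constraints)                     # start with every arc
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # keep only the values of x that still have a support in y
        revised = {vx for vx in domains[x]
                   if any(allowed(vx, vy) for vy in domains[y])}
        if revised != domains[x]:
            domains[x] = revised
            if not revised:
                return False                       # no consistent labeling
            queue.extend(arc for arc in constraints if arc[1] == x)
    return True
```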
TACOS: Topology-Aware Collective Algorithm Synthesizer for Distributed Training
Collective communications are an indispensable part of distributed training.
Running a topology-aware collective algorithm is crucial for optimizing
communication performance by minimizing congestion. Today, such algorithms
exist only for a small set of simple topologies, which limits the topologies
employed in training clusters and makes it hard to handle the irregular
topologies that arise from network failures. In
this paper, we propose TACOS, an automated topology-aware collective
synthesizer for arbitrary input network topologies. TACOS synthesized an
All-Reduce algorithm 3.73x faster than baselines, and synthesized collective
algorithms for a 512-NPU system in just 6.1 minutes.
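As a reference point for what a synthesized collective looks like, here is a hand-derived All-Reduce schedule for the one topology where the answer is classical: a ring. This is not TACOS output or its API, just a sketch of the (src, dst, chunk, phase) step structure such a synthesizer produces:

```python
def ring_allreduce_schedule(ring):
    """Classical ring All-Reduce over the NPUs listed in `ring`.

    The buffer is cut into n chunks; n-1 reduce-scatter steps are
    followed by n-1 all-gather steps, each step a list of simultaneous
    (src, dst, chunk, phase) sends around the ring.
    """
    n = len(ring)
    steps = []
    for s in range(n - 1):                         # reduce-scatter phase
        steps.append([(ring[i], ring[(i + 1) % n], (i - s) % n, "reduce")
                      for i in range(n)])
    for s in range(n - 1):                         # all-gather phase
        steps.append([(ring[i], ring[(i + 1) % n], (i + 1 - s) % n, "gather")
                      for i in range(n)])
    return steps
```

For a 4-NPU ring, ring_allreduce_schedule(["npu0", "npu1", "npu2", "npu3"]) yields the classical 2(n-1) = 6 steps of 4 simultaneous sends each.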
A Parallel Computational Approach for String Matching: A Novel Structure with Omega Model
In recent days, the parallel string matching problem has caught the attention of many researchers because of its importance in different applications like IRS, genome sequencing, data cleaning, etc. While it is easily stated and many simple algorithms perform very well in practice, numerous works have been published on the subject and research is still very active. In this paper we propose an omega parallel computing model for parallel string matching. The algorithm is designed to work on the omega-model parallel architecture, where the text is divided for parallel processing and special searching at the division points is required for consistent and complete searching. This algorithm reduces the number of comparisons, and the parallelization improves the time efficiency. Experimental results show that, on a multi-processor system, the omega-model implementation of the proposed parallel string matching algorithm can reduce string matching time.
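The "special searching at division points" amounts to extending each chunk by len(pattern) - 1 characters so a match that straddles a boundary is found exactly once, by the chunk in which it starts. A minimal sketch of that chunking idea using OS processes in place of the paper's omega-model interconnect (function names are illustrative; run it under a __main__ guard on platforms that spawn processes):

```python
from concurrent.futures import ProcessPoolExecutor

def find_in_chunk(args):
    # Search one chunk, extended by len(pattern) - 1 characters past its
    # end; only matches *starting* inside [lo, hi) are reported, so no
    # match is reported twice across chunks.
    text, pattern, lo, hi = args
    end = min(hi + len(pattern) - 1, len(text))
    hits, i = [], text.find(pattern, lo, end)
    while i != -1:
        hits.append(i)
        i = text.find(pattern, i + 1, end)
    return hits

def parallel_match(text, pattern, workers=4):
    # Divide the text into equal chunks, search them in parallel, and
    # merge the sorted match positions.
    step = -(-len(text) // workers)                # ceiling division
    tasks = [(text, pattern, lo, min(lo + step, len(text)))
             for lo in range(0, len(text), step)]
    with ProcessPoolExecutor(max_workers=workers) as ex:
        return sorted(h for hits in ex.map(find_in_chunk, tasks)
                      for h in hits)
```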
Working notes of the 1991 spring symposium on constraint-based reasoning
The Use of Parallel Processing in VLSI Computer-Aided Design Applications
Coordinated Science Laboratory was formerly known as Control Systems Laboratory.
Semiconductor Research Corporation / 87-DP-10