Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100 variables/425 clauses to 5000 variables/21,250 clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
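For orientation, the sequential baseline that the parallel method is said to reproduce can be sketched roughly as follows. This is a minimal illustration only: the geometric cooling schedule, the step count, the choice of three literals per clause, and all function names are assumptions rather than details from the report, and Generalized Speculative Computation itself is not reproduced here.

```python
import math
import random

def count_satisfied(clauses, assignment):
    """Count clauses with at least one true literal.

    A literal is a nonzero int: +v stands for variable v, -v for its negation.
    """
    return sum(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

def anneal_sat(clauses, num_vars, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Sequential simulated annealing that maximises the satisfied-clause count.

    The schedule and parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Index 0 is unused so that variable v lives at assignment[v].
    assignment = [False] + [rng.random() < 0.5 for _ in range(num_vars)]
    current = count_satisfied(clauses, assignment)
    best, best_assignment = current, assignment[:]
    for step in range(steps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (step / max(1, steps - 1))
        var = rng.randint(1, num_vars)
        assignment[var] = not assignment[var]        # propose a single flip
        candidate = count_satisfied(clauses, assignment)
        delta = candidate - current
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current = candidate                      # accept the move
            if current > best:
                best, best_assignment = current, assignment[:]
        else:
            assignment[var] = not assignment[var]    # reject: undo the flip
    return best, best_assignment

def random_3sat(num_vars, num_clauses, seed=0):
    """Random instance with three distinct variables per clause (an assumption)."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(num_clauses):
        chosen = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses
```

Calling `anneal_sat(random_3sat(100, 425), 100)` would run the sketch on an instance of the smallest size mentioned in the abstract; the result returned is the best satisfied-clause count seen and the assignment that achieved it.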
Characterization of message-passing overhead on the AP3000 multicomputer
This is a post-peer-review, pre-copyedit version. The final authenticated version is available online at: http://dx.doi.org/10.1109/ICPP.2001.952077
[Abstract] The performance of the communication primitives of parallel computers is critical for the overall system performance. The characterization of the communication overhead is very important to estimate the global performance of parallel applications and to detect possible bottlenecks. In this paper, we evaluate, model and compare the performance of the message-passing libraries provided by the Fujitsu AP3000 multicomputer: MPI/AP, PVM/AP and APlib. Our aim is to fairly characterize the communication primitives using general models and performance metrics.
Ministerio de Ciencia y Tecnología; 1FD97-0118-C02
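The abstract does not say which models are used. As one common possibility, a point-to-point transfer is often described by a linear cost model, T(n) = t0 + n / B, with startup time t0 and asymptotic bandwidth B. The sketch below fits that model to measured timings by least squares; the model choice, the units, and the function names are assumptions for illustration, not the paper's method.

```python
def fit_linear_cost(sizes, times):
    """Least-squares fit of T(n) = startup + n / bandwidth.

    sizes: message sizes (assumed to be in bytes)
    times: measured transfer times (assumed to be in microseconds)
    Returns (startup, bandwidth), with bandwidth in bytes per microsecond.
    """
    n = len(sizes)
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    slope = sxy / sxx                  # time per byte
    startup = mean_t - slope * mean_s  # intercept: cost of an empty message
    return startup, 1.0 / slope

def predict(startup, bandwidth, size):
    """Predicted transfer time for a message of `size` bytes."""
    return startup + size / bandwidth
```

Fitting one such pair of parameters per library would give a simple basis for comparing primitives, though real measurements usually need separate fits for short and long message regimes.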
The design and implementation of a parallel document retrieval engine
Document retrieval as traditionally formulated is an inherently parallel task because the document collection can be divided into N sub-collections, each of which may be searched independently. Document retrieval software can potentially exploit the power and capacity of a large-scale parallel machine to improve speed, to extend the size of the largest collection which can be processed, to respond quickly to changes in the document collection and/or to increase the power and expressivity of the retrieval query language. This paper includes discussion of the issues involved in the design of a practical parallel document retrieval engine for a distributed-memory multicomputer and a description of the implementation of PADRE, a retrieval engine for the Fujitsu AP1000. Performance results are presented and the scope of applicability of the techniques is discussed.
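The partitioning idea in the first sentence can be illustrated with a small sketch: split the collection into N sub-collections, search each independently, then merge the partial results. The term-count scoring, the round-robin split, and the use of threads in place of multicomputer nodes are all assumptions made for illustration; none of this is taken from PADRE.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def partition(docs, n):
    """Split a list of (doc_id, text) pairs round-robin into n sub-collections."""
    return [docs[i::n] for i in range(n)]

def search_partition(sub_collection, query_terms, k):
    """Score one sub-collection independently and return its local top k."""
    scored = []
    for doc_id, text in sub_collection:
        words = text.lower().split()
        score = sum(words.count(term) for term in query_terms)
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)

def parallel_search(docs, query, n_partitions=4, k=3):
    """Search all sub-collections concurrently and merge the local results."""
    terms = query.lower().split()
    parts = partition(docs, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial = pool.map(lambda p: search_partition(p, terms, k), parts)
        # The global top k is always contained in the union of the local top k lists.
        merged = heapq.nlargest(k, (hit for hits in partial for hit in hits))
    return [(doc_id, score) for score, doc_id in merged]
```

Because each partition returns its own top k, the merge step only has to examine at most N × k candidates, regardless of the collection size.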
Parameter Optimisation of a Virtual Synchronous Machine in a Microgrid
Parameters of a virtual synchronous machine in a small microgrid are
optimised. The dynamical behaviour of the system is simulated after a
perturbation, where the system needs to return to its steady state. The cost
functional evaluates the system behaviour for different parameters. This
functional is minimised by Parallel Tempering. Two perturbation scenarios are
investigated and the resulting optimal parameters agree with analytical
predictions. Dependent on the focus of the optimisation different optima are
obtained for each perturbation scenario. During the transient the system leaves
the allowed voltage and frequency bands only for a short time if the
perturbation is within a certain range. Comment: 17 pages, 5 figures
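A rough sketch of how Parallel Tempering minimises a cost functional is given below. Several replicas run Metropolis updates at different temperatures and occasionally exchange configurations. The toy cost function used in the usage note, the temperature ladder, and the proposal step size are assumptions for illustration; the paper's cost functional comes from simulating the microgrid, which is not modelled here.

```python
import math
import random

def parallel_tempering(cost, x0, temps, sweeps=2000, step=0.5, seed=0):
    """Minimise `cost` with replicas at the given temperatures (at least two).

    Returns the best parameter vector seen and its cost.
    """
    rng = random.Random(seed)
    replicas = [list(x0) for _ in temps]
    costs = [cost(x) for x in replicas]
    best_x, best_c = list(x0), costs[0]
    for _ in range(sweeps):
        # Metropolis update within each replica at its own temperature.
        for i, temp in enumerate(temps):
            trial = [v + rng.uniform(-step, step) for v in replicas[i]]
            c = cost(trial)
            if c <= costs[i] or rng.random() < math.exp((costs[i] - c) / temp):
                replicas[i], costs[i] = trial, c
                if c < best_c:
                    best_x, best_c = list(trial), c
        # Propose exchanging one randomly chosen pair of neighbouring replicas.
        i = rng.randrange(len(temps) - 1)
        arg = (1.0 / temps[i] - 1.0 / temps[i + 1]) * (costs[i] - costs[i + 1])
        if arg >= 0 or rng.random() < math.exp(arg):
            replicas[i], replicas[i + 1] = replicas[i + 1], replicas[i]
            costs[i], costs[i + 1] = costs[i + 1], costs[i]
    return best_x, best_c
```

For example, `parallel_tempering(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [5.0, 5.0], [0.01, 0.1, 1.0])` searches a two-parameter quadratic whose minimum lies at (1, -2). The hot replicas explore broadly while the cold replica refines, and the exchanges let good configurations migrate toward low temperature.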
Automatic visual recognition using parallel machines
Invariant features and quick matching algorithms are two major concerns in the area of automatic visual recognition. The former reduces the size of an established model database, and the latter shortens the computation time. This dissertation discusses both line invariants under perspective projection and the parallel implementation of a dynamic programming technique for shape recognition. The feasibility of using parallel machines is demonstrated through the dramatically reduced time complexity.
In this dissertation, our algorithms are implemented on the AP1000 MIMD parallel machine. For processing an object with n features, the time complexity of the proposed parallel algorithm is O(n), while that of a uniprocessor is O(n^2). Two applications, one for shape matching and the other for chain-code extraction, are used to demonstrate the usefulness of our methods.
Invariants from four general lines under perspective projection are also discussed here. In contrast to the approach which uses the epipolar geometry, we investigate the invariants under isotropy subgroups. Theoretically speaking, two independent invariants can be found for four general lines in 3D space. In practice, we show how to obtain these two invariants from the projective images of four general lines without the need for camera calibration.
A projective invariant recognition system based on a hypothesis-generation-testing scheme is run on the hypercube parallel architecture. Object recognition is achieved by matching the scene projective invariants to the model projective invariants, a process called transfer. The hypothesis-generation-testing scheme is then implemented on the hypercube parallel architecture.
Compilation techniques for multicomputers
This thesis considers problems in process and data partitioning when compiling
programs for distributed-memory parallel computers (or multicomputers). These
partitions may be specified by the user through the use of language constructs,
or automatically determined by the compiler.
Data and process partitioning techniques are developed for two models of
compilation. The first compilation model focusses on the loop nests present in a
serial program. Executing the iterations of these loop nests in parallel accounts for
a significant amount of the parallelism which can be exploited in these programs.
The parallelism is exploited by applying a set of transformations to the loop
nests. The iterations of the transformed loop nests are in a form which can be
readily distributed amongst the processors of a multicomputer. The manner in
which the arrays, referenced within these loop nests, are partitioned between the
processors is determined by the distribution of the loop iterations. The second
compilation model is based on the data parallel paradigm, in which operations
are applied to many different data items collectively. High Performance Fortran
is used as an example of this paradigm.
Novel collective communication routines are developed, and are applied to
provide the communication associated with the data partitions for both compilation
models. Furthermore, it is shown that by using these routines the
communication associated with partitioning data on a multicomputer is greatly
simplified. These routines are developed as part of this thesis.
The experimental context for this thesis is the development of a compiler for
the Fujitsu AP1000 multicomputer. A prototype compiler is presented. Experimental
results for a variety of applications are included.
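To make the first compilation model concrete, the sketch below shows one simple way loop iterations and the arrays they reference might be block-partitioned across processors, with each processor executing only the iterations it owns. The balanced block rule, the function names, and the serial loop that stands in for the processors are assumptions for illustration; they are not the thesis's transformations, nor the exact BLOCK rule of High Performance Fortran.

```python
def block_bounds(n, num_procs, rank):
    """Half-open index range [lo, hi) owned by `rank` under a balanced block split.

    The first n % num_procs processors each receive one extra element.
    """
    base, extra = divmod(n, num_procs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def owner(n, num_procs, index):
    """Rank of the processor that owns element `index` (inverse of block_bounds)."""
    base, extra = divmod(n, num_procs)
    boundary = extra * (base + 1)      # first index owned by a smaller block
    if index < boundary:
        return index // (base + 1)
    return extra + (index - boundary) // base

def distributed_add(b, c, num_procs):
    """Simulate a[i] = b[i] + c[i] with iterations distributed in blocks.

    Each pass of the outer loop stands in for one processor, which touches
    only the array elements in its own block, so no communication is needed.
    """
    n = len(b)
    a = [None] * n
    for rank in range(num_procs):
        lo, hi = block_bounds(n, num_procs, rank)
        for i in range(lo, hi):
            a[i] = b[i] + c[i]
    return a
```

A loop such as a[i] = b[i - 1] + c[i] would instead need the boundary element of a neighbouring block, which is the kind of access that collective communication routines are meant to serve.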
Small Modular Boiling Water Reactor Combined with External Superheaters
In order to transform the current energy supply to low-carbon technology, the trade-off between sustainability, energy security and affordability has to be considered. The path forward lies between two alternatives: reducing the storage costs for intermittent renewables or developing affordable and more flexible nuclear power. One of the possible solutions, proposed in this thesis, is developing a Small Modular Boiling Water Reactor (SMBWR) combined with external superheaters.
The SMBWR is a BWR-type small modular reactor. It is designed to adopt natural recirculation of coolant within its primary system. The SMBWR is also combined with the external superheater system. The system consists of 3 pieces of equipment: a superheater, reheater and economiser. The heat for the external superheaters could be supplied by a conventional gas boiler, waste heat from gas turbines or heat stored in molten salt from Concentrated Solar Power (CSP) plant. By having the external superheaters, the SMBWR power conversion cycle efficiency could be substantially improved, which means more electric power could be generated, improving the economics of the reactor. Furthermore, it offers the possibility for the SMBWR to follow the load only by adjusting the external heat provided to the superheaters, while keeping the reactor power continuously at its maximum nominal level, which would be another major economic advantage of the SMBWR. The objectives of this thesis are to demonstrate that the concept is practical and to quantify a number of hypothesised benefits of the SMBWR with external superheaters.
The investigation of the effect of SMBWR operating pressure showed that increasing the SMBWR operating pressure from 6.5 to 10 MPa has no significant effect on the neutronic performance. It is also found that an increase in pressure would reduce the core pressure drop but increase the minimum chimney height required to develop natural circulation. In terms of thermodynamics, it is found that increasing the SMBWR operating pressure from 6.5 to 10.0 MPa will improve its thermal efficiency slightly by Δη of about 1.2%, which is small but not negligible. In order to investigate the trade-off between neutron leakage (neutronics), chimney height requirement for natural circulation (thermal-hydraulics), and dimensions of the core, three different geometry configurations, accounting for different length-to-diameter ratios, were studied. The investigation of the power manoeuvring capability of the SMBWR found that the combined system can reduce its load down to 65% by only reducing the external heat provided to the superheaters, while keeping the reactor operation at full rated power.
Memory sharing for interactive ray tracing on clusters
We present recent results in the application of distributed shared memory to image parallel ray tracing on clusters. Image parallel rendering is traditionally limited to scenes that are small enough to be replicated in the memory of each node, because any processor may require access to any piece of the scene. We solve this problem by making all of a cluster's memory available through software distributed shared memory layers. With gigabit ethernet connections, this mechanism is sufficiently fast for interactive rendering of multi-gigabyte datasets. Object- and page-based distributed shared memories are compared, and optimizations for efficient memory use are discussed.
Federal Register
Daily publication of the U.S. Office of the Federal Register contains rules and regulations, proposed legislation and rule changes, and other notices, including "Presidential proclamations and Executive Orders, Federal agency documents having general applicability and legal effect, documents required to be published by act of Congress, and other Federal agency documents of public interest" (p. ii). Table of Contents starts on page iii.