6,023 research outputs found
Green Parallel Metaheuristics: Design, Implementation, and Evaluation
Fecha de lectura de Tesis Doctoral 14 mayo 2020Green parallel metaheuristics (GPM) is a new concept we want to introduce in this thesis. It is an idea inspired by two facts: (i) parallel metaheuristics could help as unique tools to solve optimization problems in energy savings applications and sustainability, and (ii) these algorithms themselves run on multiprocessors, clusters, and grids of computers and then consume energy, so they need an energy analysis study for their different implementations over multiprocessors. The context for this thesis is to make a modern and competitive effort to extend the capability of present intelligent search optimization techniques. Analyzing the different sequential and parallel metaheuristics considering its energy consumption requires a deep investigation of the numerical performance, the execution time for efficient future designing to these algorithms. We present a study of the speed-up of the different parallel implementations over a different number of computing units. Moreover, we analyze and compare the energy consumption and numerical performance of the sequential/parallel algorithms and their components: a jump in the efficiency of the algorithms that would probably have a wide impact on the domains involved.El Instituto Egipcio en Madrid, dependiente del Gobierno de Egipto
Connected component identification and cluster update on GPU
Cluster identification tasks occur in a multitude of contexts in physics and
engineering such as, for instance, cluster algorithms for simulating spin
models, percolation simulations, segmentation problems in image processing, or
network analysis. While it has been shown that graphics processing units (GPUs)
can result in speedups of two to three orders of magnitude as compared to
serial codes on CPUs for the case of local and thus naturally parallelized
problems such as single-spin flip update simulations of spin models, the
situation is considerably more complicated for the non-local problem of cluster
or connected component identification. I discuss the suitability of different
approaches of parallelization of cluster labeling and cluster update algorithms
for calculations on GPU and compare to the performance of serial
implementations.Comment: 15 pages, 14 figures, one table, submitted to PR
Performance Analysis of a Novel GPU Computation-to-core Mapping Scheme for Robust Facet Image Modeling
Though the GPGPU concept is well-known
in image processing, much more work remains to be done
to fully exploit GPUs as an alternative computation
engine. This paper investigates the computation-to-core
mapping strategies to probe the efficiency and scalability
of the robust facet image modeling algorithm on GPUs.
Our fine-grained computation-to-core mapping scheme
shows a significant performance gain over the standard
pixel-wise mapping scheme. With in-depth performance
comparisons across the two different mapping schemes,
we analyze the impact of the level of parallelism on
the GPU computation and suggest two principles for
optimizing future image processing applications on the
GPU platform
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 1990s cannot enjoy an increased level of autonomy without the efficient implementation of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real-time demands are met for larger systems. Speedup via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial laboratories in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems is surveyed. The survey discusses multiprocessors for expert systems, parallel languages for symbolic computations, and mapping expert systems to multiprocessors. Results to date indicate that the parallelism achieved for these systems is small. The main reasons are (1) the body of knowledge applicable in any given situation and the amount of computation executed by each rule firing are small, (2) dividing the problem solving process into relatively independent partitions is difficult, and (3) implementation decisions that enable expert systems to be incrementally refined hamper compile-time optimization. In order to obtain greater speedups, data parallelism and application parallelism must be exploited
Parallel processing and expert systems
Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited
Instrumentation, performance visualization, and debugging tools for multiprocessors
The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs
A GPU-enabled solver for time-constrained linear sum assignment problems
This paper deals with solving large instances of the Linear Sum Assignment Problems (LSAPs) under realtime constraints, using Graphical Processing Units (GPUs). The motivating scenario is an industrial application for P2P live streaming that is moderated by a central tracker that is periodically solving LSAP instances to optimize the connectivity of thousands of peers. However, our findings are generic enough to be applied in other contexts. Our main contribution is a parallel version of a heuristic algorithm called Deep Greedy Switching (DGS) on GPUs using the CUDA programming language. DGS sacrifices absolute optimality in favor of a substantial speedup in comparison to classical LSAP solvers like the Hungarian and auctioning methods. We show the modifications needed to parallelize the DGS algorithm and the performance gains of our approach compared to a sequential CPU-based implementation of DGS and a mixed CPU/GPU-based implementation of it
GPU-based ultra-fast direct aperture optimization for online adaptive radiation therapy
Online adaptive radiation therapy (ART) has great promise to significantly
reduce normal tissue toxicity and/or improve tumor control through real-time
treatment adaptations based on the current patient anatomy. However, the major
technical obstacle for clinical realization of online ART, namely the inability
to achieve real-time efficiency in treatment re-planning, has yet to be solved.
To overcome this challenge, this paper presents our work on the implementation
of an intensity modulated radiation therapy (IMRT) direct aperture optimization
(DAO) algorithm on graphics processing unit (GPU) based on our previous work on
CPU. We formulate the DAO problem as a large-scale convex programming problem,
and use an exact method called column generation approach to deal with its
extremely large dimensionality on GPU. Five 9-field prostate and five 5-field
head-and-neck IMRT clinical cases with 5\times5 mm2 beamlet size and
2.5\times2.5\times2.5 mm3 voxel size were used to evaluate our algorithm on
GPU. It takes only 0.7~2.5 seconds for our implementation to generate optimal
treatment plans using 50 MLC apertures on an NVIDIA Tesla C1060 GPU card. Our
work has therefore solved a major problem in developing ultra-fast
(re-)planning technologies for online ART
- …