225 research outputs found

    dOpenCL: Towards a Uniform Programming Approach for Distributed Heterogeneous Multi-/Many-Core Systems

    Modern computer systems are becoming increasingly heterogeneous, comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to combine several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed OpenCL), a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL extends the OpenCL standard such that arbitrary computing devices installed on any node of a distributed system can be used together within a single application. dOpenCL allows moving data and program code to these devices in a transparent, portable manner. Since dOpenCL is designed as a fully-fledged implementation of the OpenCL API, it allows running existing OpenCL applications in a heterogeneous distributed environment without any modifications. We describe in detail the mechanisms that are required to implement OpenCL for distributed systems, including a device management mechanism for running multiple applications concurrently. Using three application studies, we compare the performance of dOpenCL with MPI+OpenCL and a standard OpenCL implementation.

    BALLView: a molecular viewer and modeling tool

    Over the last ten years, many molecular modeling programs have been developed, but most of them offer only limited capabilities or are rather difficult to use. This motivated us to create our own molecular viewer and modeling tool, BALLView, based on our biochemical algorithms library BALL. Through its flexible and intuitive interface, BALLView provides a wide range of features in the fields of electrostatic potentials, molecular mechanics, and molecular editing. In addition, BALLView is a powerful molecular viewer with state-of-the-art graphics: it provides a variety of models for biomolecular visualization, e.g. ball-and-stick models, molecular surfaces, or ribbon models. Since BALLView features a very intuitive graphical user interface, even inexperienced users have direct access to the full functionality. This makes BALLView particularly useful for teaching. For more advanced users, BALLView is extensible in different ways. First, extension at the level of C++ code is very convenient, since the underlying code was designed as a modular development framework. Second, an interface to the scripting language Python allows the interactive rapid prototyping of new methods. BALLView is portable and runs on all major platforms (Windows, MacOS X, Linux, most Unix flavors). It is available free of charge under the GNU Public License (GPL) from our website (www.ballview.org).
    Over the course of the last ten years, many different molecular modeling programs have been written, but most of them offer only limited functionality or are very unintuitive to use. This implies that many researchers have problems with these programs and would prefer more user-friendly software. This inspired us to develop BALLView, a novel modeling program based on our biochemical algorithms library BALL.
    Through its flexible interface, BALLView offers a rich palette of functions in the areas of electrostatics, molecular mechanics, and the editing of molecules. In addition, BALLView is a powerful program for visualizing molecules, with state-of-the-art graphics capabilities. Besides all standard molecular models such as stick, cartoon, ribbon, and surface models, BALLView also supports the visualization of electrostatic fields. All of the listed functions can also be used by inexperienced users, since BALLView has a very intuitive user interface. This makes it excellently suited for use in teaching. For advanced users, BALLView is extensible in two different ways: thanks to the design of the underlying class hierarchy, extensions at the level of the C++ program code are very easy to implement. Furthermore, BALLView offers an interface to the scripting language Python, which allows interactive rapid prototyping of new functions. BALLView is portable and can be used on all common platforms (Windows, MacOS X, Linux, most Unix derivatives). It is freely available under the LGPL license and can be downloaded from our website (www.ballview.org).

    Parallel For Loops on Heterogeneous Resources

    In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal not just for graphical applications, but for many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine are difficult challenges, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil's ParallelFor construct and our novel PINA scheduler, which can efficiently load balance work onto multiple GPUs and CPUs simultaneously.
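As a rough illustration of what load balancing a parallel for loop over unequal devices involves, the sketch below uses plain dynamic chunk scheduling in Python threads. It is not clUtil's ParallelFor or the PINA scheduler, whose details the abstract does not give; the function name `parallel_for`, the worker names, and the per-iteration delays that stand in for device speed are all assumptions made for illustration.

```python
import threading
import time


def parallel_for(start, stop, body, workers, chunk=4):
    """Run body(i) for every i in [start, stop) across several workers.

    `workers` maps a hypothetical worker name to a per-iteration delay in
    seconds that merely simulates device speed. Each worker repeatedly
    claims the next unclaimed chunk, so faster workers tend to process
    more iterations. Returns the number of iterations each worker ran.
    """
    lock = threading.Lock()
    next_index = [start]                  # shared cursor into the range
    counts = {name: 0 for name in workers}

    def run(name, delay):
        while True:
            with lock:                    # claim the next chunk atomically
                lo = next_index[0]
                if lo >= stop:
                    return
                hi = min(lo + chunk, stop)
                next_index[0] = hi
            for i in range(lo, hi):
                body(i)
                time.sleep(delay)         # simulated per-iteration cost
            counts[name] += hi - lo       # only this thread writes this key

    threads = [threading.Thread(target=run, args=(name, delay))
               for name, delay in workers.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts


if __name__ == "__main__":
    squares = [None] * 64

    def store_square(i):
        squares[i] = i * i

    print(parallel_for(0, 64, store_square, {"gpu": 0.0005, "cpu": 0.002}))
```

Claiming small chunks on demand keeps a slow worker from holding a large share of the range; a real scheduler would also have to account for data transfer costs, which this sketch ignores.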

    Evaluating the graphics processing unit for digital audio synthesis and the development of HyperModels

    The extraordinary growth in single-processor computation sustained for almost half a century is becoming increasingly difficult to maintain. Future computational growth is expected from parallel processors, as seen in the increasing number of tightly coupled processors inside the conventional modern heterogeneous system. The graphics processing unit (GPU) is a massively parallel processing unit that can be used to accelerate particular digital audio processes; however, digital audio developers are cautious about adopting the GPU into their designs, wishing to avoid any complications the GPU architecture may bring. For example, linear systems simulated using finite-difference-based physical model synthesis are highly suited to the GPU, but developers will be reluctant to use it without a complete evaluation of the GPU for digital audio. Previously limited by computation, the audio landscape could advance given a comprehensive evaluation of the GPU in digital audio and a framework for accelerating particular audio processes. This thesis is separated into two parts. Part one evaluates the utility of the GPU as a hardware accelerator for digital audio processing using bespoke performance benchmarking suites. The results suggest that the GPU is appropriate under particular conditions; for example, the sample buffer size dispatched to the GPU must be between 32 and 512 to meet real-time digital audio requirements. However, despite some constraints, the GPU could support linear finite-difference-based physical models with 4x higher resolution than the equivalent CPU version. These results suggest that the GPU is superior to the CPU for high-resolution physical models. Therefore, the second part of this thesis presents the design of the novel HyperModels framework to facilitate the development of real-time linear physical models for interaction and performance.
HyperModels uses vector graphics to describe a model's geometry and a domain-specific language (DSL) to define the physics equations that operate in the physical model. An implementation of the HyperModels framework is then objectively evaluated by comparing its performance with manually written CPU and GPU equivalents. The GPU programs automatically generated by HyperModels were shown to outperform the CPU versions for resolutions of 64x64 and above whilst maintaining similar performance to the manually written GPU versions. To conclude part two, the expressibility and usability of HyperModels are demonstrated by presenting two instruments built using the framework.
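To make "linear finite-difference-based physical model" concrete, here is a minimal one-dimensional sketch: an ideal string with fixed ends, advanced with the standard explicit update for the wave equation. It is not HyperModels code and does not use its DSL; the thesis evaluates two-dimensional grids such as 64x64, and the function names, grid size, pluck position, pickup position, and the value of `courant_sq` are assumptions chosen for illustration. Every interior point is updated independently of the others within a time step, which is the property that makes such models a natural fit for a GPU.

```python
def step(u_prev, u_curr, courant_sq):
    """Advance a fixed-end string by one time step.

    courant_sq is (c * dt / dx) ** 2 and must not exceed 1 for the
    explicit scheme to remain stable.
    """
    n = len(u_curr)
    u_next = [0.0] * n                    # end points stay clamped at zero
    for i in range(1, n - 1):             # each point could be one GPU thread
        laplacian = u_curr[i - 1] - 2.0 * u_curr[i] + u_curr[i + 1]
        u_next[i] = 2.0 * u_curr[i] - u_prev[i] + courant_sq * laplacian
    return u_next


def simulate(n_points, n_steps, courant_sq=0.25):
    """Pluck the string at its centre and record one output sample per step."""
    u_curr = [0.0] * n_points
    u_curr[n_points // 2] = 1.0           # initial displacement (the pluck)
    u_prev = list(u_curr)                 # equal states give zero initial velocity
    samples = []
    for _ in range(n_steps):
        u_next = step(u_prev, u_curr, courant_sq)
        u_prev, u_curr = u_curr, u_next
        samples.append(u_curr[n_points // 4])   # hypothetical pickup position
    return samples


if __name__ == "__main__":
    print(simulate(16, 8))
```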

    Real-time fluid simulations under smoothed particle hydrodynamics for coupled kinematic modelling in robotic applications

    Although solids and fluids can be conceived as continuum media, applications of solid and fluid dynamics differ greatly from each other in their theoretical models and their physical behavior. That is why the computer simulators for each turn out to be very disparate and case-oriented. The aim of this research work, presented in this thesis, is to find a fluid dynamics model that can be implemented in near real time with GPU processing and that can be adapted to the typically large scales found in robotic devices acting in fluid media. More specifically, the objective is to develop these fast fluid simulations, comprising different solid body dynamics, to find a viable kinematic solution for robotics. The tested cases are: i) a fluid in a closed channel flowing across a cylinder, ii) a fluid flowing across a controlled profile, and iii) free-surface fluid control during pouring. The implementation of the earlier cases establishes the formulations and constraints for the later applications. The results will allow the reader not only to substantiate the implemented models but also to break down the software implementation concepts for better comprehension. A fast GPU-based fluid dynamics simulation is detailed in the main implementation. The results show that it can be used in real time to allow a robot to perform a blind pouring task with a conventional controller, with no need for special sensing systems or knowledge-driven prediction models.
    Although solids and fluids can both be conceived as continuous media, the applications of solid and fluid dynamics differ greatly in their theoretical models and physical behavior. That is why the computer simulators for each are very disparate and oriented to their particular application case.
    The aim of this research work, presented in this thesis, is to find a fluid dynamics model that can be implemented close to real time with GPU processing and that can be adapted to the typically large scales found in robotic devices acting in fluid media. Specifically, the purpose is to develop these fast fluid simulations, comprising different solid body dynamics, to find a viable kinematic solution for robotics. The tested cases are: i) a fluid in a closed channel flowing across a cylinder, ii) a fluid flowing across a controlled blade, and iii) the control of a free-surface fluid during pouring. The implementation of these first cases establishes the formulations and limitations of future applications. The results will allow the reader not only to substantiate the implemented models but also to break down the concepts of the software implementation for better understanding. In the main implementation, a fast GPU-based fluid dynamics simulation is achieved. The results show that this implementation can be used in real time to allow a robotic system to perform a blind pouring task with a conventional controller, without the need for any special sensing system or knowledge-based predictive model.
    Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Chair: Carmen Martínez Arévalo. Secretary: Luis Santiago Garrido Bullón. Committee member: Benjamín Hernández Arreguí.

    MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data

    The idea of using a graphics processing unit (GPU) for more than simply graphic output purposes has been around for quite some time in scientific communities. However, it is only recently that its benefits for a range of compute-intensive bioinformatics and life sciences tasks have been recognized. This thesis investigates the possibility of improving the performance of the overlap determination stage of an Overlap Layout Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. In this thesis an existing GPU-accelerated sequence alignment algorithm is adapted and expanded to reduce its completion time. A number of improvements and changes are made to the original software. Workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution that has been specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (i.e. Ion Torrent). Results show that the software reaches up to 82 GCUPS (Giga Cell Updates Per Second) on a single-GPU graphics card running in commodity desktop hardware. As a result, it is the fastest GPU-based implementation of the Smith-Waterman algorithm tailored for processing medium-length nucleotide reads. Despite being designed for performing the Smith-Waterman algorithm on medium-length nucleotide sequences, this program also presents great potential for improving heterogeneous computing with CUDA-enabled GPUs in general and is expected to contribute to other research problems that require sensitive pairwise alignment to be applied to a large number of reads.
Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware. These results are especially encouraging since GPU performance grows faster than that of multi-core CPUs.
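For readers unfamiliar with the algorithm being accelerated, the following is a minimal CPU sketch of the Smith-Waterman local alignment score, together with the GCUPS metric (billions of dynamic-programming cell updates per second) used in the abstract. It is not MR-CUDASW code: the function names are hypothetical, and the linear gap penalty and the scoring values are illustrative assumptions rather than the parameters used in the thesis.

```python
def smith_waterman_score(query, target, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score using a linear gap penalty.

    Only the previous row of the dynamic-programming matrix is kept,
    since the score (not the traceback) is all that is needed here.
    """
    prev = [0] * (len(target) + 1)
    best = 0
    for qc in query:
        curr = [0]                        # column 0 is always zero
        for j, tc in enumerate(target, start=1):
            diag = prev[j - 1] + (match if qc == tc else mismatch)
            up = prev[j] + gap            # gap in the target
            left = curr[j - 1] + gap      # gap in the query
            cell = max(0, diag, up, left) # zero floor makes the alignment local
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second for one query/target comparison."""
    return query_len * target_len / seconds / 1e9


if __name__ == "__main__":
    print(smith_waterman_score("TGTTACGG", "GGTTGACTA",
                               match=3, mismatch=-3, gap=-2))
    print(gcups(1000, 1_000_000, 1.0))
```

Each comparison fills `len(query) * len(target)` cells, so a rate of 82 GCUPS corresponds to 82 x 10^9 such cell updates per second.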

    Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

    Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) offer promising ways of extending the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator. Traditional CPUs can handle those workloads not well suited for accelerators. The combination of multiple types of processors in a single computer system is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPU, Xeon Phi massively parallel processor, and NVIDIA GPU; common tuning methods and platform-specific tuning techniques are presented. Then, analysis is done to demonstrate the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program. The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work presents an extraction of features from the input data set and the target systems, and then uses various regression models to calculate the prospective computation time. This helps explain why a certain processor is faster for certain sets of tasks. It also provides the essential information for scheduling on heterogeneous systems. In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the pros and cons of different heterogeneous processors can complement each other. Thus, higher performance can be achieved on heterogeneous computing systems.
A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors. Finally, this work extends the heterogeneous task scheduling algorithm to handle a power-capping feature. It demonstrates that a power-aware scheduler significantly improves power efficiency and reduces energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.
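The role that predicted computation times play in heterogeneous scheduling can be sketched with a generic greedy heuristic: take the longest tasks first and give each one to the processor on which it is predicted to finish earliest. This is a textbook-style baseline, not ROB, MR, MRR, or ASTR, whose details the abstract does not describe; the function name, task names, and predicted runtimes below are hypothetical.

```python
def greedy_schedule(predicted):
    """Assign tasks to processors using predicted runtimes.

    predicted[task][processor] is the predicted runtime of `task` on
    `processor`. Returns (assignment, makespan), where assignment maps
    each task to its chosen processor.
    """
    processors = sorted({p for times in predicted.values() for p in times})
    ready_at = {p: 0.0 for p in processors}   # when each processor becomes free
    assignment = {}
    # Consider tasks in decreasing order of their best-case runtime.
    order = sorted(predicted,
                   key=lambda t: min(predicted[t].values()),
                   reverse=True)
    for task in order:
        times = predicted[task]
        # Pick the processor with the earliest predicted finish time.
        proc = min(times, key=lambda p: ready_at[p] + times[p])
        assignment[task] = proc
        ready_at[proc] += times[proc]
    return assignment, max(ready_at.values())


if __name__ == "__main__":
    # Hypothetical predicted runtimes in seconds.
    predicted = {
        "task_a": {"cpu": 8.0, "gpu": 2.0},
        "task_b": {"cpu": 3.0, "gpu": 4.0},
        "task_c": {"cpu": 6.0, "gpu": 1.5},
    }
    print(greedy_schedule(predicted))
```

Because every decision depends on the predicted times, errors in the performance model translate directly into worse schedules, which is why robustness to prediction error matters when comparing schedulers.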