
225 research outputs found

dOpenCL: Towards a Uniform Programming Approach for Distributed Heterogeneous Multi-/Many-Core Systems

Author: Gorlatch Sergei
Kegel Philipp
Steuwer Michel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2012

Modern computer systems are becoming increasingly heterogeneous, comprising multi-core CPUs, GPUs, and other accelerators. Current programming approaches for such systems usually require the application developer to combine several programming models (e.g., MPI with OpenCL or CUDA) in order to exploit the full compute capability of a system. In this paper, we present dOpenCL (Distributed OpenCL), a uniform approach to programming distributed heterogeneous systems with accelerators. dOpenCL extends the OpenCL standard such that arbitrary computing devices installed on any node of a distributed system can be used together within a single application. dOpenCL allows data and program code to be moved to these devices in a transparent, portable manner. Since dOpenCL is designed as a fully-fledged implementation of the OpenCL API, it allows existing OpenCL applications to run in a heterogeneous distributed environment without any modifications. We describe in detail the mechanisms that are required to implement OpenCL for distributed systems, including a device management mechanism for running multiple applications concurrently. Using three application studies, we compare the performance of dOpenCL with MPI+OpenCL and a standard OpenCL implementation.

Crossref

Enlighten

BALLView : a molecular viewer and modeling tool

Author: Moll Andreas
Publication venue: Fakultät 6 - Naturwissenschaftlich-Technische Fakultät I. Fachrichtung 6.2 - Informatik
Publication date: 01/01/2007

Over the last ten years, many molecular modeling programs have been developed, but most of them offer only limited capabilities or are rather difficult to use. This motivated us to create our own molecular viewer and modeling tool, BALLView, based on our biochemical algorithms library BALL. Through its flexible and intuitive interface, BALLView provides a wide range of features in the fields of electrostatic potentials, molecular mechanics, and molecular editing. In addition, BALLView is a powerful molecular viewer with state-of-the-art graphics: it provides a variety of models for biomolecular visualization, e.g. ball-and-stick models, molecular surfaces, or ribbon models. Since BALLView features a very intuitive graphical user interface, even inexperienced users have direct access to the full functionality. This makes BALLView particularly useful for teaching. For more advanced users, BALLView is extensible in different ways. First, extension on the level of C++ code is very convenient, since the underlying code was designed as a modular development framework. Second, an interface to the scripting language Python allows the interactive rapid prototyping of new methods. BALLView is portable and runs on all major platforms (Windows, MacOS X, Linux, most Unix flavors). It is available free of charge under the GNU Public License (GPL) from our website (www.ballview.org).

German abstract, translated: Over the last ten years, many different molecular modeling programs have been written, but most offer only limited functionality or are very unintuitive to use. This implies that many researchers have problems with these programs and would prefer more user-friendly software. This inspired us to develop BALLView, a novel modeling program based on our biochemical algorithms library BALL. Through its flexible interface, BALLView offers a rich palette of functions in the areas of electrostatics, molecular mechanics, and the editing of molecules. In addition, BALLView is a powerful program for visualizing molecules, with state-of-the-art graphics capabilities. Besides all standard molecular models such as stick, cartoon, ribbon, and surface models, BALLView also supports the visualization of electrostatic fields. All of the listed functions can also be used by inexperienced users, since BALLView has a very intuitive user interface. This makes it excellently suited for use in teaching. For advanced users, BALLView is extensible in two different ways. First, thanks to the design of the underlying class hierarchy, extensions at the level of the C++ program code are very easy to implement. Second, BALLView offers an interface to the scripting language Python, which allows interactive rapid prototyping of new functions. BALLView is portable and can be used on all common platforms (Windows, MacOS X, Linux, most Unix derivatives). It is freely available under the LGPL license and can be downloaded from our website (www.ballview.org).

Universaar

Acronym

Parallel For Loops on Heterogeneous Resources

Author: Weber Frederick Edward
Publication venue: TRACE: Tennessee Research and Creative Exchange
Publication date: 01/12/2012

In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal not just for graphical applications, but for many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine are difficult challenges, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil's ParallelFor construct and our novel PINA scheduler, which can efficiently load balance work onto multiple GPUs and CPUs simultaneously.
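To make the idea of a load-balanced parallel for loop concrete, the following is a minimal sketch of dynamic chunk scheduling in Python. It is not clUtil's ParallelFor and not the PINA scheduler, whose details the abstract does not give. The function name `parallel_for`, the worker labels, and the use of threads as stand-ins for devices are all assumptions made for illustration.

```python
import queue
import threading
from collections import Counter


def parallel_for(start, stop, chunk_size, worker_names, body):
    """Run body(worker, i) for every i in [start, stop).

    Hypothetical helper: the range is cut into chunks, and each worker
    pulls the next chunk as soon as it finishes its previous one, so a
    faster worker naturally ends up processing more iterations.
    """
    chunks = queue.Queue()
    for lo in range(start, stop, chunk_size):
        chunks.put((lo, min(lo + chunk_size, stop)))

    done = Counter()          # iterations completed per worker
    lock = threading.Lock()   # protects `done`

    def run(name):
        while True:
            try:
                lo, hi = chunks.get_nowait()
            except queue.Empty:
                return        # no work left for this worker
            for i in range(lo, hi):
                body(name, i)
            with lock:
                done[name] += hi - lo

    threads = [threading.Thread(target=run, args=(n,)) for n in worker_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dict(done)


# Usage: threads stand in for devices; the labels are made up.
out = [0] * 64

def square(worker, i):
    out[i] = i * i

share = parallel_for(0, 64, 8, ["cpu0", "gpu0", "gpu1"], square)
print(share)  # how many iterations each worker ended up processing
```

Because workers pull chunks on demand, no per-device speed estimate is needed up front. A smaller chunk size balances the load more evenly but adds scheduling overhead.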

University of Tennessee, Knoxville: Trace

Evaluating the graphics processing unit for digital audio synthesis and the development of HyperModels

Author: Renney Harri

The extraordinary growth in computation in single processors for almost half a century is becoming increasingly difficult to maintain. Future computational growth is expected from parallel processors, as seen in the increasing number of tightly coupled processors inside the conventional modern heterogeneous system. The graphics processing unit (GPU) is a massively parallel processing unit that can be used to accelerate particular digital audio processes. However, digital audio developers are cautious about adopting the GPU in their designs, to avoid any complications the GPU architecture may introduce. For example, linear systems simulated using finite-difference-based physical model synthesis are highly suited to the GPU, but developers will be reluctant to use it without a complete evaluation of the GPU for digital audio. Previously limited by computation, the audio landscape could see future advancement from a comprehensive evaluation of the GPU in digital audio and a framework for accelerating particular audio processes. This thesis is separated into two parts. Part one evaluates the utility of the GPU as a hardware accelerator for digital audio processing using bespoke performance benchmarking suites. The results suggest that the GPU is appropriate under particular conditions. For example, the sample buffer size dispatched to the GPU must be between 32 and 512 to meet real-time digital audio requirements. However, despite some constraints, the GPU could support linear finite-difference-based physical models with 4X higher resolution than the equivalent CPU version. These results suggest that the GPU is superior to the CPU for high-resolution physical models. Therefore, the second part of this thesis presents the design of the novel HyperModels framework to facilitate the development of real-time linear physical models for interaction and performance.
HyperModels uses vector graphics to describe a model's geometry and a domain-specific language (DSL) to define the physics equations that operate in the physical model. An implementation of the HyperModels framework is then objectively evaluated by comparing its performance with manually written CPU and GPU equivalent versions. The GPU programs automatically generated by HyperModels were shown to outperform the CPU versions for resolutions of 64x64 and above whilst maintaining similar performance to the manually written GPU versions. To conclude part 2, the expressibility and usability of HyperModels are demonstrated by presenting two instruments built using the framework.
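The real-time constraint on buffer size can be illustrated with simple arithmetic: a buffer of N samples at sample rate fs covers N / fs seconds of audio, so each buffer must be fully processed within that period. The sketch below assumes a 44.1 kHz sample rate, which the abstract does not state, and the function name is hypothetical.

```python
def buffer_period_ms(buffer_size, sample_rate_hz):
    """Duration of audio covered by one buffer, in milliseconds.

    To stay real-time, processing one buffer (including any transfer
    to and from the GPU) has to finish within this period.
    """
    return 1000.0 * buffer_size / sample_rate_hz


SAMPLE_RATE_HZ = 44100  # assumed; not given in the abstract

for n in (32, 64, 128, 256, 512):
    print(f"{n:4d} samples -> {buffer_period_ms(n, SAMPLE_RATE_HZ):6.2f} ms per buffer")
```

Small buffers leave little time to amortise fixed dispatch costs, while large buffers add latency, which is consistent with a usable window of buffer sizes rather than a single optimum.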

UWE Bristol Research Repository

Real-time fluid simulations under smoothed particle hydrodynamics for coupled kinematic modelling in robotic applications

Author: Camporredondo Díaz Gabriel
Publication venue: 'IATED Academy'
Publication date: 25/02/2021

Although solids and fluids can be conceived as continuum media, applications of solid and fluid dynamics differ greatly from each other in their theoretical models and their physical behavior. That is why the computer simulators for each turn out to be very disparate and case-oriented. The aim of this research work, captured in this thesis book, is to find a fluid dynamics model that can be implemented in near real-time with GPU processing and that can be adapted to the typically large scales found in robotic devices in action with fluid media. More specifically, the objective is to develop these fast fluid simulations, comprising different solid body dynamics, to find a viable kinematic solution for robotics. The tested cases are: i) a fluid in a closed channel flowing across a cylinder, ii) a fluid flowing across a controlled profile, and iii) free-surface fluid control during pouring. The implementation of the former cases establishes the formulations and constraints for the latter applications. The results will allow the reader not only to sustain the implemented models but also to break down the software implementation concepts for better comprehension. A fast GPU-based fluid dynamics simulation is detailed in the main implementation. The results show that it can be used in real time to allow robots to perform a blind pouring task with a conventional controller, without requiring special sensing systems or knowledge-driven prediction models.

Spanish abstract, translated: Although solids and fluids can be conceived as continuous media, the applications of solid and fluid dynamics differ greatly from each other in their theoretical models and their physical behavior. That is why the computer simulators for each are very disparate and oriented to their particular application. The objective of this research work, captured in this thesis book, is to find a fluid dynamics model that can be implemented close to real time with GPU processing and that can be adapted to the typically large scales found in robotic devices in action with fluid media. Specifically, the purpose is to develop these fast fluid simulations, comprising different solid body dynamics, to find a viable kinematic solution for robotics. The tested cases are: i) a fluid in a closed channel flowing across a cylinder, ii) a fluid flowing across a controlled blade, and iii) the control of a free-surface fluid during pouring. The implementation of these first cases establishes the formulations and limitations of future applications. The results will allow the reader not only to sustain the implemented models but also to break down the concepts of the software implementation for better comprehension. The main implementation achieves a fast GPU-based fluid dynamics simulation. The results show that this implementation can be used in real time to allow robots to perform a blind pouring task with a conventional controller, without requiring any special sensing system or any knowledge-based predictive model.

Doctoral Programme in Electrical, Electronic and Automation Engineering, Universidad Carlos III de Madrid. Committee: Chair: Carmen Martínez Arévalo; Secretary: Luis Santiago Garrido Bullón; Member: Benjamín Hernández Arreguí.
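As background for the smoothed particle hydrodynamics (SPH) method named in the title, the following is a minimal sketch of the standard SPH density estimate, in which each particle's density is a kernel-weighted sum over nearby particles. The poly6 kernel used here is one widely used choice and is an assumption, as is the brute-force neighbour loop. The abstract does not say which kernel or neighbour search the thesis uses.

```python
import math


def poly6(r, h):
    """3D poly6 smoothing kernel W(r, h); zero outside the support radius h."""
    if r < 0.0 or r > h:
        return 0.0
    return 315.0 / (64.0 * math.pi * h ** 9) * (h * h - r * r) ** 3


def densities(positions, mass, h):
    """SPH density at each particle: rho_i = sum_j m * W(|x_i - x_j|, h).

    Brute-force O(n^2) loop for clarity; a GPU implementation would
    typically use a spatial grid to visit only nearby particles.
    """
    rho = []
    for xi in positions:
        total = 0.0
        for xj in positions:
            total += mass * poly6(math.dist(xi, xj), h)
        rho.append(total)
    return rho


# Usage with made-up particle positions (metres) and parameters.
particles = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 1.0, 1.0)]
print(densities(particles, mass=0.02, h=0.1))
```

The first two particles lie within each other's support radius and so raise each other's density, while the distant third particle only sees its own contribution.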

Universidad Carlos III de Madrid e-Archivo

MR-CUDASW - GPU accelerated Smith-Waterman algorithm for medium-length (meta)genomic data

Author: Muhammadzadeh Amir
Publication venue: 'University of Saskatchewan Library'

The idea of using a graphics processing unit (GPU) for more than simply graphic output has been around for quite some time in scientific communities. However, it is only recently that its benefits for a range of compute-intensive bioinformatics and life sciences tasks have been recognized. This thesis investigates the possibility of improving the performance of the overlap determination stage of an Overlap Layout Consensus (OLC)-based assembler by using a GPU-based implementation of the Smith-Waterman algorithm. In this thesis an existing GPU-accelerated sequence alignment algorithm is adapted and expanded to reduce its completion time. A number of improvements and changes are made to the original software. The workload distribution, query profile construction, and thread scheduling techniques implemented by the original program are replaced by custom methods specifically designed to handle medium-length reads. Accordingly, this algorithm is the first highly parallel solution that has been specifically optimized to process medium-length nucleotide reads (DNA/RNA) from modern sequencing machines (i.e. Ion Torrent). Results show that the software reaches up to 82 GCUPS (Giga Cell Updates Per Second) on a single-GPU graphics card in commodity desktop hardware. As a result, it is the fastest GPU-based implementation of the Smith-Waterman algorithm tailored for processing medium-length nucleotide reads. Despite being designed for performing the Smith-Waterman algorithm on medium-length nucleotide sequences, this program also presents great potential for improving heterogeneous computing with CUDA-enabled GPUs in general and is expected to contribute to other research problems that require sensitive pairwise alignment to be applied to a large number of reads.
Our results show that it is possible to improve the performance of bioinformatics algorithms by taking full advantage of the compute resources of the underlying commodity hardware. These results are especially encouraging since GPU performance grows faster than that of multi-core CPUs.
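For readers unfamiliar with the algorithm and the GCUPS metric, here is a minimal CPU sketch of Smith-Waterman local alignment scoring, together with the cell-update arithmetic behind GCUPS. It is not the thesis software: the scoring values and the linear gap penalty are assumptions chosen for brevity, and production aligners commonly use affine gap penalties instead.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of sequences a and b.

    Scoring parameters are illustrative assumptions. Only two rows of
    the dynamic-programming matrix are kept, since just the score (not
    the alignment itself) is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                 # start a new local alignment
                prev[j - 1] + s,   # align a[i-1] with b[j-1]
                prev[j] + gap,     # gap in b
                curr[j - 1] + gap, # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def gcups(query_len, target_len, seconds):
    """Giga cell updates per second for one pairwise alignment.

    One cell update is one entry of the len(a) x len(b) matrix.
    """
    return query_len * target_len / seconds / 1e9


print(smith_waterman_score("GGACGTCC", "TTACGTAA"))  # shared core "ACGT"
print(gcups(400, 400, 1e-3))  # made-up lengths and timing
```

Every matrix cell depends only on its left, upper, and upper-left neighbours, so cells on the same anti-diagonal, and separate read pairs, can be computed independently. That independence is the parallelism a GPU implementation exploits.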

eCommons@USASK

University of Saskatchewan Research Archive


Technological framework for ubiquitous interactions using context-aware mobile devices

Author: Papakonstantinou Stylianos

This report presents the research and development of a dedicated system architecture designed to let its users interact with each other and access information on Points of Interest in their immediate environment. This is accomplished by managing personal preferences and contextual information in a distributed manner and in real time. The advantage of this system architecture is that it uses mobile devices, heterogeneous sensors and a selection of user interface paradigms to produce a sociotechnical framework that enhances the perception of the environment and promotes intuitive interactions. The thrust of the work has been on software development and component integration. Iterative prototyping was adopted as the development method in order to incorporate the users' feedback effectively and to establish a platform for collaboration that closely meets their requirements and aids their decision-making process. Requirements acquisition was followed by a system-modelling phase in order to produce a robust software prototype. The implementation includes component-based development and extensive use of design patterns over native programming. Ultimately, the software product became the means to evaluate differences in the use of mixed reality technologies in a ubiquitous scenario. The prototype can query a number of context sources, such as sensors or details of the personal profile, to acquire relevant data. The data (and metadata) are stored in open-source structures, so that they are accessible at every layer of the system architecture and at any time. By proactively processing the acquired context, the system can assist users in their tasks (e.g. navigation) without explicit input, for example when the user simply makes a gesture with the device. However, advanced interaction with the application via the user interface is available for more complex requests.
Representations of real-world objects, their spatial relations and other captured features of interest are visualised on scalable interfaces, ranging from 2D to 3D models and from photorealism to stylised clues and symbols. Two principal modes of operation have been implemented: the first uses geo-referenced virtual reality models of the environment, updated in real time, and the second overlays descriptive annotations and graphics on video images of the surroundings captured by a video camera. The latter is referred to as augmented reality. The continuous feed of device position and orientation data from the GPS receiver and the digital compass into the application makes the framework fit for use in unknown environments and therefore suitable for ubiquitous operation. This is one of the novelties of the proposed framework, because it enables a whole range of social, peer-to-peer interactions to take place. The scenarios of how the system could be employed to pursue these remote interactions and collaborative efforts on mobile devices are addressed in the context of urban navigation. The conceptual design and implementation of the novel location- and orientation-based algorithm for mobile AR are presented in detail. The system is, however, multifaceted and capable of supporting peer-to-peer exchange of information in a pervasive fashion, usable in various contexts. The modalities of these interactions are explored and laid out in several scenarios, particularly in the context of user adoption. Two evaluation tasks took place. The preliminary evaluation examined certain aspects that influence user interaction while immersed in a virtual environment, whereas the second, summative evaluation compared the utility and certain usability aspects of the AR and VR interfaces.

City Research Online

Illustrative volume rendering on consumer graphics hardware

Author: van Pelt R.F.P.
Publication date: 31/01/2008

Pure OAI Repository

Scheduling and Tuning Kernels for High-performance on Heterogeneous Processor Systems

Author: Fang Ye
Publication venue: LSU Digital Commons
Publication date: 01/01/2016

Accelerated parallel computing techniques using devices such as GPUs and Xeon Phis (along with CPUs) offer promising ways to extend the cutting edge of high-performance computer systems. A significant performance improvement can be achieved when suitable workloads are handled by the accelerator. Traditional CPUs can handle the workloads that are not well suited to accelerators. A combination of multiple types of processors in a single computer system is referred to as a heterogeneous system. This dissertation addresses tuning and scheduling issues in heterogeneous systems. The first section presents work on tuning scientific workloads on three different types of processors: multi-core CPU, Xeon Phi massively parallel processor, and NVIDIA GPU. Common tuning methods and platform-specific tuning techniques are presented. Then, analysis is done to demonstrate the performance characteristics of the heterogeneous system on different input data. This section of the dissertation is part of the GeauxDock project, which prototyped a few state-of-the-art bioinformatics algorithms and delivered a fast molecular docking program. The second section of this work studies the performance model of the GeauxDock computing kernel. Specifically, the work presents an extraction of features from the input data set and the target systems, and then uses various regression models to calculate the prospective computation time. This helps explain why a certain processor is faster for certain sets of tasks. It also provides the essential information for scheduling on heterogeneous systems. In addition, this dissertation investigates a high-level task scheduling framework for heterogeneous processor systems in which the strengths and weaknesses of different processors can complement each other. Thus, higher performance can be achieved on heterogeneous computing systems.
A new scheduling algorithm with four innovations is presented: Ranked Opportunistic Balancing (ROB), Multi-subject Ranking (MR), Multi-subject Relative Ranking (MRR), and Automatic Small Tasks Rearranging (ASTR). The new algorithm consistently outperforms previously proposed algorithms, with better scheduling results, lower computational complexity, and more consistent results over a range of performance prediction errors. Finally, this work extends the heterogeneous task scheduling algorithm to handle a power-capping feature. It demonstrates that a power-aware scheduler significantly improves power efficiency and reduces energy consumption. This suggests that, in addition to performance benefits, heterogeneous systems may have certain advantages in overall power efficiency.
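To illustrate how predicted computation times can drive scheduling on a heterogeneous system, the sketch below uses a simple greedy heuristic: each task goes to the processor on which it is predicted to finish earliest. This is a generic baseline, not ROB, MR, MRR, or ASTR, which the abstract names but does not describe. The function name, task names, and timings are made up for illustration.

```python
def greedy_schedule(predicted):
    """Assign tasks to processors using predicted runtimes.

    predicted: {task: {processor: predicted_seconds}}
    Returns (assignment, makespan). Generic minimum-completion-time
    heuristic, shown only as a baseline sketch.
    """
    processors = sorted({p for times in predicted.values() for p in times})
    ready_at = {p: 0.0 for p in processors}  # when each processor is next free
    assignment = {}

    # Place the tasks with the largest best-case time first.
    order = sorted(predicted, key=lambda t: min(predicted[t].values()), reverse=True)
    for task in order:
        times = predicted[task]
        best = min(times, key=lambda p: ready_at[p] + times[p])
        ready_at[best] += times[best]
        assignment[task] = best

    return assignment, max(ready_at.values(), default=0.0)


# Usage with hypothetical tasks and predicted runtimes in seconds.
predicted = {
    "task_a": {"cpu": 8.0, "gpu": 2.0},
    "task_b": {"cpu": 3.0, "gpu": 4.0},
    "task_c": {"cpu": 6.0, "gpu": 1.5},
}
assignment, makespan = greedy_schedule(predicted)
print(assignment, makespan)
```

A heuristic like this is only as good as its runtime predictions, which is one reason robustness to performance prediction errors matters when comparing schedulers.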

Louisiana State University