
    From Physics Model to Results: An Optimizing Framework for Cross-Architecture Code Generation

    Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high-performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, enabling efficient use of large-scale CPU/GPU systems for complex applications without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms. Comment: 18 pages, 4 figures, accepted for publication in Scientific Programming.
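
The abstract only states that the discretization uses higher-order finite differences. As a rough illustration of what "higher order" buys, the NumPy sketch below (not Chemora-generated code; the periodic 1-D grid, the sin(x) test function and the helper names `d1_central2`, `d1_central4` and `max_error` are assumptions for illustration) compares a second-order and a fourth-order central stencil for the first derivative. The fourth-order stencil is f'(x_i) ≈ (-f[i+2] + 8 f[i+1] - 8 f[i-1] + f[i-2]) / (12h), so halving h should shrink its error by roughly a factor of 16 rather than 4.

```python
import numpy as np


def d1_central2(f: np.ndarray, h: float) -> np.ndarray:
    """Second-order central first derivative on a periodic 1-D grid."""
    # np.roll(f, -k)[i] == f[i + k], with indices wrapping around periodically
    return (np.roll(f, -1) - np.roll(f, 1)) / (2.0 * h)


def d1_central4(f: np.ndarray, h: float) -> np.ndarray:
    """Fourth-order central first derivative on a periodic 1-D grid."""
    return (-np.roll(f, -2) + 8.0 * np.roll(f, -1)
            - 8.0 * np.roll(f, 1) + np.roll(f, 2)) / (12.0 * h)


def max_error(n: int, deriv) -> float:
    """Max-norm error of deriv applied to sin(x) on n periodic points in [0, 2*pi)."""
    x = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    h = x[1] - x[0]
    return float(np.max(np.abs(deriv(np.sin(x), h) - np.cos(x))))


if __name__ == "__main__":
    for name, deriv in (("2nd order", d1_central2), ("4th order", d1_central4)):
        e_coarse, e_fine = max_error(32, deriv), max_error(64, deriv)
        print(f"{name}: error {e_coarse:.2e} -> {e_fine:.2e}, ratio {e_coarse / e_fine:.1f}")
```

The periodic `np.roll` keeps the sketch free of boundary handling; a real multi-block code has to fill ghost zones or use one-sided stencils near block boundaries, which this example deliberately ignores.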

    Real-time motion tracking using optical flow on multiple GPUs

    Motion tracking algorithms are widely used in computer vision research. However, new video standards, especially high-resolution ones, mean that current implementations, even on modern hardware, no longer meet the needs of real-time processing. To overcome this challenge, several GPU (Graphics Processing Unit) computing approaches have recently been proposed. Although they demonstrate the great potential of the GPU platform, hardly any of them can process high-definition video sequences efficiently. Thus, a need arose for a tool able to address this problem. In this paper we present software that implements optical flow motion tracking using the Lucas-Kanade algorithm. It is also integrated with the Harris corner detector, so the algorithm can perform sparse tracking, i.e. tracking of the meaningful pixels only. This substantially lowers the computational burden of the method. Moreover, both parts of the algorithm, i.e. corner selection and tracking, are implemented on the GPU and, as a result, the software is extremely fast, allowing real-time motion tracking on Full HD and even 4K video. To deliver the highest performance, it also supports multi-GPU systems, on which it scales very well.
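
The sparse pipeline described above (Harris corners first, then Lucas-Kanade only at those corners) can be sketched on the CPU with NumPy. This is a minimal, single-level, single-iteration version written for illustration, not the authors' GPU implementation: the window radius `r`, the Harris constant `k`, the corner count `n`, the spacing `min_dist` and the helper `synthetic_frame` are all assumptions, and without an image pyramid it only handles small (sub-pixel to roughly one-pixel) motion.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def box_sum(a: np.ndarray, r: int) -> np.ndarray:
    """Sum of a over (2r+1)x(2r+1) windows; out[i, j] is centred on a[i + r, j + r]."""
    w = 2 * r + 1
    return sliding_window_view(a, (w, w)).sum(axis=(-2, -1))


def harris_corners(img, r=3, k=0.05, n=20, min_dist=8):
    """Return up to n (y, x) points with the largest Harris response."""
    iy, ix = np.gradient(img)  # derivatives along rows (y) and columns (x)
    sxx, syy, sxy = box_sum(ix * ix, r), box_sum(iy * iy, r), box_sum(ix * iy, r)
    resp = sxx * syy - sxy ** 2 - k * (sxx + syy) ** 2  # det(M) - k * trace(M)^2
    h, w = img.shape
    margin = 2 * r  # keep tracking windows away from one-sided border gradients
    picked = []
    for flat in np.argsort(resp, axis=None)[::-1]:  # strongest response first
        i, j = np.unravel_index(flat, resp.shape)
        y, x = int(i) + r, int(j) + r  # back to image coordinates
        if not (margin <= y < h - margin and margin <= x < w - margin):
            continue
        # crude non-maximum suppression: enforce a minimum spacing between corners
        if all((y - py) ** 2 + (x - px) ** 2 >= min_dist ** 2 for py, px in picked):
            picked.append((y, x))
            if len(picked) == n:
                break
    return picked


def lucas_kanade(img0, img1, points, r=3):
    """Single-level, single-iteration Lucas-Kanade flow (vx, vy) at each point."""
    iy, ix = np.gradient(img0)
    it = img1 - img0  # temporal derivative between the two frames
    flows = []
    for y, x in points:
        win = (slice(y - r, y + r + 1), slice(x - r, x + r + 1))
        a = np.stack([ix[win].ravel(), iy[win].ravel()], axis=1)  # (N, 2) spatial gradients
        b = -it[win].ravel()
        v, *_ = np.linalg.lstsq(a, b, rcond=None)  # least-squares solution of A v = b
        flows.append(v)
    return np.array(flows)


def synthetic_frame(dx, dy, size=96):
    """Hypothetical smooth test texture translated by (dx, dy) pixels."""
    yy, xx = np.mgrid[0:size, 0:size].astype(float)
    x, y = xx - dx, yy - dy
    return (np.sin(0.3 * x + 0.1 * y) + np.cos(0.2 * y - 0.15 * x)
            + 0.5 * np.sin(0.25 * x) * np.cos(0.35 * y))


if __name__ == "__main__":
    frame0, frame1 = synthetic_frame(0.0, 0.0), synthetic_frame(0.3, -0.2)
    pts = harris_corners(frame0)
    print(np.median(lucas_kanade(frame0, frame1, pts), axis=0))
```

With a known sub-pixel translation, the median flow over the detected corners should approximate the imposed shift. Restricting the least-squares solve to Harris corners is what makes the tracking sparse: only a few small 2x2 systems are solved per frame instead of one per pixel.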

    G-PAS 2.0 - an improved version of protein alignment tool with an efficient backtracking routine on multiple GPUs

    Several highly efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units). G-PAS (GPU-based Pairwise Alignment Software) was one of them, with several interesting features that made it unique. Nevertheless, some changes were needed to adapt it to a new computational architecture. In this paper we present G-PAS 2.0, a new version of the software for high-throughput alignment. Results show that the new version is nearly a quarter faster on the same hardware, reaching over 20 GCUPS (Giga Cell Updates Per Second).
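
GCUPS counts updates of cells in the dynamic-programming matrix: aligning two 1,000-residue sequences updates 1,000 × 1,000 = 10^6 cells, so 20 GCUPS corresponds to about 20,000 such alignments per second. The pure-Python sketch below shows how the metric relates to a score-only Smith-Waterman recurrence. It is an assumption-laden illustration, not G-PAS code: the linear gap penalty, the scores `match=2`, `mismatch=-1`, `gap=-2`, and the helper names are chosen for the example, and it omits the backtracking step that G-PAS 2.0 implements.

```python
import time


def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                         gap: int = -2) -> int:
    """Best local-alignment score (Smith-Waterman, linear gap penalty), score only."""
    prev = [0] * (len(b) + 1)  # row i-1 of the DP matrix
    best = 0
    for ca in a:
        cur = [0] * (len(b) + 1)  # row i; column 0 stays 0
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)  # one cell update
            best = max(best, cur[j])
        prev = cur
    return best


def gcups(cells: int, seconds: float) -> float:
    """Throughput in giga cell updates per second."""
    return cells / (seconds * 1e9)


if __name__ == "__main__":
    pairs = [("GATTACA" * 20, "GCATGCT" * 20)] * 10
    start = time.perf_counter()
    scores = [smith_waterman_score(a, b) for a, b in pairs]
    elapsed = time.perf_counter() - start
    cells = sum(len(a) * len(b) for a, b in pairs)  # one update per DP-matrix cell
    print(f"{cells} cells in {elapsed:.3f} s -> {gcups(cells, elapsed):.6f} GCUPS")
```

An interpreted loop like this runs many orders of magnitude below the GPU figures quoted above; its purpose is only to make the cell count behind the GCUPS unit explicit.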

    Parallel Architecture Benchmarking: From Embedded Computing to HPC, a FiPS Project Perspective

    12th IEEE International Conference on Embedded and Ubiquitous Computing (EUC 2014), 26 to 28 August 2014; Conference Code: 109350.
    With the growing number of parallel architectures and related programming models, benchmarking becomes very difficult, since parallel programming requires architecture-dependent compilers and languages as well as considerable programming expertise. Beyond comparing architectures with synthetic benchmarks, benchmarking is increasingly used to design specialized systems composed of heterogeneous computing resources that optimize performance or the performance/watt ratio (e.g. embedded systems designers build Systems-on-Chip (SoC) out of dedicated, well-chosen components). In the High-Performance Computing (HPC) domain, systems are designed with symmetric, scalable computing nodes built to deliver the highest performance on a wide variety of applications. However, HPC now faces cost and power consumption issues that motivate the design of heterogeneous systems. This is one of the rationales of the European FiPS project, which proposes to develop a hardware architecture and a software methodology that ease the design of such systems. Thus, a fair comparison between architectures for a given application is of growing importance. Unfortunately, porting the application to all available architectures with their respective programming models is not feasible. To tackle this challenge, we introduced a novel methodology to evaluate and compare parallel architectures in order to ease the programmer's work. Based on micro benchmarks, code profiling and characterization tools, this methodology provides a semi-automatic prediction of the performance of sequential applications on a set of parallel architectures. In addition, the performance estimate is correlated with other cost criteria, such as power or portability effort.
    Originally introduced to target vision-based embedded applications, our methodology is currently being extended to more complex applications from the HPC world. This paper extends our work with new experiments and early results on a real HPC application: DNA sequencing.

    G-DNA – a highly efficient multi-GPU/MPI tool for aligning nucleotide reads

    DNA/RNA sequencing has recently become a primary way for researchers to generate biological data for further analysis. Assembly algorithms are an integral part of this process. However, some of them require pairwise alignment to be applied to a large number of reads. Although several efficient alignment tools have been released over the past few years, including those taking advantage of GPUs (Graphics Processing Units), none of them directly targets high-throughput sequencing data. As a result, a need arose for software that could handle such data as effectively as possible. G-DNA (GPU-based DNA aligner) is the first highly parallel solution optimized to process nucleotide reads (DNA/RNA) from modern sequencing machines. Results show that the software reaches up to 89 GCUPS (Giga Cell Updates Per Second) on a single GPU, making it the fastest tool in its class. Moreover, it scales well on multi-GPU systems, including MPI-based computational clusters, where its performance is measured in TCUPS (Tera CUPS).
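
The abstract does not say how G-DNA spreads alignments across GPUs or MPI ranks. Purely as a hypothetical sketch of one common way to keep devices evenly loaded, the code below uses greedy longest-processing-time (LPT) scheduling: each read pair costs len(a) × len(b) DP cells, and the largest remaining pair always goes to the least-loaded worker. The function names and the cost model are assumptions, not G-DNA's scheduler. Aggregate throughput is then reported in TCUPS (1 TCUPS = 10^12 cell updates per second = 1,000 GCUPS); for scale, 12 GPUs at the quoted 89 GCUPS would total about 1.07 TCUPS if scaling were perfectly linear.

```python
import heapq


def partition_by_cells(pairs, n_workers):
    """Greedy LPT assignment of (read, reference) pairs to workers.

    Each pair costs len(a) * len(b) DP cells; the largest remaining pair always
    goes to the currently least-loaded worker. Returns (buckets of pair indices, loads).
    """
    cost = [len(a) * len(b) for a, b in pairs]
    heap = [(0, w) for w in range(n_workers)]  # (cells assigned so far, worker id)
    buckets = [[] for _ in range(n_workers)]
    loads = [0] * n_workers
    for i in sorted(range(len(pairs)), key=cost.__getitem__, reverse=True):
        load, w = heapq.heappop(heap)  # least-loaded worker; ties go to the lower id
        buckets[w].append(i)
        loads[w] = load + cost[i]
        heapq.heappush(heap, (loads[w], w))
    return buckets, loads


def tcups(total_cells, wall_seconds):
    """Aggregate throughput in tera cell updates per second (1 TCUPS = 1000 GCUPS)."""
    return total_cells / (wall_seconds * 1e12)


if __name__ == "__main__":
    reads = [("ACGT" * k, "ACGT" * 25) for k in (30, 25, 20, 20, 15, 10, 5, 5)]
    buckets, loads = partition_by_cells(reads, n_workers=3)
    print(buckets, loads)
```

Balancing by cell count rather than by number of pairs matters because DP cost grows with the product of the sequence lengths, so a few long reads can otherwise leave one device running long after the others finish.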

    The influence of cognitive reserve on cognition in Parkinson's disease

    There are considerable differences in cognition between individuals with Parkinson's disease (PD), which might be explained by the theory of cognitive reserve. This theory states that premorbid factors, such as high intellectual capacity, provide a buffer against cognitive impairment. This study determines whether cognitive reserve influences cognition in PD. Forty-eight PD patients were included. All were assessed with two proxies of cognitive reserve, tests of cognition, and measures of both disease characteristics and symptoms of depression. After accounting for age, gender, disease characteristics and depression, cognitive reserve was an independent predictor of cognitive performance. In conclusion, cognitive reserve influences cognition in PD: patients with high premorbid intellectual ability show fewer cognitive impairments than patients with low premorbid ability. This indicates that cognitive reserve needs to be taken into account when monitoring the evolution of cognition in PD; however, the results should be verified in a larger patient sample.

    M2DC: A novel heterogeneous hyperscale microserver platform

    The Modular Microserver Datacentre (M2DC) project targets the development of a new class of energy-efficient, TCO-optimized appliances with built-in efficiency and dependability enhancements. The appliances will be easy to integrate with a broad ecosystem of management software and fully software-defined, enabling cost-effective optimization for a variety of demanding future applications. The highly flexible M2DC server platform will enable customization and smooth adaptation to various types of applications, while advanced management strategies and system efficiency enhancements (SEE) will be used to improve energy efficiency, performance, security, and reliability. A data-center-capable abstraction of the server's underlying heterogeneity is provided by OpenStack-based middleware. In this chapter, we focus in particular on the architecture of the server platform, including a dedicated high-speed, low-latency communication infrastructure; give a short introduction to the software stack, including thermal management strategies; and provide an overview of the targeted applications.

    Large bone distractor for open reconstruction of articular fractures of the calcaneus

    The results of operative treatment of two groups of patients with articular fractures of the calcaneus were evaluated. Twenty-three cases were treated surgically using a standard reconstruction procedure. In the second group of 19 patients, a large bone distractor was used; it held the soft tissue flap retracted while aiding articular and tuberosity fragment reduction and improving visualisation by distracting the posterior talocalcaneal joint. After one year, the anatomical and functional results, together with the operative time, were evaluated. All fractures healed with good or very good anatomical results. All cases except those with complications (n = 3) achieved good (n = 28) or very good (n = 11) functional scores. The distractor group had significantly shorter operative times, and fewer personnel were needed during surgery. We conclude that the large bone distractor is a useful tool in open reconstruction of articular calcaneal fractures.