535 research outputs found
Computing Platforms for Big Biological Data Analytics: Perspectives and Challenges.
The last decade has witnessed an explosion in the amount of available biological sequence data, due to the rapid progress of high-throughput sequencing projects. However, the biological data amount is becoming so great that traditional data analysis platforms and methods can no longer meet the need to rapidly perform data analysis tasks in life sciences. As a result, both biologists and computer scientists are facing the challenge of gaining a profound insight into the deepest biological functions from big biological data. This in turn requires massive computational resources. Therefore, high performance computing (HPC) platforms are highly needed as well as efficient and scalable algorithms that can take advantage of these platforms. In this paper, we survey the state-of-the-art HPC platforms for big biological data analytics. We first list the characteristics of big biological data and popular computing platforms. Then we provide a taxonomy of different biological data analysis applications and a survey of the way they have been mapped onto various computing platforms. After that, we present a case study to compare the efficiency of different computing platforms for handling the classical biological sequence alignment problem. At last we discuss the open issues in big biological data analytics
Portability and performance in heterogeneous many core Systems
Dissertação de mestrado em InformáticaCurrent computing systems have a multiplicity of computational resources with different
architectures, such as multi-core CPUs and GPUs. These platforms are known as
heterogeneous many-core systems (HMS) and as computational resources evolve they
are o ering more parallelism, as well as becoming more heterogeneous. Exploring these
devices requires the programmer to be aware of the multiplicity of associated architectures,
computing models and development framework. Portability issues, disjoint
memory address spaces, work distribution and irregular workload patterns are major
examples that need to be tackled in order to e ciently explore the computational
resources of an HMS.
This dissertation goal is to design and evaluate a base architecture that enables
the identi cation and preliminary evaluation of the potential bottlenecks and limitations
of a runtime system that addresses HMS. It proposes a runtime system that
eases the programmer burden of handling all the devices available in a heterogeneous
system. The runtime provides a programming and execution model with a uni ed
address space managed by a data management system. An API is proposed in order
to enable the programmer to express applications and data in an intuitive way. Four
di erent scheduling approaches are evaluated that combine di erent data partitioning
mechanisms with di erent work assignment policies and a performance model is used
to provide some performance insights to the scheduler.
The runtime e ciency was evaluated with three di erent applications - matrix multiplication,
image convolution and n-body Barnes-Hut simulation - running in multicore
CPUs and GPUs.
In terms of productivity the results look promising, however, combining scheduling
and data partitioning revealed some ine ciencies that compromise load balancing and
needs to be revised, as well as the data management system that plays a crucial role in
such systems. Performance model driven decisions were also evaluated which revealed
that the accuracy of a performance model is also a compromising component
A framework for efficient execution of data parallel irregular applications on heterogeneous systems
Exploiting the computing power of the diversity of resources available on heterogeneous
systems is mandatory but a very challenging task. The diversity of architectures, execution
models and programming tools, together with disjoint address spaces and di erent
computing capabilities, raise a number of challenges that severely impact on application
performance and programming productivity. This problem is further compounded in the
presence of data parallel irregular applications.
This paper presents a framework that addresses development and execution of data
parallel irregular applications in heterogeneous systems. A uni ed task-based programming
and execution model is proposed, together with inter and intra-device scheduling,
which, coupled with a data management system, aim to achieve performance scalability
across multiple devices, while maintaining high programming productivity. Intradevice
scheduling on wide SIMD/SIMT architectures resorts to consumer-producer kernels,
which, by allowing dynamic generation and rescheduling of new work units, enable
balancing irregular workloads and increase resource utilization.
Results show that regular and irregular applications scale well with the number of
devices, while requiring minimal programming e ort. Consumer-producer kernels are
able to sustain signi cant performance gains as long as the workload per basic work
unit is enough to compensate overheads associated with intra-device scheduling. This
not being the case, consumer kernels can still be used for the irregular application.
Comparisons with an alternative framework, StarPU, which targets regular workloads,
consistently demonstrate signi cant speedups. This is, to the best of our knowledge, the
rst published integrated approach that successfully handles irregular workloads over
heterogeneous systems.This work is funded by National Funds through the FCT - Fundação para a Ciência
e a Tecnologia (Portuguese Foundation for Science and Technology) and by ERDF -
European Regional Development Fund through the COMPETE Programme (operational
programme for competitiveness) within projects PEst-OE/EEI/UI0752/2014
and FCOMP-01-0124-FEDER-010067. Also by the School of Engineering, Universidade
do Minho within project P2SHOCS - Performance Portability on Scalable
Heterogeneous Computing Systems
Parallel For Loops on Heterogeneous Resources
In recent years, Graphics Processing Units (GPUs) have piqued the interest of researchers in scientific computing. Their immense floating point throughput and massive parallelism make them ideal for not just graphical applications, but many general algorithms as well. Load balancing applications and taking advantage of all computational resources in a machine is a difficult challenge, especially when the resources are heterogeneous. This dissertation presents the clUtil library, which vastly simplifies developing OpenCL applications for heterogeneous systems. The core focus of this dissertation lies in clUtil\u27s ParallelFor construct and our novel PINA scheduler which can efficiently load balance work onto multiple GPUs and CPUs simultaneously
High-Level GPU Programming: Domain-Specific Optimization and Inference
When writing computer software one is often forced to balance the need for high run-time performance with high programmer productivity. By using a high-level language it is often possible to cut development times, but this typically comes at the cost of reduced run-time performance. Using a lower-level language, programs can be made very efficient but at the cost of increased development time. Real-time computer graphics is an area where there are very high demands on both performance and visual quality. Typically, large portions of such applications are written in lower-level languages and also rely on dedicated hardware, in the form of programmable graphics processing units (GPUs), for handling computationally demanding rendering algorithms. These GPUs are parallel stream processors, specialized towards computer graphics, that have computational performance more than a magnitude higher than corresponding CPUs. This has revolutionized computer graphics and also led to GPUs being used to solve more general numerical problems, such as fluid and physics simulation, protein folding, image processing, and databases. Unfortunately, the highly specialized nature of GPUs has also made them difficult to program. In this dissertation we show that GPUs can be programmed at a higher level, while maintaining performance, compared to current lower-level languages. By constructing a domain-specific language (DSL), which provides appropriate domain-specific abstractions and user-annotations, it is possible to write programs in a more abstract and modular manner. Using knowledge of the domain it is possible for the DSL compiler to generate very efficient code. We show that, by experiment, the performance of our DSLs is equal to that of GPU programs written by hand using current low-level languages. Also, control over the trade-offs between visual quality and performance is retained. In the papers included in this dissertation, we present domain-specific languages targeted at numerical processing and computer graphics, respectively. These DSL have been implemented as embedded languages in Python, a dynamic programming language that provide a rich set of high-level features. In this dissertation we show how these features can be used to facilitate the construction of embedded languages
- …