3,863 research outputs found
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.Comment: Submitted to Parallel Computing, Elsevie
Ansor : Generating High-Performance Tensor Programs for Deep Learning
High-performance tensor programs are crucial to guarantee efficient execution
of deep neural networks. However, obtaining performant tensor programs for
different operators on various hardware platforms is notoriously challenging.
Currently, deep learning systems rely on vendor-provided kernel libraries or
various search strategies to get performant tensor programs. These approaches
either require significant engineering effort to develop platform-specific
optimization code or fall short of finding high-performance programs due to
restricted search space and ineffective exploration strategy.
We present Ansor, a tensor program generation framework for deep learning
applications. Compared with existing search strategies, Ansor explores many
more optimization combinations by sampling programs from a hierarchical
representation of the search space. Ansor then fine-tunes the sampled programs
with evolutionary search and a learned cost model to identify the best
programs. Ansor can find high-performance programs that are outside the search
space of existing state-of-the-art approaches. In addition, Ansor utilizes a
task scheduler to simultaneously optimize multiple subgraphs in deep neural
networks. We show that Ansor improves the execution performance of deep neural
networks relative to the state-of-the-art on the Intel CPU, ARM CPU, and NVIDIA
GPU by up to , , and , respectively.Comment: Published in OSDI 202
Self-adjustable domain adaptation in personalized ECG monitoring integrated with IR-UWB radar
To enhance electrocardiogram (ECG) monitoring systems in personalized detections, deep neural networks (DNNs) are applied to overcome individual differences by periodical retraining. As introduced previously [4], DNNs relieve individual differences by fusing ECG with impulse radio ultra-wide band (IR-UWB) radar. However, such DNN-based ECG monitoring system tends to overfit into personal small datasets and is difficult to generalize to newly collected unlabeled data. This paper proposes a self-adjustable domain adaptation (SADA) strategy to prevent from overfitting and exploit unlabeled data. Firstly, this paper enlarges the database of ECG and radar data with actual records acquired from 28 testers and expanded by the data augmentation. Secondly, to utilize unlabeled data, SADA combines self organizing maps with the transfer learning in predicting labels. Thirdly, SADA integrates the one-class classification with domain adaptation algorithms to reduce overfitting. Based on our enlarged database and standard databases, a large dataset of 73200 records and a small one of 1849 records are built up to verify our proposal. Results show SADA\u27s effectiveness in predicting labels and increments in the sensitivity of DNNs by 14.4% compared with existing domain adaptation algorithms
- …