475 research outputs found
MINIMUM TIME CONTROL OF PARALLELED BOOST CONVERTERS
Demand for electrification is booming across both traditional and emerging technologies. One of the constituent blocks of these electrified systems is power conversion. Power conversion systems are often built by paralleling multiple power converter blocks to improve the performance and reliability of the overall system. An advanced control technique is developed to optimize the system states of heterogeneous power converters in minimum time while maintaining feasible stress levels on the individual converter blocks. Practical implementation of the real-time controller and performance improvement strategies are addressed. Experimental results validating the high-performance control scheme, together with a sensitivity analysis of the system states as a measure of robustness, are also presented.
ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code
Automatic code optimization is a complex process that typically involves the
application of multiple discrete algorithms that modify the program structure
irreversibly. However, these algorithms are often designed monolithically, and
similar analyses must be re-implemented repeatedly because the passes do not
cooperate. To address this issue, modern optimization techniques, such as
equality saturation, allow for exhaustive term rewriting at various levels of
input, thereby simplifying compiler design.
In this paper, we apply equality saturation to optimize the sequential code
used in directive-based GPU programming. Our approach simultaneously reduces
computation and memory accesses while achieving high memory throughput. Our
fully automated framework constructs single-assignment forms of its inputs so
that they can be rewritten in their entirety while preserving dependencies,
and then extracts the optimal candidates.
Through practical benchmarks, we demonstrate a significant performance
improvement on several compilers. Furthermore, we highlight the advantages of
computational reordering and emphasize the significance of memory-access order
for modern GPUs.
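As a loose illustration of the underlying idea, the toy rewriter below exhaustively applies a small rule set to an expression and then extracts the cheapest equivalent form. Everything in it (the rules, the cost model, the tuple encoding of expressions) is invented for this sketch; a real equality saturation engine operates on e-graphs that share equivalent subterms rather than enumerating whole terms.

```python
# Toy exhaustive rewriting in the spirit of equality saturation.
# Rules, cost model, and encoding are invented for illustration only.

RULES = [
    (("mul", "?a", ("const", 2)), ("add", "?a", "?a")),  # x * 2 -> x + x
    (("mul", "?a", ("const", 1)), "?a"),                 # x * 1 -> x
    (("add", "?a", ("const", 0)), "?a"),                 # x + 0 -> x
]

def match(pat, expr, env):
    """Bind ?-variables in pat to subterms of expr; True on success."""
    if isinstance(pat, str) and pat.startswith("?"):
        if pat in env:
            return env[pat] == expr
        env[pat] = expr
        return True
    if not isinstance(pat, tuple) or not isinstance(expr, tuple):
        return pat == expr
    return len(pat) == len(expr) and all(
        match(p, e, env) for p, e in zip(pat, expr))

def subst(pat, env):
    if isinstance(pat, str) and pat.startswith("?"):
        return env[pat]
    if isinstance(pat, tuple):
        return tuple(subst(p, env) for p in pat)
    return pat

def rewrites(expr):
    """Yield every expression reachable by one rewrite anywhere in the tree."""
    for lhs, rhs in RULES:
        env = {}
        if match(lhs, expr, env):
            yield subst(rhs, env)
    if isinstance(expr, tuple):
        for i, child in enumerate(expr):
            for new_child in rewrites(child):
                yield expr[:i] + (new_child,) + expr[i + 1:]

def saturate(expr, limit=10_000):
    """Collect all forms reachable by rewriting until no new ones appear."""
    seen, frontier = {expr}, [expr]
    while frontier and len(seen) < limit:
        frontier = [r for e in frontier for r in rewrites(e) if r not in seen]
        seen.update(frontier)
    return seen

def cost(expr):  # made-up cost model: mul is expensive, add is cheap
    if not isinstance(expr, tuple):
        return 0
    return {"mul": 4, "add": 1}.get(expr[0], 0) + sum(cost(e) for e in expr)

e = ("mul", ("add", ("var", "x"), ("const", 0)), ("const", 2))
best = min(saturate(e), key=cost)  # extract the cheapest equivalent form
print(best)  # -> ('add', ('var', 'x'), ('var', 'x')), i.e. x + x
```

Because every equivalent form is kept until extraction, the rewriter is free to pass through intermediate forms that look worse locally, which is exactly the phase-ordering freedom that motivates equality saturation over one-shot pass pipelines.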
Parallel and Distributed Computing
The 14 chapters presented in this book cover a wide variety of representative works, ranging from hardware design to application development. In particular, the topics addressed include programmable and reconfigurable devices and systems, dependability of GPUs (graphics processing units), network topologies, cache coherence protocols, resource allocation, scheduling algorithms, peer-to-peer networks, large-scale network simulation, and parallel routines and algorithms. In this way, the articles included in this book constitute an excellent reference for engineers and researchers with particular interests in each of these topics in parallel and distributed computing.
Sensitivity Prewarping for Local Surrogate Modeling
In the continual effort to improve product quality and decrease operations
costs, computational modeling is increasingly being deployed to determine
feasibility of product designs or configurations. Surrogate modeling of these
computer experiments via local models, which induce sparsity by only
considering short range interactions, can tackle huge analyses of complicated
input-output relationships. However, narrowing focus to local scale means that
global trends must be re-learned over and over again. In this article, we
propose a framework for incorporating information from a global sensitivity
analysis into the surrogate model as an input rotation and rescaling
preprocessing step. We discuss the relationship between several sensitivity
analysis methods based on kernel regression before describing how they give
rise to a transformation of the input variables. Specifically, we perform an
input warping such that the "warped simulator" is equally sensitive to all
input directions, freeing local models to focus on local dynamics. Numerical
experiments on observational data and benchmark test functions, including a
high-dimensional computer simulator from the automotive industry, provide
empirical validation.
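A minimal sketch of this kind of preprocessing is below. It uses an average outer product of finite-difference gradients as the sensitivity matrix (an assumption for illustration; the article discusses kernel-regression-based estimators), and the simulator is a toy function invented for this example.

```python
# Sketch of sensitivity prewarping: rotate and rescale inputs so that the
# "warped simulator" is equally sensitive in every direction.
import numpy as np

def simulator(X):
    # toy computer model: far more sensitive to x0 than to x1
    return np.sin(5.0 * X[:, 0]) + 0.1 * X[:, 1]

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))

# finite-difference gradients of the simulator at every design point
h = 1e-5
G = np.empty_like(X)
for j in range(X.shape[1]):
    Xp, Xm = X.copy(), X.copy()
    Xp[:, j] += h
    Xm[:, j] -= h
    G[:, j] = (simulator(Xp) - simulator(Xm)) / (2.0 * h)

C = G.T @ G / len(X)               # sensitivity matrix  E[grad f grad f^T]
vals, vecs = np.linalg.eigh(C)
sqrtC = vecs @ np.diag(np.sqrt(vals)) @ vecs.T   # symmetric square root of C

# the warp z = C^{1/2} x; local surrogates would then be trained on
# (Z, simulator(X)) instead of (X, simulator(X))
Z = X @ sqrtC

# sanity check: the warped simulator's sensitivity matrix is the identity,
# so no single input direction dominates the local models
Gz = G @ np.linalg.inv(sqrtC)
Cz = Gz.T @ Gz / len(X)
```

The point of the check at the end is the "equally sensitive to all input directions" property from the abstract: after the warp, global anisotropy has been absorbed into the preprocessing, freeing the local models to spend their capacity on local dynamics.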
Investigating Single Precision Floating General Matrix Multiply in Heterogeneous Hardware
The fundamental operation of matrix multiplication is ubiquitous across a myriad of disciplines. Yet the identification of new optimizations for matrix multiplication remains relevant for emerging hardware architectures and heterogeneous systems. Frameworks such as OpenCL enable computation orchestration on existing systems, and its support in the Intel High Level Synthesis compiler allows users to architect new designs for reconfigurable hardware in C/C++. Using the HARPv2 as a vehicle for exploration, we investigate several of the most notable matrix multiplication optimizations to better understand the performance portability of OpenCL and the implications of such optimizations for this and future heterogeneous architectures. Our results give targeted insights into the applicability of best practices that were developed for existing architectures when they are applied to emerging heterogeneous systems.
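As one concrete example of the class of optimizations such studies evaluate, the sketch below shows loop tiling. In an OpenCL kernel the tiles would live in work-group local memory; plain NumPy is used here purely to show the blocked access pattern, not to reproduce any of the paper's designs.

```python
# Loop tiling for matrix multiply: each (i0, j0, p0) step reuses small tiles
# of A and B, the same data reuse a GPU kernel gets from shared/local memory.
import numpy as np

def matmul_tiled(A, B, tile=32):
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must agree"
    C = np.zeros((n, m), dtype=A.dtype)
    for i0 in range(0, n, tile):          # output row blocks
        for j0 in range(0, m, tile):      # output column blocks
            for p0 in range(0, k, tile):  # reduction blocks
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, p0:p0 + tile]
                    @ B[p0:p0 + tile, j0:j0 + tile]
                )
    return C

rng = np.random.default_rng(1)
A = rng.standard_normal((96, 64)).astype(np.float32)   # SGEMM: float32
B = rng.standard_normal((64, 80)).astype(np.float32)
assert np.allclose(matmul_tiled(A, B), A @ B, atol=1e-3)
```

The ragged edges (dimensions not divisible by the tile size) are handled by slice clamping here; a hand-written kernel would need explicit bounds checks or padding, one of the portability wrinkles such evaluations surface.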
Deep Gaussian Process Emulation using Stochastic Imputation
We propose a novel deep Gaussian process (DGP) inference method for computer
model emulation using stochastic imputation. By stochastically imputing the
latent layers, the approach transforms the DGP into the linked GP, a
state-of-the-art surrogate model formed by linking a system of feed-forward
coupled GPs. This transformation yields a simple yet efficient DGP training
procedure that involves only the optimization of conventional stationary GPs. In
addition, the analytically tractable mean and variance of the linked GP allow
one to compute predictions from DGP emulators quickly and accurately.
We demonstrate the method in a series of synthetic examples and real-world
applications, and show that it is a competitive candidate for efficient DGP
surrogate modeling in comparison to variational inference and fully Bayesian
approaches. A package implementing the method is available at
https://github.com/mingdeyu/DGP
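A heavily simplified caricature of the idea follows; it is not the authors' algorithm or package. The latent layer of a two-layer DGP is "imputed" here by a crude keep-if-better resampling loop (the paper uses a principled stochastic imputation scheme), but it shows the structural payoff: once latent values are filled in, every layer is just a conventional stationary GP fit, and prediction feeds test points through the layers in sequence.

```python
# Caricature of stochastic imputation for a two-layer deep GP.
# The resampling rule and all settings below are invented for illustration.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def fit_gp(X, y):
    return GaussianProcessRegressor(kernel=RBF(0.3), alpha=1e-4).fit(X, y)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 40)[:, None]
y = np.where(x[:, 0] < 0.5, 0.0, 1.0) + 0.02 * rng.standard_normal(40)  # step

w = x[:, 0].copy()                   # initialize latent layer at the input
best = fit_gp(w[:, None], y).log_marginal_likelihood_value_
for it in range(20):
    gp_in = fit_gp(x, w)                               # inner GP: input -> latent
    w_prop = gp_in.sample_y(x, random_state=it)[:, 0]  # stochastic imputation
    lml = fit_gp(w_prop[:, None], y).log_marginal_likelihood_value_
    if lml > best:                   # keep imputations the outer GP prefers
        w, best = w_prop, lml

# with the latent layer imputed, both layers are ordinary stationary GPs;
# prediction pushes test inputs through them in sequence (linked-GP style)
gp_in, gp_out = fit_gp(x, w), fit_gp(w[:, None], y)
xt = np.linspace(0.0, 1.0, 100)[:, None]
yt = gp_out.predict(gp_in.predict(xt)[:, None])
```

Note that this toy accept-if-better loop is a stand-in only; the analytically tractable linked-GP mean and variance that the abstract highlights would replace the plug-in composition in the last line.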
- …