Analysis and Design of Tuned Turbo Codes
It has been widely observed that there exists a fundamental trade-off between the minimum (Hamming) distance properties and the iterative decoding convergence behavior of turbo-like codes. While capacity-achieving code ensembles are typically asymptotically bad, in the sense that their minimum distance does not grow linearly with block length, and therefore exhibit an error floor at moderate-to-high signal-to-noise ratios, asymptotically good codes usually converge further away from channel capacity. In this paper, we introduce the concept of tuned turbo codes, a family of asymptotically good hybrid concatenated code ensembles in which asymptotic minimum distance growth rates, convergence thresholds, and code rates can be traded off using two tuning parameters, λ and μ. Decreasing λ reduces the asymptotic minimum distance growth rate in exchange for improved iterative decoding convergence behavior, while increasing λ raises the asymptotic minimum distance growth rate at the expense of worse convergence behavior; the code performance can thus be tuned to fit the desired application. Decreasing μ yields a similar tuning behavior for higher-rate code ensembles.
Comment: Accepted for publication in IEEE Transactions on Information Theory
Parameter-Efficient Finetuning of Transformers for Source Code
Pretrained Transformers achieve state-of-the-art performance on various code-processing tasks but may be too large to deploy. Since software development tools often incorporate modules for different purposes, all of which could potentially share a single instance of the pretrained model, it is natural to apply parameter-efficient fine-tuning to pretrained models of code. In this work, we evaluate two widely used approaches, adapters and LoRA, originally developed for NLP tasks, on four code-processing tasks. We find that although these efficient fine-tuning approaches can achieve performance comparable to or higher than standard full fine-tuning on code understanding tasks, they underperform full fine-tuning on code generation tasks. These results underline the importance of testing efficient fine-tuning approaches on domains other than NLP and motivate future research into efficient fine-tuning for source code.
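For context, LoRA freezes the pretrained weights and trains only small low-rank update matrices injected into selected projection layers. A minimal sketch in Python using the Hugging Face peft library follows; the checkpoint and hyperparameters are illustrative choices, not those evaluated in the paper.

    from transformers import AutoModelForSeq2SeqLM
    from peft import LoraConfig, get_peft_model

    # Hypothetical checkpoint and hyperparameters, for illustration only.
    model = AutoModelForSeq2SeqLM.from_pretrained("Salesforce/codet5-base")
    config = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                        target_modules=["q", "v"])  # T5 attention projections
    model = get_peft_model(model, config)
    model.print_trainable_parameters()  # only the low-rank adapters are trainable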
Non regression testing for the JOREK code
Non-Regression Testing (NRT) aims to check whether software modifications result in undesired behaviour. Assuming the previous behaviour of the application is known, this kind of test makes it possible to identify a possible regression, i.e., a bug. Improving and tuning a parallel code can be a time-consuming and difficult task, especially when people from different scientific fields interact closely. The JOREK code aims at investigating Magnetohydrodynamic (MHD) instabilities in a Tokamak plasma. This paper describes the NRT procedure that has been tuned for this simulation code. Automation of the NRT is one key point to keeping the code healthy in a source code repository.
Comment: No. RR-8134 (2012)
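As a generic illustration of the idea (a sketch, not the actual JOREK procedure), a non-regression test reruns the code on a fixed case and compares scalar diagnostics against stored reference values within a tolerance; the file names and tolerance below are hypothetical.

    import json

    TOL = 1e-10  # relative tolerance; hypothetical value

    def check_non_regression(results_path, reference_path):
        """Compare current run diagnostics against a stored reference."""
        with open(results_path) as f:
            results = json.load(f)
        with open(reference_path) as f:
            reference = json.load(f)
        failures = []
        for key, ref_val in reference.items():
            new_val = results[key]
            # Relative deviation, guarding against a zero reference value.
            err = abs(new_val - ref_val) / max(abs(ref_val), 1e-300)
            if err > TOL:
                failures.append((key, ref_val, new_val, err))
        return failures

    for key, ref, new, err in check_non_regression("run_diagnostics.json",
                                                   "reference.json"):
        print(f"REGRESSION in {key}: ref={ref}, new={new}, rel.err={err:.2e}")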
Tuning and optimization for a variety of many-core architectures without changing a single line of implementation code using the Alpaka library
We present an analysis of optimizing the performance of a single C++11 source code using the Alpaka hardware abstraction library. For this we use the general matrix multiplication (GEMM) algorithm to show that compilers can optimize Alpaka code effectively when key parameters of the algorithm are tuned. We do not intend to rival existing, highly optimized DGEMM implementations, but merely choose this example to demonstrate that Alpaka allows for platform-specific tuning with a single source code. In addition, we analyze the optimization potential available with vendor-specific compilers when confronted with the heavily templated abstractions of Alpaka. We specifically test the code on bleeding-edge architectures such as Nvidia's Tesla P100, Intel's Knights Landing (KNL) and Haswell architectures, as well as IBM's Power8 system. On some of these we are able to reach almost 50% of the peak floating-point performance using the aforementioned means. When adding compiler-specific #pragmas we are able to reach 5 TFLOP/s on a P100 and over 1 TFLOP/s on a KNL system.
Comment: Accepted paper for the P^3MA workshop at ISC 2017 in Frankfurt
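Alpaka itself is a C++ library; as a language-neutral sketch of the kind of algorithm parameter being tuned, here is a blocked GEMM in Python/NumPy whose tile size is the tunable knob (illustrative only, unrelated to the paper's code).

    import numpy as np

    def blocked_gemm(A, B, tile=64):
        """Naive blocked GEMM; 'tile' is the tunable blocking parameter."""
        n, k = A.shape
        k2, m = B.shape
        assert k == k2
        C = np.zeros((n, m))
        for i in range(0, n, tile):
            for j in range(0, m, tile):
                for p in range(0, k, tile):
                    # Accumulate one tile; tile size controls cache reuse.
                    C[i:i+tile, j:j+tile] += (A[i:i+tile, p:p+tile]
                                              @ B[p:p+tile, j:j+tile])
        return C

    # Sweeping the tile size mimics the platform-specific tuning described above.
    A = np.random.rand(512, 512); B = np.random.rand(512, 512)
    for tile in (16, 32, 64, 128):
        C = blocked_gemm(A, B, tile)
        assert np.allclose(C, A @ B)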
Multimode synthesis procedure for microwave filters based on thick inductive windows
For several types of microwave filters for space applications, it is important to manufacture hardware without tuning elements. For this to be possible, one needs a systematic procedure to convert ideal elements, such as resonators and impedance inverters, into actual waveguide lengths and discontinuities. The situation is further complicated by the fact that waveguide discontinuities excite higher-order modes which, interacting with each other, can have very strong effects. In this paper we first outline the theory behind a very efficient computer code for the simulation of microwave filters based on thick inductive windows. We then describe in detail a step-by-step procedure that, based on the code developed, allows for the rapid design of this class of microwave filters without any tuning elements. Two actual design examples are also discussed, and comparisons between measurements and simulations are presented.
Practical Implementation of Lattice QCD Simulation on Intel Xeon Phi Knights Landing
We investigate the implementation of lattice Quantum Chromodynamics (QCD) code on the Intel Xeon Phi Knights Landing (KNL). The most time-consuming part of numerical lattice QCD simulations is the solver of a linear equation for a large sparse matrix that represents the strong interaction among quarks. To establish widely applicable prescriptions, we apply rather general methods for the SIMD architecture of KNL, such as using intrinsics and manual prefetching, to the matrix multiplication and iterative solver algorithms. Based on the performance measured on the Oakforest-PACS system, we discuss performance tuning on KNL as well as code design that facilitates such tuning on SIMD architectures and massively parallel machines.
Comment: 8 pages, 12 figures. Talk given at LHAM'17 "5th International Workshop on Legacy HPC Application Migration" in CANDAR'17 "The Fifth International Symposium on Computing and Networking", to appear in the proceedings
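For readers unfamiliar with the workload: the dominant kernel is an iterative Krylov solve of a large sparse linear system. A minimal conjugate-gradient sketch in Python/NumPy, for a generic Hermitian positive-definite operator (not the paper's tuned KNL implementation), looks like:

    import numpy as np

    def conjugate_gradient(apply_A, b, tol=1e-8, max_iter=1000):
        """Solve A x = b for Hermitian positive-definite A, given only a
        function apply_A(v) that computes the matrix-vector product."""
        x = np.zeros_like(b)
        r = b - apply_A(x)
        p = r.copy()
        rs = np.vdot(r, r).real
        for _ in range(max_iter):
            Ap = apply_A(p)          # the expensive kernel: this is where
            alpha = rs / np.vdot(p, Ap).real  # SIMD intrinsics and prefetching pay off
            x += alpha * p
            r -= alpha * Ap
            rs_new = np.vdot(r, r).real
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs) * p
            rs = rs_new
        return x

    # Example on a random symmetric positive-definite system.
    n = 200
    M = np.random.rand(n, n)
    A = M @ M.T + n * np.eye(n)
    b = np.random.rand(n)
    x = conjugate_gradient(lambda v: A @ v, b)
    assert np.allclose(A @ x, b, atol=1e-6)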
Rapidly reconfigurable optical phase encoder-decoders based on fiber Bragg gratings
We demonstrate the capacity for fast dynamic reconfiguration of optical code-division multiple access (OCDMA) phase encoders/decoders based on fiber Bragg gratings and a thermal phase-tuning technique. The tuning time between two different phase codes is measured to be less than 2 s. An OCDMA system using tunable-phase decoders is compared with a system using fixed-phase decoders; although the system using fixed-phase decoders exhibits a shorter output autocorrelation pulsewidth and lower sidelobes, the system using tunable-phase decoders has the advantages of flexibility and a more relaxed requirement on the input pulsewidth.
Paraiso : An Automated Tuning Framework for Explicit Solvers of Partial Differential Equations
We propose Paraiso, a domain-specific language embedded in the functional programming language Haskell, for automated tuning of explicit solvers of partial differential equations (PDEs) on GPUs as well as multicore CPUs. In Paraiso, one can describe PDE-solving algorithms succinctly using tensor equation notation. Hydrodynamic properties, interpolation methods and other building blocks are described in abstract, modular, re-usable and combinable forms, which lets us generate versatile solvers from a small set of Paraiso source codes.
We demonstrate Paraiso by implementing a compressible hydrodynamics solver. A single source code of less than 500 lines can be used to generate solvers of arbitrary dimensions, for both multicore CPUs and GPUs. We demonstrate both manual annotation-based tuning and evolutionary-computing-based automated tuning of the program.
Comment: 52 pages, 14 figures, accepted for publication in Computational Science and Discovery
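To illustrate what an explicit PDE solver is (a generic Python sketch, not Paraiso's generated code), a forward-Euler update for the 1-D diffusion equation u_t = D u_xx can be written as:

    import numpy as np

    # 1-D diffusion, explicit forward-Euler / central-difference scheme.
    D, dx, dt = 1.0, 0.1, 0.004   # dt chosen so D*dt/dx**2 <= 0.5 (stability)
    x = np.arange(100) * dx
    u = np.exp(-(x - 5.0) ** 2)   # initial Gaussian profile

    for _ in range(1000):
        lap = (np.roll(u, 1) - 2 * u + np.roll(u, -1)) / dx**2  # periodic Laplacian
        u = u + dt * D * lap      # the explicit update a tool like Paraiso generates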
Benchmarking and tuning the MILC code on clusters and supercomputers
Recently, we have benchmarked and tuned the MILC code on a number of architectures, including Intel Itanium and Pentium IV (PIV), dual-CPU Athlon, and the latest Compaq Alpha nodes. Results will be presented for many of these, and we shall discuss some simple code changes that can result in a very dramatic speedup of the KS conjugate gradient on processors with more advanced memory systems such as the PIV, IBM SP and Alpha.
Comment: Lattice2001 (algorithms), 4 pages, includes hep-lat references not in the published version
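The kind of "simple code change" that pays off on such memory systems is typically an access-pattern change. As a hypothetical illustration in Python (not taken from MILC), contiguous streaming access is far friendlier to hardware prefetchers and cache lines than strided access:

    import numpy as np, time

    n = 1 << 24
    a = np.random.rand(n)

    t0 = time.perf_counter()
    a.sum()                  # contiguous pass over all n elements
    t1 = time.perf_counter()
    a[::16].sum()            # strided pass touching 1/16 of the elements,
    t2 = time.perf_counter() # yet wasting most of every cache line fetched

    print(f"contiguous: {t1 - t0:.4f}s  strided: {t2 - t1:.4f}s")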