Fake Run-Time Selection of Template Arguments in C++
C++ does not support run-time resolution of template type arguments. To
circumvent this restriction, we can instantiate a template for all possible
combinations of type arguments at compile time and then select the proper
instance at run time by evaluation of some provided conditions. However, for
templates with multiple type parameters such a solution may easily result in a
branching code bloat. We present a template metaprogramming algorithm called
for_id that allows the user to select the proper template instance at run time
with theoretical minimum sustained complexity of the branching code.
Comment: Objects, Models, Components, Patterns (50th International Conference,
TOOLS 2012
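The hand-written baseline that the abstract describes (instantiate every combination at compile time, then branch at run time) can be sketched as below. This is not the for_id algorithm itself, only an illustration of the problem it addresses; `footprint`, `TypeId`, `select_second`, and `select_instance` are hypothetical names chosen for the example. Nesting the selection one type parameter at a time keeps each switch small, whereas a flat dispatch would need one branch per combination.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical kernel templated on two type parameters.
template <typename T, typename U>
std::size_t footprint() { return sizeof(T) + sizeof(U); }

// Run-time tags standing in for the "provided conditions".
enum class TypeId { Float, Double };

// Second level: T is already fixed, select U at run time.
template <typename T>
std::size_t select_second(TypeId u) {
    switch (u) {
        case TypeId::Float:  return footprint<T, float>();
        case TypeId::Double: return footprint<T, double>();
    }
    throw std::invalid_argument("unknown type id");
}

// First level: select T at run time, then delegate.
// All four instances of footprint<T, U> are generated at compile time.
inline std::size_t select_instance(TypeId t, TypeId u) {
    switch (t) {
        case TypeId::Float:  return select_second<float>(u);
        case TypeId::Double: return select_second<double>(u);
    }
    throw std::invalid_argument("unknown type id");
}

// Usage: the tags would normally come from input data or configuration.
inline void demo() {
    assert(select_instance(TypeId::Float, TypeId::Double)
           == sizeof(float) + sizeof(double));
}
```

With k type parameters of n candidate types each, this layout writes k switches of n cases by hand, while the compiler still emits all n^k instances.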
Automating embedded analysis capabilities and managing software complexity in multiphysics simulation part I: template-based generic programming
An approach for incorporating embedded simulation and analysis capabilities
in complex simulation codes through template-based generic programming is
presented. This approach relies on templating and operator overloading within
the C++ language to transform a given calculation into one that can compute a
variety of additional quantities that are necessary for many state-of-the-art
simulation and analysis algorithms. An approach for incorporating these ideas
into complex simulation codes through general graph-based assembly is also
presented. These ideas have been implemented within a set of packages in the
Trilinos framework and are demonstrated on a simple problem from chemical
engineering.
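A minimal sketch of the templating-plus-operator-overloading idea follows, assuming a single derivative and only `+` and `*`. The `Dual` type and the `residual` function are hypothetical and are not the interface of any Trilinos package; they only show how one templated calculation can yield an additional quantity (here a derivative) when the scalar type is swapped.

```cpp
#include <cassert>

// Hypothetical scalar type carrying a value and one derivative
// (forward-mode automatic differentiation).
struct Dual {
    double val;  // function value
    double der;  // derivative with respect to the input
};

inline Dual operator+(Dual a, Dual b) {
    return {a.val + b.val, a.der + b.der};
}

inline Dual operator*(Dual a, Dual b) {
    // Product rule: (ab)' = a'b + ab'
    return {a.val * b.val, a.der * b.val + a.val * b.der};
}

// The calculation is written once, templated on the scalar type.
template <typename Scalar>
Scalar residual(Scalar x) {
    return x * x * x + x;  // f(x) = x^3 + x
}

inline void demo() {
    // Plain evaluation with double.
    assert(residual(2.0) == 10.0);

    // Same code, evaluated with Dual, also yields f'(2) = 3*4 + 1 = 13.
    Dual r = residual(Dual{2.0, 1.0});
    assert(r.val == 10.0);
    assert(r.der == 13.0);
}
```

Seeding `der` with 1.0 marks `x` as the variable being differentiated; other scalar types could compute other embedded quantities from the same `residual` source.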
Static Computation and Reflection
Thesis (PhD) - Indiana University, Computer Sciences, 2008
Most programming languages do not allow programs to inspect their
static type information or perform computations on it. C++, however,
lets programmers write template metaprograms, which enable programs to
encode static information, perform compile-time computations,
and make static decisions about run-time behavior. Many C++ libraries
and applications use template metaprogramming to build specialized
abstraction mechanisms, implement domain-specific safety checks, and
improve run-time performance.
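Two of the capabilities named above, compile-time computation and static decisions about run-time behavior, can be illustrated with a short sketch. The names `Factorial`, `half_impl`, and `half` are hypothetical and are not taken from the thesis; the code assumes C++11 or later.

```cpp
#include <cassert>
#include <type_traits>

// Compile-time computation: the recursion is unrolled by the compiler.
template <unsigned N>
struct Factorial {
    static constexpr unsigned long value = N * Factorial<N - 1>::value;
};

template <>
struct Factorial<0> {
    static constexpr unsigned long value = 1;
};

static_assert(Factorial<5>::value == 120, "evaluated at compile time");

// Static decision: the overload is chosen from type information alone,
// so no run-time branch is emitted.
template <typename T>
T half_impl(T x, std::true_type) {   // integral types (non-negative input)
    return x >> 1;
}

template <typename T>
T half_impl(T x, std::false_type) {  // all other arithmetic types
    return x / 2;
}

template <typename T>
T half(T x) {
    return half_impl(x, std::is_integral<T>{});
}

inline void demo() {
    assert(half(7) == 3);      // integer path
    assert(half(7.0) == 3.5);  // floating-point path
}
```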
Template metaprogramming is an emergent capability of the C++ type
system, and the C++ language specification is informal and imprecise.
As a result, template metaprogramming often involves heroic
programming feats and often leads to code that is difficult to read and
maintain. Furthermore, many template-based code generation and
optimization techniques rely on particular compiler implementations,
rather than language semantics, for performance gains.
Motivated by the capabilities and techniques of C++ template
metaprogramming, this thesis documents some common programming patterns,
including static computation, type analysis, generative programming, and the
encoding of domain-specific static checks. It also documents notable
shortcomings of current practice, including limited support for reflection,
semantic ambiguity, and other issues that arise from the pioneering nature of
template metaprogramming. Finally, this thesis presents the design of a
foundational programming language, motivated by the analysis of template
metaprogramming, that allows programs to statically inspect type information,
perform computations, and generate code. The language is specified as a core
calculus and its capabilities are presented in an idealized setting.
PyCUDA and PyOpenCL: A Scripting-Based Approach to GPU Run-Time Code Generation
High-performance computing has recently seen a surge of interest in
heterogeneous systems, with an emphasis on modern Graphics Processing Units
(GPUs). These devices offer tremendous potential for performance and efficiency
in important large-scale applications of computational science. However,
exploiting this potential can be challenging, as one must adapt to the
specialized and rapidly evolving computing environment currently exhibited by
GPUs. One way of addressing this challenge is to embrace better techniques and
develop tools tailored to their needs. This article presents one simple
technique, GPU run-time code generation (RTCG), along with PyCUDA and PyOpenCL,
two open-source toolkits that support this technique.
In introducing PyCUDA and PyOpenCL, this article proposes the combination of
a dynamic, high-level scripting language with the massive performance of a GPU
as a compelling two-tiered computing platform, potentially offering significant
performance and productivity advantages over conventional single-tier, static
systems. The concept of RTCG is simple and easily implemented using existing,
robust infrastructure. Nonetheless it is powerful enough to support (and
encourage) the creation of custom application-specific tools by its users. The
premise of the paper is illustrated by a wide range of examples where the
technique has been applied with considerable success.
Comment: Submitted to Parallel Computing, Elsevie
Unwoven Aspect Analysis
Various languages and tools supporting advanced separation of concerns (such as aspect-oriented programming) provide a software developer with the ability to separate functional and non-functional programmatic intentions. Once these separate pieces of the software have been specified, the tools automatically handle interaction points between separate modules, relieving the developer of this chore and permitting more understandable, maintainable code. Many approaches have left traditional compiler analysis and optimization until after the composition has been performed; unfortunately, analyses performed after composition cannot make use of the logical separation present in the original program. Further, for modular systems that can be configured with different sets of features, testing under every possible combination of features may be necessary and time-consuming to avoid bugs in production software. To solve this testing problem, we investigate a feature-aware compiler analysis that runs during composition and discovers features strongly independent of each other. When their independence can be judged, the number of feature combinations that must be separately tested can be reduced. We develop this approach and discuss our implementation. We look forward to future programming languages in two ways: we implement solutions to problems that are conceptually aspect-oriented but for which current aspect languages and tools fail. We study these cases and consider what language designs might provide even more information to a compiler. We describe some features that such a future language might have, based on our observations of current language deficiencies and our experience with compilers for these languages.
Practical Sparse Matrices in C++ with Hybrid Storage and Template-Based Expression Optimisation
Despite the importance of sparse matrices in numerous fields of science,
software implementations remain difficult to use for non-expert users,
generally requiring the understanding of underlying details of the chosen
sparse matrix storage format. In addition, to achieve good performance, several
formats may need to be used in one program, requiring explicit selection and
conversion between the formats. This can be both tedious and error-prone,
especially for non-expert users. Motivated by these issues, we present a
user-friendly and open-source sparse matrix class for the C++ language, with a
high-level application programming interface deliberately similar to the widely
used MATLAB language. This facilitates prototyping directly in C++ and aids the
conversion of research code into production environments. The class internally
uses two main approaches to achieve efficient execution: (i) a hybrid storage
framework, which automatically and seamlessly switches between three underlying
storage formats (compressed sparse column, Red-Black tree, coordinate list)
depending on which format is best suited and/or available for specific
operations, and (ii) a template-based meta-programming framework to
automatically detect and optimise execution of common expression patterns.
Empirical evaluations on large sparse matrices with various densities of
non-zero elements demonstrate the advantages of the hybrid storage framework
and the expression optimisation mechanism.
Comment: extended and revised version of an earlier conference paper
arXiv:1805.0338