7,839 research outputs found
Hierarchical Parallelisation of Functional Renormalisation Group Calculations -- hp-fRG
The functional renormalisation group (fRG) has evolved into a versatile tool
in condensed matter theory for studying important aspects of correlated
electron systems. Practical applications of the method often involve a high
numerical effort, motivating the question in how far High Performance Computing
(HPC) can leverage the approach. In this work we report on a multi-level
parallelisation of the underlying computational machinery and show that this
can speed up the code by several orders of magnitude. This in turn can extend
the applicability of the method to otherwise inaccessible cases. We exploit
three levels of parallelisation: Distributed computing by means of Message
Passing (MPI), shared-memory computing using OpenMP, and vectorisation by means
of SIMD units (single-instruction-multiple-data). Results are provided for two
distinct High Performance Computing (HPC) platforms, namely the IBM-based
BlueGene/Q system JUQUEEN and an Intel Sandy-Bridge-based development cluster.
We discuss how certain issues and obstacles were overcome in the course of
adapting the code. Most importantly, we conclude that this vast improvement can
actually be accomplished by introducing only moderate changes to the code, such
that this strategy may serve as a guideline for other researcher to likewise
improve the efficiency of their codes
Using ACL2 to Verify Loop Pipelining in Behavioral Synthesis
Behavioral synthesis involves compiling an Electronic System-Level (ESL)
design into its Register-Transfer Level (RTL) implementation. Loop pipelining
is one of the most critical and complex transformations employed in behavioral
synthesis. Certifying the loop pipelining algorithm is challenging because
there is a huge semantic gap between the input sequential design and the output
pipelined implementation making it infeasible to verify their equivalence with
automated sequential equivalence checking techniques. We discuss our ongoing
effort using ACL2 to certify loop pipelining transformation. The completion of
the proof is work in progress. However, some of the insights developed so far
may already be of value to the ACL2 community. In particular, we discuss the
key invariant we formalized, which is very different from that used in most
pipeline proofs. We discuss the needs for this invariant, its formalization in
ACL2, and our envisioned proof using the invariant. We also discuss some
trade-offs, challenges, and insights developed in course of the project.Comment: In Proceedings ACL2 2014, arXiv:1406.123
A Test Suite for High-Performance Parallel Java
The Java programming language has a number of features that make it attractive for writing high-quality, portable parallel programs. A pure object formulation, strong typing and the exception model make programs easier to create, debug, and maintain. The elegant threading provides a simple route to parallelism on shared-memory machines. Anticipating great improvements in numerical performance, this paper presents a suite of simple programs that indicate how a pure Java Navier-Stokes solver might perform. The suite includes a parallel Euler solver. We present results from a 32-processor Hewlett-Packard machine and a 4-processor Sun server. While speedup is excellent on both machines, indicating a high-quality thread scheduler, the single-processor performance needs much improvement
A Fast Compiler for NetKAT
High-level programming languages play a key role in a growing number of
networking platforms, streamlining application development and enabling precise
formal reasoning about network behavior. Unfortunately, current compilers only
handle "local" programs that specify behavior in terms of hop-by-hop forwarding
behavior, or modest extensions such as simple paths. To encode richer "global"
behaviors, programmers must add extra state -- something that is tricky to get
right and makes programs harder to write and maintain. Making matters worse,
existing compilers can take tens of minutes to generate the forwarding state
for the network, even on relatively small inputs. This forces programmers to
waste time working around performance issues or even revert to using
hardware-level APIs.
This paper presents a new compiler for the NetKAT language that handles rich
features including regular paths and virtual networks, and yet is several
orders of magnitude faster than previous compilers. The compiler uses symbolic
automata to calculate the extra state needed to implement "global" programs,
and an intermediate representation based on binary decision diagrams to
dramatically improve performance. We describe the design and implementation of
three essential compiler stages: from virtual programs (which specify behavior
in terms of virtual topologies) to global programs (which specify network-wide
behavior in terms of physical topologies), from global programs to local
programs (which specify behavior in terms of single-switch behavior), and from
local programs to hardware-level forwarding tables. We present results from
experiments on real-world benchmarks that quantify performance in terms of
compilation time and forwarding table size
- …