Static analysis of energy consumption for LLVM IR programs
Energy models can be constructed by characterizing the energy consumed by
executing each instruction in a processor's instruction set. This can be used
to determine how much energy is required to execute a sequence of assembly
instructions, without the need to instrument or measure hardware.
However, statically analyzing low-level program structures is hard, and the
gap between the high-level program structure and the low-level energy models
needs to be bridged. We have developed techniques for performing a static
analysis on the intermediate compiler representations of a program.
Specifically, we target LLVM IR, a representation used by modern compilers,
including Clang. Using these techniques we can automatically infer an estimate
of the energy consumed when running a function on different platforms, using
different compilers.
One of the challenges in doing so is that of determining an energy cost of
executing LLVM IR program segments, for which we have developed two different
approaches. When this information is used in conjunction with our analysis, we
are able to infer energy formulae that characterize the energy consumption for
a particular program. This approach can be applied to any language targeting
the LLVM toolchain, including C and XC, and to architectures such as the ARM
Cortex-M or XMOS xCORE, with a focus on embedded platforms. Our techniques are
validated on these platforms by comparing the static analysis results to
physical measurements taken from the hardware. Static energy consumption
estimation enables energy-aware software development without requiring
hardware knowledge.
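The instruction-level costing idea described above can be sketched as follows. All opcode names and per-instruction nanojoule costs are illustrative assumptions for the sketch, not measurements from any real processor or from this paper:

```python
# Illustrative instruction energy costs in nanojoules -- assumed
# values, not measurements from any real processor.
ENERGY_NJ = {"add": 1.2, "mul": 3.5, "load": 4.0, "store": 4.2, "br": 0.8}

def block_energy(instructions):
    """Energy of one pass through a straight-line block: the sum of
    the per-instruction costs."""
    return sum(ENERGY_NJ[op] for op in instructions)

def loop_energy(body, trip_count):
    """A loop's cost scales with its statically inferred trip count."""
    return trip_count * block_energy(body)

body = ["load", "mul", "add", "store", "br"]
print(loop_energy(body, 100))  # ~1370 nJ for 100 iterations
```

The static analysis in the paper does the harder part: recovering block structure and trip counts from LLVM IR so that such sums can be formed without running the program.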
Estimating resource bounds for software transactions
We present an effect-based static analysis to calculate upper and lower bounds on the memory resource consumption in a transactional calculus. The calculus is a concurrent variant of Featherweight Java extended with transactional constructs.
The model supports nested and concurrent transactions. The analysis is compositional and takes into account the implicit join synchronizations that arise when more than one thread performs a join-synchronization when jointly committing a transaction. Central to a compositional and precise analysis is capturing, as part of the effects, a tree representation of the future resource consumption and synchronization points (which we call joining commit trees). We show the soundness of the analysis.
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.
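At the heart of most MCTS variants is the UCT selection rule: from a node, descend into the child maximizing mean reward plus an exploration bonus. A minimal sketch, with made-up toy statistics and the standard exploration constant:

```python
import math

def uct_score(child_wins, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 applied to trees: mean reward plus an exploration bonus."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    return child_wins / child_visits + c * math.sqrt(
        math.log(parent_visits) / child_visits
    )

# Toy node statistics: (wins, visits) for three children, 14 parent visits.
children = [(6, 10), (3, 4), (0, 0)]
best = max(range(len(children)),
           key=lambda i: uct_score(*children[i], 14))
print(best)  # 2 -- the unvisited child is selected
```

The exploration term shrinks as a child accumulates visits, so the search gradually concentrates simulations on the most promising moves.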
A speculative execution approach to provide semantically aware contention management for concurrent systems
PhD thesis. Most modern platforms offer ample potential for parallel execution of concurrent programs, yet concurrency control is required to exploit parallelism while maintaining program correctness. Pessimistic concurrency control, featuring blocking synchronization and mutual exclusion, has given way to transactional memory, which allows the composition of concurrent code in a manner more intuitive for the application programmer. An important component in any transactional memory technique, however, is the policy for resolving conflicts on shared data, commonly referred to as the contention management policy.
In this thesis, a Universal Construction is described which provides contention management for software transactional memory. The technique differs from existing approaches in that multiple execution paths are explored speculatively and in parallel. In the resolution of conflicts by state space exploration, we demonstrate that both concurrent conflicts and semantic conflicts can be resolved, promoting multi-threaded program progression.
We define a model of computation called Many Systems, which defines the execution of concurrent threads as a state space management problem. An implementation is then presented based on concepts from the model, and we extend the implementation to incorporate nested transactions. Results are provided which compare the performance of our approach with an established contention management policy, under varying degrees of concurrent and semantic conflicts. Finally, we provide performance results from a number of search strategies when nested transactions are introduced.
Inferring Parametric Energy Consumption Functions at Different Software Levels: ISA vs. LLVM IR
The static estimation of the energy consumed by program
executions is an important challenge, which has applications in program
optimization and verification, and is instrumental in energy-aware software
development. Our objective is to estimate such energy consumption
in the form of functions on the input data sizes of programs. We have developed
a tool for experimentation with static analysis which infers such
energy functions at two levels, the instruction set architecture (ISA) and
the intermediate code (LLVM IR) levels, and reflects it upwards to the
higher source code level. This required the development of a translation
from LLVM IR to an intermediate representation and its integration
with existing components, a translation from ISA to the same representation,
a resource analyzer, an ISA-level energy model, and a mapping
from this model to LLVM IR. The approach has been applied to programs
written in the XC language running on XCore architectures, but
is general enough to be applied to other languages. Experimental results
show that our LLVM IR level analysis is reasonably accurate (less than
6.4% average error vs. hardware measurements) and more powerful than
analysis at the ISA level. This paper provides insights into the trade-off
of precision versus analyzability at these levels.
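The parametric functions inferred have, for simple programs, closed forms over input data sizes. A hypothetical example of the shape of such a function; the coefficient values are invented for illustration, not results from the paper:

```python
def energy_nj(n, a=13.7, b=42.0):
    """Hypothetical inferred energy function E(n) = a*n + b, where n is
    the input data size, a the per-element loop cost, and b the fixed
    overhead (both coefficients are illustrative, not measured)."""
    return a * n + b

print(energy_nj(1000))  # ~13742 nJ for an input of 1000 elements
```

Such a formula lets a developer predict energy for unseen input sizes without re-measuring, which is what makes the static approach useful for energy-aware development.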
Combining dynamic and static scheduling in high-level synthesis
Field Programmable Gate Arrays (FPGAs) are starting to become mainstream devices for custom computing, particularly deployed in data centres. However, using these FPGA devices requires familiarity with digital design at a low abstraction level. In order to enable software engineers without a hardware background to design custom hardware, high-level synthesis (HLS) tools automatically transform a high-level program, for example in C/C++, into a low-level hardware description.
A central task in HLS is scheduling: the allocation of operations to clock cycles. The classic approach to scheduling is static, in which each operation is mapped to a clock cycle at compile time, but recent years have seen the emergence of dynamic scheduling, in which an operation’s clock cycle is only determined at run-time. Both approaches have their merits: static scheduling can lead to simpler circuitry and more resource sharing, while dynamic scheduling can lead to faster hardware when the computation has a non-trivial control flow.
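The static side of the scheduling task described above can be illustrated with a classic ASAP (as-soon-as-possible) schedule, which fixes every operation's clock cycle at compile time from the dataflow dependences alone. The tiny dependence graph and unit latencies below are illustrative assumptions, not taken from the thesis:

```python
def asap_schedule(deps):
    """deps maps each op to the ops it depends on (a DAG).
    Returns op -> clock cycle, assuming unit latency per op."""
    cycle = {}
    def visit(op):
        if op not in cycle:
            # an op starts one cycle after its latest-finishing input
            cycle[op] = 1 + max((visit(d) for d in deps[op]), default=-1)
        return cycle[op]
    for op in deps:
        visit(op)
    return cycle

# st = (a * b): loads in cycle 0, multiply in cycle 1, store in cycle 2
graph = {"ld_a": [], "ld_b": [], "mul": ["ld_a", "ld_b"], "st": ["mul"]}
print(asap_schedule(graph))  # {'ld_a': 0, 'ld_b': 0, 'mul': 1, 'st': 2}
```

A dynamically scheduled circuit would instead decide at run-time when each operation fires, which is what lets it exploit non-trivial control flow at the cost of extra handshaking circuitry.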
This thesis proposes a scheduling approach that combines the best of both worlds. My idea is to use existing program analysis techniques in software designs, such as probabilistic analysis and formal verification, to optimize the HLS hardware. First, this thesis proposes a tool named DASS that uses a heuristic-based approach to identify the code regions in the input program that are amenable to static scheduling and synthesises them into statically scheduled components, also known as static islands, leaving the top-level hardware dynamically scheduled. Second, this thesis addresses a problem of this approach: that the analysis of static islands and their dynamically scheduled surroundings is separate, where one treats the other as a black box. We apply static analysis, including dependence analysis between static islands and their dynamically scheduled surroundings, to optimize the offsets of static islands for high performance. We also apply probabilistic analysis to estimate the performance of the dynamically scheduled part and use this information to optimize the static islands for high area efficiency. Finally, this thesis addresses the problem of conservatism in using sequential control flow designs, which can limit the throughput of the hardware. We show this challenge can be solved by formally proving that certain control flows can be safely parallelised for high performance. This thesis demonstrates how to use automated formal verification to find out-of-order loop pipelining solutions and multi-threading solutions from a sequential program.
Approachable Error Bounded Lossy Compression
Compression is commonly used in HPC applications to move and store data. Traditional lossless compression, however, does not provide adequate compression of the floating-point data often found in scientific codes. Recently, researchers and scientists have turned to lossy compression techniques that approximate the original data rather than reproduce it in order to achieve desired levels of compression. Typical lossy compressors do not bound the errors introduced into the data, which has led to the development of error-bounded lossy compressors (EBLC). These tools provide the desired levels of compression with mathematical guarantees on the errors introduced. However, the current state of EBLC leaves much to be desired. Existing EBLC all have different interfaces, requiring codes to be changed to adopt new techniques; EBLC have many more configuration options than their predecessors, making them more difficult to use; and EBLC typically bound quantities like pointwise errors rather than higher-level metrics such as spectra, p-values, or test statistics that scientists typically use. My dissertation aims to provide a uniform interface to compression and to develop tools that allow application scientists to understand and apply EBLC. This dissertation proposal presents three groups of work: LibPressio, a standard interface for compression and analysis; FRaZ/LibPressio-Opt, frameworks for the automated configuration of compressors using LibPressio; and work on tools for analyzing errors in particular domains.
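The kind of mathematical guarantee an error-bounded lossy compressor provides can be shown with the simplest possible scheme: uniform quantization with bin width 2*eps, which guarantees a pointwise absolute error of at most eps. This is a generic sketch of the guarantee, not the algorithm of any particular compressor mentioned above:

```python
def encode(values, eps):
    """Map each float to an integer bin index; bins are 2*eps wide,
    so the reconstruction error is bounded by eps pointwise."""
    return [round(v / (2 * eps)) for v in values]

def decode(codes, eps):
    """Reconstruct each value as the midpoint of its bin."""
    return [c * 2 * eps for c in codes]

data = [0.0, 0.13, 1.07, -0.49]
eps = 0.1
restored = decode(encode(data, eps), eps)
assert all(abs(a - b) <= eps for a, b in zip(data, restored))
print(restored)
```

Real EBLC add prediction and entropy coding on top of such quantization to achieve high ratios, but the user-facing contract is the same: every reconstructed value lies within the requested bound of the original.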
Fault-tolerant parallel applications using a network of workstations
PhD thesis. It is becoming common to employ a Network Of Workstations, often referred to as a NOW, for
general purpose computing since the allocation of an individual workstation offers good interactive
response. However, there may still be a need to perform very large scale computations which exceed
the resources of a single workstation. It may be that the amount of processing implies an inconveniently
long duration or that the data manipulated exceeds available storage. One possibility is to employ a
more powerful single machine for such computations. However, there is growing interest in seeking a
cheaper alternative by harnessing the significant idle time often observed in a NOW and also possibly
employing a number of workstations in parallel on a single problem. Parallelisation permits use of the
combined memories of all participating workstations, but also introduces a need for communication, and
success in any hardware environment depends on the amount of communication relative to the amount
of computation required. In the context of a NOW, much success is reported with applications which
have low communication requirements relative to computation requirements.
Here it is claimed that there is reason for investigation into the use of a NOW for parallel execution
of computations which are demanding in storage, potentially even exceeding the sum of memory in
all available workstations. Another consideration is that where a computation is of sufficient scale,
some provision for tolerating partial failures may be desirable. However, generic support for storage
management and fault-tolerance in computations of this scale for a NOW is not currently available and
the suitability of a NOW for solving such computations has not been investigated to any large extent.
The work described here is concerned with these issues.
The approach employed is to make use of an existing distributed system which supports nested
atomic actions (atomic transactions) to structure fault-tolerant computations with persistent objects.
This system is used to develop a fault-tolerant "bag of tasks" computation model, where the bag and
shared objects are located on secondary storage.
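The "bag of tasks" structure described above can be sketched in miniature: workers repeatedly draw tasks from a shared bag until it is empty. Here the bag is an in-memory queue for illustration; the thesis instead places the bag and shared objects on secondary storage inside atomic actions, which is what provides the fault tolerance:

```python
import queue
import threading

def run_bag_of_tasks(tasks, worker_fn, n_workers=4):
    """Run worker_fn over every task using a pool of workers that
    each pull from a shared bag until it is empty."""
    bag = queue.Queue()
    for t in tasks:
        bag.put(t)
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = bag.get_nowait()  # draw the next task, if any
            except queue.Empty:
                return  # bag empty: this worker is done
            r = worker_fn(task)
            with lock:
                results.append(r)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results

# Square each number in parallel; completion order is nondeterministic.
print(sorted(run_bag_of_tasks(range(8), lambda x: x * x)))
```

The structure balances load automatically: fast or idle workstations simply draw more tasks, and with a persistent transactional bag, a task whose worker fails is not lost but remains available to be re-drawn.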
In order to understand the factors that affect the performance of large parallel computations on a
NOW, a number of specific applications are developed. The performance of these applications is analysed using a semi-empirical model. The same measurements underlying these performance predictions
may be employed in estimation of the performance of alternative application structures. Using services
provided by the distributed system referred to above, each application is implemented. The implementation allows verification of predicted performance and also permits identification of issues regarding
construction of components required to support the chosen application structuring technique. The work
demonstrates that a NOW certainly offers some potential for gain through parallelisation and that for
large grain computations, the cost of implementing fault tolerance is low.
Engineering and Physical Sciences Research Council.