Xar-Trek: Run-Time Execution Migration among FPGAs and Heterogeneous-ISA CPUs
Datacenter servers are increasingly heterogeneous: from x86 host CPUs, to ARM
or RISC-V CPUs in NICs/SSDs, to FPGAs. Previous works have demonstrated that
migrating application execution at run-time across heterogeneous-ISA CPUs can
yield significant performance and energy gains, with relatively little
programmer effort. However, FPGAs have often been overlooked in that context:
hardware acceleration using FPGAs involves statically implementing select
application functions, which prohibits dynamic and transparent migration. We
present Xar-Trek, a new compiler and run-time software framework that overcomes
this limitation. Xar-Trek compiles an application for several CPU ISAs and
select application functions for acceleration on an FPGA, allowing execution
migration between heterogeneous-ISA CPUs and FPGAs at run-time. Xar-Trek's
run-time monitors server workloads and migrates application functions to an
FPGA or to heterogeneous-ISA CPUs based on a scheduling policy. We develop a
heuristic policy that uses application workload profiles to make scheduling
decisions. Our evaluations conducted on a system with x86-64 server CPUs, ARM64
server CPUs, and an Alveo accelerator card reveal 88%-1% performance gains over
no-migration baselines.
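The scheduling idea described above can be sketched as follows. This is a hypothetical illustration in the spirit of Xar-Trek's profile-guided heuristic, not the paper's actual implementation: the function names, profile format, and thresholds are all assumptions.

```python
# Illustrative profile-guided scheduling heuristic: pick an execution target
# for a function from its measured per-target speedups and current load.
# All names and thresholds here are invented for illustration.

def choose_target(func, profile, fpga_busy, cpu_load):
    """Pick an execution target for `func` from its workload profile.

    profile: dict mapping function name -> measured speedup per target,
             e.g. {"fpga": 4.2, "arm64": 0.9, "x86_64": 1.0}.
    cpu_load: dict mapping CPU ISA -> current load in [0, 1].
    """
    speedups = profile[func]
    # Prefer the FPGA when it is free and profiling showed the function is
    # profitable to accelerate; otherwise fall back to a CPU ISA.
    if not fpga_busy and speedups.get("fpga", 0.0) > 1.0:
        return "fpga"
    cpu_targets = {t: s for t, s in speedups.items() if t != "fpga"}
    # Weight the profiled speedup against the current load on each CPU ISA.
    return max(cpu_targets, key=lambda t: cpu_targets[t] / (1.0 + cpu_load[t]))

profile = {"encrypt": {"fpga": 4.2, "arm64": 0.9, "x86_64": 1.0}}
load = {"arm64": 0.2, "x86_64": 0.8}
print(choose_target("encrypt", profile, fpga_busy=False, cpu_load=load))  # fpga
print(choose_target("encrypt", profile, fpga_busy=True, cpu_load=load))   # arm64
```

When the FPGA is occupied, the heuristic migrates the function to whichever CPU ISA offers the best load-adjusted profiled speedup, which mirrors the run-time migration decision the abstract describes.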
Efficient Machine-Independent Programming of High-Performance Multiprocessors
Parallel computing is regarded by most computer scientists as the most
likely approach for significantly improving computing power for scientists
and engineers. Advances in programming languages and parallelizing
compilers are making parallel computers easier to use by providing
a high-level portable programming model that protects software
investment. However, experience has shown that simply finding
parallelism is not always sufficient for obtaining good performance
from today's multiprocessors. The goal of this project is to develop
advanced compiler analyses of data and computation decompositions,
thread placement, communication, synchronization, and memory-system
effects needed to exploit performance-critical elements of modern
parallel architectures.
Scalable RDMA performance in PGAS languages
Partitioned global address space (PGAS) languages provide a unique programming model that can span shared-memory multiprocessor (SMP) architectures, distributed memory machines, or clusters of SMPs. Users can program large-scale machines with easy-to-use, shared-memory paradigms. In order to exploit large-scale machines efficiently, PGAS language implementations and their runtime systems must be designed for scalability and performance. The IBM XLUPC compiler and runtime system provide a scalable design through the use of the shared variable directory (SVD). The SVD stores the meta-information needed to access shared data. It is dereferenced, in the worst case, for every shared memory access, thus exposing a potential performance problem. In this paper we present a cache of remote addresses as an optimization that reduces the SVD access overhead and allows the exploitation of native (remote) direct memory accesses. It results in a significant performance improvement while maintaining the runtime's portability and scalability.
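The optimization described above can be sketched as a simple memoization layer in front of the directory lookup. This is a hedged illustration: the class name, key structure, and the stand-in lookup function are assumptions, not the XLUPC runtime's actual data structures.

```python
# Illustrative sketch of caching remote addresses to avoid repeated shared
# variable directory (SVD) dereferences. The SVD walk is simulated by a
# stand-in function; all names here are invented for illustration.

class RemoteAddressCache:
    def __init__(self, svd_lookup):
        self._svd_lookup = svd_lookup  # the expensive directory dereference
        self._cache = {}               # (var, thread) -> remote base address
        self.misses = 0

    def resolve(self, var, thread):
        key = (var, thread)
        addr = self._cache.get(key)
        if addr is None:
            self.misses += 1
            addr = self._svd_lookup(var, thread)  # worst case: remote lookup
            self._cache[key] = addr               # reuse on later accesses
        return addr

def fake_svd_lookup(var, thread):
    # Stand-in for walking the directory to find a partition's base address.
    return hash((var, thread)) & 0xFFFF

cache = RemoteAddressCache(fake_svd_lookup)
a1 = cache.resolve("x", 3)
a2 = cache.resolve("x", 3)   # served from the cache, no second SVD walk
assert a1 == a2 and cache.misses == 1
```

Once an address is cached, subsequent accesses to the same shared variable on the same thread can issue RDMA operations directly, which is the effect the abstract attributes to the optimization.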
Run-time support for parallel object-oriented computing: the NIP lazy task creation technique and the NIP object-based software distributed shared memory
PhD Thesis. Advances in hardware technologies combined with decreased costs
have started a trend towards massively parallel architectures that utilise
commodity components. It is thought unreasonable to expect software
developers to manage the high degree of parallelism that is made
available by these architectures. This thesis argues that a new
programming model is essential for the development of parallel
applications and presents a model which embraces the notions of
object-orientation and implicit identification of parallelism. The new
model allows software engineers to concentrate on development issues,
using the object-oriented paradigm, whilst being freed from the burden
of explicitly managing parallel activity.
To support the programming model, the semantics of an execution
model are defined and implemented as part of a run-time support
system for object-oriented parallel applications. Details of the novel
techniques from the run-time system, in the areas of lazy task creation
and object-based, distributed shared memory, are presented.
The tasklet construct for representing potentially parallel
computation is introduced and further developed by this thesis. Three
caching techniques that take advantage of memory access patterns
exhibited in object-oriented applications are explored. Finally, the
performance characteristics of the introduced run-time techniques are
analysed through a number of benchmark applications.
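The lazy task creation idea behind the tasklet construct can be sketched as follows. This is a minimal illustration under assumed semantics, not the thesis's actual API: potentially parallel work is recorded cheaply, promoted to a real task only if an idle worker steals it, and otherwise run inline by its creator at near-zero overhead.

```python
# Minimal sketch of lazy task creation in the spirit of the tasklet
# construct. All class and method names are invented for illustration.

from collections import deque

class TaskletPool:
    def __init__(self):
        self._pending = deque()  # cheap records of potentially parallel work

    def tasklet(self, fn, *args):
        # Record the work instead of eagerly spawning a task.
        self._pending.append((fn, args))

    def steal(self):
        # An idle worker promotes the oldest record into real parallel work.
        if self._pending:
            fn, args = self._pending.popleft()
            return fn(*args)
        return None

    def run_inline(self):
        # Work nobody stole is executed sequentially by its creator,
        # paying no task-creation overhead.
        results = []
        while self._pending:
            fn, args = self._pending.pop()
            results.append(fn(*args))
        return results

pool = TaskletPool()
pool.tasklet(lambda x: x * x, 4)
pool.tasklet(lambda x: x + 1, 4)
print(pool.steal())        # 16: the oldest record, promoted by a "thief"
print(pool.run_inline())   # [5]: remaining work executed sequentially
```

The point of the technique is that parallelism identified implicitly by the model costs almost nothing when the machine is already saturated, since unstolen tasklets degenerate into ordinary sequential calls.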
Web Engineering for Workflow-based Applications: Models, Systems and Methodologies
This dissertation presents novel solutions for the construction of Workflow-based Web applications: The Web Engineering DSL Framework, a stakeholder-oriented Web Engineering methodology based on Domain-Specific Languages; the Workflow DSL for the efficient engineering of Web-based Workflows with strong stakeholder involvement; the Dialog DSL for the usability-oriented development of advanced Web-based dialogs; the Web Engineering Reuse Sphere enabling holistic, stakeholder-oriented reuse
Speculative execution by using software transactional memory
Dissertation presented at the Faculdade de Ciências e Tecnologia, Universidade Nova de Lisboa, for the degree of Mestre em Engenharia Informática. Many programs sequentially execute operations that take a long time to complete. Some of these operations may return a highly predictable result. If this is the case, speculative execution can improve the overall performance of the program.
Speculative execution is the execution of code whose result may not be needed. Generally it is used as a performance optimization. Instead of waiting for the result of a costly operation, speculative execution can predict the operation's most probable result and continue
executing based on that prediction. If the speculation is later confirmed to be correct, time has been gained. Otherwise, if the speculation is incorrect, the execution based on it must be aborted and re-executed with the correct result.
In this dissertation we propose the design of an abstract process for adding speculative execution to a program through source-to-source transformation. This abstract process is used to define a mechanism and methodology that enable programmers to add speculative execution to the source code of their programs. It is also used in the design of an automatic source-to-source transformation that adds speculative execution to existing programs without user intervention. Finally, we evaluate the performance impact of introducing speculative execution in database clients.
Existing proposals for mechanisms that add speculative execution sacrificed portability in favor of performance; some were designed to be implemented at the kernel or hardware level. The process and mechanisms we propose in this dissertation can add speculative execution to a program's source code independently of the kernel or hardware in use.
From our experiments we have concluded that database clients can improve their performance by using speculative execution. Nothing in the proposed system limits it to database clients: although that was the scope of the case study, we strongly believe that other programs can benefit from the proposed process and mechanisms for introducing speculative execution.
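The predict/continue/validate pattern the dissertation describes can be sketched as follows. This is a hedged illustration: the real mechanism runs the speculative continuation concurrently with the slow operation and uses transactional rollback to abort, whereas this sketch is sequential and all function names are invented.

```python
# Illustrative sketch of speculative execution: run ahead on a predicted
# result, then validate against the real result and re-execute on a
# mispredict (standing in for an STM abort-and-retry).

def speculate(slow_op, predict, continuation):
    guess = predict()
    speculative_result = continuation(guess)  # run ahead on the prediction
    actual = slow_op()                        # costly operation completes
    if actual == guess:
        return speculative_result             # speculation confirmed
    # Mispredict: discard the speculative work and re-execute with the
    # real value, as a transactional abort-and-retry would.
    return continuation(actual)

def slow_query():
    # Stand-in for e.g. a database UPDATE that almost always affects 0 rows.
    return 0

result = speculate(slow_query, predict=lambda: 0,
                   continuation=lambda rows: f"{rows} rows affected")
print(result)  # 0 rows affected
```

In the concurrent version of this pattern, the time spent in `continuation` overlaps the latency of `slow_op`, which is where the performance gain for database clients comes from.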