9 research outputs found
Achieving High Performance and High Productivity in Next Generational Parallel Programming Languages
Processor design has turned toward parallelism and heterogeneity
cores to achieve performance and energy efficiency. Developers
find high-level languages attractive because they use abstraction
to offer productivity and portability over hardware complexities.
To achieve performance, some modern implementations of high-level
languages use work-stealing scheduling for load balancing of
dynamically created tasks. Work-stealing is a promising approach
for effectively exploiting software parallelism on parallel
hardware. A programmer who uses work-stealing explicitly
identifies potential parallelism and the runtime then schedules
work, keeping otherwise idle hardware busy while relieving
overloaded hardware of its burden.
However, work-stealing comes with substantial overheads. These
overheads arise as a necessary side effect of the implementation
and hamper parallel performance. In addition to runtime-imposed
overheads, there is a substantial cognitive load associated with
ensuring that parallel code is data-race free. This dissertation
explores the overheads associated with achieving high performance
parallelism in modern high-level languages.
My thesis is that, by exploiting existing underlying mechanisms
of managed runtimes; and by extending existing language design,
high-level languages will be able to deliver productivity and
parallel performance at the levels necessary for widespread
uptake.
The key contributions of my thesis are: 1) a detailed analysis of
the key sources of overhead associated with a work-stealing
runtime, namely sequential and dynamic overheads; 2) novel
techniques to reduce these overheads that use rich features of
managed runtimes such as the yieldpoint mechanism, on-stack
replacement, dynamic code-patching, exception handling support,
and return barriers; 3) comprehensive analysis of the resulting
benefits, which demonstrate that work-stealing overheads can be
significantly reduced, leading to substantial performance
improvements; and 4) a small set of language extensions that
achieve both high performance and high productivity with minimal
programmer effort.
A managed runtime forms the backbone of any modern implementation
of a high-level language. Managed runtimes enjoy the benefits of
a long history of research and their implementations are highly
optimized. My thesis demonstrates that converging these highly
optimized features together with the expressiveness of high-level
languages, gives further hope for achieving high performance and
high productivity on modern parallel hardwar
Proceedings of the 8th Python in Science conference
International audienceThe SciPy conference provides a unique opportunity to learn and affect what is happening in the realm of scientific computing with Python. Attendees have the opportunity to review the available tools and how they apply to specific problems. By providing a forum for developers to share their Python expertise with the wider commercial, academic, and research communities, this conference fosters collaboration and facilitates the sharing of software components, techniques and a vision for high level language use in scientific computing
Techniques of design optimisation for algorithms implemented in software
The overarching objective of this thesis was to develop tools for parallelising, optimising,
and implementing algorithms on parallel architectures, in particular General Purpose
Graphics Processors (GPGPUs). Two projects were chosen from different application areas
in which GPGPUs are used: a defence application involving image compression, and a
modelling application in bioinformatics (computational immunology). Each project had its
own specific objectives, as well as supporting the overall research goal.
The defence / image compression project was carried out in collaboration with the Jet
Propulsion Laboratories. The specific questions were: to what extent an algorithm designed
for bit-serial for the lossless compression of hyperspectral images on-board unmanned
vehicles (UAVs) in hardware could be parallelised, whether GPGPUs could be used to
implement that algorithm, and whether a software implementation with or without GPGPU
acceleration could match the throughput of a dedicated hardware (FPGA) implementation.
The dependencies within the algorithm were analysed, and the algorithm parallelised. The
algorithm was implemented in software for GPGPU, and optimised. During the optimisation
process, profiling revealed less than optimal device utilisation, but no further optimisations
resulted in an improvement in speed. The design had hit a local-maximum of performance.
Analysis of the arithmetic intensity and data-flow exposed flaws in the standard optimisation
metric of kernel occupancy used for GPU optimisation. Redesigning the implementation
with revised criteria (fused kernels, lower occupancy, and greater data locality) led to a new
implementation with 10x higher throughput. GPGPUs were shown to be viable for on-board
implementation of the CCSDS lossless hyperspectral image compression algorithm,
exceeding the performance of the hardware reference implementation, and providing
sufficient throughput for the next generation of image sensor as well.
The second project was carried out in collaboration with biologists at the University of
Arizona and involved modelling a complex biological system – VDJ recombination involved
in the formation of T-cell receptors (TCRs). Generation of immune receptors (T cell receptor
and antibodies) by VDJ recombination is an enormously complex process, which can
theoretically synthesize greater than 1018 variants. Originally thought to be a random
process, the underlying mechanisms clearly have a non-random nature that preferentially
creates a small subset of immune receptors in many individuals. Understanding this bias is a
longstanding problem in the field of immunology. Modelling the process of VDJ
recombination to determine the number of ways each immune receptor can be synthesized,
previously thought to be untenable, is a key first step in determining how this special
population is made. The computational tools developed in this thesis have allowed
immunologists for the first time to comprehensively test and invalidate a longstanding theory
(convergent recombination) for how this special population is created, while generating the
data needed to develop novel hypothesis
Modular Collaborative Program Analysis
With our world increasingly relying on computers, it is important to ensure the quality, correctness, security, and performance of software systems. Static analysis that computes properties of computer programs without executing them has been an important method to achieve this for decades. However, static analysis faces major chal-
lenges in increasingly complex programming languages and software systems and increasing and sometimes conflicting demands for soundness, precision, and scalability. In order to cope with these challenges, it is necessary to build static analyses for complex problems from small, independent, yet collaborating modules that can be developed in isolation and combined in a plug-and-play manner.
So far, no generic architecture to implement and combine a broad range of dissimilar static analyses exists. The goal of this thesis is thus to design such an architecture and implement it as a generic framework for developing modular, collaborative static analyses. We use several, diverse case-study analyses from which we systematically derive requirements to guide the design of the framework. Based on this, we propose the use of a blackboard-architecture style collaboration of analyses that we implement in the OPAL framework. We also develop a formal model of our architectures core concepts and show how it enables freely composing analyses while retaining their soundness guarantees.
We showcase and evaluate our architecture using the case-study analyses, each of which shows how important and complex problems of static analysis can be addressed using a modular, collaborative implementation style. In particular, we show how a modular architecture for the construction of call graphs ensures consistent soundness of different algorithms. We show how modular analyses for different aspects of immutability mutually benefit each other. Finally, we show how the analysis of method purity can benefit from the use of other complex analyses in a collaborative manner and from exchanging different analysis implementations that exhibit different characteristics. Each of these case studies improves over the respective state of the art in terms of soundness, precision, and/or scalability and shows how our architecture enables experimenting with and fine-tuning trade-offs between these qualities
Design and implementation of an array language for computational science on a heterogeneous multicore architecture
The packing of multiple processor cores onto a single chip has become a mainstream solution to fundamental physical issues relating to the microscopic scales employed in the manufacture of semiconductor components. Multicore architectures provide lower clock speeds per core, while aggregate floating-point capability continues to increase.
Heterogeneous multicore chips, such as the Cell Broadband Engine (CBE) and modern graphics chips, also address the related issue of an increasing mismatch between high processor speeds, and huge latency to main memory. Such chips tackle this memory wall by the provision of addressable caches; increased bandwidth to main memory; and fast thread context switching. An associated cost is often reduced functionality of the individual accelerator cores; and the increased complexity involved in their programming.
This dissertation investigates the application of a programming language supporting the first-class use of arrays; and capable of automatically parallelising array expressions; to the heterogeneous multicore domain of the CBE, as found in the Sony PlayStation 3 (PS3). The language is a pre-existing and well-documented proper subset of Fortran, known as the ‘F’ programming language. A bespoke compiler, referred to as E , is developed to support this aim, and written in the Haskell programming language.
The output of the compiler is in an extended C++ dialect known as Offload C++, which targets the PS3. A significant feature of this language is its use of multiple, statically typed, address spaces. By focusing on generic, polymorphic interfaces for both the generated and hand constructed code, a number of interesting design patterns relating to the memory locality are introduced.
A suite of medium-sized (100-700 lines), real-world benchmark programs are used to evaluate the performance, correctness, and scalability of the compiler technology. Absolute speedup values, well in excess of one, are observed for all of the programs.
The work ultimately demonstrates that an array language can significantly reduce the effort expended to utilise a parallel heterogeneous multicore architecture, while retaining high performance. A substantial, related advantage in using standard ‘F’ is that any Fortran compiler can create debuggable, and competitively performing serial programs
Social work with airports passengers
Social work at the airport is in to offer to passengers social services. The main
methodological position is that people are under stress, which characterized by a
particular set of characteristics in appearance and behavior. In such circumstances
passenger attracts in his actions some attention. Only person whom he trusts can help him
with the documents or psychologically
On Cyprian Norwid. Studies and Essays
The book is the second volume of an extensive four-volume monograph devoted to the work of Cyprian Norwid (1821–1883), one of the most outstanding Polish authors. The impact of Norwid’s oeuvre does not fade, as he addresses fundamental and timeless issues, such as the moral and spiritual condition of man or his place in the world and history, and seeks to answer universal questions. The book contains an extensive selection of contributions by eminent researchers, which represent different approaches to the poet’s work. They cover various areas of research, including interpretation, thematology, genology, and editing
On Cyprian Norwid. Studies and Essays
The book is the second volume of an extensive four-volume monograph devoted to the work of Cyprian Norwid (1821–1883), one of the most outstanding Polish authors. The impact of Norwid’s oeuvre does not fade, as he addresses fundamental and timeless issues, such as the moral and spiritual condition of man or his place in the world and history, and seeks to answer universal questions. The book contains an extensive selection of contributions by eminent researchers, which represent different approaches to the poet’s work. They cover various areas of research, including interpretation, thematology, genology, and editing