Branch Prediction For Network Processors
Originally designed to favour flexibility over raw packet processing performance, the programmable network processor must now meet ever-increasing line rates while also providing additional processing capabilities. To meet these requirements, networking research has tended to focus on techniques such as offloading computation-intensive tasks to dedicated hardware logic or increasing parallelism. While parallelism retains flexibility, challenges such as load balancing limit its scope; hardware offloading, on the other hand, allows complex algorithms to be implemented at high speed but sacrifices flexibility. The work in this thesis therefore focuses on a more fundamental component of a network processor: the data-plane processing engine.
Performing both system modelling and analysis of packet processing functions, this thesis aims to identify and extract salient information about the performance of multi-processor workloads. Building on a traditional software-based analysis of program workloads, we develop a method for modelling and analysing hardware accelerators as applied to network processors. Using this quantitative information, the thesis proposes an architecture that allows deeply pipelined micro-architectures to be implemented on the data-plane while reducing the branch penalty associated with them.
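As background to the branch-penalty problem the thesis addresses, the C sketch below shows a classic two-bit saturating-counter branch predictor, the textbook mechanism for reducing branch penalties in deeply pipelined designs. It is a generic illustration under assumed parameters (table size, PC indexing), not the architecture proposed in the thesis.

```c
#include <stdint.h>
#include <stdbool.h>

/* Classic 2-bit saturating-counter predictor: counters 0-1 predict
 * not-taken, 2-3 predict taken. Table size and indexing are arbitrary
 * choices for illustration. */
#define PHT_SIZE 1024

static uint8_t pht[PHT_SIZE];          /* pattern history table, zeroed */

bool predict(uint32_t pc) {
    return pht[(pc >> 2) % PHT_SIZE] >= 2;    /* taken if counter is 2 or 3 */
}

void train(uint32_t pc, bool taken) {
    uint8_t *c = &pht[(pc >> 2) % PHT_SIZE];
    if (taken && *c < 3) (*c)++;              /* saturate at 3 */
    if (!taken && *c > 0) (*c)--;             /* saturate at 0 */
}
```

The two-bit hysteresis means a single atypical outcome does not flip the prediction, which is why this scheme tolerates loop-exit branches well.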
Removing and restoring control flow with the Value State Dependence Graph
This thesis studies the practicality of compiling with only data flow information.
Specifically, we focus on the challenges that arise when using the Value
State Dependence Graph (VSDG) as an intermediate representation (IR).
We perform a detailed survey of IRs in the literature in order to discover
trends over time, and we classify them by their features in a taxonomy. We
see how the VSDG fits into the IR landscape, and look at the divide between
academia and the 'real world' in terms of compiler technology. Since most
data flow IRs cannot be constructed for irreducible programs, we perform an
empirical study of irreducibility in current versions of open source software,
and then compare them with older versions of the same software. We also
study machine-generated C code from a variety of different software tools.
We show that irreducibility is no longer a problem, and is becoming less so
with time. We then address the problem of constructing the VSDG. Since
previous approaches in the literature have been poorly documented or ignored
altogether, we give our approach to constructing the VSDG from a common
IR: the Control Flow Graph. We show how our approach is independent of
the source and target language, how it is able to handle unstructured control
flow, and how it is able to transform irreducible programs on the fly. Once the
VSDG is constructed, we implement Lawrence's proceduralisation algorithm
in order to encode an evaluation strategy whilst translating the program into
a parallel representation: the Program Dependence Graph. From here, we
implement scheduling and then code generation using the LLVM compiler.
We compare our compiler framework against several existing compilers, and
show how removing control flow with the VSDG and then restoring it later
can produce high quality code. We also examine specific situations where the
VSDG can put pressure on existing code generators. Our results show that the
VSDG represents a radically different, yet practical, approach to compilation.
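To make the notion of irreducibility concrete: a control-flow graph is irreducible when a loop can be entered at more than one point, so it has no single header block. The hypothetical C fragment below (not taken from the thesis) shows the canonical case; data-flow IRs such as the VSDG cannot be built directly from such code without a transformation such as node splitting.

```c
/* The loop formed by labels L1 and L2 has two entry points: fall-through
 * into L1 and the jump into L2. No single header dominates the loop,
 * so this control-flow graph is irreducible. */
int irreducible(int n, int enter_mid_loop) {
    int acc = 0;
    if (enter_mid_loop) goto L2;   /* second entry into the loop */
L1: acc += 1;
L2: acc += n;
    if (acc < 100) goto L1;        /* back edge closing the loop */
    return acc;
}
```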
Design methodology for a behavioural model for formal verification
Thesis digitised by the Direction des bibliothèques de l'Université de Montréal.
The exploitation of parallelism on shared memory multiprocessors
PhD Thesis
With the arrival of many general-purpose shared memory multiple processor (multiprocessor) computers into the commercial arena during the mid-1980s, a
rift has opened between the raw processing power offered by the emerging
hardware and the relative inability of its operating software to effectively deliver
this power to potential users. This rift stems from the fact that, currently, no
computational model with the capability to elegantly express parallel activity is
mature enough to be universally accepted and used as the basis for programming
languages to exploit the parallelism that multiprocessors offer. To add to this,
there is a lack of software tools to assist programmers in the processes of designing
and debugging parallel programs.
Although much research has been done in the field of programming languages,
no undisputed candidate for the most appropriate language for programming
shared memory multiprocessors has yet been found. This thesis examines why this
state of affairs has arisen and proposes programming language constructs,
together with a programming methodology and environment, to close the ever-widening hardware-to-software gap.
The novel programming constructs described in this thesis are intended for use in imperative languages, yet they exploit the synchronisation inherent in the dataflow model by applying single-assignment semantics to shared data, giving rise to the term shared values. As there are several distinct parallel programming paradigms, matching flavours of shared value are developed to permit the concise expression of each.
The Science and Engineering Research Council
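The abstract does not show what a shared value looks like in code, but single-assignment semantics on shared data suggest a write-once cell whose readers block until the value has been produced. The POSIX-threads sketch below is one plausible reading; all names are illustrative and none of it is the thesis's actual construct.

```c
#include <pthread.h>
#include <stdbool.h>

/* A hypothetical write-once "shared value": assignment happens at most
 * once, and readers block until it has happened, giving dataflow-style
 * synchronisation for free. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    bool            set;
    int             value;
} shared_value;

void sv_init(shared_value *sv) {
    pthread_mutex_init(&sv->lock, NULL);
    pthread_cond_init(&sv->ready, NULL);
    sv->set = false;
}

void sv_write(shared_value *sv, int v) {
    pthread_mutex_lock(&sv->lock);
    if (!sv->set) {                    /* later writes are ignored; a real
                                          design might report an error */
        sv->value = v;
        sv->set = true;
        pthread_cond_broadcast(&sv->ready);
    }
    pthread_mutex_unlock(&sv->lock);
}

int sv_read(shared_value *sv) {
    pthread_mutex_lock(&sv->lock);
    while (!sv->set)                   /* block until the value exists */
        pthread_cond_wait(&sv->ready, &sv->lock);
    int v = sv->value;
    pthread_mutex_unlock(&sv->lock);
    return v;
}
```

A consumer thread that calls sv_read before the producer has called sv_write simply blocks, which is exactly the dataflow-style synchronisation the abstract alludes to.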
Optimisation of the enactment of fine-grained distributed data-intensive work-flows
The emergence of data-intensive science as the fourth science paradigm has posed a data-deluge challenge for enacting scientific work-flows. The scientific community faces an imminent flood of data from the next generation of experiments and simulations, while also dealing with the heterogeneity and complexity of data, applications and execution environments. New scientific work-flows involve execution on distributed and heterogeneous computing resources across organisational and geographical boundaries, processing gigabytes of live data streams and petabytes of archived and simulation data, in various formats and from multiple sources. Managing the enactment of such work-flows requires not only larger storage and faster machines, but also the capability to support scalability and diversity of users, applications, data, computing resources and enactment technologies.
We argue that the enactment process can be made efficient using optimisation techniques
in an appropriate architecture. This architecture should support the creation
of diversified applications and their enactment on diversified execution environments,
with a standard interface, i.e. a work-flow language. The work-flow language should
be both human-readable and suitable for communication between the enactment environments.
The data-streaming model central to this architecture provides a scalable
approach to large-scale data exploitation. Data-flow between computational elements
in the scientific work-flow is implemented as streams. To cope with the exploratory
nature of scientific work-flows, the architecture should support fast work-flow prototyping,
and the re-use of work-flows and work-flow components. Above all, the enactment
process should be easily repeated and automated.
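To make the data-streaming model concrete, the C sketch below connects two processing elements with a pull-based stream: the downstream element requests items from its upstream neighbour, and the sink drives the whole pipeline. This only illustrates streamed data-flow between computational elements; it is not DISPEL, whose syntax the abstract does not show.

```c
#include <stdio.h>

/* A toy pull-based stream: each element asks its upstream element for
 * the next item. next() returns 1 when an item was produced, 0 at end
 * of stream. */
typedef struct stream {
    int (*next)(struct stream *self, int *out);
    struct stream *upstream;
    int state;
} stream;

static int source_next(stream *s, int *out) {   /* emits 0..9 */
    if (s->state >= 10) return 0;
    *out = s->state++;
    return 1;
}

static int square_next(stream *s, int *out) {   /* transforms each item */
    int v;
    if (!s->upstream->next(s->upstream, &v)) return 0;
    *out = v * v;
    return 1;
}

int main(void) {
    stream src = { source_next, NULL, 0 };
    stream sq  = { square_next, &src, 0 };
    int v;
    while (sq.next(&sq, &v))                    /* the sink pulls items */
        printf("%d\n", v);
    return 0;
}
```

Because each stage holds only one item at a time, the pipeline processes an arbitrarily long stream in constant memory, which is the property that makes streaming attractive for data-intensive enactment.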
In this thesis, we present a candidate data-intensive architecture that includes an intermediate
work-flow language, named DISPEL. We create a new fine-grained measurement
framework to capture performance-related data during enactments, and design
a performance database to organise them systematically. We propose a new enactment
strategy to demonstrate that optimisation of data-streaming work-flows can be
automated by exploiting performance data gathered during previous enactments.