Multidimensional computation and visualisation for marine controlled source electromagnetic methods
The controlled source electromagnetic method is improving the search for oil and gas in marine settings and is becoming an integral component of many exploration toolkits. While the level of detail and benefit obtained from recorded electromagnetic data sets is limited by the tools available, interpretation is fundamentally restricted by non-unique and equivalent solutions. I create the tools necessary to rapidly compute and visualise multi-dimensional electromagnetic fields generated for a variety of controlled source electromagnetic surveys. This thesis is divided into two parts: the creation of an electromagnetic software framework and its research applications.
The creation of a new electromagnetic software framework is covered in Part I. Steps to create and test a modern electromagnetic data structure, three-dimensional visualisation and an interactive graphical user interface from the ground up are presented. Bringing together several computer science disciplines, ranging from parallel computing, networking and human-computer interaction to three-dimensional visualisation, a package specifically tailored to marine controlled source electromagnetic computation is formed. The electromagnetic framework comprises approximately 100,000 lines of new Java code and several third-party libraries, which provide low-level cross-platform graphics, networking and execution functionality. The software provides a generic framework to integrate most computational engines and algorithms into a coherent global electromagnetic package, enabling interactive forward modelling, inversion and visualisation of electromagnetic data.
Part II comprises several research applications of the developed electromagnetic software framework. Cloud computing and streamline visualisation are covered to solve several problems in modern controlled source electromagnetic methods.
Large 3D electromagnetic modelling and inversion may require days or even weeks on a single-threaded personal computer. A massively parallelised electromagnetic forward modelling and inversion method was created to dramatically reduce computational time. The developed 'macro' parallelisation method reduced computational time by several orders of magnitude with relatively little additional effort and without modification of the internal electromagnetic algorithm. The air wave is a significant component of marine controlled source electromagnetic surveys; however, there is controversy and confusion over its definition. The air wave has been described as a reflected, refracted, direct or diffusing wave, which has led to confusion over its physical reality.
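The 'macro' parallelisation idea, distributing whole, independent forward-modelling runs across workers while leaving the solver untouched, can be sketched as follows. `forward_model` and `run_survey` are hypothetical names; the toy response (attenuation at 1000 m depth using the skin depth for an assumed 1 ohm-m half-space) only stands in for a real electromagnetic solver. A thread pool keeps the sketch self-contained; with a real solver, each worker would more likely launch a separate process, cluster job or cloud instance.

```python
from concurrent.futures import ThreadPoolExecutor
import math

def forward_model(frequency_hz: float) -> float:
    """Hypothetical stand-in for an unmodified, single-threaded EM forward solver.

    Returns a toy attenuation factor exp(-z / delta) at depth z = 1000 m, where
    delta = 503 * sqrt(rho / f) is the skin depth for an assumed rho = 1 ohm-m.
    """
    skin_depth_m = 503.0 * math.sqrt(1.0 / frequency_hz)
    return math.exp(-1000.0 / skin_depth_m)

def run_survey(frequencies, max_workers=4):
    # Each frequency (or source position) is an independent job, so whole runs
    # are distributed across workers; the solver itself is never modified.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(forward_model, frequencies))
```

Because `pool.map` preserves input order, results line up with the survey configuration list regardless of which worker finishes first.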
A metadata-enhanced framework for high performance visual effects
This thesis is devoted to reducing the interactive latency of image processing computations in
visual effects. Film and television graphic artists depend upon low-latency feedback to receive
a visual response to changes in effect parameters. We tackle latency with a domain-specific optimising
compiler which leverages high-level program metadata to guide key computational and
memory hierarchy optimisations. This metadata encodes static and dynamic information about
data dependence and patterns of memory access in the algorithms constituting a visual effect –
features that are typically difficult to extract through program analysis – and presents it to the
compiler in an explicit form. By using domain-specific information as a substitute for program
analysis, our compiler is able to target a set of complex source-level optimisations that a vendor
compiler does not attempt, before passing the optimised source to the vendor compiler for
lower-level optimisation.
Three key metadata-supported optimisations are presented. The first is an adaptation of
space and schedule optimisation – based upon well-known compositions of the loop fusion and
array contraction transformations – to the dynamic working sets and schedules of a runtime-parameterised
visual effect. This adaptation sidesteps the costly solution of runtime code generation
by specialising static parameters in an offline process and exploiting dynamic metadata to
adapt the schedule and contracted working sets at runtime to user-tunable parameters. The second
optimisation comprises a set of transformations to generate SIMD ISA-augmented source code.
Our approach differs from autovectorisation by using static metadata to identify parallelism, in
place of data dependence analysis, and runtime metadata to tune the data layout to user-tunable
parameters for optimal aligned memory access. The third optimisation comprises a related set
of transformations to generate code for SIMT architectures, such as GPUs. Static dependence
metadata is exploited to guide large-scale parallelisation for tens of thousands of in-flight threads.
Optimal use of the alignment-sensitive, explicitly managed memory hierarchy is achieved by identifying
inter-thread and intra-core data sharing opportunities in memory access metadata.
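Loop fusion combined with array contraction, the composition the first optimisation builds on, can be illustrated on a hypothetical two-stage effect: a horizontal 3-tap blur followed by a brightness gain. Unfused, the blur writes a full intermediate image; fused, each blurred row is consumed immediately, so the intermediate shrinks to a single row. The function names are illustrative, and this is a generic instance of the transformation, not output of the thesis's compiler.

```python
def blur_row(row):
    """3-tap horizontal box blur with clamped borders."""
    n = len(row)
    return [(row[max(i - 1, 0)] + row[i] + row[min(i + 1, n - 1)]) / 3.0
            for i in range(n)]

def unfused(image, gain):
    # Stage 1 materialises a full-size intermediate image...
    blurred = [blur_row(row) for row in image]
    # ...which stage 2 then reads back in a separate pass.
    return [[gain * v for v in row] for row in blurred]

def fused_contracted(image, gain):
    # Both stages run in one loop over rows; the intermediate is contracted
    # from a whole image to a single row that is consumed immediately.
    out = []
    for row in image:
        tmp = blur_row(row)
        out.append([gain * v for v in tmp])
    return out
```

Both versions compute the same result; the fused one keeps its working set small enough to stay in cache, which is the point of the transformation.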
A detailed performance analysis of these optimisations is presented for two industrially developed
visual effects. In our evaluation we demonstrate up to 8.1x speed-ups on Intel and AMD
multicore CPUs and up to 6.6x speed-ups on NVIDIA GPUs over our best hand-written implementations
of these two effects. Programmability is enhanced by automating the generation of
SIMD and SIMT implementations from a single programmer-managed scalar representation.
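One common way to obtain aligned vector accesses when an image dimension is only known at runtime, as with the user-tunable parameters mentioned above, is to pad each row to a multiple of the vector width. The sketch below computes such a padded stride; `padded_stride`, the 32-byte vector width (AVX-sized) and 4-byte floats are assumptions, and this is a generic technique rather than the specific layout scheme used in the thesis.

```python
def padded_stride(width_elems: int, elem_bytes: int = 4, vector_bytes: int = 32) -> int:
    """Smallest row stride (in elements) >= width whose byte size is a multiple of vector_bytes."""
    if vector_bytes % elem_bytes:
        raise ValueError("vector width must be a multiple of the element size")
    lanes = vector_bytes // elem_bytes       # elements per SIMD register
    return -(-width_elems // lanes) * lanes  # ceiling division, then scale back up
```

For a user-selected width of 1921 floats the stride becomes 1928, so every row starts on a 32-byte boundary, provided the buffer base address is itself aligned.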
Molecular simulations and visualization: introduction and overview
Here we provide an introduction and overview of current progress in the field of molecular simulation and visualization, touching on the following topics: (1) virtual and augmented reality for immersive molecular simulations; (2) advanced visualization and visual analytic techniques; (3) new developments in high performance computing; and (4) applications and model building.
Profile-driven parallelisation of sequential programs
Traditional parallelism detection in compilers is performed by means of static analysis
and more specifically data and control dependence analysis. The information that
is available at compile time, however, is inherently limited and therefore restricts the
parallelisation opportunities. Furthermore, applications written in C – which represent
the majority of today’s scientific, embedded and system software – utilise many low-level
features and an intricate programming style that forces the compiler to even more
conservative assumptions. Despite the numerous proposals to handle this uncertainty
at compile time using speculative optimisation and parallelisation, the software industry
still lacks pragmatic approaches that extract coarse-grain parallelism to exploit
the multiple processing units of modern commodity hardware.
This thesis introduces a novel approach for extracting and exploiting multiple forms
of coarse-grain parallelism from sequential applications written in C. We utilise profiling
information to overcome the limitations of static data and control-flow analysis
enabling more aggressive parallelisation. Profiling is performed using an instrumentation
scheme operating at the Intermediate Representation (IR) level of the compiler.
In contrast to existing approaches that depend on low-level binary tools and debugging
information, IR-profiling provides precise and direct correlation of profiling information
back to the IR structures of the compiler. Additionally, our approach is orthogonal to
existing automatic parallelisation approaches and additional fine-grain parallelism may
be exploited.
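The core idea of using profiling to complement static dependence analysis can be sketched by recording, for every loop iteration, which memory locations are read and written, and then checking whether any written location is touched by more than one iteration. `TracedArray` and `profile_loop` are hypothetical: a Python wrapper standing in for IR-level instrumentation. Note that a dependence-free profile only shows the loop behaved as parallel for the profiled input, so profiling alone cannot prove that parallelisation is safe.

```python
class TracedArray:
    """Hypothetical instrumentation: logs (iteration, index, kind) for each access."""
    def __init__(self, data):
        self.data = list(data)
        self.log = []
        self.iteration = None

    def __getitem__(self, i):
        self.log.append((self.iteration, i, "R"))
        return self.data[i]

    def __setitem__(self, i, value):
        self.log.append((self.iteration, i, "W"))
        self.data[i] = value

def has_loop_carried_dependence(log):
    # For each index, collect the iterations that touched it and note which indices were written.
    writers, touchers = {}, {}
    for it, idx, kind in log:
        touchers.setdefault(idx, set()).add(it)
        if kind == "W":
            writers.setdefault(idx, set()).add(it)
    # A written location touched by more than one iteration means a RAW, WAR or WAW
    # dependence crosses iterations.
    return any(len(touchers[idx]) > 1 for idx in writers)

def profile_loop(body, array, n):
    for i in range(n):
        array.iteration = i
        body(array, i)
    return has_loop_carried_dependence(array.log)
```

An element-wise update such as `a[i] = a[i] * 2` reports no cross-iteration dependence, whereas a prefix sum `a[i] += a[i - 1]` does.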
We demonstrate the applicability and versatility of the proposed methodology using
two studies that target different forms of parallelism. First, we focus on the exploitation
of loop-level parallelism that is abundant in many scientific and embedded
applications. We evaluate our parallelisation strategy on the NAS and SPEC FP
benchmarks and two different multi-core platforms (a shared-memory Intel Xeon SMP
and a heterogeneous distributed-memory IBM Cell blade). Empirical evaluation shows
that our approach not only yields significant improvements when compared with state-of-
the-art parallelising compilers, but comes close to and sometimes exceeds the performance
of manually parallelised codes. On average, our methodology achieves 96%
of the performance of the hand-tuned parallel benchmarks on the Intel Xeon platform,
and a significant speedup for the Cell platform. The second study addresses
the problem of partially sequential loops, typically found in implementations of multimedia
codecs. We develop a more powerful whole-program representation based on the Program Dependence Graph (PDG) that supports profiling, partitioning and code generation
for pipeline parallelism. In addition we demonstrate how this enhances
conventional pipeline parallelisation by incorporating support for multi-level loops and
pipeline stage replication in a uniform and automatic way. Experimental results using a
set of complex multimedia and stream processing benchmarks confirm the effectiveness
of the proposed methodology, which yields speedups of up to 4.7x on an eight-core Intel Xeon
machine.
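Pipeline parallelism with stage replication, as described for partially sequential loops, can be sketched with threads and queues: a sequential first stage tags items with sequence numbers, a replicated stateless middle stage processes them in any order, and a sequential last stage restores the original order. `run_pipeline`, the stage function and the replication factor are illustrative assumptions, not code generated by the thesis's compiler; in CPython the GIL limits speedup for pure-Python stages, so the sketch shows structure rather than performance.

```python
import queue
import threading

SENTINEL = None

def run_pipeline(items, work, replicas=3):
    items = list(items)
    q_in, q_out = queue.Queue(), queue.Queue()

    def worker():
        # Replicated stateless stage: any replica may process any item.
        while True:
            job = q_in.get()
            if job is SENTINEL:
                break
            seq, value = job
            q_out.put((seq, work(value)))

    threads = [threading.Thread(target=worker) for _ in range(replicas)]
    for t in threads:
        t.start()
    # Stage 1 (sequential): tag each item with its sequence number.
    for seq, value in enumerate(items):
        q_in.put((seq, value))
    for _ in threads:
        q_in.put(SENTINEL)
    # Stage 3 (sequential): reorder results so output order matches input order.
    pending, results, next_seq = {}, [], 0
    for _ in range(len(items)):
        seq, value = q_out.get()
        pending[seq] = value
        while next_seq in pending:
            results.append(pending.pop(next_seq))
            next_seq += 1
    for t in threads:
        t.join()
    return results
```

Replicating only the stateless stage keeps the ordered stages sequential, which is what allows a partially sequential loop to be pipelined without violating its dependences.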
Automatic tolerance inspection through Reverse Engineering: a segmentation technique for plastic injection moulded parts
This work studies segmentation procedures to recognise features in a Reverse Engineering (RE) application oriented to computer-aided tolerance inspection of the injection moulding die set-up needed to manufacture electromechanical components. It discusses all steps of the procedure, from initial acquisition to final measurement data management, but the original developments focus on the RE post-processing method, which aims to automate surface recognition and, in turn, the inspection process.
As explained in the first two chapters, automation of the inspection process depends mainly on feature recognition after the segmentation process. This work presents a voxel-based approach that aims to reduce the computational effort of tessellation and curvature analysis, with or without filtering. A voxel structure approximates the shape through parallelepipeds that each contain a small subset of points. In this sense, it acts as a filter, since the number of voxels is smaller than the total number of points, and also as a local approximation of the surface, if proper fitting models are applied.
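The voxel structure described above can be sketched as a uniform grid that buckets points by integer cell index and replaces each occupied cell with a local approximation. Here the approximation is just the centroid, a deliberate simplification of the surface-fitting models the work refers to; `voxelise`, `voxel_centroids` and the cell size are illustrative assumptions.

```python
from collections import defaultdict
import math

def voxelise(points, size):
    """Group 3D points into cubic voxels of edge length `size`."""
    cells = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / size) for c in p)  # integer voxel index
        cells[key].append(p)
    return cells

def voxel_centroids(points, size):
    # One representative per occupied voxel: fewer elements than raw points,
    # so the structure acts as a filter before curvature analysis.
    cells = voxelise(points, size)
    return {k: tuple(sum(c) / len(pts) for c in zip(*pts)) for k, pts in cells.items()}
```

The cell size trades filtering strength against geometric detail, which is the kind of threshold a sensitivity study would explore.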
Through sensitivity analysis and industrial applications, the limits and prospects of the proposed algorithms are discussed and validated in terms of accuracy and time savings. Validation case studies are taken from real applications at ABB Sace S.p.A., which promoted this research. Plastic injection moulding of electromechanical components involves a time-consuming die set-up. This is due to the need for dies with many cavities, which during the cooling phase may experience different stamping conditions and thus defects, including lengths outside their dimensional tolerance and geometrical errors.
To increase industrial efficiency, automating the inspection requires not only automatic feature recognition but also a computer-aided inspection protocol (path planning and inspection data management). For this reason, these steps are also addressed, as the natural framework of the research activity.
The work is structured in six chapters. Chapter 1 introduces the whole procedure, focusing on the reasons for and benefits of applying RE techniques in industrial engineering. Chapter 2 analyses the acquisition issues and methods related to our application, describing: (a) the selected hardware; (b) the strategy adopted for point cloud acquisition. Chapter 3 describes the proposed RE post-processing, together with a state-of-the-art review of data segmentation and surface reconstruction. Chapter 4 discusses the proposed algorithms through sensitivity studies of the thresholds and parameters used in the segmentation and surface reconstruction phases. Chapter 5 briefly explains the inspection workflow, PDM requirements and solution, together with a preliminary assessment of the measurements and their reliability. These three chapters (3, 4 and 5) end with sections called “Discussion”, in which specific considerations are given. Finally, Chapter 6 gives examples of the proposed segmentation technique in industrial applications through specific case studies.
Guided Automatic Binary Parallelisation
For decades, the software industry has amassed a vast repository of pre-compiled libraries and executables that are still valuable and actively in use. However, for a significant fraction of these binaries, most of the source code is absent or is written in old languages, making it practically impossible to recompile them for new generations of hardware. As the number of cores in chip multi-processors (CMPs) continues to scale, the performance of this legacy software becomes increasingly sub-optimal. Rewriting it as new optimised and parallel software would be a time-consuming and expensive task. Without source code, existing automatic performance-enhancing and parallelisation techniques are not applicable to legacy software or to parts of new applications linked with legacy libraries.
In this dissertation, three tools are presented to address the challenge of optimising legacy binaries. The first, GBR (Guided Binary Recompilation), is a tool that recompiles stripped application binaries without the need for the source code or relocation information. GBR performs static binary analysis to determine how recompilation should be undertaken, and produces a domain-specific hint program. This hint program is loaded and interpreted by the GBR dynamic runtime, which is built on top of the open-source dynamic binary translator DynamoRIO. In this manner, complex recompilation of the target binary is carried out to achieve optimised execution on a real system. The problem of limited dataflow and type information is addressed through cooperation between the hint program and JIT optimisation. The utility of GBR is demonstrated with software prefetching and vectorisation optimisations, which improve performance compared to native execution of the original binaries.
The second tool is called BEEP (Binary Emulator for Estimating Parallelism), an extension to GBR for binary instrumentation.
BEEP is used to identify potential thread-level parallelism through static binary analysis and binary instrumentation.
BEEP performs preliminary static analysis on binaries and encodes all statically-undecided questions into a hint program.
The hint program is interpreted by GBR so that on-demand binary instrumentation codes are inserted to answer the questions from runtime information.
BEEP incorporates a few parallel cost models to evaluate identified parallelism under different parallelisation paradigms.
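As an example of the kind of parallel cost model such a tool might apply, Amdahl's law bounds the speedup obtainable when a fraction of runtime is parallelised across a number of threads. The abstract does not state which models BEEP uses; this familiar baseline and the name `amdahl_speedup` are assumptions for illustration only.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when `parallel_fraction` of runtime scales perfectly."""
    if not 0.0 <= parallel_fraction <= 1.0 or threads < 1:
        raise ValueError("fraction must be in [0, 1] and threads >= 1")
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / threads)
```

For instance, a loop covering 75% of runtime on eight cores gives at most 1 / (0.25 + 0.75 / 8), roughly 2.9x, before any synchronisation or speculation overhead is counted.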
The third tool is named GABP (Guided Automatic Binary Parallelisation), an extension to GBR for parallelisation. GABP focuses on loops in sequential application binaries and automatically extracts thread-level parallelism from them on the fly, under the direction of the hint program, for efficient parallel execution. It employs a range of runtime schemes, such as thread-level speculation and synchronisation, to handle runtime data dependences. GABP achieves a geometric mean speedup of 1.91x on binaries from SPEC CPU2006 on a real x86-64 eight-core system compared to native sequential execution. These speedups are obtained for SPEC CPU2006 executables compiled from a variety of source languages and by different compilers.
St John's Benefactor Scholarship
ARM Sponsorship
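The headline figure above is a geometric mean, the conventional way to average speedup ratios across benchmarks (SPEC's own summary metrics are geometric means of ratios). A minimal sketch of the computation follows; `geometric_mean` is an illustrative name and the example values are made up, not results from the thesis.

```python
import math

def geometric_mean(speedups):
    """Geometric mean of positive ratios, computed in log space for numerical stability."""
    if not speedups or any(s <= 0 for s in speedups):
        raise ValueError("need a non-empty list of positive speedups")
    return math.exp(sum(math.log(s) for s in speedups) / len(speedups))
```

For hypothetical speedups of 2x and 8x this gives 4x, whereas the arithmetic mean would report 5x; the geometric mean also treats a 2x speedup and a 2x slowdown symmetrically, averaging them to 1x.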
Programming models to support data science workflows
Data Science workflows have become essential for progress in many scientific areas such as the life, health, and earth sciences. In contrast to traditional HPC workflows, they are more heterogeneous, combining binary executions, MPI simulations, multi-threaded applications, custom analyses (possibly written in Java, Python, C/C++ or R), and real-time processing. Furthermore, whereas field experts used to be able to program and run small simulations themselves, simulations requiring hundreds or thousands of cores are now widely used, and programming them efficiently is a challenge even for computer scientists. Thus, programming languages and models make a considerable effort to ease programmability while maintaining acceptable performance.
This thesis contributes to the adaptation of High-Performance frameworks to support the needs and challenges of Data Science workflows by extending COMPSs, a mature, general-purpose, task-based, distributed programming model. First, we enhance our prototype to orchestrate different frameworks inside a single programming model so that non-expert users can build complex workflows where some steps require highly optimised state-of-the-art frameworks. This extension includes the @binary, @OmpSs, @MPI, @COMPSs, and @MultiNode annotations for both Java and Python workflows.
Second, we integrate container technologies to enable developers to easily port, distribute, and scale their applications to distributed computing platforms. This combination provides a straightforward methodology to parallelise applications from sequential codes along with efficient image management and application deployment that ease the packaging and distribution of applications. We distinguish between static, HPC, and dynamic container management and provide representative use cases for each scenario using Docker, Singularity, and Mesos.
Third, we design, implement and integrate AutoParallel, a Python module to automatically find an appropriate task-based parallelisation of affine loop nests and execute them in parallel in a distributed computing infrastructure. It is based on sequential programming and requires one single annotation (the @parallel Python decorator) so that anyone with intermediate-level programming skills can scale up an application to hundreds of cores.
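The kind of transformation AutoParallel automates, turning an affine loop nest into independent tasks, can be sketched by hand for matrix multiplication: each output element C[i][j] depends only on row i of A and column j of B, so every (i, j) iteration can become an independent task. This sketch uses Python's standard `concurrent.futures` rather than COMPSs and does not reproduce the `@parallel` decorator or its API; `dot_task` and `matmul_tasks` are hypothetical names, and a real implementation would use blocks rather than single elements to amortise task overhead.

```python
from concurrent.futures import ThreadPoolExecutor

def dot_task(row, col):
    # Task body: one output element C[i][j] = sum_k A[i][k] * B[k][j].
    return sum(a * b for a, b in zip(row, col))

def matmul_tasks(a, b, workers=4):
    n, m, p = len(a), len(b), len(b[0])
    cols = [[b[k][j] for k in range(m)] for j in range(p)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # The (i, j) loop nest has no cross-iteration dependences, so each
        # iteration is submitted as an independent task.
        futures = {(i, j): pool.submit(dot_task, a[i], cols[j])
                   for i in range(n) for j in range(p)}
    # Leaving the `with` block waits for all tasks, so results are ready here.
    return [[futures[(i, j)].result() for j in range(p)] for i in range(n)]
```

The sequential loop nest and the task version compute the same matrix; the value of automation is deriving the dependence-free task decomposition without the user rewriting the loops.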
Finally, we propose a way to extend task-based management systems to support continuous input and output data, enabling the combination of task-based workflows and dataflows (Hybrid Workflows) using one single programming model. Hence, developers can build complex Data Science workflows with different approaches depending on the requirements, without the effort of combining several frameworks at the same time. To illustrate the capabilities of Hybrid Workflows, we have also built a Distributed Stream Library (DistroStreamLib) that can be easily integrated with existing task-based frameworks to provide support for dataflows. The library provides a homogeneous, generic, and simple representation of object and file streams in both Java and Python, enabling complex workflows to handle any data type without dealing directly with the streaming back-end.
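The idea behind Hybrid Workflows, a task that consumes a continuous stream instead of waiting for a complete input, can be sketched with a producer and a consumer sharing a stream object. `ObjectStream`, `producer` and `consumer` are hypothetical, in-process stand-ins; they are not the DistroStreamLib API, which the abstract does not detail.

```python
import queue
import threading

class ObjectStream:
    """Hypothetical in-process stream: publish() items, then close(); poll() yields them."""
    _CLOSED = object()

    def __init__(self):
        self._q = queue.Queue()

    def publish(self, item):
        self._q.put(item)

    def close(self):
        self._q.put(self._CLOSED)

    def poll(self):
        # Yields items as they arrive, until the producer closes the stream.
        while True:
            item = self._q.get()
            if item is self._CLOSED:
                return
            yield item

def producer(stream, n):
    for i in range(n):
        stream.publish(i)  # e.g. a simulation emitting partial results
    stream.close()

def consumer(stream, out):
    for item in stream.poll():  # processing can start before the producer finishes
        out.append(item * 10)
```

Running `consumer` in its own thread while `producer` publishes shows the dataflow side of a hybrid workflow; a task-based runtime would schedule both as tasks connected by the stream.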
VSpipe, an Integrated Resource for Virtual Screening and Hit Selection: Applications to Protein Tyrosine Phosphatase Inhibition
The use of computational tools for virtual screening provides a cost-efficient approach to select starting points for drug development. We have developed VSpipe, a user-friendly semi-automated pipeline for structure-based virtual screening. VSpipe uses the existing tools AutoDock and OpenBabel together with software developed in-house to create an end-to-end virtual screening workflow, ranging from the preparation of receptor and ligands to the visualisation of results. VSpipe is efficient and flexible, allowing users to make choices at different steps, and it can be used in both local and cluster mode. We have validated VSpipe using the human protein tyrosine phosphatase PTP1B as a case study. Using a combination of blind and targeted docking, VSpipe identified both new and known functional ligand binding sites. Assessment of different binding clusters using the ligand efficiency plots created by VSpipe defined a drug-like chemical space for the development of PTP1B inhibitors, with potential applications to other PTPs. In this study, we show that VSpipe can be deployed to identify and compare different modes of inhibition, thus guiding the selection of initial hits for drug discovery.
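Ligand efficiency plots rest on a simple normalisation: binding free energy divided by ligand size. A widely used definition is LE = -ΔG / N_heavy, in kcal/mol per heavy (non-hydrogen) atom. The sketch below assumes ΔG is taken from a docking score such as AutoDock's estimated free energy of binding; the exact variant VSpipe plots is not stated in the abstract, and `ligand_efficiency` is an illustrative name.

```python
def ligand_efficiency(delta_g_kcal_per_mol: float, heavy_atoms: int) -> float:
    """LE = -dG / N_heavy; larger values mean more binding energy per heavy atom."""
    if heavy_atoms <= 0:
        raise ValueError("heavy_atoms must be positive")
    return -delta_g_kcal_per_mol / heavy_atoms
```

For example, a docked pose scoring -9.0 kcal/mol for a ligand with 30 heavy atoms gives LE = 0.3, while a smaller ligand with the same score would rank as more efficient.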