    Dataflow methods in HPC, visualisation and analysis

    The processing power available to scientists and engineers using supercomputers has grown exponentially over the last few decades, permitting significantly more sophisticated simulations and, as a consequence, generating proportionally larger output datasets. This change has taken place in tandem with a gradual shift in the design and implementation of simulation and post-processing software: away from simulation as a first step and visualisation/analysis as a second, towards in-situ, on-the-fly methods that provide immediate visual feedback, place less strain on file systems and reduce overall data movement and copying. Concurrently, processor speed increases have dramatically slowed, and multi- and many-core architectures have instead become the norm for virtually all High Performance Computing (HPC) machines. This in turn has led to a shift away from the traditional distributed one-rank-per-node model, to one rank per process with multiple processes per multicore node, and then back towards one rank per node again, combining distributed and multi-threaded frameworks. This thesis consists of a series of publications that demonstrate how software design for analysis and visualisation has tracked these architectural changes and pushed the boundaries of HPC visualisation using dataflow techniques in distributed environments. The first publication shows how support for the time dimension in parallel pipelines can be implemented, demonstrating how information flow within an application can be leveraged to optimise performance and add features such as analysis of time-dependent flows and comparison of datasets at different timesteps. A method of integrating dataflow pipelines with in-situ visualisation is subsequently presented, using asynchronous coupling of user-driven GUI controls and a live simulation running on a supercomputer. The loose coupling of analysis and simulation allows for reduced IO, immediate feedback and the ability to change simulation parameters on the fly. A significant drawback of parallel pipelines is the inefficiency caused by improper load balancing, particularly during interactive analysis where the user may select between different features of interest. This problem is addressed in the fourth publication by integrating a high-performance partitioning library into the visualisation pipeline and extending the information flow up and down the pipeline to support it. This extension is demonstrated in the third publication (published earlier) on massive meshes of extremely high complexity, and shows that general-purpose visualisation tools such as ParaView can be made to compete with bespoke software written for a dedicated task. The future of software running on many-core architectures will involve task-based runtimes with dynamic load balancing, asynchronous execution based on dataflow graphs, work stealing and concurrent data sharing between simulation and analysis. The final paper of this thesis presents an optimisation for one such runtime, in support of these future HPC applications.
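
    As a minimal illustrative sketch of the demand-driven dataflow idea the thesis builds on: a request for a particular timestep travels upstream through the pipeline, and the corresponding data travels back downstream, which is what makes features such as comparing datasets at different timesteps possible. The class and method names (Source, Filter, request) are hypothetical stand-ins, not the VTK/ParaView interfaces used in the publications.

        # Illustrative sketch only: a toy demand-driven dataflow pipeline.
        # The request (which timestep is wanted) flows upstream; data flows downstream.

        class Source:
            """Upstream end of the pipeline: produces per-timestep data on request."""
            def request(self, timestep):
                # Stand-in for reading one timestep of simulation output.
                return [0.1 * timestep * i for i in range(8)]

        class Filter:
            """Wraps an upstream stage and transforms whatever it produces."""
            def __init__(self, upstream, func):
                self.upstream = upstream
                self.func = func

            def request(self, timestep):
                data = self.upstream.request(timestep)   # request travels upstream
                return [self.func(x) for x in data]      # data travels back downstream

        def compare_timesteps(pipeline, t0, t1):
            """Pull two timesteps through the same pipeline and difference them."""
            a, b = pipeline.request(t0), pipeline.request(t1)
            return [y - x for x, y in zip(a, b)]

        if __name__ == "__main__":
            pipeline = Filter(Source(), lambda x: 2.0 * x)   # a simple two-stage pipeline
            print(compare_timesteps(pipeline, t0=3, t1=4))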

    A grid-enabled problem solving environment for parallel computational engineering design

    This paper describes the development and application of a piece of engineering software that provides a problem solving environment (PSE) capable of launching, and interfacing with, computational jobs executing on remote resources on a computational grid. In particular, it is demonstrated how a complex, serial engineering optimisation code may be efficiently parallelised, grid-enabled and embedded within a PSE. The environment is highly flexible, allowing remote users from different sites to collaborate, and permitting computational tasks to be executed in parallel across multiple grid resources, each of which may be a parallel architecture. A full working prototype has been built and successfully applied to a computationally demanding engineering optimisation problem. This particular problem stems from elastohydrodynamic lubrication and involves optimising the computational model for a lubricant based on the match between simulation results and experimentally observed data.
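
    As a minimal illustrative sketch of the pattern the PSE supports, the snippet below farms out independent evaluations of candidate model parameters in parallel and scores each against experimental measurements. A local process pool stands in for remote grid resources, and simulate(), mismatch() and all numerical values are hypothetical placeholders rather than the actual lubrication code.

        # Illustrative sketch only: parallel evaluation of candidate parameters,
        # scored by their mismatch with experimentally observed data.
        from concurrent.futures import ProcessPoolExecutor

        EXPERIMENT = [0.0, 0.5, 2.0, 4.5]          # made-up "observed" data points

        def simulate(viscosity):
            # Stand-in for an expensive elastohydrodynamic lubrication simulation.
            return [viscosity * x * x for x in range(4)]

        def mismatch(viscosity):
            # Sum-of-squares difference between simulated and observed values.
            return sum((s - e) ** 2 for s, e in zip(simulate(viscosity), EXPERIMENT))

        if __name__ == "__main__":
            candidates = [0.25, 0.5, 1.0, 2.0]
            with ProcessPoolExecutor() as pool:     # each candidate evaluated in parallel
                scores = list(pool.map(mismatch, candidates))
            score, best = min(zip(scores, candidates))
            print("best-fitting viscosity:", best, "mismatch:", score)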

    Visualisation Studio for the analysis of massive datasets

    This thesis describes the research underpinning, and the development of, a cross-platform application for the analysis of simultaneously recorded multi-dimensional spike trains. These spike trains are believed to carry the neural code that encodes information in a biological brain. A number of statistical methods already exist to analyse the temporal relationships between the spike trains. Historically, hundreds of spike trains have been simultaneously recorded; however, as a result of technological advances, recording capability has increased and the analysis of thousands of simultaneously recorded spike trains is now a requirement. Effective analysis of large data sets requires software tools that fully exploit the capabilities of modern research computers and effectively manage and present large quantities of data. To be effective, such software tools must: be targeted at the field under study; be engineered to exploit the full compute power of research computers; and prevent information overload of the researcher despite presenting a large and complex data set. The Visualisation Studio application produced in this thesis brings together the fields of neuroscience, software engineering and information visualisation to produce a software tool that meets these criteria. A visual programming language for neuroscience is produced that allows for extensive pre-processing of spike train data prior to visualisation. The computational challenges of analysing thousands of spike trains are addressed using parallel processing to fully exploit the modern researcher’s computer hardware. In the case of the computationally intensive pairwise cross-correlation analysis, the option to use a high performance computing (HPC) cluster is seamlessly provided. Finally, the principles of information visualisation are applied to key visualisations in neuroscience so that the researcher can effectively manage and visually explore the resulting data sets. The final visualisations can typically represent data sets 10 times larger than previously, while remaining highly interactive.
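
    As a minimal illustrative sketch of the computationally intensive step mentioned above, the snippet below computes pairwise cross-correlations over many binned spike trains, parallelised over independent pairs with a process pool. The binning, normalisation and function names are assumptions for illustration and are not taken from the Visualisation Studio code base.

        # Illustrative sketch only: pairwise cross-correlation of binned spike trains,
        # parallelised over pairs because every pair is independent of the others.
        from concurrent.futures import ProcessPoolExecutor
        from itertools import combinations
        import numpy as np

        def cross_correlation(args):
            i, j, a, b = args
            a = (a - a.mean()) / (a.std() + 1e-12)   # normalise each spike-count vector
            b = (b - b.mean()) / (b.std() + 1e-12)
            return i, j, float(np.correlate(a, b)[0] / len(a))

        if __name__ == "__main__":
            rng = np.random.default_rng(0)
            trains = [rng.poisson(1.0, size=1000).astype(float) for _ in range(50)]
            pairs = [(i, j, trains[i], trains[j])
                     for i, j in combinations(range(len(trains)), 2)]
            with ProcessPoolExecutor() as pool:
                results = list(pool.map(cross_correlation, pairs, chunksize=64))
            print(len(results), "pairwise correlations computed")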

    An interactive visualisation tool applied to the simulation of glass pressing

    PRIMAGE project: predictive in silico multiscale analytics to support childhood cancer personalised evaluation empowered by imaging biomarkers

    PRIMAGE is one of the largest and most ambitious research projects dealing with medical imaging, artificial intelligence and cancer treatment in children. It is a 4-year European Commission-financed project that has 16 European partners in the consortium, including the European Society for Paediatric Oncology, two imaging biobanks, and three prominent European paediatric oncology units. The project is constructed as an observational in silico study involving high-quality anonymised datasets (imaging, clinical, molecular, and genetics) for the training and validation of machine learning and multiscale algorithms. The open cloud-based platform will offer precise clinical assistance for phenotyping (diagnosis), treatment allocation (prediction), and patient endpoints (prognosis), based on the use of imaging biomarkers, tumour growth simulation, advanced visualisation of confidence scores, and machine-learning approaches. The decision support prototype will be constructed and validated on two paediatric cancers: neuroblastoma and diffuse intrinsic pontine glioma. External validation will be performed on data recruited from independent collaborative centres. Final results will be available for the scientific community at the end of the project, and ready for translation to other malignant solid tumours.

    Methodology for complex dataflow application development

    This thesis addresses problems inherent to the development of complex applications for reconfigurable systems. Many projects fail to complete, or take much longer than originally estimated, by relying on traditional iterative software development processes typically used with conventional computers. Even though designer productivity can be increased by abstract programming and execution models, e.g., dataflow, development methodologies considering the specific properties of reconfigurable systems do not exist. The first contribution of this thesis is a design methodology to facilitate systematic development of complex applications using reconfigurable hardware in the context of High-Performance Computing (HPC). The proposed methodology is built upon a careful analysis of the original application, a software model of the intended hardware system, an analytical prediction of performance and on-chip area usage, and an iterative architectural refinement to resolve identified bottlenecks before writing a single line of code targeting the reconfigurable hardware. It is successfully validated using two real applications, both of which achieve state-of-the-art performance. The second contribution extends this methodology to provide portability between devices in two steps. First, additional tool support for contemporary multi-die Field-Programmable Gate Arrays (FPGAs) is developed. An algorithm to automatically map logical memories to heterogeneous physical memories, with special attention to die boundaries, is proposed. As a result, only the proposed algorithm managed to successfully place and route all designs used in the evaluation, while the second-best algorithm failed on one third of all large applications. Second, best practices for performance portability between different FPGA devices are collected and evaluated on a financial use case, showing efficient resource usage on five different platforms. The third contribution applies the extended methodology to a real, highly demanding emerging application from the radiotherapy domain. A Monte Carlo-based simulation of dose accumulation in human tissue is accelerated using the proposed methodology to meet the real-time requirements of adaptive radiotherapy.
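
    As a minimal illustrative sketch of the analytical prediction step described above, the snippet below estimates throughput and on-chip resource usage for a candidate pipelined design and checks it against device limits before any hardware code is written. All constants (clock rate, resources per operator, device capacities) are hypothetical placeholders, not figures from the thesis.

        # Illustrative sketch only: back-of-the-envelope performance and area model
        # for a fully pipelined dataflow design. All numbers are assumed placeholders.

        def estimate(design, clock_hz=200e6, bytes_per_element=4):
            """Predict throughput, bandwidth and resource usage for a candidate design."""
            # A fully pipelined datapath accepts one element per cycle per parallel lane.
            throughput_eps = clock_hz * design["lanes"]
            bandwidth_bps = throughput_eps * bytes_per_element
            # Resource usage scales roughly linearly with operators replicated per lane.
            luts = design["lanes"] * design["ops"] * 300          # assumed LUTs per operator
            dsps = design["lanes"] * design["multipliers"] * 2    # assumed DSPs per multiplier
            return throughput_eps, bandwidth_bps, luts, dsps

        if __name__ == "__main__":
            candidate = {"lanes": 8, "ops": 40, "multipliers": 12}
            eps, bw, luts, dsps = estimate(candidate)
            print(f"{eps:.2e} elem/s, {bw / 1e9:.1f} GB/s, {luts} LUTs, {dsps} DSPs")
            # Compare against (hypothetical) device limits to spot bottlenecks before coding.
            print("fits device:", luts < 1_200_000 and dsps < 6_000 and bw < 77e9)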

    Design considerations for workflow management systems use in production genomics research and the clinic

    The changing landscape of genomics research and clinical practice has created a need for computational pipelines capable of efficiently orchestrating complex analysis stages while handling large volumes of data across heterogeneous computational environments. Workflow Management Systems (WfMSs) are the software components employed to fill this gap. This work provides an approach to, and systematic evaluation of, key features of popular bioinformatics WfMSs in use today: Nextflow, CWL and WDL (and some of their executors), along with Swift/T, a workflow manager commonly used in high-scale physics applications. We employed two use cases: a variant-calling genomic pipeline and a scalability-testing framework, both run locally, on an HPC cluster, and in the cloud. This allowed the four WfMSs to be evaluated in terms of language expressiveness, modularity, scalability, robustness, reproducibility, interoperability and ease of development, along with adoption and usage in research labs and healthcare settings. This article aims to answer the question: which WfMS should be chosen for a given bioinformatics application, regardless of analysis type? The choice of a given WfMS is a function of both its intrinsic language and engine features. Within bioinformatics, where analysts are a mix of dry- and wet-lab scientists, the choice is also governed by collaborations and adoption within large consortia, and by the technical support provided by the WfMS team/community. As the community and its needs continue to evolve along with computational infrastructure, WfMSs will also evolve, especially those with permissive licenses that allow commercial use. In much the same way as the dataflow paradigm and containerization are now well understood to be very useful in bioinformatics applications, we will continue to see innovations in tools and utilities for other purposes, such as big data technologies, interoperability, and provenance.
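
    As a minimal illustrative sketch of the scatter/gather dataflow pattern that systems such as Nextflow, CWL and WDL express declaratively, the snippet below writes the same structure as plain Python so the data dependencies are visible. The task names and sample identifiers are illustrative and do not correspond to the pipeline evaluated in the article.

        # Illustrative sketch only: scatter independent per-sample tasks, chain a
        # downstream step off their outputs, then gather the results.
        from concurrent.futures import ProcessPoolExecutor

        def align(sample):
            # Stand-in for an alignment step; a WfMS would run this per sample, often in a container.
            return f"{sample}.bam"

        def call_variants(bam):
            # Stand-in for a variant-calling step consuming the aligner's output.
            return bam.replace(".bam", ".vcf")

        def merge(vcfs):
            # Gather step: combine per-sample results once all upstream tasks finish.
            return sorted(vcfs)

        if __name__ == "__main__":
            samples = ["NA12878", "NA12891", "NA12892"]
            with ProcessPoolExecutor() as pool:
                bams = list(pool.map(align, samples))        # scatter
                vcfs = list(pool.map(call_variants, bams))   # downstream stage
            print(merge(vcfs))                               # gather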