
    A comparative study between the cubic spline and B-spline interpolation methods in free energy calculations

    Numerical methods are essential in computational science, as analytic calculations for large data sets are impractical. Using numerical methods, one can approximate a problem so that it can be solved with basic arithmetic operations. Interpolation is a commonly used method that, among other uses, constructs the values of new data points within an interval of known data points. Furthermore, polynomial interpolation of sufficiently high degree can make the data set differentiable. One consequence of using high-degree polynomials is oscillatory behaviour towards the endpoints, also known as Runge's phenomenon. Spline interpolation overcomes this obstacle by connecting the data points in a piecewise fashion. However, its complex formulation requires nested iterations in higher dimensions, which is time-consuming. In addition, the calculations have to be repeated to compute each partial derivative at a data point, leading to further slowdown. B-spline interpolation is an alternative representation of the cubic spline method, in which the spline value at a point is expressed as a linear combination of piecewise basis functions. It has been proposed that implementing this formulation can accelerate many scientific computing operations that involve interpolation. Nevertheless, there is a lack of detailed comparison to back up this hypothesis, especially when it comes to computing the partial derivatives. Among many scientific research fields, free energy calculations particularly stand out for their use of interpolation methods. Numerical interpolation has been used in free energy methods for many purposes, from calculating intermediate energy states to deriving forces from free energy surfaces. The results of these calculations can provide insight into reaction mechanisms and their thermodynamic properties. The free energy methods include biased flat histogram methods, which are especially promising due to their ability to accurately construct free energy profiles at rarely visited regions of reaction space. Free Energies from Adaptive Reaction Coordinates (FEARCF), developed by Professor Kevin J. Naidoo, has many advantages over other flat histogram methods. Because of its treatment of the atoms in reactions, FEARCF makes it easier to apply interpolation methods. It implements cubic spline interpolation to derive biasing forces from the free energy surface, driving the reaction towards regions of higher energy. A major drawback of the method is the slowdown experienced in higher dimensions due to the complicated nature of the cubic spline routine. If the routine is replaced by a more straightforward B-spline interpolation, sampling and generating free energy surfaces can be accelerated. This dissertation performs a comparative study of the cubic spline and B-spline interpolation methods. First, data sets from analytic functions were used instead of numerical data to compare the accuracy of both methods and compute their percentage errors, taking the functions themselves as the reference. These functions were used to evaluate the performance of the two methods at endpoints, inflection points, and regions with a steep gradient. Both interpolation methods generated identical approximated values, with percentage errors below the 1% threshold, although both performed poorly at the endpoints and at points of inflection. Increasing the number of interpolation knots reduced these errors; however, it caused overfitting in other regions. Although no significant speed-up was observed in univariate interpolation, cubic spline interpolation suffered a drastic slowdown in higher dimensions, by factors of up to 10³ for 3D and 10⁵ for 4D interpolations. The same held for classical molecular dynamics simulations with FEARCF, which achieved a speed-up of up to 10³ when B-spline interpolation was implemented. In conclusion, the B-spline interpolation method can enhance the efficiency of free energy calculations where cubic spline interpolation is currently the method in use.
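    As a minimal sketch of the comparison performed here, the following Python example interpolates an analytic reference (Runge's function, echoing the Runge's phenomenon discussion above) with both a cubic spline and a cubic B-spline via SciPy and reports percentage errors; the function choice, knot count, and error normalisation are illustrative assumptions, not the dissertation's actual benchmark setup.

```python
# Compare cubic spline vs. B-spline interpolation accuracy against an
# analytic reference. Runge's function, 15 knots, and normalisation by
# the reference maximum are assumptions made for this sketch.
import numpy as np
from scipy.interpolate import CubicSpline, make_interp_spline

f = lambda x: 1.0 / (1.0 + 25.0 * x**2)            # Runge's function

x_knots = np.linspace(-1.0, 1.0, 15)               # interpolation knots
x_eval = np.linspace(-1.0, 1.0, 1001)              # dense evaluation grid

cs = CubicSpline(x_knots, f(x_knots))              # classic cubic spline
bs = make_interp_spline(x_knots, f(x_knots), k=3)  # cubic B-spline form

ref = f(x_eval)
for name, spline in (("cubic spline", cs), ("B-spline", bs)):
    pct_err = 100.0 * np.abs(spline(x_eval) - ref) / np.abs(ref).max()
    print(f"{name}: max percentage error = {pct_err.max():.3f}%")

# the B-spline form also yields derivatives directly, which is what a
# biasing-force calculation needs from a free energy surface
print(bs.derivative()(0.5))
```

    With default boundary conditions both representations evaluate to the same piecewise cubic, consistent with the identical values reported above; the practical difference the dissertation measures is the cost of evaluation and of derivatives in higher dimensions.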

    System on fabrics utilising distributed computing

    The main vision of wearable computing is to make electronic systems an integral part of everyday clothing, serving as intelligent personal assistants. Wearable devices have the potential to be wearable computers and not mere input/output devices for the human body. The present thesis focuses on introducing a new wearable computing paradigm, in which the processing elements are closely coupled with the sensors and distributed using the Instruction Systolic Array (ISA) architecture. The thesis describes a novel multiple-sensor, multiple-processor system architecture prototype based on the Instruction Systolic Array paradigm for distributed computing on fabrics. The thesis introduces a new programming model to implement the distributed computer on fabrics. The implementation of the concept has been validated using parallel algorithms. A real-time shape sensing and reconstruction application has been implemented on this architecture, demonstrating a physical design for a wearable system based on the ISA concept, constructed from off-the-shelf microcontrollers and sensors. Results demonstrate that the real-time application executes on the prototype ISA implementation, confirming the viability of the proposed architecture for fabric-resident computing devices.
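    As a toy illustration of the instruction-systolic idea underlying this architecture, the Python sketch below pumps a common instruction stream through a one-dimensional chain of processing elements; each cycle, every element executes the instruction it currently holds on its own local sensor value before the instructions shift along by one. The instruction set, topology, and data are invented for illustration, not taken from the thesis.

```python
# Toy 1-D sketch of an Instruction Systolic Array (ISA): one instruction
# stream is pumped through a chain of processing elements (PEs); each
# cycle, every PE executes the instruction it currently holds on its own
# local data, then the instructions shift one PE to the right.
instructions = ["LOAD_SENSOR", "SCALE", "OFFSET"]   # common program
sensor_readings = [3.0, 4.0, 5.0, 6.0]              # one value per PE

pes = [{"acc": 0.0, "instr": None, "sensor": s} for s in sensor_readings]

def execute(pe):
    if pe["instr"] == "LOAD_SENSOR":
        pe["acc"] = pe["sensor"]
    elif pe["instr"] == "SCALE":
        pe["acc"] *= 2.0
    elif pe["instr"] == "OFFSET":
        pe["acc"] += 1.0

stream = instructions + [None] * len(pes)   # pad so the pipeline drains
for instr in stream:
    # shift instructions right by one PE, feed the next one in on the left
    for i in reversed(range(1, len(pes))):
        pes[i]["instr"] = pes[i - 1]["instr"]
    pes[0]["instr"] = instr
    for pe in pes:
        execute(pe)

print([pe["acc"] for pe in pes])   # each PE ends with sensor * 2 + 1
```

    The thesis's fabric system extends this pumping principle to an array of microcontroller-based processing elements coupled to distributed sensors.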

    High performance graph analysis on parallel architectures

    Over the last decade, pharmacology has been developing computational methods to enhance drug development and testing. A computational method called network pharmacology uses graph analysis tools to determine protein target sets that can lead to better-targeted drugs for diseases such as cancer. One promising area of network-based pharmacology is the detection of protein groups that can produce better effects if they are targeted together by drugs. However, the efficient prediction of such protein combinations is still a bottleneck in computational biology. The computational burden of the algorithms used by such protein prediction strategies to characterise the importance of these proteins constitutes an additional challenge for the field of network pharmacology. Computationally expensive graph algorithms such as the all-pairs shortest path (APSP) computation can affect the overall drug discovery process, since the network analysis results needed cannot be delivered on time. An ideal solution for these highly intensive computations could be the use of supercomputing. However, graph algorithms have data-driven computation dictated by the structure of the graph, which can lead to low compute-capacity utilisation, with execution times dominated by memory latency. This thesis therefore seeks optimised solutions for the real-world graph problems of critical node detection and effectiveness characterisation that emerged from the collaboration with a pioneering company in the field of network pharmacology as part of a Knowledge Transfer Partnership (KTP) / Secondment (KTS). In particular, we examine how genetic algorithms could benefit the prediction of protein complexes whose removal could produce a more effective 'druggable' impact. Furthermore, we investigate how the problem of all-pairs shortest path (APSP) computation can benefit from emerging parallel hardware architectures such as GPU- and FPGA-based desktop accelerators. We address the problem of critical node detection with the development of a heuristic search method, based on a genetic algorithm that computes optimised node combinations whose removal causes greater impact than common impact analysis strategies. Furthermore, we design a general pattern for parallel network analysis on multi-core architectures that considers the graph's embedded properties. It is a divide-and-conquer approach that decomposes a graph into smaller subgraphs based on its strongly connected components and computes the all-pairs shortest paths concurrently on GPU. We also use linear algebra to design an APSP approach based on the BFS algorithm, using algebraic expressions to transform the problem of path computation into multiple independent matrix-vector multiplications that are executed concurrently on FPGA. Finally, we analyse how the optimised solutions of perturbation analysis and parallel graph processing provided in this thesis will impact the drug discovery process. This research was part of a Knowledge Transfer Partnership (KTP) and Knowledge Transfer Secondment (KTS) between e-therapeutics PLC and Newcastle University. It was supported as a collaborative project by e-therapeutics PLC and the Technology Strategy Board.
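    As a minimal sketch of the linear-algebra formulation of APSP mentioned above, the Python example below expresses BFS from each source as a sequence of sparse matrix-vector products, with the per-source runs mutually independent, which is what makes them amenable to concurrent execution on GPUs or FPGAs. The example graph and the SciPy-based formulation are illustrative assumptions, not the thesis's FPGA implementation.

```python
# APSP on an unweighted directed graph via repeated sparse mat-vec
# frontier expansion: one BFS level per multiplication, one independent
# BFS per source vertex. The tiny example graph is invented.
import numpy as np
import scipy.sparse as sp

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (3, 0)]   # A[i, j] = 1 for i -> j
rows, cols = zip(*edges)
n = 4
A = sp.csr_matrix((np.ones(len(edges), dtype=np.int8), (rows, cols)),
                  shape=(n, n))

def bfs_levels(A, source):
    """BFS distances from `source`, one sparse mat-vec per level."""
    n = A.shape[0]
    dist = np.full(n, -1)                  # -1 marks "not reached yet"
    frontier = np.zeros(n, dtype=np.int8)
    frontier[source] = 1
    level = 0
    while frontier.any():
        dist[frontier.astype(bool)] = level
        # out-neighbours of the frontier that are still unreached
        reached = A.T.dot(frontier) > 0
        frontier = (reached & (dist < 0)).astype(np.int8)
        level += 1
    return dist

# per-source runs share no state, hence are trivially parallelisable
D = np.vstack([bfs_levels(A, s) for s in range(n)])
print(D)   # D[s, v] = shortest hop count from s to v (-1 if unreachable)
```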

    The application of parallel computer technology to the dynamic analysis of suspension bridges

    This research is concerned with the application of distributed computer technology to the solution of non-linear structural dynamic problems, in particular the onset of aerodynamic instabilities in long-span suspension bridge structures, such as flutter, which is a catastrophic aeroelastic phenomenon. The thesis is set out in two distinct parts. Part I presents the theoretical background of the main forms of aerodynamic instability, describing in detail the main solution techniques used to solve the flutter problem. The previously written analysis package ANSUSP, which was specifically developed to predict numerically the onset of flutter instability, is presented, and the various solution techniques employed to predict the onset of flutter for the Severn Bridge are discussed. All the results presented in Part I were obtained using a 486DX2 66 MHz serial personal computer. Part II examines the main solution techniques in detail and goes on to apply them to a large distributed supercomputer, which allows the solution of the problem to be achieved considerably faster than is possible using the serial computer system. The solutions presented in Part II are expressed as Performance Indices (PI), which quote the ratio of the time taken to perform a specific calculation using a serial algorithm to the time taken by a parallel algorithm running on the same computer system.
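    Taking the definition above at face value (PI as the ratio of serial to parallel execution time on the same system), a minimal worked example:

```python
# Performance Index (PI) as defined above:
# PI = (serial wall-clock time) / (parallel wall-clock time),
# both measured on the same computer system. Timings are invented.
def performance_index(t_serial: float, t_parallel: float) -> float:
    return t_serial / t_parallel

# e.g. an analysis step taking 120 s serially and 8 s in parallel
print(performance_index(120.0, 8.0))   # PI = 15.0
```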

    Interactive message debugger for parallel message passing programs using LAM-MPI

    Many complex and computation-intensive problems can be solved efficiently using parallel programs on a network of processors. One of the most widely used software platforms for such cluster computing is LAM-MPI. To aid the development of robust parallel programs using LAM-MPI, we need efficient debugging tools. The challenges in debugging parallel programs are unique and different from those of sequential programs. Unfortunately, available parallel debuggers do not address these challenges adequately. This thesis introduces IDLI, a parallel message debugger for LAM-MPI, designed on the concepts of multi-level debugging. IDLI provides a new paradigm for distributed debugging while avoiding many of the pitfalls of present tools of its genre. Through its powerful yet customizable query mechanism, adequate data abstraction and granularity, user-friendly interface, and a fast, novel technique to simultaneously replay and sequentially debug one or more processes from a distributed application, IDLI provides an effective environment for debugging parallel LAM-MPI programs.
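    As an illustrative sketch in the spirit of message debugging (not IDLI's actual interface, which the abstract does not detail), the example below wraps mpi4py point-to-point calls so that every send and receive is appended to a per-rank trace file that a debugger could later query or replay; the trace format and helper names are assumptions.

```python
# Log every point-to-point message to a per-rank JSONL trace so a
# debugger could query or replay the exchange later.
# Run with e.g.: mpiexec -n 2 python trace_demo.py
from mpi4py import MPI
import json, time

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
trace = open(f"trace_rank{rank}.jsonl", "w")

def log(event, peer, tag, payload):
    trace.write(json.dumps({
        "t": time.time(), "rank": rank, "event": event,
        "peer": peer, "tag": tag, "payload": repr(payload),
    }) + "\n")

def logged_send(obj, dest, tag=0):
    log("send", dest, tag, obj)
    comm.send(obj, dest=dest, tag=tag)

def logged_recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG):
    status = MPI.Status()
    obj = comm.recv(source=source, tag=tag, status=status)
    log("recv", status.Get_source(), status.Get_tag(), obj)
    return obj

if rank == 0:
    logged_send({"step": 1}, dest=1)
elif rank == 1:
    print(logged_recv())
```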

    Doctor of Philosophy

    In the static analysis of functional programs, control-flow analysis (k-CFA) is a classic method of approximating program behavior as a finite-state automaton. CFA2 and abstract garbage collection are two recent yet orthogonal improvements on k-CFA. CFA2 approximates program behavior as a pushdown system, using summarization for the stack. CFA2 can accurately approximate arbitrarily deep recursive function calls, whereas k-CFA cannot. Abstract garbage collection removes unreachable values from the store/heap. If unreachable values are not removed from a static analysis, they can become reachable again, which pollutes the final analysis and makes it less precise. Unfortunately, as these two techniques were originally formulated, they are incompatible: CFA2's summarization technique for managing the stack obscures the stack such that abstract garbage collection is unable to examine it for reachable values. This dissertation presents introspective pushdown control-flow analysis, which manages the stack explicitly through stack changes (pushes and pops). Because this analysis can examine the stack through how it has changed, abstract garbage collection is able to examine the stack for reachable values. Thus, introspective pushdown control-flow analysis successfully merges the benefits of CFA2 and abstract garbage collection to create a more precise static analysis.

    Additionally, the high-performance computing community has viewed functional programming techniques and tools as lacking the efficiency necessary for its applications. Nebo is a declarative domain-specific language embedded in C++ for discretizing partial differential equations for transport phenomena. For efficient execution, Nebo exploits a version of expression templates based on the C++ template system, which is a type-less, completely pure, Turing-complete functional language with burdensome syntax. Nebo's declarative syntax supports functional tools, such as point-wise lifting of complex expressions and functional composition of stencil operators. Nebo's primary abstraction is mathematical assignment, which separates what a calculation does from how that calculation is executed. Currently, Nebo supports single-core execution, multicore (thread-based) parallel execution, and GPU execution. With single-core execution, Nebo performs on par with the loops and code it replaces in Wasatch, a pre-existing high-performance simulation project. With multicore (thread-based) execution, Nebo scales roughly linearly (at about 90% efficiency) up to 6 processors relative to its single-core execution. Moreover, Nebo's GPU execution can be up to 37x faster than its single-core execution. Finally, Wasatch, the pre-existing high-performance simulation project that uses Nebo, can scale up to 262K cores.
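    As a minimal sketch of the abstract garbage collection step described above, under the simplifying assumption that an abstract store maps addresses to sets of values and each value records the addresses it holds, the Python example below restricts the store to the addresses reachable from a root set; the store representation is invented for illustration.

```python
# Abstract garbage collection on a toy abstract store: keep only the
# addresses reachable from the roots by chasing addresses inside values.
def reachable(roots, store):
    seen = set()
    work = list(roots)
    while work:
        addr = work.pop()
        if addr in seen:
            continue
        seen.add(addr)
        for value in store.get(addr, set()):
            # treat every address mentioned in a value as an out-edge
            work.extend(a for a in value if a in store)
    return seen

def abstract_gc(roots, store):
    live = reachable(roots, store)
    return {a: vs for a, vs in store.items() if a in live}

# addresses are strings; each abstract value is a tuple of addresses
store = {
    "a": {("b",)},        # a points to b
    "b": {()},            # b holds no addresses
    "c": {("a", "d")},    # unreachable: nothing roots c
    "d": {()},
}
print(abstract_gc(roots={"a"}, store=store))   # keeps only a and b
```

    The dissertation's contribution is precisely that, with an explicitly managed stack, the roots can include addresses found on the stack, which CFA2's summarization otherwise hides.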

    Organizing the solution of differential equations in a multi-core computer system

    This bachelor's thesis investigates ways of organizing the solution of differential equations in a multi-core computer system. As the practical part of the project, a program was implemented for solving differential equations in a multi-core computer system. The program makes it possible to analyze and evaluate the results of a computer system that performs mathematical calculations in parallel, in order to determine the effectiveness and feasibility of using this method for solving differential equations. The software product was created in C++ using the basic functions of the Win32 programming interface (Windows API) in the Microsoft Visual Studio 2019 visual environment.
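    As an illustration of the idea, transposed to Python's multiprocessing (the thesis itself used C++ with the Win32 API), the sketch below solves a batch of independent differential equations serially and then in parallel across cores and compares the timings; the ODE, parameters, and step counts are invented.

```python
# Solve many independent ODE instances (dy/dt = -k*y, explicit Euler)
# serially and in parallel across cores, then compare wall-clock times.
from multiprocessing import Pool
import time

def euler_solve(k, y0=1.0, t_end=1.0, steps=100_000):
    """Explicit Euler for dy/dt = -k * y."""
    dt = t_end / steps
    y = y0
    for _ in range(steps):
        y += dt * (-k * y)
    return y

if __name__ == "__main__":
    ks = [0.5 * i for i in range(1, 17)]   # independent problem instances

    t0 = time.perf_counter()
    serial = [euler_solve(k) for k in ks]
    t_serial = time.perf_counter() - t0

    t0 = time.perf_counter()
    with Pool() as pool:                   # one worker per core by default
        parallel = pool.map(euler_solve, ks)
    t_parallel = time.perf_counter() - t0

    assert serial == parallel              # same results, different schedule
    print(f"serial {t_serial:.2f}s, parallel {t_parallel:.2f}s")
```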

    Implementation of the Lanczos iteration on the CUDA architecture

    Eigenvalues and eigenvectors are widely used in diverse problems such as structural analysis, image recognition, data compression, and the solution of electrodynamic problems, among others. Many algorithms exist for computing and working with eigenvalues and eigenvectors on computers; however, when only one or a few (the most significant) eigenvalues and eigenvectors are required, one can opt for the power method or the Lanczos iteration. On the other hand, factors such as the amount of data to process or the desired precision can lead to unacceptable execution times for certain applications, motivating parallel implementations, with the CUDA architecture currently being one of the best options. This thesis proposes the design and implementation of a parallel algorithm for the Lanczos iteration on the CUDA architecture, a method for computing the largest eigenvalue and its corresponding eigenvector. The proposal is divided into three main blocks. The first block performs the partial tridiagonalization of a symmetric square matrix. The second block computes the Schur decomposition of the tridiagonal matrix, obtaining its eigenvalues and eigenvectors. The third block computes the largest eigenvalue and its corresponding eigenvector of the initial matrix from the results of the previous stages and determines whether further computation is necessary. The blocks work iteratively until results matching the desired precision are found. In addition to the parallel CUDA implementation, implementations were produced in the MATLAB simulation environment and in sequential C, in order to compare against and verify a correct and efficient parallel implementation. The computational results, evaluated for a matrix of 4000 × 4000 elements, show speed-ups of 13.4 and 5.8 when comparing the CUDA implementation with MATLAB and sequential C, respectively. These speed-ups tend to grow as the matrix size increases. The thesis is organized as follows: the first chapter describes the problem. The second chapter explains the theory behind the power method and the Lanczos iteration, along with the necessary algorithms. Chapter three presents fundamental concepts of the CUDA architecture. The design of the parallel algorithm is developed in chapter four. Finally, chapter five presents and analyzes the computational results, followed by the conclusions, recommendations, and bibliography.
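    As a minimal NumPy/SciPy sketch of the three-block flow described above (tridiagonalize, diagonalize the tridiagonal matrix, map the top Ritz pair back), the example below runs a fixed number of Lanczos steps without reorthogonalization; the matrix, iteration count, and absence of convergence/breakdown handling are simplifying assumptions relative to the thesis's CUDA implementation.

```python
# Lanczos: build a small tridiagonal T via the three-term recurrence,
# diagonalize T, then map the top Ritz pair back through the Lanczos
# basis V to approximate the largest eigenpair of A.
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_top_eigenpair(A, m=30, seed=0):
    n = A.shape[0]
    rng = np.random.default_rng(seed)
    V = np.zeros((n, m))                       # Lanczos basis
    alpha, beta = np.zeros(m), np.zeros(m - 1) # diagonals of T
    v = rng.standard_normal(n)
    v /= np.linalg.norm(v)
    V[:, 0] = v
    w = A @ v
    alpha[0] = v @ w
    w -= alpha[0] * v
    for j in range(1, m):                      # no reorthogonalization
        beta[j - 1] = np.linalg.norm(w)
        v = w / beta[j - 1]
        V[:, j] = v
        w = A @ v - beta[j - 1] * V[:, j - 1]
        alpha[j] = v @ w
        w -= alpha[j] * v
    theta, S = eigh_tridiagonal(alpha, beta)   # eigenpairs of T, ascending
    return theta[-1], V @ S[:, -1]             # top Ritz value and vector

# symmetric test matrix; dense solve as a reference check
A = np.random.default_rng(1).standard_normal((400, 400))
A = (A + A.T) / 2
lam, x = lanczos_top_eigenpair(A)
ref = np.linalg.eigvalsh(A)[-1]
print(f"lanczos {lam:.6f}  dense {ref:.6f}  "
      f"residual {np.linalg.norm(A @ x - lam * x):.2e}")
```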