96 research outputs found

    Identifying and Harnessing Concurrency for Parallel and Distributed Network Simulation

    Get PDF
    Although computer networks are inherently parallel systems, the parallel execution of network simulations on interconnected processors frequently yields only limited benefits. In this thesis, methods are proposed to estimate and understand the parallelization potential of network simulations. Further, mechanisms and architectures for exploiting the massively parallel processing resources of modern graphics cards to accelerate network simulations are proposed and evaluated

    Identifying and Harnessing Concurrency for Parallel and Distributed Network Simulation

    Get PDF
    Although computer networks are inherently parallel systems, the parallel execution of network simulations on interconnected processors frequently yields only limited benefits. In this thesis, methods are proposed to estimate and understand the parallelization potential of network simulations. Further, mechanisms and architectures for exploiting the massively parallel processing resources of modern graphics cards to accelerate network simulations are proposed and evaluated

    ExaFlexHH: an exascale-ready, flexible multi-FPGA library for biologically plausible brain simulations

    Get PDF
    IntroductionIn-silico simulations are a powerful tool in modern neuroscience for enhancing our understanding of complex brain systems at various physiological levels. To model biologically realistic and detailed systems, an ideal simulation platform must possess: (1) high performance and performance scalability, (2) flexibility, and (3) ease of use for non-technical users. However, most existing platforms and libraries do not meet all three criteria, particularly for complex models such as the Hodgkin-Huxley (HH) model or for complex neuron-connectivity modeling such as gap junctions.MethodsThis work introduces ExaFlexHH, an exascale-ready, flexible library for simulating HH models on multi-FPGA platforms. Utilizing FPGA-based Data-Flow Engines (DFEs) and the dataflow programming paradigm, ExaFlexHH addresses all three requirements. The library is also parameterizable and compliant with NeuroML, a prominent brain-description language in computational neuroscience. We demonstrate the performance scalability of the platform by implementing a highly demanding extended-Hodgkin-Huxley (eHH) model of the Inferior Olive using ExaFlexHH.ResultsModel simulation results show linear scalability for unconnected networks and near-linear scalability for networks with complex synaptic plasticity, with a 1.99 × performance increase using two FPGAs compared to a single FPGA simulation, and 7.96 × when using eight FPGAs in a scalable ring topology. Notably, our results also reveal consistent performance efficiency in GFLOPS per watt, further facilitating exascale-ready computing speeds and pushing the boundaries of future brain-simulation platforms.DiscussionThe ExaFlexHH library shows superior resource efficiency, quantified in FLOPS per hardware resources, benchmarked against other competitive FPGA-based brain simulation implementations

    FPGAs in Bioinformatics: Implementation and Evaluation of Common Bioinformatics Algorithms in Reconfigurable Logic

    Get PDF
    Life. Much effort is taken to grant humanity a little insight in this fascinating and complex but fundamental topic. In order to understand the relations and to derive consequences humans have begun to sequence their genomes, i.e. to determine their DNA sequences to infer information, e.g. related to genetic diseases. The process of DNA sequencing as well as subsequent analysis presents a computational challenge for recent computing systems due to the large amounts of data alone. Runtimes of more than one day for analysis of simple datasets are common, even if the process is already run on a CPU cluster. This thesis shows how this general problem in the area of bioinformatics can be tackled with reconfigurable hardware, especially FPGAs. Three compute intensive problems are highlighted: sequence alignment, SNP interaction analysis and genotype imputation. In the area of sequence alignment the software BLASTp for protein database searches is exemplarily presented, implemented and evaluated.SNP interaction analysis is presented with three applications performing an exhaustive search for interactions including the corresponding statistical tests: BOOST, iLOCi and the mutual information measurement. All applications are implemented in FPGA-hardware and evaluated, resulting in an impressive speedup of more than in three orders of magnitude when compared to standard computers. The last topic of genotype imputation presents a two-step process composed of the phasing step and the actual imputation step. The focus lies on the phasing step which is targeted by the SHAPEIT2 application. SHAPEIT2 is discussed with its underlying mathematical methods in detail, and finally implemented and evaluated. A remarkable speedup of 46 is reached here as well

    Doctor of Philosophy

    Get PDF
    dissertationStochastic methods, dense free-form mapping, atlas construction, and total variation are examples of advanced image processing techniques which are robust but computationally demanding. These algorithms often require a large amount of computational power as well as massive memory bandwidth. These requirements used to be ful lled only by supercomputers. The development of heterogeneous parallel subsystems and computation-specialized devices such as Graphic Processing Units (GPUs) has brought the requisite power to commodity hardware, opening up opportunities for scientists to experiment and evaluate the in uence of these techniques on their research and practical applications. However, harnessing the processing power from modern hardware is challenging. The di fferences between multicore parallel processing systems and conventional models are signi ficant, often requiring algorithms and data structures to be redesigned signi ficantly for efficiency. It also demands in-depth knowledge about modern hardware architectures to optimize these implementations, sometimes on a per-architecture basis. The goal of this dissertation is to introduce a solution for this problem based on a 3D image processing framework, using high performance APIs at the core level to utilize parallel processing power of the GPUs. The design of the framework facilitates an efficient application development process, which does not require scientists to have extensive knowledge about GPU systems, and encourages them to harness this power to solve their computationally challenging problems. To present the development of this framework, four main problems are described, and the solutions are discussed and evaluated: (1) essential components of a general 3D image processing library: data structures and algorithms, as well as how to implement these building blocks on the GPU architecture for optimal performance; (2) an implementation of unbiased atlas construction algorithms|an illustration of how to solve a highly complex and computationally expensive algorithm using this framework; (3) an extension of the framework to account for geometry descriptors to solve registration challenges with large scale shape changes and high intensity-contrast di fferences; and (4) an out-of-core streaming model, which enables developers to implement multi-image processing techniques on commodity hardware

    Simulating Nonlinear Neutrino Oscillations on Next-Generation Many-Core Architectures

    Get PDF
    In this work an astrophysical simulation code, XFLAT, is developed to study neutrino oscillations in supernovae. XFLAT is a hybrid modular code which was designed to utilize multiple levels of parallelism through MPI, OpenMP, and SIMD instructions (vectorization). It can run on both the CPU and the Xeon Phi co-processor, the latter of which is based on the Intel Many Integrated Core Architecture (MIC). The performance of XFLAT on various system configurations and physics scenarios has been analyzed. In addition, the impact of I/O and the multi-node configuration on the Xeon Phi-equipped heterogeneous supercomputers such as Stampede at the Texas Advanced Computing Center (TACC) was investigated

    Heterogeneous multicore systems for signal processing

    Get PDF
    This thesis explores the capabilities of heterogeneous multi-core systems, based on multiple Graphics Processing Units (GPUs) in a standard desktop framework. Multi-GPU accelerated desk side computers are an appealing alternative to other high performance computing (HPC) systems: being composed of commodity hardware components fabricated in large quantities, their price-performance ratio is unparalleled in the world of high performance computing. Essentially bringing “supercomputing to the masses”, this opens up new possibilities for application fields where investing in HPC resources had been considered unfeasible before. One of these is the field of bioelectrical imaging, a class of medical imaging technologies that occupy a low-cost niche next to million-dollar systems like functional Magnetic Resonance Imaging (fMRI). In the scope of this work, several computational challenges encountered in bioelectrical imaging are tackled with this new kind of computing resource, striving to help these methods approach their true potential. Specifically, the following main contributions were made: Firstly, a novel dual-GPU implementation of parallel triangular matrix inversion (TMI) is presented, addressing an crucial kernel in computation of multi-mesh head models of encephalographic (EEG) source localization. This includes not only a highly efficient implementation of the routine itself achieving excellent speedups versus an optimized CPU implementation, but also a novel GPU-friendly compressed storage scheme for triangular matrices. Secondly, a scalable multi-GPU solver for non-hermitian linear systems was implemented. It is integrated into a simulation environment for electrical impedance tomography (EIT) that requires frequent solution of complex systems with millions of unknowns, a task that this solution can perform within seconds. In terms of computational throughput, it outperforms not only an highly optimized multi-CPU reference, but related GPU-based work as well. Finally, a GPU-accelerated graphical EEG real-time source localization software was implemented. Thanks to acceleration, it can meet real-time requirements in unpreceeded anatomical detail running more complex localization algorithms. Additionally, a novel implementation to extract anatomical priors from static Magnetic Resonance (MR) scansions has been included

    Dynamically and partially reconfigurable hardware architectures for high performance microarray bioinformatics data analysis

    Get PDF
    The field of Bioinformatics and Computational Biology (BCB) is a multidisciplinary field that has emerged due to the computational demands of current state-of-the-art biotechnology. BCB deals with the storage, organization, retrieval, and analysis of biological datasets, which have grown in size and complexity in recent years especially after the completion of the human genome project. The advent of Microarray technology in the 1990s has resulted in the new concept of high throughput experiment, which is a biotechnology that measures the gene expression profiles of thousands of genes simultaneously. As such, Microarray requires high computational power to extract the biological relevance from its high dimensional data. Current general purpose processors (GPPs) has been unable to keep-up with the increasing computational demands of Microarrays and reached a limit in terms of clock speed. Consequently, Field Programmable Gate Arrays (FPGAs) have been proposed as a low power viable solution to overcome the computational limitations of GPPs and other methods. The research presented in this thesis harnesses current state-of-the-art FPGAs and tools to accelerate some of the most widely used data mining methods used for the analysis of Microarray data in an effort to investigate the viability of the technology as an efficient, low power, and economic solution for the analysis of Microarray data. Three widely used methods have been selected for the FPGA implementations: one is the un-supervised Kmeans clustering algorithm, while the other two are supervised classification methods, namely, the K-Nearest Neighbour (K-NN) and Support Vector Machines (SVM). These methods are thought to benefit from parallel implementation. This thesis presents detailed designs and implementations of these three BCB applications on FPGA captured in Verilog HDL, whose performance are compared with equivalent implementations running on GPPs. In addition to acceleration, the benefits of current dynamic partial reconfiguration (DPR) capability of modern Xilinx’ FPGAs are investigated with reference to the aforementioned data mining methods. Implementing K-means clustering on FPGA using non-DPR design flow has outperformed equivalent implementations in GPP and GPU in terms of speed-up by two orders and one order of magnitude, respectively; while being eight times more power efficient than GPP and four times more than a GPU implementation. As for the energy efficiency, the FPGA implementation was 615 times more energy efficient than GPPs, and 31 times more than GPUs. Over and above, the FPGA implementation outperformed the GPP and GPU implementations in terms of speed-up as the dimensionality of the Microarray data increases. Additionally, the DPR implementations of the K-means clustering have shown speed-up in partial reconfiguration time of ~5x and 17x over full chip reconfiguration for single-core and eight-core implementations, respectively. Two architectures of the K-NN classifier have been implemented on FPGA, namely, A1 and A2. The K-NN implementation based on A1 architecture achieved a speed-up of ~76x over an equivalent GPP implementation whereas the A2 architecture achieved ~68x speedup. Furthermore, the FPGA implementation outperformed the equivalent GPP implementation when the dimensionality of data was increased. In addition, The DPR implementations of the K-NN classifier have achieved speed-ups in reconfiguration time between ~4x to 10x over full chip reconfiguration when reconfiguring portion of the classifier or the complete classifier. Similar to K-NN, two architectures of the SVM classifier were implemented on FPGA whereby the former outperformed an equivalent GPP implementation by ~61x and the latter by ~49x. As for the DPR implementation of the SVM classifier, it has shown a speed-up of ~8x in reconfiguration time when reconfiguring the complete core or when exchanging it with a K-NN core forming a multi-classifier. The aforementioned implementations clearly show FPGAs to be an efficacious, efficient and economic solution for bioinformatics Microarrays data analysis

    Heterogeneity, High Performance Computing, Self-Organization and the Cloud

    Get PDF
    application; blueprints; self-management; self-organisation; resource management; supply chain; big data; PaaS; Saas; HPCaa

    Heterogeneity, High Performance Computing, Self-Organization and the Cloud

    Get PDF
    application; blueprints; self-management; self-organisation; resource management; supply chain; big data; PaaS; Saas; HPCaa
    corecore