
    ACCELERATION OF SPIKING NEURAL NETWORKS ON SINGLE-GPU AND MULTI-GPU SYSTEMS

    There has been strong interest in modeling the mammalian brain in order to study its architectural and functional principles and to offer tools to neuroscientists and medical researchers. Artificial Neural Networks (ANNs) are computational models that simulate the structure and/or functional behavior of neurons and process information using the connectionist approach to computation; hence, ANNs are viable options for such studies. Of the many classes of ANNs, Spiking Neural Network (SNN) models have been employed to simulate the mammalian brain, capturing its functionality and inference capabilities. In this class of neuron models, some of the biologically accurate models are the Hodgkin-Huxley (HH) model, the Morris-Lecar (ML) model, the Wilson model, and the Izhikevich model. The HH model is the oldest, most biologically accurate, and most compute-intensive of the listed models; the Izhikevich model, a more recent development, is sufficiently accurate and involves the least computation. Accurate modeling of neurons calls for compute-intensive models, so single-core processors are not suitable for large-scale SNN simulations due to their serial computation and low memory bandwidth. Graphics Processing Units (GPUs) have been used for general-purpose computing because they offer raw computing power, with the majority of their logic dedicated solely to computation. The work presented in this thesis implements two-level character recognition networks using the four previously mentioned SNN models on Nvidia's Tesla C870 card and investigates performance improvements over the equivalent software implementation on a 2.66 GHz Intel Core 2 Quad. The work probes important parameters such as kernel time, memory transfer time, and the flops offered by the GPU device. In this work, we report speed-ups as high as 576x on a single GPU device for the most compute-intensive, highly biologically realistic Hodgkin-Huxley model. These results demonstrate the potential of GPUs for large-scale, accurate modeling of the mammalian brain. The research in this thesis also presents several optimization techniques and strategies and discusses the major bottlenecks that must be avoided to achieve maximum performance for applications involving complex computations. The research also investigates an initial multi-GPU implementation to study problem partitioning for simulating biological-scale neuron networks on a cluster of GPU devices.
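    Of the four models, the Izhikevich model illustrates why SNN updates map so well to GPUs: each neuron advances with a few arithmetic operations per time step, independently of the others. Below is a minimal NumPy sketch of that update rule; the regular-spiking parameter values are the standard published defaults, assumed here for illustration, and the thesis's actual CUDA kernels for the Tesla C870 are not reproduced.

        import numpy as np

        def izhikevich_step(v, u, I, a=0.02, b=0.2, c=-65.0, d=8.0, dt=1.0):
            # One Euler step of the Izhikevich model for a vector of neurons.
            # v: membrane potential (mV); u: recovery variable; I: input current.
            # a, b, c, d: standard regular-spiking defaults (Izhikevich, 2003).
            fired = v >= 30.0                  # spike threshold (mV)
            v = np.where(fired, c, v)          # reset fired neurons
            u = np.where(fired, u + d, u)      # bump their recovery variable
            v = v + dt * (0.04 * v**2 + 5.0 * v + 140.0 - u + I)
            u = u + dt * a * (b * v - u)
            return v, u, fired

        # Toy usage: 1000 neurons driven by noisy input for 100 ms.
        v = np.full(1000, -65.0)
        u = 0.2 * v
        for _ in range(100):
            v, u, fired = izhikevich_step(v, u, I=5.0 * np.random.rand(1000))

    Each neuron's update touches only its own state, which is exactly the data-parallel pattern that a one-thread-per-neuron GPU kernel exploits.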

    EXPLORING MULTIPLE LEVELS OF PERFORMANCE MODELING FOR HETEROGENEOUS SYSTEMS

    The current trend in High-Performance Computing (HPC) is to extract concurrency from clusters that include heterogeneous resources such as General Purpose Graphical Processing Units (GPGPUs) and Field Programmable Gate Arrays (FPGAs). Although these heterogeneous systems can provide substantial performance for massively parallel applications, the available computing resources are often under-utilized due to inefficient application mapping, load balancing, and tuning. While several performance prediction models exist for tuning applications efficiently, they often require significant knowledge of the computing architecture for reliable prediction. In addition, they do not address multiple levels of design-space abstraction, and it is often difficult to choose a reliable prediction model for a given design. In this research, we develop a multi-level suite of performance prediction models for heterogeneous systems that primarily targets Synchronous Iterative Algorithms (SIAs). The modeling suite aims to produce accurate and straightforward application runtime predictions prior to the actual large-scale implementation. The suite addresses two levels of system abstraction: 1) low-level, where partial knowledge of the application implementation is present along with the system specifications, and 2) high-level, where implementation details are minimal and only high-level computing system specifications are given. The performance prediction modeling suite is developed using our proposed Synchronous Iterative GPGPU Execution (SIGE) model for GPGPU clusters, motivated by the RC Amenability Test for Scalable Systems (RATSS) model for FPGA clusters. The low-level abstraction for GPGPU clusters consists of a regression-based performance prediction framework that statistically abstracts system architecture characteristics, enabling performance prediction without detailed architecture knowledge. In this framework, the overall execution time of an application is predicted using regression models developed for the host-device computations and network-level communications performed in the algorithm. We have used a family of Spiking Neural Network (SNN) models and an Anisotropic Diffusion Filter (ADF) algorithm as SIA case studies to verify the regression-based framework, achieving over 90% prediction accuracy compared to the actual implementations for the several GPGPU cluster configurations tested. The results establish the adequacy of the low-level abstraction model for advanced, fine-grained performance prediction and design space exploration (DSE). The high-level abstraction consists of two primary modeling approaches: qualitative modeling, which uses existing subjective-analytical models for computation and communication, and quantitative modeling, which predicts computation and communication performance by measuring hardware events associated with objective-analytical models using micro-benchmarks. The performance prediction provided by the high-level abstraction approaches, albeit coarse-grained, delivers useful insight into application performance on the chosen heterogeneous system. A blend of the two high-level modeling approaches, labeled hybrid modeling, is explored for insightful preliminary performance prediction. The performance prediction models in the multi-level suite are verified and compared for accuracy and ease of use, allowing developers to choose the model that best fits their design-space abstraction. We also construct a roadmap that guides users from optimal Application-to-Accelerator (A2A) mapping to fine-grained performance prediction, thereby providing a hierarchical approach to optimal application porting on the target heterogeneous system. The end goal of this dissertation research is to offer the HPC community a thorough, non-architecture-specific performance prediction framework in the form of a hierarchical modeling suite that enables optimal utilization of heterogeneous resources.
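    The core of the low-level abstraction is ordinary regression over runtimes measured at small scales. A minimal sketch of that idea follows; the feature set (per-device problem size plus a log-scale communication term) and the synthetic timings are illustrative assumptions, not the SIGE model's actual regressors.

        import numpy as np

        # Hypothetical training data: per-iteration runtimes (ms) measured at
        # small scales. Features: problem size n and GPU count p (illustrative).
        n = np.array([1e4, 2e4, 4e4, 8e4, 1e4, 2e4, 4e4, 8e4])
        p = np.array([1, 1, 1, 1, 2, 2, 2, 2])
        t = np.array([3.1, 6.0, 12.2, 24.5, 1.9, 3.4, 6.6, 13.0])

        # Model: t ~ b0 + b1*(n/p) + b2*log2(p). The compute term scales with
        # per-device work; the communication term grows with device count.
        X = np.column_stack([np.ones_like(n), n / p, np.log2(p)])
        beta, *_ = np.linalg.lstsq(X, t, rcond=None)

        # Predict the runtime of an untested configuration (n = 1.6e5, p = 4).
        x_new = np.array([1.0, 1.6e5 / 4, np.log2(4)])
        print("predicted runtime (ms):", x_new @ beta)

    The same pattern extends to separate regressions for host-device computation and network-level communication, which the framework combines to predict overall execution time.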

    Optimization of an Elastic Network Augmented Coarse Grained Model to Study CCMV Capsid Deformation

    The major protective coat of most viruses is a highly symmetric protein capsid that forms spontaneously from many copies of identical proteins. Structural and mechanical properties of such capsids, as well as their self-assembly process, have been studied experimentally and theoretically, including modeling efforts by computer simulations on various scales. Atomistic models include specific details of local protein binding but are limited in system size and accessible time, while coarse-grained (CG) models can access longer time and length scales but often lack the specific local interactions. Multi-scale models aim at bridging this gap by systematically connecting different levels of resolution. Here, a CG model for CCMV (Cowpea Chlorotic Mottle Virus), a virus with an icosahedral shell of 180 identical protein monomers, is developed, where parameters are derived from atomistic simulations of capsid protein dimers in aqueous solution. In particular, a new method is introduced to combine the MARTINI CG model with a supportive elastic network based on structural fluctuations of individual monomers. In the parametrization process, both network connectivity and strength are optimized. This elastic-network optimized CG model, which relies solely on atomistic data of small units (dimers), correctly predicts inter-protein conformational flexibility and properties of larger capsid fragments of 20 and more subunits. Furthermore, it is shown that this CG model reproduces experimental (Atomic Force Microscopy) indentation measurements of the entire viral capsid. Thus one obvious goal for hierarchical modeling, namely predicting mechanical properties of larger protein complexes from models carefully parametrized on the elastic properties of smaller units, is shown to be achievable.
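    The structural ingredient of such a model is simple to state: harmonic springs are added between CG beads that lie within a cutoff distance in the reference structure. The sketch below shows only that construction step; the cutoff and force constant are placeholder values, whereas in the paper both network connectivity and strength are optimized against atomistic fluctuation data, which this sketch does not attempt.

        import numpy as np

        def build_elastic_network(coords, cutoff=0.9, k=500.0):
            # Construct harmonic bonds between CG beads closer than `cutoff` (nm).
            # Returns (i, j, r0, k) tuples: bead indices, equilibrium distance,
            # and force constant. Cutoff and k are illustrative placeholders.
            bonds = []
            n = len(coords)
            for i in range(n):
                for j in range(i + 1, n):
                    r0 = np.linalg.norm(coords[i] - coords[j])
                    if r0 < cutoff:
                        bonds.append((i, j, r0, k))
            return bonds

        # Toy usage: random bead positions standing in for CG backbone beads.
        rng = np.random.default_rng(0)
        beads = rng.uniform(0.0, 3.0, size=(50, 3))   # nm
        print(len(build_elastic_network(beads)), "elastic bonds")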

    Prediction-based Routing with Packet Scheduling under Temporal Constraint in Delay Tolerant Networks

    Routing in Delay Tolerant Networks (DTNs) is a challenging problem due to the intermittent connectivity between nodes. Researchers have proposed many routing protocols that adapt to the temporary connections of DTNs. One class of routing protocols uses historical information to predict future contact patterns for any pair of nodes. However, most existing protocols focus on the probability of a path from the source to the destination without considering the information carried in each packet, including the source, destination, size, and TTL (Time-To-Live), or limited resources such as available buffer size and bandwidth. In this paper, we propose a new prediction-based routing algorithm that takes packet information into account under conditions of limited transmission opportunities. The goal of this protocol is to increase the overall delivery ratio by scheduling packets at each node, at the cost of somewhat longer delivery delays for some messages. Extensive simulation results with real traces show that our protocol with packet scheduling outperforms pure probabilistic routing algorithms in terms of delivery ratio. The advantage is more pronounced for nodes with higher packet intensity and for packets with shorter TTLs.
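    The key mechanism is deciding, at each short-lived contact, which queued packets to forward first. A minimal sketch of such TTL- and probability-aware scheduling follows; the scoring rule (rank by predicted delivery probability, break ties toward shorter remaining TTL, then greedily fill the transfer budget) is an illustrative heuristic, not the paper's exact policy.

        from dataclasses import dataclass

        @dataclass
        class Packet:
            dest: str
            size: int        # bytes
            ttl: float       # seconds of life remaining

        def schedule(packets, delivery_prob, contact_bytes):
            # Pick packets to forward during a contact with limited capacity.
            # Sort by predicted delivery probability to each destination,
            # preferring shorter remaining TTL (more urgent) on ties, then
            # greedily fill the contact's transfer budget.
            ranked = sorted(packets,
                            key=lambda p: (-delivery_prob.get(p.dest, 0.0), p.ttl))
            chosen, used = [], 0
            for p in ranked:
                if used + p.size <= contact_bytes:
                    chosen.append(p)
                    used += p.size
            return chosen

        # Toy usage: a node meets a neighbor with ~2 KB of transfer capacity.
        probs = {"A": 0.8, "B": 0.3}
        queue = [Packet("A", 1200, 60.0), Packet("B", 900, 10.0), Packet("A", 800, 5.0)]
        print([p.dest for p in schedule(queue, probs, 2048)])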

    The Effect of Ventricular Assist Devices on Post-Transplant Mortality: An Analysis of the United Network for Organ Sharing Thoracic Registry

    Objectives: This study sought to determine the relationship between pre-transplant ventricular assist device (VAD) support and mortality after heart transplantation. Background: Increasingly, VADs are being used to bridge patients to heart transplantation. The effect of these devices on post-transplant mortality is unclear. Methods: Patients 18 years or older who underwent first-time, single-organ heart transplantation in the U.S. between 1995 and 2004 were included in the analyses. This study compared 1,433 patients bridged with intracorporeal VADs and 448 patients bridged with extracorporeal VADs with 9,455 United Network for Organ Sharing status 1 patients not bridged with a VAD with respect to post-transplant mortality. Because the proportional hazards assumption was not met, hazard ratios (HRs) for different time periods were estimated. Results: Intracorporeal VADs were associated with an HR of 1.20 (95% confidence interval [CI]: 1.02 to 1.43; p = 0.03) for mortality in the first 6 months after transplant and an HR of 1.99 (95% CI: 1.44 to 2.75; p < 0.0001) beyond 5 years. Between 6 months and 5 years, the HRs were not significantly different from 1. Extracorporeal VADs were associated with an HR of 1.91 (95% CI: 1.53 to 2.37; p < 0.0001) for mortality in the first 6 months and an HR of 2.93 (95% CI: 1.19 to 7.25; p = 0.02) beyond 5 years. The HRs were not significantly different from 1 between 6 months and 5 years, except for an HR of 0.23 (95% CI: 0.06 to 0.91; p = 0.04) between 24 and 36 months. Conclusions: Extracorporeal VADs are associated with higher mortality within 6 months and again beyond 5 years after transplantation. Intracorporeal VADs are associated with a small increase in mortality in the first 6 months and a clinically significant increase in mortality beyond 5 years. These data do not provide evidence supporting VAD implantation in stable United Network for Organ Sharing status 1 patients awaiting heart transplantation.
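    One common way to obtain period-specific hazard ratios when the proportional hazards assumption fails is to fit a separate Cox model within each follow-up window, restricted to patients still at risk at its start. The sketch below illustrates that approach on synthetic data with the lifelines library; the windows mirror the abstract's cut points, but the data, covariate coding, and adjustment set are placeholders, not the registry analysis.

        import numpy as np
        import pandas as pd
        from lifelines import CoxPHFitter

        def window_hr(df, start, stop):
            # HR within a follow-up window (months): keep patients still at
            # risk at `start`, censor follow-up at `stop`, fit a Cox model.
            at_risk = df[df["time"] > start].copy()
            at_risk["t"] = np.minimum(at_risk["time"], stop) - start
            at_risk["e"] = ((at_risk["event"] == 1) & (at_risk["time"] <= stop)).astype(int)
            cph = CoxPHFitter().fit(at_risk[["t", "e", "vad"]],
                                    duration_col="t", event_col="e")
            return cph.hazard_ratios_["vad"]

        # Toy data: time (months), death indicator, and VAD-bridging covariate.
        rng = np.random.default_rng(1)
        df = pd.DataFrame({
            "time": rng.exponential(60, 500),
            "event": rng.integers(0, 2, 500),
            "vad": rng.integers(0, 2, 500),
        })
        for lo, hi in [(0, 6), (6, 60), (60, 120)]:
            print(f"HR in ({lo}, {hi}] months:", round(window_hr(df, lo, hi), 2))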

    Formulation and optimization of fenofibrate lipospheres using Taguchi's experimental design

    Fenofibrate lipospheres were prepared by the melt dispersion technique. Critical parameters influencing particle size and entrapment efficiency were optimized by applying an L9 Taguchi experimental design. Entrapment efficiency of up to 87 % was obtained for the optimized formulations on increasing the olive oil content to 30 % of the lipid carrier. Particle size analysis by microscopy and SEM revealed a narrow particle size distribution and the formation of discrete lipospheres of superior morphology. In vitro dissolution data best fitted the Higuchi model, indicating diffusion-controlled release from the porous lipid matrices. Prolonged release was obtained from stearic acid-olive oil lipospheres compared to cetyl alcohol-olive oil lipospheres due to the relatively hydrophobic matrix formed by stearic acid. Lipid-lowering studies in a Triton-induced hyperlipidemia rat model demonstrated greater lipid-lowering ability for the fenofibrate lipospheres compared to the marketed product and the plain drug.
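    An L9 design screens up to four factors at three levels in only nine runs. A minimal sketch of the analysis follows; the factor assignments and response values are hypothetical, and the larger-the-better signal-to-noise ratio shown is one standard Taguchi criterion for maximizing a response such as entrapment efficiency.

        import numpy as np

        # Standard L9 (3^4) orthogonal array: 9 runs, 4 factors, 3 levels each.
        L9 = np.array([
            [1, 1, 1, 1], [1, 2, 2, 2], [1, 3, 3, 3],
            [2, 1, 2, 3], [2, 2, 3, 1], [2, 3, 1, 2],
            [3, 1, 3, 2], [3, 2, 1, 3], [3, 3, 2, 1],
        ])

        # Hypothetical measured entrapment efficiencies (%) for the 9 runs;
        # the paper's actual factors and responses are not reproduced here.
        y = np.array([62.0, 70.0, 75.0, 68.0, 80.0, 71.0, 74.0, 83.0, 87.0])

        # Larger-the-better signal-to-noise ratio: S/N = -10*log10(mean(1/y^2)).
        sn = -10.0 * np.log10(1.0 / y**2)

        # Mean S/N per level for each factor; the best level maximizes mean S/N.
        for f in range(L9.shape[1]):
            means = [sn[L9[:, f] == lvl].mean() for lvl in (1, 2, 3)]
            print(f"factor {f + 1}: best level = {int(np.argmax(means)) + 1}")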