5 research outputs found

    ACCELERATION OF SPIKING NEURAL NETWORKS ON SINGLE-GPU AND MULTI-GPU SYSTEMS

    There has been strong interest in modeling the mammalian brain in order to study its architectural and functional principles and to offer tools to neuroscientists and medical researchers for related studies. Artificial Neural Networks (ANNs) are computational models that try to simulate the structure and/or the functional behavior of neurons and process information using the connectionist approach to computation. Hence, ANNs are a viable option for such studies. Of the many classes of ANNs, Spiking Neural Network (SNN) models have been employed to simulate the mammalian brain, capturing its functionality and inference capabilities. In this class of neuron models, some of the biologically accurate models are the Hodgkin-Huxley (HH) model, the Morris-Lecar (ML) model, the Wilson model, and the Izhikevich model. The HH model is the oldest, the most biologically accurate, and the most compute-intensive of the listed models. The Izhikevich model, a more recent development, is sufficiently accurate and requires the least computation. Accurate modeling of neurons calls for compute-intensive models, and hence single-core processors are not suitable for large-scale SNN simulations due to their serial computation and low memory bandwidth. Graphics Processing Units (GPUs) have been used for general-purpose computing because they offer raw computing power, with the majority of their logic dedicated solely to computation. The work presented in this thesis implements two-level character recognition networks using the four previously mentioned SNN models on Nvidia's Tesla C870 card and investigates performance improvements over the equivalent software implementation on a 2.66 GHz Intel Core 2 Quad. The work probes some of the important parameters, such as the kernel time, memory transfer time, and flops offered by the GPU device for the implementations.
In this work, we report speed-ups as high as 576x on a single GPU device for the most compute-intensive, highly biologically realistic Hodgkin-Huxley model. These results demonstrate the potential of GPUs for large-scale, accurate modeling of the mammalian brain. The research in this thesis also presents several optimization techniques and strategies, and discusses the major bottlenecks that must be avoided in order to achieve maximum performance benefits for applications involving complex computations. The research also investigates an initial multi-GPU implementation to study problem partitioning for simulating biological-scale neuron networks on a cluster of GPU devices.
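The abstract above describes the Izhikevich model as the least compute-intensive of the four neuron models. The following is a minimal single-neuron sketch in Python, assuming the commonly published form of the model (v' = 0.04v^2 + 5v + 140 - u + I, u' = a(bv - u), with a reset once v reaches 30 mV), regular-spiking parameters, and forward-Euler integration. The function name, step size, and input currents are illustrative assumptions; none of this is taken from the thesis, which targets CUDA on a Tesla C870.

```python
def simulate_izhikevich(current, steps=1000, dt=0.5,
                        a=0.02, b=0.2, c=-65.0, d=8.0):
    """Forward-Euler simulation of one Izhikevich neuron.

    current: constant input current I (assumed, illustrative units).
    dt:      time step in ms (assumed value).
    Returns the list of spike times in ms.
    """
    v = c          # membrane potential (mV), started at the reset value
    u = b * v      # recovery variable
    spike_times = []
    for step in range(steps):
        # Membrane potential and recovery variable updates.
        v += dt * (0.04 * v * v + 5.0 * v + 140.0 - u + current)
        u += dt * (a * (b * v - u))
        # Spike detected: record it and apply the reset rule.
        if v >= 30.0:
            spike_times.append(step * dt)
            v = c
            u += d
    return spike_times


if __name__ == "__main__":
    spikes = simulate_izhikevich(current=10.0)
    print(f"{len(spikes)} spikes in {1000 * 0.5:.0f} ms of simulated time")
```

Each neuron needs only a handful of multiply-adds per time step and no communication with other neurons within the step, which suggests why a network of such neurons maps naturally onto one GPU thread per neuron.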

    EXPLORING MULTIPLE LEVELS OF PERFORMANCE MODELING FOR HETEROGENEOUS SYSTEMS

    The current trend in High-Performance Computing (HPC) is to extract concurrency from clusters that include heterogeneous resources such as General-Purpose Graphics Processing Units (GPGPUs) and Field-Programmable Gate Arrays (FPGAs). Although these heterogeneous systems can provide substantial performance for massively parallel applications, many of the available computing resources are often underutilized due to inefficient application mapping, load balancing, and tuning. While several performance prediction models exist to efficiently tune applications, they often require significant knowledge of the computing architecture for reliable prediction. In addition, they do not address multiple levels of design space abstraction, and it is often difficult to choose a reliable prediction model for a given design. In this research, we develop a multi-level suite of performance prediction models for heterogeneous systems that primarily targets Synchronous Iterative Algorithms (SIAs). The modeling suite aims to produce accurate and straightforward application runtime prediction prior to the actual large-scale implementation. This suite addresses two levels of system abstraction: 1) low-level, where partial knowledge of the application implementation is present along with the system specifications, and 2) high-level, where the implementation details are minimal and only high-level computing system specifications are given. The performance prediction modeling suite is developed using our proposed Synchronous Iterative GPGPU Execution (SIGE) model for GPGPU clusters, motivated by the RC Amenability Test for Scalable Systems (RATSS) model for FPGA clusters. The low-level abstraction for GPGPU clusters consists of a regression-based performance prediction framework that statistically abstracts system architecture characteristics, enabling performance prediction without detailed architecture knowledge.
In this framework, the overall execution time of an application is predicted using regression models developed for the host-device computations and network-level communications performed in the algorithm. We have used a family of Spiking Neural Network (SNN) models and an Anisotropic Diffusion Filter (ADF) algorithm as SIA case studies for verification of the regression-based framework and achieved over 90% prediction accuracy compared to the actual implementations for the several GPGPU cluster configurations tested. The results establish the adequacy of the low-level abstraction model for advanced, fine-grained performance prediction and design space exploration (DSE). The high-level abstraction consists of the following two primary modeling approaches: qualitative modeling that uses existing subjective-analytical models for computation and communication; and quantitative modeling that predicts computation and communication performance by measuring hardware events associated with objective-analytical models using micro-benchmarks. The performance prediction provided by the high-level abstraction approaches, albeit coarse-grained, delivers useful insight into application performance on the chosen heterogeneous system. A blend of the two high-level modeling approaches, labeled as hybrid modeling, is explored for insightful preliminary performance prediction. The performance prediction models in the multi-level suite are verified and compared for their accuracy and ease of use, allowing developers to choose a model that best satisfies their design space abstraction. We also construct a roadmap that guides the user from optimal Application-to-Accelerator (A2A) mapping to fine-grained performance prediction, thereby providing a hierarchical approach to optimal application porting on the target heterogeneous system.
The end goal of this dissertation research is to offer the HPC community a thorough, non-architecture-specific performance prediction framework in the form of a hierarchical modeling suite that enables them to optimally utilize the heterogeneous resources.

    A multi-level Implementation of Image Amplification on the General Purpose Graphical Processing Unit

    With rapid advances in high-impact fields such as medical imaging, dental imaging, navigation, and microbiology, the amount of information stored in images has increased drastically. Although the images generated by these fields are comprehensive, scientists are often interested in the smallest details. In this research, we efficiently parallelize a state-of-the-art image amplification algorithm that allows users to amplify images to obtain minute details. The proposed algorithm aims at preserving the edges of the image, thereby capturing a rich representation of the image. The algorithm comprises four computationally intensive stages: 1) edge detection via the Canny algorithm; 2) edge preservation in the vertical direction (vertical edge-keeping); 3) edge preservation in the horizontal direction (horizontal edge-keeping); and 4) interpolation of the remaining pixels via mean-keeping. The computationally intensive nature of these steps makes the algorithm a good match for massively parallel architectures such as GPGPU devices. We construct an effective implementation hierarchy that maps the algorithm stages step-by-step to an Nvidia Tesla GPGPU device using the Compute Unified Device Architecture (CUDA) programming model. Our step-by-step exposition of the algorithm stage mapping not only elucidates the various CUDA optimization techniques, but also enables users to relate parallelization strategies to their respective applications.

    ChatReview: A ChatGPT-enabled natural language processing framework to study domain-specific user reviews

    We present ChatReview, a ChatGPT-enabled natural language processing framework that effectively studies domain-specific user reviews to offer relevant and personalized search results at multiple levels of granularity. The framework accomplishes this task in four phases: data collection, tokenization, query construction, and response generation. The data collection phase involves gathering domain-specific user reviews from public and private repositories. In the tokenization phase, ChatReview applies sentiment analysis to extract keywords and categorize them into various sentiment classes. This process creates a token repository that best describes the user sentiments for a given set of user reviews. In the query construction phase, the framework uses the token repository and domain knowledge to construct three types of ChatGPT prompts: explicit, implicit, and creative. In the response generation phase, ChatReview pipelines these prompts into ChatGPT to generate search results at varying levels of granularity.

    Increasing SARS-Cov2 Cases, Hospitalizations, and Deaths among the Vaccinated Populations during the Omicron (B.1.1.529) Variant Surge in UK

    Background: Increased SARS-CoV2 hospitalizations and deaths were noted during the Omicron (B.1.1.529) variant surge in the UK despite decreased cases, and the reasons are unclear. Methods: In this retrospective observational study, we analyzed reported SARS-CoV2 cases, hospitalizations, deaths, and variables that affect the outcomes (including ethnicity, deprivation score, vaccination disparities, and pre-existing conditions) during the COVID-19 pandemic in the UK. The vaccine effectiveness among those ≥ 18 years of age was also analyzed (August 16, 2021-March 27, 2022). Results: During the latter part of the Omicron variant surge (February 28-May 1, 2022), a significantly increased proportion of cases (23.7% vs 40.3%; RR 1.70 [1.70-1.71]; p<0.001) and hospitalizations (39.3% vs 50.3%; RR 1.28 [1.27-1.30]; p<0.001) among ≥ 50 years of age, and deaths (67.89% vs 80.07%; RR 1.18 [1.16-1.20]; p<0.001) among ≥ 75 years of age was observed compared to the earlier period (December 6, 2021-February 27, 2022). There was a significant decline in case fatality rate (all ages [0.21% vs 0.39%; RR 0.54 (0.52-0.55); p<0.001], ≥ 18 years of age [0.25% vs 0.58%; RR 0.44 (0.43-0.45); p<0.001], and ≥ 50 years of age [0.72% vs 1.57%; RR 0.46 (0.45-0.47); p<0.001]) and the risk of hospitalizations (all ages [0.62% vs 0.99%; RR 0.63 (0.62-0.64); p<0.001], ≥ 18 years of age [0.67% vs 1.38%; RR 0.484 (0.476-0.492); p<0.001], and ≥ 50 years of age [1.45% vs 2.81%; RR 0.52 (0.51-0.53); p<0.001]) during the Omicron variant surge (December 27, 2021-March 20, 2022) compared to the Delta variant surge (August 16-December 5, 2021). Both the unvaccinated (0.41% vs 0.77%; RR 0.54 (0.51-0.57); p<0.001) and vaccinated (0.25% vs 0.59%; RR 0.43 (0.42-0.44); p<0.001) populations of ≥ 18 years of age showed a significant decline in the case fatality rate during the Omicron variant surge versus the Delta variant surge.
In summary, a significant decline in the risk of hospitalizations was observed among both the unvaccinated (1.27% vs 2.92%; RR 0.44 (0.42-0.45); p<0.001) and vaccinated (0.65% vs 1.19%; RR 0.54 (0.53-0.55); p<0.001) populations of ≥ 18 years of age during the same period. We observed negative vaccine effectiveness for the third dose since December 20, 2021, with a significantly increased proportion of SARS-CoV2 cases, hospitalizations, and deaths among the vaccinated, and a decreased proportion of cases, hospitalizations, and deaths among the unvaccinated. Preexisting conditions were present in 95.6% of all COVID-19 deaths. We also observed various ethnicity, deprivation score, and vaccination rate disparities that can adversely affect hospitalizations and deaths among the compared groups based on the vaccination status. Conclusion: There is no discernible optimal vaccine effectiveness among the ≥ 18 years of age and vaccinated third dose population since the beginning (December 20, 2021) of the Omicron variant surge. Other data, including preexisting conditions, ethnicity, deprivation score, and vaccination rate disparities, need to be adjusted for by developing validated models for evaluating vaccine effectiveness against hospitalizations and deaths. Both the vaccinated and unvaccinated populations showed favorable outcomes during the Omicron variant surge. The increased proportion of cases among the vaccinated population with suboptimal vaccine effectiveness was associated with a significantly increased proportion of hospitalizations and deaths during the Omicron variant surge. This underscores the need to prevent infections, especially in the elderly vaccinated population irrespective of vaccination status, by employing uniform screening protocols and protective measures.
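The relative risks (RR) quoted in this abstract can be sanity-checked from the paired percentages. The sketch below assumes each RR is the simple ratio of the two reported proportions; the abstract does not state the exact method, and the helper name is hypothetical. The published values were presumably computed from raw counts, so recomputing from rounded percentages can differ in the last digit, and the confidence intervals cannot be reproduced without those counts.

```python
def relative_risk(risk_pct, reference_pct):
    """Ratio of two risks given as percentages (assumed definition of RR)."""
    if reference_pct <= 0:
        raise ValueError("reference risk must be positive")
    return risk_pct / reference_pct


# Hospitalizations among >= 50 years: 50.3% (latter period) vs 39.3% (earlier).
print(round(relative_risk(50.3, 39.3), 2))   # 1.28

# Case fatality rate, all ages: 0.21% (Omicron) vs 0.39% (Delta).
print(round(relative_risk(0.21, 0.39), 2))   # 0.54
```

Note that the abstract lists the pairs in different orders (earlier vs latter period in one place, Omicron vs Delta in another), so the numerator has to be chosen from the surrounding sentence, not from position alone.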