17 research outputs found

    BrainFrame: A node-level heterogeneous accelerator platform for neuron simulations

    Objective. The advent of high-performance computing (HPC) in recent years has led to its increasing use in brain studies through computational models. The scale and complexity of such models are constantly increasing, leading to challenging computational requirements. Even though modern HPC platforms can often deal with such challenges, the vast diversity of the modeling field does not permit a homogeneous acceleration platform to effectively address the complete array of modeling requirements. Approach. In this paper we propose and build BrainFrame, a heterogeneous acceleration platform that incorporates three distinct acceleration technologies, an Intel Xeon-Phi CPU

    Comparison of Psychological Distress between Type 2 Diabetes Patients with and without Proteinuria

    We investigated the link between proteinuria and psychological distress among patients with type 2 diabetes mellitus (T2DM). A total of 130 patients with T2DM aged 69.1±10.3 years were enrolled in this cross-sectional study. Urine and blood parameters, age, height, body weight, and medications were analyzed, and each patient’s psychological distress was measured using the six-item Kessler Psychological Distress Scale (K6). We compared the K6 scores between the patients with and without proteinuria. Forty-two patients (32.3%) had proteinuria (≥±) and the level of HbA1c was 7.5±1.3%. The K6 scores of the patients with proteinuria were significantly higher than those of the patients without proteinuria even after adjusting for age and sex. The clinical impact of proteinuria rather than age, sex and HbA1c was demonstrated by a multiple regression analysis. Proteinuria was closely associated with higher psychological distress. Preventing and improving proteinuria may reduce psychological distress in patients with T2DM
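
The covariate-adjusted comparison described above can be illustrated with a minimal regression sketch: regress the K6 score on a proteinuria indicator while controlling for age and sex, and read the adjusted group difference off the coefficient. The data below are synthetic (fixed seed) and the effect size is invented; only the method (ordinary least squares with age and sex as covariates) mirrors the kind of analysis reported in the study.

```python
import numpy as np

# Synthetic illustration of a covariate-adjusted group comparison.
# None of these numbers are the study's data; the sample size, mean age,
# and proteinuria prevalence are taken from the abstract only to make the
# toy data plausible.
rng = np.random.default_rng(0)
n = 130
age = rng.normal(69.1, 10.3, n)
sex = rng.integers(0, 2, n)                 # 0/1 indicator
proteinuria = rng.random(n) < 0.323         # ~32.3% prevalence, as in the study
# invented ground truth: +2.5 K6 points for proteinuria, small age effect
k6 = 4.0 + 2.5 * proteinuria + 0.05 * (age - 69.1) + rng.normal(0, 2.0, n)

# OLS: K6 ~ intercept + proteinuria + age + sex
X = np.column_stack([np.ones(n), proteinuria, age, sex])
beta, *_ = np.linalg.lstsq(X, k6, rcond=None)
print(beta[1])   # adjusted K6 difference attributable to proteinuria
```

The coefficient on the proteinuria column is the group difference after adjustment, which is how a significant association can survive controlling for age and sex.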

    An Adaptive Defect-Tolerant Multiprocessor Array Architecture

    Recent trends in transistor technology have dictated the constant reduction of device size. One negative effect stemming from the reduction in size and increased complexity is reduced reliability. This thesis addresses fault recovery, device fault tolerance, and graceful system degradation in the presence of hard faults. Using a sparing strategy to reuse the functional pipeline stages of faulty cores, we take advantage of the natural redundancy of multicores. This is done by incorporating a reconfigurable network on which the cores of the system sit, able to redirect the data flow from the faulty pipeline stages of damaged cores to spare, functional ones. The implementation requires the absence of global signals, so pipeline-stage operation needs to be decoupled. We also develop the bidirectional switch required for the network and implement a 4-core working example of our architecture as a proof of concept and to evaluate the design. In the best case, the 4-core design can guarantee correct functionality with 75% of the system non-functional. The defect-tolerant pipeline has an overhead of 1.92% in execution cycles and 14.4% in operating frequency for our custom-made stress-marks. Such a system implemented with un-pipelined interconnect would lead to a pipeline with 50% lower frequency and ×2.1 longer overall execution time when the system has no faults. With our architecture and pipelined interconnect, the frequency overhead is reduced by 34% and the overall execution-time cost by 28% in the full 4-core system. The total execution-time overhead for our stress-marks in the complete system ranges from ×1.5 to ×3.8 compared to the baseline, depending on the number of defects in the array. The area overhead is around 69%, and power consumption, without incorporating any advanced power-saving technique, is estimated at ×4 to ×5 higher compared to the baseline

    Heuristic Search for Adaptive, Defect-Tolerant Multiprocessor Arrays

    In this article, new heuristic-search methods and algorithms are presented for enabling highly efficient, adaptive, defect-tolerant multiprocessor arrays. We consider systems where a homogeneous multiprocessor array lies on top of reconfigurable interconnects which allow the pipeline stages of the processors to be connected in all possible configurations. Considering the multiprocessor array partitioned into substitutable units at the granularity of pipeline stages, we employ a variety of heuristic-search methods and algorithms to isolate and replace defective units. The proposed heuristics are designed for off-line execution and aim at minimizing the performance overhead necessarily introduced to the array by the interconnects' latency. An empirical evaluation of the designed algorithms is then carried out in order to assess the targeted problem and the efficacy of our approach. Our findings indicate that this is an NP-complete computational problem; however, for the problem sizes we exhaustively searched, our heuristic-search methods can achieve 100% accuracy in finding the optimal solution among 10^19 possible candidates within 2.5 seconds. Alternatively, they can provide near-optimal solutions, at an accuracy consistently exceeding 70% (compared to the optimal solution), in only 10^-4 seconds
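
The article's actual heuristics are not reproduced in the abstract, but the underlying problem can be sketched: assemble logical cores from the still-functional pipeline stages of a faulty array, preferring nearby stages so that interconnect latency stays low. The greedy rule, stage layout, and distance cost below are illustrative assumptions, not the paper's algorithms.

```python
# Hypothetical greedy sketch of stage-level substitution: build as many
# logical cores as possible from functional pipeline stages, picking for
# each stage the instance closest (by row distance) to the previous pick.
def assemble_cores(functional, n_stages):
    """functional: dict stage_index -> sorted list of core rows that still
    have that stage working. Returns assembled logical cores, each a
    tuple of (stage_index, row) picks. Mutates `functional`."""
    cores = []
    while all(functional[s] for s in range(n_stages)):
        picks = []
        row = functional[0][0]                 # seed with the first free fetch stage
        for s in range(n_stages):
            # greedy: pick the functional instance closest to the previous pick
            best = min(functional[s], key=lambda r: abs(r - row))
            picks.append((s, best))
            functional[s].remove(best)
            row = best
        cores.append(tuple(picks))
    return cores

# 4 rows x 3 stages, with a few hard faults knocking stages out:
functional = {0: [0, 1, 3], 1: [1, 2, 3], 2: [0, 1, 2]}
print(assemble_cores(functional, 3))
```

A greedy pass like this runs in polynomial time but is not guaranteed optimal, which is exactly why the article resorts to heuristic search over the (NP-complete) configuration space.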

    FPGA-based biophysically-meaningful modeling of olivocerebellar neurons

    The Inferior-Olivary nucleus (ION) is a well-charted region of the brain, heavily associated with sensorimotor control of the body. It comprises ION cells with unique properties which facilitate sensory processing and motor-learning skills. Various simulation models of ION-cell networks have been written in an attempt to unravel their mysteries. However, simulations become rapidly intractable when biophysically plausible models and meaningful network sizes (100 cells) are modeled. To overcome this problem, in this work we port a highly detailed ION-cell network model, originally coded in Matlab, onto an FPGA chip. It was first converted to ANSI C code and extensively profiled. It was then translated to HLS C code for the Xilinx Vivado toolflow, and various algorithmic and arithmetic optimizations were applied. The design was implemented in a Virtex 7 (XC7VX485T) device and can simulate a 96-cell network at real-time speed, yielding a speedup of 700× compared to the original Matlab code and 12.5× compared to the reference C implementation running on an Intel Xeon 2.66GHz machine with 20GB RAM. For a 1,056-cell network (non-real-time), an FPGA speedup of 45× against the C code can be achieved, demonstrating the design's usefulness in accelerating neuroscience research. Limited by the available on-chip memory, the FPGA can maximally support a 14,400-cell network (non-real-time) with online parameter configurability for cell state and network size. The maximum throughput of the FPGA ION-network accelerator can reach 2.13 GFLOPS
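
The "limited by on-chip memory" bound above is a simple capacity calculation: the maximum network size is the on-chip state budget divided by the per-cell state footprint. The per-cell word count and RAM budget below are purely illustrative assumptions chosen so the toy arithmetic lands on the abstract's 14,400-cell figure; the real breakdown is not given in the abstract.

```python
# Back-of-the-envelope sketch of a memory-bound network-size limit.
# Both constants are hypothetical, NOT figures from the paper.
STATE_WORDS_PER_CELL = 20          # assumed state variables per ION cell
BYTES_PER_WORD = 4                 # single-precision float
BRAM_BYTES = 1_152_000             # assumed usable on-chip RAM budget

max_cells = BRAM_BYTES // (STATE_WORDS_PER_CELL * BYTES_PER_WORD)
print(max_cells)
```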

    GPU Implementation of Neural-Network Simulations Based on Adaptive-Exponential Models

    Detailed brain modeling has been presenting significant challenges to the world of high-performance computing (HPC), posing computational problems that can benefit from modern hardware-acceleration technologies. We explore the capacity of GPUs for simulating large-scale neuronal networks based on the Adaptive Exponential neuron-model, which is widely used in the neuroscientific community. Our GPU-powered simulator acts as a benchmark to evaluate the strengths and limitations of modern GPUs, as well as to explore their scaling properties when simulating large neural networks. This work presents an optimized GPU implementation that outperforms a reference multicore implementation by 50x, whereas utilizing a dual-GPU configuration can deliver a speedup of 90x for networks of 20,000 fully interconnected AdEx neurons
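
The AdEx model mentioned above is a two-variable neuron model (membrane potential plus an adaptation current) whose per-neuron update is trivially data-parallel, which is what makes it a good fit for GPUs. Below is a NumPy/CPU sketch of a vectorized forward-Euler step for a whole population at once; the parameter values are common textbook defaults, not necessarily those used in the paper, and the constant input current is an illustrative assumption.

```python
import numpy as np

# Vectorized forward-Euler step for a population of Adaptive-Exponential
# (AdEx) neurons: the same per-neuron arithmetic a GPU simulator would
# parallelize across threads.
C, gL, EL, VT, DT = 281.0, 30.0, -70.6, -50.4, 2.0    # pF, nS, mV, mV, mV
a, b, tau_w = 4.0, 80.5, 144.0                        # nS, pA, ms
Vr, Vcut = -70.6, -30.0                               # reset / spike cut-off, mV
dt = 0.1                                              # ms

def adex_step(V, w, I):
    # clip the exponential argument to keep the Euler step numerically safe
    expo = np.exp(np.minimum((V - VT) / DT, 20.0))
    dV = (-gL * (V - EL) + gL * DT * expo - w + I) / C
    dw = (a * (V - EL) - w) / tau_w
    V, w = V + dt * dV, w + dt * dw
    spiked = V >= Vcut                  # detect threshold crossings
    V = np.where(spiked, Vr, V)         # reset membrane potential
    w = np.where(spiked, w + b, w)      # spike-triggered adaptation
    return V, w, spiked

V = np.full(20000, EL)                  # population at rest
w = np.zeros(20000)
for _ in range(1000):                   # 100 ms with a constant 800 pA drive
    V, w, spiked = adex_step(V, w, 800.0)
print(V.shape)
```

On a GPU, each of the 20,000 state updates maps to an independent thread; the expensive part at scale is the synaptic input accumulation across fully interconnected neurons, not the state update itself.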

    A dependable coarse-grain reconfigurable multicore array

    © 2014 IEEE. Recent trends in semiconductor technology have dictated the constant reduction of device size. One negative effect stemming from the reduction in size and increased complexity is reduced device reliability. This paper is centered on permanent-fault tolerance and graceful system degradation in the presence of permanent faults. We take advantage of the natural redundancy of homogeneous multicores, following a sparing strategy to reuse functional pipeline stages of faulty cores. This is done by incorporating reconfigurable interconnects next to which the cores of the system are placed, providing the flexibility to redirect the data flow from the faulty pipeline stages of damaged cores to spare, still-functional ones. Several micro-architectural changes are introduced to decouple the processor stages and allow them to be interchangeable. The proposed approach is a clear departure from previous ones, offering full flexibility as well as highly graceful performance degradation at reasonable cost. More specifically, our coarse-grain fault-tolerant multicore array provides up to ×4 better availability compared to a conventional multicore and up to ×2 higher probability of delivering at least one functioning core at high fault densities. For our benchmarks, our design (synthesized for STM 65nm SP technology) incurs a total execution-time overhead for the complete system ranging from ×1.37 to ×3.3 compared to a (baseline) non-fault-tolerant system, depending on the permanent-fault density. The area overhead is 19.5%, and the energy consumption, without incorporating any power/energy-saving technique, is estimated on average to be 20.9% higher compared to the baseline, unprotected design
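
The availability advantage of stage-level sparing can be sketched with a toy fault model: under full interconnect flexibility, a logical core can be assembled as long as every pipeline-stage position has at least one functional instance somewhere in the array, whereas a conventional multicore needs some single core to be entirely intact. The independent per-stage fault probability and the array dimensions below are illustrative assumptions, not the paper's evaluation setup.

```python
# Toy availability model: probability that at least one core is usable,
# with n cores of s pipeline stages each and independent per-stage fault
# probability p. Numbers are illustrative, not from the paper.
def p_alive_conventional(n, s, p):
    """P(at least one core has ALL of its own stages functional)."""
    core_ok = (1 - p) ** s
    return 1 - (1 - core_ok) ** n

def p_alive_pooled(n, s, p):
    """P(one core can be assembled from any functional stage instances):
    every stage position needs >= 1 survivor among its n instances."""
    return (1 - p ** n) ** s

n, s = 4, 5                     # 4 cores, 5 pipeline stages each
for p in (0.1, 0.3, 0.5):
    print(p, p_alive_conventional(n, s, p), p_alive_pooled(n, s, p))
```

At high fault densities (p = 0.5) the pooled design survives with probability ~0.72 versus ~0.12 for the conventional array in this toy model, which is the qualitative effect behind the "×2 higher probability to deliver at least one functioning core" claim.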

    Reducing the performance overhead of resilient CMPs with substitutable resources

    Permanent faults on a chip are often tolerated using spare resources. In the past, sparing has been applied to Chip Multiprocessors (CMPs) at various granularities of substitutable units (SUs): entire processors, pipeline stages, or even individual functional units are isolated when faulty and replaced by spare ones using flexible, reconfigurable interconnects. Although spare resources increase a system's fault tolerance, the extra delay imposed by the reconfigurable interconnects limits performance. In this paper, we study two options for dealing with this delay: (i) pipelining the reconfigurable interconnects and (ii) scaling down the operating frequency. The former keeps a frequency close to that of the baseline processor but increases the number of cycles required for executing a program. The latter keeps the number of execution cycles constant but requires a slower clock. We investigate this performance tradeoff using an adaptive 4-core CMP design with substitutable pipeline stages. We obtain post-place-and-route results for different designs running two sets of benchmarks and evaluate their performance. Our experiments indicate that adding reconfigurable interconnects for wiring the SUs of a 4-core CMP imposes a significant delay, increasing the critical path of the design by almost 3.5 times. Pipelining the reconfigurable interconnects, on the other hand, increases cycle time by 41% and, depending on the processor configuration, reduces the performance overhead to 1.4-2.9× the execution time of the baseline
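
Since execution time is cycle count × cycle time, the two options above compare directly. The ×3.5 critical-path and 41% cycle-time figures come from the abstract; the cycle-count inflation factors for the pipelined interconnect are hypothetical inputs used only to illustrate the arithmetic.

```python
# Toy model of the pipelining-vs-frequency-scaling trade-off.
BASE_CYCLES, BASE_T = 1_000_000, 1.0      # illustrative program, normalized clock

def t_scaled_clock():
    # option (ii): same cycle count, clock slowed by the 3.5x critical path
    return BASE_CYCLES * (BASE_T * 3.5)

def t_pipelined(cycle_inflation):
    # option (i): 41% longer cycle time, cycle count inflated by the extra
    # interconnect pipeline latency (inflation factor is hypothetical)
    return BASE_CYCLES * cycle_inflation * (BASE_T * 1.41)

base = BASE_CYCLES * BASE_T
for infl in (1.0, 1.5, 2.0):
    print(infl, t_pipelined(infl) / base, t_scaled_clock() / base)
```

With inflation factors between 1.0 and 2.0, the pipelined option lands at roughly 1.4-2.8× the baseline execution time, comfortably below the 3.5× cost of simply slowing the clock, which matches the 1.4-2.9× range reported in the abstract.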

    Resilient chip multiprocessors with mixed-grained reconfigurability

    This article presents a chip multiprocessor (CMP) design that mixes coarse- and fine-grained reconfigurability to increase core availability of safety-critical embedded systems in the presence of hard errors. The authors conducted a comprehensive design-space exploration to identify the granularity mixes that maximize CMP fault tolerance and minimize performance and energy overheads. The authors added fine-grained reconfigurable logic to a coarse-grained sparing approach. Their resulting design can tolerate 3 times more hard errors than core redundancy and 1.5 times more than any other purely coarse-grained solution

    Real-time olivary neuron simulations on dataflow computing machines

    The Inferior-Olivary nucleus (ION) is a well-charted brain region, heavily associated with the sensorimotor control of the body. It comprises neural cells with unique properties which facilitate sensory processing and motor-learning skills. Simulations of such neurons become rapidly intractable when biophysically plausible models and meaningful network sizes (at least in the order of some hundreds of cells) are modeled. To overcome this problem, we accelerate a highly detailed ION network model using a Maxeler Dataflow Computing Machine. The design simulates a 330-cell network at real-time speed and achieves a maximum throughput of 24.7 GFLOPS. The Maxeler machine, integrating a Virtex-6 FPGA, yields speedups of 792-102× and 72-8× compared to a reference C implementation running on an Intel Xeon 2.66GHz and to a pure Virtex-7 FPGA implementation, respectively