
    Workload-aware Scheduling Techniques for General Purpose Applications on Graphics Processing Units

    In the last decade, there has been wide-scale adoption of Graphics Processing Units (GPUs) as co-processors for accelerating data-parallel general purpose applications. A primary driver of this adoption is that GPUs offer orders of magnitude higher floating point arithmetic throughput and memory bandwidth compared to their CPU counterparts. As GPU architectures are designed as throughput processors, they adopt a manycore architecture with tens to hundreds of cores, each with multiple vector processing pipelines. A significant amount of the die area is dedicated to floating point units, at the expense of the hardware units that conventional CPU architectures use for hiding memory latency. The quintessential technique used for memory latency tolerance is exploiting data-level parallelism in the workload and interleaving the execution of multiple SIMD threads, so that the latency of threads waiting on data from memory overlaps with computation from other threads. With each architecture generation, GPU architectures provide an increasing amount of floating point throughput and memory bandwidth. Alongside this, the architectures support an increasing number of simultaneously active threads. We envision that to continue making advancements in GPU computing, workload-aware scheduling techniques are required. In the GPU computing workflow, scheduling is performed at three levels: the system or chip level, the core level, and the thread level. The work proposed in this research aims at designing novel workload-aware scheduling techniques at each of the three levels of scheduling. We show that GPU computing workloads have significantly varying characteristics, and design techniques that monitor the hardware state to aid scheduling at each of the three levels. Each technique is implemented in a cycle-level GPU architecture simulator, and its effect on performance is analyzed against state-of-the-art scheduling techniques used in GPU architectures.
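The latency-tolerance idea in this abstract (interleaving threads so that memory waits overlap with computation) can be illustrated with a simple back-of-the-envelope model. The sketch below is not taken from the thesis; it assumes a hypothetical core where every SIMD thread alternates between a fixed number of compute cycles and a fixed memory latency, and the function names and parameters are illustrative only.

```python
import math


def threads_to_hide_latency(compute_cycles: int, memory_latency: int) -> int:
    """Smallest number of interleaved SIMD threads that keeps the core busy.

    Assumption (not from the source): each thread computes for
    `compute_cycles`, then stalls for `memory_latency` cycles, and the
    scheduler switches threads at no cost.
    """
    # One thread is computing while the others cover its memory stall.
    return 1 + math.ceil(memory_latency / compute_cycles)


def core_utilization(num_threads: int, compute_cycles: int, memory_latency: int) -> float:
    """Fraction of cycles spent computing under the same toy model."""
    period = compute_cycles + memory_latency  # one compute + stall round per thread
    return min(1.0, num_threads * compute_cycles / period)


if __name__ == "__main__":
    # Example values are arbitrary, chosen only to show the trend.
    needed = threads_to_hide_latency(compute_cycles=20, memory_latency=400)
    print("threads needed:", needed)
    for n in (1, 7, needed):
        print(n, "threads ->", round(core_utilization(n, 20, 400), 3))
```

Under these assumptions, utilization rises linearly with the number of active threads until the memory latency is fully covered, which is one way to see why supporting more simultaneously active threads matters for throughput.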

    Classification-driven search for effective SM partitioning in multitasking GPUs

    Graphics processing units (GPUs) feature an increasing number of streaming multiprocessors (SMs) with each successive generation. At the same time, GPUs are increasingly widely adopted in cloud services and data centers to accelerate general-purpose workloads. Running multiple applications on a GPU in such environments requires effective multitasking support. Spatial multitasking, in which independent applications co-execute on different sets of SMs, is a promising way to share GPU resources. Unfortunately, how to effectively partition SMs is an open problem. In this paper, we observe that, compared to widely used even partitioning, dynamic SM partitioning based on the characteristics of the co-executing applications can significantly improve performance and power efficiency. However, finding an effective SM partition is challenging because the number of possible combinations increases exponentially with the number of SMs and co-executing applications. Through offline analysis, we find that first classifying workloads, and then searching for an effective SM partition based on the workload characteristics, can significantly reduce the search space, making dynamic SM partitioning tractable. Based on these insights, we propose Classification-Driven search (CD-search) for low-overhead dynamic SM partitioning in multitasking GPUs. CD-search first classifies workloads using a novel off-SM bandwidth model, after which it enters the performance mode or power mode depending on the workload's characteristics. Both modes follow a specific search strategy to quickly determine the optimum SM partition. Our evaluation shows that CD-search improves system throughput by 10.4% on average (and up to 62.9%) over even partitioning for workloads that are classified for the performance mode. For workloads classified for the power mode, CD-search reduces power consumption by 25% on average (and up to 41.2%). CD-search incurs limited runtime overhead.
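To make the partitioning search space in this abstract concrete, the sketch below enumerates SM partitions by brute force and compares the best one against an even split. It is not CD-search and does not use the paper's off-SM bandwidth model. It assumes SMs are interchangeable (only per-application counts matter), and `toy_throughput` with its `saturation` parameter is a hypothetical stand-in for a real performance measurement.

```python
from math import comb
from typing import Iterator, Sequence, Tuple


def count_partitions(num_sms: int, num_apps: int) -> int:
    """Ways to split interchangeable SMs so every app gets at least one."""
    if num_apps < 1 or num_sms < num_apps:
        return 0
    return comb(num_sms - 1, num_apps - 1)  # stars-and-bars count


def enumerate_partitions(num_sms: int, num_apps: int) -> Iterator[Tuple[int, ...]]:
    """Yield every tuple of per-app SM counts (each >= 1) summing to num_sms."""
    if num_apps == 1:
        if num_sms >= 1:
            yield (num_sms,)
        return
    # Leave at least one SM for each of the remaining apps.
    for first in range(1, num_sms - num_apps + 2):
        for rest in enumerate_partitions(num_sms - first, num_apps - 1):
            yield (first,) + rest


def toy_throughput(sms: int, saturation: int) -> float:
    """Hypothetical model: normalized speed grows linearly, then saturates."""
    return min(sms, saturation) / saturation


def system_throughput(partition: Sequence[int], saturations: Sequence[int]) -> float:
    """Sum of normalized per-app throughputs for one partition."""
    return sum(toy_throughput(s, sat) for s, sat in zip(partition, saturations))


def best_partition(num_sms: int, saturations: Sequence[int]) -> Tuple[int, ...]:
    """Exhaustive search: the baseline that a guided search tries to avoid."""
    return max(
        enumerate_partitions(num_sms, len(saturations)),
        key=lambda p: system_throughput(p, saturations),
    )


if __name__ == "__main__":
    saturations = [4, 16]  # app 0 stops scaling early, app 1 keeps scaling
    print("candidates:", count_partitions(16, 2))
    print("even split:", system_throughput((8, 8), saturations))
    print("best split:", best_partition(16, saturations))
```

Even in this toy setting, an application that stops scaling after a few SMs makes the even split wasteful, and the candidate count rises quickly as applications are added, which is the motivation for narrowing the search by classifying workloads first.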

    Defining Midbrain Dopaminergic Neuron Diversity by Single-Cell Gene Expression Profiling

    Summary: Effective approaches to neuropsychiatric disorders require detailed understanding of the cellular composition and circuitry of the complex mammalian brain. Here, we present a paradigm for deconstructing the diversity of neurons defined by a specific neurotransmitter using a microfluidic dynamic array to simultaneously evaluate the expression of 96 genes in single neurons. With this approach, we successfully identified multiple molecularly distinct dopamine neuron subtypes and localized them in the adult mouse brain. To validate the anatomical and functional correlates of molecular diversity, we provide evidence that one Vip+ subtype, located in the periaqueductal region, has a discrete projection field within the extended amygdala. Another Aldh1a1+ subtype, located in the substantia nigra, is especially vulnerable in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) model of Parkinson’s disease. Overall, this rapid, cost-effective approach enables the identification and classification of multiple dopamine neuron subtypes, with distinct molecular, anatomical, and functional properties.

    Reliability of Synaptic Transmission at the Synapses of Held In Vivo under Acoustic Stimulation

    BACKGROUND: The giant synapses of Held play an important role in high-fidelity auditory processing and provide a model system for synaptic transmission at central synapses. Whether transmission of action potentials can fail at these synapses has been investigated in recent studies. At the endbulbs of Held in the anteroventral cochlear nucleus (AVCN) a consistent picture emerged, whereas at the calyx of Held in the medial nucleus of the trapezoid body (MNTB) results on the reliability of transmission remain inconsistent. In vivo, this discrepancy could be due to the difficulty in identifying failures of transmission. METHODS/FINDINGS: We introduce a novel method for detecting unreliable transmission in vivo. Based on the temporal relationship between a cell's waveform and other potentials in the recordings, a statistical test is developed that provides a balanced decision between the presence and the absence of failures. Its performance is quantified using simulated voltage recordings and found to exhibit a high level of accuracy. The method was applied to extracellular recordings from the synapses of Held in vivo. At the calyces of Held, failures of transmission were found only rarely. By contrast, at the endbulbs of Held in the AVCN, failures were found under spontaneous, excited, and suppressed conditions. In accordance with previous studies, failures occurred most abundantly in the suppressed condition, suggesting a role for inhibition. CONCLUSIONS/SIGNIFICANCE: Under the investigated activity conditions/anesthesia, transmission seems to remain largely unimpeded in the MNTB, whereas in the AVCN the occurrence of failures is related to inhibition and could be the basis/result of computational mechanisms for temporal processing. More generally, our approach provides a formal tool for studying the reliability of transmission with high statistical accuracy under typical in vivo recording conditions.

    Crossover inhibition generates sustained visual responses in the inner retina

    In daylight, the input to the retinal circuit is provided primarily by cone photoreceptors acting as band-pass filters, but the retinal output also contains neuronal populations transmitting sustained signals. Using in vivo imaging of genetically encoded calcium reporters, we investigated the circuits that generate these sustained channels within the inner retina of zebrafish. In OFF bipolar cells, sustained transmission was found to depend on crossover inhibition from the ON pathway through GABAergic amacrine cells. In ON bipolar cells, the amplitude of low-frequency signals was regulated by glycinergic amacrine cells, while GABAergic inhibition regulated the gain of band-pass signals. We also provide the first functional description of a subset of sustained ON bipolar cells in which synaptic activity was suppressed by fluctuations at frequencies above ∼0.2 Hz. These results map out the basic circuitry by which the inner retina generates sustained visual signals and describe a new function of crossover inhibition.

    Neuron–astrocyte interactions in the medial nucleus of the trapezoid body

    The calyx of Held (CoH) synapse serves as a model system to analyze basic mechanisms of synaptic transmission. Astrocyte processes are part of the synaptic structure and contact both pre- and postsynaptic membranes. In the medial nucleus of the trapezoid body (MNTB), midline stimulation evoked a current response that was not mediated by glutamate receptors or glutamate uptake, despite the fact that astrocytes express functional receptors and transporters. However, astrocytes showed spontaneous Ca2+ responses, and neuronal slow inward currents (nSICs) were recorded in the postsynaptic principal neurons (PPNs) of the MNTB. These currents were correlated with astrocytic Ca2+ activity because dialysis of astrocytes with BAPTA abolished nSICs. Moreover, the frequency of these currents was increased when Ca2+ responses in astrocytes were elicited. NMDA antagonists selectively blocked nSICs, while D-serine degradation significantly reduced NMDA-mediated currents. In contrast to previous studies in the hippocampus, these NMDA-mediated currents were rarely synchronized.