
402 research outputs found

Workload-aware Scheduling Techniques for General Purpose Applications on Graphics Processing Units

Author: Awatramani Mihir
Publication venue: Iowa State University Digital Repository
Publication date: 01/01/2017

In the last decade, there has been wide-scale adoption of Graphics Processing Units (GPUs) as co-processors for accelerating data-parallel general-purpose applications. A primary driver of this adoption is that GPUs offer orders of magnitude higher floating-point arithmetic throughput and memory bandwidth than their CPU counterparts. Because GPU architectures are designed as throughput processors, they adopt a manycore architecture with tens to hundreds of cores, each with multiple vector processing pipelines. A significant amount of the die area is dedicated to floating-point units, at the expense of the hardware units that conventional CPU architectures use to hide memory latency. The quintessential technique for memory latency tolerance is to exploit data-level parallelism in the workload and interleave the execution of multiple SIMD threads, overlapping the latency of threads waiting on data from memory with computation from other threads. With each architecture generation, GPUs provide an increasing amount of floating-point throughput and memory bandwidth, and they support an increasing number of simultaneously active threads. We envision that continued advances in GPU computing will require workload-aware scheduling techniques. In the GPU computing workflow, scheduling is performed at three levels: the system or chip level, the core level, and the thread level. The work proposed in this research aims at designing novel workload-aware scheduling techniques at each of the three levels. We show that GPU computing workloads have significantly varying characteristics, and we design techniques that monitor the hardware state to aid scheduling at each of the three levels. Each technique is implemented in a cycle-level GPU architecture simulator, and its effect on performance is analyzed against state-of-the-art scheduling techniques used in GPU architectures.

Digital Repository @ Iowa State University (ISU)


On the Military’s Normative Commitment to Assist in Revolution

Author: Awatramani Sebastian
Publication venue: University of Colorado Boulder
Publication date: 01/01/2018

Locke held that the People reserve a right of revolution in the case that a government violates the social contract so egregiously that it renders itself illegitimate. At the time of his writing, revolution was a potentially feasible endeavor insofar as the technological war-making capabilities of a citizenry were on, or at least on something close to, equal footing with those of the government. As technology advanced over the following centuries, however, a disparity arose between the war-making capability of advanced states and that of their citizenry: the governments of modern, advanced states gained access to weaponry, tracking capability, and a violence apparatus that far outrivaled the capabilities of the citizenry, in essence rendering Locke’s right of revolution impracticable through the sort of means Locke might have envisioned. Assuming that the People do continue to hold a right of revolution, how can it be reconciled with its own impracticability in the modern era?

CU Scholar Institutional Repository

Classification-driven search for effective SM partitioning in multitasking GPUs

Author: Awatramani M.
Grauer-Gray S.
Jadidi A.
Li X.
Suzuki Y.
Vijaykumar N.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2018

Graphics processing units (GPUs) feature an increasing number of streaming multiprocessors (SMs) with each successive generation. At the same time, GPUs are increasingly adopted in cloud services and data centers to accelerate general-purpose workloads. Running multiple applications on a GPU in such environments requires effective multitasking support. Spatial multitasking, in which independent applications co-execute on different sets of SMs, is a promising solution for sharing GPU resources. Unfortunately, how to partition SMs effectively is an open problem. In this paper, we observe that, compared to widely used even partitioning, dynamic SM partitioning based on the characteristics of the co-executing applications can significantly improve performance and power efficiency. However, finding an effective SM partition is challenging because the number of possible combinations increases exponentially with the number of SMs and co-executing applications. Through offline analysis, we find that first classifying workloads, and then searching for an effective SM partition based on the workload characteristics, can significantly reduce the search space, making dynamic SM partitioning tractable. Based on these insights, we propose Classification-Driven search (CD-search) for low-overhead dynamic SM partitioning in multitasking GPUs. CD-search first classifies workloads using a novel off-SM bandwidth model, after which it enters the performance mode or the power mode depending on the workload's characteristics. Both modes follow a specific search strategy to quickly determine the optimum SM partition. Our evaluation shows that CD-search improves system throughput by 10.4% on average (and up to 62.9%) over even partitioning for workloads that are classified for the performance mode. For workloads classified for the power mode, CD-search reduces power consumption by 25% on average (and up to 41.2%). CD-search incurs limited runtime overhead.
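To give a feel for the search-space problem this abstract describes, here is a minimal Python sketch. It is not CD-search and is not taken from the paper. It assumes, purely for illustration, that SMs are interchangeable and that every co-executing application receives at least one SM, so a partition is just a tuple of per-application SM counts. The function names (`count_partitions`, `enumerate_partitions`, `even_partition`) are hypothetical.

```python
from itertools import combinations
from math import comb


def count_partitions(num_sms: int, num_apps: int) -> int:
    """Count ways to split num_sms interchangeable SMs among num_apps
    applications when each application gets at least one SM.

    Assumption (not from the paper): SMs are interchangeable, so only
    the per-application counts matter (stars-and-bars counting).
    """
    if num_apps < 1 or num_sms < num_apps:
        return 0
    return comb(num_sms - 1, num_apps - 1)


def enumerate_partitions(num_sms: int, num_apps: int):
    """Yield every partition as a tuple of per-application SM counts."""
    if num_apps < 1 or num_sms < num_apps:
        return
    # Choose num_apps - 1 cut points between consecutive SMs.
    for cuts in combinations(range(1, num_sms), num_apps - 1):
        bounds = (0, *cuts, num_sms)
        yield tuple(hi - lo for lo, hi in zip(bounds, bounds[1:]))


def even_partition(num_sms: int, num_apps: int) -> tuple:
    """Baseline: split SMs as evenly as possible across applications."""
    base, extra = divmod(num_sms, num_apps)
    return tuple(base + (1 if i < extra else 0) for i in range(num_apps))


if __name__ == "__main__":
    for sms, apps in [(16, 2), (80, 2), (80, 4)]:
        print(sms, apps, count_partitions(sms, apps), even_partition(sms, apps))
```

Under these assumptions the count grows quickly as applications are added, which is why an exhaustive search at run time is unattractive and a classification step that prunes candidates is appealing.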

Crossref

Ghent University Academic Bibliography

Defining Midbrain Dopaminergic Neuron Diversity by Single-Cell Gene Expression Profiling

Author: Awatramani Rajeshwar B.
Cicchetti Francesca
Drouin-Ouellet Janelle
Kim Kwang-Youn A.
Poulin Jean-Francois
Zou Jian
Publication venue: The Authors. Published by Elsevier Inc.
Publication date: 06/11/2014

Summary: Effective approaches to neuropsychiatric disorders require detailed understanding of the cellular composition and circuitry of the complex mammalian brain. Here, we present a paradigm for deconstructing the diversity of neurons defined by a specific neurotransmitter using a microfluidic dynamic array to simultaneously evaluate the expression of 96 genes in single neurons. With this approach, we successfully identified multiple molecularly distinct dopamine neuron subtypes and localized them in the adult mouse brain. To validate the anatomical and functional correlates of molecular diversity, we provide evidence that one Vip+ subtype, located in the periaqueductal region, has a discrete projection field within the extended amygdala. Another Aldh1a1+ subtype, located in the substantia nigra, is especially vulnerable in the 1-methyl-4-phenyl-1,2,3,6-tetrahydropyridine (MPTP) model of Parkinson’s disease. Overall, this rapid, cost-effective approach enables the identification and classification of multiple dopamine neuron subtypes with distinct molecular, anatomical, and functional properties.

Elsevier - Publisher Connector

Directory of Open Access Journals

PubMed Central

Parallel Mechanisms Encode Direction in the Retina

Author: Awatramani Gautam B.
Johnson Kyle
Li Xiao
Smith Robert G.
Trenholm Stuart
Publication venue: Elsevier Inc.
Publication date: 09/01/2013

Elsevier - Publisher Connector

Post-tetanic potentiation involves the presynaptic binding of calcium to calmodulin

Author: Atwood
Awatramani
Borst
Feng
Forsythe
Geetha Srinivasan
Habets
Habets
Henrique von Gersdorff
Kohama
Korogod
Korogod
Lee
Lee
Mochida
Neher
Rosenthal
Sakaba
Srinivasan
Veeramuthu Balakrishnan
von Gersdorff
Zucker
Publication venue: The Rockefeller University Press
Publication date

Crossref

PubMed Central

Reliability of Synaptic Transmission at the Synapses of Held In Vivo under Acoustic Stimulation

BACKGROUND: The giant synapses of Held play an important role in high-fidelity auditory processing and provide a model system for synaptic transmission at central synapses. Whether transmission of action potentials can fail at these synapses has been investigated in recent studies. At the endbulbs of Held in the anteroventral cochlear nucleus (AVCN) a consistent picture emerged, whereas at the calyx of Held in the medial nucleus of the trapezoid body (MNTB) results on the reliability of transmission remain inconsistent. In vivo, this discrepancy could be due to the difficulty of identifying failures of transmission. METHODS/FINDINGS: We introduce a novel method for detecting unreliable transmission in vivo. Based on the temporal relationship between a cell's waveform and other potentials in the recordings, a statistical test is developed that provides a balanced decision between the presence and the absence of failures. Its performance is quantified using simulated voltage recordings and found to exhibit a high level of accuracy. The method was applied to extracellular recordings from the synapses of Held in vivo. At the calyces of Held, failures of transmission were found only rarely. By contrast, at the endbulbs of Held in the AVCN, failures were found under spontaneous, excited, and suppressed conditions. In accordance with previous studies, failures occurred most abundantly in the suppressed condition, suggesting a role for inhibition. CONCLUSIONS/SIGNIFICANCE: Under the investigated activity conditions/anesthesia, transmission seems to remain largely unimpeded in the MNTB, whereas in the AVCN the occurrence of failures is related to inhibition and could be the basis/result of computational mechanisms for temporal processing. More generally, our approach provides a formal tool for studying the reliability of transmission with high statistical accuracy under typical in vivo recording conditions.

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

Crossover inhibition generates sustained visual responses in the inner retina

Author: Asari
Awatramani
Baccus
Baden
Baden
Borghuis
Buldyrev
Burkhardt
Burrone
Burrone
Cafaro
Caldwell
Cleland
Connaughton
de Monasterio
Demb
DeVries
Dorostkar
Dreosti
Dreosti
Dreosti
Eggers
Endeman
Esposti
Hoshi
Hsueh
Huayu Ding
Juliana M. Rosa
Jusuf
Lee
Leon Lagnado
Levick
Li
Manookin
Masland
Masland
Mastronarde
Molnar
Molnar
Nikolaev
Nikolaou
Nüsslein-Volhard
Odermatt
Pang
Pologruto
Protti
Puthussery
Rodieck
Roska
Sabine Ruehle
Schnapf
Sivyer
Tallini
Umino
Werblin
Werblin
Wässle
Wässle
Publication venue: Elsevier BV
Publication date: 01/04/2016

In daylight, the input to the retinal circuit is provided primarily by cone photoreceptors acting as band-pass filters, but the retinal output also contains neuronal populations transmitting sustained signals. Using in vivo imaging of genetically encoded calcium reporters, we investigated the circuits that generate these sustained channels within the inner retina of zebrafish. In OFF bipolar cells, sustained transmission was found to depend on crossover inhibition from the ON pathway through GABAergic amacrine cells. In ON bipolar cells, the amplitude of low-frequency signals was regulated by glycinergic amacrine cells, while GABAergic inhibition regulated the gain of band-pass signals. We also provide the first functional description of a subset of sustained ON bipolar cells in which synaptic activity was suppressed by fluctuations at frequencies above ∼0.2 Hz. These results map out the basic circuitry by which the inner retina generates sustained visual signals and describe a new function of crossover inhibition.

Elsevier - Publisher Connector

Crossref

PubMed Central

Sussex Research Online

Neuron–astrocyte interactions in the medial nucleus of the trapezoid body

Author: Angulo
Anja Scheller
Awatramani
Barnes-Davies
Bellamy
Bergles
Bergles
Bruno Benedetti
Christiane Nolte
Daniel Reyes-Haro
D’Ascenzo
Elezgarai
Fellin
Fellin
Forsythe
Ge
Ge
Grosche
Hamann
Hamill
Hashimoto
Helmut Kettenmann
Hirrlinger
Hoffpauir
Horiike
Isaacson
Jabs
Jochen Müller
Joshi
Jourdain
Kandler
Kang
Katz
Kim
Kuwabara
Margarethe Boresch
Matthias
Miledi
Müller
Navarrete
Nolte
Panatier
Perea
Renden
Rollenhagen
Schell
Schell
Schikorski
Schneggenburger
Serrano
Shigetomi
Stout
Syková
Sätzler
Takahashi
Tatjyana Pivneva
Turecek
von Gersdorff
Wang
Publication venue: The Rockefeller University Press
Publication date: 01/01/2010

The calyx of Held (CoH) synapse serves as a model system to analyze basic mechanisms of synaptic transmission. Astrocyte processes are part of the synaptic structure and contact both pre- and postsynaptic membranes. In the medial nucleus of the trapezoid body (MNTB), midline stimulation evoked a current response that was not mediated by glutamate receptors or glutamate uptake, despite the fact that astrocytes express functional receptors and transporters. However, astrocytes showed spontaneous Ca2+ responses, and neuronal slow inward currents (nSICs) were recorded in the postsynaptic principal neurons (PPNs) of the MNTB. These currents were correlated with astrocytic Ca2+ activity because dialysis of astrocytes with BAPTA abolished nSICs. Moreover, the frequency of these currents increased when Ca2+ responses in astrocytes were elicited. NMDA antagonists selectively blocked nSICs, while D-serine degradation significantly reduced NMDA-mediated currents. In contrast to previous studies in the hippocampus, these NMDA-mediated currents were rarely synchronized.