An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks
Edge TPUs are a domain of accelerators for low-power, edge devices and are
widely used in various Google products such as Coral and Pixel devices. In this
paper, we first discuss the major microarchitectural details of Edge TPUs.
Then, we extensively evaluate three classes of Edge TPUs, covering different
computing ecosystems, that are either currently deployed in Google products or
are in the product pipeline, across 423K unique convolutional neural networks.
Building upon this extensive study, we discuss critical and interpretable
microarchitectural insights about the studied classes of Edge TPUs. Mainly, we
discuss how Edge TPU accelerators perform across convolutional neural networks
with different structures. Finally, we present our ongoing efforts in
developing high-accuracy learned machine learning models to estimate the major
performance metrics of accelerators such as latency and energy consumption.
These learned models enable significantly faster (on the order of milliseconds)
evaluations of accelerators as an alternative to time-consuming cycle-accurate
simulators, and establish an exciting opportunity for rapid hardware/software
co-design.
Comment: 11 pages, 15 figures, submitted to ISCA 202
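As an illustration of the learned-cost-model idea sketched above, the snippet below fits a simple regressor from coarse CNN descriptors to measured latency. The feature set, model choice, and data format are assumptions for illustration, not the authors' actual setup.

```python
# A minimal sketch of a learned cost model: regress accelerator latency from
# simple CNN descriptors. Feature names and the training-set format are
# hypothetical stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def featurize(cnn):
    """Map a CNN description (hypothetical dict format) to a feature vector."""
    return np.array([
        cnn["num_conv_layers"],
        cnn["total_params"],
        cnn["total_flops"],
        cnn["max_feature_map_bytes"],
    ], dtype=np.float64)

def fit_latency_model(training_set):
    """training_set: list of (cnn_description, measured_latency_ms) pairs,
    e.g. collected once from a cycle-accurate simulator."""
    X = np.stack([featurize(c) for c, _ in training_set])
    y = np.array([lat for _, lat in training_set])
    return GradientBoostingRegressor().fit(X, y)

# Once fitted, model.predict(featurize(cnn)[None, :]) returns an estimate in
# milliseconds, replacing a simulation run that could take minutes or hours.
```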
Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask
Sparsity has become one of the promising methods to compress and accelerate
Deep Neural Networks (DNNs). Among different categories of sparsity, structured
sparsity has gained more attention due to its efficient execution on modern
accelerators. Particularly, N:M sparsity is attractive because there are
already hardware accelerator architectures that can leverage certain forms of
N:M structured sparsity to yield higher compute-efficiency. In this work, we
focus on N:M sparsity and extensively study and evaluate various training
recipes for N:M sparsity in terms of the trade-off between model accuracy and
compute cost (FLOPs). Building upon this study, we propose two new decay-based
pruning methods, namely "pruning mask decay" and "sparse structure decay". Our
evaluations indicate that these proposed methods consistently deliver
state-of-the-art (SOTA) model accuracy, comparable to unstructured sparsity, on
a Transformer-based model for a translation task. The increase in the accuracy
of the sparse model using the new training recipes comes at the cost of
a marginal increase in the total training compute (FLOPs).
Comment: 11 pages, 2 figures, and 9 tables. Published at the ICML Workshop on Sparsity in Neural Networks: Advancing Understanding and Practice, 2022. First two authors contributed equally.
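To make the N:M structure concrete, the sketch below computes a 2:4 magnitude mask and a simple decaying application of it. The linear decay schedule is an assumed illustration of the "pruning mask decay" idea, not the paper's exact recipe.

```python
# A minimal sketch of N:M structured sparsity (here 2:4): within every group
# of m=4 consecutive weights, keep the n=2 largest magnitudes. Assumes the
# weight tensor's size is divisible by m.
import numpy as np

def nm_mask(weights, n=2, m=4):
    """Binary mask keeping the n largest-magnitude entries per group of m."""
    w = weights.reshape(-1, m)
    order = np.argsort(np.abs(w), axis=1)  # ascending by magnitude
    mask = np.ones_like(w, dtype=np.float64)
    # Zero out the (m - n) smallest-magnitude entries in each group.
    np.put_along_axis(mask, order[:, : m - n], 0.0, axis=1)
    return mask.reshape(weights.shape)

def apply_decaying_mask(weights, step, total_steps, n=2, m=4):
    """Scale pruned weights toward zero over training rather than zeroing
    them at once (assumed linear schedule illustrating mask decay)."""
    mask = nm_mask(weights, n, m)
    decay = max(0.0, 1.0 - step / total_steps)  # 1 -> 0 over training
    return weights * (mask + (1.0 - mask) * decay)
```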
The Effect of Magnesium Sulfate on Renal Colic Pain Relief; a Randomized Clinical Trial
Introduction: Renal colic can be managed by preventing the contraction movements of the ureter muscles. By reducing acetylcholine in the nerve terminals, magnesium sulfate could be effective in this regard. The aim of this study is to investigate the effect of magnesium sulfate on acute renal colic pain relief.
Method: The present study was a double-blind clinical trial in which patients suffering from acute renal colic were randomly divided into 2 groups, who received either the standard protocol (intravenous infusion of 0.1 mg/kg morphine sulfate, 30 mg of ketorolac, and 100 ml normal saline as placebo over 15 minutes) or the standard protocol plus 15 mg/kg of intravenous magnesium sulfate 50% in 100 ml normal saline over 15 minutes. Severity of patients' pain was measured on a visual analogue scale (VAS) at baseline and at 30 and 60 minutes after infusion. The collected data were analyzed using STATA statistical software.
Results: 100 cases were randomly allocated to the intervention or control group. The two groups were similar in baseline pain score and demographic characteristics. At 30 and 60 minutes, the mean pain score was lower in the intervention group than in the control group. Moreover, the difference between the two groups in the additional amount of morphine required was statistically significant, indicating that the intervention group needed less additional morphine than the control group.
Conclusion: The results of this study showed that magnesium sulfate can be used as an adjunct drug in the treatment of patients suffering from renal colic. It not only alleviates pain, but also diminishes the need for pain medications.
AxBench: A Benchmark Suite for Approximate Computing Across the System Stack
Research areas: Approximate computing, Computer architecture
As the end of Dennard scaling looms, both the semiconductor industry and the research community are exploring innovative solutions that allow energy efficiency and performance to continue to scale. Approximate computing has become one of the viable techniques for perpetuating the historical improvements in the computing landscape. As approximate computing attracts more attention in the community, a general, diverse, and representative set of benchmarks for evaluating different approximation techniques becomes necessary.
In this paper, we develop and introduce AxBench, a general, diverse, and representative multi-framework suite of 29 benchmarks for CPUs, GPUs, and hardware design. We judiciously select and develop each benchmark to cover a diverse set of domains such as machine learning, scientific computation, signal processing, image processing, robotics, and compression. AxBench comes with the necessary annotations to mark the approximable regions of code and an application-specific quality metric to assess the output quality of each application. These annotations facilitate the evaluation of different approximation techniques.
To demonstrate its effectiveness, we evaluate three previously
proposed approximation techniques using AxBench benchmarks:
loop perforation [1] and neural processing units (NPUs) [2–4] on
CPUs and GPUs, and Axilog [5] on dedicated hardware. We find
that (1) NPUs offer higher performance and energy efficiency than loop perforation on both CPUs and GPUs, (2) while NPUs provide considerable efficiency gains on CPUs, significant opportunity remains to be explored by other approximation techniques, (3) unlike on CPUs, NPUs offer the full benefits of approximate computation on GPUs, and (4) considerable opportunity remains for innovative approximate computation techniques at the hardware level even after applying Axilog.
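To give a flavor of one evaluated technique, the sketch below shows loop perforation on a toy kernel: skip a fraction of iterations and accept the quality loss, measured by an application-specific metric (here, relative error). The kernel is an illustrative stand-in, not an AxBench benchmark.

```python
# A minimal sketch of loop perforation: execute only every skip_factor-th
# iteration of an approximable loop and rescale the result accordingly.
def perforated_mean(xs, skip_factor=2):
    """Approximate the mean of xs using every skip_factor-th element."""
    sampled = xs[::skip_factor]
    return sum(sampled) / len(sampled)

# Application-specific quality metric; here, relative error against the
# exact computation.
xs = list(range(1000))
exact = sum(xs) / len(xs)
approx = perforated_mean(xs, skip_factor=4)
relative_error = abs(approx - exact) / exact  # small loss, 4x fewer iterations
```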
Self-Refine: Iterative Refinement with Self-Feedback
Like people, LLMs do not always generate the best text for a given generation
problem on their first try (e.g., summaries, answers, explanations). Just as
people then refine their text, we introduce SELF-REFINE, a framework for
similarly improving initial outputs from LLMs through iterative feedback and
refinement. The main idea is to generate an output using an LLM, then allow the
same model to provide multi-aspect feedback for its own output; finally, the
same model refines its previously generated output given its own feedback.
Unlike earlier work, our iterative refinement framework does not require
supervised training data or reinforcement learning, and works with a single
LLM. We experiment with 7 diverse tasks, ranging from review rewriting to math
reasoning, demonstrating that our approach outperforms direct generation. In
all tasks, outputs generated with SELF-REFINE are preferred by humans and by
automated metrics over those generated directly with GPT-3.5 and GPT-4,
improving on average by 20% absolute across tasks.
Comment: Code, data, and demo at https://selfrefine.info
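A minimal sketch of the generate/feedback/refine loop described above follows; `llm` stands for any text-completion callable, and the prompts and stopping rule are assumptions rather than the paper's exact prompts.

```python
# A minimal sketch of the SELF-REFINE loop: generate, self-feedback, refine,
# all with the same model and no extra training. `llm` is a hypothetical
# function mapping a prompt string to a completion string.
def self_refine(llm, task_prompt, max_iters=4):
    output = llm(task_prompt)
    for _ in range(max_iters):
        # The model critiques its own output along multiple aspects.
        feedback = llm(
            f"Task: {task_prompt}\nOutput: {output}\n"
            "Give concrete, multi-aspect feedback on this output."
        )
        # Assumed stop signal; the paper's stopping criterion may differ.
        if "no further changes" in feedback.lower():
            break
        # The same model revises its output given its own feedback.
        output = llm(
            f"Task: {task_prompt}\nOutput: {output}\n"
            f"Feedback: {feedback}\nRewrite the output applying the feedback."
        )
    return output
```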
JaxPruner: A concise library for sparsity research
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse
training library for machine learning research. JaxPruner aims to accelerate
research on sparse neural networks by providing concise implementations of
popular pruning and sparse training algorithms with minimal memory and latency
overhead. Algorithms implemented in JaxPruner use a common API and work
seamlessly with the popular optimization library Optax, which, in turn, enables
easy integration with existing JAX-based libraries. We demonstrate this ease of
integration by providing examples in four different codebases: Scenic, t5x,
Dopamine, and FedJAX, and provide baseline experiments on popular benchmarks.
Comment: Jaxpruner is hosted at http://github.com/google-research/jaxprune
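For flavor, the sketch below shows one way magnitude pruning can be wrapped around an Optax optimizer as a gradient transformation. It illustrates only the general common-API idea; it is not JaxPruner's actual interface, and the helper names are assumptions.

```python
# A minimal sketch (not JaxPruner's API): confine optimizer updates to a
# per-tensor magnitude mask, composed with any Optax transformation.
import jax
import jax.numpy as jnp
import optax

def magnitude_mask(params, sparsity=0.8):
    """Per-tensor mask keeping the largest-magnitude (1 - sparsity) fraction."""
    def mask_fn(w):
        k = max(1, int(w.size * (1.0 - sparsity)))
        threshold = jnp.sort(jnp.abs(w).ravel())[-k]  # k-th largest magnitude
        return (jnp.abs(w) >= threshold).astype(w.dtype)
    return jax.tree_util.tree_map(mask_fn, params)

def masked(transform, mask):
    """Wrap an Optax transform so gradients and updates stay within the mask."""
    def init_fn(params):
        return transform.init(params)
    def update_fn(grads, state, params=None):
        grads = jax.tree_util.tree_map(lambda g, m: g * m, grads, mask)
        updates, state = transform.update(grads, state, params)
        updates = jax.tree_util.tree_map(lambda u, m: u * m, updates, mask)
        return updates, state
    return optax.GradientTransformation(init_fn, update_fn)

# Usage: opt = masked(optax.adam(1e-3), magnitude_mask(params, sparsity=0.9))
```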