21 research outputs found
EFFICIENTLY ACCELERATING SPARSE PROBLEMS BY ENABLING STREAM ACCESSES TO MEMORY USING HARDWARE/SOFTWARE TECHNIQUES
The objective of this research is to improve the performance of sparse problems, which have a wide range of applications yet still suffer from serious challenges when running on modern computers. In summary, the challenges include the underutilization of available memory bandwidth because of a lack of spatial locality, dependencies in computation, or slow mechanisms for decompressing the sparse data, and the underutilization of concurrent compute engines because of the distribution of non-zero values in sparse data. Our key insight for addressing these challenges is that, depending on the type of problem, we can use an intelligent reduction tree near memory to process data while gathering them from random memory locations, transform the computations mathematically to extract more parallelism, modify the distribution of non-zero elements, or change the representation of sparse data. By applying such techniques, the execution adapts more effectively to the given hardware resources. To this end, this research introduces hardware/software techniques that enable stream accesses to memory for accelerating four main categories of sparse problems: the inference of recommendation systems, iterative solvers of partial differential equations (PDEs), deep neural networks (DNNs), and graph algorithms.
Ph.D.
An Efficient Approach for Resource Auto-Scaling in Cloud Environments
Cloud services have become increasingly popular among users. Automatic resource provisioning for cloud services is one of the important challenges in cloud environments. In a cloud computing environment, resource providers should offer the required resources to users automatically and without limitations. That is, whenever a user needs more resources, they should be allocated to the user without any problems. Conversely, if the resources exceed the user’s needs, the extra resources should be turned off temporarily and turned back on whenever they are needed. In this paper, we propose an automatic resource provisioning approach based on reinforcement learning that auto-scales resources according to a Markov Decision Process (MDP). Simulation results show that, in terms of the rate of Service Level Agreement (SLA) violation and stability, the proposed approach performs better than similar approaches.
Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube
Three-dimensional (3D)-stacking technology, which enables the integration of
DRAM and logic dies, offers high bandwidth and low energy consumption. This
technology also empowers new memory designs for executing tasks not
traditionally associated with memories. A practical 3D-stacked memory is Hybrid
Memory Cube (HMC), which provides significant access bandwidth and low power
consumption in a small area. Although several studies have taken advantage of
the novel architecture of HMC, its characteristics in terms of latency and
bandwidth or their correlation with temperature and power consumption have not
been fully explored. This paper is the first, to the best of our knowledge, to
characterize the thermal behavior of HMC in a real environment using the AC-510
accelerator and to identify temperature as a new limitation for this
state-of-the-art design space. Moreover, besides bandwidth studies, we
deconstruct factors that contribute to latency and reveal their sources for
high- and low-load accesses. The results of this paper demonstrate essential
behaviors and performance bottlenecks for future explorations of
packet-switched and 3D-stacked memories.
Comment: IEEE Catalog Number: CFP17236-USB ISBN 13: 978-1-5386-1232-
Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube
Memories that exploit three-dimensional (3D)-stacking technology, which
integrate memory and logic dies in a single stack, are becoming popular. These
memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC)
design for connecting their internal structural organizations. This novel usage
of NoC, in addition to aiding processing-in-memory capabilities, enables
numerous benefits such as high bandwidth and memory-level parallelism. However,
the implications of NoCs on the characteristics of 3D-stacked memories in terms
of memory access latency and bandwidth have not been fully explored. This paper
addresses this knowledge gap by (i) characterizing an HMC prototype on the
AC-510 accelerator board and revealing its access latency behaviors, and (ii)
by investigating the implications of such behaviors on system and software
designs.
Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads
Sparse matrices are the key ingredients of several application domains, from
scientific computation to machine learning. The primary challenge with sparse
matrices has been efficiently storing and transferring data, for which many
sparse formats have been proposed to significantly eliminate zero entries. Such
formats, essentially designed to optimize memory footprint, may not be as
successful in performing faster processing. In other words, although they allow
faster data transfer and improve memory bandwidth utilization -- the classic
challenge of sparse problems -- their decompression mechanism can potentially
create a computation bottleneck. Not only is this challenge not resolved, but
also it becomes more serious with the advent of domain-specific architectures
(DSAs), as they intend to more aggressively improve performance. The
performance implications of using various formats along with DSAs, however, have
not been extensively studied by prior work. To fill this gap of knowledge, we
characterize the impact of using seven frequently used sparse formats on
performance, based on a DSA for sparse matrix-vector multiplication (SpMV),
implemented on an FPGA using high-level synthesis (HLS) tools, a growing and
popular method for developing DSAs. Seeking a fair comparison, we tailor and
optimize the HLS implementation of decompression for each format. We thoroughly
explore diverse metrics, including decompression overhead, latency, balance
ratio, throughput, memory bandwidth utilization, resource utilization, and
power consumption, on a variety of real-world and synthetic sparse workloads.
Comment: 11 pages, 14 figures, 2 tables
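The trade-off described in this abstract, where a compressed format saves memory but adds decompression work to each access, can be illustrated with a minimal sketch. It assumes the compressed sparse row (CSR) format, chosen here only because it is widely used; the abstract does not name the seven formats studied. The helper names `dense_to_csr` and `spmv_csr` are hypothetical, and this plain Python version does not represent the paper's HLS implementation.

```python
from typing import List, Sequence, Tuple


def dense_to_csr(
    matrix: Sequence[Sequence[float]],
) -> Tuple[List[float], List[int], List[int]]:
    """Compress a dense matrix into CSR arrays (values, col_idx, row_ptr)."""
    values: List[float] = []
    col_idx: List[int] = []
    row_ptr: List[int] = [0]
    for row in matrix:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)   # keep only non-zero entries
                col_idx.append(j)  # remember the column of each entry
        row_ptr.append(len(values))  # where the next row starts in `values`
    return values, col_idx, row_ptr


def spmv_csr(
    values: Sequence[float],
    col_idx: Sequence[int],
    row_ptr: Sequence[int],
    x: Sequence[float],
) -> List[float]:
    """Compute y = A @ x with A stored in CSR."""
    y: List[float] = []
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # "Decompression" step: row_ptr bounds the row's slice, and each
        # col_idx[k] is an indirect lookup into x before the multiply.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


if __name__ == "__main__":
    A = [[1, 0, 2],
         [0, 0, 3],
         [4, 5, 0]]
    vals, cols, ptrs = dense_to_csr(A)
    print(spmv_csr(vals, cols, ptrs, [1, 2, 3]))
```

The indirect `x[col_idx[k]]` access and the per-row bounds from `row_ptr` are the kind of format-specific overhead that the abstract says can turn into a computation bottleneck.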
Mapping local patterns of childhood overweight and wasting in low- and middle-income countries between 2000 and 2017
A double burden of malnutrition occurs when individuals, household members or communities experience both undernutrition and overweight. Here, we show geospatial estimates of overweight and wasting prevalence among children under 5 years of age in 105 low- and middle-income countries (LMICs) from 2000 to 2017 and aggregate these to policy-relevant administrative units. Wasting decreased overall across LMICs between 2000 and 2017, from 8.4% (62.3 (55.1–70.8) million) to 6.4% (58.3 (47.6–70.7) million), but is predicted to remain above the World Health Organization’s Global Nutrition Target of <5% in over half of LMICs by 2025. Prevalence of overweight increased from 5.2% (30 (22.8–38.5) million) in 2000 to 6.0% (55.5 (44.8–67.9) million) children aged under 5 years in 2017. Areas most affected by double burden of malnutrition were located in Indonesia, Thailand, southeastern China, Botswana, Cameroon and central Nigeria. Our estimates provide a new perspective to researchers, policy makers and public health agencies in their efforts to address this global childhood syndemic