Demystifying the Characteristics of 3D-Stacked Memories: A Case Study for Hybrid Memory Cube
Three-dimensional (3D)-stacking technology, which enables the integration of
DRAM and logic dies, offers high bandwidth and low energy consumption. This
technology also empowers new memory designs for executing tasks not
traditionally associated with memories. A practical 3D-stacked memory is Hybrid
Memory Cube (HMC), which provides significant access bandwidth and low power
consumption in a small area. Although several studies have taken advantage of
the novel architecture of HMC, its characteristics in terms of latency and
bandwidth or their correlation with temperature and power consumption have not
been fully explored. This paper is the first, to the best of our knowledge, to
characterize the thermal behavior of HMC in a real environment using the AC-510
accelerator and to identify temperature as a new limitation for this
state-of-the-art design space. Moreover, besides bandwidth studies, we
deconstruct factors that contribute to latency and reveal their sources for
high- and low-load accesses. The results of this paper demonstrate essential
behaviors and performance bottlenecks for future explorations of
packet-switched and 3D-stacked memories.
Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube
Memories that exploit three-dimensional (3D)-stacking technology, which
integrates memory and logic dies in a single stack, are becoming popular. These
memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC)
design for connecting their internal structural organizations. This novel usage
of NoC, in addition to aiding processing-in-memory capabilities, enables
numerous benefits such as high bandwidth and memory-level parallelism. However,
the implications of NoCs on the characteristics of 3D-stacked memories in terms
of memory access latency and bandwidth have not been fully explored. This paper
addresses this knowledge gap by (i) characterizing an HMC prototype on the
AC-510 accelerator board and revealing its access latency behaviors, and (ii)
investigating the implications of such behaviors on system and software
designs.
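To make the latency discussion concrete, the sketch below shows the classic
pointer-chasing approach to measuring unloaded memory access latency from the
host side. It is a minimal illustration only, not the AC-510/HMC measurement
flow from this work; the buffer size, iteration count, and timing method are
assumptions chosen for clarity.

#include <algorithm>
#include <chrono>
#include <cstdio>
#include <numeric>
#include <random>
#include <vector>

int main() {
    const std::size_t n = 1 << 24;        // ~16M entries, far beyond on-chip caches
    std::vector<std::size_t> next(n);

    // Build a random cyclic permutation so every load depends on the previous
    // one; this exposes full access latency rather than bandwidth.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::shuffle(order.begin(), order.end(), std::mt19937_64{42});
    for (std::size_t i = 0; i < n; ++i)
        next[order[i]] = order[(i + 1) % n];

    const std::size_t iters = 1 << 26;
    std::size_t idx = order[0];
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < iters; ++i)
        idx = next[idx];                  // serialized dependent loads
    auto t1 = std::chrono::steady_clock::now();

    double ns = std::chrono::duration<double, std::nano>(t1 - t0).count();
    // Print idx so the chase is not optimized away.
    std::printf("avg load-to-use latency: %.1f ns (end index %zu)\n", ns / iters, idx);
    return 0;
}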
Genetic characterization of chicken infectious anaemia viruses isolated in Korea and their pathogenicity in chicks
Chicken infectious anaemia virus (CIAV) causes severe anaemia and immunosuppression through horizontal or vertical transmission in young chickens. In particular, vertical transmission of the virus through the egg can lead to significant economic losses due to increased mortality in the broiler industry. Here, 28 complete CIAV sequences circulating in Korea were characterized for the first time using newly designed primers. Phylogenetic analysis based on the complete sequences revealed that the CIAV isolates were divided into four groups, IIa (2/28, 7.1%), IIb (9/28, 32.1%), IIIa (8/28, 28.6%) and IIIb (9/28, 32.1%), and exhibited a close relationship to each other. The major groups were IIb, IIIa and IIIb, and no strains clustered with the vaccine strain available in Korea. For viral titration, we also developed a new quantitative PCR assay that is highly sensitive, reliable and simple. To investigate the pathogenicity of the three major genotypes, the 18R001 (IIb), 08AQ017A (IIIa), and 17AD008 (IIIb) isolates were used to challenge one-day-old specific-pathogen-free (SPF) chicks. Each CIAV strain caused anaemia, severe growth retardation and immunosuppression in chickens regardless of genotype. Notably, the 17AD008 strain showed stable cellular adaptability and a higher virus titre in vitro as well as higher pathogenicity in vivo. Taken together, our study provides valuable information on the molecular characteristics, genetic diversity and pathogenicity of CIAV, which can improve the management and control of CIA on poultry farms.
Copernicus: Characterizing the Performance Implications of Compression Formats Used in Sparse Workloads
Sparse matrices are the key ingredients of several application domains, from
scientific computation to machine learning. The primary challenge with sparse
matrices has been efficiently storing and transferring data, for which many
sparse formats have been proposed to eliminate zero entries. Such
formats, essentially designed to optimize memory footprint, may not be as
successful in performing faster processing. In other words, although they allow
faster data transfer and improve memory bandwidth utilization -- the classic
challenge of sparse problems -- their decompression mechanism can potentially
create a computation bottleneck. Not only does this challenge remain
unresolved, it becomes more serious with the advent of domain-specific
architectures (DSAs), as they aim to improve performance more aggressively. The
performance implications of using various formats along with DSAs, however, have
not been extensively studied by prior work. To fill this knowledge gap, we
characterize the impact of using seven frequently used sparse formats on
performance, based on a DSA for sparse matrix-vector multiplication (SpMV),
implemented on an FPGA using high-level synthesis (HLS) tools, a growing and
popular method for developing DSAs. Seeking a fair comparison, we tailor and
optimize the HLS implementation of decompression for each format. We thoroughly
explore diverse metrics, including decompression overhead, latency, balance
ratio, throughput, memory bandwidth utilization, resource utilization, and
power consumption, on a variety of real-world and synthetic sparse workloads.
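To illustrate the kind of decompression work a sparse format imposes, here is a
minimal CSR-based SpMV sketch in C++. The structure and loop are generic
textbook CSR, not the paper's HLS kernels, and they stand in for only one of
the seven formats studied; the indirect gather through col_idx is the traversal
overhead the abstract refers to as decompression.

#include <cstddef>
#include <vector>

// Compressed Sparse Row (CSR) storage: only nonzeros are kept, with per-row
// offsets into the column-index and value arrays.
struct CsrMatrix {
    std::size_t rows = 0;
    std::vector<std::size_t> row_ptr;  // rows + 1 offsets into col_idx/vals
    std::vector<std::size_t> col_idx;  // column index of each nonzero
    std::vector<double> vals;          // nonzero values, row-major
};

// y = A * x. The gather through col_idx is the "decompression" work the format
// imposes on the compute pipeline in addition to the multiply-adds.
std::vector<double> spmv_csr(const CsrMatrix& a, const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (std::size_t r = 0; r < a.rows; ++r)
        for (std::size_t k = a.row_ptr[r]; k < a.row_ptr[r + 1]; ++k)
            y[r] += a.vals[k] * x[a.col_idx[k]];
    return y;
}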
HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms
Research areas: Computer architecture, Programming analysis
Heterogeneous computing has emerged as one of
the major computing platforms in many domains. Although
there have been several proposals to aid programming for
heterogeneous computing platforms, optimizing applications
on heterogeneous computing platforms is not an easy task.
Identifying which parallel regions (or tasks) should run on
GPUs or CPUs is one of the critical decisions to improve
performance. In this paper, we propose a profiler, HPerf, to identify
an efficient task distribution on CPU+GPU systems with
low profiling overhead. HPerf is a hierarchical profiler. First,
it performs lightweight profiling and then, if necessary, it
performs detailed profiling to measure caching and data
transfer costs. Compared to a brute-force approach, HPerf
reduces the profiling overhead significantly, and compared to
a naive decision, HPerf improves the performance of OpenCL
applications by up to 25%.
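As a rough illustration of the hierarchical idea described above, the following
C++ sketch makes a CPU-versus-GPU placement decision in two stages: a cheap
timing pass first, and a detailed pass for transfer and caching costs only when
the cheap pass is inconclusive. The function names, thresholds, and cost model
are assumptions for illustration, not HPerf's actual interface.

#include <cstdio>
#include <string>

struct LightProfile { double cpu_ms; double gpu_kernel_ms; };
struct DetailedProfile { double transfer_ms; double cache_penalty_ms; };

// Stage 1: cheap wall-clock timing of the task on each device.
// (Stub values here; a real pass would wrap timers or OpenCL profiling events.)
LightProfile light_profile(const std::string&) { return {12.0, 8.0}; }

// Stage 2: costlier pass measuring data-transfer and caching effects,
// run only when stage 1 is inconclusive.
DetailedProfile detailed_profile(const std::string&) { return {5.0, 1.0}; }

std::string choose_device(const std::string& task) {
    LightProfile lp = light_profile(task);
    // A wide margin in the cheap pass lets us skip the detailed one entirely.
    if (lp.gpu_kernel_ms * 2.0 < lp.cpu_ms) return "GPU";
    if (lp.cpu_ms * 2.0 < lp.gpu_kernel_ms) return "CPU";
    // Otherwise fold transfer and caching costs into the GPU estimate.
    DetailedProfile dp = detailed_profile(task);
    double gpu_total = lp.gpu_kernel_ms + dp.transfer_ms + dp.cache_penalty_ms;
    return gpu_total < lp.cpu_ms ? "GPU" : "CPU";
}

int main() {
    std::printf("run this task on: %s\n", choose_device("example_kernel").c_str());
    return 0;
}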