GraphR: Accelerating Graph Processing Using ReRAM
This paper presents GRAPHR, the first ReRAM-based graph processing
accelerator. GRAPHR follows the principle of near-data processing and explores
the opportunity of performing massive parallel analog operations with low
hardware and energy cost. The analog computation is suitable for graph
processing because: 1) the algorithms are iterative and can inherently
tolerate imprecision; 2) both probability calculations (e.g., PageRank and
Collaborative Filtering) and typical graph algorithms involving integers (e.g.,
BFS/SSSP) are resilient to errors. The key insight of GRAPHR is that if the
vertex program of a graph algorithm can be expressed as sparse matrix-vector
multiplication (SpMV), it can be performed efficiently by a ReRAM crossbar. We
show that this assumption is generally true for a large set of graph
algorithms. GRAPHR is a novel accelerator architecture consisting of two
components: memory ReRAM and graph engine (GE). The core graph computations are
performed in sparse matrix format in GEs (ReRAM crossbars). The
vector/matrix-based graph computation is not new, but ReRAM offers the unique
opportunity to realize the massive parallelism with unprecedented energy
efficiency and low hardware cost. With small subgraphs processed by GEs, the
gain from performing parallel operations outweighs the waste due to sparsity.
The experimental results show that GRAPHR achieves a 16.01x (up to 132.67x)
speedup and a 33.82x energy saving in geometric mean over a CPU baseline
system. Compared to a GPU, GRAPHR achieves a 1.69x to 2.19x speedup and
consumes 4.77x to 8.91x less energy. GRAPHR gains a speedup of 1.16x to 4.12x,
and is 3.67x to 10.96x more energy efficient, compared to a PIM-based
architecture.
Comment: Accepted to HPCA 201
Do Clonal Plants Show Greater Division of Labour Morphologically and Physiologically at Higher Patch Contrasts?
When growing in reciprocal patches in terms of the availability of different resources, connected ramets of clonal plants will specialize to acquire and exchange locally abundant resources more efficiently. This has been termed division of labour. We asked whether division of labour can occur physiologically as well as morphologically, and whether it will increase with patch contrast. We subjected connected and disconnected ramet pairs of Potentilla anserina to Control, Low, Medium and High patch contrasts by manipulating light and nutrient levels for the ramets in each pair. Little net benefit of inter-ramet connection in terms of biomass was detected. Shoot-root ratio did not differ significantly between paired ramets, regardless of connection, under Control, Low and Medium contrasts. Under High contrast, however, disconnected shaded ramets with ample nutrients showed significantly larger shoot-root ratios (2.8∼6.5 fold) than fully-lit but nutrient-deficient ramets and than their counterparts under any other treatment; conversely, fully-lit but nutrient-deficient ramets, when connected to shaded ramets with ample nutrients, had significantly larger shoot-root ratios (2.0∼4.9 fold) than the latter and than their counterparts under any other treatment. Only under High patch contrast did fully-lit ramets, if connected to shaded ones, have 8.9% higher chlorophyll content than the latter and 22.4% higher chlorophyll content than their isolated counterparts; a similar pattern held for photosynthetic capacity under all heterogeneous treatments. Division of labour in clonal plants can thus be realized by ramet specialization in morphology and in physiology. However, the modest ramet specialization, especially in morphology, across patch contrasts may suggest that division of labour occurs only when connected ramets grow in reciprocal patches whose contrast exceeds a threshold.
This threshold patch contrast is probably the outcome of a clone-wide cost-benefit tradeoff and is significant for risk avoidance, especially in disturbance-prone environments.
Low-Cost Floating-Point Processing in ReRAM for Scientific Computing
We propose ReFloat, a principled approach to low-cost floating-point
processing in ReRAM. ReFloat stores exponent offsets relative to a shared base
in a flexible, fine-grained floating-point representation. The key motivation
is that, while the number of exponent bits must be reduced because computation
latency and hardware cost grow exponentially with it, convergence still
requires sufficient exponent accuracy. Our design reconciles these conflicting
goals by storing the exponent offsets from a base common to the matrix values
in a block, which is the granularity of computation in ReRAM. Because of value
locality, the differences among the exponents in a block are small, so the
offsets need far fewer bits to represent the exponents. In essence, ReFloat
enables the principled local fine-tuning of
floating-point representation. Based on this idea, we define a flexible
ReFloat format that specifies the matrix block size and the numbers of bits
for the exponent and the fraction. To determine the base for each block, we
propose an optimization method that minimizes the difference between the
exponents of the original matrix block and those of the converted block. We
develop the conversion scheme from the default double-precision floating-point
format to the ReFloat format, the computation procedure, and the low-cost
floating-point processing architecture in ReRAM.
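As a rough software illustration of the block-wise idea, the sketch below keeps one base exponent per block and stores each value as a small offset plus a truncated fraction. The function names and parameter defaults are assumptions, and the base is simply the block minimum here, whereas the paper derives it via an optimization:

```python
import math

# Illustrative encode/decode for a per-block shared exponent base.
# Assumes the block contains at least one nonzero value.
def to_refloat_block(values, offset_bits=3, frac_bits=8):
    exps = [math.frexp(v)[1] for v in values if v != 0.0]
    base = min(exps)                    # simple base choice for illustration
    encoded = []
    for v in values:
        m, e = math.frexp(v)            # v = m * 2**e with 0.5 <= |m| < 1
        off = min(e - base, 2**offset_bits - 1)        # clipped offset
        frac = round(m * 2**frac_bits) / 2**frac_bits  # truncated fraction
        encoded.append((off, frac))
    return base, encoded

def from_refloat_block(base, encoded):
    return [frac * 2 ** (base + off) for off, frac in encoded]

# Values in a block cluster in magnitude, so the offsets stay small.
block = [0.75, 1.5, 3.0, 0.875]
base, enc = to_refloat_block(block)
approx = from_refloat_block(base, enc)
```

Because the values share a base, each entry carries only a 3-bit offset rather than a full 11-bit double-precision exponent, which is the saving the value-locality argument above relies on.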
HyPar: Towards Hybrid Parallelism for Deep Learning Accelerator Array
With the rise of artificial intelligence in recent years, Deep Neural
Networks (DNNs) have been widely used in many domains. To achieve high
performance and energy efficiency, hardware acceleration (especially inference)
of DNNs is intensively studied both in academia and industry. However, we still
face two challenges: large DNN models and datasets, which incur frequent
off-chip memory accesses; and the training of DNNs, which is not well-explored
in recent accelerator designs. To truly provide high-throughput and
energy-efficient acceleration for the training of deep and large models, we
inevitably need to use multiple accelerators and exploit coarse-grain
parallelism, beyond the fine-grain parallelism inside a layer considered by
most existing architectures. This poses the key research question of finding
the best organization of computation and dataflow among the accelerators. In
this paper, we propose HyPar, a solution that determines layer-wise
parallelism for deep neural
network training with an array of DNN accelerators. HyPar partitions the
feature map tensors (input and output), the kernel tensors, the gradient
tensors, and the error tensors for the DNN accelerators. A partition
constitutes the choice of parallelism for the weighted layers. The
optimization target is to search for a partition that minimizes the total
communication during the training of a complete DNN. To solve this problem, we
propose a communication model that explains the source and amount of
communication. Then, we use a hierarchical layer-wise dynamic programming
method to search for the partition for each layer.
Comment: To appear in the 2019 25th International Symposium on
High-Performance Computer Architecture (HPCA 2019)
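The layer-wise dynamic programming described above can be sketched abstractly. The two-way choice (data vs. model parallelism) and the cost tables below are made-up illustrative numbers, not HyPar's actual communication model:

```python
# Hypothetical sketch of layer-wise DP: each layer picks one of two
# parallelism choices (0 = data parallel, 1 = model parallel), paying an
# intra-layer communication cost plus a transition cost between adjacent
# layers' choices.
def min_comm_partition(intra, trans):
    """intra[l][p]: comm cost of layer l under parallelism p.
    trans[l][p][q]: cost of going from p at layer l to q at layer l + 1."""
    n = len(intra)
    best = list(intra[0])               # best[p]: min cost ending at choice p
    choice = [[0, 1]]                   # back-pointers (layer 0 entry unused)
    for l in range(1, n):
        new, back = [], []
        for q in (0, 1):
            costs = [best[p] + trans[l - 1][p][q] for p in (0, 1)]
            p = costs.index(min(costs))
            new.append(costs[p] + intra[l][q])
            back.append(p)
        best = new
        choice.append(back)
    q = best.index(min(best))           # best final choice, then backtrack
    path = [q]
    for l in range(n - 1, 0, -1):
        q = choice[l][q]
        path.append(q)
    return min(best), path[::-1]

# Three layers; switching parallelism between adjacent layers costs 2.
intra = [[4, 1], [4, 1], [1, 4]]
trans = [[[0, 2], [2, 0]], [[0, 2], [2, 0]]]
cost, plan = min_comm_partition(intra, trans)   # cost 5, plan [1, 1, 0]
```

Keeping model parallelism for the first two layers and switching once is cheaper here than switching at every layer, which is the kind of per-layer trade-off the search resolves.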
The effects of daily fasting hours on shaping gut microbiota in mice
BACKGROUND: It has recently been reported that intermittent fasting shapes the gut microbiota to benefit health, but this effect may be influenced by the exact fasting protocol. The purpose of this study was to assess the effects of different daily fasting hours on shaping the gut microbiota in mice. Healthy C57BL/6J male mice were subjected to 12, 16 or 20 h of fasting per day for 1 month, and then fed ad libitum for a further month. Gut microbiota was analyzed by 16S rRNA gene-based sequencing, and food intake was recorded as well. RESULTS: We found that cumulative food intake was not changed in the group with 12 h daily fasting, but was significantly decreased in the 16 and 20 h fasting groups. The composition of the gut microbiota was altered by all these types of intermittent fasting. At the genus level, 16 h fasting led to an increased level of Akkermansia and a decreased level of Alistipes, but these effects disappeared after the cessation of fasting. No taxonomic differences were identified in the other two groups. CONCLUSIONS: These data indicate that intermittent fasting shapes the gut microbiota in healthy mice, and the length of the daily fasting interval may influence the outcome of intermittent fasting.
Nonadditive effects of litter mixtures on decomposition and correlation with initial litter N and P concentrations in grassland plant species of northern China
Abstract We studied the occurrence of nonadditive effects of litter mixtures on decomposition (the deviation of the decomposition rate of litter mixtures from the expected values based on the arithmetic means of the individual litter types) for litters from three plant species (i.e., Stipa krylovii Roshev., Artemisia frigida Willd., and Allium bidentatum Fisch. ex Prokh. & Ikonn.-Gal.) endemic to the grassland ecosystems of Inner Mongolia, northern China, and the possible role of initial litter N and P in such effects. We mixed litters of the same plant species that differed in N and P concentrations (four gradients for each species) in litterbags and measured the mass losses of these paired mixtures after 30 and 80 days under field conditions. We found the occurrence of positive, nonadditive effects of litter mixtures and showed that the magnitude of the nonadditive effects was related to the relative difference in the initial litter N and P concentrations of the paired litters.