SCANN: Synthesis of Compact and Accurate Neural Networks
Deep neural networks (DNNs) have become the driving force behind recent
artificial intelligence (AI) research. An important problem with implementing a
neural network is the design of its architecture. Typically, such an
architecture is obtained manually by exploring its hyperparameter space and
kept fixed during training. This approach is time-consuming and inefficient.
Another issue is that modern neural networks often contain millions of
parameters, whereas many applications and devices require small inference
models. However, efforts to migrate DNNs to such devices typically entail a
significant loss of classification accuracy. To address these challenges, we
propose a two-step neural network synthesis methodology, called DR+SCANN, that
combines two complementary approaches to design compact and accurate DNNs. At
the core of our framework is the SCANN methodology that uses three basic
architecture-changing operations, namely connection growth, neuron growth, and
connection pruning, to synthesize feed-forward architectures with arbitrary
structure. SCANN encapsulates three synthesis methodologies that apply a
repeated grow-and-prune paradigm to three architectural starting points.
DR+SCANN combines the SCANN methodology with dataset dimensionality reduction
to alleviate the curse of dimensionality. We demonstrate the efficacy of SCANN
and DR+SCANN on various image and non-image datasets. We evaluate SCANN on
MNIST and ImageNet benchmarks. In addition, we evaluate DR+SCANN on nine small to
medium-size datasets. We also show that our synthesis methodology yields neural
networks that are much better at navigating the accuracy vs. energy efficiency
space. This would enable neural network-based inference even on
Internet-of-Things sensors.
Comment: 13 pages, 8 figures
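A minimal sketch of the repeated grow-and-prune loop described above, in Python/NumPy. The saliency-based growth criterion and magnitude-based pruning criterion are assumptions for illustration, not SCANN's exact rules, and neuron growth is omitted for brevity:

import numpy as np

rng = np.random.default_rng(0)

# Toy single-layer setting: W holds weights, mask marks which connections exist.
W = rng.normal(size=(32, 16))
mask = rng.random(W.shape) < 0.5            # sparse seed architecture

def grow_connections(mask, saliency, frac=0.05):
    # Connection growth: activate the inactive connections with the highest
    # saliency (e.g. gradient magnitude; the exact criterion is assumed here).
    inactive = np.flatnonzero(~mask.ravel())
    k = min(int(mask.size * frac), inactive.size)
    if k == 0:
        return mask.copy()
    pick = inactive[np.argsort(saliency.ravel()[inactive])[-k:]]
    out = mask.copy()
    out.ravel()[pick] = True
    return out

def prune_connections(mask, W, frac=0.05):
    # Connection pruning: remove the active connections with the
    # smallest weight magnitude.
    active = np.flatnonzero(mask.ravel())
    k = min(int(mask.size * frac), active.size)
    if k == 0:
        return mask.copy()
    pick = active[np.argsort(np.abs(W.ravel()[active]))[:k]]
    out = mask.copy()
    out.ravel()[pick] = False
    return out

for step in range(3):                       # repeated grow-and-prune paradigm
    saliency = rng.normal(size=W.shape)     # stand-in for a real gradient signal
    mask = grow_connections(mask, saliency)
    # ... train the masked network (W * mask) for a few epochs here ...
    mask = prune_connections(mask, W)
    print(step, "active connections:", int(mask.sum()))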
HiHGNN: Accelerating HGNNs through Parallelism and Data Reusability Exploitation
Heterogeneous graph neural networks (HGNNs) have emerged as powerful
algorithms for processing heterogeneous graphs (HetGs), widely used in many
critical fields. To capture both structural and semantic information in HetGs,
HGNNs first aggregate the neighboring feature vectors for each vertex in each
semantic graph and then fuse the aggregated results across all semantic graphs
for each vertex. Unfortunately, existing graph neural network accelerators are
ill-suited to accelerating HGNNs, because they fail to efficiently handle the
specific execution patterns of HGNNs or to exploit the high degree of
parallelism and data reusability within and across the processing of semantic
graphs.
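The two stages described above, per-semantic-graph neighbor aggregation followed by cross-semantic-graph fusion, can be sketched as follows. Mean aggregation and mean fusion are simplifying assumptions; real HGNNs typically use attention-based variants:

import numpy as np

rng = np.random.default_rng(0)

def aggregate(features, adj):
    # Stage 1 -- neighbor aggregation within one semantic graph:
    # each vertex averages its neighbors' feature vectors.
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1.0)
    return (adj @ features) / deg

def fuse(per_graph_results):
    # Stage 2 -- semantic fusion: combine each vertex's aggregated
    # results across all semantic graphs (mean here; real HGNNs
    # typically learn attention weights instead).
    return np.mean(np.stack(per_graph_results), axis=0)

n, d = 5, 8
features = rng.random((n, d))
# One adjacency matrix per semantic graph (e.g. one per metapath).
semantic_graphs = [(rng.random((n, n)) < 0.3).astype(float) for _ in range(3)]

fused = fuse([aggregate(features, A) for A in semantic_graphs])
print(fused.shape)  # (5, 8): one fused embedding per vertex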
In this work, we first quantitatively characterize a set of representative
HGNN models on GPUs to reveal the execution bound of each stage,
inter-semantic-graph parallelism, and inter-semantic-graph data reusability in
HGNNs. Guided by our findings, we propose a high-performance HGNN accelerator,
HiHGNN, to alleviate the execution bound and exploit the newfound parallelism
and data reusability in HGNNs. Specifically, we first propose a bound-aware
stage-fusion methodology, tailored to HGNN acceleration, that fuses and
pipelines the execution stages with awareness of their execution bounds.
Second, we design an independency-aware parallel execution scheme to exploit
the inter-semantic-graph parallelism. Finally, we present a similarity-aware
execution scheduling strategy to exploit the inter-semantic-graph data
reusability.
Compared to the state-of-the-art software framework running on NVIDIA GPU T4
and GPU A100, HiHGNN achieves an average 41.5x and 8.6x speedup, respectively,
as well as 106x and 73x higher energy efficiency, with a quarter of the memory
bandwidth of GPU A100.
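One way to read "similarity-aware execution scheduling" is to order the semantic graphs so that consecutive ones touch overlapping vertex sets, keeping feature data resident on chip. The greedy Jaccard heuristic below is purely illustrative, not HiHGNN's actual scheduler:

def jaccard(a, b):
    # Overlap between the vertex sets touched by two semantic graphs.
    return len(a & b) / len(a | b) if a | b else 0.0

def schedule(vertex_sets):
    # Greedy: always process next the semantic graph whose vertex set
    # overlaps most with the one just finished, so its feature data is
    # likely still resident on chip and can be reused.
    order = [0]
    remaining = set(range(1, len(vertex_sets)))
    while remaining:
        last = vertex_sets[order[-1]]
        nxt = max(remaining, key=lambda i: jaccard(last, vertex_sets[i]))
        order.append(nxt)
        remaining.remove(nxt)
    return order

# Vertex sets touched by three hypothetical semantic graphs.
print(schedule([{0, 1, 2, 3}, {2, 3, 4}, {0, 1, 5}]))  # e.g. [0, 1, 2]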
Towards Accurate and High-Speed Spiking Neuromorphic Systems with Data Quantization-Aware Deep Networks
Deep Neural Networks (DNNs) have gained immense success in cognitive
applications and greatly pushed today's artificial intelligence forward. The
biggest challenge in executing DNNs is their extremely data-intensive
computations. Computing efficiency, in both speed and energy, is constrained
when traditional computing platforms are employed for such computation-hungry
executions. Spiking neuromorphic computing (SNC) has been widely investigated
for deep network implementation owing to its high efficiency in computation and
communication. However, the weights and signals of DNNs must be quantized when
deploying them on SNC systems, which results in unacceptable accuracy loss.
Previous works mainly focus on weight discretization, while inter-layer signals
are largely neglected. In this work, we
propose to represent DNNs with fixed integer inter-layer signals and
fixed-point weights while maintaining good accuracy. We implement the proposed DNNs
on the memristor-based SNC system as a deployment example. With 4-bit data
representation, our results show that the accuracy loss can be controlled
within 0.02% (2.3%) on MNIST (CIFAR-10). Compared with the 8-bit dynamic
fixed-point DNNs, our system can achieve more than 9.8x speedup, 89.1% energy
savings, and 30% area savings.
Comment: 6 pages, 4 figures
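A minimal sketch of the data representation described in this abstract: 4-bit fixed-point weights and 4-bit integer inter-layer signals. The symmetric uniform quantizer here is an assumption; the paper's exact scheme may differ:

import numpy as np

def quantize(x, bits=4):
    # Symmetric uniform quantizer: map x to integers in
    # [-(2**(bits-1) - 1), 2**(bits-1) - 1] plus one per-tensor scale.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))            # full-precision weights
a = rng.random(4)                      # inter-layer activation signal

qW, sW = quantize(W)                   # 4-bit fixed-point weights
qa, sa = quantize(a)                   # 4-bit integer inter-layer signal

# The matmul runs entirely on small integers (as it would on the SNC
# fabric); a single rescale recovers the real-valued output.
y = (qW.astype(np.int32) @ qa.astype(np.int32)) * (sW * sa)
print(np.max(np.abs(y - W @ a)))       # quantization error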