KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow
Dataflow scheduling decisions are of vital importance to neural network (NN)
accelerators. Recent scalable NN accelerators support a rich set of advanced
dataflow techniques. The problems of comprehensively representing and quickly
finding optimized dataflow schemes thus become significantly more complicated
and challenging. In this work, we first propose comprehensive and pragmatic
dataflow representations for temporal and spatial scheduling on scalable
multi-node NN architectures. An informal hierarchical taxonomy highlights the
tight coupling across different levels of the dataflow space as the major
difficulty for fast design exploration. A set of formal tensor-centric
directives accurately express various inter-layer and intra-layer schemes, and
allow for quickly determining their validity and efficiency. We then build a
generic, optimized, and fast dataflow solver, KAPLA, which makes use of the
pragmatic directives to explore the design space with effective validity check
and efficiency estimation. KAPLA decouples the upper inter-layer level for fast
pruning, and solves the lower intra-layer schemes with a novel bottom-up cost
descending method. KAPLA incurs only 2.2% and 7.7% energy overheads on
the resulting dataflow for training and inference, respectively, compared to the
exhaustively searched optimal schemes. It also outperforms random and
machine-learning-based approaches, producing better-optimized results with a
search that is orders of magnitude faster.
Canvas: End-to-End Kernel Architecture Search in Neural Networks
The demands for higher performance and accuracy in neural networks (NNs)
never end. Existing tensor compilation and Neural Architecture Search (NAS)
techniques orthogonally optimize the two goals but actually share many
similarities in their concrete strategies. We exploit such opportunities by
combining the two into one and make a case for Kernel Architecture Search
(KAS). KAS reviews NAS from a system perspective and zooms into a more
fine-grained level to generate neural kernels with both high performance and
good accuracy. To demonstrate the potential of KAS, we build an end-to-end
framework, Canvas, to find high-quality kernels as convolution replacements.
Canvas samples from a rich set of fine-grained primitives to stochastically and
iteratively construct new kernels and evaluate them according to user-specified
constraints. Canvas supports freely adjustable tensor dimension sizes inside
the kernel and uses two levels of solvers to satisfy structural legality and
fully utilize model budgets. The evaluation shows that by replacing standard
convolutions with generated new kernels in common NNs, Canvas achieves average
1.5x speedups compared to the previous state-of-the-art with acceptable
accuracy loss and search efficiency. Canvas verifies the practicability of KAS
by rediscovering many kernels that were manually designed in the past and by
producing new structures that may inspire future machine learning innovations.
The source code and implementation of Canvas are open-sourced at
https://github.com/tsinghua-ideal/Canvas
Hybrid algorithms to solve linear systems of equations with limited qubit resources
The solution of linear systems of equations is a very frequent operation and
thus important in many fields. The complexity using classical methods increases
linearly with the size of equations. The HHL algorithm proposed by Harrow et
al. achieves exponential acceleration compared with the best classical
algorithm. However, it has a relatively high demand for qubit resources and the
solution is in a normalized form. Assuming that the
eigenvalues of the coefficient matrix of the linear systems of equations can be
represented perfectly by finite binary number strings, three hybrid iterative
phase estimation algorithms (HIPEA) are designed based on the iterative phase
estimation algorithm in this paper. The complexity is transferred to the
measurement operation in an iterative way, and thus the demand of qubit
resources is reduced in our hybrid algorithms. Moreover, the solution is stored
in a classical register instead of a quantum register, so the exact
unnormalized solution can be obtained. The required qubit resources in the
three HIPEA algorithms are different. HIPEA-1 only needs one single ancillary
qubit. The number of ancillary qubits in HIPEA-2 is equal to the number of
nondegenerate eigenvalues of the coefficient matrix of linear systems of
equations. HIPEA-3 is designed with a flexible number of ancillary qubits. The
HIPEA algorithms proposed in this paper broaden the application range of
quantum computation in solving linear systems of equations by avoiding the
problem that quantum programs may not be usable for solving linear systems of
equations due to the lack of qubit resources.
Comment: 22 pages, 6 figures, 6 tables, 48 equations
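The single-ancilla idea behind iterative phase estimation can be illustrated with a small classical simulation. This is only a sketch of the textbook iterative scheme, not the HIPEA algorithms of the paper. It assumes an eigenphase that is exactly representable with m binary digits, matching the assumption stated in the abstract. The function name `iterative_phase_estimation` and the example phase are hypothetical.

```python
import math
from typing import List


def iterative_phase_estimation(phi: float, m: int) -> List[int]:
    """Recover the bits of phi = 0.b1 b2 ... bm using one simulated ancilla.

    Bits are extracted from least significant (b_m) to most significant (b_1).
    Each round mimics: Hadamard, controlled-U^(2^(k-1)), feedback rotation,
    Hadamard, measurement. Assumption: phi has an exact m-bit expansion, so
    every measurement outcome is deterministic.
    """
    bits = [0] * m  # bits[k - 1] stores b_k
    for k in range(m, 0, -1):
        # Phase kicked back onto the ancilla, in turns: 0.b_k b_{k+1} ... b_m
        kick = (2 ** (k - 1) * phi) % 1.0
        # Feedback built from already measured bits: 0.0 b_{k+1} ... b_m
        feedback = sum(bits[j - 1] / 2 ** (j - k + 1) for j in range(k + 1, m + 1))
        # Remaining relative phase is pi * b_k
        theta = 2 * math.pi * (kick - feedback)
        # Probability of measuring 1 after the final Hadamard
        p_one = math.sin(theta / 2) ** 2
        bits[k - 1] = 1 if p_one > 0.5 else 0
    return bits


# Example: 0.8125 is 0.1101 in binary
example_bits = iterative_phase_estimation(0.8125, 4)
reconstructed = sum(b / 2 ** (i + 1) for i, b in enumerate(example_bits))
print(example_bits, reconstructed)
```

Because each round reuses the same ancilla and feeds earlier measurement results back classically, the number of qubits stays fixed while the number of measurement rounds grows with the bit length m.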
GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds
Current 3D single object tracking methods are typically based on VoteNet, a
3D region proposal network. Despite the success, using a single seed point
feature as the cue for offset learning in VoteNet prevents high-quality 3D
proposals from being generated. Moreover, seed points with different importance
are treated equally in the voting process, aggravating this defect. To address
these issues, we propose a novel global-local transformer voting scheme to
provide more informative cues and guide the model to pay more attention to
potential seed points, promoting the generation of high-quality 3D proposals.
Technically, a global-local transformer (GLT) module is employed to integrate
object- and patch-aware priors into seed point features to effectively form
a strong feature representation for the geometric positions of the seed points, thus
providing more robust and accurate cues for offset learning. Subsequently, a
simple yet effective training strategy is designed to train the GLT module. We
develop an importance prediction branch to learn the potential importance of
the seed points and treat the output weight vector as a training constraint
term. By combining the above components, we present a superior
tracking method, GLT-T. Extensive experiments on the challenging KITTI and NuScenes
benchmarks demonstrate that GLT-T achieves state-of-the-art performance in the
3D single object tracking task. Besides, further ablation studies show the
advantages of the proposed global-local transformer voting scheme over the
original VoteNet. Code and models will be available at
https://github.com/haooozi/GLT-T.
Comment: Accepted to AAAI 2023.
Model Design on Emergency Power Supply of Electric Vehicle
Based on the mobile storage characteristics of electric vehicles, an emergency power supply model for electric vehicles is presented. The model minimizes the losses of important consumers during a power failure or emergency and minimizes the running, scheduling, and maintenance costs of the electric vehicles. Because electric vehicles are randomly dispersed within an area, an emergency power supply scheme using electric vehicles is designed based on the K-means algorithm. Its purpose is to improve the electric vehicles' ability to gather on their own initiative and to reduce their gathering time. The study can reduce the amount of other emergency power supply equipment required and improve urban electricity reliability.
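The K-means grouping step mentioned in the abstract can be illustrated with a minimal sketch of Lloyd's algorithm on vehicle coordinates. It is not the paper's model. The coordinates, the number of clusters, and the seeding of centroids from the first k points are all assumptions made for illustration, and the function name `kmeans` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def kmeans(points: List[Point], k: int, iterations: int = 20) -> Tuple[List[Point], List[int]]:
    """Group 2D points into k clusters with Lloyd's algorithm.

    Assumption: centroids are seeded with the first k points, which keeps
    the sketch deterministic. Real deployments usually use random or
    k-means++ seeding.
    """
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each vehicle joins its nearest gathering point
        assignment = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2 + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each gathering point to the mean of its members
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                mean_x = sum(x for x, _ in members) / len(members)
                mean_y = sum(y for _, y in members) / len(members)
                new_centroids.append((mean_x, mean_y))
            else:
                # Keep an empty cluster's centroid where it was
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment


# Hypothetical vehicle positions forming two well-separated groups
vehicles = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
gathering_points, groups = kmeans(vehicles, k=2)
print(gathering_points, groups)
```

In this reading, each centroid serves as a candidate gathering point, so vehicles assigned to the same cluster travel a short distance to a common location.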