    KAPLA: Pragmatic Representation and Fast Solving of Scalable NN Accelerator Dataflow

    Dataflow scheduling decisions are of vital importance to neural network (NN) accelerators. Recent scalable NN accelerators support a rich set of advanced dataflow techniques, which makes comprehensively representing and quickly finding optimized dataflow schemes significantly more complicated and challenging. In this work, we first propose comprehensive and pragmatic dataflow representations for temporal and spatial scheduling on scalable multi-node NN architectures. An informal hierarchical taxonomy highlights the tight coupling across different levels of the dataflow space as the major difficulty for fast design exploration. A set of formal tensor-centric directives accurately expresses various inter-layer and intra-layer schemes and allows their validity and efficiency to be determined quickly. We then build a generic, optimized, and fast dataflow solver, KAPLA, which uses these pragmatic directives to explore the design space with effective validity checks and efficiency estimation. KAPLA decouples the upper inter-layer level for fast pruning and solves the lower intra-layer schemes with a novel bottom-up cost-descending method. Compared to the exhaustively searched optimal schemes, the dataflows found by KAPLA incur only 2.2% and 7.7% energy overhead for training and inference, respectively. KAPLA also outperforms random and machine-learning-based approaches, producing more optimized results with orders of magnitude faster search.
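
    To make validity checks and efficiency estimation concrete, the sketch below scores tiling choices for a single matrix-multiply layer C[M,N] = A[M,K]·B[K,N]. It is a toy under stated assumptions, not KAPLA's solver: the buffer model, the output-stationary traffic formula, and all function names are hypothetical, and the search is a plain exhaustive enumeration of the kind KAPLA is compared against, not its bottom-up cost-descending method.

```python
from itertools import product


def divisors(n: int) -> list[int]:
    """All positive divisors of n, ascending (tiles must divide the dimension)."""
    return [d for d in range(1, n + 1) if n % d == 0]


def buffer_words(tm: int, tn: int, tk: int) -> int:
    """Hypothetical on-chip footprint: one A, one B, and one C tile held at once."""
    return tm * tk + tk * tn + tm * tn


def offchip_traffic(M: int, N: int, K: int, tm: int, tn: int) -> int:
    """Toy output-stationary model: each C tile stays on chip while A and B
    stream past it, so A is re-read once per column of C tiles, B once per
    row of C tiles, and C is written back exactly once."""
    return M * K * (N // tn) + K * N * (M // tm) + M * N


def search(M: int, N: int, K: int, buffer_capacity: int):
    """Exhaustively enumerate tilings; return (cost, (tm, tn, tk)) or None."""
    best = None
    for tm, tn, tk in product(divisors(M), divisors(N), divisors(K)):
        if buffer_words(tm, tn, tk) > buffer_capacity:
            continue  # validity check: the tiles do not fit on chip
        cost = offchip_traffic(M, N, K, tm, tn)  # efficiency estimate
        if best is None or cost < best[0]:
            best = (cost, (tm, tn, tk))
    return best
```

For example, `search(64, 64, 64, 100)` discards every tiling that needs more than 100 words and settles on 8×8 output tiles. Multi-node dataflows add loop orders, buffer hierarchies, spatial partitioning, and inter-layer choices on top of this single-layer space, which is why pruning the inter-layer level first matters.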

    Canvas: End-to-End Kernel Architecture Search in Neural Networks

    The demands for higher performance and accuracy in neural networks (NNs) never end. Existing tensor compilation and Neural Architecture Search (NAS) techniques optimize these two goals orthogonally, yet they share many similarities in their concrete strategies. We exploit this opportunity by combining the two and make a case for Kernel Architecture Search (KAS). KAS revisits NAS from a system perspective and zooms into a more fine-grained level to generate neural kernels with both high performance and good accuracy. To demonstrate the potential of KAS, we build an end-to-end framework, Canvas, that finds high-quality kernels as convolution replacements. Canvas samples from a rich set of fine-grained primitives to stochastically and iteratively construct new kernels and evaluates them against user-specified constraints. Canvas supports freely adjustable tensor dimension sizes inside the kernel and uses two levels of solvers to satisfy structural legality and fully utilize model budgets. The evaluation shows that by replacing standard convolutions in common NNs with the generated kernels, Canvas achieves an average 1.5x speedup over the previous state of the art, with acceptable accuracy loss and search efficiency. Canvas verifies the practicability of KAS by rediscovering many kernels that were manually designed in the past and by producing new structures that may inspire future machine learning innovations. Canvas is open-sourced at https://github.com/tsinghua-ideal/Canvas.
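
    The stochastic construction loop can be illustrated with rejection sampling over a toy primitive set. Everything here is an assumption for illustration: the four primitives, their parameter counts, and the helper names are hypothetical, only channel width is tracked, and random rejection stands in for Canvas's two levels of solvers. Accuracy evaluation of the surviving candidates is omitted.

```python
import random

# Hypothetical primitives: name -> (new_width(w), param_count(w)).
# new_width returns None when the primitive is structurally illegal at width w.
PRIMITIVES = {
    "depthwise3x3": (lambda w: w, lambda w: 9 * w),
    "pointwise_expand2": (lambda w: 2 * w, lambda w: w * 2 * w),
    "pointwise_shrink2": (lambda w: w // 2 if w % 2 == 0 else None,
                          lambda w: w * (w // 2)),
    "relu": (lambda w: w, lambda w: 0),
}


def evaluate(kernel, in_width):
    """Return (out_width, params) for a primitive sequence, or None if illegal."""
    w, params = in_width, 0
    for name in kernel:
        new_width, cost = PRIMITIVES[name]
        nw = new_width(w)
        if nw is None:
            return None
        params += cost(w)  # parameters depend on the input width of this step
        w = nw
    return w, params


def sample_kernels(in_width, out_width, budget, trials=1000, max_len=5, seed=0):
    """Randomly build primitive sequences; keep those that are legal, restore
    the required output width, and stay within the parameter budget."""
    rng = random.Random(seed)
    names = sorted(PRIMITIVES)
    found = set()
    for _ in range(trials):
        length = rng.randint(1, max_len)
        kernel = tuple(rng.choice(names) for _ in range(length))
        result = evaluate(kernel, in_width)
        if result is None:
            continue  # structurally illegal sequence
        width, params = result
        if width == out_width and params <= budget:
            found.add(kernel)
    return sorted(found)
```

With `sample_kernels(16, 16, budget=1000)`, an expand-then-shrink pair is rejected because it costs 512 + 512 = 1024 parameters, just over the budget.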

    Hybrid algorithms to solve linear systems of equations with limited qubit resources

    Solving linear systems of equations is a very frequent operation and is therefore important in many fields. The complexity of classical methods increases linearly with the size of the system. The HHL algorithm proposed by Harrow et al. achieves an exponential speedup over the best classical algorithm. However, it has a relatively high demand for qubit resources, and the solution |x⟩ is obtained only in normalized form. Assuming that the eigenvalues of the coefficient matrix can be represented perfectly by finite binary strings, this paper designs three hybrid iterative phase estimation algorithms (HIPEA) based on the iterative phase estimation algorithm. The complexity is transferred to the measurement operations in an iterative way, which reduces the qubit resources the hybrid algorithms require. Moreover, the solution is stored in a classical register instead of a quantum register, so the exact unnormalized solution can be obtained. The three HIPEA algorithms require different qubit resources. HIPEA-1 needs only a single ancillary qubit. The number of ancillary qubits in HIPEA-2 equals the number of nondegenerate eigenvalues of the coefficient matrix. HIPEA-3 is designed with a flexible number of ancillary qubits. By avoiding the situation in which quantum programs cannot be used to solve linear systems of equations for lack of qubit resources, the proposed HIPEA algorithms broaden the application range of quantum computation in this area.
    Comment: 22 pages, 6 figures, 6 tables, 48 equations
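
    The iterative phase estimation algorithm that HIPEA builds on can be simulated classically for one ancillary qubit. The sketch assumes, as the abstract does, that the eigenphase φ has an exact m-bit binary expansion, and additionally assumes the target register is already in an eigenstate of U, so only the ancilla needs to be tracked. Each round applies a Hadamard gate, a controlled-U^(2^(k-1)), a feedback phase rotation built from the bits already measured, and a final Hadamard gate; the bits come out from least to most significant. Because the phase is exact, each outcome occurs with probability 1, so the sketch takes the likely outcome instead of sampling. The function name is hypothetical, and the HIPEA steps that turn eigenvalues into the solution vector are not reproduced.

```python
import cmath
import math


def ipea_bits(phi_bits):
    """Classically simulate one-ancilla iterative phase estimation (IPEA).

    phi_bits = [phi_1, ..., phi_m] encodes the eigenphase
    phi = 0.phi_1 phi_2 ... phi_m (binary) of U on an eigenstate |u>.
    Returns the bits recovered by the simulated measurements.
    """
    m = len(phi_bits)
    phi = sum(b / 2 ** (j + 1) for j, b in enumerate(phi_bits))
    measured = {}  # bit index k (1-based) -> measured value
    for k in range(m, 0, -1):  # least significant bit first
        # H on the ancilla, then controlled-U^(2^(k-1)) kicks this phase onto |1>.
        theta = 2 * math.pi * 2 ** (k - 1) * phi
        # Feedback rotation removes the contribution of bits k+1..m already measured.
        omega = -2 * math.pi * sum(
            measured[j] / 2 ** (j - k + 1) for j in range(k + 1, m + 1)
        )
        # Ancilla is now (|0> + e^{i(theta+omega)}|1>)/sqrt(2); a final H gives
        # amplitude (1 + e^{i(theta+omega)})/2 on |0>.
        p0 = abs((1 + cmath.exp(1j * (theta + omega))) / 2) ** 2
        # The phase is exact, so p0 is 1 or 0 up to rounding; take the likely outcome.
        measured[k] = 0 if p0 > 0.5 else 1
    return [measured[k] for k in range(1, m + 1)]
```

For example, `ipea_bits([0, 1, 1, 0])` recovers φ = 0.375 (binary 0.0110) one bit per round, reusing the same single ancilla for all four bits, which is the qubit saving the iterative scheme trades for repeated measurements.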

    GLT-T: Global-Local Transformer Voting for 3D Single Object Tracking in Point Clouds

    Current 3D single object tracking methods are typically based on VoteNet, a 3D region proposal network. Despite this success, using a single seed point feature as the cue for offset learning in VoteNet prevents high-quality 3D proposals from being generated. Moreover, seed points of different importance are treated equally in the voting process, which aggravates this defect. To address these issues, we propose a novel global-local transformer voting scheme that provides more informative cues and guides the model to pay more attention to potential seed points, promoting the generation of high-quality 3D proposals. Technically, a global-local transformer (GLT) module integrates object- and patch-aware priors into seed point features to form a strong feature representation of the seed points' geometric positions, thus providing more robust and accurate cues for offset learning. Subsequently, a simple yet effective training strategy is designed to train the GLT module. We develop an importance prediction branch to learn the potential importance of the seed points and treat its output weight vector as a training constraint term. Combining these components yields GLT-T, a superior tracking method. Extensive experiments on the challenging KITTI and NuScenes benchmarks demonstrate that GLT-T achieves state-of-the-art performance in the 3D single object tracking task. Further ablation studies show the advantages of the proposed global-local transformer voting scheme over the original VoteNet. Code and models will be available at https://github.com/haooozi/GLT-T.
    Comment: Accepted to AAAI 2023.
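
    The effect of weighting seed points by importance, rather than treating them equally, can be seen in a simplified voting step. In this sketch each seed casts a vote at its position plus a predicted offset, and the votes are averaged into a single center with softmax-normalized importance weights. This is an illustrative assumption, not GLT-T's design: VoteNet clusters votes into multiple proposals, the offsets and scores would come from learned networks such as the GLT module, and GLT-T uses the predicted importance weights as a training constraint term, which is not modeled here. The function names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_vote(seeds, offsets, importance_logits):
    """Each seed votes at seed + offset; votes are combined with importance
    weights instead of a plain average that treats all seeds equally."""
    votes = [tuple(s + o for s, o in zip(seed, off))
             for seed, off in zip(seeds, offsets)]
    weights = softmax(importance_logits)
    center = tuple(sum(w * v[d] for w, v in zip(weights, votes)) for d in range(3))
    return votes, weights, center
```

With two seeds at x = 0 and x = 10, zero offsets, and equal logits, the center lands at x = 5; raising the first seed's logit pulls the center toward that seed.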

    Model Design on Emergency Power Supply of Electric Vehicle

    Based on the mobile storage characteristics of electric vehicles (EVs), this paper analyzes their storage capability and presents an emergency power supply model that uses EVs. The model minimizes losses to important consumers during a power failure or emergency while minimizing the EVs' operating, dispatching, and maintenance costs. Because EVs are randomly dispersed within an area, an emergency power supply scheme using EVs is designed based on the K-means algorithm. Its purpose is to improve the EVs' ability to gather proactively and to reduce their gathering time. The study can reduce the amount of other emergency power supply equipment needed and improve the reliability of urban electricity supply.
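
    The abstract does not detail how K-means is applied, so the sketch below only shows the standard algorithm (Lloyd's iterations) clustering hypothetical 2-D EV positions, with each resulting center read as a candidate gathering point. The coordinates, the choice of k, and the interpretation of centers as gathering points are assumptions; the paper's cost and loss terms are not modeled.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm on 2-D points; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialize with k distinct EV positions
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each EV joins its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: move each center to the mean position of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers, labels
```

For example, `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` separates the two groups and returns centers at about (1/3, 1/3) and (31/3, 31/3), in some order. Lloyd's algorithm can stop at a local optimum on less well-separated data, so practical code often restarts from several initializations.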
