Securing NextG networks with physical-layer key generation: A survey
As next-generation (NextG) communication networks develop, an enormous number of devices are accessing the network and the amount of information is exploding. However, as increasing amounts of sensitive data requiring confidentiality are transmitted and stored in the network, wireless network security risks are further amplified. Physical-layer key generation (PKG) has received extensive attention in security research due to its solid information-theoretic security proof, ease of implementation, and low cost. Nevertheless, the application of PKG in NextG networks is still in a preliminary exploration stage. Therefore, we survey existing research and discuss (1) the performance advantages of PKG compared to cryptographic schemes, (2) the principles and processes of PKG, as well as research progress in previous network environments, and (3) new application scenarios and development potential for PKG in NextG communication networks, particularly analyzing the effect and prospects of PKG in massive multiple-input multiple-output (MIMO), reconfigurable intelligent surfaces (RISs), artificial intelligence (AI) enabled networks, integrated space-air-ground networks, and quantum communication. Moreover, we summarize open issues and provide new insights into the development trends of PKG in NextG networks.
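The channel-reciprocity principle behind PKG can be illustrated with a toy sketch (the function names, noise model, and 1-bit thresholding scheme below are illustrative assumptions, not the survey's method): two parties observe independently noisy versions of the same reciprocal channel gains and quantize them into a shared bit string.

```python
import random

def measure(channel, noise_std, rng):
    # Each party observes the reciprocal channel plus its own independent noise.
    return [g + rng.gauss(0, noise_std) for g in channel]

def quantize(samples):
    # 1-bit quantization: compare each measurement against the sample mean.
    mean = sum(samples) / len(samples)
    return [1 if s > mean else 0 for s in samples]

rng = random.Random(0)
# Shared reciprocal channel gains, seen (noisily) by both parties.
channel = [rng.gauss(0, 1) for _ in range(128)]

alice_bits = quantize(measure(channel, 0.1, rng))
bob_bits = quantize(measure(channel, 0.1, rng))

# Fraction of key bits on which the two parties already agree;
# real schemes follow this with information reconciliation.
agreement = sum(a == b for a, b in zip(alice_bits, bob_bits)) / len(alice_bits)
```

With low measurement noise the raw bit-agreement rate is high, and the residual mismatches are what the reconciliation stage of a full PKG pipeline corrects.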
Scaled Quantization for the Vision Transformer
Quantization using a small number of bits shows promise for reducing latency
and memory usage in deep neural networks. However, most quantization methods
cannot readily handle complicated functions such as exponential and square
root, and prior approaches involve complex training processes that must
interact with floating-point values. This paper proposes a robust method for
the full integer quantization of vision transformer networks without requiring
any intermediate floating-point computations. The quantization techniques can
be applied in various hardware or software implementations, including
processor/memory architectures and FPGAs.
Comment: 9 pages, 0 figures
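The core idea of scaled integer quantization — choosing a scale factor offline so that tensor values map onto a small integer range — can be sketched generically (this is a standard symmetric-quantization illustration, not the paper's specific method):

```python
def compute_scale(values, num_bits=8):
    # Choose a scale offline so the largest magnitude maps to the int range.
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for int8
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs else 1.0

def quantize(values, scale):
    # Round-to-nearest symmetric quantization to integers.
    return [round(v / scale) for v in values]

def dequantize(qvalues, scale):
    # Map integers back to approximate floating-point values.
    return [q * scale for q in qvalues]

weights = [0.5, -1.27, 0.02, 1.0]
scale = compute_scale(weights)
q = quantize(weights, scale)
recon = dequantize(q, scale)
err = max(abs(a - b) for a, b in zip(weights, recon))
```

The reconstruction error is bounded by half the scale step; the paper's contribution is keeping the entire inference path, including functions like exponential and square root, in this integer domain.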
GNNBuilder: An Automated Framework for Generic Graph Neural Network Accelerator Generation, Simulation, and Optimization
Many graph neural network (GNN) accelerators have been proposed. However,
they rely heavily on users' hardware expertise and are usually optimized for
one specific GNN model, making them challenging to use in practice.
Therefore, in this work, we propose GNNBuilder, the first automated, generic,
end-to-end GNN accelerator generation framework. It features four advantages:
(1) GNNBuilder can automatically generate GNN accelerators for a wide range of
GNN models arbitrarily defined by users; (2) GNNBuilder uses the standard
PyTorch programming interface, introducing zero overhead for algorithm
developers; (3)
GNNBuilder supports end-to-end code generation, simulation, accelerator
optimization, and hardware deployment, realizing a push-button fashion for GNN
accelerator design; (4) GNNBuilder is equipped with accurate performance models
of its generated accelerator, enabling fast and flexible design space
exploration (DSE). In the experiments, first, we show that our accelerator
performance model has errors within for latency prediction and
for BRAM count prediction. Second, we show that our generated accelerators can
outperform CPU by and GPU by . This framework is
open-source, and the code is available at
https://anonymous.4open.science/r/gnn-builder-83B4/.
Comment: 10 pages, 7 figures, 4 tables, 3 listings
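The model-driven design space exploration (DSE) the abstract describes can be sketched generically (the latency and BRAM models below are invented for illustration; GNNBuilder's actual analytical models are its own): an analytical performance model lets the tool rank candidate design points without synthesizing each one.

```python
def latency_model(parallel_factor, num_nodes=10_000):
    # Toy analytical model: compute time shrinks with parallelism,
    # but wider designs pay a fixed per-lane overhead.
    return num_nodes / parallel_factor + 50 * parallel_factor

def bram_model(parallel_factor):
    # Toy model: on-chip buffer usage grows linearly with parallelism.
    return 16 * parallel_factor

def explore(bram_budget, factors=(1, 2, 4, 8, 16, 32)):
    # Pick the fastest design point that fits within the BRAM budget.
    feasible = [f for f in factors if bram_model(f) <= bram_budget]
    return min(feasible, key=latency_model)

best = explore(bram_budget=256)
```

Because the models are closed-form, the whole sweep is evaluated in microseconds, which is what makes "fast and flexible" DSE possible compared to synthesizing every candidate.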
An evaluation of a microprocessor with two independent hardware execution threads coupled through a shared cache
We investigate the utility of augmenting a microprocessor with a single
execution pipeline by adding a second copy of the execution pipeline in
parallel with the existing one. The resulting dual-hardware-threaded
microprocessor has two identical, independent, single-issue in-order execution
pipelines (hardware threads) which share a common memory sub-system (consisting
of instruction and data caches together with a memory management unit). From a
design perspective, the assembly and verification of the dual threaded
processor is simplified by the use of existing verified implementations of the
execution pipeline and a memory unit. Because the memory unit is shared by the
two hardware threads, the relative area overhead of adding the second hardware
thread is 25% of the area of the existing single threaded processor. Using an
FPGA implementation we evaluate the performance of the dual threaded processor
relative to the single threaded one. On applications which can be parallelized,
we observe speedups of 1.6X to 1.88X. For applications that are not
parallelizable, the speedup is more modest. We also observe that the dual
threaded processor performance is degraded on applications which generate large
numbers of cache misses.
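The reported speedups can be related to Amdahl's law for two threads: an observed speedup S with n threads implies a parallelizable fraction p = (n/(n-1))(1 - 1/S). Interpreting the paper's 1.6X–1.88X range through this standard formula is our illustration, not the authors' analysis:

```python
def amdahl_speedup(p, n):
    # Amdahl's law: speedup from parallelizing fraction p across n threads.
    return 1.0 / ((1.0 - p) + p / n)

def implied_parallel_fraction(speedup, n=2):
    # Invert Amdahl's law: fraction p implied by an observed speedup.
    return (n / (n - 1)) * (1.0 - 1.0 / speedup)

lo = implied_parallel_fraction(1.6)   # lower end of the reported range
hi = implied_parallel_fraction(1.88)  # upper end of the reported range
```

This suggests the parallelized benchmarks spend roughly 75–94% of their serial runtime in parallelizable work, consistent with more modest gains on the non-parallelizable applications.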