Deep learning (DL) for network models have achieved excellent performance in
the field and are becoming a promising component of future intelligent network
systems. Programmable in-network computing devices have great potential for
deploying DL for network models; however, existing devices cannot afford to run
a DL model. The main challenges in supporting DL-based network models on the
data plane lie in computing power, task granularity, model generality, and
feature extraction.
To address these problems, we propose Octopus: a heterogeneous in-network
computing accelerator enabling DL for network models. A feature extractor is
designed for fast and efficient feature extraction. A vector accelerator and a
systolic array work in a heterogeneous, collaborative way, offering
low-latency, high-throughput general computing ability for packet-based and
flow-based tasks. Octopus also contains an on-chip memory fabric for storage
and interconnection, and a RISC-V core for global control. The proposed Octopus
accelerator design
is implemented on an FPGA. The functionality and performance of Octopus are
validated in several use cases, achieving 31 Mpkt/s feature extraction,
207 ns packet-based computing latency, and 90 kflow/s flow-based computing
throughput.