With the cross-fertilization of applications and the ever-increasing scale of
models, the efficiency and productivity of hardware computing architectures
have become inadequate. This inadequacy further exacerbates issues in design
flexibility, design complexity, development cycle, and development costs (4-d
problems) in divergent scenarios. To address these challenges, this paper
proposes a flexible design flow called DIAG based on plugin techniques. The
proposed flow guides hardware development through four layers: definition (D),
implementation (I), application (A), and generation (G). Furthermore, a versatile
CGRA generator called WindMill is implemented, allowing for agile generation of
customized hardware accelerators based on specific application demands.
Applications and algorithm tasks from three aspects are evaluated. In the
case of a reinforcement learning algorithm, a performance improvement
of 2.3× over a GPU is achieved.
Comment: 7 pages, 10 figures