LiCo-Net: Linearized Convolution Network for Hardware-efficient Keyword Spotting
This paper proposes a hardware-efficient architecture, Linearized Convolution
Network (LiCo-Net), for keyword spotting. It is optimized specifically for
low-power processors such as microcontrollers. Machine learning (ML) operators
have heterogeneous efficiency profiles on power-efficient hardware: for the
same theoretical computation cost, int8 operators are more computationally
efficient than float operators, and linear layers are often more efficient
than other layer types. The proposed LiCo-Net is a dual-phase system that
applies streaming convolutions at the training phase to maintain high model
capacity and uses efficient int8 linear operators at the inference phase.
Experimental results show that LiCo-Net outperforms the singular value
decomposition filter (SVDF) in hardware efficiency with on-par detection
performance: compared to SVDF, LiCo-Net reduces cycles by 40% on a HiFi4 DSP.
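The dual-phase idea rests on a simple equivalence: a causal 1-D convolution over time, evaluated one frame at a time, is the same computation as a linear layer applied to the concatenation of the last K input frames. The Python sketch below illustrates that equivalence and runs the linear form with int8 weights and activations. It is an illustration under stated assumptions, not LiCo-Net's implementation: stride 1, no bias, symmetric int8 quantization (one scale per output row for weights, one per window for inputs), and all function and class names are hypothetical.

```python
# Sketch: a causal 1-D convolution over time, rewritten as a streaming
# linear layer with int8 weights and activations.
# Assumptions (not from the LiCo-Net paper): stride 1, no bias, symmetric
# int8 quantization; all names here are hypothetical.
from collections import deque


def conv1d_causal(frames, kernel):
    """Float reference. kernel[k][c][d] weights input dim d of frame t-(K-1)+k
    for output channel c; one output per fully covered time step."""
    K, C, D = len(kernel), len(kernel[0]), len(kernel[0][0])
    outputs = []
    for t in range(K - 1, len(frames)):
        y = [0.0] * C
        for k in range(K):
            x = frames[t - (K - 1) + k]
            for c in range(C):
                y[c] += sum(kernel[k][c][d] * x[d] for d in range(D))
        outputs.append(y)
    return outputs


def kernel_to_linear(kernel):
    """Flatten the K x C x D kernel into a C x (K*D) weight matrix whose
    column order matches frames concatenated oldest-first."""
    K, C, D = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[kernel[k][c][d] for k in range(K) for d in range(D)]
            for c in range(C)]


def quantize_int8(values):
    """Symmetric int8 quantization: returns (integers in [-127, 127], scale)."""
    max_abs = max(abs(v) for v in values) or 1.0  # avoid a zero scale
    scale = max_abs / 127.0
    return [max(-127, min(127, round(v / scale))) for v in values], scale


class StreamingInt8Linear:
    """Keeps the last K frames and emits one output per new frame."""

    def __init__(self, weight, K):
        self.buffer = deque(maxlen=K)
        # Quantize each output row (output channel) with its own scale.
        self.rows = [quantize_int8(row) for row in weight]

    def step(self, frame):
        self.buffer.append(frame)
        if len(self.buffer) < self.buffer.maxlen:
            return None  # not enough temporal context yet
        window = [v for f in self.buffer for v in f]  # oldest frame first
        x_q, x_scale = quantize_int8(window)
        out = []
        for w_q, w_scale in self.rows:
            acc = sum(a * b for a, b in zip(w_q, x_q))  # integer accumulation
            out.append(acc * w_scale * x_scale)  # dequantize to float
        return out


# Usage: K=3 frames of D=2 features, C=2 output channels.
kernel = [
    [[0.5, -0.25], [0.1, 0.2]],
    [[-0.3, 0.4], [0.7, -0.6]],
    [[0.2, 0.05], [-0.15, 0.9]],
]
frames = [[1.0, 0.5], [-0.5, 0.25], [0.75, -1.0], [0.2, 0.3], [-0.8, 0.6]]

reference = conv1d_causal(frames, kernel)
layer = StreamingInt8Linear(kernel_to_linear(kernel), K=3)
streamed = [y for y in (layer.step(f) for f in frames) if y is not None]
max_err = max(abs(a - b) for r, s in zip(reference, streamed) for a, b in zip(r, s))
print(f"max |conv - int8 linear| = {max_err:.4f}")
```

Each new frame triggers a single matrix-vector product over the buffered window, so the streaming form maps onto a plain linear operator; the small residual between the two outputs comes only from int8 rounding.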
Deep Spoken Keyword Spotting: An Overview
Spoken keyword spotting (KWS) deals with the identification of keywords in
audio streams and has become a fast-growing technology thanks to the paradigm
shift introduced by deep learning a few years ago. This has allowed the rapid
embedding of deep KWS in a myriad of small electronic devices with different
purposes, such as activating voice assistants. Prospects suggest sustained
growth in the social use of this technology. Thus, it is not surprising
that deep KWS has become a hot research topic among speech scientists, who
constantly seek to improve KWS performance and reduce computational complexity.
This context motivates this paper, in which we conduct a literature
review into deep spoken KWS to assist practitioners and researchers who are
interested in this technology. Specifically, this overview is comprehensive: it
covers a thorough analysis of deep KWS systems (including speech features,
acoustic modeling and posterior handling), robustness methods, applications,
datasets, evaluation metrics, performance of deep KWS systems and
audio-visual KWS. The analysis performed in this paper allows us to identify a
number of directions for future research, including directions adopted from
automatic speech recognition research and directions that are unique to the
problem of spoken KWS.
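Posterior handling turns frame-level acoustic-model outputs into keyword decisions. As a hedged illustration, the Python sketch below implements one widely used scheme: moving-average smoothing of each class posterior, then a confidence score equal to the geometric mean, over the keyword sub-unit classes, of each class's peak smoothed posterior within a sliding window. The window sizes, the threshold, and the class layout (class 0 as filler) are hypothetical choices, not values taken from the overview.

```python
# Sketch of a common KWS posterior-handling scheme: moving-average smoothing
# plus a sliding-window confidence score. Window sizes, threshold and the
# class layout (class 0 = filler) are hypothetical illustration values.
import math


def smooth_posteriors(posteriors, w_smooth):
    """Average each class posterior over the last w_smooth frames."""
    smoothed = []
    for j in range(len(posteriors)):
        window = posteriors[max(0, j - w_smooth + 1): j + 1]
        n_classes = len(posteriors[j])
        smoothed.append([sum(frame[i] for frame in window) / len(window)
                         for i in range(n_classes)])
    return smoothed


def confidence(smoothed, j, w_max, keyword_classes):
    """Geometric mean, over keyword classes, of each class's peak smoothed
    posterior within the last w_max frames ending at frame j."""
    start = max(0, j - w_max + 1)
    peaks = [max(smoothed[k][i] for k in range(start, j + 1))
             for i in keyword_classes]
    return math.prod(peaks) ** (1.0 / len(peaks))


def detect(posteriors, keyword_classes, w_smooth=3, w_max=5, threshold=0.5):
    """Return the frame indices whose confidence reaches the threshold."""
    smoothed = smooth_posteriors(posteriors, w_smooth)
    return [j for j in range(len(smoothed))
            if confidence(smoothed, j, w_max, keyword_classes) >= threshold]


# Usage: 3 classes per frame; classes 1 and 2 are keyword sub-units that
# peak one after the other.
posteriors = [
    [0.9, 0.05, 0.05],
    [0.8, 0.1, 0.1],
    [0.1, 0.85, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.1, 0.8],
    [0.05, 0.05, 0.9],
    [0.9, 0.05, 0.05],
    [0.95, 0.025, 0.025],
]
hits = detect(posteriors, keyword_classes=[1, 2])
print(hits)
```

Detections begin only once both sub-unit peaks fall inside the confidence window, and the single keyword occurrence then fires on several consecutive frames; a deployed detector would typically also suppress repeated triggers for a short period after each detection.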