RF-Transformer: A Unified Backscatter Radio Hardware Abstraction
This paper presents RF-Transformer, a unified backscatter radio hardware
abstraction that allows a low-power IoT device to communicate directly with
heterogeneous wireless receivers at minimal power consumption. Unlike
existing backscatter systems that are tailored to a specific wireless
communication protocol, RF-Transformer provides a programmable interface to the
micro-controller, allowing IoT devices to synthesize different types of
protocol-compliant backscatter signals sharing radically different PHY-layer
designs. To show the efficacy of our design, we implement a PCB prototype of
RF-Transformer on the 2.4 GHz ISM band and showcase its capability in generating
standard ZigBee, Bluetooth, LoRa, and Wi-Fi 802.11b/g/n/ac packets. Our
extensive field studies show that RF-Transformer achieves 23.8 Mbps, 247.1
Kbps, 986.5 Kbps, and 27.3 Kbps throughput when generating standard Wi-Fi,
ZigBee, Bluetooth, and LoRa signals while consuming 7.6-74.2x less power than
their active counterparts. Our ASIC simulation based on the 65-nm CMOS process
shows that the power gain of RF-Transformer can further grow to 92-678x. We
further integrate RF-Transformer with pressure sensors and present a case study
on detecting foot traffic density in hallways. Our 7-day case studies
demonstrate that RF-Transformer can reliably transmit sensor data to a commodity
gateway by synthesizing LoRa packets on top of Wi-Fi signals. Our experimental
results also verify the compatibility of RF-Transformer with commodity
receivers. Code and hardware schematics can be found at:
https://github.com/LeFsCC/RF-Transformer
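As a rough illustration of what such a programmable microcontroller interface could look like (the function, protocol names, and OOK-style encoding below are our own sketch, not RF-Transformer's actual API; only the listed chip/symbol rates come from the respective standards), a driver might map a target protocol to an RF-switch toggle schedule:

```python
# Hypothetical sketch of a unified backscatter "hardware abstraction":
# the MCU picks a target protocol and the driver emits the RF-switch
# toggle schedule for one payload. The mapping below is illustrative,
# not RF-Transformer's real interface.

PROTOCOL_CHIP_RATE_HZ = {
    "zigbee": 2_000_000,     # 802.15.4 O-QPSK chip rate
    "ble": 1_000_000,        # BLE 1M GFSK symbol rate
    "wifi_11b": 11_000_000,  # 802.11b DSSS chip rate
}

def toggle_schedule(protocol, bits):
    """Return (toggle_period_ns, switch_states) for a naive OOK-style encoding."""
    rate = PROTOCOL_CHIP_RATE_HZ[protocol]
    period_ns = int(1e9 // rate)           # duration of one chip in nanoseconds
    # naive mapping: bit 1 -> reflect (switch closed), bit 0 -> absorb
    states = [1 if b else 0 for b in bits]
    return period_ns, states
```

A real design would of course synthesize full protocol-compliant PHY waveforms rather than raw OOK, but the point is the same: one MCU-facing call, many target protocols.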
Learning a Dual-Mode Speech Recognition Model via Self-Pruning
There is growing interest in unifying the streaming and full-context
automatic speech recognition (ASR) networks into a single end-to-end ASR model
to simplify model training and deployment for both use cases. In real-world
ASR applications, however, streaming ASR models typically operate under tighter
storage and computational constraints - e.g., on embedded devices - than
server-side full-context models. Motivated by recent progress in
Omni-sparsity supernet training, where multiple subnetworks are jointly
optimized in one single model, this work aims to jointly learn a compact sparse
on-device streaming ASR model, and a large dense server non-streaming model, in
a single supernet. We further show that performing supernet training on both
wav2vec 2.0 self-supervised learning and supervised ASR fine-tuning not only
substantially improves the large non-streaming model, as shown in prior work,
but also improves the compact sparse streaming model.
Comment: 7 pages, 1 figure. Accepted for publication at IEEE Spoken Language
Technology Workshop (SLT), 202
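The core mechanism of sharing weights between a dense server model and a sparse on-device subnetwork can be sketched with layer-wise magnitude pruning (a minimal sketch of the general Omni-sparsity-style idea, not the paper's training recipe):

```python
import numpy as np

# Minimal sketch: a dense "server" weight matrix and a sparse "on-device"
# subnetwork that shares the same underlying weights, obtained by
# layer-wise magnitude pruning. Sparsity level is illustrative.

def magnitude_mask(w, sparsity):
    """Keep the largest-|w| entries; zero out a `sparsity` fraction."""
    k = int(w.size * sparsity)                        # weights to prune
    if k == 0:
        return np.ones(w.shape, dtype=bool)
    thresh = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.abs(w) > thresh

w_dense = np.array([[0.9, -0.1], [0.05, -1.2]])       # toy dense layer
mask = magnitude_mask(w_dense, sparsity=0.5)          # prune half the weights
w_sparse = w_dense * mask                             # subnetwork shares weights
```

In supernet training, both the dense path and the masked path would contribute to the loss so that one set of weights serves both deployment targets.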
Towards Selection of Text-to-speech Data to Augment ASR Training
This paper presents a method for selecting appropriate synthetic speech
samples from a given large text-to-speech (TTS) dataset as supplementary
training data for an automatic speech recognition (ASR) model. We trained a
neural network, which can be optimised using cross-entropy loss or ArcFace
loss, to measure the similarity of synthetic data to real speech. We found
that incorporating synthetic samples with considerable dissimilarity to real
speech, owing in part to lexical differences, into ASR training is crucial for
boosting recognition performance. Experimental results on Librispeech test sets
indicate that, in order to maintain the same speech recognition accuracy as
when using all TTS data, our proposed solution can reduce the TTS data to a
fraction of its original size, outperforming several baseline methods.
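A selection policy matching the abstract's finding - keep mostly real-like samples but deliberately retain a share of dissimilar ones - can be sketched as follows (the scoring network, score scale, and the `dissimilar_frac` parameter are hypothetical; the paper's actual criterion may differ):

```python
# Illustrative selection sketch: a trained real-vs-synthetic network assigns
# each TTS sample a similarity-to-real score in [0, 1]; since dissimilar
# (e.g. lexically novel) samples were found to help, we select from both
# ends of the ranking rather than only the most real-like samples.

def select_tts_samples(scores, budget, dissimilar_frac=0.3):
    """Pick `budget` sample indices: mostly similar, plus a dissimilar share."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])  # low = dissimilar
    n_low = int(budget * dissimilar_frac)          # dissimilar quota
    picked = order[:n_low] + order[::-1][: budget - n_low]
    return sorted(picked)
```

For example, with scores `[0.9, 0.1, 0.8, 0.2, 0.5]` and a budget of 3, the default policy picks the three most real-like samples, while raising `dissimilar_frac` swaps one of them for the least real-like sample.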
Saiyan: Design and Implementation of a Low-power Demodulator for LoRa Backscatter Systems
The radio range of backscatter systems keeps growing as new wireless
communication primitives are invented. Nevertheless, both the bit
error rate and the packet loss rate of backscatter signals increase rapidly
with the radio range, thereby necessitating the cooperation between the access
point and the backscatter tags through a feedback loop. Unfortunately, the
low-power nature of backscatter tags limits their ability to demodulate
feedback signals from a remote access point under such circumstances. This
paper presents Saiyan, an ultra-low-power demodulator for
long-range LoRa backscatter systems. With Saiyan, a backscatter tag can
demodulate feedback signals from a remote access point with moderate power
consumption and then perform an immediate packet retransmission in the presence
of packet loss. Moreover, Saiyan enables rate adaptation and channel hopping - two
PHY-layer operations that are important to channel efficiency yet unavailable
on long-range backscatter systems. We prototype Saiyan on a two-layer PCB
and evaluate its performance in different environments. Results show that
Saiyan achieves a 5x gain on the demodulation range, compared with
state-of-the-art systems. Our ASIC simulation shows that the power consumption
of Saiyan is around 93.2 uW. Code and hardware schematics can be found at:
https://github.com/ZangJac/Saiyan
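For background on what a LoRa demodulator has to accomplish, classic dechirp demodulation can be sketched as follows (this is the standard full-power receiver operation, not Saiyan's ultra-low-power circuit; spreading factor and chirp model are illustrative): multiplying a received up-chirp by the conjugate base chirp collapses it into a tone whose FFT peak index is the transmitted symbol.

```python
import numpy as np

# Classic LoRa dechirping: the received up-chirp, mixed with the conjugate
# of the base (symbol-0) chirp, becomes a pure tone at bin `symbol`.

SF = 7                      # spreading factor -> 2**SF chips per symbol
N = 2 ** SF

def chirp(symbol):
    """Baseband LoRa up-chirp for a given symbol value (0 .. N-1)."""
    n = np.arange(N)
    # quadratic phase = linear frequency sweep, offset by the symbol
    return np.exp(2j * np.pi * (n * n / (2 * N) + symbol * n / N))

def demod(rx):
    """Recover the symbol: dechirp, then locate the FFT magnitude peak."""
    dechirped = rx * np.conj(chirp(0))
    return int(np.argmax(np.abs(np.fft.fft(dechirped))))
```

Saiyan's contribution is doing the equivalent of this at microwatt power budgets, where an FFT is far too expensive.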
Efficient Ambient LoRa Backscatter with On-Off Keying Modulation
Backscatter communication holds potential for ubiquitous and low-cost
connectivity among low-power IoT devices. To avoid interference between the
carrier signal and the backscatter signal, recent works propose a
frequency-shifting technique to separate these two signals in the frequency
domain. Such proposals, however, have to occupy the precious wireless spectrum
that is already overcrowded, and increase the power, cost, and complexity of
the backscatter tag. In this paper, we revisit the classic ON-OFF Keying (OOK)
modulation and propose Aloba, a backscatter system that takes the ambient LoRa
transmissions as the excitation and piggybacks the in-band OOK modulated
signals over the LoRa transmissions. Our design enables the backscatter signal
to work in the same frequency band as the carrier signal while achieving
flexible data rates at different transmission ranges. The key contributions of
Aloba include: (1) the design of a low-power backscatter tag that can pick out
the ambient LoRa signals from other signals; (2) a novel decoding algorithm to
demodulate both the carrier signal and the backscatter signal from their
superposition. We further adopt a link coding mechanism and an interleaving operation
to enhance the reliability of backscatter signal decoding. We implement Aloba
and conduct head-to-head comparison with the state-of-the-art LoRa backscatter
system PLoRa in various settings. The experimental results show Aloba can
achieve a 199.4 Kbps data rate at various distances, 52.4 times higher than PLoRa.
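The basic idea of piggybacking OOK on an ambient carrier can be illustrated with a toy amplitude model (our own sketch, not Aloba's decoder; the chip length, reflection gain, and threshold are assumed values): the tag's ON/OFF states scale the received envelope, and bits are recovered by thresholding the per-chip average.

```python
import numpy as np

# Toy in-band OOK piggybacking: tag ON (reflect) boosts the received
# envelope, tag OFF (absorb) leaves it flat; the gateway averages the
# envelope over each chip and thresholds to recover the tag's bits.

CHIP = 8                                   # samples per OOK chip (assumed)

def ook_decode(envelope, threshold):
    """Average the envelope over each chip, then threshold into bits."""
    chips = envelope.reshape(-1, CHIP).mean(axis=1)
    return (chips > threshold).astype(int).tolist()

bits = [1, 0, 1, 1]
carrier = np.ones(len(bits) * CHIP)        # idealized constant-envelope carrier
gain = np.repeat([1.2 if b else 1.0 for b in bits], CHIP)  # reflect vs absorb
decoded = ook_decode(carrier * gain, threshold=1.1)
```

A real LoRa carrier is a chirp rather than a constant envelope, which is exactly why Aloba needs its decoding algorithm to separate the carrier and backscatter signals from their superposition.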
Folding Attention: Memory and Power Optimization for On-Device Transformer-based Streaming Speech Recognition
Transformer-based models excel in speech recognition. Existing efforts to
optimize Transformer inference, typically for long-context applications, center
on simplifying attention score calculations. However, streaming speech
recognition models usually process a limited number of tokens each time, making
attention score calculation less of a bottleneck. Instead, the bottleneck lies
in the linear projection layers of multi-head attention and feedforward
networks, constituting a substantial portion of the model size and contributing
significantly to computation, memory, and power usage.
To address this bottleneck, we propose folding attention, a technique
targeting these linear layers, significantly reducing model size and improving
memory and power efficiency. Experiments on on-device Transformer-based
streaming speech recognition models show that folding attention reduces model
size (and corresponding memory consumption) by up to 24% and power consumption
by up to 23%, all without compromising model accuracy or computation overhead.
Towards General-Purpose Speech Abilities for Large Language Models Using Unpaired Data
In this work, we extend the instruction-tuned Llama-2 model with end-to-end
general-purpose speech processing and reasoning abilities while maintaining the
wide range of LLM capabilities, without using any carefully curated paired
data. The proposed model can utilize audio prompts as a replacement for text
and sustain a conversation. Such a model also has extended cross-modal
capabilities such as being able to perform speech question answering, speech
translation, and audio summarization amongst many other closed and open-domain
tasks. This is unlike prior approaches in speech, in which LLMs are extended to
handle audio for a limited number of pre-designated tasks. Experiments show
that our end-to-end approach is on par with or outperforms a cascaded system
(speech recognizer + LLM) in terms of modeling the response to a prompt.
Furthermore, unlike a cascade, our approach shows the ability to interchange
text and audio modalities and utilize the prior context in a conversation to
provide better results.
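One common way such audio prompting is wired up - shown here only as a generic sketch of the idea, not the paper's actual architecture - is to project audio-encoder frames into the LLM's token-embedding space so that audio and text can be consumed as one sequence:

```python
import numpy as np

# Generic sketch: audio-encoder frames are linearly projected into the
# LLM's embedding space and concatenated with text-token embeddings,
# letting the decoder treat audio as just more prompt positions.
# All dimensions are toy values; the projection is learned in practice.

rng = np.random.default_rng(0)
d_audio, d_llm = 80, 16                        # assumed feature/embedding sizes
proj = rng.normal(size=(d_audio, d_llm))       # stand-in for a trained projector

audio_frames = rng.normal(size=(5, d_audio))   # 5 audio-encoder frames
text_embeds = rng.normal(size=(3, d_llm))      # 3 text-token embeddings
prompt = np.concatenate([audio_frames @ proj, text_embeds], axis=0)
```

The LLM itself is unchanged; only the projected audio positions are new, which is what lets the model interchange text and audio within one conversation.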