Speeding up Context-based Sentence Representation Learning with Non-autoregressive Convolutional Decoding
Context plays an important role in human language understanding, so it may
also be useful for machines that learn vector representations of language. In
this paper, we explore an asymmetric encoder-decoder structure for unsupervised
context-based sentence representation learning. We carefully designed
experiments to show that neither an autoregressive decoder nor an RNN decoder
is required. We then designed a model that keeps an RNN as the encoder while
using a non-autoregressive convolutional decoder. We further combine a suite of
effective designs that significantly improve model efficiency while also
achieving better performance. Our model is trained on two different large
unlabelled corpora, and in both cases its transferability is evaluated on a set
of downstream NLP tasks. We empirically show that our model is simple and fast
while producing rich sentence representations that excel in downstream tasks.
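The abstract contrasts autoregressive and non-autoregressive decoders without spelling out the difference. The sketch below is a generic illustration, not the paper's model: `step` and `predict` are hypothetical stand-ins for trained networks, and `sentence_vector` is a placeholder for the RNN encoder's output. An autoregressive decoder must wait for each token before producing the next, whereas a non-autoregressive decoder predicts every position from the sentence vector alone, so all positions can be computed in one parallel pass.

```python
from typing import Callable, List, Sequence

# Hypothetical stand-ins for trained networks (assumptions, not the paper's API):
#   ArStep(context, prefix) -> next token, conditioned on the tokens generated so far
#   NarPredict(context, i)  -> token at position i, conditioned only on the sentence vector
ArStep = Callable[[Sequence[float], List[str]], str]
NarPredict = Callable[[Sequence[float], int], str]


def autoregressive_decode(step: ArStep, context: Sequence[float], length: int) -> List[str]:
    """Generate tokens one at a time; step t cannot start until step t-1 has finished."""
    out: List[str] = []
    for _ in range(length):
        out.append(step(context, out))  # depends on every previously generated token
    return out


def non_autoregressive_decode(predict: NarPredict, context: Sequence[float], length: int) -> List[str]:
    """Predict every position independently from the sentence vector.

    No position reads another position's output, so a convolutional decoder can
    compute all of them in one batched pass; this loop is only the sequential
    equivalent of that pass.
    """
    return [predict(context, i) for i in range(length)]


if __name__ == "__main__":
    vocab = ["the", "cat", "sat", "down"]
    sentence_vector = [0.1, 0.9]  # placeholder for the RNN encoder's output
    ar = autoregressive_decode(lambda c, prefix: vocab[len(prefix) % len(vocab)], sentence_vector, 3)
    nar = non_autoregressive_decode(lambda c, i: vocab[i % len(vocab)], sentence_vector, 3)
    print(ar, nar)  # both: ['the', 'cat', 'sat']
```

With these toy stand-ins both decoders emit the same tokens; the difference is structural. The autoregressive loop has `length` sequential dependencies, while the non-autoregressive version has none and could be replaced by a single batched call.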
Non-autoregressive Transformer-based End-to-end ASR using BERT
Transformer-based models have driven significant innovation in various
classic and practical fields, including speech processing, natural language
processing, and computer vision. Building on the transformer, attention-based
end-to-end automatic speech recognition (ASR) models have become popular in
recent years. In particular, non-autoregressive modeling, which can achieve
fast inference and performance comparable to conventional autoregressive
methods, is an emerging research topic. In natural language processing, the
bidirectional encoder representations from transformers (BERT) model has
received widespread attention, partially due to its ability to infer
contextualized word representations and to achieve superior performance on
downstream tasks with only simple fine-tuning. To inherit the advantages of
non-autoregressive ASR modeling while also benefiting from a pre-trained
language model (e.g., BERT), this paper presents a non-autoregressive
transformer-based end-to-end ASR model based on BERT. A series of experiments
conducted on the AISHELL-1 dataset demonstrates that the proposed model
achieves results competitive with or superior to state-of-the-art ASR systems.
Efficient Deep Speech Understanding at the Edge
In contemporary speech understanding (SU), a sophisticated pipeline ingests
streaming voice input. The pipeline executes beam search iteratively, invoking
a deep neural network to generate tentative outputs (referred to as
hypotheses) autoregressively. Periodically, the pipeline assesses attention
and Connectionist Temporal Classification (CTC) scores.
This paper aims to enhance SU performance on resource-constrained edge
devices. Adopting a hybrid strategy, our approach accelerates on-device
execution and offloads inputs that exceed the device's capacity. While this
hybrid strategy is well established, we tackle SU's distinctive challenges
with three novel techniques: (1) Late Contextualization, which executes a
model's attentive encoder in parallel during input ingestion; (2) Pilot
Inference, which mitigates temporal load imbalances in the SU pipeline; and
(3) Autoregression Offramps, which make offloading decisions based solely on
hypotheses.
These techniques are designed to integrate seamlessly with existing speech
models, pipelines, and frameworks, and can be applied independently or in
combination. Collectively, they form a hybrid solution for edge SU. Our
prototype, named XYZ, has been tested on Arm platforms with 6 to 8 cores and
demonstrates state-of-the-art accuracy. Notably, it achieves a 2x reduction in
end-to-end latency and a corresponding 2x reduction in offloading
requirements.
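The SU abstract above says the pipeline periodically assesses CTC scores but does not define them. As a hedged illustration, the sketch below computes the full-sequence CTC log-likelihood with the standard forward (alpha) recursion: the probability of a label sequence summed over every frame-level alignment that collapses to it. Joint CTC/attention decoders commonly score partial hypotheses with a CTC prefix score instead, which builds on the same recursion; the paper does not say which variant XYZ uses, so treat this as a generic sketch, not its implementation. The function names and the `blank = 0` convention are assumptions.

```python
import math
from typing import List, Sequence


def logsumexp(*xs: float) -> float:
    """Numerically stable log(sum(exp(x))) that tolerates -inf (log 0) inputs."""
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def ctc_log_likelihood(log_probs: Sequence[Sequence[float]], labels: Sequence[int], blank: int = 0) -> float:
    """Return log P(labels | frames) under CTC, summed over every valid alignment.

    log_probs[t][k] is the log-probability of symbol k at frame t; at least one
    frame is required, and labels must not contain the blank index.
    """
    # Interleave blanks: [blank, l1, blank, l2, ..., blank]
    ext: List[int] = [blank]
    for label in labels:
        ext += [label, blank]
    S = len(ext)

    # alpha[s]: log-probability of all alignment prefixes ending in state s
    alpha = [-math.inf] * S
    alpha[0] = log_probs[0][ext[0]]
    if S > 1:
        alpha[1] = log_probs[0][ext[1]]

    for t in range(1, len(log_probs)):
        new_alpha = [-math.inf] * S
        for s in range(S):
            candidates = [alpha[s]]  # stay on the same symbol
            if s >= 1:
                candidates.append(alpha[s - 1])  # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                candidates.append(alpha[s - 2])  # skip a blank between distinct labels
            new_alpha[s] = logsumexp(*candidates) + log_probs[t][ext[s]]
        alpha = new_alpha

    # Valid alignments end on the last label or on the trailing blank.
    return logsumexp(alpha[-1], alpha[-2]) if S > 1 else alpha[0]


if __name__ == "__main__":
    # Two symbols (0 = blank, 1 = "a"), two frames, uniform probabilities.
    frames = [[math.log(0.5), math.log(0.5)]] * 2
    # Alignments collapsing to "a": (a, a), (a, blank), (blank, a), i.e. 3 * 0.25
    print(math.exp(ctc_log_likelihood(frames, [1])))  # approximately 0.75
```

Working in log space avoids underflow on long utterances; a repeated label such as "aa" needs a blank between its two copies, which is why the skip transition is disallowed when `ext[s] == ext[s - 2]`.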
Machine Learning Models for Efficient and Robust Natural Language Processing
Natural language processing (NLP) has come of age. For example, semantic role labeling (SRL), which automatically annotates sentences with a labeled graph representing who did what to whom, has seen a nearly 40% reduction in error over the past ten years, bringing it to useful accuracy. As a result, a myriad of practitioners now want to deploy NLP systems on billions of documents across many domains. However, state-of-the-art NLP systems are typically optimized for neither cross-domain robustness nor computational efficiency. In this dissertation I develop machine learning methods to facilitate fast and robust inference across many common NLP tasks.
First, I describe paired learning and inference algorithms for dynamic feature selection that accelerate inference in linear classifiers, the heart of the fastest NLP models, by 5-10 times. I then present iterated dilated convolutional neural networks (ID-CNNs), a distinct combination of network structure, parameter sharing, and training procedures that increases inference speed by 14-20 times with accuracy matching that of bidirectional LSTMs, the most accurate models for NLP sequence labeling. Finally, I describe linguistically-informed self-attention (LISA), a neural network model that combines multi-head self-attention with multi-task learning to facilitate improved generalization to new domains. We show that incorporating linguistic structure in this way leads to substantial improvements over the previous state-of-the-art (syntax-free) neural network models for SRL, especially when evaluating out-of-domain. I conclude with a brief discussion of potential future directions stemming from my thesis work.
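The dissertation credits part of the ID-CNN speedup to dilated convolutions. A common design, assumed here, doubles the dilation at each layer so that the receptive field grows exponentially with depth while each layer's cost stays constant. The pure-Python sketch below illustrates only that property; it is not the dissertation's ID-CNN architecture, parameter sharing scheme, or training procedure, and the kernel width 3 and dilations 1, 2, 4 are illustrative assumptions.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], weights: Sequence[float], dilation: int) -> List[float]:
    """Length-preserving 1-D convolution with zero padding.

    Kernel taps sit `dilation` positions apart, centred on each output position;
    an odd kernel width is assumed.
    """
    half = (len(weights) - 1) // 2
    out: List[float] = []
    for i in range(len(x)):
        acc = 0.0
        for j, w in enumerate(weights):
            pos = i + (j - half) * dilation
            if 0 <= pos < len(x):  # positions outside the sequence count as zero
                acc += w * x[pos]
        out.append(acc)
    return out


def receptive_field(kernel_width: int, dilations: Sequence[int]) -> int:
    """Number of input tokens that can influence one output after stacking the layers."""
    return 1 + sum((kernel_width - 1) * d for d in dilations)


if __name__ == "__main__":
    # Push a single "impulse" token through three layers with dilations 1, 2, 4
    # and count how many output positions it reaches.
    x = [0.0] * 17
    x[8] = 1.0
    h = x
    for d in (1, 2, 4):
        h = dilated_conv1d(h, [1.0, 1.0, 1.0], d)
    print(sum(1 for v in h if v != 0.0), receptive_field(3, [1, 2, 4]))  # 15 15
```

Three dilated layers of width 3 already see 15 tokens; an undilated stack would need seven such layers to see as far, since each adds only two tokens. Because every position is computed independently within a layer, the whole sequence can be processed in parallel, unlike a bidirectional LSTM.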