A Re-ranker Scheme for Integrating Large Scale NLU models
Large scale Natural Language Understanding (NLU) systems are typically
trained on large quantities of data, requiring a fast and scalable training
strategy. A typical design for NLU systems consists of domain-level NLU modules
(domain classifier, intent classifier and named entity recognizer). Hypotheses
(NLU interpretations consisting of various intent+slot combinations) from these
domain-specific modules are typically aggregated by a downstream component, the
re-ranker, which integrates outputs from the domain-level recognizers,
returning a scored list of cross-domain hypotheses. An ideal re-ranker will
exhibit two properties: (a) it should rank the most relevant hypothesis for the
given input at the top, and (b) the
interpretation scores corresponding to each hypothesis produced by the
re-ranker should be calibrated. Calibration allows the final NLU interpretation
score to be comparable across domains. We propose a novel re-ranker strategy
that addresses these aspects while also maintaining domain-specific
modularity. We design optimization loss functions for such a modularized
re-ranker and present results on decreasing the top hypothesis error rate as
well as maintaining the model calibration. We also experiment with an extension
involving training the domain-specific re-rankers on datasets curated
independently by each domain to allow further asynchronization. The proposed
re-ranker design showcases the following: (i) improved NLU performance over an
unweighted aggregation strategy, (ii) cross-domain calibrated performance, and
(iii) support for use cases involving training each re-ranker on datasets
curated by each domain independently.
Comment: 7 pages, Accepted to IEEE SLT-2018
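As an illustration of the scheme the abstract describes, the toy Python snippet below merges per-domain hypothesis lists and applies a temperature softmax over weighted scores as a stand-in calibration step. The domain weights, the feature-free scoring, and the softmax calibration are all assumptions made for illustration; the paper's actual loss functions and calibration method are not reproduced here.

import math

def rerank(domain_hypotheses, domain_weights, temperature=1.0):
    """Merge per-domain hypothesis lists into one calibrated, scored list.

    domain_hypotheses: dict mapping domain -> list of (hypothesis, raw_score)
    domain_weights: dict mapping domain -> multiplicative weight (trained
        independently per domain, preserving modularity)
    """
    scored = []
    for domain, hyps in domain_hypotheses.items():
        w = domain_weights.get(domain, 1.0)
        for hyp, raw in hyps:
            scored.append((domain, hyp, w * raw))
    # Softmax with temperature as a simple calibration step, so scores are
    # comparable across domains and sum to 1.
    zs = [s / temperature for (_, _, s) in scored]
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return sorted(
        [(d, h, e / total) for (d, h, _), e in zip(scored, exps)],
        key=lambda t: t[2],
        reverse=True,
    )

# Example: two domains proposing intent+slot hypotheses for one utterance.
hyps = {
    "Music": [("PlaySongIntent|song:thriller", 2.1)],
    "Video": [("PlayVideoIntent|title:thriller", 1.7)],
}
print(rerank(hyps, {"Music": 1.0, "Video": 0.9}))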
Active Learning for New Domains in Natural Language Understanding
We explore active learning (AL) for improving the accuracy of new domains in
a natural language understanding (NLU) system. We propose an algorithm called
Majority-CRF that uses an ensemble of classification models to guide the
selection of relevant utterances, as well as a sequence labeling model to help
prioritize informative examples. Experiments with three domains show that
Majority-CRF achieves 6.6%-9% relative error rate reduction compared to random
sampling with the same annotation budget, and statistically significant
improvements compared to other AL approaches. Additionally, case studies with
human-in-the-loop AL on six new domains show 4.6%-9% improvement on an existing
NLU system.
Comment: NAACL 2019
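The abstract names the ingredients of Majority-CRF without giving its exact scoring rule; the toy sketch below combines an assumed majority vote over an ensemble of in-domain classifiers with an assumed sequence-labeler confidence signal to rank candidate utterances for annotation. The function names and the score combination are illustrative, not the paper's algorithm.

def majority_crf_select(utterances, classifiers, crf_confidence, budget):
    """Toy active-learning selection loosely following the description above.

    classifiers: list of callables mapping utterance -> probability that the
        utterance belongs to the new domain.
    crf_confidence: callable mapping utterance -> sequence-labeler confidence
        in [0, 1]; low confidence suggests an informative example.
    """
    candidates = []
    for utt in utterances:
        votes = [clf(utt) for clf in classifiers]
        in_domain = sum(v > 0.5 for v in votes)
        # Keep utterances the ensemble majority considers in-domain...
        if in_domain > len(classifiers) / 2:
            disagreement = max(votes) - min(votes)
            uncertainty = 1.0 - crf_confidence(utt)
            # ...and rank by ensemble disagreement plus labeler uncertainty.
            candidates.append((disagreement + uncertainty, utt))
    candidates.sort(reverse=True)
    return [utt for _, utt in candidates[:budget]]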
F10-SGD: Fast Training of Elastic-net Linear Models for Text Classification and Named-entity Recognition
Text classification and named-entity recognition (NER) models for voice
assistants are trained on millions of example utterances. Because of the large
datasets, long training time is one of the bottlenecks for releasing improved
models. In this work, we develop F10-SGD, a fast optimizer for text
classification and NER elastic-net linear models. On internal datasets, F10-SGD
provides 4x reduction in training time compared to the OWL-QN optimizer without
loss of accuracy or increase in model size. Furthermore, we incorporate biased
sampling that prioritizes harder examples towards the end of the training. As a
result, in addition to faster training, we were able to obtain statistically
significant accuracy improvements for NER.
On public datasets, F10-SGD obtains a 22% reduction in training time compared
to FastText for text classification, and a 4x reduction compared to CRFSuite's
OWL-QN for NER.
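F10-SGD itself is not spelled out in the abstract beyond its two main ingredients, so the sketch below shows only a generic SGD loop for an elastic-net logistic model: an L1 proximal (soft-thresholding) step for sparsity, plus loss-proportional biased sampling in the final epochs. The hyperparameters and the engineering that make F10-SGD fast are assumptions or omissions here.

import math
import random

def sgd_elastic_net(data, dim, epochs=5, lr=0.1, l1=1e-5, l2=1e-5, bias_frac=0.4):
    """Toy SGD trainer for a binary logistic model with an elastic-net penalty.

    data: list of (features, label) pairs, features a sparse dict
        {index: value}, label in {-1, +1}.
    bias_frac: fraction of final epochs that oversample harder
        (higher-loss) examples, echoing the biased sampling above.
    """
    w = [0.0] * dim

    def margin(x, y):
        return y * sum(w[i] * v for i, v in x.items())

    for epoch in range(epochs):
        if epoch >= epochs * (1.0 - bias_frac):
            # Bias sampling toward examples the current model finds hard.
            losses = [math.log1p(math.exp(-margin(x, y))) + 1e-12 for x, y in data]
            order = random.choices(range(len(data)), weights=losses, k=len(data))
        else:
            order = random.sample(range(len(data)), len(data))
        for idx in order:
            x, y = data[idx]
            g = -y / (1.0 + math.exp(margin(x, y)))  # logistic-loss gradient
            for i, v in x.items():
                w[i] -= lr * (g * v + l2 * w[i])  # gradient + L2 step
                # L1 proximal step: soft-threshold toward zero for sparsity.
                w[i] = math.copysign(max(abs(w[i]) - lr * l1, 0.0), w[i])
    return w

# Example: two sparse training points.
print(sgd_elastic_net([({0: 1.0, 2: 1.0}, 1), ({1: 1.0}, -1)], dim=3))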
One-vs-All Models for Asynchronous Training: An Empirical Analysis
Any given classification problem can be modeled using multi-class or
One-vs-All (OVA) architecture. An OVA system consists of as many OVA models as
the number of classes, providing the advantage of asynchrony, where each OVA
model can be re-trained independently of the other models. This is particularly
advantageous in settings where scalable model training is a consideration (for
instance in an industrial environment where multiple and frequent updates need
to be made to the classification system). In this paper, we conduct an
empirical analysis of independent updates to OVA models and their impact on the
accuracy of the overall OVA system. Given that asynchronous updates lead to
differences in training datasets for OVA models, we first define a metric to
quantify the differences in datasets. Thereafter, using Natural Language
Understanding as the task of interest, we estimate the impact of three factors
on OVA system accuracy: (i) the number of classes, (ii) the number of data
points, and (iii) divergences in training datasets across OVA models. Finally, we
observe the accuracy impact of increased asynchrony in a Spoken Language
Understanding system. We analyze the results and establish that the proposed
metric correlates strongly with model performance in both experimental
settings.
Comment: 5 pages, Accepted to Interspeech 2019
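The abstract does not define the dataset-divergence metric, so the sketch below uses Jensen-Shannon divergence over unigram token distributions as an assumed stand-in for quantifying how far apart two asynchronously curated OVA training snapshots have drifted.

import math
from collections import Counter

def token_dist(utterances):
    """Unigram token distribution over a list of utterances."""
    counts = Counter(tok for utt in utterances for tok in utt.split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two distributions."""
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in set(p) | set(q)}
    def kl(a):
        return sum(v * math.log2(v / m[k]) for k, v in a.items() if v > 0)
    return 0.5 * kl(p) + 0.5 * kl(q)

# Asynchronously curated training snapshots seen by two OVA models:
snapshot_a = ["play some music", "play the latest album"]
snapshot_b = ["turn on the lights", "dim the bedroom lights"]
print(js_divergence(token_dist(snapshot_a), token_dist(snapshot_b)))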
Efficient Semi-Supervised Learning for Natural Language Understanding by Optimizing Diversity
Expanding new functionalities efficiently is an ongoing challenge for
single-turn task-oriented dialogue systems. In this work, we explore
functionality-specific semi-supervised learning via self-training. We consider
methods that augment training data automatically from unlabeled data sets in a
functionality-targeted manner. In addition, we examine multiple techniques for
efficient selection of augmented utterances to reduce training time and
increase diversity. First, we consider paraphrase detection methods that
attempt to find utterance variants of labeled training data with good coverage.
Second, we explore sub-modular optimization based on n-grams features for
utterance selection. Experiments show that functionality-specific self-training
is very effective for improving system performance. In addition, methods that
optimize for diversity can, in many cases, reduce the training data to 50% with
little impact on performance.
Comment: IEEE Copyright. To appear at ASRU 2019
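A standard instantiation of submodular utterance selection over n-gram features is greedy coverage maximization; the sketch below is that generic recipe, offered as an assumed illustration rather than the paper's exact objective.

def ngrams(utterance, n=2):
    toks = utterance.split()
    return {tuple(toks[i:i + n]) for i in range(len(toks) - n + 1)} | set(toks)

def greedy_diverse_subset(utterances, k):
    """Greedily pick k utterances maximizing new n-gram coverage.

    Coverage is monotone submodular, so greedy selection enjoys the
    usual (1 - 1/e) approximation guarantee.
    """
    covered, selected = set(), []
    pool = {u: ngrams(u) for u in utterances}
    for _ in range(min(k, len(utterances))):
        best = max(pool, key=lambda u: len(pool[u] - covered))
        if not pool[best] - covered:
            break  # nothing new left to cover
        covered |= pool[best]
        selected.append(best)
        del pool[best]
    return selected

print(greedy_diverse_subset(
    ["play jazz music", "play jazz", "set a timer", "play rock music"], 2))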
Does Robustness Improve Fairness? Approaching Fairness with Word Substitution Robustness Methods for Text Classification
Existing bias mitigation methods to reduce disparities in model outcomes
across cohorts have focused on data augmentation, debiasing model embeddings,
or adding fairness-based optimization objectives during training. Separately,
certified word substitution robustness methods have been developed to decrease
the impact of spurious features and synonym substitutions on model predictions.
While their end goals are different, they both aim to encourage models to make
the same prediction for certain changes in the input. In this paper, we
investigate the utility of certified word substitution robustness methods to
improve equality of odds and equality of opportunity on multiple text
classification tasks. We observe that certified robustness methods improve
fairness, and that using both robustness and bias mitigation methods during
training results in an improvement on both fronts.
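For reference, equality of opportunity compares true-positive rates across cohorts, while equality of odds additionally compares false-positive rates. A minimal sketch of measuring both gaps (the data layout is an assumption):

def rates(y_true, y_pred, group, g):
    """TPR and FPR of predictions restricted to cohort g."""
    tp = fn = fp = tn = 0
    for yt, yp, gr in zip(y_true, y_pred, group):
        if gr != g:
            continue
        if yt == 1:
            tp += yp == 1
            fn += yp == 0
        else:
            fp += yp == 1
            tn += yp == 0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr

def equalized_odds_gap(y_true, y_pred, group, g0, g1):
    """Max of the TPR gap and FPR gap between two cohorts.

    Equality of opportunity looks only at the TPR gap; equality of
    odds requires both gaps to be small.
    """
    tpr0, fpr0 = rates(y_true, y_pred, group, g0)
    tpr1, fpr1 = rates(y_true, y_pred, group, g1)
    return max(abs(tpr0 - tpr1), abs(fpr0 - fpr1))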